CN104902246B - Video monitoring method and device

Publication number: CN104902246B
Authority: CN (China)
Prior art keywords: target, three-dimensional coordinate, virtual door, coordinate information, video
Legal status: Active (granted)
Application number: CN201510334845.9A
Original language: Chinese (zh)
Other versions: CN104902246A
Inventors: 潘华东 (Pan Huadong), 程淼 (Cheng Miao), 潘石柱 (Pan Shizhu), 张兴明 (Zhang Xingming)
Assignee (original and current): Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd
Priority application: CN201510334845.9A
Family applications: PCT/CN2016/082963 (WO2016202143A1); US 15/737,283 (US10671857B2); EP 16810884.3 (EP3311562A4); US 16/888,861 (US11367287B2)
Publications: CN104902246A (application), CN104902246B (grant)

Abstract

The invention provides a video monitoring method and a video monitoring device, relating to the field of monitoring. The video monitoring method comprises the following steps: acquiring a video image; acquiring three-dimensional coordinate information of a target according to the video image; and extracting the occurrence of an event based on the positional relationship between the target and a virtual door, wherein the virtual door includes three-dimensional coordinate information. Because the three-dimensional coordinate information of the target is acquired from the video image and the positional relationship between the virtual door and the target is judged on the basis of that three-dimensional information, event misjudgment caused by the perspective effect in two-dimensional images is effectively avoided and the accuracy of event judgment is improved.

Description

Video monitoring method and device
Technical Field
The invention relates to the field of monitoring, in particular to a video monitoring method and a video monitoring device.
Background
The intelligent video behavior analysis system has high application value in various monitoring places. Its basic approach is either to perform background modeling on the input video, detect moving targets by comparing the background image with the image of the current frame, and then track, classify, and analyze the behavior of those targets; or to detect targets of a specified type directly from the video by training and recognition, track and analyze the detected targets, and make early-warning judgments on behavior events, thereby achieving intelligent monitoring.
In behavior analysis, tripwire detection and area intrusion detection are basic detection functions. The basic implementation is as follows: set at least one line segment or area in the video image, detect whether a moving target in the video crosses the line segment or enters/leaves the area, and generate an alarm if such an event occurs. In tripwire detection, at least one directed line segment is set in the video image and the system detects whether a moving target moves from one side of the line to the other; if a tripwire-crossing action occurs, an alarm event is generated. In area intrusion detection, at least one detection area is set in the video image and the system detects whether a moving target enters the area from outside; if an area intrusion occurs, an alarm event is generated.
Existing tripwire and area intrusion detection techniques judge whether to trigger the corresponding rule directly from the intersection of the target with the configured tripwire or area on the image plane. Because camera imaging has a perspective effect, a target that intersects a tripwire or area in the image has not necessarily crossed the line or entered the area in the real world, so misjudgment occurs easily and false alarms result.
Disclosure of Invention
An object of the present invention is to solve the problem of misjudgment of events due to the perspective effect of a camera.
According to an aspect of the present invention, there is provided a video surveillance method, including: acquiring a video image; acquiring three-dimensional coordinate information of a target according to the video image; an event occurrence is extracted based on a positional relationship of the target and a virtual door, wherein the virtual door includes three-dimensional coordinate information.
Optionally, the virtual door is a door area perpendicular to the ground, and an intersection line of the virtual door and the ground is a straight line, a line segment or a broken line.
Optionally, the obtaining three-dimensional coordinate information of the target according to the video image includes: comparing the continuous frame video images or comparing the video images with the background images to obtain change points or point groups in the video images; extracting points or point groups from the change points or point groups as targets; three-dimensional coordinate information of the target is determined.
Optionally, the video image is a planar video image; acquiring the three-dimensional coordinate information of the target according to the video image comprises: acquiring plane coordinate information of the target from the planar video image; and performing 3D reconstruction through a 3D reconstruction algorithm according to the plane coordinate information to obtain the three-dimensional coordinate information of the target.
Optionally, the device for acquiring video images comprises a 2D camera.
Optionally, the video images comprise planar video images of a plurality of different shooting positions; acquiring three-dimensional coordinate information of a target according to a video image comprises: 3D reconstruction is carried out on the plurality of plane video images to obtain 3D video images; and acquiring three-dimensional coordinate information of the target according to the 3D video image.
Optionally, the apparatus for acquiring video images comprises two or more 2D cameras or a binocular-vision-based 3D camera.
Optionally, the video image is a depth image, and the three-dimensional coordinate information of the target is acquired according to the depth image.
Optionally, the apparatus for acquiring depth images comprises a distance sensitive device or a 3D camera.
Optionally, extracting the event occurrence based on the positional relationship between the target and the virtual door comprises: extracting the event occurrence according to the positional relationship between the horizontal coordinate information in the three-dimensional coordinate information of the target and the virtual door, wherein the virtual door includes the horizontal coordinate information in its three-dimensional coordinates.
Optionally, the method further comprises: determining a motion track of a target in a video image according to a plurality of frames of video images; determining three-dimensional coordinate information of a motion track of a target; and extracting event occurrence based on the motion trail of the target and the position relation of the virtual door.
Optionally, the event comprises being located inside the virtual door, being located outside the virtual door, being located in the area of the virtual door, passing through the virtual door from the outside inwards, passing through the virtual door from the inside outwards, moving from the outside inwards and not passing through the virtual door and/or moving from the inside outwards and not passing through the virtual door.
Optionally, the type of object includes a human, an animal, and/or a car.
Optionally, the method further includes sending alarm information if a predetermined event is extracted, where the alarm information includes intrusion position information and intrusion direction information.
Optionally, extracting the event occurrence based on the positional relationship between the target and the virtual door includes counting the number of consecutive frames in which the event is extracted, and judging that the event has occurred when the number of frames is greater than a predetermined alarm frame count.
By the method, the three-dimensional coordinate information of the target is acquired according to the video image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that the occurrence of the event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of the event judgment is improved.
According to another aspect of the present invention, there is provided a video monitoring apparatus comprising: the video acquisition module is used for acquiring a video image; the three-dimensional coordinate determination module is used for acquiring three-dimensional coordinate information of the target according to the video image; and the event extraction module is used for extracting event occurrence based on the position relation between the target and the virtual door, wherein the virtual door comprises three-dimensional coordinate information.
Optionally, the virtual door is a door area perpendicular to the ground, and an intersection line of the virtual door and the ground is a straight line, a line segment or a broken line.
Optionally, the three-dimensional coordinate determination module includes: a frame comparison unit for comparing consecutive frames of video images, or comparing a video image with a background image, to obtain change points or point groups in the video image; a target determination unit for extracting a point or point group from the change points or point groups as the target; and a three-dimensional coordinate extraction unit for determining the three-dimensional coordinate information of the target.
Optionally, the video image is a planar video image; the three-dimensional coordinate determination module includes: a plane coordinate acquisition unit for acquiring plane coordinate information of the target from the planar video image; and a three-dimensional coordinate extraction unit for performing 3D reconstruction through a 3D reconstruction algorithm according to the plane coordinate information to obtain the three-dimensional coordinate information of the target.
Optionally, the video capture module comprises a 2D camera.
Optionally, the video images comprise planar video images of a plurality of different shooting positions; the three-dimensional coordinate determination module includes: the 3D reconstruction unit is used for performing 3D reconstruction on the plurality of plane video images to obtain 3D video images; and the three-dimensional coordinate extraction unit is used for acquiring the three-dimensional coordinate information of the target according to the 3D video image.
Optionally, the video capture module comprises two or more 2D cameras or a binocular-vision-based 3D camera.
Optionally, the video image is a depth image; and the three-dimensional coordinate determination module is used for acquiring the three-dimensional coordinate information of the target according to the depth image.
Optionally, the video capture module comprises a distance sensitive device or a 3D camera.
Optionally, the event extraction module is configured to extract an event occurrence according to a position relationship between horizontal coordinate information in the three-dimensional coordinate information of the target and a virtual door, where the virtual door includes the horizontal coordinate information in the three-dimensional coordinate.
Optionally, the three-dimensional coordinate determination module further comprises: the track determining unit is used for determining the motion track of the target in the video image; the three-dimensional coordinate extraction unit is also used for determining three-dimensional coordinate information of the motion trail of the target; and the event extraction module is also used for extracting event occurrence based on the motion trail of the target and the position relation of the virtual door.
Optionally, the event comprises being located inside the virtual door, being located outside the virtual door, being located in the area of the virtual door, passing through the virtual door from the outside inwards, passing through the virtual door from the inside outwards, moving from the outside inwards and not passing through the virtual door and/or moving from the inside outwards and not passing through the virtual door.
Optionally, a target type analysis module is further included for analyzing the target type, the target type including a human, an animal and/or a car.
Optionally, the system further comprises an alarm module, configured to send alarm information according to the extracted predetermined event, where the alarm information includes intrusion position information and/or intrusion direction information.
Optionally, the event extraction module is further configured to count a number of consecutive frames of the event, and determine that the event occurs when the number of frames is greater than a predetermined number of alarm frames.
By the device, the three-dimensional coordinate information of the target is acquired according to the video image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that an event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of event judgment is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of one embodiment of a video surveillance method of the present invention.
FIG. 2 is a flow chart of another embodiment of a video surveillance method of the present invention.
FIG. 3 is a flow chart of one embodiment of determining a three-dimensional coordinate portion of a target of the present invention.
FIG. 4 is a flow chart of yet another embodiment of a video surveillance method of the present invention.
FIG. 5 is a flow chart of yet another embodiment of a video surveillance method of the present invention.
FIG. 6 is a schematic diagram of one embodiment of a video surveillance apparatus of the present invention.
FIG. 7 is a diagram of one embodiment of a three-dimensional coordinate determination module in a video surveillance apparatus of the present invention.
Fig. 8 is a schematic diagram of another embodiment of a three-dimensional coordinate determination module in the video surveillance apparatus of the present invention.
Fig. 9 is a schematic diagram of a three-dimensional coordinate determination module in the video surveillance apparatus according to another embodiment of the present invention.
Fig. 10 is a schematic diagram of another embodiment of a video surveillance apparatus of the present invention.
Detailed Description
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
A flow diagram of one embodiment of a video surveillance method of the present invention is shown in fig. 1.
In step 101, the image pickup apparatus acquires a video image of a monitored area.
In step 102, three-dimensional coordinate information of an object to be monitored in a video image is determined. The target can be obtained by comparing the current image with the background image, or can be obtained by comparing the previous and next frames of images. The target may be a moving or stationary object, or may be a pixel or group of pixels that change in the image. The three-dimensional coordinate information of the target determined according to the video image can be in a camera coordinate system or a ground coordinate system.
In step 103, the position relationship between the target and the virtual door is determined based on the three-dimensional coordinate information of the target and the virtual door, thereby extracting the occurrence of the event.
By the method, the three-dimensional coordinate information of the target is acquired according to the video image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that the occurrence of the event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of the event judgment is improved.
In one embodiment, the video surveillance can monitor multiple targets simultaneously, thereby reducing missed extraction of events.
The virtual door is a door area perpendicular to the ground, and its intersection line with the ground can be a straight line, a line segment, or a broken line. In this way, the boundary of the area to be monitored and protected can be delimited as needed, and everything from the ground upward is monitored, improving the comprehensiveness and accuracy of event extraction.
The virtual door extends upward from this straight line, line segment, or broken line, and its height may be infinite or a predetermined value. The virtual door can be set by specifying its intersection line with the ground; by directly defining a convex polygon that is perpendicular to the ground and whose lower boundary is the intersection line of the virtual door and the ground; by setting the distance between the virtual door and the camera; or by first setting the intersection line of the virtual door's extension surface with the ground and then setting the door area, whose upper and lower boundaries can be specified by the user on the image or given as a height. In this way, the virtual door can be configured freely according to the monitoring requirements, which is more flexible and makes the video monitoring area more targeted.
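For illustration only, a minimal Python sketch of one possible virtual-door representation consistent with this description (a ground polyline extruded upward, with an optional height limit); the class and field names are hypothetical, not taken from the patent:

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class VirtualDoor:
    """A door area perpendicular to the ground: a ground intersection
    polyline (straight line, segment, or broken line) extruded upward."""
    ground_polyline: List[Tuple[float, float]]  # (X, Y) ground coordinates
    height: Optional[float] = None              # None means infinite height

    def within_height(self, z: float) -> bool:
        # A point takes part in door events only below the height limit.
        return self.height is None or 0.0 <= z <= self.height

# Example: a broken-line door footprint of two segments, 2.5 units high
door = VirtualDoor(ground_polyline=[(0.0, 0.0), (2.0, 0.5), (4.0, 0.0)], height=2.5)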
In one embodiment, video images of consecutive frames may be compared, or a video image may be compared with a background image, and change points or point groups are obtained from the difference in the values of pixels at the same position. Depending on the type of video image, the pixel values may carry color information, gray-level information, or depth information. The target can be obtained from the change points or point groups by denoising and eliminating erroneous points, or by setting a threshold and taking as the target those points whose change value exceeds it.
By such a method, a point or a point group which changes can be captured as a target, so that the monitoring sensitivity is improved, and the omission probability is reduced.
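As a rough illustration of this step, a minimal frame-differencing sketch in Python with NumPy; the threshold value and function name are assumptions, not from the patent:

import numpy as np

def changed_points(frame: np.ndarray, reference: np.ndarray,
                   threshold: float = 25.0) -> np.ndarray:
    """Boolean mask of pixels whose value differs from the reference
    image (previous frame or background model) by more than threshold;
    connected True regions are candidate targets after denoising."""
    diff = np.abs(frame.astype(np.float32) - reference.astype(np.float32))
    if diff.ndim == 3:            # color image: take the largest channel change
        diff = diff.max(axis=2)
    return diff > threshold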
A flow diagram of another embodiment of the video surveillance method of the present invention is shown in fig. 2.
In step 201, the image capturing apparatus acquires a video image of a monitored area; the video image is a planar video image.
In step 202, plane coordinate information of the target is determined. The target can be a still or moving object in the monitored area, or a point or point group whose pixel values change, found by comparing successive frames of the planar video image or by comparison with a background image.
In step 203, 3D reconstruction is performed on the planar video image through a 3D algorithm to obtain three-dimensional coordinate information of the target.
In step 204, the relative position relationship between the three-dimensional coordinate information of the target and the three-dimensional coordinate information of the virtual door is determined, and the occurrence of an event is determined.
By the method, 3D reconstruction can be performed according to the single-plane video image and the plane coordinate information of the target, the three-dimensional coordinate information of the target is obtained, and the occurrence of the event is judged according to the three-dimensional coordinate information.
FIG. 3 is a flow diagram of one embodiment for determining three-dimensional coordinates of an object based on a single plane video image and plane coordinate information of the object.
In step 301, a vanishing-line equation is obtained using the height information of an object at three different positions on the ground plane in the image (the vanishing line is a basic concept of projective geometry: the straight line formed in the image, after projection, by the intersection points of all sets of parallel lines in the real world). The height of the object in space must be known, the three positions must not lie on a straight line, and the height information can be expressed in pixels.
In step 302, the length information of a straight line on the ground plane is used to obtain the camera's rotation angle about the X axis and its rotation angle about the Y axis. The length of the line in space must be known, and its length in the image can be expressed in pixels.
In step 303, a 2D to 3D mapping matrix is obtained using the vanishing line equation and the rotation angle of the camera.
In step 304, three-dimensional coordinates are obtained according to the mapping matrix in step 303 and the plane coordinates.
By the method, the three-dimensional coordinates can be obtained according to the plane coordinates of the target and the virtual door, so that the event extraction can be carried out according to the three-dimensional coordinates.
In one embodiment, the implementation method for obtaining the corresponding relationship between the 2D planar video image and the three-dimensional object is as follows:
First, the plane of interest for video monitoring is calibrated; after calibration, the Euclidean distance in the real-world coordinate system between any two points on that plane can be obtained.
The correspondence between the two-dimensional image and the three-dimensional object can be expressed as the following formula:
λ (u, v, 1)^T = P (X, Y, Z, 1)^T    (1)
wherein λ represents the distortion coefficient of the video camera; considering that the distortion of a typical camera is small, λ is set to 1. The key to plane calibration is therefore obtaining the projection matrix P, which can be derived from α (the rotation angle of the camera about the X axis, i.e. the tilt angle), β (the rotation angle about the Y axis, i.e. the pan angle), and the vanishing-line equation. For details of the derivation, see F. Lv, T. Zhao, R. Nevatia, "Self-calibration of a camera from video of a walking human", ICPR, 2002.
Then the vanishing-line equation is obtained using the height information (in pixels) of an object (of known height in space) at three different positions (not on a straight line) on the ground plane in the image, and the length information (in pixels) of a straight line (of known length in space) on the ground plane is used to obtain α and β, yielding the projection matrix P that calibrates the plane.
A. The user finds a ground plane in the input video image and arbitrarily specifies two points on it, whose pixel positions are denoted (u1, v1) and (u2, v2), together with the Euclidean distance d between the coordinates of the two points in the real-world coordinate system.
B. The optimal α and β are computed as follows: first discretize α and β between 0 and 360 degrees; for each candidate combination (αi, βi), construct a mapping matrix Pi, convert the pixel positions (u1, v1) and (u2, v2) from step A through Pi into the corresponding three-dimensional real-world coordinates, compute the Euclidean distance di between them, and take the (αi, βi) whose di has the smallest error with respect to d as the camera parameters.
Since α and β both lie between 0 and 360 degrees, they can be discretized separately, e.g. α = 1, 2, …, 360 degrees and β = 1, 2, …, 360 degrees, giving a candidate combination (αi, βi) for each pair of possible angle values.
Slightly deforming equation 1 to obtain equation 2:
(X, Y, 1)^T = λ P^-1 (u, v, 1)^T    (2)
wherein P is-1Representing the inverse of the matrix P, i.e. P-1The dimension of the matrix P is 3 ×, but considering that the nominal point is located on the ground plane in the real world, i.e. the coordinate in the Z direction is 0, the matrix P is degenerated to a matrix of 3 ×, which can be inverted.
Substituting (u1, v1) and (u2, v2) into the above formula yields the world coordinates (X1, Y1, Z1) and (X2, Y2, Z2) of the two points, from which their Euclidean distance is calculated:

di = sqrt((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2)
The error Δ(αi, βi) between di and d can be defined in several ways; two of the more common expressions are

Δ(αi, βi) = |sqrt((X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2) - d|

or

Δ(αi, βi) = |(X1 - X2)^2 + (Y1 - Y2)^2 + (Z1 - Z2)^2 - d^2|.
Among all possible values of α and β, the parameter pair (α*, β*) with the smallest error is selected as the optimum:

(α*, β*) = argmin over (αi, βi) of Δ(αi, βi)
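A compact sketch of steps A and B (the grid search over the discretized tilt/pan angles) might look as follows in Python/NumPy; build_projection stands in for the derivation that produces the degenerate 3 × 3 matrix from (αi, βi) and is assumed to be supplied by the caller:

import numpy as np
from itertools import product

def ground_point(P3: np.ndarray, u: float, v: float) -> np.ndarray:
    """Map an image pixel to ground-plane world coordinates via the
    degenerate 3x3 matrix (Z = 0), as in equation (2)."""
    w = np.linalg.inv(P3) @ np.array([u, v, 1.0])
    return w[:2] / w[2]

def calibrate(p1, p2, d, build_projection, step_deg=1):
    """Steps A-B: pick the (alpha, beta) pair whose reprojected distance
    between the two user-marked ground points best matches the known
    distance d. build_projection(alpha, beta) -> degenerate 3x3 matrix."""
    best, best_err = None, float("inf")
    angles = range(step_deg, 361, step_deg)        # 1..360 degrees
    for a, b in product(angles, angles):
        try:
            P3 = build_projection(np.radians(a), np.radians(b))
            X1 = ground_point(P3, *p1)
            X2 = ground_point(P3, *p2)
        except np.linalg.LinAlgError:
            continue                                # non-invertible pose
        err = abs(np.linalg.norm(X1 - X2) - d)      # error against known d
        if err < best_err:
            best, best_err = (a, b), err
    return best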
C. Calculate the vanishing-line equation. Any prior-art method for obtaining the vanishing line is applicable to the present invention; for a calculation method see "Single-View Metrology: Algorithms and Applications", Antonio Criminisi, Proceedings of the 24th DAGM Symposium on Pattern Recognition.
D. Calculate the projection matrix P that maps two-dimensional coordinates to three-dimensional coordinates.
After acquiring the camera parameters α and β, the projection matrix may be obtained by

P = K [R | t]    (4)

wherein P is a 3 × 4 mapping matrix and K is the 3 × 3 intrinsic parameter matrix

K = [[f, 0, u0], [0, f, v0], [0, 0, 1]]

in which (u0, v0) is the principal point of the video image, which may be taken as the image center, and f is the focal length of the camera. R is a 3 × 3 rotation matrix, represented by equation (5), where α is the rotation angle of the camera about the X axis (tilt angle), β the rotation angle about the Y axis (pan angle), and γ the rotation angle about the Z axis (yaw angle); γ is approximated by the inclination of the vanishing line with respect to the horizontal:

R = Rx(α) Ry(β) Rz(γ)    (5)

t is a 3 × 1 translation vector, which can be expressed as t = R [0, Hc, 0]^T, where Hc denotes the height of the camera above the ground and ^T denotes transposition of [0, Hc, 0].
E. Since the virtual door and the target are on the same horizontal plane, i.e. their Z coordinates are equal, the Z axis is omitted from the calculation and only the XY coordinates of the target in three dimensions are needed. Substituting any coordinate point on the ground plane of the image into the mapping matrix via formula (6) yields the XY coordinates in the corresponding three-dimensional frame, where P^-1 denotes the inverse of the 3 × 3 mapping matrix after the degeneration process:

(X, Y, 1)^T = P^-1 (u, v, 1)^T    (6)
Using the above method, a matrix formula (6) for conversion between 2D and 3D can be obtained, and the three-dimensional coordinates corresponding to the plane coordinates can be obtained by substituting the plane coordinates into the formula (6).
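As an illustrative sketch of formulas (4) to (6) in Python/NumPy; the rotation order Rx·Ry·Rz and all parameter names are assumptions consistent with the description above, not a definitive implementation:

import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def projection_matrix(f, u0, v0, alpha, beta, gamma, Hc):
    """P = K [R | t] with t = R [0, Hc, 0]^T (Hc: camera height), eq. (4)."""
    K = np.array([[f, 0, u0], [0, f, v0], [0, 0, 1.0]])
    R = rot_x(alpha) @ rot_y(beta) @ rot_z(gamma)  # assumed rotation order
    t = R @ np.array([0.0, Hc, 0.0])
    return K @ np.hstack([R, t.reshape(3, 1)])     # 3 x 4 mapping matrix

def image_to_ground(P, u, v):
    """Formula (6): with Z = 0 the Z column drops out; invert the
    degenerate 3 x 3 matrix and map the pixel to ground XY coordinates."""
    P3 = P[:, [0, 1, 3]]
    w = np.linalg.inv(P3) @ np.array([u, v, 1.0])
    return w[:2] / w[2]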
Before image calibration, preprocessing such as noise reduction filtering, image enhancement and/or electronic image stabilization can be performed on the image, so that the detection accuracy is improved.
By this method, the plane coordinates of the target can be converted into three-dimensional coordinates even when only a single 2D camera supplies a planar video image, so that event extraction based on the three-dimensional coordinates of the object is achieved with less equipment and at lower cost.
Fig. 4 is a flowchart of another embodiment of a video monitoring method according to the present invention.
In step 401, a plurality of planar video images taken from different positions are acquired; all of them monitor the same area.
In step 402, a plurality of flat video images are 3D reconstructed to obtain 3D video images.
In step 403, three-dimensional coordinate information of the object is acquired from the 3D video image. The method can acquire the changed points or point groups according to the change of the coordinates and the color information of the pixel points in the 3D video image, extract the target from the changed points or point groups, and acquire the three-dimensional coordinate information of the target according to the 3D video image.
In step 404, the position relationship between the target and the virtual door is determined according to the three-dimensional coordinate information of the target and the three-dimensional coordinate information of the virtual door, and the occurrence of the event is extracted.
By this method, multiple planar video images shot from different positions of the same monitored area can be acquired, 3D reconstruction performed on them to obtain a 3D video image, and the target's three-dimensional coordinate information determined from the 3D video image. Events can then be extracted using three-dimensional coordinates, avoiding the event misjudgment that the perspective effect causes when events are extracted from two-dimensional coordinates. Compared with 3D reconstruction from a single planar video image, this method is more accurate and therefore reduces event-extraction errors.
In one embodiment, two 2D cameras are used to capture planar video images from two non-identical position shots. The method for 3D reconstruction of two planar video images may be specifically as follows:
First, the binocular stereo vision system based on two cameras must be installed on a stable platform. When shooting the monitored scene, the internal parameters of the cameras (such as focal length) and the relative position of the two cameras must not change; otherwise the system must be calibrated again. Imaging pictures from both cameras are acquired, the result is analyzed, and depth information is extracted.
For higher accuracy, the focal length and base length of the cameras can be increased while keeping the surveillance zone as close as possible to the stereo vision system, ensuring that the overlap area of the surveillance zone is large enough and that the two cameras are roughly aligned, i.e. the rotation angle of each camera about the optical axis cannot be too large.
A. Eliminate distortion. Lens distortion in the radial and tangential directions is removed mathematically.

Radial distortion bends rays more strongly the farther they pass from the center of the lens. For radial distortion, the imaging position is corrected according to equation (7):

x1 = x (1 + k1 r^2 + k2 r^4 + k3 r^6)
y1 = y (1 + k1 r^2 + k2 r^4 + k3 r^6)    (7)

Here (x1, y1) is the corrected new position and (x, y) the original position. For a cheap network camera the first two terms suffice; the third term is used for large distortion, such as a fisheye camera.
Tangential distortion is caused by lens manufacturing defects that make the lens non-parallel to the image plane. For tangential distortion, the imaging position is corrected according to equation (8):

x2 = x + [2 p1 y + p2 (r^2 + 2 x^2)]
y2 = y + [p1 (r^2 + 2 y^2) + 2 p2 x]    (8)

Here (x2, y2) is the corrected new position and (x, y) the original position.
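A minimal sketch of corrections (7) and (8) in Python, applied to normalized image coordinates; the coefficient values a caller would pass in are placeholders:

def radial_correct(x, y, k1, k2, k3=0.0):
    """Equation (7); the k3 term matters mainly for high-distortion
    lenses such as fisheyes, the first two suffice for cheap cameras."""
    r2 = x * x + y * y
    factor = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return x * factor, y * factor

def tangential_correct(x, y, p1, p2):
    """Equation (8): correction for lens/sensor non-parallelism."""
    r2 = x * x + y * y
    x2 = x + (2 * p1 * y + p2 * (r2 + 2 * x * x))
    y2 = y + (p1 * (r2 + 2 * y * y) + 2 * p2 * x)
    return x2, y2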
B. Rectify the cameras. The angle and distance of the cameras are adjusted to output rectified, row-aligned images (the images lie in one plane and corresponding rows are exactly aligned).
C. Image matching: find the same features in the fields of view of the two cameras and output a disparity map, where the disparity is the difference x1 - x2 of the x coordinates of the same feature in the two images.
D. Re-project. Knowing the relative geometric positions of the two cameras, the disparity map is converted to distance by triangulation.
As shown in fig. 3, the depth information Z value can be deduced using similar triangles.
Z = f T / (x1 - x2)    (9)

wherein x1 - x2 is the disparity, measured in pixels; f is the focal length, likewise in pixels; and T is the distance between the centers of the two cameras, usually expressed in millimeters.
By the method, the distance between the plane video image and the camera can be obtained, so that the depth information is obtained on the basis of the plane coordinate, and the three-dimensional coordinate is obtained.
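Equation (9) reduces to a one-line computation; a small sketch in Python (units and example numbers are illustrative only):

def depth_from_disparity(x1: float, x2: float, f: float, T: float) -> float:
    """Equation (9): Z = f * T / (x1 - x2). x1, x2 are the x coordinates
    of the same feature in the two rectified images (pixels), f the focal
    length in pixels, T the baseline; Z is returned in the units of T."""
    disparity = x1 - x2
    if disparity <= 0:
        raise ValueError("a valid match needs a positive disparity")
    return f * T / disparity

# Example: f = 700 px, T = 120 mm, disparity = 35 px  ->  Z = 2400 mm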
In one embodiment, video images may be acquired using a binocular vision based 3D camera. The method can simplify the processes of multi-plane camera position selection, installation and calibration, and is more convenient to use.
Fig. 5 is a flow chart of a video monitoring method according to still another embodiment of the present invention.
In step 501, a depth image is acquired by a 3D camera, a distance-sensitive device, or similar equipment, such as a Kinect sensor, a three-dimensional laser scanner, or a photographic scanner. The sensor perceives the environment in front of the lens, judges the physical distance between objects and the sensor, and collects depth information for every point in the lens's field of view, thereby obtaining a depth image. The values of the pixels in the depth image are depth information: the numerical value of each pixel expresses the distance between the corresponding physical point and the camera. In one embodiment, a Kinect sensor obtains the depth information: its infrared emitter emits infrared rays, which pass through a grating to form structured light that projects a speckle pattern onto the object surface; a CMOS camera captures the speckle image, the approximate depth of the measured object is obtained from the distances corresponding to reference speckle patterns, and the measured speckle pattern is locally refined by triangulation to yield a depth image. A depth image is integrated every 30 ms and can be displayed as a 3D model. In another embodiment, a three-dimensional laser scanner or camera scanner obtains the distance between objects and the lens by the laser-measurement principle, forming a depth image composed of a dense point cloud containing three-dimensional coordinate information.
In step 502, three-dimensional coordinate information of the target is acquired from the depth image. The target may be an object located in the monitored area, or a point or point group whose depth information changes, found by comparing depth images of consecutive frames or by comparing the depth image with a background depth image.
In step 503, an event occurrence is extracted based on three-dimensional coordinate information of the target in a real environment and the positional relationship of the virtual door. Extractable events include the presence of an object inside the virtual door, the passage of an object through the virtual door from the outside to the inside, or the location of an object outside the virtual door, etc. Whether to alarm or not can be judged and alarm information can be determined according to the relative position relation between the target and the virtual door.
By the method, the three-dimensional coordinate information of the target is acquired through the depth image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that the occurrence of an event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of event judgment is improved.
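For illustration, back-projecting a single depth pixel into camera coordinates under the standard pinhole model; the intrinsics f and (u0, v0) are assumed known from calibration and are not specified by the patent:

import numpy as np

def depth_pixel_to_3d(u: int, v: int, z: float,
                      f: float, u0: float, v0: float) -> np.ndarray:
    """Back-project one depth pixel into camera coordinates with the
    pinhole model: X = (u - u0) z / f, Y = (v - v0) z / f, Z = z."""
    return np.array([(u - u0) * z / f, (v - v0) * z / f, z])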
In one embodiment, a 2D camera may additionally acquire a planar video image of the monitored area, with the planar video image and the depth image monitoring the same area; a changing point or point group is extracted as the target from changes of color information in the planar video image, and the target's three-dimensional coordinate information is then obtained from the depth image.
By the method, the target can be obtained according to the plane video image with higher definition and color information, and the phenomenon that the target is mistakenly judged as noise due to unobvious depth information change is prevented, so that the probability of missing capture of the target is reduced, and the monitoring is more reliable.
In one embodiment, since the target may be in a motion state, the occurrence of an event may be extracted according to a motion trajectory of the target.
The moving target is extracted by comparing successive multi-frame video images, yielding the target's motion trajectory. Depending on the type of video image obtained, the method of fig. 2, 3, or 4 may be used to obtain the three-dimensional coordinate information of the trajectory; the trajectory's three-dimensional coordinates and the virtual door's three-dimensional coordinates are then analyzed and their relative position judged, thereby judging the occurrence of events.
When the occurrence of events is extracted from the target's motion trajectory and the virtual door's three-dimensional coordinate information, the extracted events may include: passing through the virtual door from outside in, passing through the virtual door from inside out, moving from outside in without passing through the virtual door, and moving from inside out without passing through the virtual door. By this method, targets can be monitored continuously and the accuracy of event extraction improved.
In one embodiment, the method for extracting the event according to the three-dimensional coordinate information of the target and the position relationship of the virtual door is as follows:
A. Acquire the three-dimensional coordinate information of the target and the virtual door. Determine a reference straight line; here the line through the lowest central point of the image and perpendicular to the image's lower boundary is selected, and that lowest central point serves as the reference point used below.
B. For the current frame, calculate the angle between the reference straight line and the line joining each endpoint of the line segments that define the virtual door to the reference point; denote these angles θ1, θ2, …, θm, where m is the number of endpoints. Calculate the angle α between the reference straight line and the line joining the target's coordinate point to the reference point. Sort θ1, θ2, …, θm and α by value; select the smallest θ greater than α and denote it T1, and the largest θ smaller than α and denote it T2. Record the converted three-dimensional coordinates (x1, y1) and (x2, y2) of the segment endpoints corresponding to T1 and T2, the converted three-dimensional coordinates (x, y) of the moving target at this moment, and the converted three-dimensional coordinates (X, Y) of the reference point.
C. For the previous frame, likewise calculate the angles between the reference straight line and the lines joining each virtual-door segment endpoint to the reference point, denoted θ1′, θ2′, …, θm′, with m the number of endpoints. Calculate the angle α′ between the reference straight line and the line joining the target's coordinate point to the reference point. Sort θ1′, …, θm′ and α′ by value; select the smallest θ′ greater than α′ as T1′ and the largest θ′ smaller than α′ as T2′. Record the converted three-dimensional coordinates (x1′, y1′) and (x2′, y2′) of the segment endpoints corresponding to T1′ and T2′, and the converted three-dimensional coordinates (x′, y′) of the moving target at that moment.
D. Calculate the distances d1 and d2 from the converted endpoint coordinates (x1, y1) and (x2, y2) corresponding to T1 and T2 to the converted reference-point coordinates (X, Y), and the distance d from the converted coordinates (x, y) of the moving target to (X, Y):

d = ((X - x)^2 + (Y - y)^2)^(1/2)    (10)
Compare d with d1 and d2; three results are possible: d is larger than both d1 and d2, d is smaller than both d1 and d2, or d lies between d1 and d2. Denote these results 1.1, 1.2, and 1.3 respectively.
E. Likewise calculate the distances d1′ and d2′ from the converted endpoint coordinates (x1′, y1′) and (x2′, y2′) corresponding to T1′ and T2′ to the converted reference-point coordinates (X, Y), and the distance d′ from the converted coordinates (x′, y′) of the moving target to (X, Y).

Compare d′ with d1′ and d2′; three results are possible: d′ is larger than both, d′ is smaller than both, or d′ lies between them. Denote these results 2.1, 2.2, and 2.3 respectively.
F. Judge the direction of movement from the combination of results.

Results 1.1 + 2.1: the target's distance to the reference point was and remains greater than the endpoint distances; the target has not passed through the virtual door.

Results 1.1 + 2.2: the target's distance to the reference point has changed from smaller than the endpoint distances to greater than them; the target has passed through the virtual door, moving away from the reference point.

Results 1.1 + 2.3: the target's distance to the reference point has changed from between the endpoint distances to greater than both; the target has moved away from the reference point out of the virtual door region.

Results 1.2 + 2.1: the target's distance to the reference point has changed from greater than the endpoint distances to smaller than them; the target has passed through the virtual door, moving toward the reference point.

Results 1.2 + 2.2: the target's distance to the reference point was and remains smaller than the endpoint distances; the target has not passed through the virtual door.

Results 1.2 + 2.3: the target's distance to the reference point has changed from between the endpoint distances to smaller than both; the target has moved toward the reference point out of the virtual door region.

Results 1.3 + 2.1: the target's distance to the reference point has changed from greater than the endpoint distances to between them; the target has moved toward the reference point into the virtual door region without fully passing through.

Results 1.3 + 2.2: the target's distance to the reference point has changed from smaller than the endpoint distances to between them; the target has moved away from the reference point into the virtual door region without fully passing through.

Results 1.3 + 2.3: the target's distance to the reference point remains between the endpoint distances; the target has not passed through the virtual door, and no alarm is given.
By the method, the occurrence of the event can be extracted according to the motion state of the target, the motion direction of the target is judged, whether the target passes through the virtual door is judged, and the accurate and detailed event extraction effect is achieved.
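A sketch of steps A to F under the stated conventions (all coordinates already converted to ground-plane XY; the reference point is taken at the converted bottom-center of the image); helper names are hypothetical and edge cases are simplified:

import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def _angle(p: Point, ref: Point) -> float:
    # Angle of the ray ref -> p against the reference line (the vertical
    # through the converted bottom-center reference point).
    return math.atan2(p[0] - ref[0], p[1] - ref[1])

def _dist(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def classify(target: Point, door_endpoints: List[Point], ref: Point) -> Optional[int]:
    """Steps B/D (or C/E) for one frame: bracket the target's angle
    between the nearest door-endpoint angles T1, T2, then compare
    reference-point distances. Returns 1 (farther than both bracketing
    endpoints), 2 (nearer than both), 3 (between them), or None when the
    target lies outside the door's angular span."""
    a = _angle(target, ref)
    above = [e for e in door_endpoints if _angle(e, ref) > a]
    below = [e for e in door_endpoints if _angle(e, ref) < a]
    if not above or not below:
        return None
    t1 = min(above, key=lambda e: _angle(e, ref))   # smallest theta > alpha
    t2 = max(below, key=lambda e: _angle(e, ref))   # largest theta < alpha
    d, d1, d2 = _dist(target, ref), _dist(t1, ref), _dist(t2, ref)
    if d > d1 and d > d2:
        return 1
    if d < d1 and d < d2:
        return 2
    return 3

def passed_through(prev: Optional[int], cur: Optional[int]) -> bool:
    """Step F: a full pass corresponds to combinations 1.1 + 2.2 and
    1.2 + 2.1, i.e. the distance flips from one side of the door to the other."""
    return (cur, prev) in {(1, 2), (2, 1)}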
In one embodiment, the three-dimensional coordinate information of the target acquired from the video image is expressed in the camera coordinate system, while that of the virtual door is expressed in the ground coordinate system; the two must be unified into the same coordinate system. In one such embodiment, the target's three-dimensional coordinates in the camera coordinate system are converted into the ground coordinate system according to the relationship between the two systems. Since the virtual door can be a door area perpendicular to the ground, once the coordinate systems of the virtual door and the target are unified in the ground coordinate system, their relative position can be judged from their horizontal coordinate information alone, and the occurrence of an event can be judged from that relative position.
By this method, the three-dimensional coordinate information of the virtual door and the target is unified in the ground coordinate system, their positional relationship can be judged within the same coordinate system, and the accuracy of event extraction is improved. When the virtual door is perpendicular to the ground, the relative position of the virtual door and the target is judged from horizontal coordinate information alone, reducing the complexity of event extraction.
In one embodiment, the three-dimensional coordinate information of the virtual door is three-dimensional coordinate information in a camera coordinate system, or the three-dimensional coordinate information of the virtual door in the ground coordinate system may be converted into the camera coordinate system according to a relationship between the camera coordinate system and the ground coordinate system, so as to obtain a relative position relationship between the target and the virtual door in the camera coordinate system, and determine the occurrence of the event according to the relative position relationship between the target and the virtual door.
By the method, the three-dimensional coordinate information of the virtual door and the target can be unified in the camera coordinate system, the position relation of the virtual door and the target can be judged in the same coordinate system, and the accuracy of event extraction is improved. The three-dimensional coordinate information of the target does not need to be converted, so that the data processing steps are simplified.
In one embodiment, the three-dimensional coordinate information of the target obtained from the video image is three-dimensional coordinate information in a ground coordinate system, and the three-dimensional coordinate information of the virtual door is three-dimensional coordinate information in the ground coordinate system. The relative position relationship between the virtual door and the target can be judged only according to the horizontal coordinate information of the virtual door and the target, and the occurrence of the event can be judged according to the relative position relationship between the virtual door and the target.
By such a method, the occurrence of an event is judged only from the horizontal coordinate information of the two, and the complexity of event extraction can be reduced.
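For illustration, a small sketch of the camera-to-ground conversion; the extrinsics R and t are assumed known from calibration, and the convention p_cam = R·p_ground + t is an assumption, not from the patent:

import numpy as np

def camera_to_ground(points_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Convert 3D points from the camera coordinate system to the ground
    coordinate system, assuming p_cam = R @ p_ground + t with R, t known
    from calibration; then p_ground = R^T (p_cam - t)."""
    return (R.T @ (np.atleast_2d(points_cam) - t).T).T

# Only the horizontal components of the unified coordinates need to be
# compared against the virtual door when the door is perpendicular to the ground.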
In one embodiment, the positional relationship of the target to the virtual door includes being inside the virtual door, being outside the virtual door, being in the area of the virtual door, passing through the virtual door from the outside inward, passing through the virtual door from the inside outward, moving from the outside inward and not passing through the virtual door, or moving from the inside outward and not passing through the virtual door. In these events, it can be determined which one or ones are the events requiring alarm according to specific needs, such as being located inside the virtual door, passing through the virtual door from outside to inside, and the like. The method can facilitate the user to select the event needing alarming according to the specific use scene, thereby increasing the available scenes of the method.
In one embodiment, an alarm function is also included. Events requiring an alarm are scheduled, such as being located within the virtual door, being in the area of the virtual door, traversing the virtual door from the outside to the inside, and/or traversing the virtual door from the inside to the outside, etc. When these events occur, an alarm is triggered. By the method, the alarm information can be clearly provided for the user, and the user is helped to process the occurred events in time.
In one embodiment, target type analysis is also included. The type of object includes a human, an animal, or a car. And acquiring the type of the target through image matching, thereby enriching the event extraction information. By the method, the target type needing alarming can be selected, and the workload of a user is reduced.
In one embodiment, the number of frames of event extraction is counted, and the event is determined to occur when the extraction of the event reaches a predetermined number of frames. By the method, misjudgment can be prevented, and part of false alarms can be filtered.
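A minimal sketch of this frame-count filter; the default frame count is illustrative, not a value from the patent:

class EventDebouncer:
    """Judge an event as occurred only after it has been extracted in a
    predetermined number of consecutive frames."""
    def __init__(self, alarm_frames: int = 5):   # 5 is illustrative
        self.alarm_frames = alarm_frames
        self.count = 0

    def update(self, event_present: bool) -> bool:
        self.count = self.count + 1 if event_present else 0
        return self.count >= self.alarm_frames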
A schematic diagram of one embodiment of a video surveillance apparatus of the invention is shown in fig. 6. The video acquisition module 601 acquires video images of a monitored area and can be one or more 2D cameras, 3D cameras, or distance-sensitive devices. It sends the acquired video images to the three-dimensional coordinate determination module 602, which acquires the target's three-dimensional coordinate information from the video images; the target may be obtained by comparing the current image with a background image or by comparing consecutive frames, and may be a moving or static object or a changing pixel point or point group in the image. The three-dimensional coordinate determination module 602 sends the target's three-dimensional coordinate information to the event extraction module 603, which determines the positional relationship between the target and the virtual door from their three-dimensional coordinate information and extracts the occurrence of events.
By the device, the three-dimensional coordinate information of the target is acquired according to the video image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that an event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of event judgment is improved.
The virtual door is a door area perpendicular to the ground, and its intersection line with the ground can be a straight line, a line segment, or a broken line. The device can thus delimit the boundary of the area to be monitored and protected as needed and monitor everything from the ground upward, improving the comprehensiveness and accuracy of event extraction.
The virtual door extends upward from the straight line, line segment, or broken line, and its height can be infinite or preset. It can be set via its intersection line with the ground; by directly defining a convex polygon perpendicular to the ground whose lower boundary is that intersection line; by setting the distance between the virtual door and the camera; or by first setting the intersection line of the door's extension surface with the ground and then setting the door area, whose upper and lower boundaries can be specified by the user on the image or given as a height. The setting of the virtual door may also be performed through the interface of an external program. The device can thus configure the virtual door freely according to monitoring requirements, which is more flexible and makes the video monitoring area more targeted.
FIG. 7 is a diagram of one embodiment of the three-dimensional coordinate determination module in a video surveillance apparatus of the present invention. Reference numeral 701 denotes a frame comparison unit, which compares consecutive frames of the video acquired by the video acquisition module, or compares the current frame with a background image, and obtains the pixel points or point groups whose values change in the video image. Depending on the kind of video image, the pixel value may be color information, three-dimensional coordinate information, or depth information. Reference numeral 702 denotes a target determination unit, which extracts the target from the change points or point groups provided by the frame comparison unit 701 by operations such as screening or applying a threshold. The three-dimensional coordinate extraction unit 703 determines the three-dimensional coordinate information of the target from the video image.
Such a device can capture a point or a point group that changes as a target, thereby improving the sensitivity of monitoring and reducing the probability of omission.
In one embodiment, the video image acquired by the video acquisition module is a planar video image. As shown in fig. 8 (a), 801a is a plane coordinate acquiring unit for acquiring the target's plane coordinate information from the acquired planar video image. Reference numeral 802 denotes a three-dimensional coordinate extracting unit, which converts the target's plane coordinates, acquired by the plane coordinate acquiring unit 801a, to obtain the target's three-dimensional coordinate information. The conversion of 2D coordinates into 3D coordinates can be achieved by formula (6) above.
With this device, even when the video acquisition module is a 2D camera and the acquired video image is a single 2D video image, the three-dimensional coordinate information of the target can be acquired and events judged from it, avoiding event misjudgment caused by the perspective effect.
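Formula (6) itself appears earlier in the patent and is not reproduced here; as a stand-in, the sketch below shows one common 2D-to-3D conversion of this kind, assuming the target's lowest point lies on the ground plane and that an image-to-ground homography H is available from camera calibration. The numeric homography is purely illustrative.

```python
# A sketch of back-projecting an image pixel onto the ground plane (z = 0)
# through an assumed 3x3 image-to-ground homography H. This stands in for the
# patent's formula (6), which is not reproduced here.
import numpy as np

def pixel_to_ground(u, v, H):
    """Map image pixel (u, v) to ground-plane coordinates (x, y, 0)."""
    p = H @ np.array([u, v, 1.0])
    return np.array([p[0] / p[2], p[1] / p[2], 0.0])

# Example with an arbitrary (illustrative) homography:
H = np.array([[0.01, 0.0,   -3.2],
              [0.0,  0.02,  -4.8],
              [0.0,  0.001,  1.0]])
print(pixel_to_ground(320, 400, H))
```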
In one embodiment, the three-dimensional coordinate determination module is shown in fig. 9 (a), where 901a is a frame comparison unit, configured to compare planar video images of consecutive frames, or compare the planar video image with a background image, to obtain changed pixel points or point groups. 902a is a target determination unit for extracting a target from the change points or point groups acquired by the frame comparison unit 901a. 903a is a planar coordinate acquisition unit for acquiring the planar coordinate information of the target from the planar video image, and 904 is a three-dimensional coordinate extraction unit for acquiring the three-dimensional coordinate information of the target according to the planar coordinate information of the target and the operational relationship between planar coordinates and three-dimensional coordinates.
With this device, changed points or point groups can be obtained as targets through frame comparison on a single planar video image, improving monitoring sensitivity and reducing the probability of missed detections; the three-dimensional coordinate information of the target is then obtained, and events are judged using this information, avoiding event misjudgment caused by the perspective effect.
In one embodiment, the video images acquired by the video acquisition module are a plurality of planar video images of the same area shot from different shooting positions. As shown in fig. 8 (b), 801b is a 3D reconstruction unit, configured to perform 3D reconstruction on the plurality of planar video images of the same area captured from different positions to obtain a 3D video image; reference numeral 802 denotes a three-dimensional coordinate extraction unit, which acquires the three-dimensional coordinate information of the target from the 3D video image obtained by the 3D reconstruction unit 801b.
The device can acquire a plurality of planar video images shot from different positions of the same monitored area, perform 3D reconstruction on them to obtain a 3D video image, and determine the target's three-dimensional coordinates from the 3D video image, thereby extracting events using three-dimensional coordinate information and avoiding the event misjudgment caused by the perspective effect when events are extracted with two-dimensional coordinates.
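A minimal sketch of such a reconstruction with OpenCV follows: given the projection matrices of two calibrated cameras viewing the same area and a matched pair of target pixels, the target's 3D position is triangulated. The intrinsic matrix and the 10 cm baseline are illustrative assumptions.

```python
# Two-view triangulation sketch: recover a target's 3D position from matched
# pixels in two calibrated views of the same area.
import numpy as np
import cv2

K = np.array([[525.0,   0.0, 319.5],    # assumed shared intrinsic matrix
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])                  # camera 1 at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # camera 2, 10 cm baseline

pt1 = np.array([[320.0], [240.0]])   # target pixel in view 1
pt2 = np.array([[300.0], [240.0]])   # matched pixel in view 2

X_h = cv2.triangulatePoints(P1, P2, pt1, pt2)   # homogeneous 4x1 result
X = (X_h[:3] / X_h[3]).ravel()
print(X)   # 3D coordinates of the target in the camera-1 frame
```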
In one embodiment, the three-dimensional coordinate determination module is shown in fig. 9 (b), where 901b is a 3D reconstruction unit that performs 3D reconstruction on a plurality of planar video images captured from different positions to obtain a 3D video image. 902b is a frame comparison unit, configured to compare the previous and subsequent frames of the reconstructed 3D video image, or compare the 3D video image with a background image, to obtain points or point groups whose three-dimensional coordinate information or color information changes. 903b is a target determination unit that extracts a target from the change points or point groups determined by the frame comparison unit 902b. Reference numeral 904 denotes a three-dimensional coordinate extraction unit, which acquires the three-dimensional coordinate information of the target from the 3D video image reconstructed by the 3D reconstruction unit 901b.
When the acquired images are planar video images from a plurality of different shooting positions, the device can obtain a 3D video image through 3D reconstruction and take the change points or point groups in the 3D video image as targets, improving monitoring sensitivity and reducing the probability of missed detections. Events can then be extracted using the three-dimensional coordinate information, avoiding the event misjudgment caused by the perspective effect when events are extracted with two-dimensional coordinates.
In one embodiment, the video acquisition module is a distance sensitive device, the acquired video image is a depth image, and the three-dimensional coordinate determination module acquires three-dimensional coordinate information of the target according to the depth image. According to the device, the three-dimensional coordinate information of the target is acquired through the depth image, and the position relation between the virtual door and the target is judged based on the three-dimensional coordinate information of the target, so that an event is extracted, the event misjudgment caused by the perspective effect in the two-dimensional image is effectively avoided, and the accuracy of event judgment is improved.
In one embodiment, as shown in fig. 7, the frame comparison unit 701 compares depth images of consecutive frames, or compares a depth image with a background depth image, to obtain a pixel point or a point group with changed depth information; the target determination unit 702 extracts a target from the change point or the point group; the three-dimensional coordinate extraction unit 703 acquires three-dimensional coordinate information of the target from the depth image.
The device can extract change points or point groups as targets according to changes of the depth information in the depth image, improving monitoring sensitivity and reducing the probability of missed detections. Events can be extracted using the three-dimensional coordinate information, avoiding the event misjudgment caused by the perspective effect when events are extracted with two-dimensional coordinates.
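For illustration, the sketch below converts a target pixel of a depth image into camera-frame 3D coordinates under an assumed pinhole model; the intrinsics and the millimetre depth unit are assumptions for the example.

```python
# A minimal sketch of reading a target's 3D coordinates out of a depth image,
# assuming a pinhole camera with intrinsics (fx, fy, cx, cy) and depth in mm.
import numpy as np

def depth_pixel_to_3d(u, v, depth_mm, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Convert pixel (u, v) with depth in millimetres to camera-frame metres."""
    z = depth_mm / 1000.0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

depth_image = np.full((480, 640), 2000, dtype=np.uint16)   # fake 2 m scene
print(depth_pixel_to_3d(320, 400, depth_image[400, 320]))
```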
In one embodiment, the video acquisition module is a device comprising both a distance sensitive device and a 2D camera, such as a Kinect, a PMD camera or a MESA SR, and the images obtained are a depth image and a planar video image. Consecutive frames of the planar video image can be compared, or the planar video image compared with a background image, to determine the points or point groups whose color information changes, and the target is extracted from the changed points or point groups. The three-dimensional coordinate information of the target is then acquired from the depth image.
The device can acquire the target from the planar video image, which has higher definition and color information, and avoids the missed or erroneous judgments that arise when changes in depth information are not obvious or are drowned in noise, thereby reducing the probability of missing a target and making monitoring more reliable.
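A sketch of this colour-plus-depth combination follows, reusing the changed_point_groups and depth_pixel_to_3d helpers sketched above; registration between the colour and depth images is assumed, and taking the first point group's centroid is an illustrative simplification.

```python
# Detect the target in the colour stream (better resolution and contrast),
# then read its 3D position from the registered depth image.
def locate_target_rgbd(color_frame, color_background, depth_image):
    labels = changed_point_groups(color_frame, color_background)
    if labels.max() == 0:
        return None                          # no target detected
    vs, us = (labels == 1).nonzero()         # pixels of the first point group
    u, v = int(us.mean()), int(vs.mean())    # group centroid
    return depth_pixel_to_3d(u, v, depth_image[v, u])
```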
In one embodiment, the three-dimensional coordinate determination module further comprises a track determination unit, which determines the motion track of the target by recording the position information of the target over multiple frames of video images; the three-dimensional coordinate extraction unit then determines the three-dimensional coordinate information of the motion track of the target.
The occurrence of an event is extracted according to the motion trajectory of the target and the three-dimensional coordinate information of the virtual door; the extracted events may include: passing through the virtual door from outside to inside, passing through the virtual door from inside to outside, moving from outside to inside without passing through the virtual door, and moving from inside to outside without passing through the virtual door. With this device, targets can be monitored continuously, improving the accuracy of event extraction.
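To make the positional-relationship test concrete (it is recited in detail in claim 1 below), the following sketch measures each point's bearing and distance from a reference point, brackets the target's bearing between two door end points, compares the target distance d with the door distance interpolated between d1 and d2, and classifies the change between the previous and current frame. The event labels and the linear interpolation are assumptions for the example.

```python
# Sketch of the reference-line test: compare the target's distance d with the
# door end-point distances d1, d2 at the target's bearing, per frame, then
# classify the change between frames as a crossing event.
import math

def bearing_and_distance(point, ref):
    dx, dy = point[0] - ref[0], point[1] - ref[1]
    return math.atan2(dx, dy), math.hypot(dx, dy)   # angle to the vertical reference line, distance

def side_of_door(target, door_points, ref=(0.0, 0.0)):
    """Return 'near' (camera side of the door) or 'far' (beyond the door)."""
    a, d = bearing_and_distance(target, ref)
    pts = sorted(bearing_and_distance(p, ref) for p in door_points)
    lo = max((x for x in pts if x[0] <= a), default=None)   # largest angle <= a  -> d2
    hi = min((x for x in pts if x[0] >= a), default=None)   # smallest angle >= a -> d1
    if lo is None or hi is None:
        return 'outside_door_sector'
    # door distance at the target's bearing, interpolated between d2 and d1
    t = 0.0 if hi[0] == lo[0] else (a - lo[0]) / (hi[0] - lo[0])
    d_door = lo[1] + t * (hi[1] - lo[1])
    return 'near' if d < d_door else 'far'

def classify_event(prev_side, curr_side):
    if prev_side == 'near' and curr_side == 'far':
        return 'crossed_outward'
    if prev_side == 'far' and curr_side == 'near':
        return 'crossed_inward'
    return 'no_crossing'

door = [(-2.0, 5.0), (2.0, 5.0)]
print(classify_event(side_of_door((0.0, 3.0), door),
                     side_of_door((0.0, 6.0), door)))   # crossed_outward
```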
In one embodiment, a schematic diagram of a video surveillance apparatus of the present invention is shown in FIG. 10. Reference numerals 1001, 1002 and 1003 denote the video acquisition module, the three-dimensional coordinate determination module and the event extraction module respectively, whose operation is similar to that in fig. 6. The video surveillance apparatus may also include an alarm module 1005. When the event extraction module extracts a predetermined event that requires an alarm, an alarm signal and the event information are sent to the alarm module 1005 to trigger the alarm. The device can thus clearly provide alarm information to the user and help the user handle the events that have occurred in a timely manner. The predetermined events requiring an alarm may include being inside the virtual door, being in the virtual door area, passing through the virtual door from outside to inside and/or passing through the virtual door from inside to outside, etc.
In one embodiment, the video surveillance apparatus further comprises a target type analysis module 1004. The target type analysis module 1004 obtains the target type by matching against the video image acquired by the video acquisition module; the target type may include people, animals and/or vehicles. Acquiring the type of the target through image matching enriches the event extraction information. In one embodiment, the alarm module 1005 may obtain the target type and provide the target type information to the user. With this device, the target types that require an alarm can be selected, reducing the user's workload.
In one embodiment, the event extraction module counts the number of consecutive frames in which an extracted event occurs, and determines that the event has occurred only when the count reaches a predetermined number of frames. The device can thus prevent misjudgment and filter out some false alarms.
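For illustration, a minimal debouncing sketch of this frame-count filter follows; the threshold of 5 consecutive frames is an assumption for the example.

```python
# A frame-count filter: an event is only reported once it has been observed
# in a given number of consecutive frames, filtering transient false alarms.
class EventDebouncer:
    def __init__(self, required_frames=5):
        self.required_frames = required_frames
        self.count = 0

    def update(self, event_detected: bool) -> bool:
        """Feed one frame's detection result; True once the event is confirmed."""
        self.count = self.count + 1 if event_detected else 0
        return self.count >= self.required_frames

deb = EventDebouncer()
print([deb.update(True) for _ in range(6)])   # [False]*4 + [True, True]
```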
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions of some technical features may be made to specific embodiments of the invention without departing from its spirit, and all such changes are intended to fall within the scope of the claims of the present invention.

Claims (28)

1. A video monitoring method, characterized by comprising:
acquiring a video image;
acquiring three-dimensional coordinate information of a target according to the video image;
extracting the occurrence of an event based on the change in the positional relationship between the target and the virtual door, comprising:
acquiring a straight line that passes through the lowest point of the center of the video image and is perpendicular to the lower boundary of the video image, taking the straight line as a reference straight line and the lowest point as a reference point;
acquiring an included angle α between the line connecting the target to the reference point and the reference straight line, and a distance d between the target and the reference point;
acquiring the included angle between the reference straight line and the line connecting each end point of the intersection line of the virtual door and the ground to the reference point; determining the distance d1 between the reference point and the end point corresponding to the smallest included angle larger than α, and the distance d2 between the reference point and the end point corresponding to the largest included angle smaller than α;
determining the magnitude relationship between d and d1, d2;
determining the change in the positional relationship between the target and the virtual door according to the change in this magnitude relationship from the previous frame to the current frame;
wherein the virtual door comprises three-dimensional coordinate information, the virtual door is a door area perpendicular to the ground, and the intersection line of the virtual door and the ground is a straight line, a line segment or a broken line.
2. The method of claim 1, wherein the obtaining three-dimensional coordinate information of the object from the video image comprises:
comparing the video images of continuous frames or comparing the video images with background images to obtain change points or point groups in the video images;
extracting a point or a point group from the change point or the point group as a target;
and determining three-dimensional coordinate information of the target.
3. The method of claim 1, wherein the video image is a planar video image;
the acquiring three-dimensional coordinate information of the target according to the video image comprises:
acquiring plane coordinate information of the target through the plane video image;
and 3D reconstruction is carried out through a 3D reconstruction algorithm according to the plane coordinate information to obtain the three-dimensional coordinate information of the target.
4. The method of claim 3, wherein the device that obtains the video image comprises a 2D camera.
5. The method of claim 1, wherein the video images comprise planar video images of a plurality of different capture locations;
the acquiring three-dimensional coordinate information of the target according to the video image comprises:
3D reconstruction is carried out on the plurality of plane video images to obtain 3D video images;
and acquiring the three-dimensional coordinate information of the target according to the 3D video image.
6. The method of claim 5, wherein the equipment for acquiring the video images comprises two or more 2D cameras, or binocular-vision-based 3D cameras.
7. The method of claim 1, wherein the video image is a depth image;
and the three-dimensional coordinate information of the target is acquired from the depth image.
8. The method of claim 7, wherein the device that acquires the depth image comprises a distance sensitive device or a 3D camera.
9. The method of claim 1, wherein the extracting of the occurrence of an event based on the positional relationship between the target and the virtual door comprises:
extracting the occurrence of an event according to the positional relationship between the horizontal coordinate information in the three-dimensional coordinate information of the target and the virtual door, wherein the virtual door comprises horizontal coordinate information in three-dimensional coordinates.
10. The method of claim 1, further comprising:
determining the motion track of the target according to the plurality of frames of video images;
determining three-dimensional coordinate information of the motion trajectory of the target;
extracting an event occurrence based on the motion trajectory of the target and a positional relationship of the virtual door.
11. The method of claim 1, wherein the event comprises being inside the virtual door, being outside the virtual door, being in the virtual door area, passing through the virtual door from outside to inside, passing through the virtual door from inside to outside, moving from outside to inside without passing through the virtual door, and/or moving from inside to outside without passing through the virtual door.
12. The method of claim 1, wherein the type of the target comprises a human, an animal, and/or a vehicle.
13. The method according to claim 1, further comprising sending alarm information if a predetermined event is extracted, wherein the alarm information comprises intrusion position information and/or intrusion direction information.
14. The method of claim 1, wherein the extracting of the occurrence of an event based on the positional relationship between the target and the virtual door comprises counting the number of consecutive frames in which the event is detected, and determining that the event occurs when the number of frames is greater than a predetermined alarm frame number.
15. A video monitoring apparatus, comprising:
the video acquisition module is used for acquiring a video image;
the three-dimensional coordinate determination module is used for acquiring three-dimensional coordinate information of the target according to the video image;
the event extraction module is used for extracting the occurrence of an event based on the change in the positional relationship between the target and the virtual door, by:
acquiring a straight line that passes through the lowest point of the center of the video image and is perpendicular to the lower boundary of the video image, taking the straight line as a reference straight line and the lowest point as a reference point;
acquiring an included angle α between the line connecting the target to the reference point and the reference straight line, and a distance d between the target and the reference point;
acquiring the included angle between the reference straight line and the line connecting each end point of the intersection line of the virtual door and the ground to the reference point; determining the distance d1 between the reference point and the end point corresponding to the smallest included angle larger than α, and the distance d2 between the reference point and the end point corresponding to the largest included angle smaller than α;
determining the magnitude relationship between d and d1, d2;
determining the change in the positional relationship between the target and the virtual door according to the change in this magnitude relationship from the previous frame to the current frame;
wherein the virtual door comprises three-dimensional coordinate information, the virtual door is a door area perpendicular to the ground, and the intersection line of the virtual door and the ground is a straight line, a line segment or a broken line.
16. The apparatus of claim 15, wherein the three-dimensional coordinate determination module comprises:
the frame comparison unit is used for comparing the video images of continuous frames or comparing the video images with background images to obtain change points or point groups in the video images;
a target determination unit configured to extract a point or a point group as a target from the change point or the point group;
and the three-dimensional coordinate extraction unit is used for determining the three-dimensional coordinate information of the target.
17. The apparatus of claim 15, wherein the video image is a planar video image;
the three-dimensional coordinate determination module includes:
the plane coordinate acquisition unit is used for acquiring plane coordinate information of the target through the plane video image;
and the three-dimensional coordinate extraction unit is used for performing 3D reconstruction through a 3D reconstruction algorithm according to the plane coordinate information to obtain the three-dimensional coordinate information of the target.
18. The apparatus of claim 17, wherein the video acquisition module comprises a 2D camera.
19. The apparatus of claim 15, wherein the video image comprises a planar video image of a plurality of different capture locations;
the three-dimensional coordinate determination module includes:
the 3D reconstruction unit is used for performing 3D reconstruction on the plurality of plane video images to obtain 3D video images;
and the three-dimensional coordinate extraction unit is used for acquiring the three-dimensional coordinate information of the target according to the 3D video image.
20. The apparatus of claim 19, wherein the video acquisition module comprises two or more 2D cameras, or binocular-vision-based 3D cameras.
21. The apparatus of claim 15, wherein the video image is a depth image;
and the three-dimensional coordinate determination module is used for acquiring the three-dimensional coordinate information of the target according to the depth image.
22. The apparatus of claim 21, wherein the video acquisition module comprises a distance sensitive device or a 3D camera.
23. The apparatus of claim 15, wherein the event extraction module is configured to extract the event occurrence according to a position relationship between horizontal coordinate information in the three-dimensional coordinate information of the target and a virtual door, wherein the virtual door includes the horizontal coordinate information in the three-dimensional coordinates.
24. The apparatus of claim 15, wherein the three-dimensional coordinate determination module comprises:
the track determining unit is used for determining the motion track of the target in the video image;
a three-dimensional coordinate extraction unit for determining three-dimensional coordinate information of the motion trajectory of the target;
the event extraction module is further used for extracting event occurrence based on the motion trail of the target and the position relation of the virtual door.
25. The apparatus of claim 15, wherein the event comprises being inside the virtual door, outside the virtual door, in a virtual door area, passing through the virtual door from outside to inside, passing through the virtual door from inside to outside, moving from outside to inside and not passing through the virtual door, and/or moving from inside to outside and not passing through the virtual door.
26. The apparatus of claim 15, further comprising a target type analysis module for analyzing a target type, the target type comprising a human, an animal, and/or a vehicle.
27. The apparatus of claim 15, further comprising an alarm module for sending alarm information according to the extracted predetermined event, wherein the alarm information comprises intrusion position information and/or intrusion direction information.
28. The apparatus of claim 15, wherein the event extraction module is further configured to count a number of consecutive frames of the event, and determine that the event occurs when the number of the consecutive frames is greater than a predetermined number of alarm frames.
CN201510334845.9A 2015-06-17 2015-06-17 Video monitoring method and device Active CN104902246B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201510334845.9A CN104902246B (en) 2015-06-17 2015-06-17 Video monitoring method and device
PCT/CN2016/082963 WO2016202143A1 (en) 2015-06-17 2016-05-23 Methods and systems for video surveillance
US15/737,283 US10671857B2 (en) 2015-06-17 2016-05-23 Methods and systems for video surveillance
EP16810884.3A EP3311562A4 (en) 2015-06-17 2016-05-23 Methods and systems for video surveillance
US16/888,861 US11367287B2 (en) 2015-06-17 2020-06-01 Methods and systems for video surveillance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510334845.9A CN104902246B (en) 2015-06-17 2015-06-17 Video monitoring method and device

Publications (2)

Publication Number Publication Date
CN104902246A CN104902246A (en) 2015-09-09
CN104902246B true CN104902246B (en) 2020-07-28

Family

ID=54034609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510334845.9A Active CN104902246B (en) 2015-06-17 2015-06-17 Video monitoring method and device

Country Status (1)

Country Link
CN (1) CN104902246B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3311562A4 (en) 2015-06-17 2018-06-20 Zhejiang Dahua Technology Co., Ltd Methods and systems for video surveillance
CN105825166B (en) * 2015-12-15 2019-03-19 广东亿迅科技有限公司 Pedestrian traffic statistical method and statistical system based on human body HOG feature
CN105488817A (en) * 2016-01-21 2016-04-13 联想(北京)有限公司 Information processing method and electronic equipment
CN107093171B (en) * 2016-02-18 2021-04-30 腾讯科技(深圳)有限公司 Image processing method, device and system
CN106210633A (en) * 2016-07-18 2016-12-07 四川君逸数码科技股份有限公司 Line detection alarm method and device are got in a kind of wisdom gold eyeball identification
DE102016217792A1 (en) * 2016-09-16 2018-03-22 Xion Gmbh alignment system
CN106482665B (en) * 2016-09-21 2018-05-08 大连理工大学 One kind combination point group high-precision three-dimensional information vision measuring method
CN108111802B (en) * 2016-11-23 2020-06-26 杭州海康威视数字技术股份有限公司 Video monitoring method and device
CN107071339A (en) * 2016-12-01 2017-08-18 合肥大多数信息科技有限公司 A kind of intelligent seat system monitored based on depth camera and its implementation
CN106651794B (en) * 2016-12-01 2019-12-03 北京航空航天大学 A kind of projection speckle bearing calibration based on virtual camera
CN107483815B (en) * 2017-08-09 2020-08-07 Oppo广东移动通信有限公司 Method and device for shooting moving object
CN108898633A (en) * 2018-07-02 2018-11-27 成都精位科技有限公司 Object localization method and device
CN109146943B (en) * 2018-08-03 2019-12-03 百度在线网络技术(北京)有限公司 Detection method, device and the electronic equipment of stationary object
CN109839827B (en) * 2018-12-26 2021-11-30 哈尔滨拓博科技有限公司 Gesture recognition intelligent household control system based on full-space position information
CN111951598B (en) * 2019-05-17 2022-04-26 杭州海康威视数字技术股份有限公司 Vehicle tracking monitoring method, device and system
CN112015170A (en) * 2019-05-29 2020-12-01 北京市商汤科技开发有限公司 Moving object detection and intelligent driving control method, device, medium and equipment
CN111178205A (en) * 2019-12-20 2020-05-19 航天信息股份有限公司 Method and system for identifying target object in area range
CN112232170A (en) * 2020-10-10 2021-01-15 浙江大华技术股份有限公司 Method and device for determining object behaviors, storage medium and electronic device
CN113450522B (en) * 2021-05-28 2022-10-04 浙江大华技术股份有限公司 Video cable mixing intrusion detection method, electronic device and storage medium
CN113840159A (en) * 2021-09-26 2021-12-24 北京沃东天骏信息技术有限公司 Video processing method, device, computer system and readable storage medium
CN114267145B (en) * 2021-12-22 2023-12-22 江苏智库智能科技有限公司 Warehouse area intrusion detection alarm method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE4332612C2 (en) * 1992-09-25 1996-02-22 Yazaki Corp Exterior view monitoring method for motor vehicles
JP2007249722A (en) * 2006-03-17 2007-09-27 Hitachi Ltd Object detector
CN101835035B (en) * 2010-06-04 2014-02-26 天津市亚安科技股份有限公司 Regional invasion real-time detection method
CN103578133B (en) * 2012-08-03 2016-05-04 浙江大华技术股份有限公司 A kind of method and apparatus that two-dimensional image information is carried out to three-dimensional reconstruction
CN103716579B (en) * 2012-09-28 2017-05-10 中国科学院深圳先进技术研究院 Video monitoring method and system
CN103986905B (en) * 2014-04-30 2017-02-15 武汉兆图科技有限公司 Method for video space real-time roaming based on line characteristics in 3D environment

Also Published As

Publication number Publication date
CN104902246A (en) 2015-09-09

Similar Documents

Publication Publication Date Title
CN104902246B (en) Video monitoring method and device
US10839213B2 (en) Image capturing apparatus, monitoring system, image processing apparatus, image capturing method, and non-transitory computer readable recording medium
CN104966062B (en) Video monitoring method and device
CN104954747B (en) Video monitoring method and device
CN108027874B (en) Computer vision based security system using depth camera
TWI580273B (en) Surveillance system
US7385626B2 (en) Method and system for performing surveillance
WO2018101247A1 (en) Image recognition imaging apparatus
US20110205355A1 (en) Data Mining Method and System For Estimating Relative 3D Velocity and Acceleration Projection Functions Based on 2D Motions
JP5539250B2 (en) Approaching object detection device and approaching object detection method
JP2018156408A (en) Image recognizing and capturing apparatus
WO2022127181A1 (en) Passenger flow monitoring method and apparatus, and electronic device and storage medium
KR101469099B1 (en) Auto-Camera Calibration Method Based on Human Object Tracking
CN106600628A (en) Target object identification method and device based on infrared thermal imaging system
JP2011209794A (en) Object recognition system, monitoring system using the same, and watching system
CN114821497A (en) Method, device and equipment for determining position of target object and storage medium
JP4707019B2 (en) Video surveillance apparatus and method
JP4756357B2 (en) Video surveillance device
WO2020111053A1 (en) Monitoring device, monitoring system, monitoring method, and monitoring program
JP2020088840A (en) Monitoring device, monitoring system, monitoring method, and monitoring program
WO2005120070A2 (en) Method and system for performing surveillance
JP3616355B2 (en) Image processing method and image processing apparatus by computer
TWI439967B (en) Security monitor system and method thereof
JP5846596B2 (en) Monitoring device and program
JP4674316B2 (en) Position detection apparatus, position detection method, and position detection program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant