US8842162B2 - Method and system for improving surveillance of PTZ cameras - Google Patents

Method and system for improving surveillance of PTZ cameras

Info

Publication number: US8842162B2
Application number: US13/586,884 (other versions: US20140049600A1)
Inventors: Vladimir Goldner, Guy Boudoukh
Original assignee: Nice Systems Ltd (later assigned to Qognify Ltd.)
Current assignee: Monroe Capital Management Advisors LLC
Legal status: Active, expires

Classifications

    • H04N7/18 — Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G06K9/00
    • G06V20/00 — Scenes; Scene-specific elements
    • G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G08B13/19604 — Image analysis to detect motion of the intruder, e.g. by frame subtraction, involving reference image or background adaptation with time to compensate for changing conditions, e.g. reference image update on detection of light level change
    • G08B13/19608 — Tracking movement of a target, e.g. by detecting an object predefined as a target, using target direction and or velocity to predict its new position
    • G08B29/185 — Signal analysis techniques for reducing or preventing false alarms or for enhancing the reliability of the system
    • H04N23/698 — Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H04N5/23238

Definitions

  • the current PTZ frame defines a (nearly) rectangular area F in terms of polar coordinates.
  • Step 430 discloses picking all existing points on the PHOG map that lie inside F, excluding the detected human object.
  • Step 440 discloses finding, for each existing point P picked in step 430, all neighboring candidates located inside the human bounding box centered at P. Then, the method discloses choosing the neighboring candidate C with the highest score H(C).
  • the location of P in polar coordinates is updated towards C in a similar way.
  • Step 460 discloses inserting new points into the PHOG map: for any candidate C that did not have neighboring existing point, a new point is inserted to the PHOG map with initial value H(C).
  • Step 470 discloses deleting any existing point P from the map with an H(P) score lower than a predefined threshold T 2 .
  • FIG. 5 shows a method for detecting an object's similarity to a known model, according to exemplary embodiments of the disclosed subject matter.
  • Step 510 discloses obtaining the object's location on the frame in pixels.
  • Step 520 discloses converting the object's location in pixels into polar coordinates as disclosed above.
  • Step 530 discloses obtaining the closest PHOG point P to the object's center O.
  • H(P) is the PHOG value at P.
  • Let d be the distance between P and O.
  • Step 540 discloses obtaining a weight W for the point P.
  • Step 550 discloses updating the weight given to the current HOG score in the final decision of the human detector according to W; a hedged sketch follows this list. For example, when the value W·H(P) is high, the HOG score of the object at O is less reliable. Therefore, the human detector gives a lower weight to the HOG matching score, relative to other tracking criteria such as the background subtraction score, the object's trajectory, the object's speed, etc.
  • FIG. 6 shows a method for detecting a human object in an image, according to exemplary embodiments of the disclosed subject matter.
  • Step 610 discloses manually marking at least one bounding box of a human object on the PTZ frame at any PTZ pose.
  • the bounding box surrounds a person present in the scene viewed by the PTZ camera, and data related to the bounding box is stored and later used to determine the size in pixels of a typical object at different parts of the scene.
  • Step 615 discloses calculating an altitude of a typical human object for each human object marked in step 610.
  • Step 620 discloses inserting the objects' polar coordinates and altitudes to the panoramic terrain map, thus creating a first scene terrain model with typical human object size in pixels.
  • the method comprises creating a second scene model using a panoramic HOG map with false-detection likelihoods.
  • the second scene model may include assigning a value for each of the segments of the scene, such that the value represents the similarity between the segment and a predefined human object model.
  • the model may be based on a HoG map.
  • the method of the subject matter further discloses obtaining an image of the scene.
  • the image of the scene may be captured by a standard video camera.
  • a PTZ camera may capture the image.
  • Step 630 discloses calculating an optimal PTZ jump pose for the PTZ camera, given a new alarm.
  • the calculation uses the updated terrain panoramic map.
  • the alarm may be activated by detecting an intruder by a fixed camera.
  • Step 635 discloses performing a HoG human detection on the frame captured by the PTZ camera and selecting all candidates. The candidates are points on the frame in which the intruder may be located.
  • Step 640 discloses determining an appropriate point on the terrain panoramic map and the panoramic HoG map for each candidate. The altitude of this point is obtained from the terrain map.
  • Step 650 discloses calculating typical human size in pixels according to the obtained altitude.
  • Step 660 discloses calculating the candidate's likelihood by comparing its size with a predefined typical object size and considering the Panoramic HoG map likelihood at the selected point.
  • Step 670 discloses a case in which the system performs a final human detection.
  • the method comprises updating the terrain map and panoramic HoG map by inserting new point(s) or updating existing point(s).
  • FIG. 7 shows a panoramic polar map reflected from a linear map, according to exemplary embodiments of the disclosed subject matter.
  • the linear map 710 shows a terrain with a complicated structure.
  • point 715 of the linear map 710 represents a relatively high terrain point.
  • the point 715 of the linear map 710 is also represented at panoramic polar map 720 , at point 712 .
  • the panoramic polar map 720 comprises many points, each representing a different terrain point.
  • the points of the panoramic polar map 720 are defined by longitude 704 and latitude 702 as appearing from a focal point 705 .
  • the focal point 705 represents the location of the camera.
  • When detecting a person on the panoramic polar map 720, the person is detected at a specific terrain point, such as terrain point 742.
  • FIG. 9 shows a system for detecting a human object in an image, using both fixed and PTZ cameras, according to exemplary embodiments of the disclosed subject matter.
  • the system comprises a fixed camera 905 that performs intrusion detection.
  • When the fixed camera 905 raises an alarm, it updates the PTZ camera over communication channel 910.
  • the system further comprises a PTZ camera 920 that receives frame coordinates of the intruder from the fixed camera 905 .
  • the PTZ camera 920 translates the fixed camera coordinates to 3D coordinates. Then it determines optimal pan, tilt and zoom values, such that the object will be near the PTZ frame center with appropriate zoom, in order to detect the object that caused the alarm.
  • the PTZ camera 920 communicates with a Panoramic altitudes Map unit 930 that determines the perspective of the PTZ camera 920 .
  • the Panoramic altitudes Map unit 930 provides the PTZ camera 920 with a typical human object size that is sent to a human detection module 942 .
  • the human detection module 942 comprises a HoG detector 945 for detecting a HoG matching value on the PTZ frame.
  • the human detection module 942 further comprises a Final Detector 940 that uses a background model, foreground model, HOG detector, clustering and object's trajectory in order to determine the final decision for the PTZ camera 920 .
  • a background model unit 935 provides the background model used by the final detector 940 .
  • the background model unit 935 communicates with the final detector 940 and stores data related to the background of the scene.
  • the HoG detector 945 communicates with a HoG feedback processing unit 950 , which receives data concerning the HOG scores on the frame and updates the PHOG map accordingly, which affects next human detection sessions.
  • the fixed camera 905 updates the PTZ camera 920 with the new object coordinates. If the human object was detected, the PTZ camera 920 continues tracking as shown in 965, and the terrain feedback processing unit 970 updates the Terrain Map 930 with the current human object size.
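The weighting step of FIG. 5 (steps 530-550 above) leaves the form of the weight W unspecified. The following Python sketch illustrates one plausible reading, under loud assumptions: the Gaussian distance kernel, the sigma value and all names are illustrative rather than taken from the patent; only the rule that a high W·H(P) makes the HOG score less reliable comes from the description above.

```python
import math

def hog_reliability_penalty(phog_points, obj_polar, sigma=0.02):
    """Sketch of the FIG. 5 weighting: penalize HOG scores near learned
    "false human" PHOG points. phog_points maps polar coordinates
    (longitude, latitude) to PHOG values H(P); obj_polar is the object's
    center O in polar coordinates."""
    if not phog_points:
        return 0.0  # no learned false-human evidence near this location
    # Step 530: closest PHOG point P to the object's center O.
    P = min(phog_points, key=lambda q: math.dist(q, obj_polar))
    d = math.dist(P, obj_polar)
    # Step 540 (assumed kernel): weight W decays with the distance d.
    W = math.exp(-(d * d) / (2.0 * sigma * sigma))
    # Step 550: a high W * H(P) means the HOG score at O is less reliable,
    # so the human detector down-weights it against other criteria.
    return W * phog_points[P]
```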

Abstract

The subject matter discloses a method, comprising obtaining a scene from a video camera and defining at least one point of the scene; creating a first scene terrain model of the scene, said first scene terrain model comprises a typical human object size in pixels in said at least one point of the scene; creating a second scene model of the scene, said second model defines a false positive determination that said at least one point comprises a human object; wherein said second scene model is created when the scene does not contain a human object; obtaining an image of the scene, said image is captured by a video camera; determining whether the human object is detected at the at least one point of said captured image by applying said first scene terrain model and said second scene model on the at least one point.

Description

FIELD OF THE INVENTION
The subject matter relates generally to PTZ cameras, and more specifically to surveillance using PTZ cameras.
BACKGROUND OF THE INVENTION
A main product in Video Analytics is PIDS (Perimeter Intrusion Detection System). Normally it includes one fixed video camera, which detects all suspected objects in its field of view (FOV), raises an alarm and tracks the suspected objects as long as they remain in the FOV.
However, there is a trade-off between the FOV size and the zoom: either the camera sees only a narrow region, or the objects are small and not recognizable. The PTZ (pan/tilt/zoom) camera solves this trade-off. A PTZ camera has three degrees of freedom: it may move in two directions (vertical and horizontal) and zoom in/out.
There are two types of autonomous PTZ tracking solutions. In the first, the intrusion detection is performed in the PTZ camera (either static or scanning), which continues with tracking after detection. In the second, the intrusion detection is performed in a fixed camera, which triggers the PTZ camera.
The most sensitive part of PTZ tracking is the object's initial “acquiring”, the start of the tracking, and existing solutions are least robust at this stage. Any moving object that appears in the frame may “catch” the PTZ camera. Even if there are no moving pixels in the frame other than the object, the object's “acquisition” fails frequently because of the lack of a clean background model (one without the object), especially if the object moves toward the camera or away from it.
Existing human detection algorithms are neither exact enough nor fast enough. On one hand, usage of a background model or motion detection as a filter for human detection may reduce the number of false detections and speed up the recognition. On the other hand, a clean background model is not available, and there is an assumption that the human has to move in order to be detected. A moving nuisance in the scene (trees, shadows, etc.) makes the background/motion cues even less useful. There is a technical need for additional tools for filtering non-relevant candidates of the human detection algorithm.
SUMMARY
It is an object of the subject matter to disclose a method, comprising: obtaining a scene from a video camera and defining at least one point of the scene; creating a first scene terrain model of the scene, said first scene terrain model comprises a typical human object size in pixels in said at least one point of the scene;
creating a second scene model of the scene, said second model defines a false positive determination that said at least one point comprises a human object; wherein said second scene model is created when the scene does not contain a human object; obtaining an image of the scene, said image is captured by a video camera; determining whether the human object is detected at the at least one point of said captured image by applying said first scene terrain model and said second scene model on the at least one point.
In some cases, the method further comprises obtaining a position of a PTZ camera, the position including values of pan, tilt and zoom; detecting a PTZ frame by the PTZ camera at the obtained PTZ position; obtaining a successful detection of the human object in a specific location in the frame of the PTZ camera; determining the polar coordinates of the human object; determining an altitude of the human object.
In some cases, the method further comprises obtaining a panoramic map of the detected PTZ frame and identifying a point of the panoramic map closest to the detected human object, according to the determined polar coordinates. In some cases, the method further comprises determining a matching point in the panoramic map closest to the specific pixel.
In some cases, the method comprises determining the altitude of the object after obtaining a bounding box of the human object and selecting two pixels of the bounding box. In some cases, the two pixels are a top pixel having top coordinates (x, y1) and a bottom pixel having bottom coordinates (x, y2), defining x as the horizontal middle of the bounding box. In some cases, the method further comprises converting the top pixel and the bottom pixel into polar coordinates, thereby obtaining a polar top pixel (Π1, θ1) and a polar bottom pixel (Π2, θ2).
In some cases, the method further comprises converting the polar coordinates of the top pixel and the bottom pixel into 3D world coordinates. In some cases, the method further comprises obtaining two rays starting from the 3D origin, with a first ray pixel P1=z1v1 and a second ray pixel P2=z2v2. In some cases, the method comprises determining the altitude of a standing human, wherein the line connecting the first ray pixel and the second ray pixel is defined as vertical and parallel to the Y-axis.
In some cases, the method further comprises determining the object's altitude in a specific point after determining a Y-coordinate of the second ray pixel. In some cases, creating the second scene model comprises determining HOG matching scores for all pixels of the image of the scene. In some cases, the method further comprises obtaining an object's location on the frame in pixels. In some cases, the method further comprises converting the human object's location in pixels into polar coordinates on a panoramic map.
In some cases, the method further comprises obtaining a Panoramic HOG point associated with the polar coordinates of the pixels in which the human object is located.
In some cases, the second scene model of the scene comprises a Panoramic HOG map.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary non-limited embodiments of the disclosed subject matter will be described, with reference to the following description of the embodiments, in conjunction with the figures. The figures are generally not shown to scale and any sizes are only meant to be exemplary and not necessarily limiting. Corresponding or like elements are optionally designated by the same numerals or letters.
FIG. 1 shows a method for terrain map setup, according to some exemplary embodiments of the subject matter;
FIG. 2 shows a method for terrain map learning, according to exemplary embodiments of the disclosed subject matter;
FIG. 3 shows a method for determining the height in pixels of a typical human, according to exemplary embodiments of the disclosed subject matter;
FIG. 4 shows a method for learning a HoG map, according to exemplary embodiments of the disclosed subject matter;
FIG. 5 shows a method for reducing false-positive decisions using the HoG map, according to exemplary embodiments of the disclosed subject matter;
FIG. 6 shows a method for detecting a human object in a PTZ image, according to exemplary embodiments of the disclosed subject matter;
FIG. 7 shows a panoramic map reflected from a non-panoramic map, according to exemplary embodiments of the disclosed subject matter;
FIGS. 8A-8B illustrate a method of converting a pixel in a frame to polar coordinates; and,
FIG. 9 shows a system, using fixed and PTZ cameras, for implementing the method for detecting a human object in an image, according to exemplary embodiments of the disclosed subject matter.
DETAILED DESCRIPTION
The disclosed subject matter provides for a method for human detection. The method comprises detecting a human object by a camera, for example a fixed camera. Then, the fixed camera transmits data related to the human object to a PTZ camera. Such data may be 2D frame position, size and speed. The PTZ camera translates the data related to the human object to 3D world coordinates, and then determines an optimal pose T1 in terms of pan/tilt/zoom. The optimal pose is defined as a pose in which the object's predicted location is in the frame center, with a moderate predefined zoom. The learned terrain panoramic map helps to calculate the optimal pose of the PTZ camera more accurately. Since the exact location of the object in the frame captured by the PTZ camera is unknown and the camera's jump to the optimal pose T1 may take a few seconds, the predicted object location is not exact: practically, the object may be located at any part of the PTZ frame. Therefore, a human detection mechanism is necessary.
Motivation
The intrusion detection is performed in a fixed (static) camera, since many advanced tools for minimizing the false alarm rate work only on a fixed camera. However, the alarm alone is not enough: a zoomed visualization of the suspected intruder is required. The PTZ camera may jump to the appropriate zoomed pose, detect the intruder in the PTZ frame and track it as long as possible, such that it appears with sufficient zoom. One technical problem addressed by the subject matter is to detect a human object in the frame captured by the PTZ camera. The technical solution provides for detection that uses a human detection algorithm calculating a match between HOG features on the frame and a predefined HOG human model.
Challenges
There are two challenges in the technical solution disclosed above: (a) the human size in pixels in various parts of the scene is unknown, and (b) the scene detected by the camera usually contains several places that have a high matching score when compared with a predefined human HOG model. The unknown size of the human object in various parts of the scene causes both high CPU consumption and a much higher false-detection probability.
The technical solution of the disclosed subject matter provides for reducing the probability of false detections and improving the detection speed, given a candidate at some location in the frame captured by the PTZ camera. The technical solution determines and uses two parameters:
a. Typical human size (in pixels) at the given location
b. A-priori probability of false detection at the given location.
These two parameters may be determined easily for a static camera, but determining them is much more challenging for a PTZ camera.
The method for detecting a human object according to the disclosed subject matter utilizes two panoramic maps, one panoramic map for each parameter:
  • a. The first panoramic map is a Panoramic altitudes map that describes the scene's 3D terrain with altitudes. The Panoramic altitudes map makes it possible to determine a typical human size in pixels at any location on the frame for any PTZ position.
  • b. The second panoramic map is a Panoramic HOG (PHOG) map that describes the similarity of different areas in the scene to the predefined human HOG model, at any location on the frame for any PTZ position.
The two panoramic maps may be updated automatically after every detection session.
The method includes determining the scene geometry and terrain. The method obtains the panoramic altitudes map in which every point of the panoramic altitudes map contains the altitude of the appropriate 3D world point on the scene. Then, the method comprises generating a 3D mesh representing the scene's terrain according to the panoramic map points with altitudes.
The scene terrain is refined when the panoramic altitudes map is updated after every successful human detection session.
The method also comprises determining frame perspective of the PTZ camera after the jump. The frame perspective represents determining the size of the human object in pixels at any place on the frame. Such determination is made according to the calculated scene geometry, as disclosed below.
The method also comprises obtaining a map of Histogram of Gradients (HOG) features that are stored in the map after every detection session. The human detection on the PTZ camera is performed based on the HOG features. A HOG score calculated in some regions of the frame indicates the similarity of the region to a human object, using a predefined human HOG model. In order to decrease the number of false detections, the method of the disclosed subject matter uses the Panoramic HOG (PHOG) map, a “False Humans” panoramic map learned from the HOG feedback of previous detection sessions. The PHOG map learns the scene and stores all the locations or areas in the scene that are similar to a human object based on a high HOG score. Areas having a high HOG score are more likely to cause a mistaken detection of a human object. In addition, in order to decrease the number of false detections and the CPU time, the method significantly decreases the searching range for the human detection by using the calculated typical human size, based on the determined scene geometry disclosed above.
Focal Length
The term focal length refers here to a distance between the PTZ camera's optical center and the frame plane (or CCD). Focal length knowledge is equivalent to the knowledge of the field of view angle. Given the frame size in pixels, the value of the focal length f may be represented in pixels. The focal length of the PTZ camera in zoom-out is known and used to determine the focal length for any given PTZ pose with known zoom.
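As a concrete illustration of the paragraph above, the focal length in pixels can be recovered from the field-of-view angle. This is a minimal sketch, not taken from the patent; the function name and the example FOV value are assumptions.

```python
import math

def focal_length_pixels(frame_width_px, hfov_deg):
    # Half the frame width subtends half the horizontal FOV at the optical
    # center: tan(hfov/2) = (width/2) / f, hence f = (width/2) / tan(hfov/2).
    return (frame_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

# Example: a 1920-pixel-wide frame with an assumed 60-degree FOV gives
# f of roughly 1663 pixels.
f = focal_length_pixels(1920, 60.0)
```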
Converting 2D Frame Coordinates to 2D Polar Coordinates
FIGS. 8A and 8B illustrate a method of converting a pixel in a frame to polar coordinates based on the current pan, tilt and zoom of the PTZ camera. The panoramic polar coordinates are similar to two-dimensional geographic earth coordinates by latitude and longitude. All points lying on a ray starting from an origin, such as the camera's optical center, have the same polar coordinates. Therefore, the panoramic polar coordinates of a pixel p are identical to panoramic polar coordinates of any point projected on the frame plane at pixel p. For example, the point (0,0,1) has polar coordinates (Π, 0), where the longitude Π is the camera's pan.
Given the pan, tilt and zoom, a 3D coordinates system is defined that depends only on the pan and determining 3D parameters of the plane containing the camera frame, as shown in FIG. 8A. Given the PTZ camera's pan, the 3D coordinates system is defined such that:
The origin is defined as the camera's optical center,
Y axis is defined as a vertical line (PTZ panning axis), X and Z axes are horizontal;
X axis is parallel to the frame plane, i.e. the frame plane is perpendicular to the plane YZ.
The viewing direction in FIG. 8A is parallel to the X axis, so the X axis is not visible. FIG. 8A describes the calculation of the plane containing the camera frame in the 3D coordinates system. Given a pixel p=(x,y) on the frame, a 3D point Q=(x, y, f) is defined as a point lying on the plane Q, said plane Q being perpendicular to the Z axis and containing the point (0,0,f).
The plane Q defines the frame plane of the PTZ camera, when the PTZ camera's tilt is zero. Let plane P be the rotation of the plane Q around the X axis by the angle θ, as the angle θ defines the PTZ camera's tilt. Let point P=(xp, yp, zp) be the rotation of the point Q around the X axis by the angle θ. The point P was generated such that it lies on the frame plane P and coincides with the 3D location of the pixel p=(x,y), also lying on the frame plane P.
Let R be the projection of P on the plane XZ as shown in FIG. 8B. Define |OR| = sqrt(xp² + zp²). The vertical polar coordinate (latitude) of the pixel p is θp = arctan(yp/|OR|). The horizontal polar coordinate (longitude) of the pixel p is Πp = Π + arctan(xp/zp), where Π is the camera's pan angle. As a result, the pixel p is converted to polar coordinates and defined by a horizontal and a vertical polar coordinate (Πp, θp).
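A minimal Python sketch of this conversion follows. It assumes pixel coordinates measured from the frame center, angles in radians, and one particular sign convention for the tilt rotation (the patent does not fix it); the names are illustrative.

```python
import math

def pixel_to_polar(x, y, pan, tilt, f):
    """Convert frame pixel p=(x, y) to polar coordinates (longitude, latitude),
    following FIGS. 8A-8B: lift p to Q=(x, y, f) on the zero-tilt frame
    plane, rotate Q around the X axis by the tilt angle, then read angles."""
    xp = x  # rotation around the X axis leaves the X component unchanged
    yp = y * math.cos(tilt) + f * math.sin(tilt)   # sign convention assumed
    zp = -y * math.sin(tilt) + f * math.cos(tilt)
    OR = math.sqrt(xp ** 2 + zp ** 2)  # |OR|, projection of P on plane XZ
    latitude = math.atan(yp / OR)                  # vertical coordinate
    longitude = pan + math.atan2(xp, zp)           # horizontal coordinate
    return longitude, latitude

# Sanity check matching the text: the frame center at zero tilt maps to
# polar coordinates (pan, 0).
assert pixel_to_polar(0, 0, 0.5, 0.0, 1000.0) == (0.5, 0.0)
```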
Converting Polar Coordinates to Rays in 3D World Coordinates
The inverted conversion is performed as follows:
  • Let P=(Π, θ) be a point in polar coordinates (FIG. 8B).
  • For simplicity, suppose zp=1.
  • Let R be the projection of P on the plane XZ. R = (tan Π, 0, 1). |OR| = sqrt(1 + tan²Π).
    P = z·(tan Π, |OR|·tan θ, 1) = z·(tan Π, sqrt(1 + tan²Π)·tan θ, 1),
where z is any positive real number. P is a world point on a ray connecting P with the origin O. Finally, P has a form z·v, where v is a known 3D vector.
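The inverted conversion above reduces to a single expression in code. A sketch, assuming the same conventions and z_p = 1 (so it is valid for longitudes strictly between −90° and 90°):

```python
import math

def polar_to_ray(longitude, latitude):
    """Direction vector v such that every world point with these polar
    coordinates has the form z * v for some positive z, per the text."""
    t = math.tan(longitude)
    return (t, math.sqrt(1.0 + t * t) * math.tan(latitude), 1.0)
```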
Given an Object on the Frame, Calculating its Altitude on the 3D Scene
FIGS. 1 and 8C show a method for calculating a terrain's altitude for the given object on the frame captured by the PTZ camera.
Step 115 discloses obtaining the position of the PTZ camera. The PTZ camera position comprises three PTZ parameters—pan, tilt and zoom. These three parameters may be obtained by querying the PTZ camera or by receiving a message from the PTZ camera.
Step 130 discloses obtaining a bounding box of the given object on the frame captured by the PTZ camera. Two pixels are picked to define the bounding box: a top pixel t=(x, y1) and a bottom pixel b=(x, y2), where x is the horizontal middle of the bounding box.
Step 152 discloses converting the top pixel t and the bottom pixel b to polar coordinates (Π1, θ1) and (Π2, θ2). The conversion process is described in detail above.
Step 154 discloses converting the polar coordinates of the top pixel t and the bottom pixel b into rays in 3D world coordinates zv1 and zv2, respectively. The conversion process is described in detail above. Let P1 and P2 be the 3D world coordinates of the object's top pixel and bottom pixel, lying on these rays, i.e. P1=z1v1, P2=z2v2.
After the inverted conversion, step 160 discloses determining the given object's altitude, according to the given object's size and location. The given object's altitude is equal to the altitude of the terrain at the object's location, i.e. the altitude of the object's bottom point in 3D world coordinates, P2, or the Y-coordinate of P2.
Let R be the projection of P2 on the plane XZ, i.e. the Y component is 0. The requested altitude is equal to |RP2|, as shown in FIG. 8C.
In case of determining the altitude of a standing human object, the 3D object is vertical. Thus, the line connecting P1 and P2 is defined as vertical and parallel to the Y-axis. The assumption is that a typical human height is 1.8 meters. Since |RP1| = |OR|·tan θ1 and |RP2| = |OR|·tan θ2, the following equations hold:
|OR|·tan θ2 − |OR|·tan θ1 = 1.8,
|OR| = 1.8/(tan θ2 − tan θ1).
As a result, the required altitude in meters is |RP2| = |OR|·tan θ2 = 1.8·tan θ2/(tan θ2 − tan θ1), where θ1 and θ2 are the tilt components of the polar coordinates of the top pixel t and the bottom pixel b of the given object in the bounding box, found in step 152.
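The closed-form altitude above translates directly into code. A sketch, assuming angles in radians and the 1.8 m typical human height used in the text:

```python
import math

def terrain_altitude(theta_top, theta_bottom, human_height=1.8):
    """Altitude |RP2| of the terrain under a standing person, from the tilt
    components of the bounding box's top pixel (theta_top = th1) and
    bottom pixel (theta_bottom = th2), per the derivation above."""
    t1, t2 = math.tan(theta_top), math.tan(theta_bottom)
    OR = human_height / (t2 - t1)  # |OR| = 1.8 / (tan th2 - tan th1)
    return OR * t2                 # |RP2| = |OR| * tan th2
```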
Associating the Panoramic Altitudes Map, the Scene Geometry and Human Detection
At any stage, the terrain map contains points identified by their polar coordinates. The points identified by polar coordinates correspond to points in the 3D world scene. Since every point in the panoramic altitudes map is associated with a known altitude as disclosed above, a 3D mesh of the terrain may be built. The more points the panoramic altitudes map contains, the more points the 3D mesh contains, and the more accurate the description of the terrain. An accurate description of the terrain is required especially for objects located far from the camera, because a small error in altitude estimation translates to a large error in object size in pixels, which causes poor results in human detection. In addition, a coarser estimation of typical human size results in trying more candidates during the human detection, which increases both the error probability and the CPU consumption.
Updating the Panoramic Altitudes Map
After obtaining a new bounding box of a human object at the terrain, the method provides for updating the panoramic altitudes map.
FIG. 2 describes the updating process. Step 210 discloses finding the polar coordinates of the box's bottom p. Step 220 discloses calculating the altitude h at point p. Step 230 discloses finding the nearest point q on the map to p. Step 240 discloses the case in which the points p and q are too close, where the method comprises updating the altitude of q:
V(q):=(1−λ)V(q)+λh, where λ is the learning speed, for example 0.05. The term “too close” may define a case in which the distance between the points p and q is lower than a predefined threshold.
Step 250 discloses the case in which p and q are not too close, where the method comprises adding the point p to the panoramic altitudes map with V(p):=h. A new 3D terrain point is added, and the method performs triangulation on all terrain points to obtain an updated 3D triangular mesh.
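A compact sketch of the FIG. 2 update loop, holding the map as a dictionary from polar coordinates to altitudes. The threshold and learning-speed values are example numbers of the kind the text suggests, not prescribed ones, and the re-triangulation after insertion is left out:

```python
import math

def update_altitude_map(points, p, h, radius=0.01, lam=0.05):
    """One panoramic-altitudes-map update: p is the polar location of a new
    bounding box bottom (step 210), h its calculated altitude (step 220)."""
    # Step 230: find the nearest existing map point q to p.
    q = min(points, key=lambda pt: math.dist(pt, p), default=None)
    if q is not None and math.dist(q, p) < radius:
        # Step 240: p and q are too close, blend the altitudes.
        points[q] = (1.0 - lam) * points[q] + lam * h
    else:
        # Step 250: add p as a new terrain point; re-triangulation of all
        # terrain points would follow to refresh the 3D mesh.
        points[p] = h
```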
Initializing the Scene/Terrain Geometry—Setup Stage
A user draws one bounding box of a human object on the frame for different PTZ camera positions. For each drawn bounding box, the panoramic altitudes map is updated as disclosed above.
In some cases, one box is sufficient for a nearly planar scene. In some other cases, the scene model is initialized to a horizontal plane.
Some PTZ camera positions may have a long zoom that enables detecting the human objects when located far from the camera.
Updating the Terrain Geometry after Every Human Detection
After every successful human detection, the bounding box of the human is picked. Then, the panoramic altitudes map is updated as disclosed above.
Calculating Human Height in Pixels at any PTZ Position
FIG. 3 shows a method for determining the height in pixels of a typical human object at a given PTZ pose and at a given pixel, according to exemplary embodiments of the disclosed subject matter. In step 303, the method comprises obtaining the terrain map of the scene detected by the PTZ camera. In step 305, the method comprises obtaining the pose (pan, tilt, zoom) of the PTZ camera. In step 308, the method comprises obtaining the location of the pixel p on the PTZ frame.
In step 310, the method comprises determining frame plane parameters in the 3D coordinates system according to the PTZ camera pose obtained in step 305. The frame plane is defined as follows: the normal of the frame plane is perpendicular to X axis and has angle θ with Z axis, the distance of the frame plane from the origin is f (the focal length). In an exemplary manner, 1 pixel on the frame is equivalent to 1 meter in the scene.
Step 320 discloses determining the physical altitude of the world point matching the given pixel p. Such determination may be performed by translating the location of the given pixel p to polar coordinates (Π, θ) and to the vector v on the ray from the origin. Since the panoramic altitudes map is triangulated, the method obtains the triangle containing the polar point (Π, θ). By obtaining the altitudes of the vertices that form the triangle and interpolating between them, the method determines the altitude h at the given pixel p.
In step 325, the method comprises translating the polar coordinates (Π, θ) of the given pixel to 3D world coordinates, P=zv, where the constant z is initially unknown. In step 330, the method comprises determining the value of z, as the known altitude h equals the Y-coordinate of P. This gives the 3D coordinates of P. Let P1 be the human object's top. The points P and P1 have the same X and Z coordinates, and the Y-coordinate of P1 is 1.8 m above the point P. This gives the 3D coordinates of P1.
Step 340 comprises determining the intersection points of the lines OP and OP1 (O is the origin) with the frame plane determined in step 310, i.e. the pixels p and p1 that are the projections of P and P1 on the frame plane. Since the frame plane was constructed such that its distance from the origin is f, a value represented in pixels, the distance between p and p1 is also represented in pixels.
Step 350 discloses determining typical object size in pixels as the distance between the pixels p and p1.
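The full FIG. 3 computation can be sketched as follows; the 1.8 m typical height and the plane construction come from the text above, while the function signature and the unit optical-axis input n are assumptions:

```python
import numpy as np

def human_height_pixels(v, h, n, f, human_height_m=1.8):
    """v: unit vector from the origin toward the world point behind pixel p.
    h: terrain altitude (Y) interpolated from the panoramic altitudes map.
    n: unit normal of the frame plane (the camera's optical axis).
    f: focal length in pixels (the plane's distance from the origin)."""
    # Step 330: recover z from the known altitude, P = z * v with P_y = h.
    z = h / v[1]
    P = z * np.asarray(v, dtype=float)
    # The human object's top P1 shares X and Z with P, 1.8 m higher in Y.
    P1 = P + np.array([0.0, human_height_m, 0.0])
    # Step 340: intersect the rays OP and OP1 with the plane n . X = f.
    p_proj = P * (f / np.dot(n, P))
    p1_proj = P1 * (f / np.dot(n, P1))
    # Step 350: the typical object size is the projected distance in pixels.
    return float(np.linalg.norm(p1_proj - p_proj))
```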
Panoramic HOG (PHOG) Map Definition
The PHOG map contains points with polar coordinates: longitude and latitude. Any point in the PHOG map uniquely corresponds to a point on the 3D world scene. Similarly, any pixel on the frame captured by the PTZ camera at a given PTZ camera pose uniquely matches one point on the map using polar coordinates. HOG (Histogram of Oriented Gradients) features calculated on a scene region of typical human size are compared to a predefined HOG model of a human object. Given a point p on the map, the HOG matching score is calculated on a rectangular frame segment whose center is at the point p and whose dimensions match the typical human size. The rectangular frame segment is a bounding box of the potential human object on the frame.
The typical human size is based on the scene geometry or altitudes map, which is initialized roughly and refined after each human detection. At the initial stages, the scene geometry is rough and the typical human size is not exact. In such initial stages, HOG matching may be performed on a wider range of bounding box sizes.
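One possible realization of such a matching score, sketched here with skimage's HOG implementation (the patent names no library; the 64×128 window, the 1:2 aspect ratio, and cosine similarity are assumptions):

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def hog_matching_score(gray_frame, center, human_height_px, model_vec):
    """Score a human-sized box centered at `center` against a HOG model."""
    cx, cy = center
    h = int(human_height_px)       # box height = typical human height
    w = h // 2                     # assumed 1:2 width-to-height aspect
    x0, y0 = max(cx - w // 2, 0), max(cy - h // 2, 0)
    patch = gray_frame[y0:y0 + h, x0:x0 + w]
    # Rescale the box to a canonical HOG window so feature sizes match.
    patch = resize(patch, (128, 64))
    feat = hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
    # Cosine similarity against the predefined human object model.
    return float(np.dot(feat, model_vec) /
                 (np.linalg.norm(feat) * np.linalg.norm(model_vec) + 1e-9))
```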
Learning of PHOG Map
FIG. 4 shows a method for learning and reducing false-positive decisions when detecting a human object using a PHOG map, according to exemplary embodiments of the disclosed subject matter. A first threshold T1 and a second threshold T2 (T1>T2) are stored in the system executing the method of the disclosed subject matter.
The following steps are performed after every jump of the PTZ camera to a new pose (as a result of an alarm) that ends with a successful human detection.
Step 410 discloses determining HOG matching scores for all pixels of the frame, excluding pixels that belong to the detected human object. Let H(P) denote the HOG matching score at P, where P is either a pixel on the frame or an existing point in the PHOG map.
Step 420 discloses picking all pixels whose HOG score is a local maximum in a predefined neighborhood and greater than the threshold T1. These pixels are candidates to be inserted into the PHOG map. After picking the candidates, the method discloses determining the polar coordinates of the chosen candidates. If there are candidates with coordinates that are too close, for example inside the same human bounding box, the method keeps only the candidate with the greater score.
The current PTZ frame defines a (nearly) rectangular area F in terms of polar coordinates. Step 430 discloses picking all existing points on the PHOG map that lie inside F, excluding the detected human object. Step 440 discloses finding, for each existing point P picked in step 430, all neighboring candidates located inside a human bounding box centered at P. Then, the method discloses choosing the neighboring candidate C with the highest score H(C). H(C) represents the HOG matching score at point C on the frame. If no such C was found, denote H(C)=0.
Step 450 discloses updating H(P) on the map; the update may be according to the following formula: H(P):=(1−λ)H(P)+λH(C), where λ=0.05 is the learning speed. The location of P in polar coordinates is updated towards C in a similar way. Step 460 discloses inserting new points into the PHOG map: for any candidate C that did not have a neighboring existing point, a new point is inserted into the PHOG map with initial value H(C). Step 470 discloses deleting any existing point P from the map whose H(P) score is lower than a predefined threshold T2.
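Steps 440–470 may be sketched as follows; this is a simplified illustration in which the bounding-box test uses a radius rather than a rectangle, and the threshold values and names are assumptions:

```python
import numpy as np

LAMBDA, T2 = 0.05, 0.2   # learning speed and deletion threshold (T2 assumed)

def update_phog_map(map_pts, map_scores, cand_pts, cand_scores, box_radius):
    new_pts, new_scores = [], []
    matched = np.zeros(len(cand_pts), dtype=bool)
    cand_arr = np.asarray(cand_pts, dtype=float).reshape(-1, 2)
    for P, HP in zip(map_pts, map_scores):
        # Step 440: candidates inside a human-sized region centered at P.
        inside = np.where(np.linalg.norm(cand_arr - P, axis=1) < box_radius)[0]
        if len(inside):
            c = int(inside[np.argmax(np.asarray(cand_scores)[inside])])
            HC, matched[c] = cand_scores[c], True
            # The location of P is pulled towards C at the same speed.
            P = (1 - LAMBDA) * np.asarray(P) + LAMBDA * cand_arr[c]
        else:
            HC = 0.0                             # no neighboring candidate
        HP = (1 - LAMBDA) * HP + LAMBDA * HC     # step 450
        if HP >= T2:                             # step 470: drop weak points
            new_pts.append(P); new_scores.append(HP)
    for c in range(len(cand_pts)):               # step 460: insert new points
        if not matched[c]:
            new_pts.append(cand_arr[c]); new_scores.append(cand_scores[c])
    return new_pts, new_scores
```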
Usage of the PHOG Map
FIG. 5 shows a method for detecting an object's similarity to a known model, according to exemplary embodiments of the disclosed subject matter. Step 510 discloses obtaining the object's location on the frame in pixels. Step 520 discloses converting the object's location in pixels into polar coordinates as disclosed above. Step 530 discloses obtaining the PHOG point P closest to the object's center O. H(P) is the PHOG value at P. Let d be the distance between P and O. Step 540 discloses obtaining a weight for the point P. The weight may be defined by the following formula: W=e^(−βd/s), where β is a constant and s is the typical human size at P. W expresses the impact of P on the point O, based on the distance between P and O relative to the human size. Step 550 discloses updating the weight of the current HOG score in the final decision of the human detector according to W. For example, when the value W·H(P) is high, the HOG score of the object at O is less reliable. Therefore, the human detector gives a lower weight to the HOG matching score, relative to other tracking criteria such as the background subtraction score, the object's trajectory, and the object's speed.
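A short sketch of this weighting; the way W is blended into the final decision is an illustrative assumption, since the patent only states that the HOG score's weight is lowered:

```python
import numpy as np

def phog_weight(d, s, beta=1.0):
    # W = e^(-beta * d / s): strongest when the object sits exactly on a
    # known false-positive point (d = 0), decaying with the distance d
    # measured relative to the typical human size s. beta is assumed.
    return float(np.exp(-beta * d / s))

def weighted_hog_term(hog_score, W, HP):
    # Down-weight the HOG score when W * H(P) is high, i.e. when a nearby
    # PHOG point says HOG tends to fire falsely at this spot.
    return hog_score * (1.0 - W * HP)
```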
FIG. 6 shows a method for detecting a human object in an image, according to exemplary embodiments of the disclosed subject matter.
Setup Stage:
Step 610 discloses manually marking at least one bounding box of a human object on the PTZ frame at any PTZ pose. The bounding box surrounds a person in the scene viewed by the PTZ camera, and data related to the bounding box is stored and later used to determine the size in pixels of a typical object at different parts of the scene. Step 615 discloses calculating an altitude for each human object marked in step 610. Step 620 discloses inserting the objects' polar coordinates and altitudes into the panoramic terrain map, thus creating a first scene terrain model with typical human object size in pixels.
The method comprises creating a second scene model using a panoramic HOG map with false-positive likelihoods. The second scene model may include assigning a value to each segment of the scene, such that the value represents the similarity between the segment and a predefined human object model. The model may be based on a HOG map.
Real Time:
The method of the disclosed subject matter further comprises obtaining an image of the scene. The image of the scene may be captured by a standard video camera or by a PTZ camera.
Step 630 discloses calculating an optimal PTZ jump pose for the PTZ camera, given a new alarm. The calculation uses the updated panoramic terrain map. The alarm may be activated when a fixed camera detects an intruder. Step 635 discloses performing HOG human detection on the frame captured by the PTZ camera and selecting all candidates. The candidates are points on the frame at which the intruder may be located.
Step 640 discloses determining, for each candidate, a corresponding point on the panoramic terrain map and the panoramic HOG map. The altitude of this point is obtained from the terrain map.
Step 650 discloses calculating the typical human size in pixels according to the obtained altitude.
Step 660 discloses calculating the candidate's likelihood by comparing its size with a predefined typical object size and considering the panoramic HOG map likelihood at the selected point.
Step 670 discloses the case in which the system reaches a final human detection. In such a case, the method comprises updating the terrain map and the panoramic HOG map by inserting new point(s) or updating existing point(s).
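Tying steps 630–670 together, the following illustrative glue code summarizes the real-time flow; every helper named here stands in for the corresponding module described in this document and is not a real API:

```python
def on_alarm(alarm, ptz, terrain_map, phog_map, hog_model):
    pose = compute_jump_pose(alarm, terrain_map)            # step 630
    frame = ptz.jump_and_capture(pose)
    best, best_likelihood = None, 0.0
    for cand in hog_detect(frame, hog_model):               # step 635
        lon, lat = pixel_to_polar(cand.center, pose)        # step 640
        h = terrain_map.altitude_at(lon, lat)
        size_px = typical_human_size_px(h, pose, cand.center)  # step 650
        likelihood = size_likelihood(cand, size_px) * \
                     phog_reliability(phog_map, lon, lat)   # step 660
        if likelihood > best_likelihood:
            best, best_likelihood = cand, likelihood
    if best is not None:                                    # step 670
        update_terrain_map(terrain_map, best)
        update_phog_map_from_frame(phog_map, frame, best)
    return best
```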
Example of Panoramic Map and Polar Coordinates
FIG. 7 shows a panoramic polar map derived from a linear map, according to exemplary embodiments of the disclosed subject matter. The linear map 710 shows a terrain with a complicated structure. For example, point 715 of the linear map 710 represents a relatively high terrain point. The point 715 of the linear map 710 is also represented on the panoramic polar map 720, at point 712. The panoramic polar map 720 comprises many points, each representing a different terrain point. The points of the panoramic polar map 720 are defined by longitude 704 and latitude 702 as seen from a focal point 705. The focal point 705 represents the location of the camera.
When a person is detected on the panoramic polar map 720, the person is detected at a specific terrain point, such as terrain point 742.
FIG. 9 shows a system for detecting a human object in an image, using both fixed and PTZ cameras, according to exemplary embodiments of the disclosed subject matter.
The system comprises a fixed camera 905 that performs intrusion detection. When the fixed camera 905 raises an alarm, it updates the PTZ camera over communication channel 910. The system further comprises a PTZ camera 920 that receives the frame coordinates of the intruder from the fixed camera 905. The PTZ camera 920 translates the fixed-camera coordinates to 3D coordinates. It then determines optimal pan, tilt, and zoom values such that the object will be near the PTZ frame center and at an appropriate zoom, in order to detect the object that caused the alarm.
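A minimal sketch of that pan/tilt/zoom computation, with Y taken as the vertical axis as in the 3D coordinate system used above; the linear zoom-to-size relation is a simplifying assumption:

```python
import numpy as np

def ptz_pose_for_point(P, desired_size_px, size_at_zoom1_px):
    """P = (x, y, z): the intruder's 3D coordinates relative to the camera."""
    x, y, z = P
    pan = np.arctan2(x, z)                  # rotate about the vertical axis
    tilt = np.arctan2(y, np.hypot(x, z))    # elevation toward the point
    # Pick the zoom that scales the object to the desired size in pixels.
    zoom = desired_size_px / size_at_zoom1_px
    return pan, tilt, zoom
```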
The PTZ camera 920 communicates with a panoramic altitudes map unit 930 that determines the perspective of the PTZ camera 920. The panoramic altitudes map unit 930 provides the PTZ camera 920 with a typical human object size that is sent to a human detection module 942.
The human detection module 942 comprises a HOG detector 945 for computing a HOG matching value on the PTZ frame.
The human detection module 942 further comprises a final detector 940 that uses a background model, a foreground model, the HOG detector, clustering, and the object's trajectory in order to reach the final decision for the PTZ camera 920. A background model unit 935 provides the background model used by the final detector 940; it communicates with the final detector 940 and stores data related to the background of the scene.
The HOG detector 945 communicates with a HOG feedback processing unit 950, which receives data concerning the HOG scores on the frame and updates the PHOG map accordingly, affecting subsequent human detection sessions.
After the human detection finishes its calculations, it is determined whether the human object was detected. If the human object was not detected, the fixed camera 905 updates the PTZ camera 920 with the object's new coordinates. If the human object was detected, the PTZ camera 920 continues tracking as shown in 965, and the terrain feedback processing unit 970 updates the panoramic altitudes map unit 930 with the current human object size.
While the disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings without departing from the essential scope thereof. Therefore, it is intended that the disclosed subject matter not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but only by the claims that follow.

Claims (16)

What is claimed is:
1. A method, comprising:
obtaining a scene from a video camera and defining at least one point of the scene;
creating a first scene terrain model of the scene, said first scene terrain model comprises a human object size in pixels in said at least one point of the scene;
creating a second scene model of the scene, said second model defines a false positive determination that said at least one point comprises a human object;
wherein said second scene model is created when the scene does not contain a human object;
obtaining an image of the scene, said image is captured by a video camera;
determining whether the human object is detected at the at least one point of said captured image by applying said first scene terrain model and said second scene model on the at least one point;
wherein the second scene model of the scene comprises a Panoramic HOG map;
obtaining a position of a PTZ camera, the position including values of pan, tilt and zoom;
detecting a PTZ frame by the PTZ camera at the obtained PTZ position;
obtaining a successful detection of the human object in a specific location in the frame of the PTZ camera;
determining the polar coordinates of the human object;
determining an altitude of the human object; and
obtaining a panoramic map of the detected PTZ frame and identifying a point of the panoramic map closest to the detected human object, according to the determined polar coordinates.
2. The method according to claim 1, further comprises determining a matching point in the panoramic map closest to the specific pixel.
3. The method according to claim 1, wherein determining the altitude of the object comprises obtaining a bounding box of the human object and selecting two pixels of the bounding box.
4. The method according to claim 3, wherein the two pixels are a top pixel having top coordinates (x, y1) and a bottom pixel having bottom coordinates (x, y2), defining x as the horizontal middle of the bounding box.
5. The method according to claim 4, further comprises converting the top pixel and the bottom pixel into polar coordinates, thereby obtaining a polar top pixel (Π1,θ1) and a polar bottom pixel (Π2, θ2).
6. The method according to claim 4, further comprises converting the polar coordinates of the top pixel and the bottom pixel into 3D world coordinates.
7. The method according to claim 4, further comprises obtaining 2 rays starting from a 3D origin located at a first ray pixel P1=z1v1, and a second ray pixel P2=z2v2.
8. The method according to claim 7, comprising determining the altitude of a standing human, wherein a line connecting the first ray pixel and the second ray pixel is defined as vertical and parallel to the Y-axis.
9. The method according to claim 7, further comprises determining the object's altitude in a specific point after determining a Y-coordinate of the second ray pixel.
10. The method according to claim 1, wherein creating the second scene model comprises determining HOG matching scores for all pixels of the image of the scene.
11. The method according to claim 10, further comprises obtaining an object's location on the frame in pixels.
12. The method according to claim 11, further comprises converting the human object's location in pixels into polar coordinates on a panoramic map.
13. The method according to claim 10, further comprises obtaining a Panoramic HOG point associated with the polar coordinates of the pixels in which the human object is located.
14. The method of claim 1, wherein the human object size is determined according to scene geometry and altitude maps.
15. A method, comprising:
obtaining a scene from a video camera and defining at least one point of the scene;
creating a first scene terrain model of the scene, said first scene terrain model comprises a human object size in pixels in said at least one point of the scene;
creating a second scene model of the scene, said second model defines a false positive determination that said at least one point comprises a human object; wherein said second scene model is created when the scene does not contain a human object;
obtaining an image of the scene, said image is captured by a video camera;
determining whether the human object is detected at the at least one point of said captured image by applying said first scene terrain model and said second scene model on the at least one point;
obtaining a position of a PTZ camera, the position including values of pan, tilt and zoom;
detecting a PTZ frame by the PTZ camera at the obtained PTZ position;
obtaining a successful detection of the human object in a specific location in the frame of the PTZ camera;
determining the polar coordinates of the human object;
determining an altitude of the human object;
obtaining a panoramic map of the detected PTZ frame and identifying a point of the panoramic map closest to the detected human object, according to the determined polar coordinates.
16. A method, comprising:
obtaining a scene from a video camera and defining at least one point of the scene;
creating a first scene terrain model of the scene, said first scene terrain model comprises a human object size in pixels in said at least one point of the scene;
creating a second scene model of the scene, said second model defines a false positive determination that said at least one point comprises a human object; wherein said second scene model is created when the scene does not contain a human object;
obtaining an image of the scene, said image is captured by a video camera;
determining whether the human object is detected at the at least one point of said captured image by applying said first scene terrain model and said second scene model on the at least one point;
wherein creating the second scene model comprises determining HOG matching scores for all pixels of the image of the scene;
obtaining a position of a PTZ camera, the position including values of pan, tilt and zoom;
detecting a PTZ frame by the PTZ camera at the obtained PTZ position;
obtaining a successful detection of the human object in a specific location in the frame of the PTZ camera;
determining the polar coordinates of the human object;
determining an altitude of the human object; and
obtaining a panoramic map of the detected PTZ frame and identifying a point of the panoramic map closest to the detected human object, according to the determined polar coordinates.
US13/586,884 2012-08-16 2012-08-16 Method and system for improving surveillance of PTZ cameras Active 2033-02-08 US8842162B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/586,884 US8842162B2 (en) 2012-08-16 2012-08-16 Method and system for improving surveillance of PTZ cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/586,884 US8842162B2 (en) 2012-08-16 2012-08-16 Method and system for improving surveillance of PTZ cameras

Publications (2)

Publication Number Publication Date
US20140049600A1 US20140049600A1 (en) 2014-02-20
US8842162B2 true US8842162B2 (en) 2014-09-23

Family

ID=50099782

Family Applications (1)

Application Number Title Priority Date Filing Date
US13/586,884 Active 2033-02-08 US8842162B2 (en) 2012-08-16 2012-08-16 Method and system for improving surveillance of PTZ cameras

Country Status (1)

Country Link
US (1) US8842162B2 (en)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9721166B2 (en) 2013-05-05 2017-08-01 Qognify Ltd. System and method for identifying a particular human in images using an artificial image composite or avatar
JP6332917B2 (en) * 2013-06-26 2018-05-30 キヤノン株式会社 IMAGING DEVICE, EXTERNAL DEVICE, IMAGING DEVICE CONTROL METHOD, AND EXTERNAL DEVICE CONTROL METHOD
EP2894600B1 (en) * 2014-01-14 2018-03-14 HENSOLDT Sensors GmbH Method of processing 3D sensor data to provide terrain segmentation
CN104332037B (en) * 2014-10-27 2017-02-15 小米科技有限责任公司 method and device for alarm detection
JP6397354B2 (en) * 2015-02-24 2018-09-26 Kddi株式会社 Human area detection apparatus, method and program
US9576204B2 (en) 2015-03-24 2017-02-21 Qognify Ltd. System and method for automatic calculation of scene geometry in crowded video scenes
CN108197612A (en) * 2018-02-05 2018-06-22 武汉理工大学 A kind of method and system of ship sensitizing range testing staff invasion
US10003688B1 (en) 2018-02-08 2018-06-19 Capital One Services, Llc Systems and methods for cluster-based voice verification
EP3839910B1 (en) * 2019-12-19 2023-01-25 Axis AB Prioritization among cameras of a multi-camera arrangement
CN111355926B (en) * 2020-01-17 2022-01-11 高新兴科技集团股份有限公司 Linkage method of panoramic camera and PTZ camera, storage medium and equipment
CN111798634B (en) * 2020-06-29 2022-02-01 杭州海康威视数字技术股份有限公司 Perimeter detection method and device
US11647294B2 (en) * 2021-05-25 2023-05-09 Shanghai Bilibili Technology Co., Ltd. Panoramic video data process
CN113989124B (en) * 2021-12-27 2022-04-19 浙大城市学院 System for improving positioning accuracy of pan-tilt-zoom camera and control method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chu-Sing Yang et al., "PTZ camera based position tracking in IP-surveillance system," 3rd International Conference on Sensing Technology, Tainan, Taiwan, Nov. 30–Dec. 3, 2008, pp. 142–146. Print ISBN 978-1-4244-2176-3. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11086927B2 (en) * 2015-10-07 2021-08-10 Google Llc Displaying objects based on a plurality of models
US11809487B2 (en) 2015-10-07 2023-11-07 Google Llc Displaying objects based on a plurality of models

Also Published As

Publication number Publication date
US20140049600A1 (en) 2014-02-20

Similar Documents

Publication Publication Date Title
US8842162B2 (en) Method and system for improving surveillance of PTZ cameras
US10198823B1 (en) Segmentation of object image data from background image data
KR101725060B1 (en) Apparatus for recognizing location mobile robot using key point based on gradient and method thereof
KR102126513B1 (en) Apparatus and method for determining the pose of the camera
KR101708659B1 (en) Apparatus for recognizing location mobile robot using search based correlative matching and method thereof
US9201425B2 (en) Human-tracking method and robot apparatus for performing the same
KR101784183B1 (en) APPARATUS FOR RECOGNIZING LOCATION MOBILE ROBOT USING KEY POINT BASED ON ADoG AND METHOD THEREOF
KR101776621B1 (en) Apparatus for recognizing location mobile robot using edge based refinement and method thereof
CN104756155B (en) Merge multiple maps for the system and method for the tracking based on computer vision
KR101788269B1 (en) Method and apparatus for sensing innormal situation
JP5956248B2 (en) Image monitoring device
EP2948927A1 (en) A method of detecting structural parts of a scene
KR20110011424A (en) Method for recognizing position and controlling movement of a mobile robot, and the mobile robot using the same
US20210356293A1 (en) Robot generating map based on multi sensors and artificial intelligence and moving based on map
EP2610783B1 (en) Object recognition method using an object descriptor
JP2016085602A (en) Sensor information integrating method, and apparatus for implementing the same
JP2016152027A (en) Image processing device, image processing method and program
US9947106B2 (en) Method and electronic device for object tracking in a light-field capture
KR101755023B1 (en) 3d motion recognition apparatus and method
JP4578864B2 (en) Automatic tracking device and automatic tracking method
JP2005217883A (en) Method for detecting flat road area and obstacle by using stereo image
KR101456172B1 (en) Localization of a mobile robot device, method and mobile robot
KR101346510B1 (en) Visual odometry system and method using ground feature
Ristić-Durrant et al. Low-level sensor fusion-based human tracking for mobile robot
Thornton et al. Multi-sensor detection and tracking of humans for safe operations with unmanned ground vehicles

Legal Events

Date Code Title Description
AS Assignment

Owner name: NICE-SYSTEMS LTD, ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GOLDNER, VLADIMIR;BOUDOUKH, GUY;REEL/FRAME:028890/0016

Effective date: 20120826

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: QOGNIFY LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NICE SYSTEMS LTD.;REEL/FRAME:036615/0243

Effective date: 20150918

FEPP Fee payment procedure

Free format text: PAT HOLDER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: LTOS); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551)

Year of fee payment: 4

AS Assignment

Owner name: MONROE CAPITAL MANAGEMENT ADVISORS, LLC, ILLINOIS

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE PROPERTY NUMBERS PREVIOUSLY RECORDED AT REEL: 047871 FRAME: 0771. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNORS:QOGNIFY LTD.;ON-NET SURVEILLANCE SYSTEMS INC.;REEL/FRAME:053117/0260

Effective date: 20181228

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

Year of fee payment: 8

AS Assignment

Owner name: ON-NET SURVEILLANCE SYSTEMS INC., MASSACHUSETTS

Free format text: RELEASE OF SECURITY INTEREST IN PATENT COLLATERAL;ASSIGNOR:MONROE CAPITAL MANAGEMENT ADVISORS, LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:063280/0367

Effective date: 20230406

Owner name: QOGNIFY LTD., MASSACHUSETTS

Free format text: RELEASE OF SECURITY INTEREST IN PATENT COLLATERAL;ASSIGNOR:MONROE CAPITAL MANAGEMENT ADVISORS, LLC, AS ADMINISTRATIVE AGENT;REEL/FRAME:063280/0367

Effective date: 20230406

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY