WO2024077005A1 - Systems and methods for detecting fall events - Google Patents

Systems and methods for detecting fall events

Info

Publication number
WO2024077005A1
Authority
WO
WIPO (PCT)
Prior art keywords
person
keypoints
image
pose
keypoint
Prior art date
Application number
PCT/US2023/075859
Other languages
English (en)
Inventor
Abhishek Mitra
Gopi Subramanian
Yash Chaturvedi
Original Assignee
Sensormatic Electronics, LLC
Priority date
Filing date
Publication date
Priority claimed from US18/479,667 external-priority patent/US20240112469A1/en
Application filed by Sensormatic Electronics, LLC
Publication of WO2024077005A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G08B21/04 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B21/0438 Sensor means for detecting
    • G08B21/0476 Cameras to detect unsafe condition, e.g. video cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation

Definitions

  • the described aspects relate to fall event detection systems.
  • An example aspect includes a method for computer vision detection of a fall event, comprising detecting a person in a first image captured at a first time.
  • the method further includes identifying a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • the method further includes classifying, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • the method further includes detecting the person in a second image captured at a second time.
  • the method further includes identifying a second plurality of keypoints on the person in the second image. Additionally, the method further includes detecting, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image. Additionally, the method further includes generating an alert indicating that the person has fallen.
  • Another example aspect includes an apparatus for computer vision detection of a fall event, comprising at least one memory and one or more processors coupled with the one or more memories and configured, individually or in combination, to: detect a person in a first image captured at a first time.
  • the at least one processor is further configured to identify a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • the at least one processor is further configured to classify, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • the at least one processor is further configured to detect the person in a second image captured at a second time. Additionally, the at least one processor is further configured to identify a second plurality of keypoints on the person in the second image. Additionally, the at least one processor is further configured to detect, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image. Additionally, the at least one processor is further configured to generate an alert indicating that the person has fallen.
  • Another example aspect includes an apparatus for computer vision detection of a fall event, comprising means for detecting a person in a first image captured at a first time.
  • the apparatus further includes means for identifying a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • the apparatus further includes means for classifying, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • the apparatus further includes means for detecting the person in a second image captured at a second time.
  • the apparatus further includes means for identifying a second plurality of keypoints on the person in the second image. Additionally, the apparatus further includes means for detecting, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image. Additionally, the apparatus further includes means for generating an alert indicating that the person has fallen.
  • Another example aspect includes a computer-readable medium having instructions stored thereon for computer vision detection of a fall event, wherein the instructions are executable by one or more processors, individually or in combination, to detect a person in a first image captured at a first time.
  • the instructions are further executable to identify a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • the instructions are further executable to classify, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • the instructions are further executable to detect the person in a second image captured at a second time. Additionally, the instructions are further executable to identify a second plurality of keypoints on the person in the second image. Additionally, the instructions are further executable to detect, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image. Additionally, the instructions are further executable to generate an alert indicating that the person has fallen.
  • the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims.
  • the following description and the annexed drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.
  • FIG. 1 is a diagram of a system for detecting a fall event, in accordance with exemplary aspects of the present disclosure.
  • FIG. 2 is a block diagram of a computing device for detecting a fall event, in accordance with exemplary aspects of the present disclosure.
  • FIG. 3 is a flowchart illustrating a method for detecting a fall event, in accordance with exemplary aspects of the present disclosure.
  • FIG. 4 is a diagram depicting two scenarios captured by a multi-camera-based system, where in the first scenario a person is seen in a first pose and in the second scenario the person is seen in a second pose, in accordance with exemplary aspects of the present disclosure.
  • FIG. 5 is a first snapshot demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • FIG. 6 is a second snapshot demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • FIG. 7 is a third snapshot demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • FIG. 8 is a flow diagram for detecting keypoints and generating bounding boxes, in accordance with exemplary aspects of the present disclosure.
  • FIG. 9 is a flow diagram for classifying a fall event based on detection of a person, in accordance with exemplary aspects of the present disclosure.
  • FIG. 10 is a flow diagram for classifying a fall event using a proximity search region, in accordance with exemplary aspects of the present disclosure.
  • FIG. 11 is a diagram for detecting a person using a proximity search region, in accordance with exemplary aspects of the present disclosure.
  • FIG. 12 is a block diagram of an example of a computer device having components configured to perform a method for computer vision detection of a fall event;
  • FIG. 13 is a flowchart of an example of a method for computer vision detection of a fall event;
  • FIG. 14 is a flowchart of additional aspects of the method of FIG. 13;
  • FIG. 15 is a flowchart of additional aspects of the method of FIG. 13.
  • FIG. 16 is a flowchart of additional aspects of the method of FIG. 13.
  • Slip/trip and fall accidents can occur in a retail environment for a number of different reasons. For instance, a storeowner may neglect to pick up product(s) that fell on the floor, which may cause an unaware person to trip over the product(s). In another example, a custodian for the environment may mop the floor and forget to place a “wet floor” sign, which may trigger a fall event.
  • Vision systems may be used to detect objects in an environment and track the objects within a region of interest.
  • vision systems may include object detection and tracking capabilities.
  • the object may be a person, and the object tracking may be used for determining whether the current pose of the person amounts to a fall event. Detection of a fall event is important for security, injury prevention, reduction of financial liability, and reduction of damage to the reputation of an institution such as a retail store, and is also of interest to investigators.
  • a user of a security system may be interested in knowing when people enter and exit a region of interest.
  • For example, a camera, such as an Internet Protocol (IP) camera, may be used to monitor the region of interest, track people as they traverse in and out of the region of interest, and detect when a person may have fallen. If an event occurs within an area being monitored, timely intervention may assist in reducing injury, liability, etc.
  • the present disclosure leverages existing person detection and pose estimation models which are meant for edge devices.
  • these lightweight (i.e., processor-friendly) models are trained on images of persons in standing/bending positions.
  • Such models may be used for detection and pose estimation, with image processing techniques and linear algebra then applied to detect a person in a fallen position.
  • keypoint detection may be used to detect falls.
  • the systems and methods of the present disclosure check the shoulder keypoint, ear keypoint, and eye keypoint placement. If the shoulder keypoint is above the ear keypoint or eye keypoint, the person is determined to have fallen.
  • the shoulder keypoint may be above the ear and eye keypoints even though the person has not fallen (e.g., a person may bend over briefly).
  • a person in a retail store may bend to search for a product in a lower shelf.
  • the present disclosure describes leveraging cross-view camera outputs (if available) and analyzing person pose keypoints for both cross-view camera outputs on a one-dimensional (1-D) projection domain to decide “fall” or “no fall.” In particular, the spread of the keypoints is considered.
  • keypoints may be widely spread out when a person is standing, and may be narrowly spaced when the person is lying down.
  • pose keypoints on a 1-D projection space may be fed to a heuristic algorithm or binary machine learning (ML) classifier model to classify “fall” or “no fall.”
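  • For illustration only, a minimal Python sketch of such a spread-based heuristic on the 1-D projection is given below; the per-view keypoint format, the 500-pixel spread threshold (borrowed from the worked example later in this description), and the rule of declaring a fall when the projection collapses in at least one camera view are assumptions, and a trained binary ML classifier could stand in for the heuristic.

```python
# Sketch only: each camera view supplies its keypoints as (x, y) tuples.
def project_1d(keypoints):
    """Project keypoints onto a 1-D space by keeping only the vertical component."""
    return sorted(y for _, y in keypoints)


def spread(projection):
    """Distance between the highest and lowest projected keypoints."""
    return projection[-1] - projection[0] if projection else 0.0


def classify_fall(views_before, views_after, spread_threshold=500.0):
    """Classify "fall" or "no fall" from per-camera keypoints captured before and
    after the candidate event: a fall is declared when the projection was widely
    spread (standing) and then collapses in at least one of the camera views."""
    for before, after in zip(views_before, views_after):
        was_spread = spread(project_1d(before)) > spread_threshold
        now_collapsed = spread(project_1d(after)) <= spread_threshold
        if was_spread and now_collapsed:
            return "fall"
    return "no fall"
```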
  • the systems and methods of the present disclosure retain an object identifier of the person and take the last detected bounding box of that person to mark nearby areas as a probable fall region (hereinafter referred to as a proximity search region).
  • the systems and methods crop the image to the proximity search region and apply transformations (e.g., rotating the cropped image by 45 and 90 degrees clockwise and anti-clockwise).
  • the proximity search region is searched on every nth image of a plurality of images to improve efficiency, where n is an integer that can be configured by an operator of the system. Considering that a person may simply stand up and start walking, the proximity search region is only analyzed for a threshold number of frames before being discarded. At a more detailed level, the proximity search region is calculated based on the last detected bounding box coordinates (x top left, y top left, width, height) of the tracked object. Based on these coordinates, the bottom coordinates are derived, creating two proximity search regions. The first search region (the default) is the last detected bounding box region. The second region is an expanded form of the first region in which the approximate height of the person is treated as the width of the region and the approximate width of the person is used as the height of the region.
  • additional search regions may be derived by tracking the identifier and trajectory of the person.
  • the proximity search region may be any shape (e.g., a rectangle, a circle with radius (Rp) around the bottom coordinates of the last detected bounding box, etc.).
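  • As a rough sketch only, the two proximity search regions described above might be derived from the last detected bounding box as follows; the use of conventional image coordinates (y increasing downward) and the centring of the second, swapped-dimension region on the bottom coordinates are illustrative assumptions, since the description leaves the exact placement open.

```python
# Sketch only: the last detected bounding box is (x_top_left, y_top_left, width, height),
# in image coordinates where y grows downward, so the bottom edge is y_top_left + height.
def proximity_search_regions(x_top_left, y_top_left, width, height):
    # First (default) region: the last detected bounding box itself.
    first = (x_top_left, y_top_left, width, height)

    # Bottom-centre of the last detected box, roughly where the person's feet were.
    bottom_x = x_top_left + width / 2.0
    bottom_y = y_top_left + height

    # Second region: expanded form in which the person's approximate height becomes
    # the region width and the approximate width becomes the region height; centring
    # it on the bottom coordinates is an illustrative choice.
    second = (bottom_x - height / 2.0,
              bottom_y - width / 2.0,
              height,
              width)
    return [first, second]
```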
  • FIG. 1 is a diagram of a vision system 100 for detecting a fall event.
  • the vision system 100 may be a single camera-based system comprising either a first camera 110a or a second camera 110b (e.g., Internet Protocol (IP) cameras). In some other aspects, the vision system 100 may be a multi-camera-based system comprising multiple cameras such as the first camera 110a, the second camera 110b, and/or any other additional cameras. Furthermore, the vision system 100 may comprise a fall event detector 120, which is executed on an edge device 125 (e.g., a computer), and an alert recipient 130 (e.g., a smartphone). The communication between the edge device 125 and the alert recipient 130 may be performed via a communication network (the Internet, a 4G/5G network, an enterprise network, or any other standard network).
  • the fall event detector 120 may be implemented as a module of at least one of the IP cameras 110a, 110b, or on another device (e.g., a server) communicatively coupled to at least one of the IP cameras 110a, 110b for receiving video streams from the at least one IP camera 110a or 110b.
  • the video stream includes a plurality of data frames such as video frames 105a- 105N and/or image frames.
  • the IP cameras may capture 30 data frames per second. Then, for each second, the video stream includes the 30 captured data frames.
  • FIG. 2 is a block diagram of a computing device 200 for detecting a fall event.
  • the computing device 200 may comprise a processing circuit 201 having a processor 202, a memory 203 and the fall event detector 120.
  • the fall event detector 120 may be implemented in the computing device 200.
  • the computing device 200 may be part of at least one of the first camera 110a and the second camera 110b, or may be separate from the cameras 110a, 110b.
  • the components of the computing device 200 can be in a server.
  • the vision system 100 may include a single computing device 200 that is configured to cooperate with at least one of the first camera 110a and the second camera 110b.
  • the computing device 200 may include object detector 212, bounding box generator 214, pose estimator 216 for estimating poses of object(s), pose keypoint projector 218, and an analyzer 220.
  • the object detector 212 is basically a person detector that is configured to receive the data frames generated by at least one of the first camera 110a and the second camera 110b.
  • the object detector 212 employs Artificial Intelligence (AI) and/or Machine Learning (ML) based techniques to identify one or more objects within the received data frames.
  • the object detector 212 utilizes one or more image processing techniques to identify objects.
  • the object detector 212 may be configured to determine Region of Interest (ROI) within the data frame to determine and locate an object within the ROI.
  • the ROI may be the entire data frame being captured by the camera(s) 110a, 110b or a portion of the data frame, as set by the user.
  • the object may be detected using an object detection software as known in the art and executed by the processor 202.
  • the object detector 212 may use an object tracking software. For example, as the object moves within the region of interest, the object detector tracks the object by comparing images of the object in a plurality of data frames.
  • the software for detecting and tracking the object may be implemented together or separately in different components of at least one of the IP cameras 110a, or 110b or by the computing device 200 communicatively coupled with at least one of the IP cameras 110a, 110b.
  • tracking of the object may be performed by the processor 202 (e.g., executing tracking software).
  • the computing device 200 is further shown to include the bounding box generator 214.
  • the bounding box generator 214 is configured to coordinate with the object detector 212 to generate a bounding box (not shown in figures) around the object(s) detected by the object detector 212.
  • the bounding boxes around the object(s) indicate people identified within a data frame. In other words, a bounding box also represents the location of a person or object within the data frame.
  • the bounding box generator 214 tags each of the bounding boxes with a unique identifier, where each bounding box corresponds to a single object identified within the data frame.
  • the unique identifier is also associated with the object within the bounding box.
  • the count of bounding boxes generated and/or the count of unique identifiers generated corresponds to the number of objects identified in a particular data frame.
  • the objects may be identified or referred to by unique identifiers. The objects may be identified based on the unique identifier tagged with the respective bounding boxes.
  • the vision system 100 is capable of establishing a correlation between bounding boxes generated for the same object identified by multiple cameras within their respective data frames. This allows the vision system 100 to keep track of the objects from different angles or views.
  • object 1 may be visible within the data frames generated by the first camera 110a and the second camera 110b.
  • Such objects can also be associated with a single unique identifier.
  • different unique identifiers may be allotted to the bounding boxes generated by the bounding box generator 214 of both the first camera 110a and the second camera 110b.
  • the computing device 200 is further shown to include the pose estimator 216.
  • the pose estimator 216 may apply pose estimation technique on an entire data frame.
  • the pose estimator 216 may apply pose estimating technique on the bounding boxes generated by the bounding box generator 214.
  • the pose estimation technique may be based on one or more Artificial Intelligence (AI) and/or Machine Learning (ML) based technique(s) to estimate a pose of the object.
  • the pose estimator 216 identifies a plurality of keypoints and links them to determine a pose of the object. Furthermore, upon creating linkage between the keypoints, the pose estimator 216 correlates the mapped keypoints with the bounding box.
  • the pose estimator 216 determines an X coordinate value and a Y coordinate value for each keypoint.
  • a single keypoint represents a portion of the object.
  • the keypoints and their linkage may enable the pose estimator 216 to estimate the current pose of object.
  • the keypoint(s) representing the shoulders of the object/person should be lower than the keypoint(s) representing the eyes and ears in order to estimate that the object is standing.
  • the pose estimator 216 may estimate that the object is not in a standing position when the keypoint(s) representing the shoulders are higher than the keypoint(s) representing the eyes and ears.
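  • For illustration only, a minimal Python sketch of this shoulder-versus-eyes/ears comparison is shown below; the keypoint names, the dictionary input format, and the y-up coordinate convention (matching the coordinate example given later in this description, where the origin is the bottom-left corner of the image) are assumptions, and a real system would obtain the keypoints from the pose estimation model rather than from hard-coded values.

```python
# Sketch only: keypoints are {name: (x, y)} with y increasing upward, so a
# physically lower keypoint has a smaller y value.
HEAD_KEYS = ("left_eye", "right_eye", "left_ear", "right_ear")
SHOULDER_KEYS = ("left_shoulder", "right_shoulder")


def head_and_shoulder_heights(keypoints):
    shoulders = [keypoints[k][1] for k in SHOULDER_KEYS if k in keypoints]
    head = [keypoints[k][1] for k in HEAD_KEYS if k in keypoints]
    return shoulders, head


def is_standing(keypoints):
    """Standing: every shoulder keypoint is lower than every eye/ear keypoint."""
    shoulders, head = head_and_shoulder_heights(keypoints)
    return bool(shoulders and head) and max(shoulders) < min(head)


def is_fallen(keypoints):
    """Not standing / fallen: the shoulder keypoints sit above the eyes and ears."""
    shoulders, head = head_and_shoulder_heights(keypoints)
    return bool(shoulders and head) and min(shoulders) > max(head)


# Example: an upright pose (shoulders well below the eyes and ears).
upright = {"left_eye": (210, 640), "right_eye": (230, 640),
           "left_ear": (200, 635), "right_ear": (240, 635),
           "left_shoulder": (195, 560), "right_shoulder": (245, 560)}
print(is_standing(upright), is_fallen(upright))  # True False
```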
  • the pose keypoint projector 218 is configured to cooperate with the pose estimator 216.
  • the pose keypoint projector 218 receives the plurality of keypoints identified for each of the objects within the data frame, and further receives the Y coordinate value for each of the keypoints.
  • the pose keypoint projector 218 is configured to plot keypoint projections based on the received Y-coordinate value.
  • the analyzer 220 analyzes these keypoint projections to determine a fall event using Artificial Intelligence (AI) or Machine Learning (ML) based technique(s).
  • the analyzer 220 may be configured to identify a change in spacing between the keypoint projections (e.g., shift in keypoint projections in different data frames to detect occurrence of a fall event).
  • one or more alerts may be provided to the alert recipient 130 (e.g., a server, a user device, etc.).
  • a fall event may be determined based on the analysis performed on the data frame generated by the second camera 110b.
  • the object detector 212 may identify an object within the data frames provided by the second camera 110b.
  • the bounding box generator 214 may generate a bounding box around the object.
  • the pose estimator 216 may estimate a current pose of the object by identifying and connecting a plurality of keypoints associated with the object.
  • the pose keypoint projector 218 may generate keypoint projections of the object by plotting the keypoints in at least one dimension.
  • the analyzer 220 may detect a fall event associated with the object by analyzing keypoint projections of the data frames provided only by the second camera 110b. Thus, the fall event can be determined by the vision system 100 by using a single camera.
  • while the vision system 100 may determine the fall event by using only one camera, in some aspects, using more than one camera, such as the first camera 110a and the second camera 110b, may provide better accuracy in determining fall events, as the shift in keypoint projections from a standing pose to a falling pose will be prominent in at least one of the camera views of the two cameras 110a and 110b.
  • the first camera 110a and the second camera 110b may be positioned substantially orthogonal to each other and camera view or field of view (FOV) of the first camera 110a may at least partially overlap with the camera view of the second camera 110b.
  • the data frame provided by the first camera 110a may depict significant changes in spacing of keypoint projections (on 1-D projection space) for the falling person as compared to the camera view of the second camera 110b, where the person is falling in a longitudinal direction.
  • the fall event may be detected with a higher confidence level in at least one of the camera views of the two cameras 110a and 110b. Additionally, increasing the number of cameras may enhance the accuracy of fall event detection.
  • the vision system 100 may comprise multiple cameras where the field of view (FOV) of at least two cameras may partially overlap or may not overlap.
  • the vision system 100 may be configured to generate a confidence score pertaining to the fall detection event. The confidence score can be based, at least in part, on whether the object’s fall event is detected within an overlapping FOV or a non-overlapping FOV.
  • the vision system 100 comprises multiple cameras such as the first camera 110a and the second camera 110b.
  • the object detector 212 may receive a plurality of first data frames provided by the first camera 110a and a plurality of second data frames provided by the second camera 110b. Further, the object detector 212 may identify an object within at least one of the first data frames and the second data frames.
  • the bounding box generator 214 may generate a bounding box around the object.
  • the pose estimator 216 may estimate a current pose of the object by identifying and connecting a plurality of keypoints associated with the object. Further, the pose keypoint projector 218 may generate keypoint projections of the object by plotting the keypoints in at least one dimension.
  • the analyzer 220 may detect a fall event associated with the object by analyzing keypoint projections of at least one of the first data frames and second data frames.
  • Referring to FIG. 3, a flowchart illustrating a method 300 for detecting a fall event is shown in accordance with exemplary aspects of the present disclosure.
  • the method 300 is performed by a computing device 200 deployed in the first camera 110a and the second camera 110b.
  • the method 300 may be performed by a computing device 200 deployed on a server.
  • the method 300 is shown to include receiving, at 302a, one or more video streams from a first camera 110a.
  • the video streams may include a plurality of data frames such as video frames and/or image frames.
  • the method 300 is shown to include detecting, at 304a, one or more persons in the data frames.
  • the one or more persons may be detected by the object detector 212 (referred to above in FIG. 2), which is basically a person detector configured to receive the data frames generated by the first camera 110a.
  • Artificial Intelligence (AI) and/or Machine Learning (ML) based techniques may be employed to identify one or more objects within the received data frame.
  • one or more image processing techniques may be employed to identify objects.
  • a ROI within the data frame may be detected to determine and locate an object within the region of interest.
  • the ROI may be the entire data frame being captured by the camera(s) 110a, 110b or a portion of the data frame, as set by the user.
  • the object may be detected using an object detection software as known in the art and executed by the processor 202.
  • an object tracking software may be used to track objects within the data frame. For example, as the object moves within the ROI, the object may be tracked by comparing images of the object in a plurality of data frames.
  • the software for detecting and tracking the object may be implemented together or separately in different components of the first camera 110a or by the computing device 200 communicatively coupled with the first camera 110a.
  • tracking of the object may be performed by the processor 202 (e.g., executing tracking software as known in the art).
  • the method 300 is shown to include generating, at 306a, one or more bounding boxes.
  • the bounding boxes may be generated by the bounding box generator 214 (referred above in FIG. 2).
  • the bounding box (not shown in figures) may be generated around the object(s) detected by the object detector 212.
  • the bounding boxes around the object(s) indicate people identified within a data frame. In other words, a bounding box also represents the location of a person or object within the data frame.
  • each of the bounding boxes may be tagged with a unique identifier, where each bounding box corresponds to a single object identified within the data frame.
  • the unique identifier is also associated with the object within the bounding box. Therefore, the count of bounding boxes generated and/or the count of unique identifiers generated corresponds to the number of objects identified in a particular data frame.
  • the objects may be identified or referred to by unique identifiers. The objects may be identified based on the unique identifier tagged with the respective bounding boxes.
  • a correlation between bounding boxes generated for the same object identified by multiple cameras, such as 110a and 110b, within their respective data frames may be established, thereby allowing tracking of the objects from different angles or views.
  • object 1 may be visible within the data frames generated by the first camera 110a and the second camera 110b.
  • Such objects can be associated with a single unique identifier.
  • different unique identifiers may be allotted to the bounding boxes generated by the bounding box generator 214 of both cameras, i.e., the first camera 110a and the second camera 110b.
  • the method 300 is further shown to include estimating, at 308a, a pose for the objects.
  • pose estimation may be performed by the pose estimator 216 (referred above in FIG. 2).
  • a pose estimation technique may be applied on an entire data frame received from the first camera 110a.
  • the pose estimation technique may be applied on the bounding boxes generated by the bounding box generator 214.
  • the pose estimation technique may be based on one or more Artificial Intelligence (AI) and/or Machine Learning (ML) based technique(s) to estimate a pose of the object.
  • a plurality of keypoints may be identified and linked to determine a pose of the object. Furthermore, upon creating linkage between the keypoints, the mapped keypoints may be correlated with the bounding box.
  • an X coordinate value and a Y coordinate value for each keypoint may be determined.
  • a single keypoint represents a portion of the object.
  • the keypoints and their linkage may enable estimation of the current pose of the object.
  • the keypoint(s) representing the shoulders of the object/person should be lower than the keypoint(s) representing the eyes and ears in order to estimate that the object is standing.
  • the method 300 is further shown to include projecting, at 310a, the keypoints.
  • the keypoints may be projected by the pose keypoint projector 218 (referred above in FIG. 2).
  • the X coordinate value and Y coordinate value for each keypoint may be determined.
  • the keypoint projections may be plotted based on the Y-coordinate value of the keypoint.
  • the keypoint projections may be analyzed, at 312, by the analyzer 220 (referred to above in FIG. 2) using Artificial Intelligence (AI) or Machine Learning (ML) based technique(s), such as binary classifiers, heuristic algorithms, etc., to determine a fall event associated with the object.
  • the fall event may be indicated in the form of a percentage, for example, Fall: 90%, No fall: 10%, or in any other form.
  • one or more alerts may be provided to the alert recipient 130 (e.g., a server, a user device).
  • the method 300 is shown to include receiving, at 302b, one or more video streams from the second camera 110b.
  • the video streams may include a plurality of data frames such as video data frames.
  • the data frames may be received by the computing device 200 that is part of the second camera 110b to detect a fall event by performing one or more steps (i.e., 304b, 306b, 308b, and 310b) similar to those performed by the computing device 200 that is part of the first camera 110a, as explained above.
  • FIG. 4 depicts a first scenario 400 where a person is seen to be in a first pose (i.e., a standing pose) in a retail environment.
  • the first scenario 400 may be captured by each of the first camera 110a and the second camera 110b of a multi-camera-based system.
  • the first camera 110a and the second camera 110b are positioned substantially orthogonal to each other, with at least partially overlapping FOVs, and may each provide one or more data frames pertaining to the first scenario 400.
  • a plurality of keypoints may be identified and linked to determine a pose of the person by the pose estimator 216. Further, keypoint projections may be plotted based on received Y coordinate value of each keypoint.
  • the keypoint projections 406 are generated for the data frame provided by the first camera 110a and the keypoint projections 408 are generated for the data frame provided by the second camera 110b as shown in first scenario 400.
  • the second scenario 402 is shown, where the same person is detected and seen to be in a second pose.
  • the second scenario 402 also may be captured by each of the first camera 110a and the second camera 110b of the multi-camera-based system.
  • the first camera 110a and the second camera 110b are positioned substantially orthogonal to each other, with at least partially overlapping FOVs, and may each provide one or more data frames pertaining to the second scenario 402.
  • keypoint projections 410 are generated for the data frame provided by the first camera 110a
  • keypoint projections 412 are generated for the data frame provided by the second camera 110b.
  • the keypoint projections 406 generated in the first scenario 400 are compared and analyzed with the keypoint projections 410 generated in the second scenario 402.
  • the analysis shows that the keypoint projections 406 in the first scenario 400 are spaced apart or scattered, for example, the keypoint(s) representing shoulders of the person are lower than the keypoint(s) representing eyes and ears suggesting that the person is in standing pose.
  • the keypoint projections 410 in the second scenario 402 are clustered. This shift in keypoint projections in different data frames indicates that the person has fallen down, thereby facilitating detection of a fall event.
  • the keypoint projections 408 in the first scenario 400 are compared and analyzed with the keypoint projections 412 in the second scenario 402 in order to detect a fall event.
  • while the vision system 100 may determine a fall event of a person by using only one camera, in some aspects, using more than one camera, such as the first camera 110a and the second camera 110b, provides greater accuracy in determining fall events, as the change in spacing of keypoint projections from a standing pose to a falling pose will be prominent in at least one of the camera views of the two cameras 110a and 110b that are positioned substantially orthogonal to each other.
  • the keypoint projections 408 in the first scenario 400 and the keypoint projections 412 in the second scenario 402 are shown for the data frames captured by the second camera 110b.
  • the fall event may not be accurately determined due to no changes in spacing between keypoint projections 408 and 412 as the person falls.
  • using more than one camera such as the first camera 110a along with the second camera 110b, may help in accurately determining the fall event.
  • the keypoint projections 406 for the first scenario 400 and the keypoint projections 410 for the second scenario 402 are shown for the data frames captured by the first camera 110a and such keypoint projections show significant changes in spacing as the person falls, due to a different camera view as compared to the second camera 110b. Therefore, the vision system 100 utilizing more than one camera may increase a confidence level in determining the fall event.
  • Referring to FIGs. 5-7, snapshots demonstrating detection of a fall event by utilizing a single camera-based system are shown in accordance with exemplary aspects of the present disclosure.
  • FIG. 5 is a first snapshot 500 demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • a plurality of keypoints 506 may be identified and linked to determine a pose of the person detected in boundary box 504 by the pose estimator 216. Further, the keypoint projections 508 may be plotted based on the received Y-coordinate value of each keypoint. In FIG. 5, keypoint projections 508 are spaced apart.
  • FIG. 6 is a second snapshot 600 demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • the person is in the process of falling (e.g., in midfall).
  • Fall event detector 120 again detects the keypoints of the person in snapshot 600.
  • the ears and eyes keypoints 602 and shoulder keypoints 604 are marked in FIG. 6 and all other keypoints are not marked.
  • fall event detector 120 may not classify that the person has fallen because the shoulder keypoints 604 are lower than ears and eyes keypoints 602. Nonetheless, because the person is falling, keypoint projections 606 show the keypoints condensing in one area relative to the wide spread in keypoint projections 508.
  • FIG. 7 is a third snapshot 700 demonstrating detection of a fall event by utilizing a single camera-based system, in accordance with exemplary aspects of the present disclosure.
  • the person has fallen and is lying on the floor.
  • ears and eyes keypoints 702 are below shoulder keypoints 704.
  • fall event detector 120 may classify a fall event.
  • the keypoint projections 508, 606, and 706 are compared and analyzed by the analyzer 220 (referred to above in FIG. 2).
  • the analysis shows that the keypoint projections 508 in FIG. 5 are spaced apart, for example, the keypoint(s) representing shoulders of the person are lower than the keypoint(s) representing eyes and ears, suggesting that the person is standing.
  • the keypoint projections 706 shown in subsequent data frames are clustered. This shift in keypoint projections in two or more adjacent data frames indicates that the person has fallen down.
  • accordingly, one or more alerts may be provided to the alert recipient 130 (e.g., a server, a user device).
  • FIG. 8 is a flow diagram 800 for detecting keypoints and generating bounding boxes, in accordance with exemplary aspects of the present disclosure.
  • camera 802 captures video frames 804 (e.g., snapshots 500-700).
  • fall event detector 120 executes person detection and tracking on video frames 804 and at 808, fall event detector 120 executes a pose keypoints estimation model. These executions result in bounding boxes and identifiers 810 and pose keypoints 812, respectively, which are stored as inference results 814.
  • FIG. 9 is a flow diagram 900 for classifying a fall event based on detection of a person, in accordance with exemplary aspects of the present disclosure.
  • the steps of diagram 900 are initiated.
  • fall event detector 120 retrieves the last person bounding box.
  • fall event detector 120 calculates the aspect ratio of the bounding box.
  • fall event detector 120 determines whether the aspect ratio is less than a threshold aspect ratio. If it is not, at 922, no fall is classified.
  • fall event detector 120 retrieves the latest pose keypoints of the person.
  • fall event detector 120 calculates the aspect ratio of the pose-derived bounding box. From 908, diagram 900 proceeds to 910, as described above.
  • fall event detector 120 identifies the ears and eyes keypoints and the shoulder keypoints. At 918, fall event detector 120 detects a fall based on the keypoints identified at 914 and 916. If the aspect ratio at 910 is determined to be greater than the threshold, diagram 900 proceeds to 912. Likewise, from 918, diagram 900 proceeds to 912. At 912, fall event detector 120 determines whether a fall was detected. If not, diagram 900 ends at 922 (i.e., no fall); if a fall is detected, diagram 900 ends at 920.
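  • One plausible reading of the decision logic of diagram 900 is sketched below for illustration only; the threshold value of 1, the use of height divided by width as the aspect ratio, and the y-up keypoint convention are borrowed from examples elsewhere in this description, and the exact branching of the flowchart may differ.

```python
# Sketch only: a fall is classified when the person's bounding box is no longer
# upright (aspect ratio below the threshold) and the shoulder keypoints sit above
# the eye and ear keypoints (y increasing upward).
def bbox_aspect_ratio(width, height):
    return height / float(width) if width else 0.0


def shoulders_above_head(shoulder_ys, eye_ear_ys):
    return min(shoulder_ys) > max(eye_ear_ys)


def classify_flow_900(box_width, box_height, shoulder_ys, eye_ear_ys, threshold=1.0):
    if bbox_aspect_ratio(box_width, box_height) >= threshold:
        return "no fall"          # box still tall and narrow (block 922)
    if shoulders_above_head(shoulder_ys, eye_ear_ys):
        return "fall"             # keypoint check confirms the fall (block 920)
    return "no fall"


# Illustrative 700 x 300 px box (aspect ratio about 0.43) with fallen keypoints.
print(classify_flow_900(700, 300, shoulder_ys=[95, 97], eye_ear_ys=[78, 82]))  # fall
```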
  • FIG. 10 is a flow diagram 1000 for classifying a fall event using a proximity search region, in accordance with exemplary aspects of the present disclosure. If a person that is initially detected during a fall event is no longer detectable (e.g., the person moves in a position where a person detection algorithm cannot identify them), the steps of diagram 1000 are initiated after the steps of diagram 800.
  • fall event detector 120 retrieves the last detected bounding box.
  • fall event detector 120 determines the height and width of the person in the bounding box.
  • fall event detector 120 generates one or more proximity search regions based on the last known coordinates and determined height and width of the person.
  • fall event detector 120 crops the frame based on the proximity search regions.
  • fall event detector 120 rotates the cropped frames clockwise/anti-clockwise by 45 and/or 90 degrees. After applying each rotation, fall event detector 120 executes the person detector 1012. At 1014, fall event detector 120 determines whether the person is detected. If detected, fall event detector 120 determines that a fall is detected at 1018. Otherwise, no fall is detected at 1016.
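  • A minimal sketch of the crop/rotate/re-detect loop of diagram 1000 follows, using OpenCV-style image operations; the detect_person callback is a placeholder for whatever person detection model the system runs, and reusing the original canvas size for the 45-degree rotations (which clips the corners) is a simplification.

```python
# Sketch only: `frame` is a NumPy image array (e.g., from cv2.imread or a video
# capture), `regions` are (x, y, w, h) proximity search regions, and
# `detect_person` is a placeholder callback returning True when a person is found.
import cv2


def rotate(image, angle_degrees):
    """Rotate an image about its centre (positive angles are counter-clockwise)."""
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_degrees, 1.0)
    return cv2.warpAffine(image, matrix, (w, h))


def search_proximity_regions(frame, regions, detect_person):
    """Crop each proximity search region, apply the rotations, and re-run the
    person detector; a detection in any transformed crop is treated as a fall."""
    for x, y, w, h in regions:
        crop = frame[int(y):int(y + h), int(x):int(x + w)]
        if crop.size == 0:
            continue
        for angle in (0, 45, -45, 90, -90):
            candidate = crop if angle == 0 else rotate(crop, angle)
            if detect_person(candidate):
                return True   # fall detected (block 1018)
    return False              # no fall detected (block 1016)
```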
  • FIG. 11 is a diagram for detecting a person using a proximity search region, in accordance with exemplary aspects of the present disclosure.
  • from FIG. 7, it is visible that the person shown in snapshot 600 has fallen. However, depending on the orientation of the person and whether certain features (e.g., face, torso, foot, etc.) are visible, a person detection model may fail to detect the person.
  • Fall event detector 120 may identify the last seen boundary box 1102 (e.g., the boundary box in a previous frame such as the one shown in snapshot 600) and generate proximity search region 1104 in a neighboring area of the image.
  • Fall event detector 120 may crop extracted image 1106 (i.e., the pixel contents of proximity search region 1104) and apply transformations (e.g., rotations clockwise and counterclockwise by 45 degrees and 90 degrees). These transformation(s) result in transformed image 1108, on which fall event detector 120 applies the person detection algorithm. In response to detecting the person in transformed image 1108, the person is marked using boundary box 1110 in the original image (e.g., snapshot 700) and keypoints 1112 are generated as discussed before.
  • computing device 1200 may perform a method 1300 for computer vision detection of a fall event, such as via execution of fall detection component 1215 by one or more processors 1205 and/or one or more memories 1210. It should be noted that computing device 1200 may correspond to computing device 200. For example, processor(s) 1205 corresponds to processor 202, and memory 1210 corresponds to memory 203.
  • Fall detection component 1215 corresponds to fall event detector 120 such that fall detection component 1215 may execute object detector 212 (e.g., via detecting component 1220), bounding box generator 214 (e.g., via detecting component 1220), pose estimator 216 (e.g., via identifying component 1225 and classifying component 1230), pose keypoint projector 218, and analyzer 220 (e.g., via generating component 1235).
  • the method 1300 includes detecting a person in a first image captured at a first time.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or detecting component 1220 may be configured to or may comprise means for detecting a person in a first image captured at a first time.
  • Fall detection component 1215 may detect the person shown in FIG. 5 using computer vision and/or machine learning techniques (e.g., object classification).
  • the method 1300 includes identifying a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or identifying component 1225 may be configured to or may comprise means for identifying a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person.
  • fall detection component 1215 may detect keypoints 506 (e.g., using a keypoint detection computer vision algorithm).
  • the method 1300 includes classifying, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or classifying component 1230 may be configured to or may comprise means for classifying, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person.
  • fall detection component 1215 determines that the person is standing.
  • the method 1300 includes detecting the person in a second image captured at a second time.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or detecting component 1220 may be configured to or may comprise means for detecting the person in a second image captured at a second time.
  • Fall detection component 1215 may detect the person in FIG. 7.
  • the method 1300 includes identifying a second plurality of keypoints on the person in the second image.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or identifying component 1225 may be configured to or may comprise means for identifying a second plurality of keypoints on the person in the second image.
  • fall detection component 1215 may identify several keypoints on the person in FIG. 7. Those keypoints may include shoulder keypoints 704 and ears and eyes keypoints 702.
  • the method 1300 includes detecting, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or detecting component 1220 may be configured to or may comprise means for detecting, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image.
  • fall detection component 1215 may determine that at least one of the Y-coordinates of the shoulder keypoints 704 is lower than at least one of the Y-coordinates of ears and eyes keypoints 702. Based on this logic, fall detection component 1215 determines that the person has fallen.
  • the method 1300 includes generating an alert indicating that the person has fallen.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or generating component 1235 may be configured to or may comprise means for generating an alert indicating that the person has fallen.
  • the alert may be generated on a graphical user interface of fall detection component 1215 on computing device 1200.
  • the alert is transmitted to a different computing device (e.g., a smartphone) belonging to security personnel, an emergency contact, a housekeeper, etc.
  • the method 1300 may further include mapping the first plurality of keypoints to a one-dimensional line based on each respective vertical component of the first plurality of keypoints.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or mapping component 1240 may be configured to or may comprise means for mapping the first plurality of keypoints to a one-dimensional line based on each respective vertical component of the first plurality of keypoints.
  • the first plurality of keypoints may be mapped by taking the Y-coordinates of the keypoints and placing them on a line along the Y-axis.
  • the coordinates of a keypoint are given by (X,Y).
  • the shoulder keypoint is (300, 400), which indicates that the keypoint is located 300 pixels to the right and 400 pixels above the origin point (0,0) of an image (where the origin point is the bottom leftmost point in the image).
  • the Y-coordinate extracted is 400.
  • the Y-coordinates of keypoints 506 are mapped along the Y-axis and shown as keypoint projections 508.
  • the method 1300 may further include determining, based on the one-dimensional line, a first distance between a highest keypoint and a lowest keypoint of the first plurality of keypoints.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or determining component 1245 may be configured to or may comprise means for determining, based on the one-dimensional line, a first distance between a highest keypoint and a lowest keypoint of the first plurality of keypoints.
  • the Y-coordinate value of the highest keypoint may be 1020 and the Y-coordinate value of the lowest keypoint may be 100.
  • the first distance is therefore 920 pixels.
  • the method 1300 may further include classifying the pose as the standing pose further in response to the first distance being greater than a threshold distance.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or classifying component 1230 may be configured to or may comprise means for classifying the pose as the standing pose further in response to the first distance being greater than a threshold distance.
  • the threshold distance may be a preset value stored in memory 1210.
  • the threshold distance may be 500 pixels.
  • fall detection component 1215 may determine that 920 exceeds 500 and therefore the person is standing in FIG. 5.
  • the method 1300 may further include mapping the second plurality of keypoints to the one-dimensional line based on each respective vertical component of the second plurality of keypoints.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or mapping component 1240 may be configured to or may comprise means for mapping the second plurality of keypoints to the one-dimensional line based on each respective vertical component of the second plurality of keypoints.
  • the method 1300 may further include determining, based on the one-dimensional line, a second distance between a highest keypoint and a lowest keypoint of the second plurality of keypoints.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or determining component 1245 may be configured to or may comprise means for determining, based on the one-dimensional line, a second distance between a highest keypoint and a lowest keypoint of the second plurality of keypoints.
  • the Y-coordinate value of the highest keypoint may be 400 and the Y-coordinate value of the lowest keypoint may be 70.
  • the second distance is thus 330 pixels.
  • the method 1300 may further include detecting that the person has fallen in response to determining that the first distance is greater than the threshold distance and the second distance is not greater than the threshold distance.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or detecting component 1220 may be configured to or may comprise means for detecting that the person has fallen in response to determining that the first distance is greater than the threshold distance and the second distance is not greater than the threshold distance.
  • the threshold distance may be 500. Because the second distance of 330 is less than this value, fall detection component 1215 may determine that the person has fallen.
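  • Tying the numbers above together, a short sketch of the distance-and-threshold check, using the example values already given (920 and 330 pixels against a 500-pixel threshold):

```python
# Values from the worked example above.
THRESHOLD = 500                 # preset threshold distance, in pixels

first_distance = 1020 - 100     # highest minus lowest keypoint in the first image -> 920
second_distance = 400 - 70      # highest minus lowest keypoint in the second image -> 330

standing_in_first = first_distance > THRESHOLD        # 920 > 500 -> True
collapsed_in_second = second_distance <= THRESHOLD    # 330 <= 500 -> True
print(standing_in_first and collapsed_in_second)      # True: the person has fallen
```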
  • the method 1300 may further include generating a first boundary box around the person in the first image and a second boundary box around the person in the second image.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or generating component 1245 may be configured to or may comprise means for generating a first boundary box around the person in the first image and a second boundary box around the person in the second image.
  • the first image may be the image in FIG. 5 and the second image may be the image in FIG. 11.
  • the first boundary box may be boundary box 504 and the second boundary box may be boundary box 1110.
  • the size of a boundary box may be set as the tightest box that fully contains the person, so that the person occupies as much of the boundary box as possible without extending beyond its limits.
  • classifying the pose as the standing pose is further in response to determining that an aspect ratio of the first boundary box is greater than a threshold aspect ratio.
  • boundary box 504 may have a width of 300 pixels and a length of 800 pixels.
  • the threshold aspect ratio may be 1.
  • the aspect ratio (length divided by width) of boundary box 504 is therefore 800/300 ≈ 2.7. Fall detection component 1215 may thus determine that because 2.7 is greater than 1, the person is standing in FIG. 5 (a sketch of this aspect-ratio check appears after the clauses below).
  • detecting that the person has fallen is further in response to determining that an aspect ratio of the second boundary box is not greater than a threshold aspect ratio.
  • the aspect ratio of boundary box 1110 may be approximately 0.43; fall detection component 1215 may thus determine that because 0.43 is not greater than 1, the person is in a fallen position in FIG. 11.
  • the detecting at block 1302 of the person in the second image captured at the second time further includes determining that a person detection model has failed to detect the person in the second image. For example, because the person has fallen and some keypoints may be obscured or undetected, the person may not be detected in FIG. 11.
  • the detecting at block 1302 of the person in the second image captured at the second time further includes generating at least one proximity search region based on coordinates and dimensions of the first boundary box in response to determining that the first boundary box is a latest boundary box generated for the person.
  • an area of the at least one proximity search region matches an area of the first boundary box, and wherein a center point of the at least one proximity search region is within a threshold distance from a center point of the first boundary box.
  • the latest boundary box may be last seen boundary box 1102. This is the last boundary box that the person was detected in before going undetected.
  • last seen boundary box 1102 may be the bounding box shown in FIG. 6.
  • fall detection component 1215 may generate proximity search region 1104.
  • proximity search region 1104 may share the shape of last seen boundary box 1102.
  • Fall detection component 1215 may set the size of proximity search region 1104 to be proportional to last seen boundary box 1102 (e.g., 1x size, 2x size, etc.). Fall detection component 1215 may further place proximity search region 1104 in an area of the image where the person is not detected, based on the position of last seen boundary box 1102.
  • proximity search region 1104 may first be placed in the vicinity of last seen boundary box 1102 (e.g., shifted by a preset amount such as 100 pixels lower and 50 pixels to the right relative to the origin point). Fall detection component 1215 may keep shifting proximity search region 1104 until the person is detected, or until the person still cannot be detected after shifting the region a threshold number of times (a sketch of this proximity search appears after the clauses below).
  • the detecting at block 1302 of the person in the second image captured at the second time further includes generating at least one input image by cropping the second image to the at least one proximity search region.
  • fall detection component 1215 may place proximity search region 1104 as shown in FIG. 11 and crop the image to produce extracted image 1106.
  • the detecting at block 1302 of the person in the second image captured at the second time further includes applying a rotation to the at least one input image.
  • fall detection component 1215 may rotate extracted image 1106 90 degrees clockwise to produce transformed image 1108.
  • the detecting at block 1302 of the person in the second image captured at the second time further includes detecting the person in the at least one input image after the rotation is applied.
  • the person detection algorithm used by fall detection component 1215 may detect the fallen person in transformed image 1108.
  • the method 1300 may further include transmitting the alert to a computing device.
  • computing device 1200, one or more processors 1205, one or more memories 1210, fall detection component 1215, and/or transmitting component 1250 may be configured to or may comprise means for transmitting the alert to a computing device.
  • the alert may be, for example, an audio alert on a smart speaker, or an email, text message, or application notification delivered to a smartphone, desktop computer, laptop, or tablet (a sketch of one possible alert transmission appears after the clauses below).
  • An apparatus for computer vision detection of a fall event comprising: at least one memory; and one or more processors coupled with the at least one memory and configured, individually or in combination, to: detect a person in a first image captured at a first time; identify a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person; classify, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person; detect the person in a second image captured at a second time; identify a second plurality of keypoints on the person in the second image; detect, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image; and generate an alert indicating that the person has fallen.
  • the one or more processors are further configured to: map the second plurality of keypoints to the one-dimensional line based on each respective vertical component of the second plurality of keypoints; determine, based on the one-dimensional line, a second distance between a highest keypoint and a lowest keypoint of the second plurality of keypoints; and detect that the person has fallen in response to determining that the first distance is greater than the threshold distance and the second distance is not greater than the threshold distance.
  • the one or more processors are further configured to generate a first boundary box around the person in the first image and a second boundary box around the person in the second image.
  • the one or more processors are further configured to: determine that a person detection model has failed to detect the person in the second image; generate at least one proximity search region based on coordinates and dimensions of the first boundary box in response to determining that the first boundary box is a latest boundary box generated for the person; generate at least one input image by cropping the second image to the at least one proximity search region; apply a rotation to the at least one input image; and detect the person in the at least one input image after the rotation is applied.
  • a method for computer vision detection of a fall event comprising: detecting a person in a first image captured at a first time; identifying a first plurality of keypoints on the person in the first image, wherein the first plurality of keypoints, when connected, indicate a pose of the person; classifying, using the first plurality of keypoints, the pose as a standing pose in response to determining that keypoints of the first plurality of keypoints associated with shoulders of the person are lower than keypoints of the first plurality of keypoints associated with eyes and ears of the person; detecting the person in a second image captured at a second time; identifying a second plurality of keypoints on the person in the second image; detecting, using the second plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in the first image, the keypoints of the second plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image; and generating an alert indicating that the person has fallen (a sketch of this shoulder-versus-eyes/ears comparison appears after the clauses below).
  • detecting the person in the second image captured at the second time further comprises: determining that a person detection model has failed to detect the person in the second image; generating at least one proximity search region based on coordinates and dimensions of the first boundary box in response to determining that the first boundary box is a latest boundary box generated for the person; generating at least one input image by cropping the second image to the at least one proximity search region; applying a rotation to the at least one input image; and detecting the person in the at least one input image after the rotation is applied.
  • Clause 18 The method of any preceding clause, further comprising transmitting the alert to a computing device.
  • Clause 19 An apparatus for computer vision detection of a fall event, comprising one or more means for performing the method of any of clauses 10 to 18.
  • Clause 20 A computer-readable medium having instructions stored thereon for computer vision detection of a fall event, wherein the instructions are executable by one or more processors, individually or in combination, to perform the method of any of clauses 10 to 18.
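
The following sketch illustrates the one-dimensional vertical-extent check described in the items above (mapping keypoints onto a vertical line and comparing the highest-to-lowest distance against a threshold). It is a minimal illustration rather than the claimed implementation: the function names, the 500-pixel threshold, the keypoint representation, and the convention that a larger Y value means a higher point in the frame (matching the 1020-versus-100 example) are assumptions made for the sketch.

```python
# Minimal sketch of the vertical-extent check (assumed names and threshold).
from typing import List, Tuple

Keypoint = Tuple[float, float]  # (x, y); larger y is assumed to be higher in the frame

THRESHOLD_DISTANCE = 500.0  # pixels; example value from the description


def vertical_extent(keypoints: List[Keypoint]) -> float:
    """Project keypoints onto a one-dimensional vertical line and return the
    distance between the highest and lowest keypoints."""
    ys = [y for _, y in keypoints]
    return max(ys) - min(ys)


def is_standing(keypoints: List[Keypoint], threshold: float = THRESHOLD_DISTANCE) -> bool:
    """Treat the pose as standing when its vertical extent exceeds the threshold."""
    return vertical_extent(keypoints) > threshold


def fell_between_frames(first_keypoints: List[Keypoint],
                        second_keypoints: List[Keypoint],
                        threshold: float = THRESHOLD_DISTANCE) -> bool:
    """Flag a fall when the first frame exceeds the threshold and the second does not."""
    return (vertical_extent(first_keypoints) > threshold
            and not vertical_extent(second_keypoints) > threshold)
```

With the example values above, the first image yields 1020 − 100 = 920 > 500 (standing) and the second yields 400 − 70 = 330 ≤ 500, so fell_between_frames returns True.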
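
The boundary-box aspect-ratio check can be sketched in a similar way. The ratio is taken as length (vertical size) divided by width, which reproduces the 800/300 ≈ 2.7 example; the BoundaryBox structure, the threshold of 1, and the dimensions of the hypothetical fallen-person box are assumptions for illustration.

```python
# Minimal sketch of the aspect-ratio check (assumed structure and example boxes).
from dataclasses import dataclass

THRESHOLD_ASPECT_RATIO = 1.0  # example value from the description


@dataclass
class BoundaryBox:
    x: float       # top-left corner
    y: float
    width: float   # horizontal size
    length: float  # vertical size


def aspect_ratio(box: BoundaryBox) -> float:
    return box.length / box.width


def looks_standing(box: BoundaryBox) -> bool:
    return aspect_ratio(box) > THRESHOLD_ASPECT_RATIO


standing_box = BoundaryBox(x=0, y=0, width=300, length=800)  # like boundary box 504
fallen_box = BoundaryBox(x=0, y=0, width=700, length=300)    # hypothetical fallen-person box

print(looks_standing(standing_box))  # True  (800 / 300 ≈ 2.7 > 1)
print(looks_standing(fallen_box))    # False (300 / 700 ≈ 0.43 <= 1)
```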
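
The proximity-search fallback (generating search regions from the last seen boundary box, cropping, rotating, and re-running detection) might look like the following. The detector callable, the shift schedule, and the use of OpenCV are assumptions for the sketch, not the claimed implementation; pixel coordinates here follow the usual image convention in which y grows downward, so shifting the region "lower" adds to y.

```python
# Minimal sketch of the proximity-search fallback (assumed detector and shift schedule).
from typing import Callable, List, Optional, Tuple

import cv2
import numpy as np

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixel coordinates


def shifted_regions(last_box: Box, max_shifts: int = 5,
                    step: Tuple[int, int] = (50, 100)) -> List[Box]:
    """Generate candidate proximity search regions with the same area as the
    last seen boundary box, shifted by a preset amount on each attempt
    (50 px to the right and 100 px lower, per the example in the description)."""
    x, y, w, h = last_box
    dx, dy = step
    return [(x + i * dx, y + i * dy, w, h) for i in range(max_shifts)]


def detect_after_rotation(image: np.ndarray, last_box: Box,
                          detector: Callable[[np.ndarray], bool]) -> Optional[Box]:
    """Crop each candidate region, rotate it 90 degrees clockwise, and return
    the first region in which the detector finds the person (else None)."""
    height, width = image.shape[:2]
    for rx, ry, rw, rh in shifted_regions(last_box):
        rx, ry = max(0, rx), max(0, ry)           # clamp to the image bounds
        crop = image[ry:min(ry + rh, height), rx:min(rx + rw, width)]
        if crop.size == 0:
            continue
        rotated = cv2.rotate(crop, cv2.ROTATE_90_CLOCKWISE)
        if detector(rotated):
            return rx, ry, rw, rh
    return None
```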
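
Transmitting the alert could be as simple as posting an event to a notification service that relays it to a smart speaker, phone, or desktop client. The endpoint URL, payload fields, and use of the requests library below are purely illustrative assumptions.

```python
# Minimal sketch of alert transmission (assumed endpoint and payload).
import datetime

import requests

ALERT_ENDPOINT = "https://example.invalid/fall-alerts"  # hypothetical notification service


def send_fall_alert(camera_id: str) -> None:
    """Post a fall event; the service would relay it as an email, text message,
    app notification, or audio alert on a registered device."""
    payload = {
        "event": "fall_detected",
        "camera_id": camera_id,
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
    }
    requests.post(ALERT_ENDPOINT, json=payload, timeout=5)
```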
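
Finally, the shoulder-versus-eyes/ears comparison recited in the clauses can be sketched as follows. The COCO-style keypoint names and the larger-Y-means-higher convention are assumptions carried over from the earlier sketches; the claimed method only requires comparing the relative heights of the shoulder keypoints and the eye/ear keypoints across the two images.

```python
# Minimal sketch of the shoulder-versus-eyes/ears comparison (assumed keypoint names).
from typing import Dict, Tuple

Keypoints = Dict[str, Tuple[float, float]]  # name -> (x, y); larger y assumed higher

SHOULDERS = ("left_shoulder", "right_shoulder")
HEAD_POINTS = ("left_eye", "right_eye", "left_ear", "right_ear")


def shoulders_below_head(kps: Keypoints) -> bool:
    """True when every detected shoulder keypoint is lower than every detected
    eye/ear keypoint (the standing-pose condition)."""
    shoulder_ys = [kps[n][1] for n in SHOULDERS if n in kps]
    head_ys = [kps[n][1] for n in HEAD_POINTS if n in kps]
    return bool(shoulder_ys) and bool(head_ys) and max(shoulder_ys) < min(head_ys)


def shoulders_above_head(kps: Keypoints) -> bool:
    """True when every detected shoulder keypoint is higher than every detected
    eye/ear keypoint (the fallen condition)."""
    shoulder_ys = [kps[n][1] for n in SHOULDERS if n in kps]
    head_ys = [kps[n][1] for n in HEAD_POINTS if n in kps]
    return bool(shoulder_ys) and bool(head_ys) and min(shoulder_ys) > max(head_ys)


def fall_detected(first_frame: Keypoints, second_frame: Keypoints) -> bool:
    """Standing in the first image, shoulders above the eyes/ears in the second."""
    return shoulders_below_head(first_frame) and shoulders_above_head(second_frame)
```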

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

Example embodiments include a method, an apparatus, and a computer-readable medium for computer vision detection of a fall event, comprising detecting a person in a first image captured at a first time. The embodiments further include identifying a plurality of keypoints on the person in an image, the plurality of keypoints, when connected, indicating a pose of the person. The embodiments further include detecting, using the plurality of keypoints, that the person has fallen in response to determining that, subsequent to the pose being the standing pose in a previous image, the keypoints of the plurality of keypoints associated with the shoulders of the person are higher than the keypoints of the second plurality of keypoints associated with the eyes and the ears of the person in the second image. In addition, the embodiments further include generating an alert indicating that the person has fallen.
PCT/US2023/075859 2022-10-03 2023-10-03 Systèmes et procédés de détection d'événements de chute WO2024077005A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263378116P 2022-10-03 2022-10-03
US63/378,116 2022-10-03
US18/479,667 US20240112469A1 (en) 2022-10-03 2023-10-02 Systems and methods for detecting fall events
US18/479,667 2023-10-02

Publications (1)

Publication Number Publication Date
WO2024077005A1 true WO2024077005A1 (fr) 2024-04-11

Family

ID=88689590

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/075859 WO2024077005A1 (fr) 2022-10-03 2023-10-03 Systèmes et procédés de détection d'événements de chute

Country Status (1)

Country Link
WO (1) WO2024077005A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150109442A1 (en) * 2010-09-23 2015-04-23 Stryker Corporation Video monitoring system
EP3309748A1 (fr) * 2015-06-10 2018-04-18 Konica Minolta, Inc. Système de traitement d'image, dispositif de traitement d'image, procédé de traitement d'image et programme de traitement d'image
CN111582158A (zh) * 2020-05-07 2020-08-25 济南浪潮高新科技投资发展有限公司 一种基于人体姿态估计的摔倒检测方法
CN112287759A (zh) * 2020-09-26 2021-01-29 浙江汉德瑞智能科技有限公司 基于关键点的跌倒检测方法
CN111881898B (zh) * 2020-09-27 2021-02-26 西南交通大学 基于单目rgb图像的人体姿态检测方法
JP2022016979A (ja) * 2020-07-13 2022-01-25 清水建設株式会社 傷病者検出装置、傷病者検出システム、及び、傷病者検出方法

Similar Documents

Publication Publication Date Title
KR102153591B1 (ko) 영상 감시 시스템에서의 실시간 쓰레기 투기 행위 탐지 방법 및 장치
CN107358149B (zh) 一种人体姿态检测方法和装置
EP2584529B1 (fr) Procédé de traitement d'image et dispositif correspondant
JP4216668B2 (ja) 映像視覚情報を結合してリアルタイムで複数の顔を検出して追跡する顔検出・追跡システム及びその方法
JP6233624B2 (ja) 情報処理システム、情報処理方法及びプログラム
JP2007265367A (ja) 視線検出方法および装置ならびにプログラム
JP2009064410A (ja) 車両の死角における移動物体を検知するための方法、および死角検知装置
EP2951783B1 (fr) Procédé et système permettant de détecter des objets en mouvement
JP6803525B2 (ja) 顔検出装置およびこれを備えた顔検出システムならびに顔検出方法
WO2015040929A1 (fr) Système et procédé de traitement d'image et programme
EP3349142A1 (fr) Procédé et dispositif de traitement d'informations
García-Martín et al. Robust real time moving people detection in surveillance scenarios
Azim et al. Automatic fatigue detection of drivers through pupil detection and yawning analysis
JP2017174343A (ja) 入店者属性抽出装置及び入店者属性抽出プログラム
Wang et al. Pedestrian detection in crowded scenes via scale and occlusion analysis
US20220036056A1 (en) Image processing apparatus and method for recognizing state of subject
Das et al. Computer vision-based social distancing surveillance solution with optional automated camera calibration for large scale deployment
CN112101134B (zh) 物体的检测方法及装置、电子设备和存储介质
US11544926B2 (en) Image processing apparatus, method of processing image, and storage medium
JP6798609B2 (ja) 映像解析装置、映像解析方法およびプログラム
US20240112469A1 (en) Systems and methods for detecting fall events
US10902249B2 (en) Video monitoring
CN114764895A (zh) 异常行为检测装置和方法
WO2024077005A1 (fr) Systèmes et procédés de détection d'événements de chute
Ezatzadeh et al. ViFa: an analytical framework for vision-based fall detection in a surveillance environment

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23800692

Country of ref document: EP

Kind code of ref document: A1