WO2021227874A1 - Fall behavior detection method and device - Google Patents

Fall behavior detection method and device

Info

Publication number
WO2021227874A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
frame
tracking
target box
image
Prior art date
Application number
PCT/CN2021/090584
Other languages
English (en)
French (fr)
Inventor
蔡冬
Original Assignee
杭州萤石软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州萤石软件有限公司
Publication of WO2021227874A1 publication Critical patent/WO2021227874A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of target behavior recognition, and in particular to a method and device for detecting fall behavior.
  • Fall detection methods in the prior art usually recognize targets in the entire scene indiscriminately; that is, they detect only the fall behavior itself and cannot distinguish whether it belongs to a specific person, so they are insufficiently intelligent.
  • Methods that can monitor a specific person's falls in real time require the monitored person to wear wearable devices to collect data, which is costly; most wearable devices are unfriendly to the monitored person and inconvenient to use. The monitored person is the specific person who needs to be cared for.
  • This application provides a fall behavior detection method to monitor the fall behavior of target persons with certain attributes.
  • the fall behavior detection method provided by this application is implemented as follows:
  • pose estimation is performed on the target in the target box.
  • the set attributes identify the target persons who need to be detected.
  • the target persons to be detected include the elderly and/or persons lacking self-care ability;
  • the method also includes: when the attributes of the target do not match the set attributes, skipping pose estimation for the target in the target box;
  • the persons lacking self-care ability include at least one of persons with physical disabilities, persons with intellectual disabilities, and young children;
  • performing target detection based on the acquired image to obtain target box information includes:
  • performing attribute recognition on the target in the target box includes classifying the target images in the target box with a deep-learning classification method, identifying target persons who match the set attributes and non-target persons who do not;
  • performing pose estimation on the target in the target box further includes:
  • tracking the target box to obtain a tracked target box, and performing pose estimation based on the target in the tracked target box.
  • tracking the target box to obtain the tracked target box includes performing tracking matching based on whether the ratio of the area of the intersection of the boxes in the current frame and the previous frame to the area of their union is greater than a set threshold,
  • performing pose estimation based on the target in the tracked target box includes detecting the human-shaped target in the tracked target box with a human pose recognition method to obtain target skeleton data.
  • acquiring the current image includes acquiring K frames of images in a time series, taking any one of the K frames as the current image, and executing the step of performing target detection based on the acquired image to obtain target box information,
  • recognizing the fall behavior according to the pose estimation includes:
  • taking the target skeleton data obtained from each of the K frames as the input of a graph convolution classification method, and recognizing the fall behavior through the graph convolution classification method.
  • K is a natural number greater than 1.
  • taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method includes: for any of the K image frames, storing the skeleton data obtained by pose estimation on the tracked target box in an input buffer queue used for graph convolution classification, the input buffer queue storing K frames of skeleton data; if the tracking count N of the tracked target box exceeds K, overwriting the original data from the start position of the queue.
  • the input buffer queue is indexed by the remainder of the tracking count N divided by K, each index corresponding to one frame of skeleton data.
  • taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method further includes:
  • taking the remainder of the tracking count N of the tracked target divided by K as the first index, reading the skeleton data in the input buffer queue starting from the first index, and, upon reaching the end of the queue, continuing from the start position of the input buffer queue up to a second index that is 1 less than the first index, as the input of the graph convolution classification network.
  • the first threshold is a natural number greater than 1.
  • performing pose estimation based on the target in the tracked target box further includes: performing attribute recognition on the target in the tracked target box, setting an identifier for the tracked target box, recording the attribute recognition result in the identifier, performing pose estimation for tracked target boxes that match the attributes, and rejecting pose estimation for tracked target boxes that do not.
  • this application also provides a fall behavior detection device.
  • the memory stores instructions executable by the processor, and the instructions are executed by the processor to cause the processor to perform the steps of any of the above fall behavior detection methods.
  • this application provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of any of the above fall behavior detection methods are implemented.
  • this application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of any of the above fall behavior detection methods.
  • in the fall behavior detection method provided by this application, attribute recognition is performed on the target images obtained in target detection; fall behavior detection is performed for targets that match the set attributes and refused for targets that do not.
  • this method filters out targets that do not match the set attributes, realizes behavior detection for specific persons, improves detection efficiency, and addresses the care and monitoring of specific persons in practical applications.
  • deep-learning algorithms are used for target detection, pose estimation, and action classification, improving the accuracy and robustness of detection.
  • Fig. 1 is a schematic flowchart of the fall behavior detection method of Embodiment 1 of this application.
  • Fig. 2 is a schematic flowchart of fall behavior detection for a target person in Embodiment 2.
  • Fig. 3 is a schematic diagram of a framework for fall behavior detection for a target person in Embodiment 3.
  • Fig. 4 is a schematic flowchart of fall behavior detection for a target person in Embodiment 3.
  • Fig. 5 is a schematic diagram of reading and writing skeleton data to the input buffer queue.
  • Fig. 6 is a schematic diagram of the structure of the spatio-temporal graph convolution classification deep-learning algorithm in the graph convolution classification model.
  • in the fall behavior detection method, image data is acquired, attribute recognition is performed on the targets included in the image data, fall behavior detection is performed for targets that match the set attributes, and refused for targets that do not; that is, no further operations are performed for targets that do not match the attributes.
  • the fall behavior detection method provided in the embodiments of this application can be applied to any electronic device that needs to perform fall behavior detection;
  • for example, it can be a monitoring device, an image processing device, a server, etc., which is not specifically limited here.
  • Fig. 1 is a schematic flowchart of the fall behavior detection method of Embodiment 1 of this application.
  • The detection method includes:
  • Step 101: obtain a two-dimensional image, which may be an RGB image.
  • Step 102: perform target detection based on the acquired image and obtain target box information.
  • Existing target detection algorithms can be used, including traditional target detection algorithms and deep-learning target detection algorithms; that is, the electronic device can use an existing target detection algorithm to perform target detection on the acquired image.
  • The target detection algorithm may be a traditional algorithm or a deep-learning algorithm.
  • The target box information is information identifying the target box of the area where each target in the image is located.
  • The target box can be a rectangular box;
  • the target box information can then be the coordinates of its four vertices, the coordinates of two diagonal vertices, or the coordinates of the top-left vertex together with the width and height of the box, and so on.
  • Step 103: perform attribute recognition on the target in the target box and determine whether the recognized attributes match the set attributes; if so, go to step 104; otherwise, return to step 101.
  • The electronic device can recognize the area identified by the target box information and obtain the attributes of the target corresponding to that box.
  • Image features may be extracted from the area identified by the target box information, and the attributes of the corresponding target determined from the extracted image features.
  • The set attributes are the attributes of the target persons to be detected, including the elderly attribute and/or the lacking-self-care attribute; the lacking-self-care attribute includes one of, or any combination of, the physically-disabled attribute, the intellectually-disabled attribute, and the young-child attribute.
  • Attribute recognition can be realized with a classification neural network algorithm.
  • The classification neural network can be trained on image samples in advance.
  • The targets included in the image samples can be targets lacking self-care ability, such as the elderly, persons with physical disabilities, persons with intellectual disabilities, and young children.
  • Labels for the image samples can be assigned according to the targets they include. For example, if the target is an elderly person, the label can be 1, indicating that the attribute of the target in the sample is the elderly attribute; if the target is a person with physical or intellectual disabilities, a young child, etc.,
  • the label can be 0, indicating that the attribute of the target in the sample is the lacking-self-care attribute.
  • The image samples can be input into the classification neural network.
  • The classification neural network can output predicted labels according to the image features of the samples, and then, based on the differences between the predicted and assigned labels,
  • the parameters of the classification neural network are continuously adjusted until the network converges, yielding a classification neural network that can accurately determine the attribute category of targets in images.
  • Step 104: perform pose estimation on the target in the target box and recognize from the pose estimation whether a fall has occurred; if so, further trigger an alarm; otherwise, return to step 101.
  • The pose estimation can be obtained from analysis of the trajectories of the skeleton points.
  • Attributes of the target in the target box matching the set attributes indicate that the target is a specific person who needs care, such as an elderly person, a person with disabilities, or a young child; the electronic device can therefore estimate the pose of the target in the box to determine whether that specific person has fallen.
  • In normal situations, the trajectories of the body skeleton points form a straight line or a fairly symmetric, regular curve, but in a fall the body skeleton points often form an irregular curve, so trajectory analysis of the skeleton points can determine whether the target has fallen.
  • The set target persons are locked onto through attribute recognition, so that only falls of the set specific persons are detected and no fall detection is performed for non-specified persons; this saves resources and improves the intelligence of fall behavior detection.
  • This embodiment further optimizes the fall behavior detection.
  • Fig. 2 is a schematic flowchart of fall behavior detection for a target person in Embodiment 2.
  • The detection method includes:
  • Step 201: obtain a two-dimensional image of the current environment; the image may be a red-green-blue (RGB) image.
  • Step 202: use a deep-learning target detection algorithm to perform target detection and obtain at least one piece of target box information, in order to effectively identify the target persons in the image.
  • The deep-learning target detection algorithm can be the YOLO (you only look once) algorithm, the Faster R-CNN algorithm, or the SSD algorithm.
  • Such an algorithm can directly output, from the input image, the category of each target and its position in the image; in this embodiment, the YOLO algorithm is applied to output each person included in the input image and their positions.
  • The target persons are the targets included in the image,
  • and the position information is the target box information.
  • Step 203: perform attribute recognition on each target image in the target boxes through the classification deep-learning model; when the target in a box is recognized as an elderly person, set an identification ID for the box.
  • The classification deep-learning model adopts the MobileNet v1 base network, with the fully connected layer trimmed and all channels reduced to one half of the original; the output part adopts a convolutional layer with 2 output channels connected to a softmax layer. The image input size of the attribute network model is 64*64, where the attribute network model is the above classification deep-learning model.
  • The identification ID is set for the target box to facilitate fall detection for different elderly persons; for example, when there are multiple elderly persons in a home environment, each detected elderly target is given its own identification ID.
  • Step 204: for the same target box identifier, perform tracking matching across consecutive frames to obtain the tracked target box.
  • The intersection-over-union of consecutive frames can be used directly for tracking matching.
  • The same target box means the target boxes with the same identification ID.
  • Step 205: perform pose estimation on the target in the tracked target box.
  • The pose estimation can use a human pose recognition algorithm, such as the openpose algorithm or the ALPHAPOSE algorithm. Take the ALPHAPOSE algorithm as an example.
  • This algorithm is a top-down pose estimation algorithm. Based on the human-shaped target box, it detects skeleton data comprising 18 skeletal keypoints, each with three values: the coordinates x and y and the confidence.
  • The skeleton data can be obtained using end-to-end models such as HRNet and AlphaPose.
  • The training samples are generally human RGB images,
  • and the label data can be JSON files in the standard format of the COCO dataset,
  • specifically the information of the 18 human skeletal keypoints.
  • Training can use the gradient descent algorithm, the stochastic gradient descent algorithm, etc., which are not specifically limited or described here, as long as the trained model can process an input human RGB image and output accurate skeleton data.
  • Step 206: based on the detected skeleton data, perform fall action recognition on the pose estimation, thereby obtaining the fall detection result for the identified target box.
  • In a fall, the coordinates of the skeletal keypoints change in ways that differ from the normal state, and a fall can be recognized on this basis. That is, after the electronic device obtains the detected skeleton data, it can determine from the coordinates x, y and the corresponding confidence whether the coordinates are accurate; when the confidence is greater than a preset value, the coordinates x, y can be considered accurate. The electronic device can then determine whether the target has fallen from the difference between the coordinates x, y and the keypoint coordinates in the normal state.
  • This embodiment uses a classification deep-learning algorithm for attribute recognition and directly filters the target detection results, effectively removing unnecessary target tracking and behavior recognition; it focuses only on the behavior of target persons with the set attributes, which improves detection efficiency, and tracking the target persons helps improve the accuracy of fall detection, effectively avoiding missed and false detections.
  • Fig. 3 is a schematic diagram of a framework for fall behavior detection for a target person in Embodiment 3.
  • Target detection, tracking, attribute recognition, and pose estimation are performed on each frame, and the pose estimations of all the frames are recognized through Spatial Temporal Graph Convolutional Networks (STGCN) classification.
  • The electronic device can perform target detection, tracking, attribute recognition, and pose estimation on each frame of image.
  • The recognition over the pose estimations of the frames can be realized through the graph convolution classification network model.
  • Fig. 4 is a schematic flowchart of fall behavior detection for a target person in Embodiment 3.
  • The part inside the dashed box is the processing flow for one frame,
  • and outside the dashed box is the processing across frames.
  • Step 401: perform target detection using a deep-learning target detection algorithm to obtain at least one piece of target box information, so as to effectively identify the position information of all targets in the image. This step can be the same as step 202.
  • Step 402: based on the target position information of frame T, perform tracking matching with frame T-1.
  • The tracking matching can use the intersection-over-union of consecutive frames; when the intersection-over-union is greater than the set threshold, the match succeeds, producing the tracked target box of frame T.
  • Step 403: based on the tracked target box of frame T, perform attribute recognition on the target in the tracked target box through the classification neural network, set an ID for the tracked target box, and record the attribute recognition result of the box.
  • The classification neural network can be trained with images of the cared-for persons; the trained classification neural network model is then called to recognize the target in the tracked target box.
  • An ID is set for the tracked target box, and the attribute recognition result of the box is recorded.
  • Step 404: determine whether the target in the tracked target box of frame T is recognized as having the set attributes, for example, whether it is an elderly person; if so, go to step 405; otherwise, return to step 403.
  • Step 405: perform pose estimation with the ALPHAPOSE algorithm on the tracked target box of the frame whose attribute is the set attribute (for example, elderly), and output the target skeleton data, comprising 18 skeletal keypoints with three values each, namely the coordinates x and y and the confidence; store the skeleton data in the input buffer queue used for graph convolution classification.
  • The queue can store K frames of skeleton data with a size of K*18*3. If the tracking count N of the tracked target box's identification ID exceeds K, the original data is overwritten starting from the start position of the input buffer queue. N is a natural number greater than or equal to 1, and the start position of the input buffer queue generally begins at 0.
  • The electronic device may perform step 405, that is, perform pose estimation on the tracked target box of the frame whose attribute is the set attribute, output the target skeleton data, and store the skeleton data in the input buffer queue used for graph convolution classification.
  • An input buffer queue for storing skeleton data can be established in advance.
  • The size of the input buffer queue can be K*18*3, so that K pieces of skeleton data, that is, the outputs of K pose estimations, can be stored.
  • The specific value of K can be determined from practical factors such as the accuracy of pose estimation and the speed of fall detection, and is not specifically limited here; for example, it can be 10, 15, or 18.
  • Step 406: determine whether the input buffer queue has reached the minimum K frames required for graph convolution classification, and whether the remainder of N-K divided by 10 is 0.
  • For example, if the skeleton data obtained every 10 tracking steps is taken as input, determine whether the remainder of the difference N-K divided by 10 is 0; if so, the tracking count of the tracked target box meets the set requirement; otherwise, the tracking count needs to increase.
  • The first threshold is a natural number greater than 1; therefore, if these two conditions are met, go to step 407; otherwise, go to step 408.
  • That is, the electronic device can perform step 406: determine whether the input buffer queue has reached the minimum K frames required for graph convolution classification, and whether the remainder of N-K divided by 10 is 0. If N exceeds K, since the first N-K pieces of data in the input buffer queue have been overwritten by new data, it can be determined whether the difference between N and K is an integer multiple of the set first threshold, so as to remove redundant skeleton data. For example, taking the skeleton data obtained every 10 tracking steps as the input of the graph convolution classification model, it can be determined whether the remainder of the difference N-K divided by 10 is 0; if so, this exactly matches the recognition frequency of the algorithm and the tracking count meets the set requirement; otherwise, the tracking count needs to increase.
  • Step 407: according to the tracking count N of the tracked target box ID, use the remainder N%K as the index; read the values in the input buffer queue starting from that index, and upon reaching the end of the queue continue from 0 up to index-1.
  • The result is used as the input of the graph convolution classification network and recognized with the graph convolution classification model.
  • The input size of the graph convolution classification model is K*18*3.
  • The remainder N%K is the first index.
  • Skeleton data values are read from the input buffer queue starting from the first index; upon reaching the end of the queue, reading continues from the start position of the queue up to the second index, index-1, and the result is used as the input of the graph convolution classification network and recognized with the graph convolution classification model. If the output is a fall behavior, an alarm is issued, thereby completing one round of fall behavior detection.
  • That is, the electronic device can start reading skeleton data from the entry at the first index in the input buffer queue; upon reaching the end of the queue, it continues from start position 0 of the queue up to the entry at the second index, index-1, and then uses all the skeleton data read out as the input of the graph convolution classification network.
  • The queue can store K frames of skeleton data, each frame of skeleton data corresponding to one index.
  • Fig. 6 is a schematic diagram of the structure of the spatio-temporal graph convolution classification (STGCN) deep-learning algorithm in the graph convolution classification model.
  • The electronic device can arrange the skeleton data taken out of the input buffer queue in the order of the acquisition times of the corresponding images to obtain a skeleton data sequence, which is used as the input skeleton sequence of the graph convolution classification network.
  • A topological graph structure is constructed from the input skeleton sequence; the specific process of constructing the topological graph can follow the STGCN algorithm and is not specifically limited or described here.
  • ST-GCN can determine the action classification from the input topological graph structure and then output the action classification.
  • The action classes can specifically include fall actions and non-fall actions.
  • Step 408: wait until the input buffer queue reaches the minimum K frames required for graph convolution classification and the difference between the current tracking count N of the tracked target box ID and K is an integer multiple of the first threshold (10); return to step 406.
  • That is, if the electronic device determines that the input buffer queue has not reached the minimum K frames required for graph convolution classification, or that the remainder of N-K divided by 10 is not 0, it can continue to obtain skeleton data, store it in the buffer queue, and return to step 406.
  • By classifying, with the graph convolution classification algorithm, the skeleton data of targets in tracked target boxes that match the set attributes across the K simultaneously input frames tracked N times, the accuracy and efficiency of fall behavior recognition are improved and the false detection rate is effectively reduced; in particular, for the monitoring of elderly people living alone, the monitoring effect and the user experience are improved.
  • The graph convolution classification model mentioned herein is the graph convolution classification network, and the skeleton data information is the skeleton data.
  • the present application also provides a detection device for fall behavior.
  • the detection device includes a memory and a processor.
  • the memory stores instructions executable by the processor.
  • the instructions are executed by the processor to enable the processor Perform the steps of the fall behavior detection method as described in any one of the embodiments.
  • the memory may include random access memory (Random Access Memory, RAM), and may also include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage.
  • NVM non-Volatile Memory
  • the memory may also be at least one storage device located far away from the foregoing processor.
  • the embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the steps of the fall behavior detection method in the above-mentioned embodiment are implemented.
  • the embodiments of the present application also provide a computer program product containing instructions, which when the computer program product runs on a computer, cause the computer to execute the steps of the fall behavior detection method in the above-mentioned embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a fall behavior detection method and device. The method includes: acquiring a current image; performing target detection based on the acquired image to obtain target box information; performing attribute recognition on the target in the target box; when the attributes of the target match set attributes, performing pose estimation on the target in the target box; and recognizing fall behavior according to the pose estimation. The set attributes identify the target persons who need to be detected. This application realizes behavior detection for specific persons, improves detection efficiency, and addresses the care and monitoring of specific persons in practical applications.

Description

Fall behavior detection method and device
This application claims priority to the Chinese patent application No. 202010390108.1, filed with the China Patent Office on May 11, 2020 and entitled "Fall behavior detection method and device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of target behavior recognition, and in particular to a fall behavior detection method and device.
Background
With the development of technology, intelligent care for specific persons has become a hot topic. For example, for elderly people living alone and for persons with physical or intellectual disabilities, it is very practical to be able to monitor in real time whether a fall has occurred.
Fall detection methods in the prior art usually recognize targets in the entire scene indiscriminately; that is, they detect only the fall behavior itself and cannot distinguish whether it belongs to a specific person, so the degree of intelligence is not high enough. Methods that can monitor a specific person's falls in real time require the monitored person to wear wearable devices to collect data, which is costly; most wearable devices are unfriendly to the monitored person and inconvenient to use. Here, the monitored person is the specific person who needs to be cared for.
Summary
This application provides a fall behavior detection method to monitor the fall behavior of target persons having certain attributes.
The fall behavior detection method provided by this application is implemented as follows:
acquiring a current image,
performing target detection based on the acquired image to obtain target box information,
performing attribute recognition on the target in the target box,
when the attributes of the target match set attributes, performing pose estimation on the target in the target box,
recognizing fall behavior according to the pose estimation;
wherein the set attributes identify the target persons who need to be detected.
Preferably, the target persons to be detected include the elderly and/or persons lacking self-care ability;
the method further includes: when the attributes of the target do not match the set attributes, skipping pose estimation for the target in the target box;
when a fall behavior is recognized according to the pose estimation, triggering an alarm.
Preferably, the persons lacking self-care ability include at least one of persons with physical disabilities, persons with intellectual disabilities, and young children;
performing target detection based on the acquired image to obtain target box information includes:
performing target detection with a deep-learning target detection method to obtain at least one piece of target box information;
performing attribute recognition on the target in the target box includes classifying the target images in the target box with a deep-learning classification method, identifying target persons who match the set attributes and non-target persons who do not;
performing pose estimation on the target in the target box further includes:
tracking the target box to obtain a tracked target box, and performing pose estimation based on the target in the tracked target box.
Preferably, tracking the target box to obtain the tracked target box includes performing tracking matching based on whether the ratio of the area of the intersection of the boxes in the current frame and the previous frame to the area of their union is greater than a set threshold;
performing pose estimation based on the target in the tracked target box includes detecting the human-shaped target in the tracked target box with a human pose recognition method to obtain target skeleton data.
Preferably, acquiring the current image includes acquiring K frames of images in a time series, taking any one of the K frames as the current image, and executing the step of performing target detection based on the acquired image to obtain target box information;
recognizing fall behavior according to the pose estimation includes:
taking the target skeleton data obtained from each of the K frames as the input of a graph convolution classification method, and recognizing fall behavior through the graph convolution classification method,
wherein K is a natural number greater than 1.
Preferably, taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method includes:
for any of the K image frames, storing the skeleton data obtained by pose estimation on the tracked target box in an input buffer queue used for graph convolution classification, the input buffer queue being used to store K frames of skeleton data;
if the tracking count N of the tracked target box exceeds K, overwriting the original data starting from the start position of the input buffer queue, where N is a natural number greater than or equal to 1.
Preferably, the input buffer queue is indexed by the remainder of the tracking count N divided by K, with each index corresponding to one frame of skeleton data;
taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method further includes:
determining whether the input buffer queue has reached the minimum required for graph convolution classification, and, if the tracking count N of the tracked target box exceeds K, determining whether the difference between the current tracking count N and K of the tracked target box is an integer multiple of a set first threshold;
if so, taking the remainder of the tracking count N of the tracked target divided by K as a first index, reading the skeleton data in the input buffer queue starting from the first index, and, upon reaching the end of the queue, continuing from the start position of the input buffer queue up to a second index that is 1 less than the first index, as the input of the graph convolution classification network;
otherwise, waiting until the input buffer queue reaches the minimum required for graph convolution classification and the difference between the current tracking count N and K of the tracked target box is an integer multiple of the first threshold;
the first threshold is a natural number greater than 1.
Preferably, performing pose estimation based on the target in the tracked target box further includes: performing attribute recognition on the target in the tracked target box, setting an identifier for the tracked target box, recording the attribute recognition result in the identifier, performing pose estimation for tracked target boxes that match the attributes, and rejecting pose estimation for tracked target boxes that do not.
This application also provides a fall behavior detection device; the memory stores instructions executable by the processor, and the instructions are executed by the processor to cause the processor to perform the steps of any of the above fall behavior detection methods.
This application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above fall behavior detection methods.
This application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of any of the above fall behavior detection methods.
In the fall behavior detection method provided by this application, attribute recognition is performed on the target images obtained in target detection; fall behavior detection is performed for targets that match the set attributes and refused for targets that do not. This method filters out targets that do not match the set attributes, realizes behavior detection for specific persons, improves detection efficiency, and addresses the care and monitoring of specific persons in practical applications. In addition, deep-learning algorithms are used for target detection, pose estimation, and action classification, improving the accuracy and robustness of detection.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings required for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art could obtain other drawings from these drawings.
Fig. 1 is a schematic flowchart of the fall behavior detection method of Embodiment 1 of this application.
Fig. 2 is a schematic flowchart of fall behavior detection for a target person in Embodiment 2.
Fig. 3 is a schematic diagram of a framework for fall behavior detection for a target person in Embodiment 3.
Fig. 4 is a schematic flowchart of fall behavior detection for a target person in Embodiment 3.
Fig. 5 is a schematic diagram of reading and writing skeleton data to the input buffer queue.
Fig. 6 is a schematic diagram of the structure of the spatio-temporal graph convolution classification deep-learning algorithm in the graph convolution classification model.
Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.
In the fall behavior detection method provided by the embodiments of this application, image data is acquired, attribute recognition is performed on the targets included in the image data, fall behavior detection is performed for targets that match the set attributes, and refused for targets that do not; that is, no further operations are performed for targets that do not match the attributes.
The fall behavior detection method provided by the embodiments of this application can be applied to any electronic device that needs to perform fall behavior detection, for example, a monitoring device, an image processing device, or a server, which is not specifically limited here.
Embodiment 1
Referring to Fig. 1, Fig. 1 is a schematic flowchart of the fall behavior detection method of Embodiment 1 of this application. The detection method includes:
Step 101: acquire a two-dimensional image; the image may be an RGB image.
Step 102: perform target detection based on the acquired image to obtain target box information.
In this step, existing target detection algorithms can be used, including traditional target detection algorithms and deep-learning target detection algorithms; that is, the electronic device can use an existing target detection algorithm, traditional or deep-learning based, to perform target detection on the acquired image.
The target box information is information identifying the target box of the area where each target in the image is located. For example, if the target box is a rectangular box, the target box information may be the coordinates of its four vertices, the coordinates of two diagonal vertices, or the coordinates of the top-left vertex together with the width and height of the box, and so on.
Step 103: perform attribute recognition on the target in the target box and determine whether the recognized attributes match the set attributes; if so, go to step 104; otherwise, return to step 101.
That is, the electronic device can recognize the area identified by the target box information and obtain the attributes of the target corresponding to that target box information. In one implementation, image features can be extracted from the area identified by the target box information, and the attributes of the corresponding target determined from the extracted image features.
The set attributes are the attributes of the target persons to be detected, including the elderly attribute and/or the lacking-self-care attribute; the lacking-self-care attribute includes one of, or any combination of, the physically-disabled attribute, the intellectually-disabled attribute, and the young-child attribute.
Attribute recognition can be realized with a classification neural network algorithm. In one implementation, the classification neural network can be trained in advance on image samples. The targets included in the image samples can be the elderly, persons with physical disabilities, persons with intellectual disabilities, young children, and other targets lacking self-care ability. Labels can then be assigned to the image samples according to the targets they include: for example, if the target is an elderly person, the label can be 1, indicating that the attribute of the target in the sample is the elderly attribute; if the target is a person with physical or intellectual disabilities, a young child, or another target lacking self-care ability, the label can be 0, indicating the lacking-self-care attribute.
After the label of each image sample is determined, the samples can be input into the classification neural network, which outputs predicted labels according to the image features of the samples; based on the differences between the predicted and assigned labels, the parameters of the classification neural network are continuously adjusted until the network converges, yielding a classification neural network that can accurately determine the attribute category of targets in images.
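As an illustration of the training procedure just described, the following Python sketch trains a binary attribute classifier with labels 1 (elderly attribute) and 0 (lacking-self-care attribute); the optimizer choice, learning rate, and epoch count are illustrative assumptions, since the embodiment only requires that the parameters be adjusted until the network converges.

import torch
import torch.nn as nn

def train_classifier(model, loader, epochs=20, lr=1e-3):
    # loader yields (images, labels): (B, 3, H, W) float tensors and (B,) long labels
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images)          # predicted labels
            loss = loss_fn(logits, labels)  # difference between predicted and assigned labels
            opt.zero_grad()
            loss.backward()                 # adjust the network parameters
            opt.step()
    return model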
Step 104: perform pose estimation on the target in the target box, recognize from the pose estimation whether a fall has occurred, and if so, further trigger an alarm; otherwise, return to step 101.
The pose estimation can be obtained from analysis of skeleton-point trajectories. Attributes of the target in the target box matching the set attributes indicate that the target is a specific person who needs care, such as an elderly person, a person with disabilities, or a young child; the electronic device can therefore perform pose estimation on the target in the box to determine whether that specific person has fallen.
In normal situations such as standing, sitting, or lying, the trajectories of the body skeleton points form a straight line or a fairly symmetric, regular curve; in a fall, however, the body skeleton points often form an irregular curve, so whether the target has fallen can be determined from skeleton-point trajectory analysis.
In this embodiment, the set target persons are locked onto through attribute recognition, so that only falls of the set specific persons are detected and no fall detection is performed for non-specified persons; this saves resources and improves the intelligence of fall behavior detection.
Embodiment 2
To effectively improve recognition efficiency and accuracy, this embodiment further optimizes the fall behavior detection.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of fall behavior detection for a target person in Embodiment 2. The detection method includes:
Step 201: acquire a two-dimensional image of the current environment; the image may be a red-green-blue (RGB) image.
Step 202: perform target detection with a deep-learning target detection algorithm to obtain at least one piece of target box information, so as to effectively identify the target persons in the image.
The deep-learning target detection algorithm can be the YOLO (you only look once) algorithm, the Faster R-CNN algorithm, or the SSD algorithm. Taking YOLO as an example, the algorithm can directly output, from the input image, the category of each target and its position in the image; in this embodiment, the YOLO algorithm is applied to output each person included in the input image and their positions. The target persons are the targets included in the image, and the position information is the target box information.
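For concreteness, the sketch below filters a YOLO-style detector's output down to person boxes; the torch.hub YOLOv5 model used here is an assumption for illustration, since the embodiment only names the YOLO family of detectors.

import torch

# a convenient pretrained YOLO-family detector; any YOLO/Faster R-CNN/SSD model would do
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_persons(image):
    # returns [x1, y1, x2, y2, confidence] for each detected person
    results = model(image)            # inference on an RGB image (path, array, or PIL)
    boxes = results.xyxy[0]           # (N, 6) tensor: x1, y1, x2, y2, conf, class
    person_cls = 0                    # COCO class 0 is 'person'
    return [b[:5].tolist() for b in boxes if int(b[5]) == person_cls]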
Step 203: perform attribute recognition on each target image in the target boxes with a classification deep-learning model; when the target in a box is recognized as an elderly person, set an identification ID for the box.
Taking recognition of the elderly attribute as an example, the classification deep-learning model classifies the image in each target box separately, identifying elderly and non-elderly persons. Given that the classification task is simple, the base network of the classification deep-learning model adopts the MobileNet v1 base network, with the fully connected layer trimmed and all channels reduced to one half of the original; the output part adopts a convolutional layer with 2 output channels connected to a softmax layer. The image input size of the attribute network model is 64*64, where the attribute network model is the above classification deep-learning model.
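A hedged PyTorch sketch of such an attribute network follows: MobileNet-v1-style depthwise-separable blocks at half width, a 2-channel convolutional head, and a softmax output on 64*64 inputs. The exact number and placement of blocks are not given in the patent, so the layer stack below is an illustrative guess.

import torch
import torch.nn as nn

def dw_block(cin, cout, stride=1):
    # depthwise + pointwise convolution, the MobileNet v1 building block
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class AttributeNet(nn.Module):
    def __init__(self):
        super().__init__()
        half = lambda n: n // 2                 # all channels halved, as stated above
        self.features = nn.Sequential(
            nn.Conv2d(3, half(32), 3, 2, 1, bias=False),
            nn.BatchNorm2d(half(32)), nn.ReLU(inplace=True),
            dw_block(half(32), half(64)),
            dw_block(half(64), half(128), 2),
            dw_block(half(128), half(256), 2),
            dw_block(half(256), half(512), 2),
        )
        self.head = nn.Conv2d(half(512), 2, 1)  # convolutional head with 2 output channels

    def forward(self, x):                        # x: (B, 3, 64, 64)
        x = self.features(x)
        x = self.head(x).mean(dim=(2, 3))        # global average over spatial positions
        return torch.softmax(x, dim=1)           # [P(elderly), P(non-elderly)]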
Further, when the attribute of a target box is recognized as the elderly attribute, an identification ID is set for the box to facilitate fall detection for different elderly persons; for example, when there are multiple elderly persons in a home environment, each detected elderly target is given its own identification ID.
Step 204: for the same target box identifier, perform tracking matching across consecutive frames to obtain the tracked target box.
To improve detection accuracy in this step, and given that the persons present in a home environment are relatively few, the intersection-over-union of consecutive frames can be used directly for tracking matching: compute the ratio of the area where the boxes of the previous and following frames intersect to the area of their union; if this intersection-over-union is greater than the set threshold, the match succeeds, and the same target box across the two frames is set as the tracked target box. The same target box means the target boxes with the same identification ID.
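The matching rule can be made concrete with the short sketch below; the 0.5 threshold is an illustrative assumption, since the embodiment only specifies "a set threshold".

def iou(box_a, box_b):
    # boxes as (x1, y1, x2, y2); returns intersection area / union area
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def same_track(prev_box, curr_box, threshold=0.5):
    # match succeeds when the consecutive-frame IoU exceeds the set threshold
    return iou(prev_box, curr_box) > threshold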
Step 205: perform pose estimation on the target in the tracked target box.
In this step, the pose estimation can use a human pose recognition algorithm, such as the openpose algorithm or the ALPHAPOSE algorithm. Taking ALPHAPOSE as an example, it is a top-down pose estimation algorithm: based on the human-shaped target box, it detects skeleton data comprising 18 skeletal keypoints, each with three values, namely the coordinates x and y and the confidence.
In one implementation, the skeleton data can be obtained with end-to-end models such as HRNet and AlphaPose. The training samples are generally human RGB images, and the label data can be JSON files in the standard format of the COCO dataset, specifically the information of the 18 human skeletal keypoints. Training can use the gradient descent algorithm, the stochastic gradient descent algorithm, etc., which are not specifically limited or described here, as long as the trained model can process an input human RGB image and output accurate skeleton data.
Step 206: based on the detected skeleton data, perform fall action recognition on the pose estimation, thereby obtaining the fall detection result for the identified target box.
In a fall, the coordinate positions of the skeletal keypoints change in ways that differ from the normal state, and a fall can be recognized on this basis. That is, after the electronic device obtains the detected skeleton data, it can use the coordinates x, y and the corresponding confidence to determine whether those coordinates are accurate: when the confidence is greater than a preset value, the coordinates x, y can be considered accurate. The electronic device can then determine whether the target has performed a fall action from the difference between the coordinates x, y and the keypoint coordinates in the normal state.
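A minimal sketch of this check follows: keypoints are trusted only above a confidence threshold, and a deviation score against a normal-state pose is computed. The threshold value and the mean-displacement metric are assumptions; the embodiment specifies only that trusted coordinates are compared against the normal state.

import numpy as np

def fall_score(skeleton, normal_pose, conf_thresh=0.3):
    # skeleton, normal_pose: (18, 3) arrays of (x, y, confidence) per keypoint
    trusted = skeleton[:, 2] > conf_thresh             # keep accurate keypoints only
    if not trusted.any():
        return 0.0
    diff = skeleton[trusted, :2] - normal_pose[trusted, :2]
    return float(np.linalg.norm(diff, axis=1).mean())  # mean deviation from the normal pose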
This embodiment uses a classification deep-learning algorithm for attribute recognition and directly filters the target detection results, effectively removing unnecessary target tracking and behavior recognition. It focuses only on the behavior of target persons with the set attributes, which improves detection efficiency; tracking the target persons helps improve the accuracy of fall detection, effectively avoiding missed and false detections.
Embodiment 3
Referring to Fig. 3, Fig. 3 is a schematic diagram of a framework for fall behavior detection for a target person in Embodiment 3. For K frames of a time series, target detection, tracking, attribute recognition, and pose estimation are performed on each frame, and the pose estimations of all the frames are recognized through Spatial Temporal Graph Convolutional Networks (STGCN) classification, where K is a natural number greater than or equal to 1.
That is, for the K frames of images collected within a time series T, the electronic device can perform target detection, tracking, attribute recognition, and pose estimation on each frame, and the recognition over the pose estimations of the frames can be realized through the graph convolution classification network model.
Referring to Fig. 4, Fig. 4 is a schematic flowchart of fall behavior detection for a target person in Embodiment 3. The part inside the dashed box is the processing flow for one frame; outside the dashed box is the processing across frames. After K two-dimensional image frames are acquired, any T-th frame of image data is processed as follows:
Step 401: perform target detection with a deep-learning target detection algorithm to obtain at least one piece of target box information, so as to effectively identify the position information of all targets in the image. This step can be the same as step 202.
Step 402: based on the target position information of frame T, perform tracking matching with frame T-1; the matching can use the intersection-over-union of consecutive frames, and when the intersection-over-union is greater than the set threshold the match succeeds, producing the tracked target box of frame T.
Step 403: based on the tracked target box of frame T, perform attribute recognition on the target in the tracked target box with the classification neural network, set an ID for the tracked target box, and record the attribute recognition result of the box.
For example, elderly and non-elderly persons are recognized, or persons who need care are recognized; in the latter case, the classification neural network can be trained with images of the cared-for persons, and the trained classification neural network model is then called to recognize the target in the tracked target box.
Further, an ID is set for the tracked target box, and the attribute recognition result of the box is recorded.
Step 404: determine whether the target in the tracked target box of frame T is recognized as having the set attributes, for example, whether it is an elderly person; if so, go to step 405; otherwise, return to step 403.
Since the specific ways of determining the tracked target box, of performing attribute recognition, and of judging whether the attributes of the target in the tracked target box are the set attributes in steps 402-404 have all been introduced in the above embodiments, they are not repeated here.
Step 405: perform pose estimation with the ALPHAPOSE algorithm on the tracked target box of the frame whose attribute is the set attribute (for example, elderly), and output the target skeleton data, comprising 18 skeletal keypoints with three values each, namely the coordinates x and y and the confidence; store the skeleton data in the input buffer queue used for graph convolution classification. The queue can store K frames of skeleton data and has size K*18*3; if the tracking count N of the tracked target box ID exceeds K, the original data is overwritten starting from the start position of the input buffer queue. N is a natural number greater than or equal to 1, and the start position of the input buffer queue generally begins at 0.
That is, if the attributes of the target in the tracked target box of frame T are the set attributes, the electronic device can execute step 405: perform pose estimation on the tracked target box of that frame, output the target skeleton data, and store the skeleton data in the input buffer queue used for graph convolution classification.
Specifically, an input buffer queue for storing skeleton data can be established in advance; its size can be K*18*3, so that K pieces of skeleton data, that is, the outputs of K pose estimations, can be stored. The specific value of K can be determined from practical factors such as the accuracy of pose estimation and the speed of fall detection, and is not specifically limited here; for example, it can be 10, 15, or 18.
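The circular buffer just described can be sketched as follows: K slots of (18, 3) skeleton data, written at index N % K so that once the tracking count N exceeds K the oldest frames are overwritten from the start of the queue. The class name and the default K are illustrative.

import numpy as np

class SkeletonBuffer:
    def __init__(self, k=15):                    # K can be 10, 15, 18, ...
        self.k = k
        self.buf = np.zeros((k, 18, 3), dtype=np.float32)  # the K*18*3 queue
        self.n = 0                               # tracking count N

    def push(self, skeleton):
        # skeleton: (18, 3) array of (x, y, confidence) for one tracked frame
        self.buf[self.n % self.k] = skeleton     # write index is the remainder of N / K
        self.n += 1

    def full(self):
        return self.n >= self.k                  # the minimum K frames reached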
Step 406: determine whether the input buffer queue has reached the minimum K frames required for graph convolution classification, and whether the remainder of N-K divided by 10 is 0.
If N exceeds K, since position N-K in the input buffer queue has been overwritten by new data, determine whether the difference between the current tracking count N of the tracked target box ID and K is an integer multiple of the set first threshold, so as to remove redundant skeleton data. For example, if the skeleton data obtained every 10 tracking steps is taken as input, determine whether the remainder of the difference N-K divided by 10 is 0; if so, the tracking count of this tracked target box meets the set requirement; otherwise, the tracking count needs to increase. The first threshold is a natural number greater than 1; therefore, if both conditions are met, go to step 407; otherwise, go to step 408.
That is, after storing the skeleton data in the input buffer queue, the electronic device can execute step 406, determining whether the input buffer queue has reached the minimum K frames required for graph convolution classification and whether the remainder of N-K divided by 10 is 0. If N exceeds K, since the first N-K pieces of data in the queue have been overwritten by new data, it can be determined whether the difference between N and K is an integer multiple of the set first threshold, so as to remove redundant skeleton data. For example, taking the skeleton data obtained every 10 tracking steps as the input of the graph convolution classification model, it can be determined whether the remainder of the difference N-K divided by 10 is 0; if so, this exactly matches the recognition frequency of the algorithm and the tracking count can be determined to meet the set requirement; otherwise, the tracking count needs to increase.
Step 407: according to the tracking count N of the tracked target box ID, use the remainder N%K as the index; read the values in the input buffer queue starting from that index, and upon reaching the end of the queue continue from 0 up to index-1; use the result as the input of the graph convolution classification network and recognize it with the graph convolution classification model.
Given that the input size of the graph convolution classification model is K*18*3, according to the tracking count N of the tracked target box ID, take the remainder N%K as the first index and read the skeleton data values in the input buffer queue starting from the first index; upon reaching the end of the queue, continue from the start position of the queue up to the second index, index-1, and use the result as the input of the graph convolution classification network, recognizing it with the graph convolution classification model. If the output is a fall behavior, an alarm is raised, thereby completing one round of fall behavior detection.
That is, the electronic device can start reading skeleton data from the entry at the first index in the input buffer queue; upon reaching the end of the queue, it continues from start position 0 up to the entry at the second index, index-1, and then uses all the skeleton data read out as the input of the graph convolution classification network.
Referring to Fig. 5, Fig. 5 is a schematic diagram of reading and writing skeleton data to the input buffer queue. The queue can store K frames of skeleton data, with each frame of skeleton data corresponding to one index.
For example, if N is 11 and K is 9, the first index is 11%9 = 2. The electronic device starts reading skeleton data from the entry at index 2 of the input buffer queue; after reading the entry at index 8, the end of the queue, it continues from the start position of the queue, that is, the entry at index 0, until the entry at index 2-1 = 1 is read; the K = 9 frames of skeleton data read out are then used as the input of the graph convolution classification network.
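The wrap-around read-out illustrated by this example can be sketched as below, returning the K frames in the order they were written, oldest first; it assumes the SkeletonBuffer sketch given above.

def read_in_order(buf):
    # buf: a filled SkeletonBuffer; returns a (K, 18, 3) time-ordered array
    first = buf.n % buf.k                        # first index = N % K
    order = list(range(first, buf.k)) + list(range(0, first))
    return buf.buf[order]                        # oldest ... newest

# e.g. N = 11, K = 9: first index is 11 % 9 = 2, read order 2..8 then 0..1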
Referring to Fig. 6, Fig. 6 is a schematic diagram of the structure of the spatio-temporal graph convolution classification (STGCN) deep-learning algorithm in the graph convolution classification model. The spatio-temporal graph convolution classification deep-learning model takes the spatial poses unrolled along the time series as input and classifies and recognizes the poses.
Specifically, the electronic device can arrange the skeleton data taken out of the input buffer queue in the order of the acquisition times of the corresponding images to obtain a skeleton data sequence, which serves as the input skeleton sequence of the graph convolution classification network. A topological graph structure is then constructed from the input skeleton sequence; the specific process of constructing the topological graph can follow the STGCN algorithm and is not specifically limited or described here.
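To indicate the expected data layout, the toy block below shapes the skeleton sequence into the (batch, channels, frames, joints) tensor that ST-GCN-style models commonly consume and applies one spatial-temporal step over a fixed 18-joint adjacency graph; it is a simplified stand-in, not the published STGCN architecture.

import torch
import torch.nn as nn

class ToySTGCNBlock(nn.Module):
    def __init__(self, adjacency, cin=3, cout=64):
        super().__init__()
        self.register_buffer('A', adjacency)          # (18, 18) joint adjacency graph
        self.spatial = nn.Conv2d(cin, cout, 1)        # mix channels at each joint
        self.temporal = nn.Conv2d(cout, cout, (9, 1), padding=(4, 0))

    def forward(self, x):                             # x: (B, 3, K, 18)
        x = self.spatial(x)
        x = torch.einsum('bctv,vw->bctw', x, self.A)  # propagate along skeleton edges
        return self.temporal(x)                       # convolve along the time axis

# shaping the buffered sequence: (K, 18, 3) -> (1, 3, K, 18)
# x = torch.from_numpy(read_in_order(buf)).permute(2, 0, 1).unsqueeze(0)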
The topological graph structure is then input to the ST-GCN, which can determine the action classification from the input topological graph and output it; the action classes can specifically include fall actions and non-fall actions.
Step 408: wait until the input buffer queue reaches the minimum K frames required for graph convolution classification and the difference between the current tracking count N of the tracked target box ID and K is an integer multiple of the first threshold (10); return to step 406.
That is, if the electronic device determines that the input buffer queue has not reached the minimum K frames required for graph convolution classification, or that the remainder of N-K divided by 10 is not 0, it can continue to obtain skeleton data, store it in the buffer queue, and return to step 406.
By classifying, with the graph convolution classification algorithm, the skeleton data of targets in tracked target boxes that match the set attributes across the K simultaneously input frames tracked N times, this embodiment improves the accuracy and efficiency of fall behavior recognition and effectively reduces the false detection rate; in particular, for the monitoring of elderly people living alone, it improves the monitoring effect and the user experience.
The graph convolution classification model referred to herein is the graph convolution classification network, and the skeleton data information is the skeleton data.
This application also provides a fall behavior detection device. The detection device includes a memory and a processor; the memory stores instructions executable by the processor, and the instructions are executed by the processor to cause the processor to perform the steps of the fall behavior detection method of any of the embodiments.
The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located far away from the aforementioned processor.
Embodiments of this application also provide a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the steps of the fall behavior detection method of the above embodiments are implemented.
Embodiments of this application also provide a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the fall behavior detection method of the above embodiments.
The above are only preferred embodiments of this application and are not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of this application shall be included within its scope of protection.

Claims (11)

  1. A fall behavior detection method, characterized in that the method comprises:
    acquiring a current image,
    performing target detection based on the acquired image to obtain target box information,
    performing attribute recognition on the target in the target box,
    when the attributes of the target match set attributes, performing pose estimation on the target in the target box,
    recognizing fall behavior according to the pose estimation;
    wherein the attributes identify the target persons who need to be detected.
  2. The method of claim 1, characterized in that the target persons to be detected comprise the elderly and/or persons lacking self-care ability;
    the method further comprises: when the attributes of the target do not match the set attributes, skipping pose estimation for the target in the target box;
    when a fall behavior is recognized according to the pose estimation, triggering an alarm.
  3. The method of claim 2, characterized in that the persons lacking self-care ability comprise at least one of persons with physical disabilities, persons with intellectual disabilities, and young children;
    performing target detection based on the acquired image to obtain target box information comprises:
    performing target detection with a deep-learning target detection method to obtain at least one piece of target box information;
    performing attribute recognition on the target in the target box comprises classifying the target images in the target box with a deep-learning classification method, identifying target persons who match the set attributes and non-target persons who do not;
    performing pose estimation on the target in the target box further comprises:
    tracking the target box to obtain a tracked target box, and performing pose estimation based on the target in the tracked target box.
  4. The method of claim 3, characterized in that tracking the target box to obtain the tracked target box comprises performing tracking matching based on whether the ratio of the area of the intersection of the boxes in the current frame and the previous frame to the area of their union is greater than a set threshold;
    performing pose estimation based on the target in the tracked target box comprises detecting the human-shaped target in the tracked target box with a human pose recognition method to obtain target skeleton data.
  5. The method of claim 4, characterized in that acquiring the current image comprises acquiring K frames of images in a time series, taking any one of the K frames as the current image, and executing the step of performing target detection based on the acquired image to obtain target box information;
    recognizing fall behavior according to the pose estimation comprises:
    taking the target skeleton data obtained from each of the K frames as the input of a graph convolution classification method, and recognizing fall behavior through the graph convolution classification method,
    wherein K is a natural number greater than 1.
  6. The method of claim 5, characterized in that taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method comprises:
    for any of the K image frames, storing the skeleton data obtained by pose estimation on the tracked target box in an input buffer queue used for graph convolution classification, the input buffer queue being used to store K frames of skeleton data;
    if the tracking count N of the tracked target box exceeds K, overwriting the original data starting from the start position of the input buffer queue, wherein N is a natural number greater than or equal to 1.
  7. The method of claim 6, characterized in that the input buffer queue is indexed by the remainder of the tracking count N divided by K, each index corresponding to one frame of skeleton data;
    taking the target skeleton data obtained from each of the K frames as the input of the graph convolution classification method further comprises:
    determining whether the input buffer queue has reached the minimum required for graph convolution classification, and, if the tracking count N of the tracked target box exceeds K, determining whether the difference between the current tracking count N and K of the tracked target box is an integer multiple of a set first threshold;
    if so, taking the remainder of the tracking count N of the tracked target divided by K as a first index, reading the skeleton data in the input buffer queue starting from the first index, and upon reaching the end of the queue continuing from the start position of the input buffer queue up to a second index that is 1 less than the first index, as the input of the graph convolution classification network;
    otherwise, waiting until the input buffer queue reaches the minimum required for graph convolution classification and the difference between the current tracking count N and K of the tracked target box is an integer multiple of the first threshold;
    the first threshold being a natural number greater than 1.
  8. 如权利要求7所述的方法,其特征在于,所述基于跟踪目标框中的目标,进行姿态估计进一步包括,对跟踪目标框中的目标进行属性识别,并设 置该跟踪目标框标识,将属性识别结果记录于标识中,且对于符合属性的跟踪目标框进行姿态估计,拒绝不符合属性的跟踪目标框的姿态估计。
  9. 一种用于跌倒行为的检测设备,其特征在于,所述检测设备包括存储器和处理器,所述存储器存储有可被处理器执行的指令,所述指令被处理器执行,以使所述处理器执行如权利要求1至8任一所述跌倒行为的检测方法的步骤。
  10. 一种计算机可读存储介质,其特征在于,所述存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1至8任一所述跌倒行为的检测方法的步骤。
  11. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机中运行时,使得所述计算机执行权利要求1至8任一项所述的方法。
PCT/CN2021/090584 2020-05-11 2021-04-28 Fall behavior detection method and device WO2021227874A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010390108.1A CN113642361B (zh) 2020-05-11 2020-05-11 Fall behavior detection method and device
CN202010390108.1 2020-05-11

Publications (1)

Publication Number Publication Date
WO2021227874A1 true WO2021227874A1 (zh) 2021-11-18

Family

ID=78415269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090584 2020-05-11 2021-04-28 Fall behavior detection method and device WO2021227874A1 (zh)

Country Status (2)

Country Link
CN (1) CN113642361B (zh)
WO (1) WO2021227874A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463788A (zh) * 2022-04-12 2022-05-10 深圳市爱深盈通信息技术有限公司 Fall detection method and system, computer device, and storage medium
CN114550240A (zh) * 2022-01-28 2022-05-27 北京百度网讯科技有限公司 Image recognition method and apparatus, electronic device, and storage medium
CN115273243A (zh) * 2022-09-27 2022-11-01 深圳比特微电子科技有限公司 Fall detection method and apparatus, electronic device, and computer-readable storage medium
CN115512315A (zh) * 2022-11-01 2022-12-23 深圳市城市交通规划设计研究中心股份有限公司 Method for detecting a child riding on a non-motorized vehicle, electronic device, and storage medium
CN115661943A (zh) * 2022-12-22 2023-01-31 电子科技大学 Fall detection method based on a lightweight pose estimation network
CN116824631A (zh) * 2023-06-14 2023-09-29 西南交通大学 Pose estimation method and system
CN117422931A (zh) * 2023-11-16 2024-01-19 上海放放智能科技有限公司 Method for detecting an infant rolling out of bed
WO2024103682A1 (zh) * 2022-11-14 2024-05-23 天地伟业技术有限公司 Fall behavior recognition method based on video classification, and electronic device

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116721470A (zh) * 2023-08-04 2023-09-08 千巡科技(深圳)有限公司 Fall action recognition method and system based on human skeleton keypoints

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886101A (zh) * 2018-12-29 2019-06-14 江苏云天励飞技术有限公司 Posture recognition method and related apparatus
CN109919132A (zh) * 2019-03-22 2019-06-21 广东省智能制造研究所 Pedestrian fall recognition method based on skeleton detection
CN110335277A (zh) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer-readable storage medium, and computer device
CN110443150A (zh) * 2019-07-10 2019-11-12 思百达物联网科技(北京)有限公司 Fall detection method and apparatus, and storage medium
US20200026282A1 (en) * 2018-07-23 2020-01-23 Baidu Usa Llc Lane/object detection and tracking perception system for autonomous vehicles

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399808A (zh) * 2019-07-05 2019-11-01 桂林安维科技有限公司 Human behavior recognition method and system based on multi-target tracking
CN110991261A (zh) * 2019-11-12 2020-04-10 苏宁云计算有限公司 Interactive behavior recognition method and apparatus, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200026282A1 (en) * 2018-07-23 2020-01-23 Baidu Usa Llc Lane/object detection and tracking perception system for autonomous vehicles
CN109886101A (zh) * 2018-12-29 2019-06-14 江苏云天励飞技术有限公司 Posture recognition method and related apparatus
CN109919132A (zh) * 2019-03-22 2019-06-21 广东省智能制造研究所 Pedestrian fall recognition method based on skeleton detection
CN110335277A (zh) * 2019-05-07 2019-10-15 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer-readable storage medium, and computer device
CN110443150A (zh) * 2019-07-10 2019-11-12 思百达物联网科技(北京)有限公司 Fall detection method and apparatus, and storage medium

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550240A (zh) * 2022-01-28 2022-05-27 北京百度网讯科技有限公司 Image recognition method and apparatus, electronic device, and storage medium
CN114463788A (zh) * 2022-04-12 2022-05-10 深圳市爱深盈通信息技术有限公司 Fall detection method and system, computer device, and storage medium
CN115273243A (zh) * 2022-09-27 2022-11-01 深圳比特微电子科技有限公司 Fall detection method and apparatus, electronic device, and computer-readable storage medium
CN115273243B (zh) * 2022-09-27 2023-03-28 深圳比特微电子科技有限公司 Fall detection method and apparatus, electronic device, and computer-readable storage medium
CN115512315A (zh) * 2022-11-01 2022-12-23 深圳市城市交通规划设计研究中心股份有限公司 Method for detecting a child riding on a non-motorized vehicle, electronic device, and storage medium
CN115512315B (zh) * 2022-11-01 2023-04-18 深圳市城市交通规划设计研究中心股份有限公司 Method for detecting a child riding on a non-motorized vehicle, electronic device, and storage medium
WO2024103682A1 (zh) * 2022-11-14 2024-05-23 天地伟业技术有限公司 Fall behavior recognition method based on video classification, and electronic device
CN115661943A (zh) * 2022-12-22 2023-01-31 电子科技大学 Fall detection method based on a lightweight pose estimation network
CN115661943B (zh) * 2022-12-22 2023-03-31 电子科技大学 Fall detection method based on a lightweight pose estimation network
CN116824631A (zh) * 2023-06-14 2023-09-29 西南交通大学 Pose estimation method and system
CN116824631B (zh) * 2023-06-14 2024-02-27 西南交通大学 Pose estimation method and system
CN117422931A (zh) * 2023-11-16 2024-01-19 上海放放智能科技有限公司 Method for detecting an infant rolling out of bed

Also Published As

Publication number Publication date
CN113642361A (zh) 2021-11-12
CN113642361B (zh) 2024-01-23

Similar Documents

Publication Publication Date Title
WO2021227874A1 (zh) Fall behavior detection method and device
CN110348335B (zh) Behavior recognition method and apparatus, terminal device, and storage medium
Zhang et al. Detecting soybean leaf disease from synthetic image using multi-feature fusion faster R-CNN
Idrees et al. Composition loss for counting, density map estimation and localization in dense crowds
Adhikari et al. Activity recognition for indoor fall detection using convolutional neural network
WO2020042419A1 (zh) Gait-based identity recognition method and apparatus, and electronic device
CN110458061B (zh) Method for recognizing falls of the elderly, and companion robot
CN110222611A (zh) Human skeleton behavior recognition method, system, and apparatus based on graph convolutional networks
CN110532970B (zh) Method, system, device, and medium for analyzing age and gender attributes of 2D face images
CN109726672B (zh) Fall detection method based on human skeleton sequences and convolutional neural networks
GB2560387A (en) Action identification using neural networks
CN111524608B (zh) Intelligent detection and epidemic prevention system and method
Le et al. Robust hand detection in vehicles
CN110765833A (zh) Crowd density estimation method based on deep learning
Hsu et al. Deep hierarchical network with line segment learning for quantitative analysis of facial palsy
CN112560723A (zh) Fall detection method and system based on shape recognition and speed estimation
Kumar et al. A unified grid-based wandering pattern detection algorithm
CN115187911A (zh) Video AI monitoring method and apparatus for the donning, doffing, and disinfection of medical protective equipment
Iazzi et al. Fall detection based on posture analysis and support vector machine
Li et al. ET-YOLOv5s: toward deep identification of students’ in-class behaviors
Sheeba et al. Hybrid features-enabled dragon deep belief neural network for activity recognition
CN108460370A (zh) Stationary poultry life-information alarm device
Yusro et al. Comparison of faster r-cnn and yolov5 for overlapping objects recognition
Yao et al. An improved feature-based method for fall detection
Bo Human fall detection for smart home caring using yolo networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21804646

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21804646

Country of ref document: EP

Kind code of ref document: A1