WO2019128304A1 - Human body fall detection method and device (人体跌倒检测方法和装置) - Google Patents

Human body fall detection method and device

Info

Publication number
WO2019128304A1
WO2019128304A1 (PCT/CN2018/104734; CN2018104734W)
Authority
WO
WIPO (PCT)
Prior art keywords
human body
image
sample data
state
target
Prior art date
Application number
PCT/CN2018/104734
Other languages
English (en)
French (fr)
Inventor
谢阳阳
Original Assignee
南京阿凡达机器人科技有限公司
Priority date
Filing date
Publication date
Application filed by 南京阿凡达机器人科技有限公司
Publication of WO2019128304A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Definitions

  • the present application relates to the field of human body detection technology, and in particular, to a human body fall detection method and device.
  • Most of the existing methods arrange a plurality of cameras in the human activity area in advance to collect video stream data, and then analyze human body changes in the video stream data to determine whether the human body has fallen.
  • Processing and analyzing the video stream data requires a large workload and is inefficient.
  • Judging whether the human body has fallen by analyzing human body changes is relatively complicated and error-prone.
  • When the existing methods are implemented, they therefore often suffer from the technical problems of poor fall recognition accuracy, large error, and low efficiency.
  • The embodiment of the present invention provides a human body fall detection method and device, so as to solve the technical problems of poor fall recognition accuracy, large error, and low efficiency in the existing methods, and to achieve the technical effect of accurately and efficiently identifying the fall state.
  • the embodiment of the present application provides a human body fall detection method, including:
  • acquiring a target image; performing human body detection on the target image through a target detection network to determine whether the target image is an image including a human body; and, in the case where it is determined that the target image is an image including a human body,
  • the target image is subjected to fall recognition by a convolutional neural network to determine whether the human body in the target image is in a falling state.
  • the acquiring the target image includes:
  • the target detection network is established in the following manner:
  • collecting human body image sample data, wherein the human body image sample data includes a plurality of images of different human body states;
  • the labeled human body image sample data is used for training to obtain a target detection network based on the target detection algorithm.
  • the human body state includes a state in which the human body is standing, a state in which the human body is sitting, a state in which the human body is lying down, a state in which the human body is squatting, a state in which the human body is leaning, and a state in which the human body is kneeling.
  • the method further includes: re-acquiring the target image.
  • the convolutional neural network is established in the following manner:
  • the images in the positive sample data include at least one of the following: an image of a state in which the human body is standing, an image of a state in which the human body is sitting, an image of a state in which the human body is squatting, and an image of a state in which the human body is leaning; and the images in the negative sample data include at least one of the following: an image of a state in which the human body is lying down, and an image of a state in which the human body is lying prone;
  • Training is performed using the positive sample data and the negative sample data to establish the convolutional neural network for identifying a human body state type.
  • the compliant image comprises an image in which the human body region occupies more than 80% of the image area.
  • the embodiment of the present application further provides a human body fall detection device, including:
  • a human body detecting module configured to perform human body detection on the target image through a target detection network to determine whether the target image is an image including a human body
  • a fall identification module configured to perform fall recognition on the target image by using a convolutional neural network to determine whether the human body in the target image is in a falling state, in a case where the target image is determined to be an image including a human body.
  • the obtaining module comprises:
  • a sound collector for collecting sound information in a target area
  • a locator configured to determine a target orientation according to the sound information
  • a mobile device and a camera, wherein the camera is disposed on the mobile device, the mobile device is configured to move the camera according to the target orientation, and the camera is configured to acquire a target image.
  • the apparatus further includes an alarm module for issuing an alarm and/or transmitting an alert message if the human body in the target image is determined to be in a falling state.
  • In the embodiment of the present application, a single-frame target image is acquired instead of a video stream for analysis and processing; a target detection network based on a target detection algorithm first identifies whether the image contains a human body, and a convolutional neural network based on a classification algorithm then classifies the human body state in the target image to identify the specific state of the human body. This solves the technical problems of poor fall recognition accuracy, large error, and low efficiency in the existing methods, and achieves the technical effect of accurately and efficiently identifying the fall state.
  • FIG. 1 is a schematic diagram of a process flow of a human body fall detection method according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a human fall detection device according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an electronic device according to a human fall detection method provided by an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a human fall detection robot designed by applying a human fall detection method and apparatus provided by an embodiment of the present application;
  • FIG. 5 is a flow chart showing the application of a human fall detection robot to perform human fall detection in a scene example.
  • Since the existing methods mostly collect video stream data and then analyze and process it, the amount of data to be analyzed is large, which consumes substantial resources and results in low efficiency.
  • Most existing methods detect human falls by analyzing human body changes. This identification method is inherently complicated, has poor precision, and is prone to errors.
  • When the existing methods are implemented, they therefore often suffer from the technical problems of poor fall recognition accuracy and low efficiency.
  • In view of this, the present application considers that single-frame image data can be acquired, instead of analyzing video stream data, so as to effectively reduce the amount of data processing. In addition, taking advantage of the characteristics of image data, whether the human body has fallen is determined by analyzing the human body state in the image rather than human body changes. This solves the technical problems of poor fall recognition accuracy, large error, and low efficiency in the existing methods, and achieves accurate and efficient recognition of the fall state.
  • the embodiment of the present application provides a human body fall detection method.
  • a human body fall detection method For details, refer to the process flow diagram of the human fall detection method provided by the embodiment of the present application shown in FIG. 1 .
  • the method for detecting a human fall in the embodiment of the present application may include the following steps.
  • A target image of a single frame may be acquired, instead of the video stream collected by the existing methods, for subsequent analysis and processing. Compared with a video stream, only a single frame of image needs to be analyzed, detected, and identified, thereby effectively reducing the amount of calculation, reducing the calculation cost, and improving the recognition speed.
  • This avoids having to acquire images multiple times in order to obtain an image containing the human body.
  • In a specific implementation, an effective image may be preferentially acquired whenever possible.
  • The above effective image can be specifically understood as a target image that includes a human body.
  • Correspondingly, an image that does not contain a human body can be understood as an invalid image. In this way, repeatedly acquiring target images before obtaining a usable image of the human body can be avoided, which helps to improve processing efficiency.
  • the above-mentioned acquisition target image may be specifically implemented to include the following contents:
  • S11-1 collecting sound information in the target area
  • the target orientation may specifically be the direction of the sound source.
  • The above direction has a greater probability of containing human activity. Therefore, in the above target orientation, an image including a human body, that is, an effective image, is acquired with a greater probability than in other orientations.
  • In a specific implementation, a microphone array may be used as the sound collector to collect sound information in the target area, and the locator determines the direction of the sound source, that is, the target orientation, according to the collected sound information.
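The patent does not specify how the locator computes the sound-source direction from the microphone array. One common approach is time-difference-of-arrival (TDOA) estimation: cross-correlating the signals from two microphones and converting the lag at the correlation peak into an arrival angle. The sketch below uses hypothetical spacing and sampling-rate values, not figures from the patent.

```python
import numpy as np

def estimate_lag(sig_a, sig_b):
    # The peak of the cross-correlation gives the sample delay of sig_b
    # relative to sig_a (positive lag: sig_b hears the source later).
    corr = np.correlate(sig_b, sig_a, mode="full")
    return int(np.argmax(corr)) - (len(sig_a) - 1)

# Synthetic demo: the second microphone hears the same signal 5 samples later.
rng = np.random.default_rng(0)
mic_a = rng.standard_normal(1024)
mic_b = np.roll(mic_a, 5)  # circular shift stands in for a true 5-sample delay

lag = estimate_lag(mic_a, mic_b)

# With microphone spacing d and sampling rate fs (hypothetical values),
# the arrival angle follows from the time difference of arrival:
d, fs, c = 0.2, 16000.0, 343.0  # metres, Hz, speed of sound in m/s
angle_deg = np.degrees(np.arcsin(c * (lag / fs) / d))
```

In practice a full array would combine several microphone pairs, but the per-pair cross-correlation step is the same.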
  • the camera may be specifically disposed on the mobile device, that is, the camera is movable in a target area that is not fixedly disposed.
  • the camera can be placed on a mobile device consisting of a pulley and a motor.
  • the camera can be flexibly moved in the target area by the mobile device, so that the range of the target image can be effectively expanded, and more target images can be acquired in a larger detection range. That is, the manner in which the camera is used in the embodiment of the present application is different from the manner in which the camera is used in the existing method.
  • the camera is fixedly disposed at a certain fixed position in the target area to collect video stream data.
  • the range that can be detected by a single camera is limited, and in order to increase the total detection range, it is necessary to separately arrange cameras at a plurality of positions in the target area. In this way, it will increase the implementation cost.
  • The method for using the camera provided in the embodiment of the present application is to set the camera on the mobile device; then, according to the situation, the camera can be moved in real time by the mobile device to acquire target images at different positions in the target area, thereby using one or a small number of cameras to cover a large detection range and reducing implementation costs.
  • the angle and distance between the camera and the human body can be adjusted according to the specific conditions of the human body, so that the target image with higher quality can be obtained, so that the fall recognition can be performed more accurately.
  • the above-described mobile devices are only for better explaining the embodiments of the present application.
  • other movable structures can also be selected as mobile devices according to specific situations and precision requirements, such as a mobile robot, a remote control car, etc., so that the position of the camera can be flexibly moved.
  • the application is not limited.
  • In a specific implementation, the sound information in the target area may first be collected through the microphone array; the locator determines the source direction of the sound and takes that direction, in which human activity may exist, as the target orientation; the mobile device then moves the camera toward the sound source according to the determined target orientation, so that an effective image of relatively high quality can be obtained with an ordinary camera.
  • S12 Perform human body detection on the target image through the target detection network to determine whether the target image is an image including a human body.
  • The target image is first subjected to human body detection to determine whether the target image to be analyzed is an image including the human body, that is, an effective image.
  • An image that does not contain the human body is treated as an invalid image, and fall recognition is not performed on it. By eliminating images that do not include the human body in advance, meaningless fall recognition is avoided, the data processing amount of the fall recognition is reduced, and the processing rate is further improved.
  • the target image may be re-acquired in the case of determining that the target image is an image that does not include a human body, so as to perform real-time monitoring on an area in the target area where there may be human activities.
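The two-stage flow described above (filter out invalid images first, then run fall recognition only on images containing a human body) can be sketched as follows. Here `is_human` and `classify_state` are hypothetical stand-ins for the trained target detection network and the convolutional neural network; the dictionary "images" are only for illustration.

```python
def detect_fall(image, is_human, classify_state):
    """Two-stage pipeline: stage 1 filters out invalid images (no human),
    stage 2 classifies the body state of a valid image as fall / non-fall."""
    if not is_human(image):
        return None               # invalid image: caller re-acquires a target image
    return classify_state(image)  # "fall" or "non-fall"

# Hypothetical stand-ins for the trained SSD detector and the fall-recognition CNN.
def demo_is_human(img):
    return img.get("has_person", False)

def demo_classify(img):
    return "fall" if img.get("state") == "lying" else "non-fall"

r_empty = detect_fall({"has_person": False}, demo_is_human, demo_classify)
r_fall = detect_fall({"has_person": True, "state": "lying"}, demo_is_human, demo_classify)
```

Returning `None` for an invalid image models the re-acquisition branch: the expensive fall classifier never runs on images without a person.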
  • Since the acquired data to be analyzed is a single-frame image, the acquired target image may be subjected to human body detection by the target detection network based on the target detection algorithm, to determine whether the target image is an image including the human body.
  • the target detection network for performing human body detection may be established in advance by performing the following manner before the step S12 is performed:
  • S1 collecting human body image sample data, where the human body image sample data includes human body images in different states;
  • S2 labeling the human body region in the images of the human body image sample data;
  • S3 training is performed by using the labeled human body image sample data to obtain a target detection network based on the target detection algorithm.
  • In a specific implementation, the target detection algorithm may be a deep learning based detection algorithm, known as the SSD (Single Shot MultiBox Detector) algorithm.
  • The core of the algorithm is to use convolution kernels to predict the category scores and offsets of a series of default bounding boxes on the feature map, so that it can quickly and accurately detect whether the target image to be detected is a valid image containing the human body.
  • the human body image sample data is required to specifically include a plurality of images of the human body state in different states.
  • The human body state may specifically include: a state in which the human body is standing, a state in which the human body is sitting, a state in which the human body is lying down, a state in which the human body is squatting, a state in which the human body is leaning, a state in which the human body is kneeling, and the like.
  • a plurality of images including different human body states can be learned by the target detection algorithm, so that an image capable of simultaneously detecting and recognizing a plurality of different human body states can be established.
  • In a specific implementation, the human body region in the images of the human body image sample data may be calibrated (annotated) for the SSD target detection network, so that training related to human body region feature recognition may be performed subsequently.
  • The SSD target detection network, i.e., the initial model for target detection, may be constructed prior to training with the annotated human body image sample data.
  • In a specific implementation, the above SSD target detection network can be constructed on the tensorflow framework, with inception_v2 used as the feature extractor.
  • the labeled human body image sample data is used for training to obtain a target detection network based on the target detection algorithm.
  • Specifically, the following may be included: using the labeled human body image sample data as input data, the above SSD target detection network, that is, the initial model for target detection, is trained to obtain a trained target detection network; the trained target detection network is then adjusted and optimized according to the human body image sample data and accuracy requirements to obtain the human body detection SSD network, that is, the target detection network based on the target detection algorithm.
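As an illustration of the SSD idea of predicting scores and offsets for a series of default bounding boxes on a feature map, the sketch below generates the default boxes for one square feature map. The scale and aspect-ratio values are illustrative assumptions, not parameters taken from the patent.

```python
import numpy as np

def default_boxes(fmap_size, scale, aspect_ratios):
    """One (cx, cy, w, h) default box per aspect ratio, centred on each
    cell of an fmap_size x fmap_size feature map (coordinates in [0, 1])."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                # width/height = ar, while w * h stays equal to scale**2
                boxes.append((cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)))
    return np.array(boxes)

boxes = default_boxes(4, 0.3, [1.0, 2.0, 0.5])  # 4*4 cells x 3 ratios = 48 boxes
```

The detection head then predicts, for each such box, a class score (human / background) and four coordinate offsets; tall aspect ratios (such as 0.5, meaning taller than wide) suit standing pedestrians.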
  • In the case where the target image is determined to be an image including a human body, the target image is subjected to fall recognition by a convolutional neural network to determine whether the human body in the target image is in a falling state.
  • Specifically, a convolutional neural network may be used to perform fall recognition on the target image to determine whether the human body in the target image is in a falling state.
  • In a specific implementation, the trained convolutional neural network can be used as the fall recognition model, with the target image including the human body as the input data; the fall recognition model then identifies whether the human body in the target image is in a falling state, so that whether the human body has fallen can be determined based on a single-frame image.
  • In a specific implementation, a convolutional neural network with higher fall recognition accuracy and faster recognition speed may be established in advance in the following manner:
  • S1 acquiring human body image sample data, wherein the human body image sample data includes a human body image in different states;
  • S2 extracting images that meet the requirements from the human body image sample data as preprocessed sample data;
  • S3 dividing the images in the preprocessed sample data into positive sample data and negative sample data according to the human body state in each image, wherein the images in the positive sample data include at least one of the following: an image of a state in which the human body is standing, an image of a state in which the human body is sitting, an image of a state in which the human body is squatting, and an image of a state in which the human body is leaning; and the images in the negative sample data include at least one of the following: an image of a state in which the human body is lying down, and an image of a state in which the human body is lying prone;
  • Since the human body image sample data collected in order to establish a more accurate target detection network already includes a plurality of images of different human body states, in the present embodiment images that meet the requirements can be extracted from the human body image sample data as the preprocessed sample data.
  • the images in the pre-processed sample data need to be classified according to the two states of falling and non-falling.
  • Images representing non-fall states in the preprocessed sample data, including images of a state in which the human body is standing, sitting, squatting, or leaning, are divided into the positive sample data, that is, a positive image data set.
  • Images representing a fall in the preprocessed sample data, including images of a state in which the human body is lying down or lying prone, are divided into the negative sample data, that is, a negative image data set.
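The division into positive and negative sample data can be sketched as a simple split on per-image state labels. The label names and the sample records below are hypothetical stand-ins, not identifiers from the patent.

```python
# Hypothetical state labels: non-fall poses form the positive set,
# fall poses (lying down / prone) form the negative set.
NON_FALL_STATES = {"standing", "sitting", "squatting", "leaning"}
FALL_STATES = {"lying_down", "prone"}

def split_samples(labelled_images):
    positive = [s for s in labelled_images if s["state"] in NON_FALL_STATES]
    negative = [s for s in labelled_images if s["state"] in FALL_STATES]
    return positive, negative

samples = [
    {"path": "a.jpg", "state": "standing"},
    {"path": "b.jpg", "state": "lying_down"},
    {"path": "c.jpg", "state": "sitting"},
]
pos, neg = split_samples(samples)  # 2 positive, 1 negative
```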
  • the positive sample data and the negative sample data are used for training to establish the convolutional neural network for identifying a human body state type.
  • Specifically, the following may be included: constructing an initial convolutional neural network; then, using the above positive sample data and negative sample data as input data, training the initial convolutional neural network to distinguish the fall state of the human body from the non-fall state, so as to obtain a convolutional neural network with higher recognition accuracy and faster recognition speed. Further, the convolutional neural network can be used to accurately recognize whether the human body state in the target image corresponds to a fall state of the human body.
  • If the identified human body state in the target image corresponds to the fall state of the human body, it can be judged that the human body is in a falling state; if it corresponds to the non-fall state, it can be judged that the human body is not in a falling state.
  • the specific implementation may further include the following content:
  • S1 acquiring image sample data that does not include a human body
  • S2 performing false-detection training on the convolutional neural network by using the image sample data that does not include the human body.
  • In this way, target images that do not include the human body can be identified and filtered out first, improving the processing efficiency of the convolutional neural network when performing fall detection.
  • In the embodiment of the present application, a single-frame target image is acquired instead of a video stream for analysis and processing; the target detection network based on the target detection algorithm first identifies images containing a human body, and the convolutional neural network based on the classification algorithm then classifies the human body state in the target image to identify the specific state of the human body. This solves the technical problems of poor recognition accuracy, large error, and low efficiency in the existing methods, and achieves the technical effect of accurately and efficiently identifying the fall state.
  • In an embodiment, the compliant image may specifically include: an image in which the human body region occupies more than 80% of the image area.
  • sample data suitable for fall recognition training can be extracted from the body image sample data, thereby avoiding re-acquisition of sample data for fall recognition, reducing training cost and improving learning efficiency.
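A minimal sketch of the compliance check (human body region occupying more than 80% of the image), assuming the body region is given as a pixel-coordinate bounding box; the threshold parameter name and the example boxes are illustrative.

```python
def body_area_ratio(box, img_w, img_h):
    """Fraction of the image covered by the detected human-body box,
    with the box given as pixel coordinates (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / (img_w * img_h)

def is_compliant(box, img_w, img_h, threshold=0.8):
    # Keep the image for fall-recognition training only when the body
    # region dominates the frame.
    return body_area_ratio(box, img_w, img_h) > threshold

ok = is_compliant((10, 5, 630, 470), 640, 480)      # body fills ~94% of the frame
bad = is_compliant((200, 200, 300, 300), 640, 480)  # body fills ~3% of the frame
```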
  • the initial convolutional neural network may specifically be an inception_v3 network.
  • the above-mentioned inception_v3 network is specifically a convolutional neural network suitable for image recognition.
  • the above-listed convolutional neural networks are only for better explaining the embodiments of the present application.
  • other suitable convolutional neural networks may also be selected according to specific situations and specific characteristics identified. In this regard, the application is not limited.
  • In an embodiment, the method further comprises: preprocessing the images in the positive sample data and the negative sample data according to the initial convolutional neural network, so that the images in the positive sample data and the negative sample data match the initial convolutional neural network.
  • In a specific implementation, the foregoing preprocessing may specifically include: transforming the images in the positive sample data and the negative sample data to a specified size, for example, converting them to a size of 299 × 299 pixels.
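The resizing step can be sketched with a nearest-neighbour resize so the example stays self-contained; a real pipeline would typically use a library resampler with interpolation, so this is only illustrative.

```python
import numpy as np

def resize_nearest(img, out_h=299, out_w=299):
    """Nearest-neighbour resize of an (H, W, C) image array to the
    input size expected by the classification network."""
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return img[rows][:, cols]

img = np.zeros((480, 640, 3), dtype=np.uint8)  # e.g. a VGA camera frame
resized = resize_nearest(img)                  # shape (299, 299, 3)
```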
  • When a convolutional neural network is used for fall recognition, only two types need to be distinguished, namely the fall state of the human body and the non-fall state of the human body. Therefore, considering the low complexity of this classification task, and in order to improve processing efficiency and reduce the occupation and waste of computing resources, the convolutional neural network can first be simplified when establishing the initial convolutional neural network. The simplification may specifically include: reducing the number of layers of the convolutional neural network, and/or reducing the number of convolution kernels of the convolutional neural network.
  • In a specific implementation, the foregoing simplification of the inception_v3 network may specifically include: reducing the number of inception layers (structures) of the inception_v3 network from 11 to 6 or 5, and/or reducing the number of convolution kernels in the inception_v3 network, so that a simplified convolutional neural network can be obtained.
  • the simplified convolutional neural network described above may be implemented in the following manner:
  • S1 Simplify the existing inception_V3 network.
  • the last five inception structures of the inception_V3 network can be deleted, and the simplified inception_v3 network is obtained.
  • S3 The number of convolution kernels in all convolutional layers of the simplified inception_v3 network is reduced to two-thirds of the original, and the parameter model Fa1 is modified to adapt to the network after the number of convolution kernels is reduced.
  • In a specific implementation, the verification may include: comparing the fall detection accuracy of the network before and after the convolution kernels are reduced. If the accuracy does not decrease significantly, the convolution kernels may be reduced further, with the corresponding training and fine-tuning operations performed, to obtain a more compact convolutional neural network; if the accuracy decreases significantly, the reduction can be stopped, and the last acceptable network and parameter model are determined as the convolutional neural network for fall detection.
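The reduce-then-verify procedure above can be sketched as a generic loop: keep shrinking while accuracy stays within a tolerance of the baseline, and return the last acceptable model. Here `evaluate`, `shrink`, and the toy "model" (just a kernel count) are hypothetical stand-ins for accuracy measurement and kernel reduction.

```python
def prune_until_accuracy_drops(model, evaluate, shrink, max_drop=0.01):
    """Shrink the model while fall-detection accuracy stays within
    max_drop of the starting accuracy; return the last acceptable model."""
    baseline = evaluate(model)
    best = model
    while True:
        candidate = shrink(best)
        if candidate is None or baseline - evaluate(candidate) > max_drop:
            return best  # further reduction hurts accuracy too much
        best = candidate

# Toy stand-ins: accuracy is flat until the kernel count falls below 40.
def evaluate(kernels):
    return 0.95 if kernels >= 40 else 0.80

def shrink(kernels):
    return kernels * 2 // 3 if kernels > 1 else None  # cut to two-thirds

final = prune_until_accuracy_drops(90, evaluate, shrink)  # 90 -> 60 -> 40, stops at 40
```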
  • The sending of the alarm may specifically include: sounding an alarm through a buzzer to indicate that a person has fallen; or sending a warning message (for example, an alarm message) through a communication device to the person in charge of the target area or nearby medical staff, requesting timely assistance, and the like.
  • It can be seen from the above that the human fall detection method provided in the embodiment of the present application analyzes and processes a single-frame target image instead of a video stream: the target detection network based on the target detection algorithm first identifies images containing a human body, and the convolutional neural network based on the classification algorithm then classifies the human body state in the target image to identify the specific state of the human body. This solves the technical problems of poor fall recognition accuracy, large error, and low efficiency in the existing methods, and achieves the technical effect of accurately and efficiently identifying the fall state.
  • In addition, by collecting sound information to determine the target orientation, and moving the camera according to the target orientation to acquire a valid target image, the detection range of the fall detection is effectively expanded, the probability of obtaining an effective target image is improved, and the detection effect and user experience are improved.
  • Further, images of multiple human body states are acquired as sample data to establish the target detection network and the convolutional neural network, improving the accuracy of recognizing a human fall based on a single-frame image; and, according to the low complexity of the state types to be identified, the convolutional neural network is simplified accordingly, improving implementation efficiency and reducing the occupation of computing resources.
  • Based on the same inventive concept, an embodiment of the present invention further provides a human body fall detection device, as described in the following embodiments. Since the principle by which the human fall detection device solves the problem is similar to that of the human fall detection method, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.
  • the term "unit” or “module” may implement a combination of software and/or hardware of a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, hardware, or a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a schematic structural diagram of a human body fall detection device according to an embodiment of the present disclosure. The device may specifically include: an acquisition module 21, a human body detection module 22, and a fall recognition module 23. The structure is specifically described below.
  • the obtaining module 21 is specifically configured to acquire a target image.
  • the human body detecting module 22 is specifically configured to perform human body detection on the target image through the target detection network to determine whether the target image is an image including a human body;
  • the fall identification module 23 may be specifically configured to: when determining that the target image is an image including a human body, perform a fall recognition on the target image by using a convolutional neural network to determine whether the human body in the target image is in a Falling state.
  • The human body fall detection device may specifically be a human body fall detection robot capable of performing human body fall detection.
  • The above human body fall detection robot may be applied in various places such as homes, hospitals, and shopping malls, to monitor those places in real time and discover in time that a person has fallen, so that an alarm can be issued and related assistance provided promptly.
  • To enlarge the detection range and efficiently acquire effective target images, the acquisition module 21 may specifically include the following structural units:
  • a sound collector, which may specifically be configured to collect sound information in the target area;
  • a locator, which may specifically be configured to determine a target orientation according to the sound information;
  • a moving device and a camera, wherein the camera may specifically be disposed on the moving device, the moving device may specifically be configured to move the camera according to the target orientation, and the camera may specifically be configured to acquire a target image.
  • The moving device may specifically include pulleys and a motor. The moving device with pulleys and a motor can drive the camera toward the target orientation so as to better acquire an effective target image.
  • The moving device may also be another type of movable apparatus, such as a movable robot or a remote-controlled car. The present application is not limited in this respect.
  • The effective target image may specifically be an image containing a human body.
  • The camera can be moved according to the target orientation so that effective target images are acquired as far as possible, which reduces the workload of the human body detection module 22 and improves working efficiency.
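How a locator might turn microphone-array sound information into a target orientation can be sketched with a minimal time-difference-of-arrival (TDOA) calculation for a two-microphone pair. This is an illustrative sketch, not the patent's implementation: the function name, the two-microphone simplification, and the fixed speed of sound are all assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed constant)

def bearing_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the arrival angle of a sound source in degrees
    (0 = broadside to the microphone pair) from the inter-microphone
    time delay. delay * c is the extra path length to the farther
    microphone; dividing by the spacing gives the sine of the angle."""
    ratio = delay_s * SPEED_OF_SOUND / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))
```

A real microphone array would estimate the delay by cross-correlating channels and combine several pairs for a full 360° bearing; the moving device would then drive the camera toward the returned angle.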
  • In order to issue an alarm promptly after a human fall is detected so that the fallen person can be helped in time, the device may further include an alarm module for issuing an alarm.
  • The alarm module may specifically include a buzzer, so that an alarm can be sounded by the buzzer when the human body in the target image is determined to be in a fallen state.
  • The alarm module may further include a communication device such as a signal transmitter. When the human body in the target image is determined to be in a fallen state, the communication device may send alarm information to the relevant person in charge (such as a guardian or mall security) or to nearby medical staff, reminding them that someone has fallen so that treatment can be provided as soon as possible.
  • The device may further include a target detection network establishing module, which may operate as follows: acquiring human body image sample data, wherein the human body image sample data includes multiple images containing human body states; labeling the human body regions in the images of the human body image sample data; and training with the labeled human body image sample data to obtain a target detection network based on the target detection algorithm.
  • The human body states may specifically include: a state in which the human body is standing, a state in which the human body is sitting, a state in which the human body is lying down, a state in which the human body is squatting, a state in which the human body is leaning, a state in which the human body is lying prone, and the like.
  • the above-described human body states are only for better explaining the embodiments of the present application.
  • other states than the above-described states may be introduced as the human body state according to specific conditions and requirements. In this regard, the application is not limited.
  • The human body detection module 22 is connected to the acquisition module 21. When the human body detection module 22 determines that the target image is an image that does not contain a human body, it may send this information to the acquisition module 21, and the acquisition module 21 reacquires a target image.
  • the apparatus may further include a convolutional neural network establishing module, configured to establish a convolutional neural network for identifying a human state type, wherein the convolutional neural network establishing module may specifically include:
  • the acquiring unit may be specifically configured to acquire human body image sample data, where the human body image sample data includes a plurality of images including a human body state;
  • the extracting unit may be specifically configured to extract, from the human body image sample data, an image that meets the requirements as the pre-processed sample data;
  • the dividing unit, which may specifically be configured to divide the images in the preprocessed sample data into positive sample data and negative sample data according to the human body state in the images, wherein the images in the positive sample data include at least one of the following: an image containing a standing human body, an image containing a sitting human body, an image containing a squatting human body, and an image containing a leaning human body; and the images in the negative sample data include at least one of the following: an image containing a human body lying down and an image containing a human body lying prone;
  • the establishing unit, which may specifically be configured to train with the positive sample data and the negative sample data to establish a convolutional neural network for recognizing human body state types.
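The dividing unit's split can be sketched as a simple partition of labeled samples; the state-label strings below are hypothetical stand-ins for however the sample data is annotated.

```python
# Hypothetical state labels: the document names standing, sitting,
# squatting, and leaning as non-fallen (positive) states, and lying
# down / lying prone as fallen (negative) states.
NON_FALLEN = {"standing", "sitting", "squatting", "leaning"}
FALLEN = {"lying", "prone"}

def split_samples(samples):
    """Partition (image, state) pairs into positive (non-fallen)
    and negative (fallen) training sets."""
    positive, negative = [], []
    for image, state in samples:
        if state in NON_FALLEN:
            positive.append(image)
        elif state in FALLEN:
            negative.append(image)
        # samples with any other label are simply skipped
    return positive, negative
```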
  • The convolutional neural network establishing module may further include:
  • a false detection training unit, which may specifically be configured to acquire image sample data that does not contain a human body, and to use that sample data to perform false detection training on the convolutional neural network.
  • The images that meet the requirements may specifically include: images in which the human body region occupies more than 80% of the image, and the like.
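The 80% occupancy requirement can be sketched as a ratio check on a detected bounding box; the `(x1, y1, x2, y2)` box convention and function names are assumptions for illustration.

```python
def body_area_ratio(box, image_w, image_h):
    """Fraction of the image covered by a human-body bounding box
    given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x2 - x1) * (y2 - y1)) / float(image_w * image_h)

def meets_requirement(box, image_w, image_h, threshold=0.8):
    """An image qualifies as preprocessed sample data when the human
    body region occupies more than the threshold share of the image."""
    return body_area_ratio(box, image_w, image_h) > threshold
```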
  • The systems, devices, modules, or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above devices are described with their functions divided into various units. When the present application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
  • The human body fall detection device analyzes and processes a single-frame target image instead of a video stream. It first uses a target detection network based on a target detection algorithm to identify images containing a human body, and then uses a convolutional neural network based on a classification algorithm to classify the human body state in the target image and identify the specific state of the human body. This solves the technical problems of poor fall recognition accuracy and low efficiency in existing methods and achieves the technical effect of accurately and efficiently recognizing a fallen state. Furthermore, by collecting sound information to determine a target orientation and moving the camera accordingly to acquire effective target images, the detection range of fall detection is effectively enlarged, the accuracy of acquiring effective target images is improved, and the detection effect is improved.
  • The embodiment of the present application further provides an electronic device. The electronic device may specifically include an input device 31, a processor 32, and a memory 33.
  • the input device 31 can be specifically configured to receive the acquired target image.
  • The processor 32 may specifically be configured to perform human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body, and, when the target image is determined to be an image containing a human body, to perform fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.
  • the memory 33 may be specifically configured to store the target image, the target detection network, the convolutional neural network, and intermediate data generated during the detection process.
  • The input device may specifically be one of the main devices for exchanging information between the user and the computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting pad, a voice input device, and the like; it is used to input raw data and the programs that process the data into the computer. The input device may also acquire data transmitted by other modules, units, and devices.
  • The processor can be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, or an application-specific integrated circuit.
  • The memory may specifically be a memory device used in modern information technology for storing information. The memory may include multiple levels. A circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; a storage device that has a physical form is also called a memory, such as a memory stick or a TF card.
  • An embodiment of the present application further provides a computer storage medium based on the human body fall detection method, wherein the computer storage medium stores computer program instructions that, when executed, implement: acquiring a target image; performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body; and, when the target image is determined to be an image containing a human body, performing fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.
  • The storage medium includes but is not limited to a random access memory (RAM), a read-only memory (ROM), a cache, a hard disk drive (HDD), or a memory card.
  • the memory can be used to store computer program instructions.
  • the network communication unit may be an interface for performing network connection communication in accordance with a standard stipulated by the communication protocol.
  • In a specific scenario example, the human body fall detection method and device provided by the present application are applied to design a corresponding human body fall detection robot, which then performs specific human body fall detection. The specific implementation process may refer to the following.
  • The human body fall detection robot may be configured by applying the human body fall detection method and device provided in the embodiments of the present application. The robot may specifically use a sound source localization module to locate the general orientation of the human body (i.e., the target orientation), then use a camera to collect data (i.e., the target image), and realize human fall detection based on a single-frame image through deep learning algorithms.
  • The fall detection robot may specifically include functional modules such as a movable robot body 12, a camera module 13, an alarm module 14 (optional), a sound source localization module 15 (optional), a human body detection module 16, and a fall recognition module 17.
  • The sound source localization module 15 may specifically be configured to determine the general orientation of the human body, and the camera module 13 is used to capture a single-frame image. The human body detection module 16 and the fall recognition module 17 may specifically be configured to determine, from the captured image, whether a person has fallen and to transmit the result to the movable robot body 12; if a fall has occurred, the movable robot body 12 may raise an alarm by controlling the alarm module 14.
  • The movable robot body 12 includes at least structures such as a robot body, a motor, and pulleys.
  • The camera module 13 can be used to collect a single image and send it to the human body detection module 16 to determine whether a human body is present (i.e., whether the image is an image containing a human body).
  • The alarm module 14 may include at least a mobile phone communication function and a 110 emergency call function. In specific implementations, the mobile phone communication function can be used to send fall information and picture information, and the 110 emergency call function places a 110 call for timely rescue.
  • The sound source localization module 15 may specifically determine the source direction of a sound through a microphone array, which is convenient for finding people.
  • The human body detection module 16 may specifically implement human body detection using the SSD target detection algorithm in deep learning.
  • The fall recognition module 17 implements fall state recognition through a deep learning convolutional neural network.
  • The above human body fall detection robot can be regarded as a specific human body fall detection device, and the main principle of its implementation is the same as that of the human body fall detection device.
  • S4: The human body detection module determines whether a person exists in the collected image. If yes, proceed to S5; if not, return to S1.
  • S5: Send the detected human body region to the fall recognition module to determine whether the human body has fallen.
  • S8: Raise an alarm, transmitting the fall information and images to the connected mobile phone or other terminal.
  • the human body detection module described above is implemented based on an SSD target detection algorithm in deep learning.
  • the SSD algorithm training can be performed according to the following process:
  • S1: Collect human body image sample data containing human bodies (the proportion of the image occupied by the person is not limited). Because the human body region needs to be detected, and human bodies in any state need to be detected, the collected image data may specifically include human bodies in different states, such as standing, squatting, lying, and leaning.
  • S2: Label the collected human body image sample data. The SSD target detection network calibrates the human body region during human body detection, so the human body regions in the human body image sample data must be provided during training.
  • The SSD target detection network can be constructed on the tensorflow framework, with inception_v2 as the feature extractor.
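SSD-style training relates the labeled human body regions to the network's default boxes by their overlap, conventionally measured as intersection-over-union (IoU). A minimal sketch of that overlap measure, assuming `(x1, y1, x2, y2)` box coordinates (an illustrative convention, not taken from the document):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes — the
    overlap score SSD-style training typically uses when matching
    default boxes to labeled human-body regions."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

Default boxes whose IoU with a labeled region exceeds a threshold (commonly around 0.5) are treated as positive matches during training.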
  • The fall recognition module may specifically include a convolutional neural network in deep learning. Before image recognition, the fall recognition module can perform convolutional neural network training through the following process:
  • S1: Collect preprocessed sample data containing human bodies (images in which the human body occupies more than 80% of the picture, i.e., the human body region images detected by the human body detection module).
  • S2: Classify the sample data. The positive samples (i.e., the positive sample data) contain all non-fallen human body pictures, that is, pictures in which the human body state is standing, sitting, squatting, leaning, etc.; the negative samples (i.e., the negative sample data) contain pictures taken after a person has fallen, that is, pictures in which the human body state is lying down, lying prone, and the like.
  • S3: Preprocess the images in the sample data. Specifically, all image data can be converted to a specified size, for example, 299 × 299 pixels.
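The resize step above can be sketched with a pure-Python nearest-neighbour resampler over a grid of pixel values; a real pipeline would use a library resampler (and typically bilinear interpolation), so this function is only an assumption-laden illustration of the 299 × 299 conversion.

```python
TARGET_SIZE = 299  # the input resolution inception_v3 expects

def resize_nearest(pixels, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2-D pixel grid (a list of rows)
    to size x size, mapping each output cell back to its source cell."""
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]
```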
  • The fall recognition module may use an inception_v3 network.
  • S4-1: Reduce the inception structures, such as the number of layers, while ensuring recognition accuracy. This simplifies the network structure, improves recognition speed, and saves computing resources.
  • S4-2: Reduce the number of convolution kernels while ensuring recognition accuracy. This shrinks the network, improves recognition speed, and saves computing resources.
  • S5: Input the preprocessed picture sample data into the inception_v3 network for training to obtain a fall recognition network (i.e., a convolutional neural network).
  • When the human body detection module and the fall recognition module are specifically used to perform human body fall detection, the process may include the following:
  • S1: Input the captured image into the SSD target detection network, detect the region where the human body is located, and save the result.
  • S2: Convert all detected human body regions to a specified size, such as 299 × 299 pixels.
  • S3: Input the result obtained in S2 into the trained inception_v3 model and predict in a multi-threaded manner to give a recognition result.
  • S4: Display the fall detection result to determine whether the human body has fallen.
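The multi-threaded prediction step can be sketched with a thread pool that runs the classifier over every detected human-body crop in parallel. The classifier below is a hypothetical stub standing in for the trained inception_v3 model; only the fan-out pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_crop(crop):
    """Stand-in for the fall recognition network: returns True when
    the crop is judged to show a fallen body. Hypothetical stub."""
    return crop.get("state") in {"lying", "prone"}

def detect_falls(crops, max_workers=4):
    """Run the classifier over each detected human-body region in
    parallel, mirroring the multi-threaded prediction step above."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_crop, crops))
```

Threads suit this sketch because a real inference call would release the interpreter while waiting on the model backend; a CPU-bound pure-Python classifier would instead need processes.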
  • By using the target detection algorithm SSD and the image classification algorithm CNN, the above human body fall detection robot can achieve high-precision fall detection and alarm handling from a single-frame image in complicated scenes, overcoming the problem of inaccurate human detection in existing methods. At the same time, since analysis and processing of a video stream is not required and fall detection can be realized from a single-frame image alone, the amount of computation is reduced and detection efficiency is improved; with the movable robot as a carrier, full-range monitoring can be achieved.
  • Through the above scenario example, it is verified that the human body fall detection method and device of the embodiments of the present application, by acquiring a single-frame target image instead of a video stream for analysis, first identifying images containing a human body with a target detection network based on a target detection algorithm, and then classifying the human body state in the target image with a convolutional neural network based on a classification algorithm to identify the specific state of the human body, indeed solve the technical problems of poor fall recognition accuracy and low efficiency in existing methods and achieve the technical effect of accurately and efficiently recognizing a fallen state.
  • The devices or modules set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above devices are described with their functions divided into various modules. When the present application is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or modules implementing the same function may be implemented by a combination of multiple sub-modules.
  • The device embodiments described above are merely illustrative. The division into modules is only a division by logical function; in actual implementation there may be other ways of division, for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The controller may be implemented, for example, in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, and embedded microcontrollers, by logically programming the method steps.
  • the application can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, classes, and the like that perform particular tasks or implement particular abstract data types.
  • the present application can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.
  • The present application can be implemented by means of software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in portions of the embodiments.


Abstract

A human body fall detection method and device, the method comprising: acquiring a target image (S11); performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body (S12); and, when the target image is determined to be an image containing a human body, performing fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state (S13). Because the method acquires a single-frame target image rather than a video stream for analysis, uses a target detection network based on a target detection algorithm to identify images containing a human body, and then classifies the human body state in the target image with a convolutional neural network based on a classification algorithm to identify the state of the human body, it solves the technical problems of poor accuracy and low efficiency in existing human fall recognition methods and achieves the technical effect of accurately and efficiently recognizing a fallen state.

Description

Human body fall detection method and device
This application claims priority to Chinese patent application No. 201711468689.0, filed on December 29, 2017 and entitled "Human body fall detection method and device", the entire contents of which are incorporated herein.
Technical Field
The present application relates to the field of human body detection technology, and in particular to a human body fall detection method and device.
Background
As the aging of society becomes increasingly serious, people pay more and more attention to the daily safety of the elderly. For example, it is desirable to detect in time whether an elderly person at home alone has had an accident such as a fall. Therefore, in real life, how to detect a fall effectively and accurately, so that the elderly can be rescued in time, has become an important problem.
At present, to detect falls, most existing methods deploy multiple cameras in the area of human activity in advance to collect video stream data, and then judge whether a human body has fallen by analyzing the changes of the human body in the video stream. When such methods are implemented, the need to process and analyze the video stream data results in a heavy workload and low efficiency. In addition, judging whether a human body has fallen by analyzing human body changes is a relatively complicated process with relatively large errors. In summary, existing methods often suffer from poor fall recognition accuracy, large errors, and low efficiency.
In view of the above problems, no effective solution has yet been proposed.
Summary of the Invention
Embodiments of the present application provide a human body fall detection method and device, so as to solve the technical problems of poor fall recognition accuracy, large errors, and low efficiency in existing methods, and to achieve the technical effect of accurately and efficiently recognizing a fallen state.
An embodiment of the present application provides a human body fall detection method, comprising:
acquiring a target image;
performing human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body;
when the target image is determined to be an image containing a human body, performing fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.
In one embodiment, acquiring the target image comprises:
collecting sound information in a target area;
determining a target orientation according to the sound information;
moving a camera according to the target orientation to acquire the target image.
In one embodiment, the target detection network is established in the following manner:
acquiring human body image sample data, wherein the human body image sample data includes multiple images containing human body states;
labeling the human body regions in the images of the human body image sample data;
training with the labeled human body image sample data to obtain a target detection network based on a target detection algorithm.
In one embodiment, the human body states include: a state in which the human body is standing, a state in which the human body is sitting, a state in which the human body is lying down, a state in which the human body is squatting, a state in which the human body is leaning, and a state in which the human body is lying prone.
In one embodiment, when the target image is determined to be an image that does not contain a human body, the method further comprises: reacquiring a target image.
In one embodiment, the convolutional neural network is established in the following manner:
extracting images that meet requirements from the human body image sample data as preprocessed sample data;
dividing the images in the preprocessed sample data into positive sample data and negative sample data according to the human body state in the images, wherein the images in the positive sample data include at least one of the following: an image containing a standing human body, an image containing a sitting human body, an image containing a squatting human body, and an image containing a leaning human body; and the images in the negative sample data include at least one of the following: an image containing a human body lying down and an image containing a human body lying prone;
training with the positive sample data and the negative sample data to establish the convolutional neural network for recognizing human body state types.
In one embodiment, the images that meet requirements include: images in which the human body region occupies more than 80% of the image.
An embodiment of the present application further provides a human body fall detection device, comprising:
an acquisition module, configured to acquire a target image;
a human body detection module, configured to perform human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body;
a fall recognition module, configured to, when the target image is determined to be an image containing a human body, perform fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.
In one embodiment, the acquisition module comprises:
a sound collector, configured to collect sound information in a target area;
a locator, configured to determine a target orientation according to the sound information;
a moving device and a camera, wherein the camera is disposed on the moving device, the moving device is configured to move the camera according to the target orientation, and the camera is configured to acquire a target image.
In one embodiment, the device further comprises an alarm module, configured to sound an alarm and/or send alert information when the human body in the target image is determined to be in a fallen state.
In the embodiments of the present application, a single-frame target image rather than a video stream is acquired for analysis; a target detection network based on a target detection algorithm first identifies images containing a human body, and a convolutional neural network based on a classification algorithm then classifies the human body state in the target image to identify the specific state of the human body. This solves the technical problems of poor fall recognition accuracy, large errors, and low efficiency in existing methods and achieves the technical effect of accurately and efficiently recognizing a fallen state.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a human body fall detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a human body fall detection device according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an electronic device based on the human body fall detection method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a human body fall detection robot designed, in one scenario example, by applying the human body fall detection method and device provided by the embodiments of the present application;
FIG. 5 is a schematic flowchart of human body fall detection performed by the human body fall detection robot in one scenario example.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
When existing methods are implemented, they mostly collect video stream data and analyze and process it; because of the large amount of data to be analyzed, they occupy many resources and are inefficient. In addition, most existing methods detect whether a human body has fallen by analyzing human body changes, a recognition approach that is itself relatively complicated, less accurate, and prone to errors. In summary, existing methods often suffer from poor fall recognition accuracy and low efficiency. Addressing the root causes of these technical problems, the present application considers acquiring single-frame image data rather than video stream data for analysis, effectively reducing the amount of data to be processed. Furthermore, exploiting the characteristics and advantages of image data, it judges whether a human body has fallen by analyzing the human body state in the image rather than human body changes, thereby solving the technical problems of poor fall recognition accuracy, large errors, and low efficiency in existing methods, and achieving the technical effect of accurately and efficiently recognizing a fallen state.
Based on the above line of thought, an embodiment of the present application provides a human body fall detection method. Refer to FIG. 1, a schematic flowchart of the human body fall detection method according to an embodiment of the present application. The human body fall detection method provided by the embodiment of the present application may include the following steps.
S11: Acquire a target image.
In this embodiment, to reduce the amount of computation and the occupation of computing resources, a single-frame target image may be acquired for subsequent analysis and processing, rather than the video stream collected by existing methods. Compared with a video stream, subsequent analysis, detection, and recognition of a single-frame target image only needs to process a single frame, which effectively reduces the amount and cost of computation and increases the recognition speed.
In one embodiment, to further reduce the workload of the subsequent human body detection stage and avoid repeated image acquisitions just to obtain an image containing a human body, effective images may be acquired preferentially as target images. Here, an effective image may be understood as an image containing a human body; correspondingly, an image not containing a human body may be understood as an invalid image. In this way, repeated acquisitions of target images to obtain a usable image containing a human body are avoided, which helps improve processing efficiency.
In one embodiment, to efficiently acquire the above effective image, acquiring the target image may include the following:
S11-1: Collect sound information in the target area;
S11-2: Determine a target orientation according to the sound information;
S11-3: Move the camera according to the target orientation to acquire the target image.
In this embodiment, the target orientation may specifically be the direction of the sound source. There is a relatively high probability of human movement in that direction. Therefore, compared with other orientations, the target orientation offers a greater probability of obtaining an image containing a human body, i.e., an effective image.
In this embodiment, a microphone array may be used as the sound collector to collect sound information in the target area; a locator then determines the direction of the sound source according to the collected sound information and takes that direction as the target orientation. Of course, it should be noted that the microphone array listed above is only for better explaining the embodiments of the present application. In specific implementations, other suitable sound collectors may also be selected according to the specific situation.
In this embodiment, the camera may be mounted on a moving device, i.e., the camera is movable rather than fixed within the target area. For example, the camera may be mounted on a moving device composed of pulleys and a motor. In this way, the camera can move flexibly within the target area via the moving device, which effectively enlarges the area over which target images can be collected and allows more target images to be acquired over a larger detection range. That is, the way cameras are used in the embodiments of the present application differs from that in existing methods. Specifically, existing methods fix the camera at a certain position in the target area to collect video stream data. A single camera used in that way can cover only a limited range, and improving the total detection range requires deploying cameras at multiple positions in the target area, which increases implementation cost. In contrast, the embodiments of the present application mount the camera on a moving device, so that the camera can be moved in real time as needed to acquire target images at different positions in the target area; one camera or a small number of cameras can thus cover a relatively large range, reducing implementation cost. Meanwhile, because the camera can move, its angle and distance relative to the human body can be adjusted according to the specific situation, so that higher-quality target images can be acquired for more accurate subsequent fall recognition. Of course, it should be noted that the moving devices listed above are only for better explaining the embodiments of the present application. In specific implementations, other movable structures, such as a movable robot or a remote-controlled vehicle, may also be selected as the moving device according to the specific situation and accuracy requirements, so that the position of the camera can be moved flexibly. The present application is not limited in this respect.
In this embodiment, the microphone array may first collect sound information in the target area; the locator determines the source direction of the sound and takes it as the direction where human activity may exist, i.e., the target orientation; the moving device then moves the camera to the source of the sound according to the determined target orientation, so that a relatively high-quality effective image can be acquired with an ordinary camera.
S12: Perform human body detection on the target image through a target detection network to determine whether the target image is an image containing a human body.
In this embodiment, after the target image is acquired, human body detection is first performed on it to determine whether the target image to be analyzed is an image containing a human body, i.e., an effective image, so that only effective images undergo the next step of fall recognition. Images not containing a human body are treated as invalid images and are not passed to fall recognition. By excluding images that do not contain a human body in advance, meaningless fall recognition on such images is avoided, the amount of data processed in fall recognition is reduced, and the processing rate is further improved.
In one embodiment, when the target image is determined to be an image that does not contain a human body, a new target image may be acquired, so that areas of the target area where human activity may exist are monitored in real time.
In one embodiment, considering that the acquired data to be analyzed is a single-frame image and considering the specific characteristics of image analysis, in order to quickly and accurately determine whether the target image contains a human body, a target detection network based on a target detection algorithm may be used to perform human body detection on the acquired target image.
In one embodiment, before step S12 is performed, the target detection network for human body detection may be established in advance as follows:
S1: Collect human body image sample data, the sample data including human body images in different states;
S2: Label the human body regions in the human body image sample data;
S3: Train with the labeled human body image sample data to obtain a target detection network based on a target detection algorithm.
In this embodiment, the target detection algorithm may specifically be a deep-learning-based detection algorithm, also known as the SSD (Single Shot MultiBox Detector) algorithm. The core of the algorithm is to use convolution kernels on feature maps to predict the class scores and offsets of a series of default bounding boxes, so that whether the target image to be detected is an effective image containing a human body can be determined quickly and accurately.
In this embodiment, to support subsequent fall recognition, the human body image sample data is required to include multiple images of human body states in different conditions.
In one embodiment, to fully account for a variety of human body states, the human body states may specifically include: standing, sitting, lying, squatting, leaning, lying prone, and so on. Thus, the target detection algorithm can learn from multiple images containing different human body states, so that a network capable of simultaneously detecting and recognizing images containing different human body states can be established.
In this embodiment, the SSD target detection network may be used to calibrate the human body regions in the images of the human body image sample data, so that training related to human body region feature recognition can be performed subsequently.
In one embodiment, before training with the labeled human body image sample data, the SSD target detection network, i.e., the initial model for target detection, may be constructed. Specifically, the SSD target detection network may be constructed on the tensorflow framework, with inception_v2 as the feature extractor.
In one embodiment, training with the labeled human body image sample data to obtain the target detection network based on the target detection algorithm may include: using the labeled human body image sample data as input data to train the SSD target detection network, i.e., the initial model for target detection, to obtain a trained target detection network; and then adjusting and optimizing the trained target detection network according to the human body image sample data and accuracy requirements, to obtain the SSD network for human body detection, i.e., the target detection network based on the target detection algorithm.
S13: When the target image is determined to be an image containing a human body, perform fall recognition on the target image through a convolutional neural network to determine whether the human body in the target image is in a fallen state.
In one embodiment, to quickly and accurately recognize the human body state from an image containing a human body, for example, to distinguish whether the human body is in a fallen state or not, a convolutional neural network may be used to perform fall recognition on the target image to determine whether the human body in the target image is in a fallen state.
In this embodiment, drawing on ideas related to image classification algorithms (CNN), the trained convolutional neural network may be used as a fall recognition model, with the target image determined to contain a human body as input data; the fall recognition model recognizes whether the human body in the target image is in a fallen state, so that whether a person has fallen can be judged from a single-frame image.
In one embodiment, before S13 is performed, a convolutional neural network with high fall recognition accuracy and fast recognition speed may be established in advance as follows:
S1: Acquire human body image sample data, the sample data including human body images in different states;
S2: Extract images that meet requirements from the human body image sample data as preprocessed sample data;
S3: Divide the images in the preprocessed sample data into positive sample data and negative sample data according to the human body state in the images, wherein the images in the positive sample data include at least one of the following: an image containing a standing human body, an image containing a sitting human body, an image containing a squatting human body, and an image containing a leaning human body; and the images in the negative sample data include at least one of the following: an image containing a human body lying down and an image containing a human body lying prone;
S4: Train with the positive sample data and the negative sample data to establish the convolutional neural network for recognizing human body state types.
In one embodiment, since the human body image sample data already includes multiple images containing human body states in order to establish an accurate target detection network based on the target detection algorithm, the images that meet requirements may be extracted from the human body image sample data as the preprocessed sample data.
In this embodiment, after the preprocessed sample data is obtained, the images are first classified according to the two states of fallen and non-fallen. Specifically, images in the preprocessed sample data representing non-fallen states, including images of standing, sitting, squatting, and leaning human bodies, may be divided into the positive sample data, i.e., the positive image data set; images representing fallen states, including images of human bodies lying down and lying prone, may be divided into the negative sample data, i.e., the negative image data set. In this way, training for recognizing the fallen and non-fallen states of the human body can subsequently be performed with the corresponding two kinds of sample data, so as to establish a convolutional neural network with high recognition accuracy.
In one embodiment, training with the positive sample data and the negative sample data to establish the convolutional neural network for recognizing human body state types may include: constructing an initial convolutional neural network; and using the positive and negative sample data as input data to train the initial convolutional neural network to recognize the fallen and non-fallen states of the human body, so as to obtain a convolutional neural network with high recognition accuracy and fast recognition speed. The convolutional neural network can then accurately recognize whether the human body state in the target image corresponds to a fallen state. If the recognized human body state in the target image corresponds to a fallen state, the human body can be judged to be in a fallen state; if it corresponds to a non-fallen state, the human body can be judged not to be fallen.
In one embodiment, the process of establishing the convolutional neural network may further include:
S1: Acquire image sample data that does not contain a human body;
S2: Use the image sample data that does not contain a human body to perform false detection training on the convolutional neural network.
In this embodiment, through the above false detection training, target images that do not contain a human body can first be recognized and filtered out, improving the processing efficiency of the convolutional neural network during fall detection.
In the embodiments of the present application, compared with the prior art, a single-frame target image rather than a video stream is acquired for analysis; a target detection network based on a target detection algorithm first identifies images containing a human body, and a convolutional neural network based on a classification algorithm then classifies the human body state in the target image to identify the specific state of the human body. This solves the technical problems of poor fall recognition accuracy, large errors, and low efficiency in existing methods and achieves the technical effect of accurately and efficiently recognizing a fallen state.
In one embodiment, to extract from the human body image sample data the preprocessed sample data suitable for fall recognition training, the images that meet requirements may specifically include: images in which the human body region occupies more than 80% of the image. In this way, sample data suitable for fall recognition training can be extracted from the human body image sample data, avoiding re-collecting sample data for fall recognition, reducing training cost, and improving learning efficiency.
In one embodiment, the initial convolutional neural network may specifically be the inception_v3 network, a convolutional neural network suited to image recognition. Of course, it should be noted that the convolutional neural network listed above is only for better explaining the embodiments of the present application. In specific implementations, other suitable convolutional neural networks may also be selected according to the specific situation and the specific features to be recognized. The present application is not limited in this respect.
In one embodiment, before training with the positive and negative sample data to establish the convolutional neural network for recognizing human body state types, the method further includes preprocessing the images in the positive and negative sample data according to the initial convolutional neural network, so that they match the initial convolutional neural network. Specifically, for example, when the initial convolutional neural network is the inception_v3 network, the preprocessing may include transforming the images in the positive and negative sample data to a specified size, for example, 299 × 299 pixels.
In one embodiment, it is further considered that fall recognition with the convolutional neural network actually only needs to distinguish two types, i.e., the fallen state and the non-fallen state of the human body. Therefore, according to the complexity of the classification the convolutional neural network has to perform, and in order to improve processing efficiency and reduce the occupation and waste of computing resources, the convolutional neural network may first be simplified when the initial network is established. The simplification may specifically include: reducing the number of layers of the convolutional neural network, and/or reducing the number of convolution kernels. That is, the network may be simplified by reducing the number of layers alone, reducing the number of convolution kernels alone, or both, so as to reduce the occupation of computing resources and improve processing efficiency while maintaining recognition accuracy.
In one embodiment, when the convolutional neural network is the inception_v3 network, the simplification of the inception_v3 network may specifically include: pruning the inception_v3 network from 11 layers (or structures) to 6 or 5 layers, and/or pruning the number of convolution kernels in the inception_v3 network, thereby obtaining a simplified convolutional neural network.
In one embodiment, the simplified convolutional neural network may be established as follows:
S1: Simplify the existing inception_v3 network.
In this embodiment, specifically, the last 5 inception structures of the inception_v3 network may be deleted to obtain the simplified inception_v3 network.
S2: Train the simplified inception_v3 network with the preprocessed sample data to obtain a parameter model Fa1 usable for fall detection.
S3: Reduce the number of convolution kernels of all convolutional layers of the simplified inception_v3 network to two-thirds of the original, and modify the parameter model Fa1 accordingly so that it fits the network with fewer convolution kernels.
S4: Continue training the modified parameter model Fa1 with the preprocessed sample data and fine-tune it to obtain a parameter model Fa2 usable for fall detection.
S5: Verify the parameter model Fa2 and, according to the verification result, adjust it following the training and fine-tuning operations of S4 to obtain the simplified convolutional neural network.
In this embodiment, the verification may specifically include: comparing the fall detection accuracy of the network before and after the reduction of convolution kernels. If the accuracy does not drop significantly, the above reduction of convolution kernels and the corresponding training and fine-tuning may continue, to obtain an even leaner convolutional neural network; if the accuracy drops significantly, the training and fine-tuning are stopped, and the previous network and parameter model are used for fall detection, i.e., as the convolutional neural network for fall detection.
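The prune-retrain-verify loop can be sketched as an iterative procedure that keeps shrinking the network until validation accuracy drops significantly, then falls back to the previous round. The callback, tolerance value, and round limit below are illustrative assumptions, not parameters taken from the document.

```python
def prune_until_accuracy_drops(train_and_eval, max_rounds=5, tolerance=0.01):
    """Repeatedly shrink the network (e.g. cut kernels to 2/3) and keep
    the last model whose validation accuracy stays within `tolerance`
    of the previous round, mirroring the Fa1 -> Fa2 verification loop.

    `train_and_eval(round_idx)` is a hypothetical callback that prunes,
    retrains, fine-tunes, and returns that round's accuracy."""
    best_round, best_acc = 0, train_and_eval(0)
    for i in range(1, max_rounds):
        acc = train_and_eval(i)
        if best_acc - acc > tolerance:
            break  # significant drop: stop and keep the previous model
        best_round, best_acc = i, acc
    return best_round, best_acc
```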
在一个实施方式中,确定目标图像中的人体处于跌倒状态后,可以判断目标区域中的人体发生跌倒,进而可以发出警报,以提示目标区域中有人跌倒。其中,上述发出警报具体可以包括通过蜂鸣器发出警报声以提醒有人跌倒;也可以通过通讯设备向目标区域的负责人或者周边的医护人员发送报警信息(例如,警报短信),请求及时医治等等。当然,上述所列举的多种发出警报的方式只是为了更好地说明本申请实施方式。具体实施时,也可以根据具体情况选择其他合适的发出警报的方式进行报警。对此,本申请不作限定。
从以上的描述中,可以看出,本申请实施例提供的人体跌倒检测方法,通过获取单帧的目标图像而不是视频流进行分析处理,并利用基于目标检测算法的目标检测网络先识别出包含有人体的图像,再通过基于分类算法的卷积神经网络对目标图像中的人体状态进行分类识别,以识别出目标图像中人体的具体状态,从而解决了现有方法中存在的识别跌倒准确度差、误差大、效率低的技术问题,达到了精确、高效地识别出跌倒状态的技术效果;又通过采集声音信息以确定目标方位,并根据目标方位移动摄像头以采集有效的目标图像,有效地扩大了跌倒检测的检测范围,提高了获取有效目标图像的准确度,提高了检测效果,改善了用户体验;还通过获取包含多种人体状态的图像作为样本数据,以建立目标检测网络、卷积神经网络,提高了根据单帧图像识别人体跌倒的精度;还通过根据所要识别的状态类型的复杂度,对卷积神经网络进行了相应的简化改进,提高了实施效率、降低了对运算资源的占用。
基于同一发明构思,本发明实施例中还提供了一种人体跌倒检测装置,如下面的实施例所述。由于人体跌倒检测装置解决问题的原理与人体跌倒检测方法相似,因此装置的实施可以参见人体跌倒检测方法的实施,重复之处不再赘述。以下所使用的,术语"单元"或者"模块"可以是实现预定功能的软件和/或硬件的组合。尽管以下实施例所描述的装置较佳地以软件来实现,但是硬件,或者软件和硬件的组合的实现也是可能并被构想的。请参阅图2,是本申请实施例提供的人体跌倒检测装置的一种组成结构示意图,该装置具体可以包括:获取模块21、人体检测模块22、跌倒识别模块23,下面对该结构进行具体说明。
获取模块21,具体可以用于获取目标图像;
人体检测模块22,具体可以用于通过目标检测网络,对所述目标图像进行人体检测,以确定所述目标图像是否为包含人体的图像;
跌倒识别模块23,具体可以用于在确定所述目标图像为包含人体的图像的情况下,通过卷积神经网络,对所述目标图像进行跌倒识别,以确定所述目标图像中的人体是否处于跌倒状态。
在本实施方式中,需要说明的是,上述人体跌倒检测装置具体可以是一种能够实现人体跌倒检测的人体跌倒检测机器人。上述人体跌倒检测机器人具体可以应用于家庭、医院、商场等多种场所,以实时检测上述场所,及时发现场所中的人员跌倒,以便及时地进行报警,及时进行相关救助。
在一个实施方式中,为了能够扩大检测范围,高效地获取有效目标图像,所述获取模块21具体可以包括以下结构单元:
声音采集器,具体可以用于采集目标区域中的声音信息;
定位器,具体可以用于根据所述声音信息,确定目标方位;
移动装置和摄像头,其中,所述摄像头具体可以设于所述移动装置上,所述移动装置具体可以用于根据所述目标方位,移动所述摄像头,所述摄像头具体可以用于获取目标图像。
在本实施方式中,上述移动装置具体可以包括滑轮和电机。如此,具体实施时,可以通过带有滑轮和电机的移动装置,带动摄像头向目标方位移动,以更好地获取有效的目标图像。当然,需要说明的是,上述所列举的移动装置只是为了更好地说明本申请实施方式。具体实施时,上述移动装置也可以是其他类型的可移动设备,例如,可移动的机器人、遥控汽车等等。对此,本申请不作限定。
在本实施方式中,上述有效的目标图像具体可以是包含有人体的图像。通过上述移动装置,可以根据目标方位移动摄像头,尽可能地获取有效目标图像,从而可以减少人体检测模块22的工作量,提高工作效率。
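根据目标方位移动摄像头时,一种最简单的做法是先计算移动装置需要的转向角,再向目标方位前进。以下Python草图仅示意转向角的计算(函数接口与角度约定均为说明用的假设,并非真实的电机控制API):

```python
def turn_angle(azimuth_deg, heading_deg):
    """根据声源定位得到的目标方位(azimuth_deg)和移动装置当前朝向
    (heading_deg),计算归一化到(-180, 180]区间的转向角:
    正值表示向一侧转动,负值表示向另一侧转动,转向后前进即可靠近目标。"""
    turn = (azimuth_deg - heading_deg) % 360.0
    if turn > 180.0:
        turn -= 360.0
    return turn
```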
在一个实施方式中,为了在检测到人体跌倒后及时进行报警以对跌倒人员进行及时救治,所述装置具体还可以包括报警模块,用于发出警报。
在一个实施方式中,上述报警模块具体可以包括蜂鸣器,如此,所述报警模块具体实施时,可以通过蜂鸣器在确定目标图像中的人体处于跌倒状态的情况下发出警报。
在一个实施方式中,上述报警模块具体还可以包括信号发送器等通讯设备,如此,所述报警模块具体实施时,可以通过信号发送器等通讯设备在确定目标图像中的人体处于跌倒状态的情况下向相关负责人(例如监护人或者商场保安)或者周边医护人员发送报警信息,以提示相关负责人或者周边医护人员有人跌倒,尽快救治。
在一个实施方式中,所述装置具体还可以包括目标检测网络建立模块,目标检测网络建立模块具体实施时可以按照以下程序执行:获取人体图像样本数据,其中,所述人体图像样本数据包括多个包含人体状态的图像;标注所述人体图像样本数据的图像中的人体区域;利用标注后的人体图像样本数据进行训练,以得到基于目标检测算法的目标检测网络。
在一个实施方式中,所述人体状态具体可以包括:人体站着的状态、人体坐着的状态、人体躺着的状态、人体蹲着的状态、人体倾斜着的状态、人体趴着的状态等。当然,需要说明的是,上述所列举的人体状态只是为了更好地说明本申请实施方式。具体实施时,也可以根据具体情况和要求,引入除上述所列举的状态以外的其他状态作为人体状态。对此,本申请不作限定。
在一个实施方式中,上述人体检测模块22与获取模块21相连,具体实施时,人体检测模块22可以在确定所述目标图像为不包含人体的图像的情况下,发送信息至获取模块21,通过获取模块21重新获取目标图像。
在一个实施方式中,所述装置具体还可以包括卷积神经网络建立模块,用于建立用于识别人体状态类型的卷积神经网络,其中,所述卷积神经网络建立模块具体可以包括:
获取单元,具体可以用于获取人体图像样本数据,其中,所述人体图像样本数据包括多个包含人体状态的图像;
提取单元,具体可以用于从所述人体图像样本数据中提取符合要求的图像作为预处理样本数据;
划分单元,具体可以用于根据所述预处理样本数据的图像中的人体状态,将所述预处理样本数据中的图像划分为正样本数据和负样本数据,其中,所述正样本数据中的图像包括以下至少之一:包含有人体站着的状态的图像、包含有人体坐着的状态的图像、包含有人体蹲着的状态的图像、包含有人体倾斜着的状态的图像;所述负样本数据中的图像包括以下至少之一:包含有人体躺着的状态的图像、包含有人体趴着的状态的图像;
建立单元,具体可以用于利用所述正样本数据、所述负样本数据进行训练,以建立用于识别人体状态类型的卷积神经网络。
在一个实施方式中,所述卷积神经网络建立模块具体还可以包括:
误检测训练单元,具体可以用于获取不包含人体的图像样本数据;并利用所述不包含人体的图像样本数据,对所述卷积神经网络进行误检测训练。
在本实施方式中,为了建立并训练出准确度更高的卷积神经网络,所述符合要求的图像具体可以包括:人体区域的占图比大于80%的图像等。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于系统实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
需要说明的是,上述实施方式阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。为了描述的方便,在本说明书中,描述以上装置时以功能分为各种单元分别描述。当然,在实施本申请时可以把各单元的功能在同一个或多个软件和/或硬件中实现。
此外,在本说明书中,诸如第一和第二这样的形容词仅可以用于将一个元素或动作与另一元素或动作进行区分,而不必要求或暗示任何实际的这种关系或顺序。在环境允许的情况下,参照元素或部件或步骤(等)不应解释为局限于仅元素、部件、或步骤中的一个,而可以是元素、部件、或步骤中的一个或多个等。
从以上的描述中,可以看出,本申请实施例提供的人体跌倒检测装置,通过获取单帧的目标图像而不是视频流进行分析处理,并先利用基于目标检测算法的目标检测网络识别出包含有人体的图像,再通过基于分类算法的卷积神经网络对目标图像中的人体状态进行分类,以识别出目标图像中人体的具体状态,从而解决了现有方法中存在的识别跌倒准确度差、效率低的技术问题,达到了精确、高效地识别出跌倒状态的技术效果;又通过采集声音信息以确定目标方位,并根据目标方位移动摄像头以采集有效的目标图像,有效地扩大了跌倒检测的检测范围,提高了获取有效目标图像的准确度,改善了检测效果。
本申请实施方式还提供了一种电子设备,具体可以参阅图3所示的基于本申请实施方式提供的人体跌倒检测方法的电子设备组成结构示意图,所述电子设备具体可以包括输入设备31、处理器32、存储器33。其中,所述输入设备31具体可以用于接收所获取的目标图像。所述处理器32具体可以用于通过目标检测网络,对所述目标图像进行人体检测,以确定所述目标图像是否为包含人体的图像;在确定所述目标图像为包含人体的图像的情况下,通过卷积神经网络,对所述目标图像进行跌倒识别,以确定所述目标图像中的人体是否处于跌倒状态。所述存储器33具体可以用于存储所述目标图像、所述目标检测网络、所述卷积神经网络,以及检测过程中产生的中间数据等。
在本实施方式中,所述输入设备具体可以是用户和计算机系统之间进行信息交换的主要装置之一。所述输入设备可以包括键盘、鼠标、摄像头、扫描仪、光笔、手写输入板、语音输入装置等;输入设备用于把原始数据和处理这些数据的程序输入到计算机中。所述输入设备还可以获取接收其他模块、单元、设备传输过来的数据。所述处理器可以按任何适当的方式实现。例如,处理器可以采取例如微处理器或处理器以及存储可由该(微)处理器执行的计算机可读程序代码(例如软件或固件)的计算机可读介质、逻辑门、开关、专用集成电路(Application Specific Integrated Circuit, ASIC)、可编程逻辑控制器和嵌入微控制器的形式等等。所述存储器具体可以是现代信息技术中用于保存信息的记忆设备。所述存储器可以包括多个层次,在数字系统中,只要能保存二进制数据的都可以是存储器;在集成电路中,一个没有实物形式的具有存储功能的电路也叫存储器,如RAM、FIFO等;在系统中,具有实物形式的存储设备也叫存储器,如内存条、TF卡等。
在本实施方式中,该电子设备具体实现的功能和效果,可以与其它实施方式对照解释,在此不再赘述。
本申请实施方式中还提供了一种基于人体跌倒检测方法的计算机存储介质,所述计算机存储介质存储有计算机程序指令,在所述计算机程序指令被执行时实现:获取目标图像;通过目标检测网络,对所述目标图像进行人体检测,以确定所述目标图像是否为包含人体的图像;在确定所述目标图像为包含人体的图像的情况下,通过卷积神经网络,对所述目标图像进行跌倒识别,以确定所述目标图像中的人体是否处于跌倒状态。
在本实施方式中,上述存储介质包括但不限于随机存取存储器(Random Access Memory, RAM)、只读存储器(Read-Only Memory, ROM)、缓存(Cache)、硬盘(Hard Disk Drive, HDD)或者存储卡(Memory Card)。所述存储器可以用于存储计算机程序指令。网络通信单元可以是依照通信协议规定的标准设置的,用于进行网络连接通信的接口。
在本实施方式中,该计算机存储介质存储的程序指令具体实现的功能和效果,可以与其它实施方式对照解释,在此不再赘述。
在一个具体实施场景示例中,应用本申请提供的人体跌倒检测方法和装置设计相应的人体跌倒检测机器人,并应用该人体跌倒检测机器人进行具体的人体跌倒检测。具体实施过程可以参阅以下内容。
在本实施方式中,上述人体跌倒检测机器人具体可以参阅图4所示的在一个场景示例中应用本申请实施方式提供的人体跌倒检测方法和装置设计的人体跌倒检测机器人的组成结构示意图。该机器人具体可以使用声源定位模块定位人体大致方位(即目标方位),再利用摄像头采集数据(即目标图像),通过深度学习算法实现基于单帧图像的人体跌倒检测。其中,所述跌倒检测机器人具体可以包括可移动式机器人本体12、摄像头模块13、报警模块14(可选)、声源定位模块15(可选)、人体检测模块16和跌倒识别模块17等多个功能模块。
具体实施时,上述声源定位模块15具体可以用于判断人体大致方位,并利用摄像头模块13拍摄单帧图像,人体检测模块16和跌倒识别模块17具体可以用于根据拍摄的图像判断人是否跌倒,并将结果传输给可移动式机器人本体12;若跌倒则可移动式机器人本体12可以通过控制报警模块14进行报警。
其中,所述的可移动式机器人本体12至少包括:机器人主体、电机和滑轮等结构。所述的摄像头模块13具体可以用于采集单张图像,并送入人体检测模块16用以判断是否存在人体(即判断图像是否是包含人体的图像)。所述的报警模块14至少可以包含手机通信功能和110报警功能。如此,具体实施时,可以利用手机通信功能实现跌倒信息的发送和图片信息的发送,通过110报警功能实现110报警以便及时救助。所述的声源定位模块15具体可以通过麦克风阵列判断声音的来源方向,用以方便地寻找人。所述的人体检测模块16具体可以通过深度学习中的SSD目标检测算法实现人体检测。所述的跌倒识别模块17通过深度学习中的卷积神经网络实现跌倒状态识别。
在本实施方式中,需要说明的是,上述人体跌倒检测机器人可以认为是一种具体的人体跌倒检测装置,其实施的主要原理同人体跌倒检测装置相同。
具体实施时,可以参阅图5所示的在一个场景示例中应用人体跌倒检测机器人进行人体跌倒检测的流程示意图,利用人体跌倒检测机器人进行人体跌倒检测。具体实施时,可以包括以下步骤:
S1:可选的,通过可移动式机器人结合声源定位模块寻找人的大致方向;
S2:通过摄像头模块采集单帧图像,并传入可移动式机器人;
S3:通过可移动式机器人本体将采集到的单帧图像传入人体检测模块;
S4:通过人体检测模块判断采集的图像中是否有人存在。如有,则继续S5;若没有,则返回S1;
S5:将检测到的人体区域送入跌倒识别模块,判断人体是否跌倒;
S6:将识别得到的结果信息传输到可移动式机器人本体;
S7:若跌倒,则继续S8;若没有跌倒,则返回S2;
S8:执行报警,将跌倒的信息和图像传输到连接的手机或者其他终端上。
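上述S1~S8串联各模块的控制循环,可以用如下Python草图示意(locate、capture、detect_human、classify_fall、alarm 等回调均为说明用的假设接口:detect_human 返回检测到的人体区域列表,无人时为空;classify_fall 对单个人体区域返回是否跌倒):

```python
def fall_detection_loop(locate, capture, detect_human, classify_fall, alarm,
                        max_rounds=10):
    """按S1~S8的流程进行人体跌倒检测的控制循环示意:
    定位 → 采集单帧图像 → 人体检测 → 跌倒识别 → 报警。"""
    for _ in range(max_rounds):
        direction = locate()              # S1: 声源定位确定大致方向
        frame = capture(direction)        # S2: 采集单帧图像并传入机器人本体
        regions = detect_human(frame)     # S3/S4: 人体检测
        if not regions:
            continue                      # 无人存在,返回S1重新寻找
        if any(classify_fall(r) for r in regions):  # S5/S6: 跌倒识别
            alarm(frame)                  # S7/S8: 执行报警并发送信息和图像
            return True
    return False
```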
在本实施方式中,上述人体检测模块是基于深度学习中SSD目标检测算法实现的。人体检测模块在进行图像检测之前,可以按照如下流程进行SSD算法训练:
S1:收集包含人体的人体图像样本数据(人占图片的比例不限)。因为需要检测人体区域,且需要检测任何状态下的人体,因此收集的图像数据具体可以包含不同状态下的人体,如站着、蹲着、躺着、倾斜着的人体。
S2:对收集到的人体图像样本数据进行标注。SSD目标检测网络在人体检测时会标定出人体的区域,因此在训练时需要先提供人体图像样本数据中人体的区域。
S3:构建SSD目标检测网络。具体实施时,可以在tensorflow框架上构建SSD目标检测网络,并以inception_v2为特征提取器。
S4:用处理好的人体图像样本数据训练SSD目标检测网络,并利用现有已经训练好的参数模型对其进行微调,得到用于人体检测的SSD网络(即目标检测网络)。
在本实施方式中,上述跌倒识别模块具体可以包括一种深度学习中的卷积神经网络。跌倒识别模块在进行图像识别之前,具体可以通过如下流程进行卷积神经网络训练:
S1:收集包含人体的预处理样本数据(人占图片的比例超过80%,即人体检测模块检测到的人体区域图片)。
S2:构建正负图像数据样本。正样本(即正样本数据)包含所有非跌倒状态的人体图片,即人体状态为站着、蹲着、倾斜着等;负样本(即负样本数据)所包含的图片都是人跌倒后的图片,即人体状态为躺着、趴着等。
S3:预处理图像数据样本中的图像。具体的,可以将所有的图像数据变换到指定大小,例如299×299像素点大小。
S4:构建卷积神经网络。具体的,上述跌倒识别模块可以采用inception_v3网络。
在本实施方式中,需要补充的是,针对跌倒识别的需求,通常使用的inception_v3网络在计算资源上存在浪费。因此在构建inception_v3网络时,对其进行了简化修改,具体简化改进包括以下内容:
S4-1:在保证识别准确率的同时,减少inception结构的个数(即层数),从而简化网络结构,提升识别速度,节约计算资源。
S4-2:在保证识别准确率的同时,减少卷积核个数,从而降低网络大小,提升识别速度,节约计算资源。
S5:将预处理后图片数据样本输入inception_v3网络进行训练,得到跌倒识别网络(即卷积神经网络)。
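上述S2中正、负样本的划分逻辑可以用如下Python草图示意(状态名称取自上文,以图像标识列表表示样本集以及1/0的标签编码均为说明用的假设):

```python
# 非跌倒状态构成正样本,跌倒后状态构成负样本(状态集合取自上文描述)
POSITIVE_STATES = {"站着", "坐着", "蹲着", "倾斜着"}
NEGATIVE_STATES = {"躺着", "趴着"}


def build_samples(annotated_images):
    """将 (图像标识, 人体状态) 列表划分为正样本数据和负样本数据。
    未落入两类状态集合的图像不参与训练。"""
    positives, negatives = [], []
    for name, state in annotated_images:
        if state in POSITIVE_STATES:
            positives.append(name)
        elif state in NEGATIVE_STATES:
            negatives.append(name)
    return positives, negatives
```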
在本实施方式中,具体利用上述人体检测模块和跌倒识别模块进行人体跌倒检测时,可以包括以下内容:
S1:将采集到的图片输入SSD目标检测网络,检测人体所在的区域,并将结果保存。
S2:将检测到的所有人体区域变换成指定大小,如299×299像素点大小。
S3:将S2中得到的结果输入到训练得到的inception_v3模型中,以多线程的方式同时进行预测,给出识别结果。
S4:根据所述识别结果,显示跌倒检测结果,确定人体是否发生跌倒。
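上述S3中"以多线程的方式同时进行预测",可以借助Python标准库的线程池实现,以下为示意草图(model 为假设的可调用对象,代表训练得到的跌倒识别网络,输入单个人体区域、输出识别结果):

```python
from concurrent.futures import ThreadPoolExecutor


def predict_regions(model, regions, workers=4):
    """以多线程的方式同时对检测到的各人体区域进行预测,
    并按输入顺序返回各区域的识别结果列表。"""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, regions))
```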
对上述人体跌倒检测机器人进行多次跌倒检测测试后,分析发现:上述人体跌倒检测机器人由于使用目标检测算法SSD和图像分类算法CNN,可以在复杂场景下,通过单帧图像实现较高精度的跌倒检测,并可以实施报警处理,克服了现有方法中人体检测不准确的问题;同时由于不需要对视频流进行分析处理,仅以单帧图像就能实现跌倒检测,降低了计算量,提高了检测效率;并且以可移动式机器人为载体,可实现全方位的监控。
通过上述场景示例,验证了本申请实施例提供的人体跌倒检测方法和装置,通过获取单帧的目标图像而不是视频流进行分析处理,并先利用基于目标检测算法的目标检测网络识别出包含有人体的图像,再通过基于分类算法的卷积神经网络对目标图像中的人体状态进行分类,以识别出目标图像中人体的具体状态,确实解决了现有方法中存在的识别跌倒准确度差、效率低的技术问题,达到了精确、高效地识别出跌倒状态的技术效果。
尽管本申请内容中提到不同的具体实施例,但是,本申请并不局限于必须是行业标准或实施例所描述的情况等,某些行业标准或者使用自定义方式或实施例描述的实施基础上略加修改后的实施方案也可以实现上述实施例相同、等同或相近、或变形后可预料的实施效果。应用这些修改或变形后的数据获取、处理、输出、判断方式等的实施例,仍然可以属于本申请的可选实施方案范围之内。
虽然本申请提供了如实施例或流程图所述的方法操作步骤,但基于常规或者无创造性的手段可以包括更多或者更少的操作步骤。实施例中列举的步骤顺序仅仅为众多步骤执行顺序中的一种方式,不代表唯一的执行顺序。在实际中的装置或客户端产品执行时,可以按照实施例或者附图所示的方法顺序执行或者并行执行(例如并行处理器或者多线程处理的环境,甚至为分布式数据处理环境)。术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、产品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、产品或者设备所固有的要素。在没有更多限制的情况下,并不排除在包括所述要素的过程、方法、产品或者设备中还存在另外的相同或等同要素。
上述实施例阐明的装置或模块等,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。为了描述的方便,描述以上装置时以功能分为各种模块分别描述。当然,在实施本申请时可以把各模块的功能在同一个或多个软件和/或硬件中实现,也可以将实现同一功能的模块由多个子模块的组合实现等。以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。
本领域技术人员也知道,除了以纯计算机可读程序代码方式实现控制器以外,完全可以通过将方法步骤进行逻辑编程来使得控制器以逻辑门、开关、专用集成电路、可编程逻辑控制器和嵌入微控制器等的形式来实现相同功能。因此这种控制器可以被认为是一种硬件部件,而对其内部包括的用于实现各种功能的装置也可以视为硬件部件内的结构。或者甚至,可以将用于实现各种功能的装置视为既可以是实现方法的软件模块又可以是硬件部件内的结构。
本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构、类等等。也可以在分布式计算环境中实践本申请,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,移动终端,服务器,或者网络设备等)执行本申请各个实施例或者实施例的某些部分所述的 方法。
本说明书中的各个实施例采用递进的方式描述,各个实施例之间相同或相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。本申请可用于众多通用或专用的计算机系统环境或配置中。例如:个人计算机、服务器计算机、手持设备或便携式设备、平板型设备、多处理器系统、基于微处理器的系统、置顶盒、可编程的电子设备、网络PC、小型计算机、大型计算机、包括以上任何系统或设备的分布式计算环境等等。
虽然通过实施例描绘了本申请,本领域普通技术人员知道,本申请有许多变形和变化而不脱离本申请的精神,希望所附的实施方式包括这些变形和变化而不脱离本申请。

Claims (8)

  1. 一种人体跌倒检测方法,其特征在于,包括:
    获取目标图像;
    通过目标检测网络,对所述目标图像进行人体检测,以确定所述目标图像是否为包含人体的图像;
    在确定所述目标图像为包含人体的图像的情况下,通过卷积神经网络,对所述目标图像进行跌倒识别,以确定所述目标图像中的人体是否处于跌倒状态。
  2. 根据权利要求1所述的方法,其特征在于,所述获取目标图像,包括:
    采集目标区域中的声音信息;
    根据所述声音信息,确定目标方位;
    根据所述目标方位,移动摄像头,以获取所述目标图像。
  3. 根据权利要求1所述的方法,其特征在于,按照以下方式建立所述卷积神经网络:
    获取人体图像样本数据,其中,所述人体图像样本数据包括多个包含人体状态的图像;
    从所述人体图像样本数据中提取符合要求的图像作为预处理样本数据;
    根据所述预处理样本数据的图像中的人体状态,将所述预处理样本数据中的图像划分为正样本数据和负样本数据,其中,所述正样本数据中的图像包括以下至少之一:包含有人体站着的状态的图像、包含有人体坐着的状态的图像、包含有人体蹲着的状态的图像、包含有人体倾斜着的状态的图像;所述负样本数据中的图像包括以下至少之一:包含有人体躺着的状态的图像、包含有人体趴着的状态的图像;
    利用所述正样本数据、所述负样本数据进行训练,以建立用于识别人体状态类型的卷积神经网络。
  4. 根据权利要求3所述的方法,其特征在于,建立所述卷积神经网络的过程中,所述方法还包括:
    获取不包含人体的图像样本数据;
    利用所述不包含人体的图像样本数据,对所述卷积神经网络进行误检测训练。
  5. 一种人体跌倒检测装置,其特征在于,包括:
    获取模块,用于获取目标图像;
    人体检测模块,用于通过目标检测网络,对所述目标图像进行人体检测,以确定所述目标图像是否为包含人体的图像;
    跌倒识别模块,用于在确定所述目标图像为包含人体的图像的情况下,通过卷积神经网络,对所述目标图像进行跌倒识别,以确定所述目标图像中的人体是否处于跌倒状态。
  6. 根据权利要求5所述的装置,其特征在于,所述获取模块包括:
    声音采集器,用于采集目标区域中的声音信息;
    定位器,用于根据所述声音信息,确定目标方位;
    移动装置和摄像头,其中,所述摄像头设于所述移动装置上,所述移动装置用于根据所述目标方位,移动所述摄像头;所述摄像头用于获取目标图像。
  7. 根据权利要求5所述的装置,其特征在于,所述装置还包括卷积神经网络建立模块,用于建立用于识别人体状态类型的卷积神经网络,其中,所述卷积神经网络建立模块包括:
    获取单元,用于获取人体图像样本数据,其中,所述人体图像样本数据包括多个包含人体状态的图像;
    提取单元,用于从所述人体图像样本数据中提取符合要求的图像作为预处理样本数据;
    划分单元,用于根据所述预处理样本数据的图像中的人体状态,将所述预处理样本数据中的图像划分为正样本数据和负样本数据,其中,所述正样本数据中的图像包括以下至少之一:包含有人体站着的状态的图像、包含有人体坐着的状态的图像、包含有人体蹲着的状态的图像、包含有人体倾斜着的状态的图像;所述负样本数据中的图像包括以下至少之一:包含有人体躺着的状态的图像、包含有人体趴着的状态的图像;
    建立单元,用于利用所述正样本数据、所述负样本数据进行训练,以建立用于识别人体状态类型的卷积神经网络。
  8. 根据权利要求7所述的装置,其特征在于,所述卷积神经网络建立模块还包括:
    误检测训练单元,用于获取不包含人体的图像样本数据;并利用所述不包含人体的图像样本数据,对所述卷积神经网络进行误检测训练。
PCT/CN2018/104734 2017-12-29 2018-09-08 人体跌倒检测方法和装置 WO2019128304A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711468689.0 2017-12-29
CN201711468689.0A CN108090458B (zh) 2017-12-29 2017-12-29 人体跌倒检测方法和装置

Publications (1)

Publication Number Publication Date
WO2019128304A1 true WO2019128304A1 (zh) 2019-07-04

Family

ID=62179860

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/104734 WO2019128304A1 (zh) 2017-12-29 2018-09-08 人体跌倒检测方法和装置

Country Status (2)

Country Link
CN (1) CN108090458B (zh)
WO (1) WO2019128304A1 (zh)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090458B (zh) * 2017-12-29 2020-02-14 南京阿凡达机器人科技有限公司 人体跌倒检测方法和装置
CN108961675A (zh) * 2018-06-14 2018-12-07 江南大学 基于卷积神经网络的跌倒检测方法
CN108985214A (zh) * 2018-07-09 2018-12-11 上海斐讯数据通信技术有限公司 图像数据的标注方法和装置
CN111127837A (zh) * 2018-10-31 2020-05-08 杭州海康威视数字技术股份有限公司 一种报警方法、摄像机及报警系统
CN111382610B (zh) * 2018-12-28 2023-10-13 杭州海康威视数字技术股份有限公司 一种事件检测方法、装置及电子设备
US11179064B2 (en) * 2018-12-30 2021-11-23 Altum View Systems Inc. Method and system for privacy-preserving fall detection
CN110008853B (zh) * 2019-03-15 2023-05-30 华南理工大学 行人检测网络及模型训练方法、检测方法、介质、设备
CN111967287A (zh) * 2019-05-20 2020-11-20 江苏金鑫信息技术有限公司 一种基于深度学习的行人检测方法
CN110443150A (zh) * 2019-07-10 2019-11-12 思百达物联网科技(北京)有限公司 一种跌倒检测方法、装置、存储介质
CN110532966A (zh) * 2019-08-30 2019-12-03 深兰科技(上海)有限公司 一种基于分类模型进行跌倒识别的方法及设备
CN111352349A (zh) * 2020-01-27 2020-06-30 东北石油大学 对老年人居住环境进行信息采集和调节的系统及方法
CN112149511A (zh) * 2020-08-27 2020-12-29 深圳市点创科技有限公司 基于神经网络的驾驶员违规行为检测方法、终端、装置
CN112418096A (zh) * 2020-11-24 2021-02-26 京东数科海益信息科技有限公司 检测跌的方法、装置和机器人
CN112784676A (zh) * 2020-12-04 2021-05-11 中国科学院深圳先进技术研究院 图像处理方法、机器人及计算机可读存储介质
CN112733618A (zh) * 2020-12-22 2021-04-30 江苏艾雨文承养老机器人有限公司 人体跌倒检测方法、防跌倒机器人及防跌倒系统
CN113158733B (zh) * 2020-12-30 2024-01-02 北京市商汤科技开发有限公司 图像过滤方法、装置、电子设备及存储介质
CN113065473A (zh) * 2021-04-07 2021-07-02 浙江天铂云科光电股份有限公司 一种适用于嵌入式系统的口罩人脸检测和体温测量方法
CN113221661A (zh) * 2021-04-14 2021-08-06 浪潮天元通信信息系统有限公司 一种智能化人体摔倒检测系统及方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722715A (zh) * 2012-05-21 2012-10-10 华南理工大学 一种基于人体姿势状态判决的跌倒检测方法
US20130128051A1 (en) * 2011-11-18 2013-05-23 Syracuse University Automatic detection by a wearable camera
CN105678267A (zh) * 2016-01-08 2016-06-15 浙江宇视科技有限公司 一种场景识别方法及装置
CN107331118A (zh) * 2017-07-05 2017-11-07 浙江宇视科技有限公司 跌倒检测方法及装置
CN107408308A (zh) * 2015-03-06 2017-11-28 柯尼卡美能达株式会社 姿势检测装置以及姿势检测方法
CN108090458A (zh) * 2017-12-29 2018-05-29 南京阿凡达机器人科技有限公司 人体跌倒检测方法和装置


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633643A (zh) * 2019-08-15 2019-12-31 青岛文达通科技股份有限公司 一种面向智慧社区的异常行为检测方法及系统
CN111178134A (zh) * 2019-12-03 2020-05-19 广东工业大学 一种基于深度学习与网络压缩的摔倒检测方法
CN111178134B (zh) * 2019-12-03 2023-05-30 广东工业大学 一种基于深度学习与网络压缩的摔倒检测方法
CN111461042A (zh) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 跌倒检测方法及系统
CN111639546A (zh) * 2020-05-07 2020-09-08 金钱猫科技股份有限公司 一种基于神经网络的小尺度目标云计算识别方法和装置
CN113221621A (zh) * 2021-02-04 2021-08-06 宁波卫生职业技术学院 一种基于深度学习的重心监测与识别方法
CN113221621B (zh) * 2021-02-04 2023-10-31 宁波卫生职业技术学院 一种基于深度学习的重心监测与识别方法
CN113478485A (zh) * 2021-07-06 2021-10-08 上海商汤智能科技有限公司 机器人及其控制方法、装置、电子设备、存储介质
CN113762219A (zh) * 2021-11-03 2021-12-07 恒林家居股份有限公司 一种移动会议室内人物识别方法、系统和存储介质
CN114229646A (zh) * 2021-12-28 2022-03-25 苏州汇川控制技术有限公司 电梯控制方法、电梯及电梯检测系统
CN114229646B (zh) * 2021-12-28 2024-03-22 苏州汇川控制技术有限公司 电梯控制方法、电梯及电梯检测系统

Also Published As

Publication number Publication date
CN108090458A (zh) 2018-05-29
CN108090458B (zh) 2020-02-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18894977

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18894977

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 09.03.2021)
