WO2021031384A1 - Fall-down behavior detection processing method and apparatus, and computer device and storage medium


Info

Publication number
WO2021031384A1
Authority: WO (WIPO, PCT)
Prior art keywords: video, target, image, fall, recognized
Application number: PCT/CN2019/116490
Other languages: French (fr), Chinese (zh)
Inventors: 王健宗 (Wang Jianzong), 王义文 (Wang Yiwen)
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021031384A1 publication Critical patent/WO2021031384A1/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00: Scenes; Scene-specific elements
            • G06V 20/40: Scenes; Scene-specific elements in video content
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/172: Classification, e.g. identification
                • G06V 40/174: Facial expression recognition
            • G06V 40/20: Movements or behaviour, e.g. gesture recognition
      • G08: SIGNALLING
        • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
          • G08B 21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
            • G08B 21/02: Alarms for ensuring the safety of persons
              • G08B 21/04: Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
                • G08B 21/0407: Alarms responsive to non-activity, based on behaviour analysis
                  • G08B 21/043: Alarms based on behaviour analysis detecting an emergency event, e.g. a fall

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a fall behavior detection and processing method, device, computer equipment, and storage medium.
  • A fall is a sudden, involuntary drop to the ground. In serious cases it can have serious consequences for the health of the person who falls. For example, an elderly person who falls may suffer psychological trauma, fractures, and soft tissue damage, which may affect both the physical and mental health of the person who falls.
  • the embodiments of the present application provide a fall behavior detection processing method, device, computer equipment, and storage medium to solve the problem of how to quickly and accurately identify whether there is a fall behavior and give targeted reminders.
  • A fall behavior detection and processing method, including:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • a fall behavior detection and processing device including:
  • a to-be-recognized video acquisition module configured to acquire the to-be-recognized video collected by the video capture device in real time;
  • a fall action detection module configured to use an R-C3D-based behavior detection model to recognize the video to be recognized, and determine whether the behavior action corresponding to the video to be recognized includes a fall action;
  • a target video segment interception module configured to, if the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized;
  • a fall severity acquisition module configured to analyze the severity of the target video clip, and obtain the fall severity corresponding to the target video clip;
  • a medical advice information acquisition module configured to acquire medical advice information corresponding to the target video clip based on the severity of the fall;
  • an information sending module configured to send the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • One or more readable storage media storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • FIG. 1 is a schematic diagram of an application environment of a fall behavior detection and processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 3 is another flowchart of the method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 4 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of the method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 6 is another flowchart of a method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 7 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 8 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a fall behavior detection and processing device in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the fall behavior detection and processing method provided by the embodiment of the present application can be applied to the application environment as shown in FIG. 1.
  • the method for detecting and processing falling behaviors is applied in a system for detecting and processing falling behaviors.
  • the system for detecting and processing falling behaviors includes a client and a server as shown in FIG. 1.
  • The client and the server communicate through a network to quickly detect and recognize the fall action from the video to be recognized, analyze the severity of the fall, and provide targeted reminders based on that severity, so as to avoid delaying treatment and causing serious consequences.
  • The client, also called the user side, refers to the program that corresponds to the server and provides local services for the user.
  • the client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for detecting and processing a fall behavior is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:
  • S201 Acquire a to-be-recognized video captured by the video capture device in real time.
  • the to-be-identified video is an unidentified video collected in real time by a video acquisition device.
  • Video capture equipment is a device used to capture video, which can be set up in shopping malls, hospitals, nursing places or other public places, or set up by guardians in the homes of elderly people living alone.
  • S202 Recognize the video to be recognized by using the behavior detection model based on R-C3D, and determine whether the behavior action corresponding to the video to be recognized includes a fall motion.
  • The behavior detection model based on R-C3D is a model that uses a pre-trained R-C3D network to identify the behavior of people in a video.
  • the behavior action corresponding to the video to be recognized refers to the behavior action recognized from the video to be recognized.
  • Using the R-C3D behavior detection model to recognize the video to be recognized can quickly determine whether the behavior of the person in the video to be recognized includes a falling motion.
  • the R-C3D network is a network trained in an end-to-end manner, and a three-dimensional convolution kernel is used to process the video to be recognized.
  • R-C3D has 8 convolution operations and 5 pooling operations.
  • the size of the convolution kernel is 3*3*3, and the step size is 1*1*1.
  • The pooling kernel is 2*2*2, but in order not to shorten the temporal length too early, the pooling size and step size of the first layer are 1*2*2; finally, the R-C3D network passes through two fully connected layers.
  • the input image of the R-C3D network is 3*L*H*W, where 3 is RGB three channels, L is the number of frames of the input image, and H*W is the size of the input image.
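  • The following PyTorch sketch illustrates the backbone just described: 8 convolutions with 3*3*3 kernels and stride 1*1*1, 5 poolings of which the first is 1*2*2, and two fully connected layers at the end. The channel widths and the 16-frame 112x112 input size are assumptions borrowed from the original C3D design, not values stated in this application.

    import torch
    import torch.nn as nn

    class C3DBackbone(nn.Module):
        """Sketch of the C3D-style feature extractor described above."""
        def __init__(self, num_classes=2):
            super().__init__()
            def conv(cin, cout):  # 3x3x3 kernel, stride 1, padding keeps H*W
                return nn.Sequential(nn.Conv3d(cin, cout, 3, stride=1, padding=1),
                                     nn.ReLU(inplace=True))
            self.features = nn.Sequential(
                conv(3, 64),    nn.MaxPool3d((1, 2, 2)),  # first pool 1*2*2 keeps temporal length
                conv(64, 128),  nn.MaxPool3d(2),
                conv(128, 256), conv(256, 256), nn.MaxPool3d(2),
                conv(256, 512), conv(512, 512), nn.MaxPool3d(2),
                conv(512, 512), conv(512, 512), nn.MaxPool3d(2),  # 8 convs, 5 pools in total
            )
            self.fc6 = nn.Linear(512 * 1 * 3 * 3, 4096)  # for a 3*16*112*112 input
            self.fc7 = nn.Linear(4096, num_classes)      # two fully connected layers

        def forward(self, x):  # x: (batch, 3, L, H, W)
            f = torch.flatten(self.features(x), 1)
            return self.fc7(torch.relu(self.fc6(f)))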
  • The R-C3D-based behavior detection model is a model for detecting the actions of people in videos, obtained by end-to-end training based on the R-C3D network.
  • In the model training process, positive samples containing the fall action and negative samples not containing the fall action (that is, video clips corresponding to actions other than the falling action) can be used to train the model according to a preset ratio (which can be set to 1:1 to balance the samples and avoid overfitting).
  • Because R-C3D is based on C3D frame classification (Frame Label), it can quickly detect whether there is a fall action in the video to be recognized; it can perform end-to-end fall detection for videos of any length and behaviors of any length, and because the proposal-generation and classification networks share the C3D parameters, it is very fast, which helps ensure the efficiency of fall detection.
  • S203 If the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized.
  • The target video segment is a video segment cut out from the video to be recognized and used to analyze the severity of the fall corresponding to the fall action. Understandably, after a person falls, the pain caused by the fall varies, so the facial micro-expression will show different changes, or the body posture will show actions that match the pain. Therefore, the video after the fall action is analyzed in order to assess the severity of the fall of the person who fell.
  • Specifically, after the server uses the R-C3D behavior detection model to recognize that the video to be recognized contains a falling motion, it can intercept from the video to be recognized a video clip containing the falling motion and a certain length of time after it, as the target video clip for analyzing the severity of the fall, thereby reducing the amount of data processed in the severity analysis and improving its efficiency and accuracy.
  • S204 Analyze the severity of the target video segment, and obtain the severity of the fall corresponding to the target video segment.
  • By analyzing the severity of the target video clip, the server can objectively and quickly assess the severity of the fall of the person in the clip. For example, if the person who fell in the target video clip is a young person, and the clip after the fall shows that his facial micro-expression shows no pain, or shows pain only very briefly, the severity of the fall can be determined to be low. Conversely, if the person who fell is an elderly person, and the clip after the fall shows a painful facial micro-expression lasting a long time, or prolonged actions such as stroking the point of impact, the severity of the fall can be determined to be high.
  • S205 Obtain medical advice information corresponding to the target video clip based on the severity of the fall.
  • Specifically, the server compares the fall severity obtained by analyzing the target video clip with a preset degree threshold to determine whether medical advice needs to be provided.
  • The preset degree threshold is a preset threshold for evaluating whether medical advice needs to be provided. If the severity of the fall is less than the preset degree threshold, there is no need to provide medical advice, and the target video clip can be sent directly to the corresponding reminder terminal as a fall reminder. If the severity of the fall is not less than the preset degree threshold, the fallen skeletal joint point corresponding to the fall action is identified from the target video clip, the medical advice database is queried based on that joint point, and the medical advice information corresponding to it is obtained as the medical advice information corresponding to the target video segment.
  • The fallen skeletal joint points are the bones or joint points, identified from the target video clip, that the person who fell struck when falling. Identifying the fallen skeletal joint points helps provide corresponding medical advice to the person who fell.
  • The medical advice information database is a database used to store the medical test recommendations or medical medication recommendations associated with a fall onto each skeletal joint point.
  • Medical advice information is information determining which medical tests or medications are needed according to the skeletal joint point of the fall. For example, if the knee joint touches the ground first during the fall in the target video clip, the fallen skeletal joint point is the knee joint, and the acquired medical advice information is medical advice related to knee joint injuries.
  • S206 Send the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip.
  • the reminder terminal is a terminal for receiving target video clips or target video clips and medical advice information.
  • the reminder terminal is a terminal corresponding to the user who installs the video capture device.
  • If the video capture device is set up in a public place, its corresponding reminder terminal can be the reminder terminal of the staff in that public place, specifically the mobile terminal carried by staff working at the entrance and exit of the public place, so that the fall can be handled in time.
  • If the video capture device is set up by a guardian in the home of an elderly person living alone, the reminder terminal may be a terminal bound to the video capture device.
  • In this embodiment, the R-C3D behavior detection model can quickly identify whether the video to be recognized contains a fall action, improving the efficiency and accuracy of fall detection.
  • The target video clip corresponding to the fall action is intercepted from the video to be recognized and its severity analyzed, reducing the amount of data in the severity analysis and improving analysis efficiency and accuracy; corresponding medical advice information is obtained based on the severity of the fall, and the medical advice information and target video clip are sent to the reminder terminal, realizing targeted reminders for the fall behavior and avoiding the risk that no corresponding treatment measures are taken after the fall.
  • In an embodiment, step S202, that is, using a behavior detection model based on R-C3D to recognize the video to be recognized and determining whether the behavior action corresponding to the video to be recognized includes a falling motion, includes the following steps:
  • S301 Based on the segment duration threshold and the overlap duration threshold, cut the video to be identified into overlapping segments to obtain at least two original video segments, each of which corresponds to a segment time stamp.
  • the segment duration threshold is a preset threshold for cutting the duration of the original video segment, that is, the duration of each original video segment cut out in this embodiment is the segment duration threshold, such as 10s.
  • the overlap duration threshold is a preset threshold for the duration of overlapping two adjacent original video segments when cutting the original video segment, such as 3s.
  • the original video segment is a unit segment cut out from the video to be recognized and used for inputting the behavior detection model for recognition.
  • the segment time stamp corresponding to the original video segment may be the time stamp corresponding to the first image in the original video segment, so as to determine the corresponding video cutting sequence based on the segment time stamp.
  • Specifically, the server cuts the video to be identified based on the segment duration threshold and the overlap duration threshold, ensuring that any two adjacent original video segments partially overlap. This guarantees the accuracy of subsequent fall detection: if the segments did not overlap and a fall action spanned the boundary between two consecutive original video clips, neither clip on its own would contain the complete fall action. For example, for a video to be recognized, first cut the 0-10s segment to form the first original video segment, then cut the 7-17s segment to form the second original video segment, then cut the 14-24s segment to form the third original video segment, and so on, until all the original video segments are cut, as sketched below.
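  • A minimal Python sketch of this overlapped cutting, assuming the video is addressed by time in seconds and using the 10s segment duration threshold and 3s overlap duration threshold from the example above:

    def cut_overlapping_segments(total_duration, segment_len=10.0, overlap_len=3.0):
        """Cut a timeline into overlapping original video segments.
        The segment timestamp is the time of the segment's first image."""
        step = segment_len - overlap_len  # 7s stride between segment starts
        segments, start = [], 0.0
        while start < total_duration:
            end = min(start + segment_len, total_duration)
            segments.append((start, end, start))
            if end >= total_duration:
                break
            start += step
        return segments

    # cut_overlapping_segments(24) -> [(0, 10, 0), (7, 17, 7), (14, 24, 14)]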
  • S302 According to the sequence of the time stamps of the segments, sequentially input at least two original video segments into the R-C3D-based behavior detection model for recognition, and determine whether each original video segment includes a fall action.
  • Specifically, the server determines the cutting order of the at least two original video clips according to the sequence of the clip timestamps and inputs them in turn into the R-C3D-based behavior detection model for recognition, so that it can objectively and quickly determine whether each original video clip includes a falling motion.
  • In this embodiment, the video to be recognized is cut with overlap according to the segment duration threshold and the overlap duration threshold to obtain at least two original video segments, and the at least two original video segments are input in sequence into the R-C3D-based behavior detection model for recognition, quickly determining whether each original video segment contains a fall action; in this way, the long to-be-recognized video is split into short original video segments for recognition.
  • In an embodiment, step S203, that is, if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized, includes the following steps:
  • the image to be recognized is an image that constitutes the video to be recognized.
  • S401 If the behavior action corresponding to the video to be recognized includes a falling motion, determine the time stamp corresponding to the image to be recognized in which the falling motion is detected as the starting time stamp.
  • Specifically, when the server recognizes that the behavior action corresponding to the video to be recognized includes a falling motion, the time stamp of the image to be recognized at the moment the falling motion is detected is used as the starting time stamp for intercepting the target video segment, so that the target video clip can be intercepted from this starting time stamp and the micro-expression changes and body posture changes after the fall can be analyzed to determine the severity of the corresponding fall.
  • S402 Determine a termination time stamp based on the starting time stamp and the analysis duration threshold.
  • The analysis duration threshold is a threshold preset by the system for determining the video duration over which the severity of the fall needs to be analyzed. Specifically, the server adds the analysis duration threshold to the starting time stamp to determine the termination time stamp used to divide the target video segment.
  • S403 Intercept a video segment between the start time stamp and the end time stamp from the video to be identified, and determine the target video segment corresponding to the fall action.
  • Since each image to be identified in the video to be identified corresponds to a unique time stamp, the server intercepts from the video to be identified the video segment whose starting image is the image corresponding to the starting time stamp and whose terminating image is the image corresponding to the termination time stamp, and determines this video segment as the target video segment corresponding to the fall action, so as to use it to analyze the severity of the fall.
  • In this embodiment, the time stamp of the image to be recognized corresponding to the fall action is first determined as the starting time stamp; the starting time stamp is used to determine the termination time stamp; and the video segment between the starting time stamp and the termination time stamp is intercepted as the target video segment. In this way, every image to be recognized in the target video segment falls within the analysis duration threshold after the fall, which can reflect the true emotional changes of the person who fell and guarantees the objective authenticity of the analysis result (see the sketch below).
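  • As a sketch, assuming the frames are available as (timestamp, image) pairs and using an illustrative 8s analysis duration threshold (the application does not fix a value):

    def intercept_target_clip(frames, fall_timestamp, analysis_duration=8.0):
        """Intercept the target video clip from the starting time stamp (the
        frame in which the fall is detected) to start + analysis duration."""
        start_ts = fall_timestamp
        end_ts = start_ts + analysis_duration  # termination time stamp
        return [(ts, img) for ts, img in frames if start_ts <= ts <= end_ts]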
  • each target video segment includes at least one image to be recognized.
  • In an embodiment, step S204, that is, analyzing the severity of the target video segment to obtain the severity of the fall corresponding to the target video segment, includes the following steps:
  • S501 Recognize each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtain a micro-expression type corresponding to each to-be-recognized image.
  • the micro-expression recognition model is a model used to recognize the facial micro-expression in the image to be recognized.
  • The micro-expression recognition model captures the local features of the user's face in the image to be recognized, determines each target facial action unit of the face in the image to be recognized based on the local features, and then determines the micro-expression type according to the recognized target facial action units.
  • the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on Local Binary Pattern (LBP).
  • the micro-expression recognition model requires pre-collection of a large amount of training image data for model training.
  • The training image data contains positive samples and negative samples of each facial action unit.
  • the SVM classification algorithm can be used to train a large amount of training image data to obtain SVM classifiers corresponding to multiple facial action units. For example, it can be 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units.
  • The more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when multiple SVM classifiers form a micro-expression recognition model, the more SVM classifiers there are, the more accurate the micro-expression types recognized by the formed micro-expression recognition model.
  • Taking a micro-expression recognition model formed by the SVM classifiers corresponding to 54 facial action units as an example, using this model to recognize each image to be recognized in the target video segment can identify 54 types of micro-expression, for example including love, interest, surprise, expectation... aggression, conflict, insult, doubt, fear, and pain (see the SVM sketch below).
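  • A hedged scikit-learn sketch of the per-action-unit SVM scheme described above; the feature representation (e.g. LBP histograms) and all names here are illustrative assumptions, not details given in the application:

    import numpy as np
    from sklearn.svm import SVC

    def train_action_unit_classifiers(features, au_labels):
        """Train one binary SVM per facial action unit.
        features: (n_samples, n_features) face descriptors, e.g. LBP histograms.
        au_labels: (n_samples, n_units) 0/1 matrix, e.g. 39 or 54 units."""
        classifiers = []
        for unit in range(au_labels.shape[1]):
            clf = SVC(kernel="rbf", probability=True)
            clf.fit(features, au_labels[:, unit])
            classifiers.append(clf)
        return classifiers

    def recognize_action_units(classifiers, feature_vec):
        """Return indices of the action units detected in one face image;
        the detected units are then mapped to a micro-expression type."""
        x = np.asarray(feature_vec).reshape(1, -1)
        return [i for i, clf in enumerate(classifiers) if clf.predict(x)[0] == 1]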
  • S502 If the micro-expression type is the preset expression type, use the facial feature point detection algorithm to detect and locate the image to be recognized, and obtain a target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position corresponding to each facial feature point.
  • the preset expression type is the type that the system pre-sets that it may be the expression after a fall, such as pain and crying.
  • The facial feature point detection algorithm is an algorithm for detecting facial feature points. It can identify N facial feature points from the image to be recognized together with the feature position of each, that is, its coordinates in the image. The facial feature point detection algorithm can detect and locate the facial feature points of the left eye, right eye, left eyebrow, right eyebrow, nose, and mouth in the image to be recognized.
  • In this embodiment, an image of the standard area size corresponding to the mouth area of the face is intercepted from the image to be recognized and determined as the target mouth image for subsequent analysis.
  • the target mouth image is an image corresponding to the mouth area of the human face that matches the size of the standard area intercepted from the image to be recognized.
  • The standard area size is preset by the system to limit the size of the area captured for the target mouth image. Specifically, the standard area size can be determined by fixing the mouth width, that is, by fixing the width of the standard area.
  • When intercepting the target mouth image from the image to be recognized, the image to be recognized needs to be up-sampled or down-sampled (that is, zoomed) so that the mouth width of the person who fell matches the width of the standardized area; a screenshot is then taken to obtain target mouth images of equal width, ensuring the accuracy of the subsequent analysis based on the mouth inner-lip average distance.
  • the target mouth image includes N facial feature points and feature positions corresponding to each facial feature point.
  • the N facial feature points in the target mouth image refer to the facial feature points corresponding to the contour of the mouth.
  • Mouth contour includes upper lip contour and lower lip contour
  • upper lip contour includes upper lip outer lip line and upper lip inner lip line
  • lower lip contour includes lower lip inner lip line and lower lip outer lip line.
  • In this embodiment, a number of dividing lines are configured to divide the contour of the mouth so as to determine the corresponding facial feature points.
  • For example, three dividing lines can be drawn at 1/4, 1/2, and 3/4 of the mouth width; the intersections of each dividing line with the upper lip outer lip line, upper lip inner lip line, lower lip inner lip line, and lower lip outer lip line respectively form a set of facial feature points comprising an upper lip outer lip point, upper lip inner lip point, lower lip inner lip point, and lower lip outer lip point.
  • S503 Obtain the mouth inner-lip average distance based on the feature positions corresponding to the N facial feature points. If the mouth inner-lip average distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as the target recognition image.
  • Generally, when a person falls, the more painful the micro-expression and the wider the mouth opens, the higher the degree of pain, and the better this reflects the severity of the fall. Therefore, when the server obtains the N facial feature points of the target mouth image and their feature positions, it can calculate the mouth inner-lip average distance, which reflects the degree of mouth opening. Specifically, the server may first calculate, on each dividing line, the inner-lip distance between the upper lip inner lip point and the lower lip inner lip point, and then average the inner-lip distances over all the dividing lines to obtain the mouth inner-lip average distance, so that this value can be used to objectively analyze the degree of pain. Understandably, since all target mouth images are images of the standard area size, their widths are the same; the mouth inner-lip average distance can therefore accurately reflect the degree of mouth opening, as computed in the sketch below.
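  • A small sketch of this computation; the dictionary layout of the feature points and the threshold value are illustrative assumptions, and all target mouth images are assumed to have been scaled to the standard width first:

    import math

    def inner_lip_average_distance(feature_points):
        """Average, over the three dividing lines (1/4, 1/2, 3/4 of the mouth
        width), of the distance between the upper and lower inner lip points."""
        lines = ("quarter", "half", "three_quarter")
        return sum(math.dist(feature_points[line]["upper_inner"],
                             feature_points[line]["lower_inner"])
                   for line in lines) / len(lines)

    def is_painful_opening(feature_points, distance_threshold=12.0):
        """True if the mouth opening exceeds the preset distance threshold."""
        return inner_lip_average_distance(feature_points) > distance_threshold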
  • The preset distance threshold is a threshold preset by the system for assessing whether the degree of mouth opening indicates pain. Specifically, after obtaining the mouth inner-lip average distance, the server compares it with the preset distance threshold. If the average distance is greater than the threshold, the mouth is opened wide, reflecting a high degree of pain, and the image to be recognized corresponding to the target mouth image is determined as a target recognition image for the subsequent analysis of the severity of the fall.
  • The target recognition image is an image to be recognized whose micro-expression type is the preset expression type and whose mouth inner-lip average distance is greater than the preset distance threshold; determining target recognition images through both micro-expression recognition and the degree of mouth opening helps ensure the accuracy of the fall severity analysis.
  • S504 Obtain the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
  • Since a target recognition image is one in which the falling person's micro-expression type is the preset expression type and the mouth inner-lip average distance is greater than the preset distance threshold, it fully reflects that the person is in a more painful state after the fall. Given a fixed number of images to be recognized in the target video clip, the more target recognition images there are, the higher the severity of the fall. That is, in this embodiment, the number of target recognition images is proportional to the fall severity, and a corresponding comparison table can be preset to quickly determine the severity.
  • In this embodiment, micro-expression analysis is performed on the images to be recognized first, and only those whose micro-expression type is the preset expression type undergo subsequent facial feature point detection and positioning, which helps reduce the amount of data processing and speeds it up. The mouth inner-lip average distance is then determined from the feature positions of the N facial feature points in the target mouth image, and images to be recognized whose average distance is greater than the preset distance threshold are determined as target recognition images, ensuring that the target recognition images truly and objectively reflect the pain of the person who fell, which helps guarantee the accuracy of the subsequent fall severity analysis. Using the number of images corresponding to all target recognition images in the target video segment, the fall severity can be determined quickly, ensuring the efficiency, accuracy, and objectivity of the analysis.
  • In an embodiment, step S504, that is, obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment, includes the following steps:
  • S601 Divide the target video segment into at least two segments to be processed, and obtain the preset weight corresponding to each segment to be processed, where the preset weight of a later segment to be processed is greater than that of an earlier one.
  • Specifically, the server may divide the target video clip into at least two segments to be processed based on a unit duration, where the unit duration may be 1/M (M ≥ 2) of the analysis duration threshold, dividing the target video clip into at least two segments of equal duration.
  • The preset weight is a weight preset by the system for each segment to be processed. Understandably, since the target video clip is the collection of all images to be recognized from the image in which the fall action is recognized up to the analysis duration threshold, the later the time stamps of the target recognition images within the target video clip, the longer the painful state after the fall has lasted.
  • Therefore, the server can divide the target video segment into at least two segments to be processed according to the unit duration and configure a corresponding preset weight for each, making the preset weight of a later segment to be processed greater than that of an earlier one, so as to ensure the objectivity and accuracy of the subsequent analysis.
  • S602 Based on the number of images corresponding to all target recognition images and the number of images corresponding to all the images to be recognized in each segment to be processed, obtain the target score corresponding to the segment to be processed.
  • Specifically, the server obtains the target score corresponding to each segment to be processed from the number of images corresponding to all target recognition images and the number of images corresponding to all images to be recognized in that segment, using the formula P = K * A / B, where:
  • P is the target score corresponding to the segment to be processed;
  • A is the number of images corresponding to all target recognition images in the segment to be processed;
  • B is the number of images corresponding to all images to be recognized in the segment to be processed;
  • K is a constant used to normalize the target score to a specific numerical interval.
  • S603 Perform weighting processing on the preset weights and target scores corresponding to at least two segments to be processed, to obtain a fall score corresponding to the target video segment.
  • Specifically, the server weights the preset weights and target scores corresponding to the at least two segments to be processed using the formula S = Σ(P_i * W_i) for i = 1 to j, obtaining the fall score corresponding to the target video segment, where S is the fall score corresponding to the target video segment, P_i is the target score corresponding to the i-th segment to be processed, W_i is the preset weight corresponding to the i-th segment to be processed, and j is the number of segments to be processed in the target video segment.
  • S604 Query the fall degree comparison table based on the fall score value, and obtain the fall severity corresponding to the target video clip.
  • The fall degree comparison table is a table preset by the system that maps score ranges to fall severities. Specifically, after obtaining the fall score corresponding to the target video segment, the server determines the score range it falls in and takes the fall severity corresponding to that range as the fall severity of the target video segment, so as to quickly determine the severity of the fall.
  • In this embodiment, the target video segment is first divided into at least two segments to be processed, each with a corresponding preset weight; the target recognition images in each segment are then counted to determine the corresponding target score, and the preset weights and target scores are weighted and summed to obtain the fall score corresponding to the target video clip, making the fall score more objective and accurate. Querying the fall degree comparison table with the fall score quickly yields the fall severity corresponding to the target video segment, improving the efficiency of the fall severity analysis, as sketched below.
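  • Putting steps S602-S604 together as a sketch, with the reconstructed formulas P = K * A / B and S = Σ(P_i * W_i); the value of K, the weights, and the comparison-table ranges are illustrative assumptions:

    def fall_score(segments, weights, K=100.0):
        """segments: per segment to be processed, an (A, B) pair where A is the
        number of target recognition images and B the number of all images to
        be recognized; weights: preset weights, larger for later segments."""
        scores = [K * a / b for a, b in segments]           # P_i = K * A / B
        return sum(p * w for p, w in zip(scores, weights))  # S = sum(P_i * W_i)

    def fall_severity(score, table=((30, "low"), (70, "medium"), (float("inf"), "high"))):
        """Query an (illustrative) fall degree comparison table."""
        for upper_bound, severity in table:
            if score <= upper_bound:
                return severity

    # e.g. two segments, the later one weighted higher:
    # fall_severity(fall_score([(2, 8), (5, 8)], weights=[0.4, 0.6]))  -> "medium"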
  • In an embodiment, after the preset weights and target scores corresponding to the at least two segments to be processed are weighted and the fall score corresponding to the target video segment is obtained, a target face image is extracted from the target video segment (the implementation is the same as step S701), a pre-trained age detection model is used to detect the target face image and obtain the predicted age of the person who fell, the age score table is queried based on the predicted age to obtain the corresponding age constant, and the fall score is updated to the product of the fall score and the age constant, so that the fall score takes the faller's age into account and the fall severity corresponding to the target video clip can be analyzed more accurately.
  • Generally, the age constant can be set in the range 0-2, and the age constant corresponding to the median age (such as 40 years old) can be set to 1. The older the age, the larger the value, indicating that age has a greater effect on the severity of the fall; conversely, the younger the age, the smaller the value, indicating that age has a smaller effect on the severity of the fall.
  • For example, children generally do not suffer serious consequences from a fall, while elderly people who fall face serious risks such as fractures.
  • The age detection model can be a model for predicting age obtained by training, with a CNN or another network, on positive and negative samples carrying age labels; a sketch of the score adjustment follows below.
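  • A sketch of the age adjustment under the stated constraints (constant in the range 0-2, equal to 1 at the median age of 40); the linear mapping here stands in for the preset age score table, which the application does not spell out:

    def age_constant(predicted_age, median_age=40):
        """Map a predicted age to an age constant in (0, 2]; 1 at the median
        age, larger for older fallers, smaller for younger ones."""
        c = 1.0 + (predicted_age - median_age) / median_age
        return max(0.1, min(2.0, c))

    def adjusted_fall_score(score, predicted_age):
        """Update the fall score by multiplying it by the age constant."""
        return score * age_constant(predicted_age)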
  • In an embodiment, before sending the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip, the fall behavior detection and processing method further includes the following steps:
  • S701 Extract a target face image from a target video clip, and query a system user image database based on the target face image.
  • The target face image is a clear image, extracted from the target video clip, that contains the frontal face of the person who fell.
  • the system user image library is a database used to store user images in the system.
  • The system user image library can store the registered user images corresponding to registered users, as well as the accompanying user images associated with those registered user images.
  • the registered user may refer to a user registered in a system corresponding to a public place where the video capture device is installed.
  • the registered user image refers to the user image associated with the registered user, which can be the image of the registered user himself or the image corresponding to the object to be cared for by the registered user.
  • An accompanying user image can be understood as an image of a person who accompanies the registered user, or enters a public place together with the object cared for by the registered user, but is not registered in the system.
  • Specifically, the server may select a clear image to be recognized containing a frontal face from the target video clip as the target face image, and then query the system user image database based on the target face image to determine whether it matches a registered user image or an accompanying user image; specifically, a facial feature similarity matching algorithm can be used to make this determination.
  • If there is a registered user image corresponding to the target face image in the system user image library, the registered terminal corresponding to that image is determined as the reminder terminal corresponding to the target video clip, so that the target video clip and medical advice information can be sent to the reminder terminal.
  • If there is an accompanying user image corresponding to the target face image, the corresponding registered user image can be found based on the accompanying user image, and the registered terminal corresponding to that registered user image is determined as the reminder terminal corresponding to the target video clip, so that the target video clip and medical advice information can be sent to the reminder terminal.
  • If there is no registered user image or accompanying user image corresponding to the target face image in the system user image library, the management terminal corresponding to the video capture device is determined as the reminder terminal corresponding to the target video segment.
  • In this embodiment, a target face image is extracted from the target video clip, and different reminder terminals are determined according to whether the system user image library contains a corresponding registered user image or accompanying user image, so that the target video clip and medical advice information are sent to the appropriate reminder terminal. This achieves precise reminding (see the sketch below) and helps avoid the serious consequences of a faller failing to receive corresponding treatment measures in time after the fall.
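  • The routing logic reads as a three-way fallback; in this sketch the two lookup methods on the user image database are assumed names standing in for the facial feature similarity matching described above:

    def resolve_reminder_terminal(target_face, user_image_db, device_admin_terminal):
        """Pick the reminder terminal for a target face image."""
        # Registered user image present: use that user's registered terminal.
        terminal = user_image_db.find_registered(target_face)
        if terminal is None:
            # Accompanying user image present: use the terminal of the
            # registered user the companion is associated with.
            terminal = user_image_db.find_companion(target_face)
        # Otherwise fall back to the management terminal bound to the device.
        return terminal if terminal is not None else device_admin_terminal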
  • In an embodiment, before acquiring the to-be-recognized video captured by the video capture device in real time, the fall behavior detection processing method further includes the following steps:
  • S801 Obtain historical videos collected by the video capture device, where each historical video includes at least one historical video image.
  • the historical video refers to the video collected before the server obtains the video to be recognized.
  • Historical video images are images that constitute historical videos.
  • S802 Recognize each historical video image using a face detection algorithm, and obtain at least one original face image in the historical video image.
  • the face detection algorithm is an algorithm used to detect whether an image contains a face.
  • the original face image is an image corresponding to the face area recognized by the face detection algorithm, and the original face image can be understood as the image corresponding to the face area selected by the face frame corresponding to the face detection algorithm.
  • S803 Query the system user image database based on each original face image; if there is a registered user image matching the original face image, determine the other original face images in the same historical video image as the registered user image as face images to be analyzed.
  • the server queries the system user image database based on each original face image to determine whether there is a registered user image corresponding to the original face image.
  • The processing is as described in step S701 and, to avoid repetition, is not detailed again here.
  • The face image to be analyzed can be understood as any other non-registered user image in the same historical video image as the registered user image. For example, if a historical video image includes three original face images X, Y, and Z, and it is recognized that X is a registered user image in the system user image library while Y and Z are not, then Y and Z are determined as face images to be analyzed.
  • A face image to be analyzed is one for which it must be analyzed whether it is an accompanying user image corresponding to the registered user.
  • S804 Determine the historical video image containing the same registered user image and the same face image to be analyzed as the target video image, and count the number of coexisting frames corresponding to the target video image per unit time.
  • the target video image is a historical video image that contains both the same registered user image and the same face image to be analyzed.
  • the target video image is all historical video images that contain both X and Y.
  • The unit time is a period set in advance. Understandably, the number of coexisting frames corresponding to the target video image can be determined by counting the number of all target video images within the unit time.
  • the number of coexisting frames can be understood as the number of images of all target video images simultaneously containing X and Y per unit time.
  • The greater the number of coexisting frames, the greater the probability that X and Y appear at the same time, and the more likely they accompany each other; therefore, whether an image is an accompanying user image can be evaluated based on the number of coexisting frames.
  • S805 Use the micro-expression recognition model to recognize the registered user image and the face image to be analyzed in each target video image, and count the probability of positive emotions that are simultaneously in a positive emotion.
  • the micro-expression recognition model may be the micro-expression recognition model in step S501, and the recognition process is the same as that in step S501. To avoid repetition, it will not be repeated here.
  • Positive emotions are happiness, joy, or other emotions that reflect a person's positive state, as opposed to negative emotions such as anger or rage, which reflect a negative state.
  • S806 If the number of coexisting frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image, and store the accompanying user image in association with the registered user image In the system user image library.
  • the preset frame number threshold is a preset threshold used to evaluate whether they are accompanied by each other. If the number of coexisting frames is greater than the preset frame number threshold, it means that the number of times that the registered user image and the face image to be analyzed appear in one frame of the target video image at the same time is greater than the preset threshold that determines that the two are accompanied by each other.
  • the preset probability threshold is a preset threshold used to evaluate whether the two are friendly. If the positive emotion probability is greater than the preset probability threshold, it means that the two persons corresponding to the registered user image and the face image to be analyzed are more likely to be in positive emotions, and the probability that the two are friends is greater.
  • In this case, the server determines that the person corresponding to the face image to be analyzed most likely entered the video capture area accompanying the user corresponding to the registered user image. Therefore, the face image to be analyzed can be determined as the accompanying user image corresponding to the registered user image, and the accompanying user image and the registered user image are stored in association in the system user image library, so that when the registered user image or accompanying user image is subsequently queried to confirm the reminder terminal, a targeted reminder can be realized.
  • In this embodiment, other original face images appearing in the same historical video image as a registered user image are determined as face images to be analyzed, ensuring the objectivity of the subsequent determination of accompanying user images; a face image to be analyzed whose number of coexisting frames is greater than the preset frame number threshold and whose positive emotion probability is greater than the preset probability threshold is determined as the accompanying user image corresponding to the registered user image, ensuring the accuracy and objectivity of that determination; a code sketch of these statistics follows below.
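  • A sketch of the coexistence-frame and positive-emotion statistics from steps S804-S806; the per-frame data layout, the positive emotion set, and both threshold values are illustrative assumptions:

    def companion_statistics(frames, registered_id, candidate_id):
        """frames: per historical video image, a dict mapping each detected
        person id to the recognized micro-expression type."""
        positive = {"happy", "joy"}  # assumed positive emotion types
        coexist = [f for f in frames if registered_id in f and candidate_id in f]
        if not coexist:
            return 0, 0.0
        both = sum(1 for f in coexist
                   if f[registered_id] in positive and f[candidate_id] in positive)
        return len(coexist), both / len(coexist)

    def is_companion(frames, registered_id, candidate_id,
                     frame_threshold=50, prob_threshold=0.6):
        """Both the coexistence frame count and the joint positive-emotion
        probability must exceed their preset thresholds."""
        n, p = companion_statistics(frames, registered_id, candidate_id)
        return n > frame_threshold and p > prob_threshold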
  • a fall behavior detection and processing device is provided, and the fall behavior detection and processing device corresponds to the fall behavior detection and processing method in the above-mentioned embodiment one-to-one.
  • the fall behavior detection processing device includes a video acquisition module 901 to be recognized, a fall motion detection module 902, a target video clip interception module 903, a fall severity acquisition module 904, a medical advice information acquisition module 905, and Information sending module 906.
  • the detailed description of each functional module is as follows:
  • The to-be-identified video acquisition module 901 is configured to acquire the to-be-identified video collected by the video acquisition device in real time.
  • the fall action detection module 902 is configured to use the R-C3D-based behavior detection model to recognize the video to be recognized, and determine whether the behavior action corresponding to the video to be recognized includes a fall action.
  • the target video segment interception module 903 is configured to, if the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized.
  • the fall severity acquisition module 904 is configured to analyze the severity of the target video segment and obtain the fall severity corresponding to the target video segment.
  • the medical advice information obtaining module 905 is used to obtain medical advice information corresponding to the target video clip based on the severity of the fall.
  • The information sending module 906 is configured to send the target video clip and medical advice information to the reminder terminal corresponding to the target video clip.
  • the falling motion detection module 902 includes:
  • The original video segment acquisition unit is configured to cut the video to be identified into overlapping segments based on the segment duration threshold and the overlap duration threshold, and acquire at least two original video segments, each of which corresponds to a segment timestamp.
  • the video segment action detection unit is configured to sequentially input at least two original video segments into an R-C3D-based behavior detection model for identification according to the sequence of the segment timestamps, and determine whether each original video segment includes a fall action.
  • the target video segment interception module 903 includes:
  • the initial time stamp determining unit is configured to determine that the time stamp corresponding to the image to be identified corresponding to the falling action is the initial time stamp if the behavior action corresponding to the video to be recognized includes a falling action.
  • the termination time stamp determining unit is configured to determine the termination time stamp based on the starting time stamp and the analysis duration threshold.
  • the video segment intercepting unit is used to intercept the video segment between the start time stamp and the end time stamp from the video to be identified, and determine the target video segment corresponding to the fall action.
  • each target video segment includes at least one image to be recognized.
  • the fall severity acquisition module 904 includes:
  • the micro-expression type obtaining unit is used to recognize each image to be recognized in the target video segment by using a micro-expression recognition model, and obtain the micro-expression type corresponding to each image to be recognized.
  • The target mouth image acquisition unit is used to, if the micro-expression type is the preset expression type, use the facial feature point detection algorithm to detect and locate the image to be recognized and obtain the target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position corresponding to each facial feature point.
  • The target recognition image determination unit is used to obtain the mouth inner-lip average distance based on the feature positions corresponding to the N facial feature points and, if the mouth inner-lip average distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as the target recognition image.
  • the fall severity acquiring unit is configured to acquire the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
  • the fall severity acquisition unit includes:
  • The preset weight obtaining subunit is used to divide the target video segment into at least two segments to be processed and to obtain the preset weight corresponding to each segment to be processed, where the preset weight of a later segment to be processed is greater than that of an earlier one.
  • the target score obtaining subunit is used to obtain the target score corresponding to the segment to be processed based on the number of images corresponding to all target recognition images and the number of images corresponding to all the images to be recognized in each segment to be processed.
  • the fall score value obtaining subunit is configured to perform weighting processing on the preset weights and target scores corresponding to at least two segments to be processed to obtain the fall score value corresponding to the target video segment.
  • the fall severity acquisition subunit is used to query the fall degree comparison table based on the fall score value, and obtain the fall severity corresponding to the target video clip.
  • the device for detecting and processing a fall behavior further includes:
  • the target face image query unit is used to extract the target face image from the target video segment, and query the system user image database based on the target face image.
  • the first reminder terminal determining unit is configured to determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library.
  • the second reminder terminal determining unit is configured to determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment if no registered user image or accompanying user image corresponding to the target face image exists in the system user image library.
  • the device for detecting and processing a fall behavior further includes:
  • the historical video acquisition unit is configured to acquire historical videos collected by the video acquisition device, and each historical video includes at least one historical video image.
  • the original face image acquisition unit is used to recognize each historical video image by using a face detection algorithm to acquire at least one original face image in the historical video image.
  • the to-be-analyzed face image acquisition unit is configured to query the system user image library based on each original face image and, if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image as face images to be analyzed.
  • the coexistence frame number counting unit is configured to determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
  • the positive emotion probability acquisition unit is configured to recognize the registered user image and the face image to be analyzed in each target video image using the micro-expression recognition model, and count the probability that both are in a positive emotion at the same time.
  • the accompanying user image determining unit is configured to determine the face image to be analyzed as the accompanying user image corresponding to the registered user image if the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, and to store the accompanying user image in the system user image library in association with the registered user image.
  • each module in the above-mentioned falling behavior detection and processing device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the foregoing modules may be embedded in hardware in, or independent of, the processor of the computer device, or stored in software in the memory of the computer device, so that the processor can call and execute the operations corresponding to them.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data used or generated while executing the fall behavior detection and processing method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions, when executed by the processor, implement a fall behavior detection and processing method.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the steps of the fall behavior detection and processing method in the above embodiments are implemented, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 8, which are not repeated here.
  • alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the above embodiment of the fall behavior detection and processing apparatus are implemented, such as the functions of the modules/units/subunits shown in FIG. 9, which are not repeated here.
  • one or more readable storage media storing computer-readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 8, which are not repeated here.
  • alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules/units in the above embodiment of the fall behavior detection and processing apparatus are implemented, such as the functions of the modules/units/subunits shown in FIG. 9, which are not repeated here.
  • the readable storage medium in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychology (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a fall-down behavior detection processing method and apparatus, and a computer device and a storage medium. The method comprises: obtaining a video to be identified acquired by a video acquisition device in real time; identifying said video by using an R-C3D-based behavior detection model, and determining whether a behavior action corresponding to said video comprises a fall-down action; if the behavior action corresponding to said video comprises a fall-down action, intercepting a target video clip corresponding to the fall-down action from said video; performing severity analysis on the target video clip to obtain a fall-down severity corresponding to the target video clip; obtaining medical suggestion information corresponding to the target video clip on the basis of the fall-down severity; and sending the target video clip and the medical suggestion information to a reminding terminal corresponding to the target video clip. According to the method, whether said video comprises the fall-down action can be quickly and accurately detected, and targeted reminding is carried out on the basis of the fall-down action.

Description

Fall behavior detection and processing method, apparatus, computer device and storage medium
This application is based on, and claims priority to, Chinese invention application No. 201910763921.6, filed on August 19, 2019 and titled "Fall behavior detection and processing method, apparatus, computer device and storage medium".
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a fall behavior detection and processing method, apparatus, computer device, and storage medium.
Background
A fall is a sudden collapse to the ground. In serious cases it may have severe consequences for the health of the person who falls. For example, an elderly person who falls may suffer psychological trauma, fractures, and soft tissue damage, affecting his or her physical and mental health. When a person living alone, or walking alone in a public place, falls accidentally and does not take the fall seriously, treatment measures may not be taken in time, and the delay in treatment can have serious consequences. Therefore, how to quickly and accurately identify whether a fall has occurred and issue a targeted reminder has become an urgent problem in public places, nursing facilities, and the care of elderly people living alone, in order to avoid the risks caused by falls.
Summary of the Invention
The embodiments of this application provide a fall behavior detection and processing method, apparatus, computer device, and storage medium, to solve the problem of how to quickly and accurately identify whether a fall has occurred and issue a targeted reminder.
A fall behavior detection and processing method, including:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
A fall behavior detection and processing apparatus, including:
a to-be-recognized video acquisition module, configured to obtain the to-be-recognized video collected in real time by a video capture device;
a fall action detection module, configured to recognize the to-be-recognized video using an R-C3D-based behavior detection model, and determine whether the behavior actions corresponding to the to-be-recognized video include a fall action;
a target video segment interception module, configured to, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercept from the to-be-recognized video the target video segment corresponding to the fall action;
a fall severity acquisition module, configured to perform severity analysis on the target video segment and obtain the fall severity corresponding to the target video segment;
a medical advice information acquisition module, configured to obtain, based on the fall severity, medical advice information corresponding to the target video segment;
an information sending module, configured to send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
One or more readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of the fall behavior detection and processing method in an embodiment of this application;
FIG. 2 is a flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 3 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 4 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 5 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 6 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 7 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 8 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 9 is a schematic diagram of the fall behavior detection and processing apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of the computer device in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The fall behavior detection and processing method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a fall behavior detection and processing system that includes a client and a server as shown in FIG. 1; the client and the server communicate over a network, so that a fall action can be quickly detected and recognized from the to-be-recognized video, its fall severity analyzed, and targeted reminders issued based on the fall severity, avoiding the serious consequences of delayed treatment. The client, also called the user side, is the program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smart phones, tablet computers, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a fall behavior detection and processing method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S201: Obtain the to-be-recognized video collected in real time by the video capture device.
Here, the to-be-recognized video is video collected in real time by the video capture device that has not yet been processed for recognition. The video capture device is a device for collecting video; it can be installed in shopping malls, hospitals, nursing facilities, or other public places, or installed by a guardian in the home of an elderly person living alone.
S202: Recognize the to-be-recognized video using an R-C3D-based behavior detection model, and determine whether the behavior actions corresponding to the to-be-recognized video include a fall action.
Here, the behavior detection model based on R-C3D (Region Convolutional 3D Network for Temporal Activity Detection) is a model pre-trained with the R-C3D network for recognizing human behavior in video. The behavior actions corresponding to the to-be-recognized video are the behavior actions recognized from the to-be-recognized video. Specifically, recognizing the to-be-recognized video with the R-C3D behavior detection model can quickly determine whether the human behavior in the video includes a fall action.
The R-C3D network is trained end to end and uses three-dimensional convolution kernels to process the to-be-recognized video. R-C3D performs 8 convolution operations and 5 pooling operations in total. The convolution kernels are all of size 3*3*3 with stride 1*1*1. The pooling kernels are 2*2*2, except that, to avoid shortening the temporal length too early, the first pooling layer uses size and stride 1*2*2. Finally, the final output of the R-C3D network is obtained after two fully connected layers and a softmax layer. The input of the R-C3D network is 3*L*H*W, where 3 is the three RGB channels, L is the number of input frames, and H*W is the size of the input frames.
Understandably, the R-C3D-based behavior detection model is obtained by end-to-end training on the R-C3D network for detecting human actions in video. To ensure the detection efficiency and accuracy of the model for fall actions, positive samples containing fall actions and negative samples not containing fall actions (i.e. video segments corresponding to actions other than falls) can be used for model training in a preset ratio (which can be set to 1:1 to balance the samples and avoid overfitting). Since R-C3D performs frame-level classification (frame labels) based on C3D, it can quickly detect whether a fall action exists in the to-be-recognized video, can perform end-to-end fall detection on videos and behaviors of arbitrary length, and, because the temporal proposal and classification networks share the C3D parameters, it is very fast, which helps ensure the efficiency of fall detection.
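To make the shapes concrete, the following is a minimal sketch of the 3D convolutional backbone described above, written in PyTorch (an assumption; the application does not name a framework). The channel widths, class count, and names are illustrative; only the kernel sizes, strides, pooling scheme, layer counts, and the 3*L*H*W input follow the text, and the region-proposal stage of R-C3D that shares these C3D features is omitted.

    import torch
    import torch.nn as nn

    class C3DBackbone(nn.Module):
        """Sketch of the C3D-style backbone: 8 convolutions (3x3x3, stride
        1x1x1) and 5 poolings (2x2x2, except the first pool of 1x2x2 so the
        temporal length is not shortened too early), then two fully connected
        layers and a softmax."""

        def __init__(self, num_classes=2):  # e.g. fall / non-fall (assumed)
            super().__init__()

            def conv(cin, cout):
                return nn.Sequential(
                    nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                    nn.ReLU(inplace=True))

            self.features = nn.Sequential(
                conv(3, 64), nn.MaxPool3d((1, 2, 2)),        # pool 1: 1x2x2
                conv(64, 128), nn.MaxPool3d(2),
                conv(128, 256), conv(256, 256), nn.MaxPool3d(2),
                conv(256, 512), conv(512, 512), nn.MaxPool3d(2),
                conv(512, 512), conv(512, 512), nn.MaxPool3d(2))
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(4096), nn.ReLU(inplace=True),  # FC layer 1
                nn.Linear(4096, num_classes),                # FC layer 2
                nn.Softmax(dim=1))

        def forward(self, x):
            # x: (batch, 3, L, H, W) -- RGB channels, L frames of size H*W
            return self.classifier(self.features(x))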
S203: If the behavior actions corresponding to the to-be-recognized video include a fall action, intercept from the to-be-recognized video the target video segment corresponding to the fall action.
Here, the target video segment is the video segment cut out of the to-be-recognized video for analyzing the fall severity corresponding to the fall action. Understandably, after a person falls, the facial micro-expressions will show different changes and the body posture will show actions that match the degree of pain caused by the fall; therefore, the video after the fall action can be analyzed to determine the severity of the fall. Specifically, after the R-C3D behavior detection model recognizes that the to-be-recognized video contains a fall action, the server can intercept from the to-be-recognized video a segment containing the fall action and a certain duration after it as the target video segment for analyzing the fall severity, which reduces the amount of data processed in the severity analysis and improves analysis efficiency and accuracy.
S204: Perform severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment.
Since the target video segment is cut out of the to-be-recognized video specifically for analyzing fall severity, by performing severity analysis on it the server can objectively and quickly determine the fall severity of the person in the segment. For example, if the person who fell is young and the segment after the fall shows that his or her facial micro-expressions show no pain, or show pain only very briefly, the fall severity can be judged to be low. If the person who fell is elderly and the segment after the fall shows facial micro-expressions of pain lasting a long time, or actions such as rubbing the point of impact for a long time, the fall severity can be judged to be high.
S205: Obtain, based on the fall severity, medical advice information corresponding to the target video segment.
Specifically, the server compares the fall severity obtained from the target video segment with a preset degree threshold to determine whether medical advice needs to be provided. The preset degree threshold is a threshold set in advance for evaluating whether medical advice is needed. If the fall severity is below the preset degree threshold, no medical advice is needed, and the target video segment can be sent directly to the corresponding reminder terminal as a fall alert. If the fall severity is not below the preset degree threshold, the fall bone/joint point corresponding to the fall action is identified from the target video segment, the medical advice database is queried based on that bone/joint point, and the medical advice information corresponding to it is obtained as the medical advice information corresponding to the target video segment.
Here, the fall bone/joint point is the bone or joint identified from the target video segment as having been struck when the person fell; identifying it helps provide the person with appropriate medical advice afterwards. The medical advice database stores, for each bone/joint point, the medical examinations or medications recommended when that point is injured in a fall. Medical advice information is information on which medical examinations or medications are needed, determined according to the fall bone/joint point. For example, if in the target video segment the knee touches the ground first during the fall, the fall bone/joint point is the knee joint, and the medical advice information obtained relates to knee injuries.
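As an illustration of the threshold comparison and database lookup just described, a minimal sketch follows; the advice entries, joint names, and threshold value are hypothetical placeholders, not content of the application.

    # Hypothetical medical advice database keyed by the fall bone/joint point.
    MEDICAL_ADVICE_DB = {
        "knee": "Check the knee joint for fracture and soft tissue damage.",
        "hip": "Have the hip examined by X-ray as soon as possible.",
        "wrist": "Check the wrist for fracture; apply ice and immobilize.",
    }
    SEVERITY_THRESHOLD = 2  # preset degree threshold (assumed scale)

    def build_reminder(fall_severity, fall_joint, target_clip):
        """Return what should be sent to the reminder terminal."""
        if fall_severity < SEVERITY_THRESHOLD:
            return target_clip, None  # remind of the fall only, no advice
        advice = MEDICAL_ADVICE_DB.get(
            fall_joint, "Seek a general medical examination.")
        return target_clip, advice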
S206: Send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
Here, the reminder terminal is the terminal that receives the target video segment, or the target video segment together with the medical advice information. Generally, the reminder terminal is the terminal of the user who installed the video capture device. If the video capture device is installed in a public place, the corresponding reminder terminal can be that of the staff of the public place, for example a mobile terminal carried by staff working at the entrances and exits, so that when the person who fell leaves the public place, the person or an accompanying person can be informed of the fall and the medical advice. If the video capture device is installed in the home of an elderly person living alone, the reminder terminal can be a terminal bound to the video capture device.
In the fall behavior detection and processing method provided by this embodiment, the R-C3D-based behavior detection model can quickly recognize whether the to-be-recognized video contains a fall action, improving the efficiency and accuracy of fall detection; the target video segment corresponding to the fall action is then intercepted from the to-be-recognized video and analyzed for severity, reducing the amount of data to analyze and improving analysis efficiency and accuracy; the corresponding medical advice information is obtained based on the fall severity, and the medical advice information and target video segment are sent to the reminder terminal, so that targeted reminders are issued for the fall behavior and the risks of taking no treatment measures after a fall are avoided.
In an embodiment, as shown in FIG. 3, step S202, i.e. recognizing the to-be-recognized video with the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action, includes the following steps:
S301: Based on the segment duration threshold and the overlap duration threshold, perform interleaved cutting of the to-be-recognized video to obtain at least two original video segments, each corresponding to a segment timestamp.
Here, the segment duration threshold is a preset threshold for the duration of each cut original video segment; that is, each original video segment cut out in this embodiment has a duration equal to the segment duration threshold, e.g. 10 s. The overlap duration threshold is a preset threshold for the duration by which two adjacent original video segments overlap when cutting, e.g. 3 s. An original video segment is a unit segment cut out of the to-be-recognized video for input into the behavior detection model for recognition. The segment timestamp corresponding to an original video segment can be the timestamp of its first image, so that the cutting order can be determined from the segment timestamps.
Specifically, the server performs interleaved cutting of the to-be-recognized video based on the segment duration threshold and the overlap duration threshold, which guarantees that any two adjacent original video segments share an overlapping portion, ensuring the accuracy of the subsequent fall detection and avoiding the case where a fall action spans two consecutive non-overlapping original video segments and neither segment alone can detect it. For example, for a to-be-recognized video, the 0-10 s portion is first cut to form the 1st original video segment, then the 7-17 s portion is cut to form the 2nd original video segment, then the 14-24 s portion forms the 3rd, and so on until all original video segments are cut.
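A minimal sketch of this interleaved cutting, assuming the 10 s segment duration and 3 s overlap from the example (both are configurable thresholds in the application):

    SEGMENT_SECONDS = 10  # segment duration threshold (example value)
    OVERLAP_SECONDS = 3   # overlap duration threshold (example value)

    def cut_segments(total_seconds):
        """Return (segment timestamp, start, end) triples; the segment
        timestamp is the timestamp of the first image in the segment."""
        stride = SEGMENT_SECONDS - OVERLAP_SECONDS  # 7 s between starts
        segments, start = [], 0
        while start < total_seconds:
            end = min(start + SEGMENT_SECONDS, total_seconds)
            segments.append((start, start, end))
            start += stride
        return segments

    # For a 24 s video: segments covering 0-10 s, 7-17 s, 14-24 s and 21-24 s.
    print(cut_segments(24))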
S302: According to the order of the segment timestamps, input the at least two original video segments into the R-C3D-based behavior detection model in turn for recognition, and determine whether each original video segment includes a fall action.
Specifically, the server determines the cutting order of the at least two original video segments from their segment timestamps and inputs them in turn into the R-C3D-based behavior detection model, so that whether each original video segment contains a fall action can be determined objectively and quickly.
In the fall behavior detection and processing method provided by this embodiment, the to-be-recognized video is cut in an interleaved fashion according to the segment duration threshold and the overlap duration threshold to obtain at least two original video segments, which are input in turn into the R-C3D-based behavior detection model to quickly determine whether each contains a fall action. Splitting a long to-be-recognized video into shorter original video segments for recognition helps improve recognition accuracy and efficiency and raises the fault tolerance of fall detection, avoiding the situation where, for a long video, a failure during recognition (such as a server outage) invalidates the whole recognition and re-recognition takes a long time.
In an embodiment, as shown in FIG. 4, step S203, i.e. intercepting from the to-be-recognized video the target video segment corresponding to the fall action if the behavior actions corresponding to the to-be-recognized video include a fall action, includes the following steps:
S401: If the behavior actions corresponding to the to-be-recognized video include a fall action, determine the timestamp of the to-be-recognized image corresponding to the fall action as the start timestamp.
Here, a to-be-recognized image is an image making up the to-be-recognized video. Specifically, when the server recognizes that the behavior actions corresponding to the to-be-recognized video include a fall action, the timestamp of the to-be-recognized image at the moment the fall action is detected is used as the start timestamp for delimiting the target video segment. Understandably, using the timestamp of the image detected as the fall action as the start timestamp makes it possible to intercept the target video segment from that point and analyze the micro-expression and body-posture changes after the fall in order to determine the corresponding fall severity.
S402: Determine the end timestamp based on the start timestamp and the analysis duration threshold.
Here, the analysis duration threshold is a threshold preset in the system for determining the duration of video whose fall severity needs to be analyzed. Specifically, the server adds the analysis duration threshold to the start timestamp to obtain the end timestamp for delimiting the target video segment.
S403: Intercept from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determine it as the target video segment corresponding to the fall action.
Since every to-be-recognized image in the to-be-recognized video has a unique timestamp, after the start and end timestamps are determined, the server intercepts from the to-be-recognized video the segment whose first image corresponds to the start timestamp and whose last image corresponds to the end timestamp, and determines this segment as the target video segment corresponding to the fall action, to be used for fall severity analysis.
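A minimal sketch of steps S401-S403, representing the to-be-recognized video as (timestamp, image) pairs; the 5 s analysis duration threshold is an assumed example value:

    ANALYSIS_SECONDS = 5.0  # analysis duration threshold (assumed value)

    def intercept_target_segment(frames, fall_timestamp):
        """frames: list of (timestamp, image) pairs of the to-be-recognized
        video; fall_timestamp: timestamp of the fall-action image."""
        start_ts = fall_timestamp                 # S401: start timestamp
        end_ts = start_ts + ANALYSIS_SECONDS      # S402: end timestamp
        # S403: the segment between the start and end timestamps
        return [(ts, img) for ts, img in frames if start_ts <= ts <= end_ts]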
In the fall behavior detection and processing method provided by this embodiment, the timestamp of the to-be-recognized image corresponding to the fall action is first determined as the start timestamp, the end timestamp is determined from it, and the video segment between the start and end timestamps is intercepted as the target video segment, so that every to-be-recognized image in the target video segment falls within the analysis duration threshold after the fall and can reflect the real emotional changes of the person who fell, guaranteeing the objectivity and authenticity of the fall severity analysis.
In an embodiment, each target video segment includes at least one to-be-recognized image. As shown in FIG. 5, step S204, i.e. performing severity analysis on the target video segment to obtain the corresponding fall severity, includes the following steps:
S501: Recognize each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtain the micro-expression type corresponding to each to-be-recognized image.
Here, the micro-expression recognition model is a model for recognizing the facial micro-expressions in the to-be-recognized images. In this embodiment, the micro-expression recognition model captures local features of the user's face in the to-be-recognized image, determines the target facial action units of the face from the local features, and then determines the micro-expression from the recognized target facial action units. The micro-expression recognition model may be a deep-learning neural network model, a classification-based local recognition model, or a local emotion recognition model based on Local Binary Patterns (LBP). For example, when a classification-based local recognition model is used as the micro-expression recognition model, a large amount of training image data must be collected in advance, containing positive and negative samples for each facial action unit, and the model is trained with a classification algorithm. Specifically, an SVM classification algorithm can be used to train the large amount of training image data to obtain SVM classifiers corresponding to multiple facial action units: for example, 39 SVM classifiers for 39 facial action units, or 54 SVM classifiers for 54 facial action units. The more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when multiple SVM classifiers are combined into a micro-expression recognition model, the more classifiers there are, the more accurate the micro-expression types it recognizes.
In this embodiment, taking the micro-expression recognition model formed by the SVM classifiers of 54 facial action units as an example, using this model to recognize each to-be-recognized image in the target video segment can identify 54 micro-expression types, for example love, interest, surprise, expectation, ..., aggression, conflict, insult, doubt, fear, and pain.
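The per-action-unit classification described above could be combined roughly as follows; the action unit names, the AU-to-expression table, and the sklearn-style classifier interface are assumptions for illustration, not details given in the application.

    # Hypothetical mapping from combinations of facial action units (AUs) to a
    # micro-expression type; a real model would cover all 54 types.
    AU_TO_EXPRESSION = {
        frozenset({"AU4", "AU20", "AU43"}): "pain",
        frozenset({"AU1", "AU4", "AU15"}): "crying",
    }

    def recognize_micro_expression(face_features, au_classifiers):
        """au_classifiers: dict mapping an AU name to a trained binary SVM
        (sklearn-style, exposing predict) for that facial action unit."""
        active = {au for au, clf in au_classifiers.items()
                  if clf.predict([face_features])[0] == 1}
        for au_set, expression in AU_TO_EXPRESSION.items():
            if au_set <= active:
                return expression
        return "neutral"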
S502: If the micro-expression type is the preset expression type, use a facial feature point detection algorithm to detect and locate the to-be-recognized image and obtain a target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position of each facial feature point.
Here, the preset expression type is a type preset in the system as likely to follow a fall, such as pain or crying. The facial feature point detection algorithm is an algorithm for detecting facial feature points; it identifies, from the to-be-recognized image, N facial feature points and the feature position (i.e. the image coordinates) of each. It can detect and locate the facial feature points of the left eye, right eye, left eyebrow, right eyebrow, nose, mouth, and other parts in the to-be-recognized image.
In this embodiment, after the facial feature point detection algorithm has detected and located the to-be-recognized image, the image corresponding to the mouth area of the face, of a standard region size, is cut out of the to-be-recognized image and determined as the target mouth image for subsequent analysis. That is, the target mouth image is the image of the face's mouth area, matching the standard region size, cut out of the to-be-recognized image. The standard region size is preset in the system to delimit the region cut out for the target mouth image, and can be fixed by specifying the mouth width. With the width of the standard region fixed, when cutting the target mouth image out of the to-be-recognized image, the to-be-recognized image is first up-sampled or down-sampled (i.e. scaled) so that the mouth width of the person who fell matches the width of the standard region, and the cut is then made, yielding target mouth images of uniform width; this guarantees the accuracy of the subsequent analysis based on the average inner-lip distance of the mouth.
Specifically, the target mouth image includes N facial feature points and the feature position of each; here the N facial feature points are the feature points of the mouth contour. The mouth contour includes the upper lip contour (the outer and inner upper lip lines) and the lower lip contour (the inner and outer lower lip lines). In this embodiment, several dividing lines can be configured in the target mouth image according to preset rules to segment the mouth contour and determine the corresponding facial feature points. For example, three dividing lines can be drawn at the 1/4, 1/2, and 3/4 positions of the mouth width in the target mouth image; where each dividing line intersects the outer upper lip line, inner upper lip line, inner lower lip line, and outer lower lip line, a group of facial feature points is formed consisting of an outer upper lip point, inner upper lip point, inner lower lip point, and outer lower lip point.
S503: Based on the feature positions of the N facial feature points, obtain the average inner-lip distance of the mouth; if the average inner-lip distance is greater than the preset distance threshold, determine the to-be-recognized image corresponding to the target mouth image as a target recognition image.
Generally speaking, when a person falls, the more painful the micro-expression and the wider the mouth is opened, the higher the degree of pain, and the more it reflects the severity of the fall. Therefore, when the server obtains the N facial feature points of the target mouth image and their feature positions, it can calculate the average inner-lip distance of the mouth, which reflects how wide the mouth is open. Specifically, the server can first compute, on each dividing line, the inner-lip distance between the inner upper lip point and the inner lower lip point, and then average the inner-lip distances over all dividing lines to obtain the average inner-lip distance, which is used to analyze the degree of pain objectively. Understandably, since all target mouth images correspond to the standard region size and thus have the same width, the average inner-lip distance over all inner lip points reflects how wide the mouth is open fairly accurately.
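A minimal sketch of the average inner-lip distance computed from the three dividing lines; the pixel threshold is an assumed value, and the feature positions are (x, y) coordinates in the size-normalized target mouth image:

    import math

    DISTANCE_THRESHOLD = 12.0  # preset distance threshold in pixels (assumed)

    def inner_lip_average_distance(upper_inner, lower_inner):
        """upper_inner / lower_inner: inner-lip feature positions on the 1/4,
        1/2 and 3/4 dividing lines, each a list of three (x, y) points."""
        distances = [math.dist(u, l) for u, l in zip(upper_inner, lower_inner)]
        return sum(distances) / len(distances)

    def is_target_recognition_image(upper_inner, lower_inner):
        # The image must also have the preset micro-expression type (S501/S502).
        return inner_lip_average_distance(upper_inner,
                                          lower_inner) > DISTANCE_THRESHOLD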
Here, the preset distance threshold is a threshold preset in the system for judging that the degree of mouth opening indicates pain. Specifically, after obtaining the average inner-lip distance, the server compares it with the preset distance threshold; if the average inner-lip distance is greater than the preset distance threshold, the mouth is open wide, reflecting a high degree of pain, and the to-be-recognized image corresponding to that target mouth image is determined as a target recognition image for the subsequent fall severity analysis. At this point, a target recognition image is a to-be-recognized image in which the person's micro-expression type is the preset expression type and the average inner-lip distance is greater than the preset distance threshold; determining target recognition images jointly from micro-expression recognition and the degree of mouth opening helps ensure the accuracy of the fall severity analysis.
S504: Obtain the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
Since the duration of the target video segment matches the analysis duration threshold, the number of to-be-recognized images in the target video segment is fixed. A target recognition image, being an image in which the micro-expression type is the preset expression type and the average inner-lip distance exceeds the preset distance threshold, fully reflects that the person is in a rather painful state after the fall. Therefore, with the number of to-be-recognized images fixed, the more target recognition images there are, the higher the fall severity. That is, in this embodiment the number of target recognition images is proportional to the fall severity, and a corresponding lookup table can be preset so that the fall severity can be determined quickly.
In the fall behavior detection and processing method provided by this embodiment, micro-expression analysis is performed on the to-be-recognized images first, and only images whose micro-expression type is the preset expression type undergo the subsequent facial feature point detection and location, which reduces the amount of data processed and speeds up processing; the average inner-lip distance is then determined from the feature positions of the N facial feature points of the target mouth image, and the to-be-recognized images whose average inner-lip distance exceeds the preset distance threshold are selected as target recognition images, ensuring that the target recognition images truly and objectively reflect the degree of pain of the person who fell, which helps guarantee the accuracy of the subsequent fall severity analysis. Using the number of target recognition images in the target video segment, the fall severity can be determined quickly, ensuring the efficiency, accuracy, and objectivity of the analysis.
在一实施例中,如图6所示,步骤S504,即根据目标视频片段中所有目标识别图像对应的图像数量,获取目标视频片段对应的摔倒严重程度,包括如下步骤:In one embodiment, as shown in FIG. 6, step S504, that is, obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment, includes the following steps:
S601: Divide the target video segment into at least two segments to be processed, and obtain a preset weight corresponding to each segment to be processed, where the preset weight of a later segment is greater than that of an earlier segment.
Specifically, the server may divide the target video segment into at least two segments to be processed based on a unit duration, which may be 1/M (M ≥ 2) of the analysis duration threshold, so that the target video segment is divided into at least two segments of equal duration. A preset weight is a weight preset by the system for each segment to be processed. Understandably, since the target video segment is the collection of all images to be recognized from the image identified as a fall action until the analysis duration threshold elapses, the later the timestamp of a target recognition image within the target video segment, the longer the fallen person has remained in a painful state after the fall. Therefore, the server may divide the target video segment into at least two segments to be processed according to the unit duration and configure a corresponding preset weight for each segment, such that a later segment's preset weight is greater than an earlier segment's, so as to ensure the objectivity and accuracy of the subsequent analysis.
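As a sketch, the split and the weighting might look as follows; the linearly increasing, normalized weights are an illustrative assumption, since the text only requires that later segments outweigh earlier ones:

```python
def split_with_weights(frames, m):
    """Divide a target video segment (a list of frames) into m equal-duration
    segments to be processed and assign each a preset weight that grows with
    time (any remainder frames at the tail are dropped for simplicity)."""
    assert m >= 2
    size = len(frames) // m
    segments = [frames[i * size:(i + 1) * size] for i in range(m)]
    raw = list(range(1, m + 1))          # 1, 2, ..., m: later > earlier
    total = sum(raw)
    weights = [w / total for w in raw]   # normalized preset weights
    return segments, weights
```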
S602: For each segment to be processed, obtain the target score corresponding to the segment based on the number of target recognition images and the number of images to be recognized in the segment.
Specifically, for each segment to be processed, the server computes the target score from the number of target recognition images and the number of images to be recognized in the segment using the formula

P = K × A / B

where P is the target score of a given segment to be processed, A is the number of target recognition images in the segment, B is the number of images to be recognized in the segment, and K is a constant used to normalize the target score to a specific numerical interval.
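This per-segment computation is direct; the default K = 100 below is an assumed configuration that places scores in the interval 0 to 100:

```python
def target_score(num_target_images, num_images_to_recognize, k=100):
    """P = K * A / B for one segment to be processed.

    num_target_images: A, the number of target recognition images
    num_images_to_recognize: B, the number of images to be recognized
    k: K, the normalization constant (k=100 is an assumption)
    """
    return k * num_target_images / num_images_to_recognize
```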
S603: Perform weighted processing on the preset weights and target scores of the at least two segments to be processed to obtain the fall score corresponding to the target video segment.
Specifically, the server performs the weighted processing on the preset weights and target scores of the at least two segments to be processed using the formula

S = ∑_{i=1}^{j} P_i × W_i

where S is the fall score corresponding to the target video segment, P_i is the target score of the i-th segment to be processed, W_i is the preset weight of the i-th segment to be processed, and j is the number of segments to be processed in the target video segment.
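The aggregation is then a plain weighted sum over the per-segment scores:

```python
def fall_score(target_scores, preset_weights):
    """S = sum over i of P_i * W_i for the j segments to be processed."""
    assert len(target_scores) == len(preset_weights)
    return sum(p * w for p, w in zip(target_scores, preset_weights))
```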
S604: Query the fall degree comparison table based on the fall score to obtain the fall severity corresponding to the target video segment.
Here, the fall degree comparison table is a table preset by the system that maps score ranges to fall severities. Specifically, after obtaining the fall score corresponding to the target video segment, the server determines the score range into which the fall score falls and takes the fall severity corresponding to that range as the fall severity of the target video segment, thereby quickly determining the fall severity.
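A sketch of such a lookup; the score ranges and severity labels are illustrative assumptions, since the patent does not fix the contents of the table:

```python
# Hypothetical fall degree comparison table: [low, high) score range -> severity.
FALL_DEGREE_TABLE = [
    (0, 20, "minor"),
    (20, 60, "moderate"),
    (60, float("inf"), "severe"),
]

def severity_from_score(fall_score_value):
    for low, high, severity in FALL_DEGREE_TABLE:
        if low <= fall_score_value < high:
            return severity
    return "unknown"
```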
In the fall behavior detection and processing method provided in this embodiment, the target video segment is first divided into at least two segments to be processed, each corresponding to a preset weight; the number of target recognition images and the number of images to be recognized in each segment are then counted to determine the corresponding target score; and a weighted calculation over the preset weights and target scores yields the fall score of the target video segment, making that score more objective and accurate. Querying the fall degree comparison table with the fall score quickly yields the fall severity corresponding to the target video segment, improving the efficiency of the fall severity analysis.
Further, when analyzing the fall severity corresponding to the target video segment, after performing the weighted processing on the preset weights and target scores of the at least two segments to be processed to obtain the fall score corresponding to the target video segment, a target face image may also be extracted from the target video segment (the implementation is the same as step S701). A pre-trained age detection model is then applied to the target face image to obtain the predicted age of the fallen person, an age score table is queried based on the predicted age to obtain the corresponding age constant, and the fall score of the target video segment is updated to the product of the fall score and the age constant, so that the fall score takes the age of the fallen person into account for a better subsequent analysis of the fall severity. For example, the age constant may be set within the range of 0 to 2, with the median age (e.g., 40 years old) mapped to an age constant of 1. The older the person, the larger the value, indicating that age contributes more to the severity of the fall; conversely, the younger the person, the smaller the value, indicating that age contributes less. For example, a child's fall generally does not have serious consequences, whereas an elderly person's fall may lead to serious fractures or other risks. The age detection model may be a model for predicting age obtained by training a CNN or another network on positive and negative samples carrying age labels.
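A sketch of the age adjustment under the stated convention (age constant in the range 0 to 2, with the median age mapped to 1); the age bands and constants below are illustrative assumptions:

```python
# Hypothetical age score table: [low, high) age range -> age constant.
AGE_SCORE_TABLE = [
    (0, 18, 0.5),    # children: falls rarely have serious consequences
    (18, 60, 1.0),   # around the median age, the score is unchanged
    (60, 200, 1.6),  # elderly: higher fracture and other risks
]

def age_adjusted_fall_score(fall_score_value, predicted_age):
    """Update the fall score to fall_score * age_constant."""
    for low, high, age_constant in AGE_SCORE_TABLE:
        if low <= predicted_age < high:
            return fall_score_value * age_constant
    return fall_score_value
```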
In one embodiment, as shown in FIG. 7, before sending to the reminder terminal corresponding to the target video segment, the fall behavior detection and processing method further includes the following steps:
S701: Extract a target face image from the target video segment, and query the system user image library based on the target face image.
Here, the target face image is a relatively clear image of the fallen person, extracted from the target video segment and containing the frontal face. The system user image library is the database used by the system to store user images. It may store registered user images corresponding to registered users, or both registered user images and accompanying user images. A registered user is a user who has registered with the system of the public place where the video capture device is installed. A registered user image is a user image associated with a registered user; it may be an image of the registered user himself or of the person the registered user cares for. An accompanying user image is an image of a person who accompanies the registered user, or the person the registered user cares for, into the public place but has not registered with the system.
Specifically, the server may select a relatively clear image to be recognized containing a frontal face from the target video segment as the target face image, and then query the system user image library based on the target face image to determine whether it matches a registered user image or an accompanying user image; specifically, a facial feature similarity matching algorithm may be used for this determination.
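As a sketch of the facial feature similarity matching, assuming faces are compared as feature embeddings under cosine similarity (the patent names the matching algorithm but does not specify its internals, so both the representation and the threshold are assumptions):

```python
def match_user_image(target_embedding, user_library, similarity_threshold=0.8):
    """Return the library user whose face embedding best matches the target
    face image, or None if no similarity exceeds the (assumed) threshold.

    user_library: iterable of (user_id, embedding) pairs
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    best_id, best_sim = None, 0.0
    for user_id, embedding in user_library:
        sim = cosine(target_embedding, embedding)
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    return best_id if best_sim >= similarity_threshold else None
```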
S702: If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment.
Specifically, if a registered user image corresponding to the target face image exists in the system user image library, the fallen person is the registered user himself or a person in need of care determined in advance through system registration; in this case, the registered terminal corresponding to the registered user image is determined as the reminder terminal corresponding to the target video segment, so that the target video segment and the medical advice information can be sent to that reminder terminal. If no registered user image corresponding to the target face image exists in the system user image library, but an accompanying user image corresponding to the target face image does, the fallen person is likely someone the registered user knows or is familiar with; the corresponding registered user image can be found based on the accompanying user image, and the registered terminal corresponding to that registered user image is determined as the reminder terminal corresponding to the target video segment, so that the target video segment and the medical advice information can be sent to that reminder terminal.
S703: If neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
Specifically, when neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, the fallen person is a new arrival in the public place or someone unknown to any registered user; in this case, the management terminal corresponding to the video capture device is determined as the reminder terminal corresponding to the target video segment.
In the fall behavior detection and processing method provided in this embodiment, a target face image is extracted from the target video segment, and different reminder terminals are determined according to whether a corresponding registered user image or accompanying user image exists in the system user image library, so that the target video segment and the medical advice information can subsequently be sent to the appropriate reminder terminal. This achieves targeted reminding and helps avoid the serious consequences that arise when a fallen person does not receive timely treatment after a fall.
In one embodiment, as shown in FIG. 8, before acquiring the video to be recognized that is captured in real time by the video capture device, the fall behavior detection and processing method further includes the following steps:
S801: Obtain historical videos collected by the video capture device, each historical video including at least one historical video image.
Here, a historical video is a video collected before the server obtains the video to be recognized. Historical video images are the images that make up a historical video.
S802: Recognize each historical video image using a face detection algorithm, and obtain at least one original face image in the historical video image.
Here, the face detection algorithm is an algorithm for detecting whether an image contains a face. An original face image is the image corresponding to a face region recognized by the face detection algorithm; it can be understood as the image within the face bounding box selected by the face detection algorithm.
S803: Query the system user image library based on each original face image; if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image to be face images to be analyzed.
Specifically, the server queries the system user image library based on each original face image to determine whether a registered user image corresponding to that original face image exists; the process is as described in step S701 and, to avoid repetition, is not repeated here. A face image to be analyzed can be understood as a non-registered-user image appearing in the same frame of historical video image as a registered user image. For example, if a historical video image contains three original face images X, Y, and Z, and X is recognized as a registered user image in the system user image library while Y and Z are not, then Y and Z are determined as face images to be analyzed. A face image to be analyzed is one that needs to be analyzed to decide whether it is an accompanying user image for the registered user.
S804: Determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
Specifically, a target video image is a historical video image that contains both the same registered user image and the same face image to be analyzed; in the above example, the target video images are all historical video images containing both X and Y. The unit time is a preset period. Understandably, since the number of historical video images within the unit time is fixed, the number of coexistence frames corresponding to the target video images can be determined by counting the number of target video images within that unit time. The number of coexistence frames can be understood as the number of target video images containing both X and Y within the unit time. The larger the number of coexistence frames, the greater the probability that X and Y appear together, and the more likely they are accompanying each other; therefore, whether an image is an accompanying user image can be evaluated based on the number of coexistence frames.
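A minimal sketch of the counting step, assuming each historical video image within the unit time is represented by the set of person identities detected in it:

```python
def coexistence_frames(frames_in_unit_time, registered_id, candidate_id):
    """Count the target video images in which the registered user and the
    face to be analyzed appear together within the unit time."""
    return sum(
        1 for detected_ids in frames_in_unit_time
        if registered_id in detected_ids and candidate_id in detected_ids
    )
```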
S805: Use the micro-expression recognition model to recognize the registered user image and the face image to be analyzed in each target video image, and compute the positive emotion probability that both are in a positive emotion at the same time.
Here, the micro-expression recognition model may be the micro-expression recognition model in step S501, and its recognition process is the same as in step S501; to avoid repetition, it is not described again here. Positive emotions are happiness, joy, or other emotions reflecting that a person is in a positive state, as opposed to negative emotions such as anger.
Specifically, if the number of target video images within the unit time is R, the micro-expression recognition model is used to recognize the registered user image and the face image to be analyzed in each target video image. Whenever the micro-expression types recognized for both the registered user image and the face image to be analyzed correspond to positive emotions, the count U of frames in which both are in a positive emotion is incremented by 1, until all target video images have been analyzed and the final count U is determined. The positive emotion probability L of both being in a positive emotion at the same time is then computed as L = U / R.
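A direct transcription of this computation:

```python
def positive_emotion_probability(emotion_pairs):
    """L = U / R, where R is the number of target video images in the unit
    time and U is the number of those images in which both faces show a
    positive micro-expression.

    emotion_pairs: one (registered_is_positive, candidate_is_positive)
    boolean pair per target video image.
    """
    r = len(emotion_pairs)
    u = sum(1 for a, b in emotion_pairs if a and b)
    return u / r if r else 0.0
```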
S806: If the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image, and store the accompanying user image in association with the registered user image in the system user image library.
Here, the preset frame number threshold is a preset threshold for assessing whether two people are accompanying each other. If the number of coexistence frames is greater than the preset frame number threshold, the number of times the registered user image and the face image to be analyzed appear together in one frame of target video image exceeds the preset threshold for deeming the two to be accompanying each other.
Here, the preset probability threshold is a preset threshold for assessing whether the two are on friendly terms. If the positive emotion probability is greater than the preset probability threshold, the two people corresponding to the registered user image and the face image to be analyzed are likely to both be in a positive emotion, and are thus likely to be friends.
Specifically, when the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, the server deems that the person corresponding to the face image to be analyzed most likely accompanied the user corresponding to the registered user image into the capture area of the video capture device. Therefore, the face image to be analyzed can be determined as the accompanying user image corresponding to the registered user image, and the accompanying user image is stored in association with the registered user image in the system user image library, so that targeted reminding can be realized when the reminder terminal is later determined by querying registered user images or accompanying user images.
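Putting the two conditions together, the accompanying-user decision reduces to a conjunction of the two threshold tests:

```python
def is_accompanying_user(coexist_frames, positive_prob,
                         frame_threshold, prob_threshold):
    """Store the face to be analyzed as an accompanying user image only when
    both the coexistence and the positive-emotion conditions hold."""
    return coexist_frames > frame_threshold and positive_prob > prob_threshold
```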
In the fall behavior detection and processing method provided in this embodiment, the other original face images appearing in the same historical video image as a registered user image are determined as face images to be analyzed, ensuring the objectivity of the subsequent determination of accompanying user images; a face image to be analyzed whose coexistence frame count exceeds the preset frame number threshold and whose positive emotion probability exceeds the preset probability threshold is determined as the accompanying user image corresponding to the registered user image, ensuring the accuracy and objectivity of that determination.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a fall behavior detection and processing apparatus is provided, corresponding one-to-one to the fall behavior detection and processing method in the foregoing embodiments. As shown in FIG. 9, the apparatus includes a to-be-recognized video acquisition module 901, a fall action detection module 902, a target video segment interception module 903, a fall severity acquisition module 904, a medical advice information acquisition module 905, and an information sending module 906. The functional modules are described in detail as follows:
The to-be-recognized video acquisition module 901 is configured to acquire the video to be recognized captured in real time by the video capture device.
The fall action detection module 902 is configured to recognize the video to be recognized using an R-C3D-based behavior detection model and determine whether the behavior actions corresponding to the video to be recognized include a fall action.
The target video segment interception module 903 is configured to, if the behavior actions corresponding to the video to be recognized include a fall action, intercept the target video segment corresponding to the fall action from the video to be recognized.
The fall severity acquisition module 904 is configured to perform severity analysis on the target video segment and obtain the fall severity corresponding to the target video segment.
The medical advice information acquisition module 905 is configured to obtain the medical advice information corresponding to the target video segment based on the fall severity.
The information sending module 906 is configured to send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
Preferably, the fall action detection module 902 includes:
An original video segment acquisition unit, configured to perform interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each corresponding to a segment timestamp.
A video segment action detection unit, configured to sequentially input the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determine whether each original video segment includes a fall action.
Preferably, the target video segment interception module 903 includes:
A start timestamp determining unit, configured to, if the behavior actions corresponding to the video to be recognized include a fall action, determine the timestamp of the image to be recognized corresponding to the fall action as the start timestamp.
An end timestamp determining unit, configured to determine the end timestamp based on the start timestamp and the analysis duration threshold.
A video segment interception unit, configured to intercept the video segment between the start timestamp and the end timestamp from the video to be recognized and determine it as the target video segment corresponding to the fall action.
Preferably, each target video segment includes at least one image to be recognized, and the fall severity acquisition module 904 includes:
A micro-expression type acquisition unit, configured to recognize each image to be recognized in the target video segment using a micro-expression recognition model and obtain the micro-expression type corresponding to each image to be recognized.
A target mouth image acquisition unit, configured to, if the micro-expression type is the preset expression type, detect and locate the image to be recognized using a facial feature point detection algorithm and obtain a target mouth image containing the mouth region of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point.
A target recognition image determining unit, configured to obtain the average inner-lip distance of the mouth based on the feature positions corresponding to the N facial feature points and, if the average inner-lip distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as a target recognition image.
A fall severity acquisition unit, configured to obtain the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment.
Preferably, the fall severity acquisition unit includes:
A preset weight acquisition subunit, configured to divide the target video segment into at least two segments to be processed and obtain the preset weight corresponding to each segment, where the preset weight of a later segment is greater than that of an earlier segment.
A target score acquisition subunit, configured to obtain the target score of each segment to be processed based on the number of target recognition images and the number of images to be recognized in the segment.
A fall score acquisition subunit, configured to perform weighted processing on the preset weights and target scores of the at least two segments to be processed and obtain the fall score corresponding to the target video segment.
A fall severity acquisition subunit, configured to query the fall degree comparison table based on the fall score and obtain the fall severity corresponding to the target video segment.
Preferably, before the information sending module 906 operates, the fall behavior detection and processing apparatus further includes:
A target face image query unit, configured to extract the target face image from the target video segment and query the system user image library based on the target face image.
A first reminder terminal determining unit, configured to, if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment.
A second reminder terminal determining unit, configured to, if neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
Preferably, before the to-be-recognized video acquisition module 901 operates, the fall behavior detection and processing apparatus further includes:
A historical video acquisition unit, configured to obtain historical videos collected by the video capture device, each historical video including at least one historical video image.
An original face image acquisition unit, configured to recognize each historical video image using a face detection algorithm and obtain at least one original face image in the historical video image.
A to-be-analyzed face image acquisition unit, configured to query the system user image library based on each original face image and, if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image to be face images to be analyzed.
A coexistence frame counting unit, configured to determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
A positive emotion probability acquisition unit, configured to recognize the registered user image and the face image to be analyzed in each target video image using the micro-expression recognition model, and compute the positive emotion probability that both are in a positive emotion at the same time.
An accompanying user image determining unit, configured to, if the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image and store the accompanying user image in association with the registered user image in the system user image library.
For the specific limitations of the fall behavior detection and processing apparatus, refer to the limitations of the fall behavior detection and processing method above, which are not repeated here. Each module in the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, the processor of a computer device, or be stored in software form in the memory of a computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database stores the data used or generated in executing the fall behavior detection and processing method. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a fall behavior detection and processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIGS. 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units in the embodiment of the fall behavior detection and processing apparatus, such as the functions of the modules/units/subunits shown in FIG. 9, which are likewise not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIGS. 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executed by a processor, the computer-readable instructions implement the functions of the modules/units in the embodiment of the fall behavior detection and processing apparatus, such as the functions of the modules/units/subunits shown in FIG. 9, which are likewise not repeated here. The readable storage media in this embodiment include non-volatile readable storage media and volatile readable storage media.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a non-volatile or volatile readable storage medium; when executed, the computer-readable instructions may include the processes of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The foregoing embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A fall behavior detection and processing method, characterized by comprising:
    acquiring a video to be recognized captured in real time by a video capture device;
    recognizing the video to be recognized using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting a target video segment corresponding to the fall action from the video to be recognized;
    performing severity analysis on the target video segment to obtain a fall severity corresponding to the target video segment;
    obtaining medical advice information corresponding to the target video segment based on the fall severity; and
    sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  2. The fall behavior detection and processing method according to claim 1, characterized in that recognizing the video to be recognized using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action comprises:
    performing interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp; and
    sequentially inputting the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determining whether each original video segment comprises a fall action.
  3. The fall behavior detection and processing method according to claim 1, characterized in that, if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting the target video segment corresponding to the fall action from the video to be recognized comprises:
    if the behavior actions corresponding to the video to be recognized comprise a fall action, determining the timestamp of the image to be recognized corresponding to the fall action as a start timestamp;
    determining an end timestamp based on the start timestamp and an analysis duration threshold; and
    intercepting the video segment between the start timestamp and the end timestamp from the video to be recognized and determining it as the target video segment corresponding to the fall action.
  4. The fall behavior detection and processing method according to claim 1, characterized in that each target video segment comprises at least one image to be recognized; and
    performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    recognizing each image to be recognized in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each image to be recognized;
    if the micro-expression type is a preset expression type, detecting and locating the image to be recognized using a facial feature point detection algorithm, and obtaining a target mouth image containing the mouth region of the face, the target mouth image comprising N facial feature points and the feature position corresponding to each facial feature point;
    obtaining an average inner-lip distance of the mouth based on the feature positions corresponding to the N facial feature points and, if the average inner-lip distance is greater than a preset distance threshold, determining the image to be recognized corresponding to the target mouth image as a target recognition image; and
    obtaining the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment.
  5. The fall behavior detection and processing method according to claim 4, characterized in that obtaining the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment comprises:
    dividing the target video segment into at least two segments to be processed and obtaining a preset weight corresponding to each segment to be processed, the preset weight of a later segment being greater than that of an earlier segment;
    obtaining a target score corresponding to each segment to be processed based on the number of target recognition images and the number of images to be recognized in the segment;
    performing weighted processing on the preset weights and target scores corresponding to the at least two segments to be processed to obtain a fall score corresponding to the target video segment; and
    querying a fall degree comparison table based on the fall score to obtain the fall severity corresponding to the target video segment.
  6. The fall behavior detection and processing method according to claim 1, characterized in that, before sending to the reminder terminal corresponding to the target video segment, the method further comprises:
    extracting a target face image from the target video segment and querying a system user image library based on the target face image;
    if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment; and
    if neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
  7. The fall behavior detection and processing method according to claim 6, characterized in that, before acquiring the video to be recognized captured in real time by the video capture device, the method further comprises:
    obtaining historical videos collected by the video capture device, each historical video comprising at least one historical video image;
    recognizing each historical video image using a face detection algorithm, and obtaining at least one original face image in the historical video image;
    querying the system user image library based on each original face image and, if a registered user image matching an original face image exists, determining the other original face images in the same historical video image as the registered user image to be face images to be analyzed;
    determining the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    recognizing the registered user image and the face image to be analyzed in each target video image using a micro-expression recognition model, and computing the positive emotion probability that both are in a positive emotion at the same time; and
    if the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the face image to be analyzed as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
  8. A fall behavior detection and processing apparatus, characterized by comprising:
    a to-be-recognized video acquisition module, configured to acquire a video to be recognized captured in real time by a video capture device;
    a fall action detection module, configured to recognize the video to be recognized using an R-C3D-based behavior detection model and determine whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    a target video segment interception module, configured to, if the behavior actions corresponding to the video to be recognized comprise a fall action, intercept a target video segment corresponding to the fall action from the video to be recognized;
    a fall severity acquisition module, configured to perform severity analysis on the target video segment and obtain a fall severity corresponding to the target video segment;
    a medical advice information acquisition module, configured to obtain medical advice information corresponding to the target video segment based on the fall severity; and
    an information sending module, configured to send the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    acquiring a video to be recognized captured in real time by a video capture device;
    recognizing the video to be recognized using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting a target video segment corresponding to the fall action from the video to be recognized;
    performing severity analysis on the target video segment to obtain a fall severity corresponding to the target video segment;
    obtaining medical advice information corresponding to the target video segment based on the fall severity; and
    sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  10. The computer device according to claim 9, characterized in that recognizing the video to be recognized using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action comprises:
    performing interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp; and
    sequentially inputting the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determining whether each original video segment comprises a fall action.
  11. The computer device according to claim 9, wherein, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action comprises:
    If the behavior actions corresponding to the to-be-recognized video include a fall action, determining the timestamp of the to-be-recognized image corresponding to the fall action as a start timestamp;
    Determining an end timestamp based on the start timestamp and an analysis duration threshold;
    Intercepting from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determining it as the target video segment corresponding to the fall action.
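For illustration, the interception recited above reduces to clamping a window of the analysis duration threshold starting at the detected fall; a minimal sketch under the same frame-index assumptions as the previous example:

```python
from typing import List, Sequence

def intercept_target_segment(frames: Sequence, fps: float,
                             start_timestamp: float,
                             analysis_seconds: float) -> List:
    """Return the frames between the start timestamp and the end timestamp
    (start + analysis duration threshold), clamped to the length of the
    to-be-recognized video."""
    start = int(start_timestamp * fps)
    end = min(int((start_timestamp + analysis_seconds) * fps), len(frames))
    return list(frames[start:end])

# e.g. a fall detected at t = 12.4 s, analysed over the following 10 s
clip = intercept_target_segment([None] * 750, fps=25.0,
                                start_timestamp=12.4, analysis_seconds=10.0)
print(len(clip))  # 250 frames at 25 fps
```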
  12. The computer device according to claim 9, wherein each target video segment includes at least one to-be-recognized image;
    and performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    Recognizing each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each to-be-recognized image;
    If the micro-expression type is a preset expression type, detecting and locating the to-be-recognized image using a facial feature point detection algorithm to obtain a target mouth image containing the mouth area of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point;
    Obtaining the average distance between the inner lip points of the mouth based on the feature positions corresponding to the N facial feature points, and, if the average inner-lip distance is greater than a preset distance threshold, determining the to-be-recognized image corresponding to the target mouth image as a target recognition image;
    Obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment.
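For illustration, the claim fixes no particular landmark scheme; a common assumption is the 68-point layout popularised by dlib, in which points 60-67 trace the inner lip and points 61-63 sit opposite points 67-65. A minimal sketch of the average inner-lip distance test under that assumption (the pairing and the threshold value are illustrative, not recited):

```python
import math
from typing import List, Tuple

# Assumed landmark scheme (not fixed by the claim): dlib's 68-point layout,
# where these inner-lip point pairs face each other vertically.
INNER_LIP_PAIRS = [(61, 67), (62, 66), (63, 65)]

def inner_lip_average_distance(landmarks: List[Tuple[float, float]]) -> float:
    """Average gap between paired upper/lower inner-lip points;
    `landmarks` is indexed by feature-point number."""
    gaps = [math.dist(landmarks[upper], landmarks[lower])
            for upper, lower in INNER_LIP_PAIRS]
    return sum(gaps) / len(gaps)

def is_target_recognition_image(landmarks: List[Tuple[float, float]],
                                distance_threshold: float) -> bool:
    """An image counts as a target recognition image when the mouth is
    open wider than the preset distance threshold (e.g. a cry of pain)."""
    return inner_lip_average_distance(landmarks) > distance_threshold

# toy check: inner-lip pairs 8 px apart, threshold 5 px -> mouth judged open
points = [(0.0, 0.0)] * 68
for upper, lower in INNER_LIP_PAIRS:
    points[upper], points[lower] = (float(upper), 0.0), (float(upper), 8.0)
print(is_target_recognition_image(points, distance_threshold=5.0))  # True
```

In practice the distance would usually be normalised by face scale (for example the inter-ocular distance) so that one threshold works at any camera distance; the preset distance threshold in the claim leaves that choice open.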
  13. The computer device according to claim 12, wherein obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment comprises:
    Dividing the target video segment into at least two to-be-processed segments and obtaining the preset weight corresponding to each to-be-processed segment, the preset weight of a later to-be-processed segment being greater than the preset weight of an earlier to-be-processed segment;
    Obtaining the target score corresponding to each to-be-processed segment based on the number of images corresponding to all the target recognition images and the number of images corresponding to all the to-be-recognized images in that to-be-processed segment;
    Weighting the preset weights and target scores corresponding to the at least two to-be-processed segments to obtain the fall score value corresponding to the target video segment;
    Querying a fall degree comparison table based on the fall score value to obtain the fall severity corresponding to the target video segment.
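For illustration, assuming each to-be-processed segment's target score is simply the fraction of its to-be-recognized images flagged as target recognition images, the weighted scoring can be sketched as follows (the weights and the table lookup are illustrative):

```python
from typing import List

def fall_score(per_segment_flags: List[List[bool]],
               weights: List[float]) -> float:
    """Weighted fall score over the to-be-processed segments.

    per_segment_flags[i][j] is True when to-be-recognized image j of
    segment i was judged a target recognition image; per the claim,
    `weights` must grow for later segments.
    """
    if len(per_segment_flags) != len(weights):
        raise ValueError("one preset weight per to-be-processed segment")
    score = 0.0
    for flags, weight in zip(per_segment_flags, weights):
        target_ratio = sum(flags) / len(flags)  # target images / all images
        score += weight * target_ratio
    return score / sum(weights)                 # normalised into [0, 1]

# e.g. two halves of the target segment, the later half weighted double
value = fall_score([[True, False, True, True],
                    [True, True, True, False]], weights=[1.0, 2.0])
print(round(value, 3))  # 0.75
```

A fall degree comparison table would then bucket this score into severity levels (for instance mild, moderate, severe); the claim leaves the bucket boundaries to the implementation.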
  14. The computer device according to claim 9, wherein, before acquiring the to-be-recognized video captured in real time by the video capture device, the processor further implements the following steps when executing the computer-readable instructions:
    Acquiring historical videos collected by the video capture device, each historical video including at least one historical video image;
    Recognizing each historical video image using a face detection algorithm to obtain at least one original face image in the historical video image;
    Querying a system user image library based on each original face image, and, if there is a registered user image matching the original face image, determining the other original face images in the same historical video image as the registered user image to be to-be-analyzed face images;
    Determining the historical video images containing the same registered user image and the same to-be-analyzed face image as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    Recognizing the registered user image and the to-be-analyzed face image in each target video image using a micro-expression recognition model, and calculating the positive emotion probability that both are in a positive emotion at the same time;
    If the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the to-be-analyzed face image as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
    Before sending to the reminder terminal corresponding to the target video segment, the processor further implements the following steps when executing the computer-readable instructions:
    Extracting a target face image from the target video segment, and querying the system user image library based on the target face image;
    If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment;
    If neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
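For illustration, a minimal sketch of the companion-identification test and the reminder-terminal fallback recited in this claim; the FrameObservation record, the threshold values, and the flat library layout are all assumptions made here:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FrameObservation:
    """One target video image: the registered user and one co-occurring
    candidate face, plus whether each shows a positive micro-expression."""
    registered_positive: bool
    candidate_positive: bool

def is_accompanying_user(observations: List[FrameObservation],
                         frame_threshold: int,
                         probability_threshold: float) -> bool:
    """Candidate becomes an accompanying user when the coexistence frame
    count and the joint positive-emotion probability both clear their
    preset thresholds."""
    coexistence_frames = len(observations)
    if coexistence_frames == 0:
        return False
    both_positive = sum(1 for o in observations
                        if o.registered_positive and o.candidate_positive)
    positive_probability = both_positive / coexistence_frames
    return (coexistence_frames > frame_threshold
            and positive_probability > probability_threshold)

def resolve_reminder_terminal(target_face_id: str,
                              user_library: Dict[str, str],
                              management_terminal: str) -> str:
    """Registered and accompanying faces map to the registered user's
    terminal; unknown faces fall back to the capture device's
    management terminal."""
    return user_library.get(target_face_id, management_terminal)

# e.g. 120 coexistence frames, both showing positive emotion in 90 of them
obs = [FrameObservation(True, True)] * 90 + [FrameObservation(True, False)] * 30
print(is_accompanying_user(obs, frame_threshold=100,
                           probability_threshold=0.6))  # True
```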
  15. One or more readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    Acquiring a to-be-recognized video captured in real time by a video capture device;
    Recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
    If the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action;
    Performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
    Obtaining, based on the fall severity, the medical advice information corresponding to the target video segment;
    Sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
  16. The readable storage medium according to claim 15, wherein recognizing the to-be-recognized video using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action comprises:
    Interleave-cutting the to-be-recognized video based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp;
    Inputting the at least two original video segments, in the order of their segment timestamps, into the R-C3D-based behavior detection model for recognition, and determining whether each original video segment includes a fall action.
  17. The readable storage medium according to claim 15, wherein, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action comprises:
    If the behavior actions corresponding to the to-be-recognized video include a fall action, determining the timestamp of the to-be-recognized image corresponding to the fall action as a start timestamp;
    Determining an end timestamp based on the start timestamp and an analysis duration threshold;
    Intercepting from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determining it as the target video segment corresponding to the fall action.
  18. The readable storage medium according to claim 15, wherein each target video segment includes at least one to-be-recognized image;
    and performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    Recognizing each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each to-be-recognized image;
    If the micro-expression type is a preset expression type, detecting and locating the to-be-recognized image using a facial feature point detection algorithm to obtain a target mouth image containing the mouth area of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point;
    Obtaining the average distance between the inner lip points of the mouth based on the feature positions corresponding to the N facial feature points, and, if the average inner-lip distance is greater than a preset distance threshold, determining the to-be-recognized image corresponding to the target mouth image as a target recognition image;
    Obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment.
  19. The readable storage medium according to claim 18, wherein obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment comprises:
    Dividing the target video segment into at least two to-be-processed segments and obtaining the preset weight corresponding to each to-be-processed segment, the preset weight of a later to-be-processed segment being greater than the preset weight of an earlier to-be-processed segment;
    Obtaining the target score corresponding to each to-be-processed segment based on the number of images corresponding to all the target recognition images and the number of images corresponding to all the to-be-recognized images in that to-be-processed segment;
    Weighting the preset weights and target scores corresponding to the at least two to-be-processed segments to obtain the fall score value corresponding to the target video segment;
    Querying a fall degree comparison table based on the fall score value to obtain the fall severity corresponding to the target video segment.
  20. The readable storage medium according to claim 15, wherein, before acquiring the to-be-recognized video captured in real time by the video capture device, when the computer-readable instructions are executed by the one or more processors, the one or more processors are further caused to perform the following steps:
    Acquiring historical videos collected by the video capture device, each historical video including at least one historical video image;
    Recognizing each historical video image using a face detection algorithm to obtain at least one original face image in the historical video image;
    Querying a system user image library based on each original face image, and, if there is a registered user image matching the original face image, determining the other original face images in the same historical video image as the registered user image to be to-be-analyzed face images;
    Determining the historical video images containing the same registered user image and the same to-be-analyzed face image as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    Recognizing the registered user image and the to-be-analyzed face image in each target video image using a micro-expression recognition model, and calculating the positive emotion probability that both are in a positive emotion at the same time;
    If the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the to-be-analyzed face image as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
    Before sending to the reminder terminal corresponding to the target video segment, when the computer-readable instructions are executed by the one or more processors, the one or more processors are further caused to perform the following steps:
    Extracting a target face image from the target video segment, and querying the system user image library based on the target face image;
    If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment;
    If neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
PCT/CN2019/116490 2019-08-19 2019-11-08 Fall-down behavior detection processing method and apparatus, and computer device and storage medium WO2021031384A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910763921.6A CN110647812B (en) 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium
CN201910763921.6 2019-08-19

Publications (1)

Publication Number Publication Date
WO2021031384A1 true WO2021031384A1 (en) 2021-02-25

Family

ID=68990244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116490 WO2021031384A1 (en) 2019-08-19 2019-11-08 Fall-down behavior detection processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110647812B (en)
WO (1) WO2021031384A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724566A (en) * 2020-05-20 2020-09-29 同济大学 Pedestrian falling detection method and device based on intelligent lamp pole video monitoring system
CN111833568B (en) * 2020-07-08 2021-11-05 首都医科大学附属北京天坛医院 Tumble grading warning device based on piezoelectric signal monitoring and working method thereof
CN112101253A (en) * 2020-09-18 2020-12-18 广东机场白云信息科技有限公司 Civil airport ground guarantee state identification method based on video action identification
CN112633126A (en) * 2020-12-18 2021-04-09 联通物联网有限责任公司 Video processing method and device
CN112866808B (en) * 2020-12-31 2022-09-06 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112949417A (en) * 2021-02-05 2021-06-11 杭州萤石软件有限公司 Tumble behavior identification method, equipment and system
CN112998697B (en) * 2021-02-22 2022-06-14 电子科技大学 Tumble injury degree prediction method and system based on skeleton data and terminal
CN113450538A (en) * 2021-06-28 2021-09-28 杭州电子科技大学 Warning system based on painful expression discernment and fall action detection
CN114494976A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Human body tumbling behavior evaluation method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010074786A2 (en) * 2008-12-04 2010-07-01 Total Immersion Software, Inc. System and methods for dynamically injecting expression information into an animated facial mesh
CN108830212B (en) * 2018-06-12 2022-04-22 北京大学深圳研究生院 Video behavior time axis detection method
CN109460749A (en) * 2018-12-18 2019-03-12 深圳壹账通智能科技有限公司 Patient monitoring method, device, computer equipment and storage medium
CN109819325B (en) * 2019-01-11 2021-08-20 平安科技(深圳)有限公司 Hotspot video annotation processing method and device, computer equipment and storage medium
CN109858405A (en) * 2019-01-17 2019-06-07 深圳壹账通智能科技有限公司 Satisfaction evaluation method, apparatus, equipment and storage medium based on micro- expression
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120114629A (en) * 2011-04-07 2012-10-17 연세대학교 산학협력단 Falldown detecting method using the image processing, image processing apparatus for the same
US20190012893A1 (en) * 2017-07-10 2019-01-10 Careview Communications, Inc. Surveillance system and method for predicting patient falls using motion feature patterns
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN110047588A (en) * 2019-03-18 2019-07-23 平安科技(深圳)有限公司 Method of calling, device, computer equipment and storage medium based on micro- expression

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUIJUAN XU; ABIR DAS; KATE SAENKO: "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection", arXiv.org, Cornell University Library, Ithaca, NY 14853, 22 March 2017 (2017-03-22), XP080758989, DOI: 10.1109/ICCV.2017.617 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505752A (en) * 2021-07-29 2021-10-15 中移(杭州)信息技术有限公司 Fall detection method, device, equipment and computer readable storage medium
CN113505752B (en) * 2021-07-29 2024-04-23 中移(杭州)信息技术有限公司 Tumble detection method, device, equipment and computer readable storage medium
CN114743157A (en) * 2022-03-30 2022-07-12 中科融信科技有限公司 Pedestrian monitoring method, device, equipment and medium based on video
CN114972419A (en) * 2022-04-12 2022-08-30 中国电信股份有限公司 Tumble detection method, tumble detection device, tumble detection medium, and electronic device
CN114972419B (en) * 2022-04-12 2023-10-03 中国电信股份有限公司 Tumble detection method, tumble detection device, medium and electronic equipment
CN114998834A (en) * 2022-06-06 2022-09-02 杭州中威电子股份有限公司 Medical warning system based on face image and emotion recognition
CN115424353B (en) * 2022-09-07 2023-05-05 杭银消费金融股份有限公司 Service user characteristic identification method and system based on AI model
CN115424353A (en) * 2022-09-07 2022-12-02 杭银消费金融股份有限公司 AI model-based service user feature identification method and system
CN115830489A (en) * 2022-11-03 2023-03-21 南京小网科技有限责任公司 Intelligent dynamic analysis system based on ai identification
CN115830489B (en) * 2022-11-03 2023-10-20 南京小网科技有限责任公司 Intelligent dynamic analysis system based on ai identification
CN116994214A (en) * 2023-09-25 2023-11-03 南京华琨信息科技有限公司 Highway road safety evaluation method and system
CN116994214B (en) * 2023-09-25 2023-12-08 南京华琨信息科技有限公司 Highway road safety evaluation method and system
CN117196449A (en) * 2023-11-08 2023-12-08 讯飞智元信息科技有限公司 Video identification method, system and related device
CN117196449B (en) * 2023-11-08 2024-04-09 讯飞智元信息科技有限公司 Video identification method, system and related device

Also Published As

Publication number Publication date
CN110647812A (en) 2020-01-03
CN110647812B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
WO2021031384A1 (en) Fall-down behavior detection processing method and apparatus, and computer device and storage medium
CN111915842B (en) Abnormity monitoring method and device, computer equipment and storage medium
WO2020024400A1 (en) Class monitoring method and apparatus, computer device, and storage medium
WO2019095571A1 (en) Human-figure emotion analysis method, apparatus, and storage medium
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
EP3243163B1 (en) Method and apparatus for recognition of patient activity
CN111353366A (en) Emotion detection method and device and electronic equipment
US20210345925A1 (en) A data processing system for detecting health risks and causing treatment responsive to the detection
Awais et al. Automated eye blink detection and tracking using template matching
Chowdhury et al. Lip as biometric and beyond: a survey
Sorto et al. Face recognition and temperature data acquisition for COVID-19 patients in Honduras
CN113823376A (en) Intelligent medicine taking reminding method, device, equipment and storage medium
Joshi et al. Context-sensitive prediction of facial expressivity using multimodal hierarchical bayesian neural networks
Singh et al. A reliable and efficient machine learning pipeline for american sign language gesture recognition using EMG sensors
Singh et al. Prediction of pain intensity using multimedia data
Ghose et al. Human activity recognition from smart-phone sensor data using a multi-class ensemble learning in home monitoring
Nahar et al. Twins and Similar Faces Recognition Using Geometric and Photometric Features with Transfer Learning
US11527332B2 (en) Sensor data analyzing machines
US20220101655A1 (en) System and method of facial analysis
US10943693B2 (en) Concise datasets platform
CN113921098A (en) Medical service evaluation method and system
CN112487980A (en) Micro-expression-based treatment method, device, system and computer-readable storage medium
Lubis Machine Learning (Convolutional Neural Networks) for Face Mask Detection in Image and Video
Logronio et al. Age Range Classification Through Facial Recognition Using Keras Model
TWI805485B (en) Image recognition method and electronic apparatus thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19942415

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19942415

Country of ref document: EP

Kind code of ref document: A1