CN110647812B - Tumble behavior detection processing method and device, computer equipment and storage medium - Google Patents

Tumble behavior detection processing method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN110647812B
CN110647812B (application CN201910763921.6A)
Authority
CN
China
Prior art keywords
image
video
target
identified
target video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910763921.6A
Other languages
Chinese (zh)
Other versions
CN110647812A (en)
Inventor
王健宗
王义文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910763921.6A
Priority to PCT/CN2019/116490 (published as WO2021031384A1)
Publication of CN110647812A
Application granted
Publication of CN110647812B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G08B21/04 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B21/0407 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons based on behaviour analysis
    • G08B21/043 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons based on behaviour analysis detecting an emergency event, e.g. a fall

Abstract

The invention discloses a tumble behavior detection processing method, a tumble behavior detection processing device, computer equipment and a storage medium. The method comprises the following steps: acquiring a video to be identified, which is acquired by video acquisition equipment in real time; identifying the video to be identified by adopting an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be identified comprises a tumbling action or not; if the behavior action corresponding to the video to be identified comprises a falling action, intercepting a target video fragment corresponding to the falling action from the video to be identified; carrying out severity analysis on the target video segment to obtain the tumbling severity corresponding to the target video segment; acquiring medical advice information corresponding to the target video clip based on the fall severity; and sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip. The method can rapidly and accurately detect whether the video to be identified comprises the falling action or not, and carry out targeted reminding based on the falling action.

Description

Tumble behavior detection processing method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for detecting and processing a tumbling behavior, a computer device, and a storage medium.
Background
A fall is a sudden tumble and, in severe cases, may have serious consequences for the physical health of the person who falls. For example, for an elderly person, a fall may cause mental trauma, fracture, soft tissue injury and other serious consequences that affect physical and mental health. At present, when a person living alone, or a person on his or her own in a public place, falls unexpectedly, the fall often goes unnoticed, no treatment measures are taken in time, and the delayed treatment leads to serious consequences. Therefore, in scenarios such as public places, nursing facilities or the care of elderly people living alone, how to quickly and accurately identify whether a falling behavior exists and issue a targeted reminder has become a problem to be solved in order to avoid the risks caused by falling behaviors.
Disclosure of Invention
The embodiment of the invention provides a method, a device, computer equipment and a storage medium for detecting and processing falling behaviors, which are used for solving the problem of how to quickly and accurately identify whether falling behaviors exist or not and conduct targeted reminding.
A tumble behavior detection processing method comprises the following steps:
acquiring a video to be identified, which is acquired by video acquisition equipment in real time;
Identifying the video to be identified by adopting an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be identified comprises a falling action or not;
if the behavior action corresponding to the video to be identified comprises a falling action, capturing a target video fragment corresponding to the falling action from the video to be identified;
carrying out severity analysis on the target video segment to obtain the tumbling severity corresponding to the target video segment;
acquiring medical advice information corresponding to the target video clip based on the tumbling severity;
and sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip.
A tumble behavior detection processing device comprising:
the video acquisition module to be identified acquires videos to be identified, which are acquired by the video acquisition equipment in real time;
the falling action detection module is used for identifying the video to be identified by adopting an R-C3D-based behavior detection model and determining whether the behavior action corresponding to the video to be identified comprises a falling action or not;
the target video segment intercepting module is used for intercepting a target video segment corresponding to the tumbling action from the video to be identified if the action corresponding to the video to be identified comprises the tumbling action;
The falling severity obtaining module is used for analyzing the severity of the target video segment and obtaining the falling severity corresponding to the target video segment;
the medical advice information acquisition module is used for acquiring medical advice information corresponding to the target video clip based on the falling severity degree;
and the information sending module is used for sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-described fall behavior detection processing method when the computer program is executed.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described fall behavior detection processing method.
According to the method, the device, the computer equipment and the storage medium for detecting and processing the tumbling action, whether the video to be identified contains the tumbling action or not can be rapidly identified based on the R-C3D action detection model, so that the detection efficiency and the detection accuracy of the tumbling action are improved; then intercepting a target video fragment corresponding to the tumbling action from the video to be identified, and analyzing the severity of the target video fragment so as to reduce the analysis data volume of the severity analysis and improve the analysis efficiency and accuracy; corresponding medical advice information is obtained based on the tumbling severity, and the medical advice information and the target video clip are sent to the reminding terminal, so that targeted reminding aiming at tumbling behaviors is realized, and the risk caused by no corresponding treatment measures after tumbling is avoided.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a method for detecting and processing a tumbling action according to an embodiment of the invention;
FIG. 2 is a flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 3 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 4 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 5 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 6 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 7 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 8 is another flow chart of a tumble behavior detection processing method according to an embodiment of the invention;
FIG. 9 is a schematic diagram of a tumbling behavior detection processing device according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a computer device in accordance with an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method for detecting and processing the tumbling behaviors, provided by the embodiment of the invention, can be applied to an application environment shown in fig. 1. Specifically, the method for detecting and processing the tumbling action is applied to a tumbling action detection and processing system, the tumbling action detection and processing system comprises a client and a server as shown in fig. 1, and the client and the server communicate through a network and are used for rapidly detecting and identifying the tumbling action from videos to be identified, analyzing the tumbling severity of the tumbling action, and carrying out targeted reminding based on the tumbling severity, so that serious consequences caused by delaying treatment time are avoided. The client, also called the user side, refers to a program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in fig. 2, a method for detecting and processing a tumbling action is provided, and the method is applied to the server in fig. 1, and includes the following steps:
s201: and acquiring the video to be identified, which is acquired by the video acquisition equipment in real time.
The video to be identified is an unidentified video acquired in real time by adopting video acquisition equipment. The video acquisition device is a device for acquiring videos, and can be arranged in a mall, a hospital, a nursing place or other public places, and can also be arranged in the home of the solitary old person by a guardian.
S202: and identifying the video to be identified by adopting an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be identified comprises a tumbling action or not.
Wherein the behavior detection model based on R-C3D (Region Convolutional 3D Network for Temporal Activity Detection, regional convolution 3D network for temporal activity detection) is a model pre-trained with R-C3D network for identifying the behavior of people in video. The behavior action corresponding to the video to be identified refers to the behavior action identified from the video to be identified. Specifically, the R-C3D behavior detection model is adopted to identify the video to be identified, so that the falling-down motion contained in the behavior motion of the person in the video to be identified can be rapidly determined.
The R-C3D network is a network trained in an end-to-end mode, and three-dimensional convolution kernels are used for processing the video to be identified. R-C3D has a total of 8 convolution operations and 5 pooling operations, wherein the convolution kernels are 3 x 3 x 3 in size with a step size of 1 x 1 x 1, and the pooling kernels are 2 x 2 x 2, except that, in order not to prematurely reduce the length in the temporal dimension, the pooling size and step size of the first pooling layer are 1 x 2 x 2; finally, the R-C3D network obtains the final output result after passing through two fully connected layers and a softmax layer. The input of the R-C3D network is 3×L×H×W, wherein 3 is the RGB three channels, L is the number of input frames, and H×W is the size of each input image.
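For concreteness, the following is a minimal PyTorch sketch of the C3D-style backbone just described. The channel widths (64/128/256/512) and the 4096-unit fully connected size follow the standard C3D design and are assumptions rather than values stated here; the full R-C3D model additionally builds a temporal proposal subnet and an activity classification subnet on top of these shared features.

```python
import torch
import torch.nn as nn

class C3DBackbone(nn.Module):
    """Minimal sketch of the C3D-style feature extractor described above:
    8 conv layers (3x3x3 kernels, stride 1x1x1) and 5 pooling layers, where
    the first pool is 1x2x2 so the temporal length is not reduced too early,
    followed by two fully connected layers and a softmax."""

    def __init__(self, num_classes=2):
        super().__init__()

        def conv(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                nn.ReLU(inplace=True),
            )

        self.features = nn.Sequential(
            conv(3, 64), nn.MaxPool3d((1, 2, 2)),                      # conv1, pool1
            conv(64, 128), nn.MaxPool3d((2, 2, 2)),                    # conv2, pool2
            conv(128, 256), conv(256, 256), nn.MaxPool3d((2, 2, 2)),   # conv3a/b, pool3
            conv(256, 512), conv(512, 512), nn.MaxPool3d((2, 2, 2)),   # conv4a/b, pool4
            conv(512, 512), conv(512, 512), nn.MaxPool3d((2, 2, 2)),   # conv5a/b, pool5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(inplace=True),   # fully connected layer 1
            nn.Linear(4096, num_classes),                 # fully connected layer 2
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        # x: (batch, 3, L, H, W), i.e. RGB channels, L input frames, HxW frame size
        return self.classifier(self.features(x))
```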
It will be appreciated that the R-C3D-based behavior detection model is a model for detecting the actions of persons in a video, obtained by end-to-end training of the R-C3D network. In order to ensure the detection efficiency and accuracy of the R-C3D-based behavior detection model for the tumbling action, in the model training process, positive samples containing the tumbling action and negative samples not containing the tumbling action (namely, video clips corresponding to other actions except the tumbling action) can be used for model training according to a preset proportion (which can be set to 1:1 to balance the samples and avoid overfitting). Because R-C3D performs frame-level classification (Frame Label) on top of C3D, it can rapidly detect whether the video to be identified contains a falling action, end-to-end fall detection can be carried out for videos of any length and behaviors of any length, and the temporal proposal generation and classification subnets share the C3D parameters, so the speed is high and the efficiency of fall detection is guaranteed.
S203: if the behavior action corresponding to the video to be identified comprises a falling action, capturing a target video fragment corresponding to the falling action from the video to be identified.
The target video clips are video clips which are cut from videos to be identified and used for analyzing the falling severity degree corresponding to the falling action. It can be understood that after a fall, people can have different facial microexpressions or body gestures which are matched with the pain sense due to different pain senses caused by the fall, so that the severity of the fall can be analyzed by analyzing the video after the fall motion. Specifically, after the R-C3D behavior detection model is adopted to identify that the video to be identified contains the tumbling action, the server can intercept the video fragments containing the tumbling action and a certain time period after the tumbling action from the video to be identified as target video fragments so as to analyze the tumbling severity, thereby reducing the data volume of severity analysis processing and improving the analysis efficiency and accuracy.
S204: and analyzing the severity degree of the target video segment to obtain the tumbling severity degree corresponding to the target video segment.
Because the target video clips are video clips which are cut out from the video to be identified and used for analyzing the falling severity, the server can objectively and rapidly analyze the falling severity of the falling person in the target video clips when analyzing the severity of the target video clips. For example, if a fall in a target video segment is a young person, the target video segment after the fall shows that its facial microexpressions do not exhibit or exhibit a painful expression for a very short period of time, and the severity of the fall can be considered low. For another example, if the falling person in the target video segment is an elderly person, the target video segment after the falling shows that the facial microexpressions of the target video segment show a long-time painful expression, or the target video segment has a long-time motion such as a falling collision point, and the falling severity degree of the target video segment is high.
S205: and acquiring medical advice information corresponding to the target video clip based on the tumbling severity.
Specifically, the server compares the tumbling severity analyzed according to the target video segment with a preset degree threshold value to determine whether medical advice needs to be provided. The preset level threshold is a level threshold set in advance for evaluating whether or not a medical advice needs to be provided. If the tumbling severity is smaller than the preset degree threshold, medical advice is not required to be provided, and the target video clip can be directly sent to the corresponding reminding terminal so as to remind a tumbling action. If the degree of severity of the fall is not less than a preset degree threshold, identifying a falling bone joint point corresponding to the falling action from the target video segment, inquiring a medical advice information base based on the falling bone joint point, and acquiring medical advice information corresponding to the falling bone joint point as medical advice information corresponding to the target video segment.
The falling bone joint points refer to the bones or joint points, identified from the target video segment, that are impacted when the falling person falls; identifying the falling bone joint points helps to provide corresponding medical advice to the falling person in the follow-up process. The medical advice information base is an information base storing the medical examination advice or medical medication advice required when each bone joint point is impacted in a fall. The medical advice information is information for determining which medical examinations or medications are required according to the falling bone joint points. For example, if the knee joint lands first during the fall in the target video segment, the falling bone joint point is the knee joint, and the acquired medical advice information is medical advice related to injury of the knee joint.
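As a rough illustration of this decision (assuming a dictionary-style medical advice information base keyed by the impacted bone joint point and an external joint-identification routine; none of these names come from the text):

```python
def get_medical_advice(fall_severity, preset_degree_threshold, target_clip,
                       advice_info_base, identify_fallen_joint):
    """Hedged sketch of step S205: below the threshold only the target video
    clip is forwarded; otherwise the impacted bone joint point is identified
    from the clip and used to query the medical advice information base."""
    if fall_severity < preset_degree_threshold:
        return None                                      # no medical advice needed
    fallen_joint = identify_fallen_joint(target_clip)    # e.g. "knee_joint"
    return advice_info_base.get(fallen_joint)            # advice for that joint
```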
S206: and sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip.
The reminding terminal is used for receiving the target video clips, or the target video clips and the medical advice information. Generally, the reminding terminal is a terminal corresponding to the user who installs the video capture device. If the video acquisition equipment is arranged in a public place, the corresponding reminding terminal can be a reminding terminal corresponding to a worker in the public place, and in particular a mobile terminal carried by a worker at an entrance or exit of the public place, so that when the fallen person leaves the public place, the fallen person or the accompanying personnel corresponding to the fallen person can be informed of the corresponding falling behavior and the medical advice information. If the video acquisition device is arranged in the home of a person living alone, the reminding terminal can be a terminal bound with the video acquisition device.
In the method for detecting and processing the tumbling behaviors, whether the video to be identified contains the tumbling motions or not can be rapidly identified based on the R-C3D behavior detection model, so that the detection efficiency and the detection accuracy of the tumbling motions are improved; then intercepting a target video fragment corresponding to the tumbling action from the video to be identified, and analyzing the severity of the target video fragment so as to reduce the analysis data volume of the severity analysis and improve the analysis efficiency and accuracy; corresponding medical advice information is obtained based on the tumbling severity, and the medical advice information and the target video clip are sent to the reminding terminal, so that targeted reminding aiming at tumbling behaviors is realized, and the risk caused by no corresponding treatment measures after tumbling is avoided.
In an embodiment, as shown in fig. 3, step S202, that is, identifying a video to be identified by using an R-C3D-based behavior detection model, determines whether a behavior action corresponding to the video to be identified includes a falling action, includes the following steps:
s301: and based on the segment duration threshold and the overlapping duration threshold, performing staggered cutting on the video to be identified to obtain at least two original video segments, wherein each original video segment corresponds to a segment time stamp.
The segment duration threshold is a preset threshold for duration of cutting the original video segments, that is, the duration of each original video segment cut in this embodiment is a segment duration threshold, for example, 10s. The overlapping time period threshold is a threshold of a time period for overlapping adjacent two original video clips when the original video clips are cut, which is set in advance, for example, 3s. The original video clip is a unit clip cut from the video to be identified for identification by the input behavior detection model. The clip time stamp corresponding to the original video clip may be a time stamp corresponding to the 1 st image in the original video clip to determine the corresponding video cutting order based on the clip time stamp.
Specifically, the server performs staggered cutting on the video to be identified based on the segment duration threshold and the overlapping duration threshold, so that any two adjacent original video segments share an overlapping portion. This ensures the accuracy of subsequent fall detection and avoids the situation where a falling action that spans two consecutive non-overlapping segments cannot be judged as a falling action from either segment alone. For example, for a video to be identified, the 0th to 10th seconds are cut to form the 1st original video segment, the 7th to 17th seconds form the 2nd original video segment, the 14th to 24th seconds form the 3rd original video segment, and so on until all the original video segments are cut.
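A minimal sketch of this staggered cutting, assuming the video to be identified is available as a list of frames with an integer frame rate (the 10 s segment duration and 3 s overlap are the example values above):

```python
def cut_overlapping_segments(frames, fps, segment_seconds=10, overlap_seconds=3):
    """Cut the video to be identified into overlapping original video segments:
    consecutive segments start (segment_seconds - overlap_seconds) apart, e.g.
    0-10 s, 7-17 s, 14-24 s, ...  Each segment is returned together with the
    timestamp of its first image as the segment timestamp."""
    step = (segment_seconds - overlap_seconds) * fps
    length = segment_seconds * fps
    segments = []
    for start in range(0, len(frames), step):
        clip = frames[start:start + length]
        if clip:
            segments.append((clip, start / fps))
    return segments
```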
S302: and inputting at least two original video clips into an R-C3D-based behavior detection model in turn according to the sequence of the clip time stamps to identify, and determining whether each original video clip comprises a falling action.
Specifically, the server determines the cutting sequence corresponding to at least two original video clips according to the sequence of the clip time stamps, sequentially inputs the at least two original video clips into the R-C3D-based behavior detection model for recognition, so as to determine whether the original video clips contain a falling action, and objectively and rapidly determine whether each original video clip contains the falling action.
In the tumble behavior detection processing method provided by the embodiment, the videos to be identified are cut in a staggered manner according to the segment duration threshold and the overlapping duration threshold to obtain at least two original video segments, the at least two original video segments are sequentially input into the R-C3D-based behavior detection model for identification, so that whether each original video segment contains a tumble motion or not is quickly determined, the videos to be identified with longer duration are split into the original video segments with shorter duration for identification, the identification accuracy and the identification efficiency of the original video segments are improved, the fault tolerance of tumble motion detection is improved, and the problem that the overall identification error is caused by various objective reasons (such as server downtime) in the identification process of the videos to be identified with longer duration is avoided.
In an embodiment, as shown in fig. 4, step S203, that is, if the behavior action corresponding to the video to be identified includes a tumbling action, intercepts a target video segment corresponding to the tumbling action from the video to be identified, includes the following steps:
s401: if the behavior action corresponding to the video to be identified comprises a falling action, determining a time stamp corresponding to the image to be identified corresponding to the falling action as a starting time stamp.
Wherein the image to be recognized is an image constituting a video to be recognized. Specifically, when the server recognizes that the behavior action corresponding to the video to be recognized includes a falling action, the server takes a time stamp corresponding to the image to be recognized corresponding to the moment when the falling action is detected as a starting time stamp corresponding to the division target video segment. It can be appreciated that the timestamp of the image to be identified corresponding to the detected and determined tumbling action is taken as a starting timestamp, so that the target video segment is intercepted from the starting timestamp, and the micro-expression change and the body posture change after the tumbling action are analyzed to determine the corresponding tumbling severity.
S402: based on the start timestamp and the analysis duration threshold, a termination timestamp is determined.
The analysis duration threshold is a threshold preset by the system and used for determining the video duration of which the falling severity needs to be analyzed. Specifically, the server starts adding the duration of the analysis duration threshold from the start time stamp, and can determine the termination time stamp corresponding to the target video segment.
S403: and intercepting the video segments between the starting time stamp and the ending time stamp from the video to be identified, and determining the video segments as target video segments corresponding to the tumbling action.
Because each image to be identified in the video to be identified corresponds to a unique time stamp, after the starting time stamp and the ending time stamp are determined, capturing a video segment taking the image to be identified corresponding to the starting time stamp as the starting image and taking the image to be identified corresponding to the ending time stamp as the ending image from the video to be identified, and determining the video segment as a target video segment corresponding to the falling action so as to analyze the falling severity degree by using the target video segment.
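Sketched below under the assumption that each image to be identified carries its own timestamp:

```python
def intercept_target_segment(frames_with_ts, fall_start_ts, analysis_seconds):
    """Sketch of steps S401-S403: the image in which the falling action was
    detected supplies the starting timestamp, the termination timestamp is
    start + analysis duration threshold, and the frames in between form the
    target video segment."""
    end_ts = fall_start_ts + analysis_seconds
    return [(frame, ts) for frame, ts in frames_with_ts
            if fall_start_ts <= ts <= end_ts]
```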
In the method for detecting and processing the tumbling behaviors, the time stamp corresponding to the image to be identified and corresponding to the tumbling action is firstly determined to be the starting time stamp, the starting time stamp is used for determining the ending time stamp, the video segment between the starting time stamp and the ending time stamp is intercepted to be the target video segment, so that each image to be identified in the target video segment is an image which can reflect the real emotion change of the tumbling person within the analysis duration threshold after the tumbling, and objective reality of analysis results can be ensured when the analysis of the tumbling severity is carried out by using the target video segment.
In one embodiment, each target video clip includes at least one image to be identified. As shown in fig. 5, step S204, namely, analyzing the severity of the target video segment, obtains the severity of the fall corresponding to the target video segment, includes the following steps:
s501: and identifying each image to be identified in the target video segment by adopting the microexpressive identification model, and obtaining the microexpressive type corresponding to each image to be identified.
The micro-expression recognition model is used for recognizing the facial micro-expression in the image to be recognized. In this embodiment, the micro-expression recognition model is a model that captures local features of the face of a user in the image to be recognized, determines each target facial action unit of the face in the image to be recognized according to the local features, and determines the micro-expression according to the recognized target facial action units. The micro-expression recognition model can be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on a local binary pattern (Local Binary Pattern, LBP). For example, when a classification-based local recognition model is used as the micro-expression recognition model, a large amount of training image data needs to be collected in advance for model training; the training image data includes a positive sample and a negative sample of each facial action unit, and the training image data is trained through a classification algorithm to obtain the micro-expression recognition model. Specifically, a classification algorithm such as an SVM classification algorithm may be used to train the large amount of training image data to obtain SVM classifiers corresponding to a plurality of facial action units. The more distinct facial action units for which positive and negative samples are included in the training image data, the more SVM classifiers are obtained; for example, 39 SVM classifiers associated with 39 facial action units, or 54 SVM classifiers associated with 54 facial action units, may be obtained. It can be understood that when the micro-expression recognition model is formed from multiple SVM classifiers, the more SVM classifiers are obtained, the more accurately the formed micro-expression recognition model recognizes the micro-expression type.
In this embodiment, taking a micro-expression recognition model formed by SVM classifiers corresponding to 54 facial action units as an example, each image to be recognized in a target video segment is recognized by using the micro-expression recognition model, and 54 micro-expression types can be recognized, for example love, interest, surprise, …, aggressiveness, conflict, disfigurement, suspicion, fear, pain, etc.
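A hedged scikit-learn sketch of the classification-based variant: one SVM classifier per facial action unit, with the set of active units mapped to a micro-expression type. The data layout and the au_to_expression mapping are assumptions made for illustration.

```python
from sklearn.svm import SVC

def train_au_classifiers(training_data):
    """training_data: {au_id: (feature_matrix, labels)} where labels mark the
    positive samples (action unit present) and negative samples (absent)."""
    classifiers = {}
    for au_id, (features, labels) in training_data.items():
        clf = SVC(kernel="rbf")
        clf.fit(features, labels)
        classifiers[au_id] = clf
    return classifiers

def recognize_micro_expression(classifiers, face_features, au_to_expression):
    """Predict the active facial action units for one face and map the set of
    active units to a micro-expression type such as "pain" or "fear"."""
    active_units = {au_id for au_id, clf in classifiers.items()
                    if clf.predict([face_features])[0] == 1}
    return au_to_expression(active_units)
```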
S502: if the micro expression type is the preset expression type, detecting and positioning an image to be identified by adopting a face feature point detection algorithm to obtain a target mouth image containing a face mouth area, wherein the target mouth image comprises N face feature points and feature positions corresponding to the face feature points.
The preset expression type is the type which is preset by the system and can be identified as the expression type after falling, such as pain, crying and the like. The face feature point detection algorithm is an algorithm for detecting face feature points, and N face feature points and feature positions corresponding to the face feature points, namely coordinates in an image, can be identified from the image to be identified by using the algorithm. The face feature point detection algorithm can detect and position face feature points of the left eye, the right eye, the left eyebrow, the right eyebrow, the nose, the mouth and other parts in the image to be identified.
In this embodiment, after an image to be identified is detected and located by using a face feature point detection algorithm, an image corresponding to a face mouth region with a standard region size is cut from the image to be identified, and is determined as a target mouth image which needs to be analyzed later. I.e. the target mouth image is an image corresponding to a face mouth region matching the standard region size, taken from the image to be identified. The standard region size is a region size preset by the system and used for limiting the interception target mouth image, and can be determined by limiting the mouth width. When the target mouth image is intercepted from the image to be identified, the image to be identified needs to be subjected to up-sampling processing or down-sampling processing (namely scaling processing) so that the mouth width of a person falling down in the image to be identified is matched with the width of the standardized area, and then screenshot is carried out to obtain the target mouth image with the consistent width, thereby ensuring the accuracy of an analysis result when the analysis is carried out based on the average distance of lip points in the mouth.
Specifically, the target mouth image includes N face feature points and feature positions corresponding to each face feature point, where the N face feature points in the target mouth image refer to face feature points corresponding to a mouth contour. The mouth contour comprises an upper lip contour and a lower lip contour, the upper lip contour comprises an upper lip outer lip line and an upper lip inner lip line, the lower lip contour comprises a lower lip inner lip line and a lower lip outer lip line, and in the embodiment, a plurality of dividing lines can be configured in the target mouth image according to a preset rule and used for dividing the mouth contour so as to determine corresponding face feature points. For example, three dividing lines may be drawn in the target mouth image at positions 1/4, 1/2, 3/4, etc. of the mouth width, where each dividing line intersects with the upper lip outer lip line, the upper lip inner lip line, the lower lip inner lip line, and the lower lip outer lip line, to form a set of face feature points including the upper lip outer lip point, the upper lip inner lip point, the lower lip inner lip point, and the lower lip outer lip point, respectively.
S503: and acquiring average distance of lip points in the mouth based on the feature positions corresponding to the N face feature points, and determining the image to be recognized corresponding to the target mouth image as a target recognition image if the average distance of lip points in the mouth is larger than a preset distance threshold.
Generally, when a person falls, the more painful the person's micro-expression, the wider the mouth tends to open; a wider mouth opening therefore indicates a higher degree of pain, which in turn reflects a more severe fall. Therefore, when the server acquires the N face feature points of the target mouth image and their feature positions, it can calculate the average distance of the mouth inner lip points, which reflects the opening degree of the mouth. Specifically, the server may first count, on each dividing line, the inner-lip distance between the upper-lip inner lip point and the lower-lip inner lip point, and then average the inner-lip distances corresponding to all the dividing lines to obtain the average distance of the mouth inner lip points, so that the pain degree of the fallen person can be objectively analyzed using this average distance. As can be appreciated, since all the target mouth images are images corresponding to the standard region size, their image widths are uniform; in this case, the degree of mouth opening can be reflected more accurately by the average distance over all the mouth inner lip points.
The preset distance threshold is a threshold preset by the system for evaluating the opening degree of the mouth to the degree of identifying the mouth as painful. Specifically, after obtaining the average distance of the lip points in the mouth, the server compares the average distance of the lip points in the mouth with a preset distance threshold, if the average distance of the lip points in the mouth is greater than the preset distance threshold, the mouth opening degree is larger, the pain degree is reflected to be higher, and the image to be identified corresponding to the target mouth image is determined to be the target identification image, so that the falling severity degree can be analyzed later. At this time, the target recognition image is an image to be recognized, wherein the micro-expression type of the tumbling person is a preset expression type, and the average distance of lip points in the mouth is larger than a preset distance threshold value, so that the target recognition image is determined jointly through micro-expression emotion recognition and mouth opening degree, and the accuracy of the analysis result of the tumbling severity degree is guaranteed.
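The mouth-opening measure of S503 might be computed as in the sketch below, assuming the detected face feature points are grouped per dividing line (the 1/4, 1/2 and 3/4 positions mentioned above) and the target mouth images already share the standard region width:

```python
def mouth_inner_lip_average_distance(lip_points):
    """lip_points: {"quarter": {...}, "half": {...}, "three_quarter": {...}},
    each entry holding the (x, y) positions of the upper and lower inner-lip
    points on that dividing line (an assumed layout of the face feature points)."""
    distances = []
    for line in ("quarter", "half", "three_quarter"):
        upper = lip_points[line]["upper_inner"]        # upper-lip inner lip point
        lower = lip_points[line]["lower_inner"]        # lower-lip inner lip point
        distances.append(abs(lower[1] - upper[1]))     # per-line inner-lip distance
    return sum(distances) / len(distances)

def is_target_recognition_image(micro_expression, preset_expressions,
                                avg_inner_lip_distance, preset_distance_threshold):
    """An image to be identified becomes a target recognition image only when its
    micro-expression is a preset (pain-like) type AND the mouth opens widely."""
    return (micro_expression in preset_expressions
            and avg_inner_lip_distance > preset_distance_threshold)
```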
S504: and acquiring the tumbling severity corresponding to the target video segment according to the image quantity corresponding to all the target identification images in the target video segment.
Since the duration of the target video clip matches the analysis duration threshold, the number of images of all images to be identified in the target video clip is fixed. The target recognition image is an image to be recognized, wherein the micro expression type of the falling person is a preset expression type, and the average distance of lip points in the mouth is larger than a preset distance threshold value, so that the image can fully reflect a painful state of the falling person after the falling. Therefore, on the premise that the number of images of the images to be identified in the target video clip is fixed, the more the number of images of the target identification images which reflect the painful state of the fallen person after the fallen person, the higher the falling severity of the target identification images. In this embodiment, the number of images corresponding to the target identification image is in a proportional relationship with the falling severity thereof, and a corresponding comparison table can be preset, so that the corresponding falling severity can be rapidly determined.
In the tumble behavior detection processing method provided by the embodiment, the micro-expression analysis is performed on the image to be identified, and the image to be identified with the micro-expression type being the preset expression type is detected and positioned on the characteristic points of the subsequent face, so that the data processing amount is reduced, and the data processing efficiency is improved; and determining the average distance of lips in the mouth according to the characteristic positions of N face characteristic points corresponding to the target mouth image, and determining the image to be identified, of which the average distance of lips in the mouth is larger than a preset distance threshold, as a target identification image, so as to ensure that the target identification image can truly and objectively reflect the pain degree of a falling person, and be beneficial to ensuring the accuracy of the analysis of the subsequent falling severity degree. The falling severity degree of the target video clips can be rapidly determined by utilizing the number of images corresponding to all the target identification images in the target video clips, so that the analysis efficiency, the accuracy and the objectivity of analysis processing can be guaranteed.
In an embodiment, as shown in fig. 6, step S504, that is, according to the number of images corresponding to all the target identification images in the target video segment, acquires the fall severity corresponding to the target video segment, includes the following steps:
s601: dividing the target video segment into at least two segments to be processed, and acquiring a preset weight corresponding to each segment to be processed, wherein the preset weight of the segment to be processed after the time is greater than that of the segment to be processed before the time.
Specifically, the server may divide the target video clip into at least two to-be-processed clips based on a unit duration, which may be 1/M (M ≥ 2) of the analysis duration threshold, so that the target video clip is divided into at least two to-be-processed clips of equal duration. The preset weight is a weight preset by the system for each fragment to be processed. As can be appreciated, since the target video segment is the set of all images to be identified from the image in which the falling action was determined up to the analysis duration threshold, the later the timestamp of a target identification image within the target video segment, the longer the fallen person has remained in a painful state. The server can therefore divide the target video segment into at least two segments to be processed according to the unit duration and configure each segment with a corresponding preset weight, so that the preset weight of a later segment is greater than the preset weight of an earlier segment, thereby ensuring the objectivity and accuracy of the subsequent analysis.
S602: and acquiring the target scores corresponding to the fragments to be processed based on the number of images corresponding to all the target identification images and the number of images corresponding to all the images to be identified in each fragment to be processed.
Specifically, the server uses the formula P = K × A / B to obtain the target score corresponding to each fragment to be processed from the number of images corresponding to all the target identification images and the number of images corresponding to all the images to be identified in the fragment; wherein P is the target score corresponding to a certain fragment to be processed, A is the number of images corresponding to all the target identification images in that fragment, B is the number of images corresponding to all the images to be identified in that fragment, and K is a constant for normalizing the target score to a specific numerical interval.
S603: and carrying out weighting treatment on preset weights and target scores corresponding to at least two fragments to be treated, and obtaining fall score values corresponding to the target video fragments.
Specifically, the server uses the formula S = Σ_{i=1}^{j} (P_i × W_i) to weight the preset weights and target scores corresponding to the at least two fragments to be processed, and obtains the fall score value corresponding to the target video segment; wherein S is the fall score value corresponding to the target video segment, P_i is the target score corresponding to the ith fragment to be processed, W_i is the preset weight corresponding to the ith fragment to be processed, and j is the number of all fragments to be processed in the target video segment.
S604: inquiring a falling degree comparison table based on the falling score value, and acquiring the falling severity degree corresponding to the target video segment.
The falling degree comparison table is a comparison table preset by the system and used for reflecting the falling severity degree and the corresponding scoring range. Specifically, after acquiring the fall score value corresponding to the target video segment, the server determines a score range corresponding to the fall score value according to the fall score value, and determines the fall severity corresponding to the score range as the fall severity of the target video segment, thereby achieving the purpose of quickly determining the fall severity.
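Putting steps S601 to S604 together, a hedged sketch (the layout of the falling degree comparison table is assumed):

```python
def fall_severity_from_pieces(pieces, weights, K, degree_table):
    """pieces: one (A, B) pair per to-be-processed piece, where A is the number
    of target recognition images and B the number of images to be identified in
    that piece. Per piece P = K * A / B; the fall score is S = sum(P_i * W_i),
    with later pieces carrying larger preset weights. degree_table maps score
    ranges to severity labels, e.g. [((0, 30), "low"), ((30, 70), "medium"), ...]."""
    scores = [K * a / b for a, b in pieces]
    fall_score = sum(p * w for p, w in zip(scores, weights))
    for (low, high), severity in degree_table:
        if low <= fall_score < high:
            return severity, fall_score
    return None, fall_score
```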
In the method for detecting and processing the tumbling behaviors, a target video segment is divided into at least two segments to be processed, each segment to be processed corresponds to a preset weight, then the number of images of target identification images in a certain segment to be processed and the number of images of the images to be identified are counted, corresponding target scores are determined, and weighting calculation is carried out by using the preset weights and the target scores to obtain the tumbling scores corresponding to the target video segment, so that the tumbling scores are objective and accurate; by using the fall score value to query the fall degree comparison table, the fall severity corresponding to the target video segment can be rapidly obtained, and the analysis efficiency of the fall severity is improved.
Further, when analyzing the severity of the fall corresponding to the target video segment, weighting the preset weights and the target scores corresponding to at least two segments to be processed, and then extracting the target face image from the target video segment after obtaining the fall score corresponding to the target video segment (the implementation process is consistent with step S701), detecting the target face image by using a pre-trained age detection model to obtain the predicted age corresponding to the fall, querying an age score table based on the predicted age, obtaining an age constant corresponding to the predicted age, and updating the fall score corresponding to the target video segment by using the product of the fall score multiplied by the age constant, so that the fall score can consider the age state of the fall, thereby facilitating the subsequent better analysis of the severity of the fall corresponding to the target video segment. For example, the age constant may be set within a range of 0-2, such as an age constant corresponding to an age median (e.g., 40 years) may be set to 1; the greater the age, the greater the value thereof, which indicates that the age has a greater effect on the severity of the fall of the person, whereas the lesser the age, the lesser the value thereof, which indicates that the age has a lesser effect on the severity of the fall of the person. For example, young children typically do not fall with serious consequences, while elderly people fall with serious fractures or other risks. The age detection model can adopt a model for predicting age, which is obtained by training positive and negative samples carrying age labels by CNN or other networks.
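The optional age adjustment could look like the sketch below; the age detection model and the age score table are assumed interfaces, and the 0-2 constant range follows the example above.

```python
def adjust_fall_score_by_age(fall_score, target_face_image, age_model, age_score_table):
    """Multiply the fall score by an age constant: about 1.0 near the median
    age (e.g. 40 years), larger for older fallers, smaller for younger ones."""
    predicted_age = age_model.predict(target_face_image)       # assumed interface
    age_constant = age_score_table.lookup(predicted_age)       # value in the 0-2 range
    return fall_score * age_constant
```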
In an embodiment, as shown in fig. 7, before sending the signal to the reminding terminal corresponding to the target video clip, the method for detecting and processing the falling behavior further includes the following steps:
s701: and extracting a target face image from the target video segment, and querying a system user image library based on the target face image.
The target face image is a relatively clear image, extracted from the target video segment, of the fallen person that contains the frontal face. The system user image library is a database in the system for storing user images. The system user image library can store the registered user image corresponding to a registered user, and can also store the registered user image together with the accompanying user image corresponding to that registered user. The registered user may refer to a user who has registered in the system corresponding to the public place where the video capture device is installed. The registered user image is a user image associated with the registered user, and may be an image of the registered user or an image corresponding to an object the registered user cares for. An accompanying user image may be understood as an image of a person who enters the public place together with the registered user, or with the object the registered user cares for, but who is not registered with the system.
Specifically, the server may select a clearer image to be identified, which includes a front face of a face, from the target video segment, determine the image as a target face image, and query a system user image library based on the target face image to determine whether the target face image is a registered user image or an accompanying user image, and specifically may use a face feature similarity matching algorithm to perform identification to determine whether the target face image is the registered user image or the accompanying user image.
S702: if the registered user image or the accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as a reminding terminal corresponding to the target video clip.
Specifically, if a registered user image corresponding to a target face image exists in the system user image library, the user image library indicates that the user is a registered user or an object to be cared for which is determined by system registration in advance, at this time, a registered terminal corresponding to the registered user image can be determined as a reminding terminal corresponding to the target video clip, so as to send the target video clip and medical advice information to the reminding terminal. If the registered user image corresponding to the target face image does not exist in the system user image library, but the accompanying user image corresponding to the target face image exists, the fact that the tumbling person is likely to be a target recognized or familiar object of the registered user is indicated, the corresponding registered user image can be found based on the accompanying user image, the registered terminal corresponding to the registered user image is determined to be a reminding terminal corresponding to the target video segment, and the target video segment and the medical advice information are sent to the reminding terminal.
S703: if the registered user image or the accompanying user image corresponding to the target face image does not exist in the system user image library, determining the management terminal corresponding to the video acquisition equipment as a reminding terminal corresponding to the target video clip.
Specifically, when no registered user image or accompanying user image corresponding to the target face image exists in the system user image library, the fact that the user falls down to be a person newly entering a public place or a person recognized by a registered user is indicated, and at the moment, a management terminal corresponding to the video acquisition equipment is determined to be a reminding terminal corresponding to the target video clip.
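The routing of steps S701 to S703 might be sketched as follows (the user image library interface, the similarity matching and the terminal fields are all assumptions for illustration):

```python
def resolve_reminder_terminal(target_face_image, user_image_library, capture_device):
    """Match the extracted target face image against registered user images,
    then accompanying user images; fall back to the management terminal of the
    video capture device when neither exists in the system user image library."""
    registered = user_image_library.match_registered(target_face_image)
    if registered is None:
        companion = user_image_library.match_accompanying(target_face_image)
        if companion is not None:
            registered = user_image_library.registered_for(companion)
    if registered is not None:
        return registered.registered_terminal
    return capture_device.management_terminal
```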
According to the method for detecting and processing the tumbling behaviors, the target face images are extracted from the target video clips, and different reminding terminals are determined according to whether corresponding registered user images or accompanying user images exist in the system user image library or not according to the target face images, so that the target video clips and medical advice information are sent to the corresponding reminding terminals later, the purpose of accurate reminding is achieved, and serious consequences caused by the fact that the tumbling people do not adopt corresponding treatment measures in time after tumbling can be avoided.
In an embodiment, as shown in fig. 8, before acquiring the video to be identified acquired by the video acquisition device in real time, the method for detecting and processing the tumbling behavior further includes the following steps:
S801: and acquiring historical videos acquired by the video acquisition equipment, wherein each historical video comprises at least one historical video image.
The historical video refers to video acquired before a server acquires video to be identified. The history video image is an image constituting a history video.
S802: and identifying each historical video image by adopting a face detection algorithm, and acquiring at least one original face image in the historical video images.
The face detection algorithm is an algorithm for detecting whether a face is included in an image. The original face image is an image corresponding to a face region identified by a face detection algorithm, and the original face image can be understood as an image corresponding to a face region selected by a face frame corresponding to the face detection algorithm.
S803: and inquiring a user image library of the system based on each original face image, and if a registered user image matched with the original face image exists, determining other original face images in the same historical video image as the registered user image as face images to be analyzed.
Specifically, the server queries the system user image library based on each original face image, and determines whether there is a registered user image corresponding to the original face image, and the processing procedure is shown in step S701, which is not repeated. The human image to be analyzed may be understood as other non-registered user images in the same frame of historical video image as the registered user image. For example, if a historical video image includes X, Y and Z three original face images, if X is identified as a registered user image in the system user image library and Y and Z are not registered user images, then Y and Z are determined as face images to be analyzed. The face image to be analyzed can be understood as an accompanying user image corresponding to the registered user which needs to be analyzed.
S804: and determining the historical video image containing the same registered user image and the same face image to be analyzed as a target video image, and counting the coexistence frame number corresponding to the target video image in unit time.
Specifically, a target video image is a historical video image that contains both the same registered user image and the same face image to be analyzed; following the example above, the target video images are all historical video images containing both X and Y. The unit time is a preset duration. The coexistence frame number is obtained by counting the number of target video images within the unit time, i.e. the number of frames in which X and Y appear together. The larger the coexistence frame number, the more often X and Y appear simultaneously and the more likely they are accompanying each other; the coexistence frame number can therefore be used to evaluate whether the face image to be analyzed is an accompanying user image.
S805: and identifying the registered user image and the face image to be analyzed in each target video image by adopting a microexpressive identification model, and counting the probability of positive emotion which is in positive emotion at the same time.
The micro-expression recognition model here is the same model as in step S501, and the recognition process is consistent with step S501, so it is not repeated here. Positive emotions are happiness, joy or other emotions reflecting that a person is in a positive state, as opposed to negative emotions such as anger that reflect a negative state.
Specifically, let R be the number of images corresponding to all target video images within the unit time. The micro-expression recognition model is used to recognize the registered user image and the face image to be analyzed in each target video image; whenever the micro-expression types recognized for both images correspond to positive emotions, the count U of frames in which both are in a positive emotion is increased by 1. After all target video images have been analyzed, the positive emotion probability of the two being in a positive emotion at the same time is calculated as L = U / R.
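The statistics of steps S804 and S805 can be sketched as follows, with recognize_expression standing in for the micro-expression recognition model and the set of positive micro-expression types given as an assumed example:

```python
# Sketch of S804-S805: R is the coexistence frame number, U counts the frames
# in which both faces show a positive micro-expression, and L = U / R.
POSITIVE_TYPES = {"happy", "smile"}  # assumed mapping of positive micro-expressions

def positive_emotion_probability(target_frames, recognize_expression):
    """target_frames: list of (registered_face_img, candidate_face_img) pairs
    taken from the target video images of one unit time."""
    R = len(target_frames)           # coexistence frame number
    if R == 0:
        return 0, 0.0
    U = 0
    for reg_face, cand_face in target_frames:
        if (recognize_expression(reg_face) in POSITIVE_TYPES
                and recognize_expression(cand_face) in POSITIVE_TYPES):
            U += 1                   # both faces are in a positive emotion
    return R, U / R                  # (coexistence frames, probability L)
```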
S806: if the coexistence frame number is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determining the face image to be analyzed as the accompanying user image corresponding to the registered user image, and storing the accompanying user image and the registered user image in the system user image library in an associated manner.
The preset frame number threshold is a preset threshold used for evaluating whether the two persons accompany each other. If the coexistence frame number is greater than the preset frame number threshold, the registered user image and the face image to be analyzed appear together in a single frame of target video image often enough to consider that the two accompany each other.
The preset probability threshold is a preset threshold used for evaluating whether the two persons are on friendly terms. If the positive emotion probability is greater than the preset probability threshold, the two persons corresponding to the registered user image and the face image to be analyzed are frequently in a positive emotion at the same time, making it more likely that they are friends.
Specifically, when the coexistence frame number is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, the server determines that the person corresponding to the face image to be analyzed is very likely someone who entered the acquisition area of the video acquisition device together with the registered user. The face image to be analyzed can therefore be determined as the accompanying user image corresponding to the registered user image, and the accompanying user image and the registered user image are stored in the system user image library in an associated manner, so that targeted reminding can be achieved when the reminding terminal is later determined by querying the registered user image or the accompanying user image.
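A minimal sketch of the decision rule in step S806 is given below; the concrete threshold values are illustrative assumptions, since the patent only requires that preset thresholds exist.

```python
# Sketch of the S806 decision rule with assumed example thresholds.
FRAME_THRESHOLD = 150        # assumed preset frame number threshold
PROBABILITY_THRESHOLD = 0.5  # assumed preset probability threshold

def is_accompanying_user(coexistence_frames, positive_emotion_probability):
    """True when the candidate should be stored as an accompanying user image."""
    return (coexistence_frames > FRAME_THRESHOLD
            and positive_emotion_probability > PROBABILITY_THRESHOLD)
```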
In the tumble behavior detection processing method provided by this embodiment, the other original face images appearing in the same historical video image as a registered user image are determined as face images to be analyzed, which ensures the objectivity of the subsequent determination of accompanying user images; a face image to be analyzed whose coexistence frame number is greater than the preset frame number threshold and whose positive emotion probability is greater than the preset probability threshold is then determined as the accompanying user image corresponding to the registered user image, ensuring the accuracy and objectivity of that determination.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In one embodiment, a falling behavior detection processing device is provided, which corresponds one-to-one with the falling behavior detection processing method in the above embodiments. As shown in fig. 9, the falling behavior detection processing device includes a to-be-identified video acquisition module 901, a falling motion detection module 902, a target video clip interception module 903, a fall severity acquisition module 904, a medical advice information acquisition module 905 and an information sending module 906. The functional modules are described in detail as follows:
The to-be-identified video acquisition module 901 is configured to acquire the video to be identified collected by the video acquisition device in real time.
The falling motion detection module 902 is configured to identify a video to be identified by using an R-C3D-based behavior detection model, and determine whether a behavior motion corresponding to the video to be identified includes a falling motion.
The target video segment intercepting module 903 is configured to intercept a target video segment corresponding to a falling action from the video to be identified if the behavior action corresponding to the video to be identified includes the falling action.
And the falling severity obtaining module 904 is configured to analyze the severity of the target video segment, and obtain the falling severity corresponding to the target video segment.
The medical advice information obtaining module 905 is configured to obtain medical advice information corresponding to the target video clip based on the fall severity.
The information sending module 906 is configured to send the target video clip and the medical advice information to the reminding terminal corresponding to the target video clip.
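For orientation, the following sketch wires the six modules of fig. 9 into one processing pass; the class name and callable placeholders are illustrative assumptions, not the claimed implementation.

```python
# Schematic wiring of the six modules from fig. 9; the internals are
# placeholders passed in as callables, not the patented implementation.
class FallDetectionDevice:
    def __init__(self, acquire, detect_fall, clip_target, assess_severity,
                 lookup_advice, send_info):
        self.acquire = acquire                   # module 901
        self.detect_fall = detect_fall           # module 902
        self.clip_target = clip_target           # module 903
        self.assess_severity = assess_severity   # module 904
        self.lookup_advice = lookup_advice       # module 905
        self.send_info = send_info               # module 906

    def run_once(self):
        video = self.acquire()
        if not self.detect_fall(video):
            return                               # no falling action detected
        clip = self.clip_target(video)
        severity = self.assess_severity(clip)
        advice = self.lookup_advice(severity)
        self.send_info(clip, advice)
```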
Preferably, the falling motion detection module 902 includes:
the original video segment acquisition unit is used for performing staggered cutting on the video to be identified based on the segment duration threshold value and the overlapping duration threshold value to acquire at least two original video segments, and each original video segment corresponds to a segment time stamp.
The video clip action detection unit is used for sequentially inputting at least two original video clips into the R-C3D-based action detection model for recognition according to the sequence of the clip time stamps, and determining whether each original video clip comprises a falling action or not.
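A possible reading of the staggered cutting performed by the original video segment acquisition unit is sketched below; the segment duration and overlap duration are example values rather than values fixed by the patent.

```python
# Sketch of staggered cutting into overlapping clips. The overlap duration is
# assumed to be strictly smaller than the segment duration.
def cut_overlapping_segments(frames, fps, segment_seconds=4.0, overlap_seconds=1.0):
    """Return (segment_timestamp_seconds, clip_frames) pairs covering the video."""
    segment_len = int(segment_seconds * fps)
    step = int((segment_seconds - overlap_seconds) * fps)  # stride between clips
    segments = []
    for start in range(0, max(len(frames) - segment_len, 0) + 1, step):
        clip = frames[start:start + segment_len]
        segments.append((start / fps, clip))  # timestamp of the clip's first frame
    return segments
```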
Preferably, the target video clip interception module 903 includes:
the starting time stamp determining unit is used for determining that the time stamp corresponding to the image to be identified corresponding to the falling action is the starting time stamp if the action corresponding to the video to be identified comprises the falling action.
And the termination time stamp determining unit is used for determining a termination time stamp based on the starting time stamp and the analysis duration threshold value.
The video segment intercepting unit is used for intercepting the video segment between the starting time stamp and the ending time stamp from the video to be identified and determining the video segment as the target video segment corresponding to the tumbling action.
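The interception performed by these three units can be sketched as follows, assuming the video is handled as a list of frames with a known frame rate; the analysis duration threshold shown is an example value.

```python
# Sketch of the start/termination timestamp determination and clip interception.
def intercept_target_clip(frames, fps, fall_frame_index, analysis_seconds=10.0):
    """Return the target video segment as a list of frames."""
    start_ts = fall_frame_index / fps                 # starting timestamp
    end_ts = start_ts + analysis_seconds              # termination timestamp
    end_frame_index = min(int(end_ts * fps), len(frames))
    return frames[fall_frame_index:end_frame_index]
```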
Preferably, each target video clip includes at least one image to be identified. The fall severity acquisition module 904 includes:
the micro-expression type acquisition unit is used for identifying each image to be identified in the target video segment by adopting the micro-expression identification model, and acquiring the micro-expression type corresponding to each image to be identified.
And the target mouth image acquisition unit is used for detecting and positioning the image to be identified by adopting a face feature point detection algorithm if the micro expression type is a preset expression type, so as to acquire a target mouth image containing a face mouth area, wherein the target mouth image comprises N face feature points and feature positions corresponding to each face feature point.
The target recognition image determining unit is used for acquiring the average distance of lip points in the mouth based on the feature positions corresponding to the N face feature points, and determining the image to be recognized corresponding to the target mouth image as a target recognition image if the average distance of lip points in the mouth is greater than a preset distance threshold.
The falling severity obtaining unit is used for obtaining the falling severity corresponding to the target video segment according to the number of images corresponding to all the target identification images in the target video segment.
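As an illustration of the target recognition image test, the following sketch averages the distances between opposing inner-lip landmarks; the 68-point landmark indexing and the distance threshold are assumptions, since the patent only specifies N face feature points and a preset distance threshold.

```python
# Sketch of the target-identification test based on the average in-mouth lip
# distance; assumes a 68-point landmark layout with inner mouth at 60-67.
import math

def is_target_recognition_image(landmarks, distance_threshold=8.0):
    """landmarks: list of (x, y) feature positions for one face."""
    # Pair each upper inner-lip point with the opposing lower inner-lip point.
    pairs = [(61, 67), (62, 66), (63, 65)]
    distances = [math.dist(landmarks[a], landmarks[b]) for a, b in pairs]
    mean_gap = sum(distances) / len(distances)  # average in-mouth lip distance
    # A widely opened mouth (e.g. crying out after a fall) exceeds the threshold.
    return mean_gap > distance_threshold
```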
Preferably, the fall severity acquisition unit includes:
the preset weight acquisition subunit is used for dividing the target video segment into at least two segments to be processed, and acquiring the preset weight corresponding to each segment to be processed, wherein the preset weight of the segment to be processed after the time is greater than the preset weight of the segment to be processed before the time.
And the target score obtaining subunit is used for obtaining the target score corresponding to the fragment to be processed based on the number of images corresponding to all the target identification images and the number of images corresponding to all the images to be identified in each fragment to be processed.
And the tumble score obtaining subunit is used for carrying out weighting processing on preset weights and target scores corresponding to the at least two fragments to be processed to obtain the tumble score corresponding to the target video fragment.
And the falling severity obtaining subunit is used for inquiring the falling severity comparison table based on the falling score value and obtaining the falling severity corresponding to the target video fragment.
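The weighting performed by these subunits can be sketched as below; the two weights are example values chosen only to satisfy the requirement that the later segment's weight exceed the earlier one's.

```python
# Sketch of the weighted fall score: each to-be-processed segment contributes
# its share of target recognition images, scaled by a preset weight.
def fall_score(segments, weights=(0.3, 0.7)):
    """segments: list of (target_image_count, total_image_count) per segment,
    ordered from earliest to latest; weights must increase over time."""
    assert len(segments) == len(weights)
    score = 0.0
    for (target_count, total_count), weight in zip(segments, weights):
        segment_score = target_count / total_count if total_count else 0.0
        score += weight * segment_score
    return score
```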
Preferably, before the information sending module 906 performs its sending, the falling behavior detection processing device further includes:
and the target face image query unit is used for extracting target face images from the target video clips and querying a system user image library based on the target face images.
The first reminding terminal determining unit is used for determining the registered terminal corresponding to the registered user image as the reminding terminal corresponding to the target video clip if the registered user image or the accompanying user image corresponding to the target face image exists in the system user image library.
And the second reminding terminal determining unit is used for determining the management terminal corresponding to the video acquisition equipment as the reminding terminal corresponding to the target video clip if the registered user image or the accompanying user image corresponding to the target face image does not exist in the system user image library.
Preferably, before the to-be-identified video acquisition module 901 performs its acquisition, the falling behavior detection processing device further includes:
the historical video acquisition unit is used for acquiring historical videos acquired by the video acquisition equipment, and each historical video comprises at least one historical video image.
The original face image acquisition unit is used for identifying each historical video image by adopting a face detection algorithm to acquire at least one original face image in the historical video images.
The face image acquisition unit to be analyzed is used for inquiring the system user image library based on each original face image, and if registered user images matched with the original face images exist, other original face images which are in the same historical video image with the registered user images are determined to be face images to be analyzed.
And the coexisting frame number counting unit is used for determining historical video images containing the same registered user image and the same face image to be analyzed as target video images and counting the coexisting frame number corresponding to the target video images in unit time.
And the positive emotion probability acquisition unit is used for identifying the registered user image and the face image to be analyzed in each target video image by adopting the microexpressive recognition model, and counting the positive emotion probability of being in positive emotion at the same time.
And the accompanying user image determining unit is used for determining the face image to be analyzed as an accompanying user image corresponding to the registered user image if the coexisting frame number is larger than a preset frame number threshold value and the positive emotion probability is larger than a preset probability threshold value, and storing the accompanying user image and the registered user image in a system user image library in a correlated manner.
For specific limitations on the falling behavior detection processing device, reference may be made to the limitations on the falling behavior detection processing method above, which are not repeated here. Each module in the above falling behavior detection processing device may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or may be stored in a memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer equipment is used for storing data adopted or generated in the process of executing the tumbling action detection processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a fall behavior detection processing method.
In one embodiment, a computer device is provided, including a memory, a processor and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the falling behavior detection processing method in the above embodiments, such as steps S201 to S206 shown in fig. 2 or the steps shown in fig. 3 to 8, which are not repeated here. Alternatively, when executing the computer program, the processor implements the functions of the modules/units in the above embodiment of the falling behavior detection processing device, such as the functions of the modules/units/subunits shown in fig. 9, which are likewise not repeated here.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the falling behavior detection processing method in the above embodiments, such as steps S201 to S206 shown in fig. 2 or the steps shown in fig. 3 to 8, which are not repeated here. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units in the above embodiment of the falling behavior detection processing device, such as the functions of the modules/units/subunits shown in fig. 9, which are likewise not repeated here.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored in a non-volatile computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated by way of example; in practical applications, the above functions may be allocated to different functional units and modules as required, that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (7)

1. The tumbling behavior detection processing method is characterized by comprising the following steps:
acquiring historical videos acquired by video acquisition equipment, wherein each historical video comprises at least one historical video image;
Identifying each historical video image by adopting a face detection algorithm, and acquiring at least one original face image in the historical video images;
inquiring a system user image library based on each original face image, and if a registered user image matched with the original face image exists, determining other original face images which are in the same historical video image as the registered user image as face images to be analyzed;
determining a historical video image containing the same registered user image and the same face image to be analyzed as a target video image, and counting the corresponding coexistence frame number of the target video image in unit time;
identifying the registered user image and the face image to be analyzed in each target video image by adopting a microexpressive identification model, and counting the probability of positive emotion which is in positive emotion at the same time;
if the coexistence frame number is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the face image to be analyzed as a accompany user image corresponding to the registered user image, and storing the accompany user image and the registered user image in a system user image library in an associated manner;
Acquiring a video to be identified, which is acquired by video acquisition equipment in real time;
identifying the video to be identified by adopting an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be identified comprises a falling action or not;
if the behavior action corresponding to the video to be identified comprises a falling action, capturing target video fragments corresponding to the falling action from the video to be identified, wherein each target video fragment comprises at least one image to be identified;
identifying each image to be identified in the target video segment by adopting a microexpressive identification model, and acquiring a microexpressive type corresponding to each image to be identified; if the micro expression type is a preset expression type, detecting and positioning the image to be identified by adopting a face feature point detection algorithm to obtain a target mouth image containing a face mouth region, wherein the target mouth image comprises N face feature points and feature positions corresponding to the face feature points; acquiring average distance of lip points in a mouth based on the feature positions corresponding to the N face feature points, and determining an image to be identified corresponding to the target mouth image as a target identification image if the average distance of lip points in the mouth is greater than a preset distance threshold; dividing the target video segment into at least two segments to be processed, and acquiring a preset weight corresponding to each segment to be processed, wherein the preset weight of the segment to be processed after the time is greater than that of the segment to be processed before the time; acquiring a target score corresponding to each fragment to be processed based on the number of images corresponding to all the target identification images and the number of images corresponding to all the images to be identified in each fragment to be processed; weighting the preset weights and the target scores corresponding to at least two fragments to be processed to obtain the tumble scores corresponding to the target video fragments; inquiring a fall degree comparison table based on the fall grading value, and acquiring the fall severity corresponding to the target video segment;
Acquiring medical advice information corresponding to the target video clip based on the tumbling severity;
and sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip.
2. The method for detecting and processing the tumbling behavior according to claim 1, wherein the identifying the video to be identified by using the R-C3D-based behavior detection model, and determining whether the behavior corresponding to the video to be identified includes the tumbling behavior, comprises:
based on a segment duration threshold and an overlapping duration threshold, performing staggered cutting on the video to be identified to obtain at least two original video segments, wherein each original video segment corresponds to a segment time stamp;
and inputting at least two original video clips into an R-C3D-based behavior detection model in turn according to the sequence of the clip time stamps to identify, and determining whether each original video clip comprises a falling action.
3. The method for detecting and processing the tumbling behavior according to claim 1, wherein if the behavior action corresponding to the video to be identified includes a tumbling action, intercepting a target video segment corresponding to the tumbling action from the video to be identified, comprises:
If the behavior action corresponding to the video to be identified comprises a falling action, determining a time stamp corresponding to the image to be identified corresponding to the falling action as an initial time stamp;
determining a termination timestamp based on the start timestamp and an analysis duration threshold;
and intercepting the video segments between the starting time stamp and the ending time stamp from the video to be identified, and determining the video segments as target video segments corresponding to the tumbling action.
4. The method for detecting and processing the falling behavior according to claim 1, wherein before the sending to the reminding terminal corresponding to the target video clip, the method for detecting and processing the falling behavior further comprises:
extracting a target face image from the target video segment, and inquiring a system user image library based on the target face image;
if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining a registered terminal corresponding to the registered user image as a reminding terminal corresponding to the target video clip;
and if the registered user image or the accompanying user image corresponding to the target face image does not exist in the system user image library, determining a management terminal corresponding to the video acquisition equipment as a reminding terminal corresponding to the target video segment.
5. A tumble behavior detection processing device, characterized by comprising:
the system comprises a historical video acquisition unit, a video acquisition unit and a video processing unit, wherein the historical video acquisition unit is used for acquiring historical videos acquired by video acquisition equipment, and each historical video comprises at least one historical video image;
the original face image acquisition unit is used for identifying each historical video image by adopting a face detection algorithm to acquire at least one original face image in the historical video images;
the face image acquisition unit to be analyzed is used for inquiring a system user image library based on each original face image, and if registered user images matched with the original face images exist, other original face images which are in the same historical video image with the registered user images are determined to be face images to be analyzed;
the coexisting frame number statistics unit is used for determining historical video images containing the same registered user image and the same face image to be analyzed as target video images and counting coexisting frame numbers corresponding to the target video images in unit time;
the positive emotion probability acquisition unit is used for identifying the registered user image and the face image to be analyzed in each target video image by adopting a microexpressive identification model, and counting the positive emotion probability of being in positive emotion at the same time;
The accompanying user image determining unit is used for determining the face image to be analyzed as an accompanying user image corresponding to the registered user image if the coexisting frame number is larger than a preset frame number threshold value and the positive emotion probability is larger than a preset probability threshold value, and storing the accompanying user image and the registered user image in a system user image library in an associated manner;
the video acquisition module to be identified acquires videos to be identified, which are acquired by the video acquisition equipment in real time;
the falling action detection module is used for identifying the video to be identified by adopting an R-C3D-based behavior detection model and determining whether the behavior action corresponding to the video to be identified comprises a falling action or not;
the target video segment intercepting module is used for intercepting target video segments corresponding to the falling actions from the videos to be identified if the action actions corresponding to the videos to be identified comprise the falling actions, and each target video segment comprises at least one image to be identified;
the falling severity obtaining module is used for identifying each image to be identified in the target video segment by adopting a microexpressive identification model to obtain a microexpressive type corresponding to each image to be identified; if the micro expression type is a preset expression type, detecting and positioning the image to be identified by adopting a face feature point detection algorithm to obtain a target mouth image containing a face mouth region, wherein the target mouth image comprises N face feature points and feature positions corresponding to the face feature points; acquiring average distance of lip points in a mouth based on the feature positions corresponding to the N face feature points, and determining an image to be identified corresponding to the target mouth image as a target identification image if the average distance of lip points in the mouth is greater than a preset distance threshold; dividing the target video segment into at least two segments to be processed, and acquiring a preset weight corresponding to each segment to be processed, wherein the preset weight of the segment to be processed after the time is greater than that of the segment to be processed before the time; acquiring a target score corresponding to each fragment to be processed based on the number of images corresponding to all the target identification images and the number of images corresponding to all the images to be identified in each fragment to be processed; weighting the preset weights and the target scores corresponding to at least two fragments to be processed to obtain the tumble scores corresponding to the target video fragments; inquiring a fall degree comparison table based on the fall grading value, and acquiring the fall severity corresponding to the target video segment;
The medical advice information acquisition module is used for acquiring medical advice information corresponding to the target video clip based on the falling severity degree;
and the information sending module is used for sending the target video clip and the medical advice information to a reminding terminal corresponding to the target video clip.
6. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the fall behavior detection processing method according to any one of claims 1 to 4 when the computer program is executed.
7. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the steps of the fall behavior detection processing method according to any one of claims 1 to 4.
CN201910763921.6A 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium Active CN110647812B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910763921.6A CN110647812B (en) 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium
PCT/CN2019/116490 WO2021031384A1 (en) 2019-08-19 2019-11-08 Fall-down behavior detection processing method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910763921.6A CN110647812B (en) 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110647812A CN110647812A (en) 2020-01-03
CN110647812B true CN110647812B (en) 2023-09-19

Family

ID=68990244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910763921.6A Active CN110647812B (en) 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN110647812B (en)
WO (1) WO2021031384A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724566A (en) * 2020-05-20 2020-09-29 同济大学 Pedestrian falling detection method and device based on intelligent lamp pole video monitoring system
CN111833568B (en) * 2020-07-08 2021-11-05 首都医科大学附属北京天坛医院 Tumble grading warning device based on piezoelectric signal monitoring and working method thereof
CN112101253A (en) * 2020-09-18 2020-12-18 广东机场白云信息科技有限公司 Civil airport ground guarantee state identification method based on video action identification
CN112633126A (en) * 2020-12-18 2021-04-09 联通物联网有限责任公司 Video processing method and device
CN112866808B (en) * 2020-12-31 2022-09-06 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112949417A (en) * 2021-02-05 2021-06-11 杭州萤石软件有限公司 Tumble behavior identification method, equipment and system
CN112998697B (en) * 2021-02-22 2022-06-14 电子科技大学 Tumble injury degree prediction method and system based on skeleton data and terminal
CN113450538A (en) * 2021-06-28 2021-09-28 杭州电子科技大学 Warning system based on painful expression discernment and fall action detection
CN113505752B (en) * 2021-07-29 2024-04-23 中移(杭州)信息技术有限公司 Tumble detection method, device, equipment and computer readable storage medium
CN114494976A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Human body tumbling behavior evaluation method and device, computer equipment and storage medium
CN114743157B (en) * 2022-03-30 2023-03-03 中科融信科技有限公司 Pedestrian monitoring method, device, equipment and medium based on video
CN114972419B (en) * 2022-04-12 2023-10-03 中国电信股份有限公司 Tumble detection method, tumble detection device, medium and electronic equipment
CN114998834A (en) * 2022-06-06 2022-09-02 杭州中威电子股份有限公司 Medical warning system based on face image and emotion recognition
CN115424353B (en) * 2022-09-07 2023-05-05 杭银消费金融股份有限公司 Service user characteristic identification method and system based on AI model
CN115830489B (en) * 2022-11-03 2023-10-20 南京小网科技有限责任公司 Intelligent dynamic analysis system based on ai identification
CN116994214B (en) * 2023-09-25 2023-12-08 南京华琨信息科技有限公司 Highway road safety evaluation method and system
CN117196449B (en) * 2023-11-08 2024-04-09 讯飞智元信息科技有限公司 Video identification method, system and related device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830212A (en) * 2018-06-12 2018-11-16 北京大学深圳研究生院 A kind of video behavior time shaft detection method
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN109460749A (en) * 2018-12-18 2019-03-12 深圳壹账通智能科技有限公司 Patient monitoring method, device, computer equipment and storage medium
CN109819325A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Hot video marks processing method, device, computer equipment and storage medium
CN109858405A (en) * 2019-01-17 2019-06-07 深圳壹账通智能科技有限公司 Satisfaction evaluation method, apparatus, equipment and storage medium based on micro- expression
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression
CN110047588A (en) * 2019-03-18 2019-07-23 平安科技(深圳)有限公司 Method of calling, device, computer equipment and storage medium based on micro- expression

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8581911B2 (en) * 2008-12-04 2013-11-12 Intific, Inc. Training system and methods for dynamically injecting expression information into an animated facial mesh
KR101765469B1 (en) * 2011-04-07 2017-08-23 연세대학교 산학협력단 falldown detecting method using the image processing, image processing apparatus for the same
US10055961B1 (en) * 2017-07-10 2018-08-21 Careview Communications, Inc. Surveillance system and method for predicting patient falls using motion feature patterns

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830212A (en) * 2018-06-12 2018-11-16 北京大学深圳研究生院 A kind of video behavior time shaft detection method
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN109460749A (en) * 2018-12-18 2019-03-12 深圳壹账通智能科技有限公司 Patient monitoring method, device, computer equipment and storage medium
CN109819325A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Hot video marks processing method, device, computer equipment and storage medium
CN109858405A (en) * 2019-01-17 2019-06-07 深圳壹账通智能科技有限公司 Satisfaction evaluation method, apparatus, equipment and storage medium based on micro- expression
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression
CN110047588A (en) * 2019-03-18 2019-07-23 平安科技(深圳)有限公司 Method of calling, device, computer equipment and storage medium based on micro- expression

Also Published As

Publication number Publication date
CN110647812A (en) 2020-01-03
WO2021031384A1 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110647812B (en) Tumble behavior detection processing method and device, computer equipment and storage medium
JP7083809B2 (en) Systems and methods for identifying and / or identifying and / or pain, fatigue, mood, and intent with privacy protection
CN111915842B (en) Abnormity monitoring method and device, computer equipment and storage medium
CN108446669B (en) Motion recognition method, motion recognition device and storage medium
CN105612533B (en) Living body detection method, living body detection system, and computer program product
EP2889805A2 (en) Method and system for emotion and behavior recognition
Ilyas et al. Facial expression recognition for traumatic brain injured patients
CN107729876A (en) Fall detection method in old man room based on computer vision
KR101972331B1 (en) Image alignment method and apparatus thereof
Sathyanarayana et al. Robust automated human activity recognition and its application to sleep research
Li et al. Smartphone‐based fatigue detection system using progressive locating method
CN112749655A (en) Sight tracking method, sight tracking device, computer equipment and storage medium
KR20220021975A (en) System for providing non-face-to-face neurological disease screening service
Shanmuga Sundari et al. Neurological disease prediction using impaired gait analysis for foot position in cerebellar ataxia by ensemble approach
US11515043B1 (en) Method and device for hair loss prediction and personalized scalp care
TWI811605B (en) Method and system for mental index prediction
US10943693B2 (en) Concise datasets platform
US20220101655A1 (en) System and method of facial analysis
CN113921098A (en) Medical service evaluation method and system
TWI805485B (en) Image recognition method and electronic apparatus thereof
CN111967306B (en) Target remote monitoring method and device, computer equipment and storage medium
KR102549558B1 (en) Ai-based emotion recognition system for emotion prediction through non-contact measurement data
Yaacoub et al. Diagnosing Clinical Manifestation of Apathy Using Machine Learning and Micro-facial Expressions Detection
Singh et al. SIFT and SURF performance evaluation for pain assessment using facial expressions
Alinra et al. Detector Face Mask using UAV-based CNN Transfer Learning of YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant