WO2021031384A1 - Fall-down behavior detection processing method and apparatus, and computer device and storage medium


Info

Publication number
WO2021031384A1
Authority: WO (WIPO, PCT)
Prior art keywords: video, target, image, fall, recognized
Application number: PCT/CN2019/116490
Other languages: French (fr), Chinese (zh)
Inventors: 王健宗 (Wang Jianzong), 王义文 (Wang Yiwen)
Original Assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司
Publication of WO2021031384A1 publication Critical patent/WO2021031384A1/en

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 20/00: Scenes; Scene-specific elements
            • G06V 20/40: Scenes; Scene-specific elements in video content
          • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/172: Classification, e.g. identification
                • G06V 40/174: Facial expression recognition
            • G06V 40/20: Movements or behaviour, e.g. gesture recognition
      • G08: SIGNALLING
        • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
          • G08B 21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
            • G08B 21/02: Alarms for ensuring the safety of persons
              • G08B 21/04: Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
                • G08B 21/0407: Alarms responsive to non-activity, based on behaviour analysis
                  • G08B 21/043: Alarms based on behaviour analysis detecting an emergency event, e.g. a fall

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a fall behavior detection and processing method, device, computer equipment, and storage medium.
  • A fall is a sudden, involuntary drop to the ground. In serious cases it can have serious consequences for the health of the person who falls. For example, an elderly person who falls may suffer psychological trauma, fractures, and soft tissue damage, which may affect both the physical and mental health of the person who falls.
  • the embodiments of the present application provide a fall behavior detection processing method, device, computer equipment, and storage medium to solve the problem of how to quickly and accurately identify whether there is a fall behavior and give targeted reminders.
  • A fall behavior detection and processing method, including:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • a fall behavior detection and processing device including:
  • a to-be-recognized video acquisition module configured to acquire the to-be-recognized video collected by the video capture device in real time;
  • a fall action detection module configured to use an R-C3D-based behavior detection model to recognize the video to be recognized, and determine whether the behavior action corresponding to the video to be recognized includes a fall action;
  • a target video segment interception module configured to, if the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized;
  • a fall severity acquisition module configured to analyze the severity of the target video clip, and obtain the fall severity corresponding to the target video clip;
  • a medical advice information acquisition module configured to acquire medical advice information corresponding to the target video clip based on the severity of the fall;
  • an information sending module configured to send the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • One or more readable storage media storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • acquiring the to-be-recognized video collected by a video capture device in real time;
  • recognizing the video to be recognized by using an R-C3D-based behavior detection model, and determining whether the behavior action corresponding to the video to be recognized includes a falling motion;
  • if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized;
  • analyzing the severity of the target video segment, and obtaining the fall severity corresponding to the target video segment;
  • acquiring medical advice information corresponding to the target video segment based on the severity of the fall;
  • sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  • FIG. 1 is a schematic diagram of an application environment of a fall behavior detection and processing method in an embodiment of the present application
  • FIG. 2 is a flowchart of a method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 3 is another flowchart of the method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 4 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 5 is another flowchart of the method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 6 is another flowchart of a method for detecting and processing a fall behavior in an embodiment of the present application
  • FIG. 7 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 8 is another flowchart of a fall behavior detection processing method in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a fall behavior detection and processing device in an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a computer device in an embodiment of the present application.
  • the fall behavior detection and processing method provided by the embodiment of the present application can be applied to the application environment as shown in FIG. 1.
  • the method for detecting and processing falling behaviors is applied in a system for detecting and processing falling behaviors.
  • the system for detecting and processing falling behaviors includes a client and a server as shown in FIG. 1.
  • The client and the server communicate through a network to quickly detect and recognize the fall action from the video to be recognized, analyze the severity of the fall, and provide targeted reminders based on that severity, so as to avoid delaying treatment and causing serious consequences.
  • The client, also called the user side, refers to the program that corresponds to the server and provides local services for the user.
  • the client can be installed on, but not limited to, various personal computers, laptops, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for detecting and processing a fall behavior is provided. Taking the method applied to the server in FIG. 1 as an example, the method includes the following steps:
  • S201 Acquire a to-be-recognized video captured by the video capture device in real time.
  • the to-be-identified video is an unidentified video collected in real time by a video acquisition device.
  • Video capture equipment is a device used to capture video, which can be set up in shopping malls, hospitals, nursing places or other public places, or set up by guardians in the homes of elderly people living alone.
  • S202 Recognize the video to be recognized by using the behavior detection model based on R-C3D, and determine whether the behavior action corresponding to the video to be recognized includes a fall motion.
  • The behavior detection model based on R-C3D is a model that uses a pre-trained R-C3D network to identify the behavior of people in a video.
  • the behavior action corresponding to the video to be recognized refers to the behavior action recognized from the video to be recognized.
  • Using the R-C3D behavior detection model to recognize the video to be recognized can quickly determine whether the behavior of the person in the video to be recognized includes a falling motion.
  • the R-C3D network is a network trained in an end-to-end manner, and a three-dimensional convolution kernel is used to process the video to be recognized.
  • R-C3D has 8 convolution operations and 5 pooling operations.
  • the size of the convolution kernel is 3*3*3, and the step size is 1*1*1.
  • The pooling kernel is 2*2*2, but in order not to shorten the temporal length too early, the pooling size and step size of the first layer are 1*2*2; finally, the R-C3D network passes through two fully connected layers.
  • the input image of the R-C3D network is 3*L*H*W, where 3 is RGB three channels, L is the number of frames of the input image, and H*W is the size of the input image.
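  • The following PyTorch sketch illustrates the backbone just described: 8 convolutions with 3*3*3 kernels and stride 1*1*1, 5 poolings of which the first is 1*2*2, and two fully connected layers at the end. The channel widths and the 16-frame 112x112 input size are assumptions borrowed from the original C3D design, not values stated in this application.

    import torch
    import torch.nn as nn

    class C3DBackbone(nn.Module):
        """Sketch of the C3D-style feature extractor described above."""
        def __init__(self, num_classes=2):
            super().__init__()
            def conv(cin, cout):  # 3x3x3 kernel, stride 1, padding keeps H*W
                return nn.Sequential(nn.Conv3d(cin, cout, 3, stride=1, padding=1),
                                     nn.ReLU(inplace=True))
            self.features = nn.Sequential(
                conv(3, 64),    nn.MaxPool3d((1, 2, 2)),  # first pool 1*2*2 keeps temporal length
                conv(64, 128),  nn.MaxPool3d(2),
                conv(128, 256), conv(256, 256), nn.MaxPool3d(2),
                conv(256, 512), conv(512, 512), nn.MaxPool3d(2),
                conv(512, 512), conv(512, 512), nn.MaxPool3d(2),  # 8 convs, 5 pools in total
            )
            self.fc6 = nn.Linear(512 * 1 * 3 * 3, 4096)  # for a 3*16*112*112 input
            self.fc7 = nn.Linear(4096, num_classes)      # two fully connected layers

        def forward(self, x):  # x: (batch, 3, L, H, W)
            f = torch.flatten(self.features(x), 1)
            return self.fc7(torch.relu(self.fc6(f)))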
  • The R-C3D-based behavior detection model is a model for detecting the actions of people in videos, obtained by end-to-end training based on the R-C3D network.
  • In the model training process, positive samples containing the fall action and negative samples not containing the fall action (that is, video clips corresponding to actions other than the falling action) can be used to train the model according to a preset ratio (which can be set to 1:1 to balance the samples and avoid overfitting).
  • Because R-C3D is based on C3D frame classification (Frame Label), it can quickly detect whether there is a fall action in the video to be recognized; it can perform end-to-end fall detection for videos of any length and behaviors of any length, and because the proposal-generation and classification networks share the C3D parameters, it is very fast, which helps ensure the efficiency of fall detection.
  • S203 If the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized.
  • The target video segment is a video segment cut out from the video to be recognized and used to analyze the severity of the fall corresponding to the fall action. Understandably, after a person falls, the pain caused by the fall varies, so the facial micro-expression will show different changes, or the body posture will show actions that match the pain. Therefore, the video after the fall action is analyzed in order to assess the severity of the fall of the person who fell.
  • Specifically, after the server uses the R-C3D behavior detection model to recognize that the video to be recognized contains a falling motion, it can intercept from the video to be recognized a video clip containing the falling motion and a certain length of time after it, as the target video clip for analyzing the severity of the fall, thereby reducing the amount of data processed in the severity analysis and improving its efficiency and accuracy.
  • S204 Analyze the severity of the target video segment, and obtain the severity of the fall corresponding to the target video segment.
  • By analyzing the severity of the target video clip, the server can objectively and quickly assess the severity of the fall of the person in the clip. For example, if the person who fell in the target video clip is a young person, and the clip after the fall shows that his facial micro-expression shows no pain, or shows pain only very briefly, the severity of the fall can be determined to be low. Conversely, if the person who fell is an elderly person, and the clip after the fall shows a painful facial micro-expression lasting a long time, or prolonged actions such as stroking the point of impact, the severity of the fall can be determined to be high.
  • S205 Obtain medical advice information corresponding to the target video clip based on the severity of the fall.
  • Specifically, the server compares the fall severity obtained by analyzing the target video clip with a preset degree threshold to determine whether medical advice needs to be provided.
  • The preset degree threshold is a preset threshold for evaluating whether medical advice needs to be provided. If the severity of the fall is less than the preset degree threshold, there is no need to provide medical advice, and the target video clip can be sent directly to the corresponding reminder terminal as a fall reminder. If the severity of the fall is not less than the preset degree threshold, the fallen skeletal joint point corresponding to the fall action is identified from the target video clip, the medical advice database is queried based on that joint point, and the medical advice information corresponding to it is obtained as the medical advice information corresponding to the target video segment.
  • The fallen skeletal joint points are the bones or joint points, identified from the target video clip, that the person who fell struck when falling. Identifying the fallen skeletal joint points helps provide corresponding medical advice to the person who fell.
  • The medical advice information database is a database used to store the medical test recommendations or medical medication recommendations associated with a fall onto each skeletal joint point.
  • Medical advice information is information determining which medical tests or medications are needed according to the skeletal joint point of the fall. For example, if the knee joint touches the ground first during the fall in the target video clip, the fallen skeletal joint point is the knee joint, and the acquired medical advice information is medical advice related to knee joint injuries.
  • S206 Send the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip.
  • the reminder terminal is a terminal for receiving target video clips or target video clips and medical advice information.
  • the reminder terminal is a terminal corresponding to the user who installs the video capture device.
  • If the video capture device is set up in a public place, its corresponding reminder terminal can be the reminder terminal of the staff in that public place, specifically the mobile terminal carried by staff working at the entrance and exit of the public place, so that the fall can be handled in time.
  • If the video capture device is set up by a guardian in the home of an elderly person living alone, the reminder terminal may be a terminal bound to the video capture device.
  • In this embodiment, the R-C3D behavior detection model can quickly identify whether the video to be recognized contains a fall action, improving the efficiency and accuracy of fall detection.
  • The target video clip corresponding to the fall action is intercepted from the video to be recognized and its severity analyzed, reducing the amount of data in the severity analysis and improving analysis efficiency and accuracy; corresponding medical advice information is obtained based on the severity of the fall, and the medical advice information and target video clip are sent to the reminder terminal, realizing targeted reminders for the fall behavior and avoiding the risk that no corresponding treatment measures are taken after the fall.
  • In an embodiment, step S202, that is, using a behavior detection model based on R-C3D to recognize the video to be recognized and determining whether the behavior action corresponding to the video to be recognized includes a falling motion, includes the following steps:
  • S301 Based on the segment duration threshold and the overlap duration threshold, cut the video to be identified into overlapping segments to obtain at least two original video segments, each of which corresponds to a segment time stamp.
  • the segment duration threshold is a preset threshold for cutting the duration of the original video segment, that is, the duration of each original video segment cut out in this embodiment is the segment duration threshold, such as 10s.
  • the overlap duration threshold is a preset threshold for the duration of overlapping two adjacent original video segments when cutting the original video segment, such as 3s.
  • the original video segment is a unit segment cut out from the video to be recognized and used for inputting the behavior detection model for recognition.
  • the segment time stamp corresponding to the original video segment may be the time stamp corresponding to the first image in the original video segment, so as to determine the corresponding video cutting sequence based on the segment time stamp.
  • Specifically, the server cuts the video to be identified based on the segment duration threshold and the overlap duration threshold, ensuring that any two adjacent original video segments partially overlap. This guarantees the accuracy of subsequent fall detection: if the segments did not overlap and a fall action spanned the boundary between two consecutive original video clips, neither clip on its own would contain the complete fall action. For example, for a video to be recognized, first cut the 0-10s segment to form the first original video segment, then cut the 7-17s segment to form the second original video segment, then cut the 14-24s segment to form the third original video segment, and so on, until all the original video segments are cut, as sketched below.
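  • A minimal Python sketch of this overlapped cutting, assuming the video is addressed by time in seconds and using the 10s segment duration threshold and 3s overlap duration threshold from the example above:

    def cut_overlapping_segments(total_duration, segment_len=10.0, overlap_len=3.0):
        """Cut a timeline into overlapping original video segments.
        The segment timestamp is the time of the segment's first image."""
        step = segment_len - overlap_len  # 7s stride between segment starts
        segments, start = [], 0.0
        while start < total_duration:
            end = min(start + segment_len, total_duration)
            segments.append((start, end, start))
            if end >= total_duration:
                break
            start += step
        return segments

    # cut_overlapping_segments(24) -> [(0, 10, 0), (7, 17, 7), (14, 24, 14)]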
  • S302 According to the sequence of the time stamps of the segments, sequentially input at least two original video segments into the R-C3D-based behavior detection model for recognition, and determine whether each original video segment includes a fall action.
  • Specifically, the server determines the cutting order of the at least two original video clips according to the sequence of the clip timestamps and inputs them in turn into the R-C3D-based behavior detection model for recognition, so that it can objectively and quickly determine whether each original video clip includes a falling motion.
  • In this embodiment, the video to be recognized is cut with overlap according to the segment duration threshold and the overlap duration threshold to obtain at least two original video segments, and the at least two original video segments are input in sequence into the R-C3D-based behavior detection model for recognition, quickly determining whether each original video segment contains a fall action; in this way, the long to-be-recognized video is split into short original video segments for recognition.
  • In an embodiment, step S203, that is, if the behavior action corresponding to the video to be recognized includes a falling motion, intercepting the target video segment corresponding to the falling motion from the video to be recognized, includes the following steps:
  • the image to be recognized is an image that constitutes the video to be recognized.
  • S401 If the behavior action corresponding to the video to be recognized includes a falling motion, determine the time stamp corresponding to the image to be recognized in which the falling motion is detected as the starting time stamp.
  • Specifically, when the server recognizes that the behavior action corresponding to the video to be recognized includes a falling motion, the time stamp of the image to be recognized at the moment the falling motion is detected is used as the starting time stamp for intercepting the target video segment, so that the target video clip can be intercepted from this starting time stamp and the micro-expression changes and body posture changes after the fall can be analyzed to determine the severity of the corresponding fall.
  • S402 Determine a termination time stamp based on the starting time stamp and the analysis duration threshold.
  • The analysis duration threshold is a threshold preset by the system for determining the video duration over which the severity of the fall needs to be analyzed. Specifically, the server adds the analysis duration threshold to the starting time stamp to determine the termination time stamp used to divide the target video segment.
  • S403 Intercept a video segment between the start time stamp and the end time stamp from the video to be identified, and determine the target video segment corresponding to the fall action.
  • Since each image to be identified in the video to be identified corresponds to a unique time stamp, the server intercepts from the video to be identified the video segment whose starting image is the image corresponding to the starting time stamp and whose terminating image is the image corresponding to the termination time stamp, and determines this video segment as the target video segment corresponding to the fall action, so as to use it to analyze the severity of the fall.
  • In this embodiment, the time stamp of the image to be recognized corresponding to the fall action is first determined as the starting time stamp; the starting time stamp is used to determine the termination time stamp; and the video segment between the starting time stamp and the termination time stamp is intercepted as the target video segment. In this way, every image to be recognized in the target video segment falls within the analysis duration threshold after the fall, which can reflect the true emotional changes of the person who fell and guarantees the objective authenticity of the analysis result (see the sketch below).
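  • As a sketch, assuming the frames are available as (timestamp, image) pairs and using an illustrative 8s analysis duration threshold (the application does not fix a value):

    def intercept_target_clip(frames, fall_timestamp, analysis_duration=8.0):
        """Intercept the target video clip from the starting time stamp (the
        frame in which the fall is detected) to start + analysis duration."""
        start_ts = fall_timestamp
        end_ts = start_ts + analysis_duration  # termination time stamp
        return [(ts, img) for ts, img in frames if start_ts <= ts <= end_ts]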
  • each target video segment includes at least one image to be recognized.
  • In an embodiment, step S204, that is, analyzing the severity of the target video segment to obtain the severity of the fall corresponding to the target video segment, includes the following steps:
  • S501 Recognize each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtain a micro-expression type corresponding to each to-be-recognized image.
  • the micro-expression recognition model is a model used to recognize the facial micro-expression in the image to be recognized.
  • The micro-expression recognition model captures the local features of the user's face in the image to be recognized, determines each target facial action unit of the face in the image to be recognized based on the local features, and then determines the micro-expression type according to the recognized target facial action units.
  • the micro-expression recognition model may be a neural network recognition model based on deep learning, a local recognition model based on classification, or a local emotion recognition model based on Local Binary Pattern (LBP).
  • the micro-expression recognition model requires pre-collection of a large amount of training image data for model training.
  • The training image data contains positive samples and negative samples of each facial action unit.
  • the SVM classification algorithm can be used to train a large amount of training image data to obtain SVM classifiers corresponding to multiple facial action units. For example, it can be 39 SVM classifiers corresponding to 39 facial action units, or 54 SVM classifiers corresponding to 54 facial action units.
  • The more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when multiple SVM classifiers form a micro-expression recognition model, the more SVM classifiers there are, the more accurate the micro-expression types recognized by the formed micro-expression recognition model.
  • Taking a micro-expression recognition model formed by the SVM classifiers corresponding to 54 facial action units as an example, using this model to recognize each image to be recognized in the target video segment can identify 54 types of micro-expression, for example including love, interest, surprise, expectation... aggression, conflict, insult, doubt, fear, and pain (see the SVM sketch below).
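  • A hedged scikit-learn sketch of the per-action-unit SVM scheme described above; the feature representation (e.g. LBP histograms) and all names here are illustrative assumptions, not details given in the application:

    import numpy as np
    from sklearn.svm import SVC

    def train_action_unit_classifiers(features, au_labels):
        """Train one binary SVM per facial action unit.
        features: (n_samples, n_features) face descriptors, e.g. LBP histograms.
        au_labels: (n_samples, n_units) 0/1 matrix, e.g. 39 or 54 units."""
        classifiers = []
        for unit in range(au_labels.shape[1]):
            clf = SVC(kernel="rbf", probability=True)
            clf.fit(features, au_labels[:, unit])
            classifiers.append(clf)
        return classifiers

    def recognize_action_units(classifiers, feature_vec):
        """Return indices of the action units detected in one face image;
        the detected units are then mapped to a micro-expression type."""
        x = np.asarray(feature_vec).reshape(1, -1)
        return [i for i, clf in enumerate(classifiers) if clf.predict(x)[0] == 1]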
  • S502 If the micro-expression type is the preset expression type, use the facial feature point detection algorithm to detect and locate the image to be recognized, and obtain a target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position corresponding to each facial feature point.
  • the preset expression type is the type that the system pre-sets that it may be the expression after a fall, such as pain and crying.
  • The facial feature point detection algorithm is an algorithm for detecting facial feature points. It can identify N facial feature points from the image to be recognized together with the feature position of each, that is, its coordinates in the image. The facial feature point detection algorithm can detect and locate the facial feature points of the left eye, right eye, left eyebrow, right eyebrow, nose, and mouth in the image to be recognized.
  • In this embodiment, an image of the standard area size corresponding to the mouth area of the face is intercepted from the image to be recognized and determined as the target mouth image for subsequent analysis.
  • the target mouth image is an image corresponding to the mouth area of the human face that matches the size of the standard area intercepted from the image to be recognized.
  • The standard area size is preset by the system to limit the size of the area captured for the target mouth image. Specifically, the standard area size can be determined by fixing the mouth width, that is, by fixing the width of the standard area.
  • When intercepting the target mouth image from the image to be recognized, the image to be recognized needs to be up-sampled or down-sampled (that is, zoomed) so that the mouth width of the person who fell matches the width of the standardized area; a screenshot is then taken to obtain target mouth images of equal width, ensuring the accuracy of the subsequent analysis based on the mouth inner-lip average distance.
  • the target mouth image includes N facial feature points and feature positions corresponding to each facial feature point.
  • the N facial feature points in the target mouth image refer to the facial feature points corresponding to the contour of the mouth.
  • Mouth contour includes upper lip contour and lower lip contour
  • upper lip contour includes upper lip outer lip line and upper lip inner lip line
  • lower lip contour includes lower lip inner lip line and lower lip outer lip line.
  • In this embodiment, a number of dividing lines are configured to divide the contour of the mouth so as to determine the corresponding facial feature points.
  • For example, three dividing lines can be drawn at 1/4, 1/2, and 3/4 of the mouth width; the intersections of each dividing line with the upper lip outer lip line, upper lip inner lip line, lower lip inner lip line, and lower lip outer lip line respectively form a set of facial feature points comprising an upper lip outer lip point, upper lip inner lip point, lower lip inner lip point, and lower lip outer lip point.
  • S503 Obtain the mouth inner-lip average distance based on the feature positions corresponding to the N facial feature points. If the mouth inner-lip average distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as the target recognition image.
  • Generally, when a person falls, the more painful the micro-expression and the wider the mouth opens, the higher the degree of pain, and the better this reflects the severity of the fall. Therefore, when the server obtains the N facial feature points of the target mouth image and their feature positions, it can calculate the mouth inner-lip average distance, which reflects the degree of mouth opening. Specifically, the server may first calculate, on each dividing line, the inner-lip distance between the upper lip inner lip point and the lower lip inner lip point, and then average the inner-lip distances over all the dividing lines to obtain the mouth inner-lip average distance, so that this value can be used to objectively analyze the degree of pain. Understandably, since all target mouth images are images of the standard area size, their widths are the same; the mouth inner-lip average distance can therefore accurately reflect the degree of mouth opening, as computed in the sketch below.
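  • A small sketch of this computation; the dictionary layout of the feature points and the threshold value are illustrative assumptions, and all target mouth images are assumed to have been scaled to the standard width first:

    import math

    def inner_lip_average_distance(feature_points):
        """Average, over the three dividing lines (1/4, 1/2, 3/4 of the mouth
        width), of the distance between the upper and lower inner lip points."""
        lines = ("quarter", "half", "three_quarter")
        return sum(math.dist(feature_points[line]["upper_inner"],
                             feature_points[line]["lower_inner"])
                   for line in lines) / len(lines)

    def is_painful_opening(feature_points, distance_threshold=12.0):
        """True if the mouth opening exceeds the preset distance threshold."""
        return inner_lip_average_distance(feature_points) > distance_threshold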
  • The preset distance threshold is a threshold preset by the system for assessing whether the degree of mouth opening indicates pain. Specifically, after obtaining the mouth inner-lip average distance, the server compares it with the preset distance threshold. If the average distance is greater than the threshold, the mouth is opened wide, reflecting a high degree of pain, and the image to be recognized corresponding to the target mouth image is determined as a target recognition image for the subsequent analysis of the severity of the fall.
  • The target recognition image is an image to be recognized whose micro-expression type is the preset expression type and whose mouth inner-lip average distance is greater than the preset distance threshold; determining target recognition images through both micro-expression recognition and the degree of mouth opening helps ensure the accuracy of the fall severity analysis.
  • S504 Obtain the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
  • Since a target recognition image is one in which the falling person's micro-expression type is the preset expression type and the mouth inner-lip average distance is greater than the preset distance threshold, it fully reflects that the person is in a more painful state after the fall. Given a fixed number of images to be recognized in the target video clip, the more target recognition images there are, the higher the severity of the fall. That is, in this embodiment, the number of target recognition images is proportional to the fall severity, and a corresponding comparison table can be preset to quickly determine the severity.
  • In this embodiment, micro-expression analysis is performed on the images to be recognized first, and only those whose micro-expression type is the preset expression type undergo subsequent facial feature point detection and positioning, which helps reduce the amount of data processing and speeds it up. The mouth inner-lip average distance is then determined from the feature positions of the N facial feature points in the target mouth image, and images to be recognized whose average distance is greater than the preset distance threshold are determined as target recognition images, ensuring that the target recognition images truly and objectively reflect the pain of the person who fell, which helps guarantee the accuracy of the subsequent fall severity analysis. Using the number of images corresponding to all target recognition images in the target video segment, the fall severity can be determined quickly, ensuring the efficiency, accuracy, and objectivity of the analysis.
  • In an embodiment, step S504, that is, obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment, includes the following steps:
  • S601 Divide the target video segment into at least two segments to be processed, and obtain the preset weight corresponding to each segment to be processed, where the preset weight of a later segment to be processed is greater than that of an earlier one.
  • Specifically, the server may divide the target video clip into at least two segments to be processed based on a unit duration, where the unit duration may be 1/M (M ≥ 2) of the analysis duration threshold, dividing the target video clip into at least two segments of equal duration.
  • The preset weight is a weight preset by the system for each segment to be processed. Understandably, since the target video clip is the collection of all images to be recognized from the image in which the fall action is recognized up to the analysis duration threshold, the later the time stamps of the target recognition images within the target video clip, the longer the painful state after the fall has lasted.
  • Therefore, the server can divide the target video segment into at least two segments to be processed according to the unit duration and configure a corresponding preset weight for each, making the preset weight of a later segment to be processed greater than that of an earlier one, so as to ensure the objectivity and accuracy of the subsequent analysis.
  • S602 Based on the number of images corresponding to all target recognition images and the number of images corresponding to all the images to be recognized in each segment to be processed, obtain the target score corresponding to the segment to be processed.
  • Specifically, the server obtains the target score corresponding to each segment to be processed from the number of images corresponding to all target recognition images and the number of images corresponding to all images to be recognized in that segment, using the formula P = K * A / B, where:
  • P is the target score corresponding to the segment to be processed;
  • A is the number of images corresponding to all target recognition images in the segment to be processed;
  • B is the number of images corresponding to all images to be recognized in the segment to be processed;
  • K is a constant used to normalize the target score to a specific numerical interval.
  • S603 Perform weighting processing on the preset weights and target scores corresponding to at least two segments to be processed, to obtain a fall score corresponding to the target video segment.
  • Specifically, the server weights the preset weights and target scores corresponding to the at least two segments to be processed using the formula S = Σ(P_i * W_i) for i = 1 to j, obtaining the fall score corresponding to the target video segment, where S is the fall score corresponding to the target video segment, P_i is the target score corresponding to the i-th segment to be processed, W_i is the preset weight corresponding to the i-th segment to be processed, and j is the number of segments to be processed in the target video segment.
  • S604 Query the fall degree comparison table based on the fall score value, and obtain the fall severity corresponding to the target video clip.
  • The fall degree comparison table is a table preset by the system that maps score ranges to fall severities. Specifically, after obtaining the fall score corresponding to the target video segment, the server determines the score range it falls in and takes the fall severity corresponding to that range as the fall severity of the target video segment, so as to quickly determine the severity of the fall.
  • In this embodiment, the target video segment is first divided into at least two segments to be processed, each with a corresponding preset weight; the target recognition images in each segment are then counted to determine the corresponding target score, and the preset weights and target scores are weighted and summed to obtain the fall score corresponding to the target video clip, making the fall score more objective and accurate. Querying the fall degree comparison table with the fall score quickly yields the fall severity corresponding to the target video segment, improving the efficiency of the fall severity analysis, as sketched below.
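  • Putting steps S602-S604 together as a sketch, with the reconstructed formulas P = K * A / B and S = Σ(P_i * W_i); the value of K, the weights, and the comparison-table ranges are illustrative assumptions:

    def fall_score(segments, weights, K=100.0):
        """segments: per segment to be processed, an (A, B) pair where A is the
        number of target recognition images and B the number of all images to
        be recognized; weights: preset weights, larger for later segments."""
        scores = [K * a / b for a, b in segments]           # P_i = K * A / B
        return sum(p * w for p, w in zip(scores, weights))  # S = sum(P_i * W_i)

    def fall_severity(score, table=((30, "low"), (70, "medium"), (float("inf"), "high"))):
        """Query an (illustrative) fall degree comparison table."""
        for upper_bound, severity in table:
            if score <= upper_bound:
                return severity

    # e.g. two segments, the later one weighted higher:
    # fall_severity(fall_score([(2, 8), (5, 8)], weights=[0.4, 0.6]))  -> "medium"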
  • In an embodiment, after the preset weights and target scores corresponding to the at least two segments to be processed are weighted and the fall score corresponding to the target video segment is obtained, a target face image is extracted from the target video segment (the implementation is the same as step S701), a pre-trained age detection model is used to detect the target face image and obtain the predicted age of the person who fell, the age score table is queried based on the predicted age to obtain the corresponding age constant, and the fall score is updated to the product of the fall score and the age constant, so that the fall score takes the faller's age into account and the fall severity corresponding to the target video clip can be analyzed more accurately.
  • Generally, the age constant can be set in the range 0-2, and the age constant corresponding to the median age (such as 40 years old) can be set to 1. The older the age, the larger the value, indicating that age has a greater effect on the severity of the fall; conversely, the younger the age, the smaller the value, indicating that age has a smaller effect on the severity of the fall.
  • For example, children generally do not suffer serious consequences from a fall, while elderly people who fall face serious risks such as fractures.
  • The age detection model can be a model for predicting age obtained by training, with a CNN or another network, on positive and negative samples carrying age labels; a sketch of the score adjustment follows below.
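  • A sketch of the age adjustment under the stated constraints (constant in the range 0-2, equal to 1 at the median age of 40); the linear mapping here stands in for the preset age score table, which the application does not spell out:

    def age_constant(predicted_age, median_age=40):
        """Map a predicted age to an age constant in (0, 2]; 1 at the median
        age, larger for older fallers, smaller for younger ones."""
        c = 1.0 + (predicted_age - median_age) / median_age
        return max(0.1, min(2.0, c))

    def adjusted_fall_score(score, predicted_age):
        """Update the fall score by multiplying it by the age constant."""
        return score * age_constant(predicted_age)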
  • In an embodiment, before sending the target video clip and the medical advice information to the reminder terminal corresponding to the target video clip, the fall behavior detection and processing method further includes the following steps:
  • S701 Extract a target face image from a target video clip, and query a system user image database based on the target face image.
  • The target face image is a clear image, extracted from the target video clip, that contains the frontal face of the person who fell.
  • the system user image library is a database used to store user images in the system.
  • The system user image library can store the registered user images corresponding to registered users, as well as the accompanying user images associated with those registered user images.
  • the registered user may refer to a user registered in a system corresponding to a public place where the video capture device is installed.
  • the registered user image refers to the user image associated with the registered user, which can be the image of the registered user himself or the image corresponding to the object to be cared for by the registered user.
  • An accompanying user image can be understood as an image of a person who accompanies the registered user, or enters a public place together with the object cared for by the registered user, but is not registered in the system.
  • Specifically, the server may select a clear image to be recognized containing a frontal face from the target video clip as the target face image, and then query the system user image database based on the target face image to determine whether it matches a registered user image or an accompanying user image; specifically, a facial feature similarity matching algorithm can be used to make this determination.
  • If there is a registered user image corresponding to the target face image in the system user image library, the registered terminal corresponding to that image is determined as the reminder terminal corresponding to the target video clip, so that the target video clip and medical advice information can be sent to the reminder terminal.
  • If there is an accompanying user image corresponding to the target face image, the corresponding registered user image can be found based on the accompanying user image, and the registered terminal corresponding to that registered user image is determined as the reminder terminal corresponding to the target video clip, so that the target video clip and medical advice information can be sent to the reminder terminal.
  • If there is no registered user image or accompanying user image corresponding to the target face image in the system user image library, the management terminal corresponding to the video capture device is determined as the reminder terminal corresponding to the target video segment.
  • In this embodiment, a target face image is extracted from the target video clip, and different reminder terminals are determined according to whether the system user image library contains a corresponding registered user image or accompanying user image, so that the target video clip and medical advice information are sent to the appropriate reminder terminal. This achieves precise reminding (see the sketch below) and helps avoid the serious consequences of a faller failing to receive corresponding treatment measures in time after the fall.
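  • The routing logic reads as a three-way fallback; in this sketch the two lookup methods on the user image database are assumed names standing in for the facial feature similarity matching described above:

    def resolve_reminder_terminal(target_face, user_image_db, device_admin_terminal):
        """Pick the reminder terminal for a target face image."""
        # Registered user image present: use that user's registered terminal.
        terminal = user_image_db.find_registered(target_face)
        if terminal is None:
            # Accompanying user image present: use the terminal of the
            # registered user the companion is associated with.
            terminal = user_image_db.find_companion(target_face)
        # Otherwise fall back to the management terminal bound to the device.
        return terminal if terminal is not None else device_admin_terminal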
  • In an embodiment, before acquiring the to-be-recognized video captured by the video capture device in real time, the fall behavior detection processing method further includes the following steps:
  • S801 Obtain historical videos collected by the video capture device, where each historical video includes at least one historical video image.
  • the historical video refers to the video collected before the server obtains the video to be recognized.
  • Historical video images are images that constitute historical videos.
  • S802 Recognize each historical video image using a face detection algorithm, and obtain at least one original face image in the historical video image.
  • the face detection algorithm is an algorithm used to detect whether an image contains a face.
  • the original face image is an image corresponding to the face area recognized by the face detection algorithm, and the original face image can be understood as the image corresponding to the face area selected by the face frame corresponding to the face detection algorithm.
  • S803 Query the system user image database based on each original face image; if there is a registered user image matching the original face image, determine the other original face images in the same historical video image as the registered user image as face images to be analyzed.
  • the server queries the system user image database based on each original face image to determine whether there is a registered user image corresponding to the original face image.
  • The processing is as described in step S701 and, to avoid repetition, is not detailed again here.
  • The face image to be analyzed can be understood as any other non-registered user image in the same historical video image as the registered user image. For example, if a historical video image includes three original face images X, Y, and Z, and it is recognized that X is a registered user image in the system user image library while Y and Z are not, then Y and Z are determined as face images to be analyzed.
  • A face image to be analyzed is one for which it must be analyzed whether it is an accompanying user image corresponding to the registered user.
  • S804 Determine the historical video image containing the same registered user image and the same face image to be analyzed as the target video image, and count the number of coexisting frames corresponding to the target video image per unit time.
  • the target video image is a historical video image that contains both the same registered user image and the same face image to be analyzed.
  • the target video image is all historical video images that contain both X and Y.
  • The unit time is a period set in advance. Understandably, the number of coexisting frames corresponding to the target video image can be determined by counting the number of all target video images within the unit time.
  • the number of coexisting frames can be understood as the number of images of all target video images simultaneously containing X and Y per unit time.
  • The greater the number of coexisting frames, the greater the probability that X and Y appear at the same time, and the more likely they accompany each other; therefore, whether an image is an accompanying user image can be evaluated based on the number of coexisting frames.
  • S805 Use the micro-expression recognition model to recognize the registered user image and the face image to be analyzed in each target video image, and count the probability of positive emotions that are simultaneously in a positive emotion.
  • the micro-expression recognition model may be the micro-expression recognition model in step S501, and the recognition process is the same as that in step S501. To avoid repetition, it will not be repeated here.
  • Positive emotions are happiness, joy, or other emotions that reflect a person's positive state, as opposed to negative emotions such as anger or rage, which reflect a negative state.
  • S806 If the number of coexisting frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image, and store the accompanying user image in association with the registered user image In the system user image library.
  • the preset frame number threshold is a preset threshold used to evaluate whether they are accompanied by each other. If the number of coexisting frames is greater than the preset frame number threshold, it means that the number of times that the registered user image and the face image to be analyzed appear in one frame of the target video image at the same time is greater than the preset threshold that determines that the two are accompanied by each other.
  • the preset probability threshold is a preset threshold used to evaluate whether the two are friendly. If the positive emotion probability is greater than the preset probability threshold, it means that the two persons corresponding to the registered user image and the face image to be analyzed are more likely to be in positive emotions, and the probability that the two are friends is greater.
  • In this case, the server determines that the person corresponding to the face image to be analyzed most likely entered the video capture area accompanying the user corresponding to the registered user image. Therefore, the face image to be analyzed can be determined as the accompanying user image corresponding to the registered user image, and the accompanying user image and the registered user image are stored in association in the system user image library, so that when the registered user image or accompanying user image is subsequently queried to confirm the reminder terminal, a targeted reminder can be realized.
  • In this embodiment, other original face images appearing in the same historical video image as a registered user image are determined as face images to be analyzed, ensuring the objectivity of the subsequent determination of accompanying user images; a face image to be analyzed whose number of coexisting frames is greater than the preset frame number threshold and whose positive emotion probability is greater than the preset probability threshold is determined as the accompanying user image corresponding to the registered user image, ensuring the accuracy and objectivity of that determination; a code sketch of these statistics follows below.
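  • A sketch of the coexistence-frame and positive-emotion statistics from steps S804-S806; the per-frame data layout, the positive emotion set, and both threshold values are illustrative assumptions:

    def companion_statistics(frames, registered_id, candidate_id):
        """frames: per historical video image, a dict mapping each detected
        person id to the recognized micro-expression type."""
        positive = {"happy", "joy"}  # assumed positive emotion types
        coexist = [f for f in frames if registered_id in f and candidate_id in f]
        if not coexist:
            return 0, 0.0
        both = sum(1 for f in coexist
                   if f[registered_id] in positive and f[candidate_id] in positive)
        return len(coexist), both / len(coexist)

    def is_companion(frames, registered_id, candidate_id,
                     frame_threshold=50, prob_threshold=0.6):
        """Both the coexistence frame count and the joint positive-emotion
        probability must exceed their preset thresholds."""
        n, p = companion_statistics(frames, registered_id, candidate_id)
        return n > frame_threshold and p > prob_threshold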
  • a fall behavior detection and processing device is provided, and the fall behavior detection and processing device corresponds to the fall behavior detection and processing method in the above-mentioned embodiment one-to-one.
  • the fall behavior detection processing device includes a video acquisition module 901 to be recognized, a fall motion detection module 902, a target video clip interception module 903, a fall severity acquisition module 904, a medical advice information acquisition module 905, and Information sending module 906.
  • the detailed description of each functional module is as follows:
  • The to-be-identified video acquisition module 901 is configured to acquire the to-be-identified video collected by the video acquisition device in real time.
  • the fall action detection module 902 is configured to use the R-C3D-based behavior detection model to recognize the video to be recognized, and determine whether the behavior action corresponding to the video to be recognized includes a fall action.
  • the target video segment interception module 903 is configured to, if the behavior action corresponding to the video to be recognized includes a falling motion, intercept the target video segment corresponding to the falling motion from the video to be recognized.
  • the fall severity acquisition module 904 is configured to analyze the severity of the target video segment and obtain the fall severity corresponding to the target video segment.
  • the medical advice information obtaining module 905 is used to obtain medical advice information corresponding to the target video clip based on the severity of the fall.
  • The information sending module 906 is configured to send the target video clip and medical advice information to the reminder terminal corresponding to the target video clip.
  • the falling motion detection module 902 includes:
  • The original video segment acquisition unit is configured to cut the video to be identified into overlapping segments based on the segment duration threshold and the overlap duration threshold, and acquire at least two original video segments, each of which corresponds to a segment timestamp.
  • the video segment action detection unit is configured to sequentially input at least two original video segments into an R-C3D-based behavior detection model for identification according to the sequence of the segment timestamps, and determine whether each original video segment includes a fall action.
  • the target video segment interception module 903 includes:
  • the initial time stamp determining unit is configured to determine that the time stamp corresponding to the image to be identified corresponding to the falling action is the initial time stamp if the behavior action corresponding to the video to be recognized includes a falling action.
  • the termination time stamp determining unit is configured to determine the termination time stamp based on the starting time stamp and the analysis duration threshold.
  • the video segment intercepting unit is used to intercept the video segment between the start time stamp and the end time stamp from the video to be identified, and determine the target video segment corresponding to the fall action.
  • each target video segment includes at least one image to be recognized.
  • the fall severity acquisition module 904 includes:
  • the micro-expression type obtaining unit is used to recognize each image to be recognized in the target video segment by using a micro-expression recognition model, and obtain the micro-expression type corresponding to each image to be recognized.
  • The target mouth image acquisition unit is used to, if the micro-expression type is the preset expression type, use the facial feature point detection algorithm to detect and locate the image to be recognized and obtain the target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position corresponding to each facial feature point.
  • The target recognition image determination unit is used to obtain the mouth inner-lip average distance based on the feature positions corresponding to the N facial feature points and, if the mouth inner-lip average distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as the target recognition image.
  • the fall severity acquiring unit is configured to acquire the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
  • the fall severity acquisition unit includes:
  • The preset weight obtaining subunit is used to divide the target video segment into at least two segments to be processed and to obtain the preset weight corresponding to each segment to be processed, where the preset weight of a later segment to be processed is greater than that of an earlier one.
  • the target score obtaining subunit is used to obtain the target score corresponding to the segment to be processed based on the number of images corresponding to all target recognition images and the number of images corresponding to all the images to be recognized in each segment to be processed.
  • the fall score value obtaining subunit is configured to perform weighting processing on the preset weights and target scores corresponding to at least two segments to be processed to obtain the fall score value corresponding to the target video segment.
  • the fall severity acquisition subunit is used to query the fall degree comparison table based on the fall score value, and obtain the fall severity corresponding to the target video clip.
  • the device for detecting and processing a fall behavior further includes:
  • the target face image query unit is used to extract the target face image from the target video segment, and query the system user image database based on the target face image.
  • the first reminder terminal determining unit is configured to determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library.
  • the second reminder terminal determining unit is configured to determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment if no registered user image or accompanying user image corresponding to the target face image exists in the system user image library.
  • the device for detecting and processing a fall behavior further includes:
  • the historical video acquisition unit is configured to acquire historical videos collected by the video acquisition device, and each historical video includes at least one historical video image.
  • the original face image acquisition unit is used to recognize each historical video image by using a face detection algorithm to acquire at least one original face image in the historical video image.
  • the to-be-analyzed face image acquisition unit is configured to query the system user image library based on each original face image and, if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image as face images to be analyzed.
  • the coexistence frame number counting unit is configured to determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
  • the positive emotion probability acquisition unit is configured to recognize the registered user image and the face image to be analyzed in each target video image using the micro-expression recognition model, and count the probability that both are in a positive emotion at the same time.
  • the accompanying user image determining unit is configured to determine the face image to be analyzed as the accompanying user image corresponding to the registered user image if the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, and to store the accompanying user image in the system user image library in association with the registered user image.
  • each module in the above-mentioned falling behavior detection and processing device can be implemented in whole or in part by software, hardware, and combinations thereof.
  • the foregoing modules may be embedded in hardware in, or independent of, the processor of the computer device, or stored in software in the memory of the computer device, so that the processor can call and execute the operations corresponding to them.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 10.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store the data used or generated while executing the fall behavior detection and processing method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions, when executed by the processor, implement a fall behavior detection and processing method.
  • a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
  • when the processor executes the computer-readable instructions, the steps of the fall behavior detection and processing method in the above embodiments are implemented, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 8, which are not repeated here.
  • alternatively, when the processor executes the computer-readable instructions, the functions of the modules/units in the above embodiment of the fall behavior detection and processing apparatus are implemented, such as the functions of the modules/units/subunits shown in FIG. 9, which are not repeated here.
  • one or more readable storage media storing computer-readable instructions are provided.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIG. 3 to FIG. 8, which are not repeated here.
  • alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules/units in the above embodiment of the fall behavior detection and processing apparatus are implemented, such as the functions of the modules/units/subunits shown in FIG. 9, which are not repeated here.
  • the readable storage medium in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychology (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a fall-down behavior detection processing method and apparatus, and a computer device and a storage medium. The method comprises: obtaining a video to be identified acquired by a video acquisition device in real time; identifying said video by using an R-C3D-based behavior detection model, and determining whether a behavior action corresponding to said video comprises a fall-down action; if the behavior action corresponding to said video comprises a fall-down action, intercepting a target video clip corresponding to the fall-down action from said video; performing severity analysis on the target video clip to obtain a fall-down severity corresponding to the target video clip; obtaining medical suggestion information corresponding to the target video clip on the basis of the fall-down severity; and sending the target video clip and the medical suggestion information to a reminding terminal corresponding to the target video clip. According to the method, whether said video comprises the fall-down action can be quickly and accurately detected, and targeted reminding is carried out on the basis of the fall-down action.

Description

Fall behavior detection and processing method, apparatus, computer device and storage medium
This application is based on, and claims priority to, Chinese invention application No. 201910763921.6, filed on August 19, 2019 and titled "Fall behavior detection and processing method, apparatus, computer device and storage medium".
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a fall behavior detection and processing method, apparatus, computer device, and storage medium.
Background
A fall is a sudden collapse to the ground. In serious cases it may have severe consequences for the health of the person who falls. For example, an elderly person who falls may suffer psychological trauma, fractures, and soft tissue damage, affecting his or her physical and mental health. When a person living alone, or walking alone in a public place, falls accidentally and does not take the fall seriously, treatment measures may not be taken in time, and the delay in treatment can have serious consequences. Therefore, how to quickly and accurately identify whether a fall has occurred and issue a targeted reminder has become an urgent problem in public places, nursing facilities, and the care of elderly people living alone, in order to avoid the risks caused by falls.
Summary of the Invention
The embodiments of this application provide a fall behavior detection and processing method, apparatus, computer device, and storage medium, to solve the problem of how to quickly and accurately identify whether a fall has occurred and issue a targeted reminder.
A fall behavior detection and processing method, including:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
A fall behavior detection and processing apparatus, including:
a to-be-recognized video acquisition module, configured to obtain the to-be-recognized video collected in real time by a video capture device;
a fall action detection module, configured to recognize the to-be-recognized video using an R-C3D-based behavior detection model, and determine whether the behavior actions corresponding to the to-be-recognized video include a fall action;
a target video segment interception module, configured to, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercept from the to-be-recognized video the target video segment corresponding to the fall action;
a fall severity acquisition module, configured to perform severity analysis on the target video segment and obtain the fall severity corresponding to the target video segment;
a medical advice information acquisition module, configured to obtain, based on the fall severity, medical advice information corresponding to the target video segment;
an information sending module, configured to send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
A computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
One or more readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a to-be-recognized video collected in real time by a video capture device;
recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting, from the to-be-recognized video, the target video segment corresponding to the fall action;
performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
obtaining, based on the fall severity, medical advice information corresponding to the target video segment;
sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of the fall behavior detection and processing method in an embodiment of this application;
FIG. 2 is a flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 3 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 4 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 5 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 6 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 7 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 8 is another flowchart of the fall behavior detection and processing method in an embodiment of this application;
FIG. 9 is a schematic diagram of the fall behavior detection and processing apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of the computer device in an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The fall behavior detection and processing method provided by the embodiments of this application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a fall behavior detection and processing system that includes a client and a server as shown in FIG. 1; the client and the server communicate over a network, so that a fall action can be quickly detected and recognized from the to-be-recognized video, its fall severity analyzed, and targeted reminders issued based on the fall severity, avoiding the serious consequences of delayed treatment. The client, also called the user side, is the program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smart phones, tablet computers, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a fall behavior detection and processing method is provided. Taking the method applied to the server in FIG. 1 as an example, it includes the following steps:
S201: Obtain the to-be-recognized video collected in real time by the video capture device.
Here, the to-be-recognized video is video collected in real time by the video capture device that has not yet been processed for recognition. The video capture device is a device for collecting video; it can be installed in shopping malls, hospitals, nursing facilities, or other public places, or installed by a guardian in the home of an elderly person living alone.
S202: Recognize the to-be-recognized video using an R-C3D-based behavior detection model, and determine whether the behavior actions corresponding to the to-be-recognized video include a fall action.
Here, the behavior detection model based on R-C3D (Region Convolutional 3D Network for Temporal Activity Detection) is a model pre-trained with the R-C3D network for recognizing human behavior in video. The behavior actions corresponding to the to-be-recognized video are the behavior actions recognized from the to-be-recognized video. Specifically, recognizing the to-be-recognized video with the R-C3D behavior detection model can quickly determine whether the human behavior in the video includes a fall action.
The R-C3D network is trained end to end and uses three-dimensional convolution kernels to process the to-be-recognized video. R-C3D performs 8 convolution operations and 5 pooling operations in total. The convolution kernels are all of size 3*3*3 with stride 1*1*1. The pooling kernels are 2*2*2, except that, to avoid shortening the temporal length too early, the first pooling layer uses size and stride 1*2*2. Finally, the final output of the R-C3D network is obtained after two fully connected layers and a softmax layer. The input of the R-C3D network is 3*L*H*W, where 3 is the three RGB channels, L is the number of input frames, and H*W is the size of the input frames.
Understandably, the R-C3D-based behavior detection model is obtained by end-to-end training on the R-C3D network for detecting human actions in video. To ensure the detection efficiency and accuracy of the model for fall actions, positive samples containing fall actions and negative samples not containing fall actions (i.e. video segments corresponding to actions other than falls) can be used for model training in a preset ratio (which can be set to 1:1 to balance the samples and avoid overfitting). Since R-C3D performs frame-level classification (frame labels) based on C3D, it can quickly detect whether a fall action exists in the to-be-recognized video, can perform end-to-end fall detection on videos and behaviors of arbitrary length, and, because the temporal proposal and classification networks share the C3D parameters, it is very fast, which helps ensure the efficiency of fall detection.
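To make the shapes concrete, the following is a minimal sketch of the 3D convolutional backbone described above, written in PyTorch (an assumption; the application does not name a framework). The channel widths, class count, and names are illustrative; only the kernel sizes, strides, pooling scheme, layer counts, and the 3*L*H*W input follow the text, and the region-proposal stage of R-C3D that shares these C3D features is omitted.

    import torch
    import torch.nn as nn

    class C3DBackbone(nn.Module):
        """Sketch of the C3D-style backbone: 8 convolutions (3x3x3, stride
        1x1x1) and 5 poolings (2x2x2, except the first pool of 1x2x2 so the
        temporal length is not shortened too early), then two fully connected
        layers and a softmax."""

        def __init__(self, num_classes=2):  # e.g. fall / non-fall (assumed)
            super().__init__()

            def conv(cin, cout):
                return nn.Sequential(
                    nn.Conv3d(cin, cout, kernel_size=3, stride=1, padding=1),
                    nn.ReLU(inplace=True))

            self.features = nn.Sequential(
                conv(3, 64), nn.MaxPool3d((1, 2, 2)),        # pool 1: 1x2x2
                conv(64, 128), nn.MaxPool3d(2),
                conv(128, 256), conv(256, 256), nn.MaxPool3d(2),
                conv(256, 512), conv(512, 512), nn.MaxPool3d(2),
                conv(512, 512), conv(512, 512), nn.MaxPool3d(2))
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(4096), nn.ReLU(inplace=True),  # FC layer 1
                nn.Linear(4096, num_classes),                # FC layer 2
                nn.Softmax(dim=1))

        def forward(self, x):
            # x: (batch, 3, L, H, W) -- RGB channels, L frames of size H*W
            return self.classifier(self.features(x))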
S203: If the behavior actions corresponding to the to-be-recognized video include a fall action, intercept from the to-be-recognized video the target video segment corresponding to the fall action.
Here, the target video segment is the video segment cut out of the to-be-recognized video for analyzing the fall severity corresponding to the fall action. Understandably, after a person falls, the facial micro-expressions will show different changes and the body posture will show actions that match the degree of pain caused by the fall; therefore, the video after the fall action can be analyzed to determine the severity of the fall. Specifically, after the R-C3D behavior detection model recognizes that the to-be-recognized video contains a fall action, the server can intercept from the to-be-recognized video a segment containing the fall action and a certain duration after it as the target video segment for analyzing the fall severity, which reduces the amount of data processed in the severity analysis and improves analysis efficiency and accuracy.
S204: Perform severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment.
Since the target video segment is cut out of the to-be-recognized video specifically for analyzing fall severity, by performing severity analysis on it the server can objectively and quickly determine the fall severity of the person in the segment. For example, if the person who fell is young and the segment after the fall shows that his or her facial micro-expressions show no pain, or show pain only very briefly, the fall severity can be judged to be low. If the person who fell is elderly and the segment after the fall shows facial micro-expressions of pain lasting a long time, or actions such as rubbing the point of impact for a long time, the fall severity can be judged to be high.
S205: Obtain, based on the fall severity, medical advice information corresponding to the target video segment.
Specifically, the server compares the fall severity obtained from the target video segment with a preset degree threshold to determine whether medical advice needs to be provided. The preset degree threshold is a threshold set in advance for evaluating whether medical advice is needed. If the fall severity is below the preset degree threshold, no medical advice is needed, and the target video segment can be sent directly to the corresponding reminder terminal as a fall alert. If the fall severity is not below the preset degree threshold, the fall bone/joint point corresponding to the fall action is identified from the target video segment, the medical advice database is queried based on that bone/joint point, and the medical advice information corresponding to it is obtained as the medical advice information corresponding to the target video segment.
Here, the fall bone/joint point is the bone or joint identified from the target video segment as having been struck when the person fell; identifying it helps provide the person with appropriate medical advice afterwards. The medical advice database stores, for each bone/joint point, the medical examinations or medications recommended when that point is injured in a fall. Medical advice information is information on which medical examinations or medications are needed, determined according to the fall bone/joint point. For example, if in the target video segment the knee touches the ground first during the fall, the fall bone/joint point is the knee joint, and the medical advice information obtained relates to knee injuries.
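As an illustration of the threshold comparison and database lookup just described, a minimal sketch follows; the advice entries, joint names, and threshold value are hypothetical placeholders, not content of the application.

    # Hypothetical medical advice database keyed by the fall bone/joint point.
    MEDICAL_ADVICE_DB = {
        "knee": "Check the knee joint for fracture and soft tissue damage.",
        "hip": "Have the hip examined by X-ray as soon as possible.",
        "wrist": "Check the wrist for fracture; apply ice and immobilize.",
    }
    SEVERITY_THRESHOLD = 2  # preset degree threshold (assumed scale)

    def build_reminder(fall_severity, fall_joint, target_clip):
        """Return what should be sent to the reminder terminal."""
        if fall_severity < SEVERITY_THRESHOLD:
            return target_clip, None  # remind of the fall only, no advice
        advice = MEDICAL_ADVICE_DB.get(
            fall_joint, "Seek a general medical examination.")
        return target_clip, advice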
S206: Send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
Here, the reminder terminal is the terminal that receives the target video segment, or the target video segment together with the medical advice information. Generally, the reminder terminal is the terminal of the user who installed the video capture device. If the video capture device is installed in a public place, the corresponding reminder terminal can be that of the staff of the public place, for example a mobile terminal carried by staff working at the entrances and exits, so that when the person who fell leaves the public place, the person or an accompanying person can be informed of the fall and the medical advice. If the video capture device is installed in the home of an elderly person living alone, the reminder terminal can be a terminal bound to the video capture device.
In the fall behavior detection and processing method provided by this embodiment, the R-C3D-based behavior detection model can quickly recognize whether the to-be-recognized video contains a fall action, improving the efficiency and accuracy of fall detection; the target video segment corresponding to the fall action is then intercepted from the to-be-recognized video and analyzed for severity, reducing the amount of data to analyze and improving analysis efficiency and accuracy; the corresponding medical advice information is obtained based on the fall severity, and the medical advice information and target video segment are sent to the reminder terminal, so that targeted reminders are issued for the fall behavior and the risks of taking no treatment measures after a fall are avoided.
In an embodiment, as shown in FIG. 3, step S202, i.e. recognizing the to-be-recognized video with the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action, includes the following steps:
S301: Based on the segment duration threshold and the overlap duration threshold, perform interleaved cutting of the to-be-recognized video to obtain at least two original video segments, each corresponding to a segment timestamp.
Here, the segment duration threshold is a preset threshold for the duration of each cut original video segment; that is, each original video segment cut out in this embodiment has a duration equal to the segment duration threshold, e.g. 10 s. The overlap duration threshold is a preset threshold for the duration by which two adjacent original video segments overlap when cutting, e.g. 3 s. An original video segment is a unit segment cut out of the to-be-recognized video for input into the behavior detection model for recognition. The segment timestamp corresponding to an original video segment can be the timestamp of its first image, so that the cutting order can be determined from the segment timestamps.
Specifically, the server performs interleaved cutting of the to-be-recognized video based on the segment duration threshold and the overlap duration threshold, which guarantees that any two adjacent original video segments share an overlapping portion, ensuring the accuracy of the subsequent fall detection and avoiding the case where a fall action spans two consecutive non-overlapping original video segments and neither segment alone can detect it. For example, for a to-be-recognized video, the 0-10 s portion is first cut to form the 1st original video segment, then the 7-17 s portion is cut to form the 2nd original video segment, then the 14-24 s portion forms the 3rd, and so on until all original video segments are cut.
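A minimal sketch of this interleaved cutting, assuming the 10 s segment duration and 3 s overlap from the example (both are configurable thresholds in the application):

    SEGMENT_SECONDS = 10  # segment duration threshold (example value)
    OVERLAP_SECONDS = 3   # overlap duration threshold (example value)

    def cut_segments(total_seconds):
        """Return (segment timestamp, start, end) triples; the segment
        timestamp is the timestamp of the first image in the segment."""
        stride = SEGMENT_SECONDS - OVERLAP_SECONDS  # 7 s between starts
        segments, start = [], 0
        while start < total_seconds:
            end = min(start + SEGMENT_SECONDS, total_seconds)
            segments.append((start, start, end))
            start += stride
        return segments

    # For a 24 s video: segments covering 0-10 s, 7-17 s, 14-24 s and 21-24 s.
    print(cut_segments(24))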
S302: According to the order of the segment timestamps, input the at least two original video segments into the R-C3D-based behavior detection model in turn for recognition, and determine whether each original video segment includes a fall action.
Specifically, the server determines the cutting order of the at least two original video segments from their segment timestamps and inputs them in turn into the R-C3D-based behavior detection model, so that whether each original video segment contains a fall action can be determined objectively and quickly.
In the fall behavior detection and processing method provided by this embodiment, the to-be-recognized video is cut in an interleaved fashion according to the segment duration threshold and the overlap duration threshold to obtain at least two original video segments, which are input in turn into the R-C3D-based behavior detection model to quickly determine whether each contains a fall action. Splitting a long to-be-recognized video into shorter original video segments for recognition helps improve recognition accuracy and efficiency and raises the fault tolerance of fall detection, avoiding the situation where, for a long video, a failure during recognition (such as a server outage) invalidates the whole recognition and re-recognition takes a long time.
In an embodiment, as shown in FIG. 4, step S203, i.e. intercepting from the to-be-recognized video the target video segment corresponding to the fall action if the behavior actions corresponding to the to-be-recognized video include a fall action, includes the following steps:
S401: If the behavior actions corresponding to the to-be-recognized video include a fall action, determine the timestamp of the to-be-recognized image corresponding to the fall action as the start timestamp.
Here, a to-be-recognized image is an image making up the to-be-recognized video. Specifically, when the server recognizes that the behavior actions corresponding to the to-be-recognized video include a fall action, the timestamp of the to-be-recognized image at the moment the fall action is detected is used as the start timestamp for delimiting the target video segment. Understandably, using the timestamp of the image detected as the fall action as the start timestamp makes it possible to intercept the target video segment from that point and analyze the micro-expression and body-posture changes after the fall in order to determine the corresponding fall severity.
S402: Determine the end timestamp based on the start timestamp and the analysis duration threshold.
Here, the analysis duration threshold is a threshold preset in the system for determining the duration of video whose fall severity needs to be analyzed. Specifically, the server adds the analysis duration threshold to the start timestamp to obtain the end timestamp for delimiting the target video segment.
S403: Intercept from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determine it as the target video segment corresponding to the fall action.
Since every to-be-recognized image in the to-be-recognized video has a unique timestamp, after the start and end timestamps are determined, the server intercepts from the to-be-recognized video the segment whose first image corresponds to the start timestamp and whose last image corresponds to the end timestamp, and determines this segment as the target video segment corresponding to the fall action, to be used for fall severity analysis.
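A minimal sketch of steps S401-S403, representing the to-be-recognized video as (timestamp, image) pairs; the 5 s analysis duration threshold is an assumed example value:

    ANALYSIS_SECONDS = 5.0  # analysis duration threshold (assumed value)

    def intercept_target_segment(frames, fall_timestamp):
        """frames: list of (timestamp, image) pairs of the to-be-recognized
        video; fall_timestamp: timestamp of the fall-action image."""
        start_ts = fall_timestamp                 # S401: start timestamp
        end_ts = start_ts + ANALYSIS_SECONDS      # S402: end timestamp
        # S403: the segment between the start and end timestamps
        return [(ts, img) for ts, img in frames if start_ts <= ts <= end_ts]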
In the fall behavior detection and processing method provided by this embodiment, the timestamp of the to-be-recognized image corresponding to the fall action is first determined as the start timestamp, the end timestamp is determined from it, and the video segment between the start and end timestamps is intercepted as the target video segment, so that every to-be-recognized image in the target video segment falls within the analysis duration threshold after the fall and can reflect the real emotional changes of the person who fell, guaranteeing the objectivity and authenticity of the fall severity analysis.
In an embodiment, each target video segment includes at least one to-be-recognized image. As shown in FIG. 5, step S204, i.e. performing severity analysis on the target video segment to obtain the corresponding fall severity, includes the following steps:
S501: Recognize each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtain the micro-expression type corresponding to each to-be-recognized image.
Here, the micro-expression recognition model is a model for recognizing the facial micro-expressions in the to-be-recognized images. In this embodiment, the micro-expression recognition model captures local features of the user's face in the to-be-recognized image, determines the target facial action units of the face from the local features, and then determines the micro-expression from the recognized target facial action units. The micro-expression recognition model may be a deep-learning neural network model, a classification-based local recognition model, or a local emotion recognition model based on Local Binary Patterns (LBP). For example, when a classification-based local recognition model is used as the micro-expression recognition model, a large amount of training image data must be collected in advance, containing positive and negative samples for each facial action unit, and the model is trained with a classification algorithm. Specifically, an SVM classification algorithm can be used to train the large amount of training image data to obtain SVM classifiers corresponding to multiple facial action units: for example, 39 SVM classifiers for 39 facial action units, or 54 SVM classifiers for 54 facial action units. The more positive and negative samples of different facial action units the training image data contains, the more SVM classifiers are obtained. Understandably, when multiple SVM classifiers are combined into a micro-expression recognition model, the more classifiers there are, the more accurate the micro-expression types it recognizes.
In this embodiment, taking the micro-expression recognition model formed by the SVM classifiers of 54 facial action units as an example, using this model to recognize each to-be-recognized image in the target video segment can identify 54 micro-expression types, for example love, interest, surprise, expectation, ..., aggression, conflict, insult, doubt, fear, and pain.
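The per-action-unit classification described above could be combined roughly as follows; the action unit names, the AU-to-expression table, and the sklearn-style classifier interface are assumptions for illustration, not details given in the application.

    # Hypothetical mapping from combinations of facial action units (AUs) to a
    # micro-expression type; a real model would cover all 54 types.
    AU_TO_EXPRESSION = {
        frozenset({"AU4", "AU20", "AU43"}): "pain",
        frozenset({"AU1", "AU4", "AU15"}): "crying",
    }

    def recognize_micro_expression(face_features, au_classifiers):
        """au_classifiers: dict mapping an AU name to a trained binary SVM
        (sklearn-style, exposing predict) for that facial action unit."""
        active = {au for au, clf in au_classifiers.items()
                  if clf.predict([face_features])[0] == 1}
        for au_set, expression in AU_TO_EXPRESSION.items():
            if au_set <= active:
                return expression
        return "neutral"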
S502: If the micro-expression type is the preset expression type, use a facial feature point detection algorithm to detect and locate the to-be-recognized image and obtain a target mouth image containing the mouth area of the face; the target mouth image includes N facial feature points and the feature position of each facial feature point.
Here, the preset expression type is a type preset in the system as likely to follow a fall, such as pain or crying. The facial feature point detection algorithm is an algorithm for detecting facial feature points; it identifies, from the to-be-recognized image, N facial feature points and the feature position (i.e. the image coordinates) of each. It can detect and locate the facial feature points of the left eye, right eye, left eyebrow, right eyebrow, nose, mouth, and other parts in the to-be-recognized image.
In this embodiment, after the facial feature point detection algorithm has detected and located the to-be-recognized image, the image corresponding to the mouth area of the face, of a standard region size, is cut out of the to-be-recognized image and determined as the target mouth image for subsequent analysis. That is, the target mouth image is the image of the face's mouth area, matching the standard region size, cut out of the to-be-recognized image. The standard region size is preset in the system to delimit the region cut out for the target mouth image, and can be fixed by specifying the mouth width. With the width of the standard region fixed, when cutting the target mouth image out of the to-be-recognized image, the to-be-recognized image is first up-sampled or down-sampled (i.e. scaled) so that the mouth width of the person who fell matches the width of the standard region, and the cut is then made, yielding target mouth images of uniform width; this guarantees the accuracy of the subsequent analysis based on the average inner-lip distance of the mouth.
Specifically, the target mouth image includes N facial feature points and the feature position of each; here the N facial feature points are the feature points of the mouth contour. The mouth contour includes the upper lip contour (the outer and inner upper lip lines) and the lower lip contour (the inner and outer lower lip lines). In this embodiment, several dividing lines can be configured in the target mouth image according to preset rules to segment the mouth contour and determine the corresponding facial feature points. For example, three dividing lines can be drawn at the 1/4, 1/2, and 3/4 positions of the mouth width in the target mouth image; where each dividing line intersects the outer upper lip line, inner upper lip line, inner lower lip line, and outer lower lip line, a group of facial feature points is formed consisting of an outer upper lip point, inner upper lip point, inner lower lip point, and outer lower lip point.
S503: Based on the feature positions of the N facial feature points, obtain the average inner-lip distance of the mouth; if the average inner-lip distance is greater than the preset distance threshold, determine the to-be-recognized image corresponding to the target mouth image as a target recognition image.
Generally speaking, when a person falls, the more painful the micro-expression and the wider the mouth is opened, the higher the degree of pain, and the more it reflects the severity of the fall. Therefore, when the server obtains the N facial feature points of the target mouth image and their feature positions, it can calculate the average inner-lip distance of the mouth, which reflects how wide the mouth is open. Specifically, the server can first compute, on each dividing line, the inner-lip distance between the inner upper lip point and the inner lower lip point, and then average the inner-lip distances over all dividing lines to obtain the average inner-lip distance, which is used to analyze the degree of pain objectively. Understandably, since all target mouth images correspond to the standard region size and thus have the same width, the average inner-lip distance over all inner lip points reflects how wide the mouth is open fairly accurately.
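A minimal sketch of the average inner-lip distance computed from the three dividing lines; the pixel threshold is an assumed value, and the feature positions are (x, y) coordinates in the size-normalized target mouth image:

    import math

    DISTANCE_THRESHOLD = 12.0  # preset distance threshold in pixels (assumed)

    def inner_lip_average_distance(upper_inner, lower_inner):
        """upper_inner / lower_inner: inner-lip feature positions on the 1/4,
        1/2 and 3/4 dividing lines, each a list of three (x, y) points."""
        distances = [math.dist(u, l) for u, l in zip(upper_inner, lower_inner)]
        return sum(distances) / len(distances)

    def is_target_recognition_image(upper_inner, lower_inner):
        # The image must also have the preset micro-expression type (S501/S502).
        return inner_lip_average_distance(upper_inner,
                                          lower_inner) > DISTANCE_THRESHOLD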
Here, the preset distance threshold is a threshold preset in the system for judging that the degree of mouth opening indicates pain. Specifically, after obtaining the average inner-lip distance, the server compares it with the preset distance threshold; if the average inner-lip distance is greater than the preset distance threshold, the mouth is open wide, reflecting a high degree of pain, and the to-be-recognized image corresponding to that target mouth image is determined as a target recognition image for the subsequent fall severity analysis. At this point, a target recognition image is a to-be-recognized image in which the person's micro-expression type is the preset expression type and the average inner-lip distance is greater than the preset distance threshold; determining target recognition images jointly from micro-expression recognition and the degree of mouth opening helps ensure the accuracy of the fall severity analysis.
S504: Obtain the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment.
Since the duration of the target video segment matches the analysis duration threshold, the number of to-be-recognized images in the target video segment is fixed. A target recognition image, being an image in which the micro-expression type is the preset expression type and the average inner-lip distance exceeds the preset distance threshold, fully reflects that the person is in a rather painful state after the fall. Therefore, with the number of to-be-recognized images fixed, the more target recognition images there are, the higher the fall severity. That is, in this embodiment the number of target recognition images is proportional to the fall severity, and a corresponding lookup table can be preset so that the fall severity can be determined quickly.
In the fall behavior detection and processing method provided by this embodiment, micro-expression analysis is performed on the to-be-recognized images first, and only images whose micro-expression type is the preset expression type undergo the subsequent facial feature point detection and location, which reduces the amount of data processed and speeds up processing; the average inner-lip distance is then determined from the feature positions of the N facial feature points of the target mouth image, and the to-be-recognized images whose average inner-lip distance exceeds the preset distance threshold are selected as target recognition images, ensuring that the target recognition images truly and objectively reflect the degree of pain of the person who fell, which helps guarantee the accuracy of the subsequent fall severity analysis. Using the number of target recognition images in the target video segment, the fall severity can be determined quickly, ensuring the efficiency, accuracy, and objectivity of the analysis.
在一实施例中,如图6所示,步骤S504,即根据目标视频片段中所有目标识别图像对应的图像数量,获取目标视频片段对应的摔倒严重程度,包括如下步骤:In one embodiment, as shown in FIG. 6, step S504, that is, obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all target recognition images in the target video segment, includes the following steps:
S601: Divide the target video segment into at least two segments to be processed, and obtain a preset weight corresponding to each segment to be processed, where the preset weight of a later segment is greater than that of an earlier segment.
Specifically, the server may divide the target video segment into at least two segments to be processed based on a unit duration, which may be 1/M (M ≥ 2) of the analysis duration threshold, so that the target video segment is divided into at least two segments of equal duration. A preset weight is a weight preset by the system for each segment to be processed. Understandably, since the target video segment is the collection of all images to be recognized from the image identified as a fall action until the analysis duration threshold elapses, the later the timestamp of a target recognition image within the target video segment, the longer the fallen person has remained in a painful state after the fall. Therefore, the server may divide the target video segment into at least two segments to be processed according to the unit duration and configure a corresponding preset weight for each segment, such that a later segment's preset weight is greater than an earlier segment's, so as to ensure the objectivity and accuracy of the subsequent analysis.
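As a sketch, the split and the weighting might look as follows; the linearly increasing, normalized weights are an illustrative assumption, since the text only requires that later segments outweigh earlier ones:

```python
def split_with_weights(frames, m):
    """Divide a target video segment (a list of frames) into m equal-duration
    segments to be processed and assign each a preset weight that grows with
    time (any remainder frames at the tail are dropped for simplicity)."""
    assert m >= 2
    size = len(frames) // m
    segments = [frames[i * size:(i + 1) * size] for i in range(m)]
    raw = list(range(1, m + 1))          # 1, 2, ..., m: later > earlier
    total = sum(raw)
    weights = [w / total for w in raw]   # normalized preset weights
    return segments, weights
```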
S602: For each segment to be processed, obtain the target score corresponding to the segment based on the number of target recognition images and the number of images to be recognized in the segment.
Specifically, for each segment to be processed, the server computes the target score from the number of target recognition images and the number of images to be recognized in the segment using the formula

P = K × A / B

where P is the target score of a given segment to be processed, A is the number of target recognition images in the segment, B is the number of images to be recognized in the segment, and K is a constant used to normalize the target score to a specific numerical interval.
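This per-segment computation is direct; the default K = 100 below is an assumed configuration that places scores in the interval 0 to 100:

```python
def target_score(num_target_images, num_images_to_recognize, k=100):
    """P = K * A / B for one segment to be processed.

    num_target_images: A, the number of target recognition images
    num_images_to_recognize: B, the number of images to be recognized
    k: K, the normalization constant (k=100 is an assumption)
    """
    return k * num_target_images / num_images_to_recognize
```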
S603: Perform weighted processing on the preset weights and target scores of the at least two segments to be processed to obtain the fall score corresponding to the target video segment.
Specifically, the server performs the weighted processing on the preset weights and target scores of the at least two segments to be processed using the formula

S = ∑_{i=1}^{j} P_i × W_i

where S is the fall score corresponding to the target video segment, P_i is the target score of the i-th segment to be processed, W_i is the preset weight of the i-th segment to be processed, and j is the number of segments to be processed in the target video segment.
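The aggregation is then a plain weighted sum over the per-segment scores:

```python
def fall_score(target_scores, preset_weights):
    """S = sum over i of P_i * W_i for the j segments to be processed."""
    assert len(target_scores) == len(preset_weights)
    return sum(p * w for p, w in zip(target_scores, preset_weights))
```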
S604: Query the fall degree comparison table based on the fall score to obtain the fall severity corresponding to the target video segment.
Here, the fall degree comparison table is a table preset by the system that maps score ranges to fall severities. Specifically, after obtaining the fall score corresponding to the target video segment, the server determines the score range into which the fall score falls and takes the fall severity corresponding to that range as the fall severity of the target video segment, thereby quickly determining the fall severity.
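A sketch of such a lookup; the score ranges and severity labels are illustrative assumptions, since the patent does not fix the contents of the table:

```python
# Hypothetical fall degree comparison table: [low, high) score range -> severity.
FALL_DEGREE_TABLE = [
    (0, 20, "minor"),
    (20, 60, "moderate"),
    (60, float("inf"), "severe"),
]

def severity_from_score(fall_score_value):
    for low, high, severity in FALL_DEGREE_TABLE:
        if low <= fall_score_value < high:
            return severity
    return "unknown"
```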
In the fall behavior detection and processing method provided in this embodiment, the target video segment is first divided into at least two segments to be processed, each corresponding to a preset weight; the number of target recognition images and the number of images to be recognized in each segment are then counted to determine the corresponding target score; and a weighted calculation over the preset weights and target scores yields the fall score of the target video segment, making that score more objective and accurate. Querying the fall degree comparison table with the fall score quickly yields the fall severity corresponding to the target video segment, improving the efficiency of the fall severity analysis.
Further, when analyzing the fall severity corresponding to the target video segment, after performing the weighted processing on the preset weights and target scores of the at least two segments to be processed to obtain the fall score corresponding to the target video segment, a target face image may also be extracted from the target video segment (the implementation is the same as step S701). A pre-trained age detection model is then applied to the target face image to obtain the predicted age of the fallen person, an age score table is queried based on the predicted age to obtain the corresponding age constant, and the fall score of the target video segment is updated to the product of the fall score and the age constant, so that the fall score takes the age of the fallen person into account for a better subsequent analysis of the fall severity. For example, the age constant may be set within the range of 0 to 2, with the median age (e.g., 40 years old) mapped to an age constant of 1. The older the person, the larger the value, indicating that age contributes more to the severity of the fall; conversely, the younger the person, the smaller the value, indicating that age contributes less. For example, a child's fall generally does not have serious consequences, whereas an elderly person's fall may lead to serious fractures or other risks. The age detection model may be a model for predicting age obtained by training a CNN or another network on positive and negative samples carrying age labels.
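A sketch of the age adjustment under the stated convention (age constant in the range 0 to 2, with the median age mapped to 1); the age bands and constants below are illustrative assumptions:

```python
# Hypothetical age score table: [low, high) age range -> age constant.
AGE_SCORE_TABLE = [
    (0, 18, 0.5),    # children: falls rarely have serious consequences
    (18, 60, 1.0),   # around the median age, the score is unchanged
    (60, 200, 1.6),  # elderly: higher fracture and other risks
]

def age_adjusted_fall_score(fall_score_value, predicted_age):
    """Update the fall score to fall_score * age_constant."""
    for low, high, age_constant in AGE_SCORE_TABLE:
        if low <= predicted_age < high:
            return fall_score_value * age_constant
    return fall_score_value
```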
In one embodiment, as shown in FIG. 7, before sending to the reminder terminal corresponding to the target video segment, the fall behavior detection and processing method further includes the following steps:
S701: Extract a target face image from the target video segment, and query the system user image library based on the target face image.
Here, the target face image is a relatively clear image of the fallen person, extracted from the target video segment and containing the frontal face. The system user image library is the database used by the system to store user images. It may store registered user images corresponding to registered users, or both registered user images and accompanying user images. A registered user is a user who has registered with the system of the public place where the video capture device is installed. A registered user image is a user image associated with a registered user; it may be an image of the registered user himself or of the person the registered user cares for. An accompanying user image is an image of a person who accompanies the registered user, or the person the registered user cares for, into the public place but has not registered with the system.
Specifically, the server may select a relatively clear image to be recognized containing a frontal face from the target video segment as the target face image, and then query the system user image library based on the target face image to determine whether it matches a registered user image or an accompanying user image; specifically, a facial feature similarity matching algorithm may be used for this determination.
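As a sketch of the facial feature similarity matching, assuming faces are compared as feature embeddings under cosine similarity (the patent names the matching algorithm but does not specify its internals, so both the representation and the threshold are assumptions):

```python
def match_user_image(target_embedding, user_library, similarity_threshold=0.8):
    """Return the library user whose face embedding best matches the target
    face image, or None if no similarity exceeds the (assumed) threshold.

    user_library: iterable of (user_id, embedding) pairs
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    best_id, best_sim = None, 0.0
    for user_id, embedding in user_library:
        sim = cosine(target_embedding, embedding)
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    return best_id if best_sim >= similarity_threshold else None
```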
S702: If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment.
Specifically, if a registered user image corresponding to the target face image exists in the system user image library, the fallen person is the registered user himself or a person in need of care determined in advance through system registration; in this case, the registered terminal corresponding to the registered user image is determined as the reminder terminal corresponding to the target video segment, so that the target video segment and the medical advice information can be sent to that reminder terminal. If no registered user image corresponding to the target face image exists in the system user image library, but an accompanying user image corresponding to the target face image does, the fallen person is likely someone the registered user knows or is familiar with; the corresponding registered user image can be found based on the accompanying user image, and the registered terminal corresponding to that registered user image is determined as the reminder terminal corresponding to the target video segment, so that the target video segment and the medical advice information can be sent to that reminder terminal.
S703: If neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
Specifically, when neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, the fallen person is a new arrival in the public place or someone unknown to any registered user; in this case, the management terminal corresponding to the video capture device is determined as the reminder terminal corresponding to the target video segment.
In the fall behavior detection and processing method provided in this embodiment, a target face image is extracted from the target video segment, and different reminder terminals are determined according to whether a corresponding registered user image or accompanying user image exists in the system user image library, so that the target video segment and the medical advice information can subsequently be sent to the appropriate reminder terminal. This achieves targeted reminding and helps avoid the serious consequences that arise when a fallen person does not receive timely treatment after a fall.
In one embodiment, as shown in FIG. 8, before acquiring the video to be recognized that is captured in real time by the video capture device, the fall behavior detection and processing method further includes the following steps:
S801: Obtain historical videos collected by the video capture device, each historical video including at least one historical video image.
Here, a historical video is a video collected before the server obtains the video to be recognized. Historical video images are the images that make up a historical video.
S802: Recognize each historical video image using a face detection algorithm, and obtain at least one original face image in the historical video image.
Here, the face detection algorithm is an algorithm for detecting whether an image contains a face. An original face image is the image corresponding to a face region recognized by the face detection algorithm; it can be understood as the image within the face bounding box selected by the face detection algorithm.
S803: Query the system user image library based on each original face image; if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image to be face images to be analyzed.
Specifically, the server queries the system user image library based on each original face image to determine whether a registered user image corresponding to that original face image exists; the process is as described in step S701 and, to avoid repetition, is not repeated here. A face image to be analyzed can be understood as a non-registered-user image appearing in the same frame of historical video image as a registered user image. For example, if a historical video image contains three original face images X, Y, and Z, and X is recognized as a registered user image in the system user image library while Y and Z are not, then Y and Z are determined as face images to be analyzed. A face image to be analyzed is one that needs to be analyzed to decide whether it is an accompanying user image for the registered user.
S804: Determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
Specifically, a target video image is a historical video image that contains both the same registered user image and the same face image to be analyzed; in the above example, the target video images are all historical video images containing both X and Y. The unit time is a preset period. Understandably, since the number of historical video images within the unit time is fixed, the number of coexistence frames corresponding to the target video images can be determined by counting the number of target video images within that unit time. The number of coexistence frames can be understood as the number of target video images containing both X and Y within the unit time. The larger the number of coexistence frames, the greater the probability that X and Y appear together, and the more likely they are accompanying each other; therefore, whether an image is an accompanying user image can be evaluated based on the number of coexistence frames.
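A minimal sketch of the counting step, assuming each historical video image within the unit time is represented by the set of person identities detected in it:

```python
def coexistence_frames(frames_in_unit_time, registered_id, candidate_id):
    """Count the target video images in which the registered user and the
    face to be analyzed appear together within the unit time."""
    return sum(
        1 for detected_ids in frames_in_unit_time
        if registered_id in detected_ids and candidate_id in detected_ids
    )
```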
S805: Use the micro-expression recognition model to recognize the registered user image and the face image to be analyzed in each target video image, and compute the positive emotion probability that both are in a positive emotion at the same time.
Here, the micro-expression recognition model may be the micro-expression recognition model in step S501, and its recognition process is the same as in step S501; to avoid repetition, it is not described again here. Positive emotions are happiness, joy, or other emotions reflecting that a person is in a positive state, as opposed to negative emotions such as anger.
Specifically, if the number of target video images within the unit time is R, the micro-expression recognition model is used to recognize the registered user image and the face image to be analyzed in each target video image. Whenever the micro-expression types recognized for both the registered user image and the face image to be analyzed correspond to positive emotions, the count U of frames in which both are in a positive emotion is incremented by 1, until all target video images have been analyzed and the final count U is determined. The positive emotion probability L of both being in a positive emotion at the same time is then computed as L = U / R.
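A direct transcription of this computation:

```python
def positive_emotion_probability(emotion_pairs):
    """L = U / R, where R is the number of target video images in the unit
    time and U is the number of those images in which both faces show a
    positive micro-expression.

    emotion_pairs: one (registered_is_positive, candidate_is_positive)
    boolean pair per target video image.
    """
    r = len(emotion_pairs)
    u = sum(1 for a, b in emotion_pairs if a and b)
    return u / r if r else 0.0
```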
S806: If the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image, and store the accompanying user image in association with the registered user image in the system user image library.
Here, the preset frame number threshold is a preset threshold for assessing whether two people are accompanying each other. If the number of coexistence frames is greater than the preset frame number threshold, the number of times the registered user image and the face image to be analyzed appear together in one frame of target video image exceeds the preset threshold for deeming the two to be accompanying each other.
Here, the preset probability threshold is a preset threshold for assessing whether the two are on friendly terms. If the positive emotion probability is greater than the preset probability threshold, the two people corresponding to the registered user image and the face image to be analyzed are likely to both be in a positive emotion, and are thus likely to be friends.
Specifically, when the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, the server deems that the person corresponding to the face image to be analyzed most likely accompanied the user corresponding to the registered user image into the capture area of the video capture device. Therefore, the face image to be analyzed can be determined as the accompanying user image corresponding to the registered user image, and the accompanying user image is stored in association with the registered user image in the system user image library, so that targeted reminding can be realized when the reminder terminal is later determined by querying registered user images or accompanying user images.
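Putting the two conditions together, the accompanying-user decision reduces to a conjunction of the two threshold tests:

```python
def is_accompanying_user(coexist_frames, positive_prob,
                         frame_threshold, prob_threshold):
    """Store the face to be analyzed as an accompanying user image only when
    both the coexistence and the positive-emotion conditions hold."""
    return coexist_frames > frame_threshold and positive_prob > prob_threshold
```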
In the fall behavior detection and processing method provided in this embodiment, the other original face images appearing in the same historical video image as a registered user image are determined as face images to be analyzed, ensuring the objectivity of the subsequent determination of accompanying user images; a face image to be analyzed whose coexistence frame count exceeds the preset frame number threshold and whose positive emotion probability exceeds the preset probability threshold is determined as the accompanying user image corresponding to the registered user image, ensuring the accuracy and objectivity of that determination.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a fall behavior detection and processing apparatus is provided, corresponding one-to-one to the fall behavior detection and processing method in the foregoing embodiments. As shown in FIG. 9, the apparatus includes a to-be-recognized video acquisition module 901, a fall action detection module 902, a target video segment interception module 903, a fall severity acquisition module 904, a medical advice information acquisition module 905, and an information sending module 906. The functional modules are described in detail as follows:
The to-be-recognized video acquisition module 901 is configured to acquire the video to be recognized captured in real time by the video capture device.
The fall action detection module 902 is configured to recognize the video to be recognized using an R-C3D-based behavior detection model and determine whether the behavior actions corresponding to the video to be recognized include a fall action.
The target video segment interception module 903 is configured to, if the behavior actions corresponding to the video to be recognized include a fall action, intercept the target video segment corresponding to the fall action from the video to be recognized.
The fall severity acquisition module 904 is configured to perform severity analysis on the target video segment and obtain the fall severity corresponding to the target video segment.
The medical advice information acquisition module 905 is configured to obtain the medical advice information corresponding to the target video segment based on the fall severity.
The information sending module 906 is configured to send the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
Preferably, the fall action detection module 902 includes:
An original video segment acquisition unit, configured to perform interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each corresponding to a segment timestamp.
A video segment action detection unit, configured to sequentially input the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determine whether each original video segment includes a fall action.
Preferably, the target video segment interception module 903 includes:
A start timestamp determining unit, configured to, if the behavior actions corresponding to the video to be recognized include a fall action, determine the timestamp of the image to be recognized corresponding to the fall action as the start timestamp.
An end timestamp determining unit, configured to determine the end timestamp based on the start timestamp and the analysis duration threshold.
A video segment interception unit, configured to intercept the video segment between the start timestamp and the end timestamp from the video to be recognized and determine it as the target video segment corresponding to the fall action.
Preferably, each target video segment includes at least one image to be recognized, and the fall severity acquisition module 904 includes:
A micro-expression type acquisition unit, configured to recognize each image to be recognized in the target video segment using a micro-expression recognition model and obtain the micro-expression type corresponding to each image to be recognized.
A target mouth image acquisition unit, configured to, if the micro-expression type is the preset expression type, detect and locate the image to be recognized using a facial feature point detection algorithm and obtain a target mouth image containing the mouth region of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point.
A target recognition image determining unit, configured to obtain the average inner-lip distance of the mouth based on the feature positions corresponding to the N facial feature points and, if the average inner-lip distance is greater than the preset distance threshold, determine the image to be recognized corresponding to the target mouth image as a target recognition image.
A fall severity acquisition unit, configured to obtain the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment.
Preferably, the fall severity acquisition unit includes:
A preset weight acquisition subunit, configured to divide the target video segment into at least two segments to be processed and obtain the preset weight corresponding to each segment, where the preset weight of a later segment is greater than that of an earlier segment.
A target score acquisition subunit, configured to obtain the target score of each segment to be processed based on the number of target recognition images and the number of images to be recognized in the segment.
A fall score acquisition subunit, configured to perform weighted processing on the preset weights and target scores of the at least two segments to be processed and obtain the fall score corresponding to the target video segment.
A fall severity acquisition subunit, configured to query the fall degree comparison table based on the fall score and obtain the fall severity corresponding to the target video segment.
Preferably, before the information sending module 906 operates, the fall behavior detection and processing apparatus further includes:
A target face image query unit, configured to extract the target face image from the target video segment and query the system user image library based on the target face image.
A first reminder terminal determining unit, configured to, if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determine the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment.
A second reminder terminal determining unit, configured to, if neither a registered user image nor an accompanying user image corresponding to the target face image exists in the system user image library, determine the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
Preferably, before the to-be-recognized video acquisition module 901 operates, the fall behavior detection and processing apparatus further includes:
A historical video acquisition unit, configured to obtain historical videos collected by the video capture device, each historical video including at least one historical video image.
An original face image acquisition unit, configured to recognize each historical video image using a face detection algorithm and obtain at least one original face image in the historical video image.
A to-be-analyzed face image acquisition unit, configured to query the system user image library based on each original face image and, if a registered user image matching an original face image exists, determine the other original face images in the same historical video image as the registered user image to be face images to be analyzed.
A coexistence frame counting unit, configured to determine the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and count the number of coexistence frames corresponding to the target video images per unit time.
A positive emotion probability acquisition unit, configured to recognize the registered user image and the face image to be analyzed in each target video image using the micro-expression recognition model, and compute the positive emotion probability that both are in a positive emotion at the same time.
An accompanying user image determining unit, configured to, if the number of coexistence frames is greater than the preset frame number threshold and the positive emotion probability is greater than the preset probability threshold, determine the face image to be analyzed as the accompanying user image corresponding to the registered user image and store the accompanying user image in association with the registered user image in the system user image library.
For the specific limitations of the fall behavior detection and processing apparatus, refer to the limitations of the fall behavior detection and processing method above, which are not repeated here. Each module in the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, the processor of a computer device, or be stored in software form in the memory of a computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions in the non-volatile storage medium. The database stores the data used or generated in executing the fall behavior detection and processing method. The network interface communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a fall behavior detection and processing method.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIGS. 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executing the computer-readable instructions, the processor implements the functions of the modules/units in the embodiment of the fall behavior detection and processing apparatus, such as the functions of the modules/units/subunits shown in FIG. 9, which are likewise not repeated here.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to implement the steps of the fall behavior detection and processing method in the foregoing embodiments, such as steps S201-S206 shown in FIG. 2 or the steps shown in FIGS. 3 to 8, which are not repeated here to avoid repetition. Alternatively, when executed by a processor, the computer-readable instructions implement the functions of the modules/units in the embodiment of the fall behavior detection and processing apparatus, such as the functions of the modules/units/subunits shown in FIG. 9, which are likewise not repeated here. The readable storage media in this embodiment include non-volatile readable storage media and volatile readable storage media.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a non-volatile or volatile readable storage medium; when executed, the computer-readable instructions may include the processes of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The foregoing embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. A fall behavior detection and processing method, characterized by comprising:
    acquiring a video to be recognized captured in real time by a video capture device;
    recognizing the video to be recognized using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting a target video segment corresponding to the fall action from the video to be recognized;
    performing severity analysis on the target video segment to obtain a fall severity corresponding to the target video segment;
    obtaining medical advice information corresponding to the target video segment based on the fall severity; and
    sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  2. The fall behavior detection and processing method according to claim 1, characterized in that recognizing the video to be recognized using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action comprises:
    performing interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp; and
    sequentially inputting the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determining whether each original video segment comprises a fall action.
  3. The fall behavior detection and processing method according to claim 1, characterized in that, if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting the target video segment corresponding to the fall action from the video to be recognized comprises:
    if the behavior actions corresponding to the video to be recognized comprise a fall action, determining the timestamp of the image to be recognized corresponding to the fall action as a start timestamp;
    determining an end timestamp based on the start timestamp and an analysis duration threshold; and
    intercepting the video segment between the start timestamp and the end timestamp from the video to be recognized and determining it as the target video segment corresponding to the fall action.
  4. The fall behavior detection and processing method according to claim 1, characterized in that each target video segment comprises at least one image to be recognized; and
    performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    recognizing each image to be recognized in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each image to be recognized;
    if the micro-expression type is a preset expression type, detecting and locating the image to be recognized using a facial feature point detection algorithm, and obtaining a target mouth image containing the mouth region of the face, the target mouth image comprising N facial feature points and the feature position corresponding to each facial feature point;
    obtaining an average inner-lip distance of the mouth based on the feature positions corresponding to the N facial feature points and, if the average inner-lip distance is greater than a preset distance threshold, determining the image to be recognized corresponding to the target mouth image as a target recognition image; and
    obtaining the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment.
  5. The fall behavior detection and processing method according to claim 4, characterized in that obtaining the fall severity corresponding to the target video segment according to the number of target recognition images in the target video segment comprises:
    dividing the target video segment into at least two segments to be processed and obtaining a preset weight corresponding to each segment to be processed, the preset weight of a later segment being greater than that of an earlier segment;
    obtaining a target score corresponding to each segment to be processed based on the number of target recognition images and the number of images to be recognized in the segment;
    performing weighted processing on the preset weights and target scores corresponding to the at least two segments to be processed to obtain a fall score corresponding to the target video segment; and
    querying a fall degree comparison table based on the fall score to obtain the fall severity corresponding to the target video segment.
  6. The fall behavior detection and processing method according to claim 1, characterized in that, before sending to the reminder terminal corresponding to the target video segment, the method further comprises:
    extracting a target face image from the target video segment and querying a system user image library based on the target face image;
    if a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment; and
    if neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
  7. The fall behavior detection and processing method according to claim 6, characterized in that, before acquiring the video to be recognized captured in real time by the video capture device, the method further comprises:
    obtaining historical videos collected by the video capture device, each historical video comprising at least one historical video image;
    recognizing each historical video image using a face detection algorithm, and obtaining at least one original face image in the historical video image;
    querying the system user image library based on each original face image and, if a registered user image matching an original face image exists, determining the other original face images in the same historical video image as the registered user image to be face images to be analyzed;
    determining the historical video images containing both the same registered user image and the same face image to be analyzed as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    recognizing the registered user image and the face image to be analyzed in each target video image using a micro-expression recognition model, and computing the positive emotion probability that both are in a positive emotion at the same time; and
    if the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the face image to be analyzed as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
  8. A fall behavior detection and processing apparatus, characterized by comprising:
    a to-be-recognized video acquisition module, configured to acquire a video to be recognized captured in real time by a video capture device;
    a fall action detection module, configured to recognize the video to be recognized using an R-C3D-based behavior detection model and determine whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    a target video segment interception module, configured to, if the behavior actions corresponding to the video to be recognized comprise a fall action, intercept a target video segment corresponding to the fall action from the video to be recognized;
    a fall severity acquisition module, configured to perform severity analysis on the target video segment and obtain a fall severity corresponding to the target video segment;
    a medical advice information acquisition module, configured to obtain medical advice information corresponding to the target video segment based on the fall severity; and
    an information sending module, configured to send the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer-readable instructions:
    acquiring a video to be recognized captured in real time by a video capture device;
    recognizing the video to be recognized using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action;
    if the behavior actions corresponding to the video to be recognized comprise a fall action, intercepting a target video segment corresponding to the fall action from the video to be recognized;
    performing severity analysis on the target video segment to obtain a fall severity corresponding to the target video segment;
    obtaining medical advice information corresponding to the target video segment based on the fall severity; and
    sending the target video segment and the medical advice information to a reminder terminal corresponding to the target video segment.
  10. The computer device according to claim 9, characterized in that recognizing the video to be recognized using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the video to be recognized comprise a fall action comprises:
    performing interleaved cutting of the video to be recognized based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp; and
    sequentially inputting the at least two original video segments into the R-C3D-based behavior detection model for recognition according to the order of the segment timestamps, and determining whether each original video segment comprises a fall action.
  11. The computer device according to claim 9, wherein, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action comprises:
    If the behavior actions corresponding to the to-be-recognized video include a fall action, determining the timestamp of the to-be-recognized image corresponding to the fall action as a start timestamp;
    Determining an end timestamp based on the start timestamp and an analysis duration threshold;
    Intercepting from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determining it as the target video segment corresponding to the fall action.
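For illustration, the interception recited above reduces to clamping a window of the analysis duration threshold starting at the detected fall; a minimal sketch under the same frame-index assumptions as the previous example:

```python
from typing import List, Sequence

def intercept_target_segment(frames: Sequence, fps: float,
                             start_timestamp: float,
                             analysis_seconds: float) -> List:
    """Return the frames between the start timestamp and the end timestamp
    (start + analysis duration threshold), clamped to the length of the
    to-be-recognized video."""
    start = int(start_timestamp * fps)
    end = min(int((start_timestamp + analysis_seconds) * fps), len(frames))
    return list(frames[start:end])

# e.g. a fall detected at t = 12.4 s, analysed over the following 10 s
clip = intercept_target_segment([None] * 750, fps=25.0,
                                start_timestamp=12.4, analysis_seconds=10.0)
print(len(clip))  # 250 frames at 25 fps
```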
  12. The computer device according to claim 9, wherein each target video segment includes at least one to-be-recognized image;
    and performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    Recognizing each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each to-be-recognized image;
    If the micro-expression type is a preset expression type, detecting and locating the to-be-recognized image using a facial feature point detection algorithm to obtain a target mouth image containing the mouth area of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point;
    Obtaining the average distance between the inner lip points of the mouth based on the feature positions corresponding to the N facial feature points, and, if the average inner-lip distance is greater than a preset distance threshold, determining the to-be-recognized image corresponding to the target mouth image as a target recognition image;
    Obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment.
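For illustration, the claim fixes no particular landmark scheme; a common assumption is the 68-point layout popularised by dlib, in which points 60-67 trace the inner lip and points 61-63 sit opposite points 67-65. A minimal sketch of the average inner-lip distance test under that assumption (the pairing and the threshold value are illustrative, not recited):

```python
import math
from typing import List, Tuple

# Assumed landmark scheme (not fixed by the claim): dlib's 68-point layout,
# where these inner-lip point pairs face each other vertically.
INNER_LIP_PAIRS = [(61, 67), (62, 66), (63, 65)]

def inner_lip_average_distance(landmarks: List[Tuple[float, float]]) -> float:
    """Average gap between paired upper/lower inner-lip points;
    `landmarks` is indexed by feature-point number."""
    gaps = [math.dist(landmarks[upper], landmarks[lower])
            for upper, lower in INNER_LIP_PAIRS]
    return sum(gaps) / len(gaps)

def is_target_recognition_image(landmarks: List[Tuple[float, float]],
                                distance_threshold: float) -> bool:
    """An image counts as a target recognition image when the mouth is
    open wider than the preset distance threshold (e.g. a cry of pain)."""
    return inner_lip_average_distance(landmarks) > distance_threshold

# toy check: inner-lip pairs 8 px apart, threshold 5 px -> mouth judged open
points = [(0.0, 0.0)] * 68
for upper, lower in INNER_LIP_PAIRS:
    points[upper], points[lower] = (float(upper), 0.0), (float(upper), 8.0)
print(is_target_recognition_image(points, distance_threshold=5.0))  # True
```

In practice the distance would usually be normalised by face scale (for example the inter-ocular distance) so that one threshold works at any camera distance; the preset distance threshold in the claim leaves that choice open.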
  13. The computer device according to claim 12, wherein obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment comprises:
    Dividing the target video segment into at least two to-be-processed segments and obtaining the preset weight corresponding to each to-be-processed segment, the preset weight of a later to-be-processed segment being greater than the preset weight of an earlier to-be-processed segment;
    Obtaining the target score corresponding to each to-be-processed segment based on the number of images corresponding to all the target recognition images and the number of images corresponding to all the to-be-recognized images in that to-be-processed segment;
    Weighting the preset weights and target scores corresponding to the at least two to-be-processed segments to obtain the fall score value corresponding to the target video segment;
    Querying a fall degree comparison table based on the fall score value to obtain the fall severity corresponding to the target video segment.
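For illustration, assuming each to-be-processed segment's target score is simply the fraction of its to-be-recognized images flagged as target recognition images, the weighted scoring can be sketched as follows (the weights and the table lookup are illustrative):

```python
from typing import List

def fall_score(per_segment_flags: List[List[bool]],
               weights: List[float]) -> float:
    """Weighted fall score over the to-be-processed segments.

    per_segment_flags[i][j] is True when to-be-recognized image j of
    segment i was judged a target recognition image; per the claim,
    `weights` must grow for later segments.
    """
    if len(per_segment_flags) != len(weights):
        raise ValueError("one preset weight per to-be-processed segment")
    score = 0.0
    for flags, weight in zip(per_segment_flags, weights):
        target_ratio = sum(flags) / len(flags)  # target images / all images
        score += weight * target_ratio
    return score / sum(weights)                 # normalised into [0, 1]

# e.g. two halves of the target segment, the later half weighted double
value = fall_score([[True, False, True, True],
                    [True, True, True, False]], weights=[1.0, 2.0])
print(round(value, 3))  # 0.75
```

A fall degree comparison table would then bucket this score into severity levels (for instance mild, moderate, severe); the claim leaves the bucket boundaries to the implementation.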
  14. The computer device according to claim 9, wherein, before acquiring the to-be-recognized video captured in real time by the video capture device, the processor further implements the following steps when executing the computer-readable instructions:
    Acquiring historical videos collected by the video capture device, each historical video including at least one historical video image;
    Recognizing each historical video image using a face detection algorithm to obtain at least one original face image in the historical video image;
    Querying a system user image library based on each original face image, and, if there is a registered user image matching the original face image, determining the other original face images in the same historical video image as the registered user image to be to-be-analyzed face images;
    Determining the historical video images containing the same registered user image and the same to-be-analyzed face image as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    Recognizing the registered user image and the to-be-analyzed face image in each target video image using a micro-expression recognition model, and calculating the positive emotion probability that both are in a positive emotion at the same time;
    If the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the to-be-analyzed face image as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
    Before sending to the reminder terminal corresponding to the target video segment, the processor further implements the following steps when executing the computer-readable instructions:
    Extracting a target face image from the target video segment, and querying the system user image library based on the target face image;
    If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment;
    If neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
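For illustration, a minimal sketch of the companion-identification test and the reminder-terminal fallback recited in this claim; the FrameObservation record, the threshold values, and the flat library layout are all assumptions made here:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FrameObservation:
    """One target video image: the registered user and one co-occurring
    candidate face, plus whether each shows a positive micro-expression."""
    registered_positive: bool
    candidate_positive: bool

def is_accompanying_user(observations: List[FrameObservation],
                         frame_threshold: int,
                         probability_threshold: float) -> bool:
    """Candidate becomes an accompanying user when the coexistence frame
    count and the joint positive-emotion probability both clear their
    preset thresholds."""
    coexistence_frames = len(observations)
    if coexistence_frames == 0:
        return False
    both_positive = sum(1 for o in observations
                        if o.registered_positive and o.candidate_positive)
    positive_probability = both_positive / coexistence_frames
    return (coexistence_frames > frame_threshold
            and positive_probability > probability_threshold)

def resolve_reminder_terminal(target_face_id: str,
                              user_library: Dict[str, str],
                              management_terminal: str) -> str:
    """Registered and accompanying faces map to the registered user's
    terminal; unknown faces fall back to the capture device's
    management terminal."""
    return user_library.get(target_face_id, management_terminal)

# e.g. 120 coexistence frames, both showing positive emotion in 90 of them
obs = [FrameObservation(True, True)] * 90 + [FrameObservation(True, False)] * 30
print(is_accompanying_user(obs, frame_threshold=100,
                           probability_threshold=0.6))  # True
```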
  15. One or more readable storage media storing computer-readable instructions, wherein, when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to perform the following steps:
    Acquiring a to-be-recognized video captured in real time by a video capture device;
    Recognizing the to-be-recognized video using an R-C3D-based behavior detection model, and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action;
    If the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action;
    Performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment;
    Obtaining, based on the fall severity, the medical advice information corresponding to the target video segment;
    Sending the target video segment and the medical advice information to the reminder terminal corresponding to the target video segment.
  16. The readable storage medium according to claim 15, wherein recognizing the to-be-recognized video using the R-C3D-based behavior detection model and determining whether the behavior actions corresponding to the to-be-recognized video include a fall action comprises:
    Interleave-cutting the to-be-recognized video based on a segment duration threshold and an overlap duration threshold to obtain at least two original video segments, each original video segment corresponding to a segment timestamp;
    Inputting the at least two original video segments, in the order of their segment timestamps, into the R-C3D-based behavior detection model for recognition, and determining whether each original video segment includes a fall action.
  17. The readable storage medium according to claim 15, wherein, if the behavior actions corresponding to the to-be-recognized video include a fall action, intercepting from the to-be-recognized video the target video segment corresponding to the fall action comprises:
    If the behavior actions corresponding to the to-be-recognized video include a fall action, determining the timestamp of the to-be-recognized image corresponding to the fall action as a start timestamp;
    Determining an end timestamp based on the start timestamp and an analysis duration threshold;
    Intercepting from the to-be-recognized video the video segment between the start timestamp and the end timestamp, and determining it as the target video segment corresponding to the fall action.
  18. The readable storage medium according to claim 15, wherein each target video segment includes at least one to-be-recognized image;
    and performing severity analysis on the target video segment to obtain the fall severity corresponding to the target video segment comprises:
    Recognizing each to-be-recognized image in the target video segment using a micro-expression recognition model, and obtaining the micro-expression type corresponding to each to-be-recognized image;
    If the micro-expression type is a preset expression type, detecting and locating the to-be-recognized image using a facial feature point detection algorithm to obtain a target mouth image containing the mouth area of the face, the target mouth image including N facial feature points and the feature position corresponding to each facial feature point;
    Obtaining the average distance between the inner lip points of the mouth based on the feature positions corresponding to the N facial feature points, and, if the average inner-lip distance is greater than a preset distance threshold, determining the to-be-recognized image corresponding to the target mouth image as a target recognition image;
    Obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment.
  19. The readable storage medium according to claim 18, wherein obtaining the fall severity corresponding to the target video segment according to the number of images corresponding to all the target recognition images in the target video segment comprises:
    Dividing the target video segment into at least two to-be-processed segments and obtaining the preset weight corresponding to each to-be-processed segment, the preset weight of a later to-be-processed segment being greater than the preset weight of an earlier to-be-processed segment;
    Obtaining the target score corresponding to each to-be-processed segment based on the number of images corresponding to all the target recognition images and the number of images corresponding to all the to-be-recognized images in that to-be-processed segment;
    Weighting the preset weights and target scores corresponding to the at least two to-be-processed segments to obtain the fall score value corresponding to the target video segment;
    Querying a fall degree comparison table based on the fall score value to obtain the fall severity corresponding to the target video segment.
  20. The readable storage medium according to claim 15, wherein, before acquiring the to-be-recognized video captured in real time by the video capture device, when the computer-readable instructions are executed by the one or more processors, the one or more processors are further caused to perform the following steps:
    Acquiring historical videos collected by the video capture device, each historical video including at least one historical video image;
    Recognizing each historical video image using a face detection algorithm to obtain at least one original face image in the historical video image;
    Querying a system user image library based on each original face image, and, if there is a registered user image matching the original face image, determining the other original face images in the same historical video image as the registered user image to be to-be-analyzed face images;
    Determining the historical video images containing the same registered user image and the same to-be-analyzed face image as target video images, and counting the number of coexistence frames corresponding to the target video images per unit time;
    Recognizing the registered user image and the to-be-analyzed face image in each target video image using a micro-expression recognition model, and calculating the positive emotion probability that both are in a positive emotion at the same time;
    If the number of coexistence frames is greater than a preset frame number threshold and the positive emotion probability is greater than a preset probability threshold, determining the to-be-analyzed face image as the accompanying user image corresponding to the registered user image, and storing the accompanying user image in association with the registered user image in the system user image library.
    Before sending to the reminder terminal corresponding to the target video segment, when the computer-readable instructions are executed by the one or more processors, the one or more processors are further caused to perform the following steps:
    Extracting a target face image from the target video segment, and querying the system user image library based on the target face image;
    If a registered user image or an accompanying user image corresponding to the target face image exists in the system user image library, determining the registered terminal corresponding to the registered user image as the reminder terminal corresponding to the target video segment;
    If neither the registered user image nor the accompanying user image corresponding to the target face image exists in the system user image library, determining the management terminal corresponding to the video capture device as the reminder terminal corresponding to the target video segment.
PCT/CN2019/116490 2019-08-19 2019-11-08 Fall-down behavior detection processing method and apparatus, and computer device and storage medium WO2021031384A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910763921.6A CN110647812B (en) 2019-08-19 2019-08-19 Tumble behavior detection processing method and device, computer equipment and storage medium
CN201910763921.6 2019-08-19

Publications (1)

Publication Number Publication Date
WO2021031384A1 true WO2021031384A1 (en) 2021-02-25

Family

ID=68990244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116490 WO2021031384A1 (en) 2019-08-19 2019-11-08 Fall-down behavior detection processing method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110647812B (en)
WO (1) WO2021031384A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724566A (en) * 2020-05-20 2020-09-29 同济大学 Pedestrian falling detection method and device based on intelligent lamp pole video monitoring system
CN111833568B (en) * 2020-07-08 2021-11-05 首都医科大学附属北京天坛医院 Tumble grading warning device based on piezoelectric signal monitoring and working method thereof
CN112101253A (en) * 2020-09-18 2020-12-18 广东机场白云信息科技有限公司 Civil airport ground guarantee state identification method based on video action identification
CN112633126A (en) * 2020-12-18 2021-04-09 联通物联网有限责任公司 Video processing method and device
CN112866808B (en) * 2020-12-31 2022-09-06 北京市商汤科技开发有限公司 Video processing method and device, electronic equipment and storage medium
CN112949417A (en) * 2021-02-05 2021-06-11 杭州萤石软件有限公司 Tumble behavior identification method, equipment and system
CN112998697B (en) * 2021-02-22 2022-06-14 电子科技大学 Tumble injury degree prediction method and system based on skeleton data and terminal
CN113450538A (en) * 2021-06-28 2021-09-28 杭州电子科技大学 Warning system based on painful expression discernment and fall action detection
CN114494976A (en) * 2022-02-17 2022-05-13 平安科技(深圳)有限公司 Human body tumbling behavior evaluation method and device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010074786A2 (en) * 2008-12-04 2010-07-01 Total Immersion Software, Inc. System and methods for dynamically injecting expression information into an animated facial mesh
CN108830212B (en) * 2018-06-12 2022-04-22 北京大学深圳研究生院 Video behavior time axis detection method
CN109460749A (en) * 2018-12-18 2019-03-12 深圳壹账通智能科技有限公司 Patient monitoring method, device, computer equipment and storage medium
CN109819325B (en) * 2019-01-11 2021-08-20 平安科技(深圳)有限公司 Hotspot video annotation processing method and device, computer equipment and storage medium
CN109858405A (en) * 2019-01-17 2019-06-07 深圳壹账通智能科技有限公司 Satisfaction evaluation method, apparatus, equipment and storage medium based on micro- expression
CN109886111A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 Match monitoring method, device, computer equipment and storage medium based on micro- expression

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120114629A (en) * 2011-04-07 2012-10-17 연세대학교 산학협력단 Falldown detecting method using the image processing, image processing apparatus for the same
US20190012893A1 (en) * 2017-07-10 2019-01-10 Careview Communications, Inc. Surveillance system and method for predicting patient falls using motion feature patterns
CN109214285A (en) * 2018-08-01 2019-01-15 浙江深眸科技有限公司 Detection method is fallen down based on depth convolutional neural networks and shot and long term memory network
CN110047588A (en) * 2019-03-18 2019-07-23 平安科技(深圳)有限公司 Method of calling, device, computer equipment and storage medium based on micro- expression

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUIJUAN XU; ABIR DAS; KATE SAENKO: "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection", arXiv.org, Cornell University Library, Ithaca, NY 14853, 22 March 2017 (2017-03-22), XP080758989, DOI: 10.1109/ICCV.2017.617 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505752A (en) * 2021-07-29 2021-10-15 中移(杭州)信息技术有限公司 Fall detection method, device, equipment and computer readable storage medium
CN113505752B (en) * 2021-07-29 2024-04-23 中移(杭州)信息技术有限公司 Tumble detection method, device, equipment and computer readable storage medium
CN114743157A (en) * 2022-03-30 2022-07-12 中科融信科技有限公司 Pedestrian monitoring method, device, equipment and medium based on video
CN114972419A (en) * 2022-04-12 2022-08-30 中国电信股份有限公司 Tumble detection method, tumble detection device, tumble detection medium, and electronic device
CN114972419B (en) * 2022-04-12 2023-10-03 中国电信股份有限公司 Tumble detection method, tumble detection device, medium and electronic equipment
CN114998834A (en) * 2022-06-06 2022-09-02 杭州中威电子股份有限公司 Medical warning system based on face image and emotion recognition
CN115424353B (en) * 2022-09-07 2023-05-05 杭银消费金融股份有限公司 Service user characteristic identification method and system based on AI model
CN115424353A (en) * 2022-09-07 2022-12-02 杭银消费金融股份有限公司 AI model-based service user feature identification method and system
CN115830489A (en) * 2022-11-03 2023-03-21 南京小网科技有限责任公司 Intelligent dynamic analysis system based on ai identification
CN115830489B (en) * 2022-11-03 2023-10-20 南京小网科技有限责任公司 Intelligent dynamic analysis system based on ai identification
CN116994214A (en) * 2023-09-25 2023-11-03 南京华琨信息科技有限公司 Highway road safety evaluation method and system
CN116994214B (en) * 2023-09-25 2023-12-08 南京华琨信息科技有限公司 Highway road safety evaluation method and system
CN117196449A (en) * 2023-11-08 2023-12-08 讯飞智元信息科技有限公司 Video identification method, system and related device
CN117196449B (en) * 2023-11-08 2024-04-09 讯飞智元信息科技有限公司 Video identification method, system and related device

Also Published As

Publication number Publication date
CN110647812A (en) 2020-01-03
CN110647812B (en) 2023-09-19

Similar Documents

Publication Publication Date Title
WO2021031384A1 (en) Fall-down behavior detection processing method and apparatus, and computer device and storage medium
CN111915842B (en) Abnormity monitoring method and device, computer equipment and storage medium
WO2020024400A1 (en) Class monitoring method and apparatus, computer device, and storage medium
WO2019095571A1 (en) Human-figure emotion analysis method, apparatus, and storage medium
WO2020024395A1 (en) Fatigue driving detection method and apparatus, computer device, and storage medium
EP3243163B1 (en) Method and apparatus for recognition of patient activity
CN111353366A (en) Emotion detection method and device and electronic equipment
US20210345925A1 (en) A data processing system for detecting health risks and causing treatment responsive to the detection
Awais et al. Automated eye blink detection and tracking using template matching
Chowdhury et al. Lip as biometric and beyond: a survey
Sorto et al. Face recognition and temperature data acquisition for COVID-19 patients in Honduras
CN113823376A (en) Intelligent medicine taking reminding method, device, equipment and storage medium
Joshi et al. Context-sensitive prediction of facial expressivity using multimodal hierarchical bayesian neural networks
Singh et al. A reliable and efficient machine learning pipeline for american sign language gesture recognition using EMG sensors
Singh et al. Prediction of pain intensity using multimedia data
Ghose et al. Human activity recognition from smart-phone sensor data using a multi-class ensemble learning in home monitoring
Nahar et al. Twins and Similar Faces Recognition Using Geometric and Photometric Features with Transfer Learning
US11527332B2 (en) Sensor data analyzing machines
US20220101655A1 (en) System and method of facial analysis
US10943693B2 (en) Concise datasets platform
CN113921098A (en) Medical service evaluation method and system
CN112487980A (en) Micro-expression-based treatment method, device, system and computer-readable storage medium
Lubis Machine Learning (Convolutional Neural Networks) for Face Mask Detection in Image and Video
Logronio et al. Age Range Classification Through Facial Recognition Using Keras Model
TWI805485B (en) Image recognition method and electronic apparatus thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19942415

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19942415

Country of ref document: EP

Kind code of ref document: A1