WO2022252642A1 - Behavior and posture detection method, apparatus, device, and medium based on video images - Google Patents

Behavior and posture detection method, apparatus, device, and medium based on video images

Info

Publication number
WO2022252642A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
rectangular area
points
behavior
difference
Prior art date
Application number
PCT/CN2022/072290
Other languages
English (en)
French (fr)
Inventor
吕根鹏
庄伯金
刘玉宇
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022252642A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames


Abstract

This application relates to the technical field of artificial-intelligence liveness detection, and discloses a behavior and posture detection method, apparatus, device, and medium based on video images. The method includes: acquiring in real time a video clip with a preset number of frames, containing an image to be detected and historical video images; framing the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area; comparing the image to be detected with the historical video image of its immediately preceding frame using an image pixel-difference algorithm to obtain a difference rectangular area; determining the image of the region to be recognized using the maximum-boundary method; and extracting posture features through a behavior and posture detection model and outputting a behavior result according to the extracted posture features. Through simple joint-point framing and image comparison, this application thus determines the region image that contains the person and automatically recognizes the behavior and posture result, speeding up posture recognition.

Description

Behavior and posture detection method, apparatus, device, and medium based on video images
This application claims priority to Chinese patent application No. 202110609422.9, entitled "Behavior and posture detection method, apparatus, device, and medium based on video images" and filed with the Chinese Patent Office on June 1, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of artificial-intelligence liveness detection, and in particular to a behavior and posture detection method, apparatus, device, and medium based on video images.
Background
At present, the inventors have found that in scenarios where single-person posture detection is performed through a client or a terminal, for example detecting through a client whether a person operating a self-service teller machine has fallen or suddenly collapsed, or detecting through a user's mobile terminal whether an elderly person at home has fallen or suddenly collapsed, the person often occupies only a very small area of the surveillance image, so recognition performance is poor when posture detection is performed directly on the surveillance image. To improve detection performance, the prior art usually locates the person, extracts the person's region from the surveillance image, and then feeds the extracted image into a posture detection model for detection. This necessarily requires an object detection model to detect the person's position and region, so an extra model is added that lengthens the whole posture detection process and increases the running footprint of the whole posture detection pipeline, placing high performance requirements on the client. Lower-performance mobile terminals cannot reach the required detection level, which greatly reduces the performance of the posture detection model.
Technical Problem
This application provides a behavior and posture detection method, apparatus, computer device, and storage medium based on video images, which recognize the region image containing the person through simple joint-point framing and image comparison, and thereby automatically recognize the behavior and posture exhibited by the person in the video clip, reducing the running footprint of the whole posture detection model, lowering the performance requirements on the client, speeding up posture recognition, and improving user satisfaction.
Technical Solution
A behavior and posture detection method based on video images includes:
acquiring in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images, where the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip;
framing the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area;
comparing the image to be detected with the historical video image of the frame immediately preceding the image to be detected using an image pixel-difference algorithm to obtain a difference rectangular area;
determining the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method;
extracting posture features from the image of the region to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, the behavior result characterizing the behavior and posture in the video clip.
A behavior and posture detection apparatus based on video images includes:
an acquisition module, configured to acquire in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images, where the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip;
a framing module, configured to frame the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area;
a comparison module, configured to compare the image to be detected with the historical video image of the frame immediately preceding the image to be detected using an image pixel-difference algorithm to obtain a difference rectangular area;
a determination module, configured to determine the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method;
an output module, configured to extract posture features from the image of the region to be recognized through a behavior and posture detection model and output a behavior result according to the extracted posture features, the behavior result characterizing the behavior and posture in the video clip.
A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, and the processor implements the following steps when executing the computer-readable instructions:
acquiring in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images, where the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip;
framing the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area;
comparing the image to be detected with the historical video image of the frame immediately preceding the image to be detected using an image pixel-difference algorithm to obtain a difference rectangular area;
determining the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method;
extracting posture features from the image of the region to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, the behavior result characterizing the behavior and posture in the video clip.
One or more readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images, where the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip;
framing the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area;
comparing the image to be detected with the historical video image of the frame immediately preceding the image to be detected using an image pixel-difference algorithm to obtain a difference rectangular area;
determining the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method;
extracting posture features from the image of the region to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, the behavior result characterizing the behavior and posture in the video clip.
Beneficial Effects
The behavior and posture detection method, apparatus, computer device, and storage medium based on video images provided by this application acquire in real time a video clip with a preset number of frames that contains an image to be detected and historical video images; frame the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area; compare the image to be detected with the historical video image of the frame immediately preceding it using an image pixel-difference algorithm to obtain a difference rectangular area; determine the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method; and extract posture features from the image of the region to be recognized through a behavior and posture detection model, outputting a behavior result according to the extracted posture features. In this way, by acquiring video clips with a preset number of frames in real time, framing the predicted rectangular area with the minimum-bounding-rectangle method, obtaining the difference rectangular area with the image pixel-difference algorithm, determining with the maximum-boundary method the region image that contains the person, and finally recognizing the behavior and posture of that region image automatically with the behavior and posture detection model, no object detection model is needed to perform object detection on the image: simple joint-point framing and image comparison identify the region image containing the person, and merely extracting the posture features of that region image automatically recognizes the behavior and posture exhibited by the person in the video clip. This reduces the running footprint of the whole posture detection model, lowers the performance requirements and operating threshold on the client, improves the compatibility of the posture detection model, guarantees the performance level of posture detection, shortens the duration of the whole detection, speeds up posture recognition, and achieves smooth recognition and timely response, thereby improving user satisfaction.
Details of one or more embodiments of this application are set forth in the drawings and the description below; other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application environment of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 2 is a flowchart of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 3 is a flowchart of step S20 of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 4 is a flowchart of step S30 of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 5 is a flowchart of step S40 of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 6 is a flowchart of step S50 of the behavior and posture detection method based on video images in an embodiment of this application;
Fig. 7 is a functional block diagram of the behavior and posture detection apparatus based on video images in an embodiment of this application;
Fig. 8 is a schematic diagram of a computer device in an embodiment of this application.
 
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The behavior and posture detection method based on video images provided by this application can be applied in the application environment shown in Fig. 1, where a client (a computer device or terminal) communicates with a server through a network. The client (a computer device or terminal) includes, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a behavior and posture detection method based on video images is provided, and its technical solution mainly includes the following steps S10 to S50:
S10: acquire in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images; the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip.
Understandably, the user captures real-time video through the camera of the client or terminal, and the captured video is read so that video clips with the preset number of frames can be obtained. The preset number of frames is a preset total count of consecutive frames and is greater than two, for example 5, 10, or 20 frames. By acquiring video clips in real time, the behavior and posture of each clip can be recognized in a rolling manner: the clip is continuously refreshed as time passes, so rolling recognition is achieved, as sketched below. The video clip includes the image to be detected and the historical video images; the image to be detected is the last video frame in the clip, and the historical video images are the frames preceding it.
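To make the rolling acquisition concrete, the following minimal Python sketch keeps a fixed-length buffer of the most recent frames and yields each clip as (historical images, image to be detected). The OpenCV source, the frame count of 10, and the helper name rolling_clips are illustrative assumptions, not details taken from the application.

```python
from collections import deque

import cv2  # OpenCV, assumed available

PRESET_FRAMES = 10  # preset number of frames; the application only requires > 2


def rolling_clips(source=0, n_frames=PRESET_FRAMES):
    """Yield rolling video clips of n_frames consecutive frames.

    The last frame of each clip is the image to be detected; the
    earlier frames are the historical video images. The buffer is
    refreshed frame by frame, so clips roll forward in time.
    """
    cap = cv2.VideoCapture(source)
    buf = deque(maxlen=n_frames)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        buf.append(frame)
        if len(buf) == n_frames:
            clip = list(buf)
            yield clip[:-1], clip[-1]  # historical images, image to be detected
    cap.release()
```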
S20: frame the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area.
Understandably, the minimum-bounding-rectangle method uses the smallest rectangular frame to enclose all the joint points of interest in the historical video images. The joint-point framing process may use the joint point detection model in the behavior and posture detection model, which recognizes the joint-point features of a person, to perform joint-point recognition on all the historical video images, that is, to recognize the joint points of the person in each historical video image; the recognized joint points then undergo interference-point removal, all remaining joint points are finally framed by the minimum-bounding-rectangle method, and prediction expansion processing is applied to obtain a rectangular area.
The predicted rectangular area is an area that predicts the approximate position of the person in the frame following all the historical video images; it is a rectangular area expressed in the coordinate range of the historical video images.
In one embodiment, as shown in Fig. 3, step S20, namely framing the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area, includes:
S201: perform joint-point recognition on all the historical video images through the joint point detection model in the behavior and posture detection model, recognize the joint points in each historical video image, and mark the joint points in each historical video image.
Understandably, the behavior and posture detection model includes the joint point detection model, which detects the joint points of a person in an input image. The joint point detection model is shared between joint-point recognition and behavior-posture detection, which guarantees the consistency of the recognized joint points; recognizing joint points in the historical video images roughly locates the person in them and lays the foundation for the subsequent behavior and posture detection. The joint-point recognition process extracts joint-point features from a historical video image through the joint point detection model and classifies joint points according to the extracted features, thereby obtaining the joint points in the historical video image together with their categories, and marking each joint point and its category in the image. The joint-point categories include head, right shoulder, right elbow, right hand, right hip, right knee, right foot, neck, spine, left shoulder, left elbow, left hand, left hip, left knee, left foot, and so on.
The joint-point features are features related to the connecting joints of the person's limbs, for example: head features, right shoulder features, right elbow features, right hand features, right hip features, right knee features, right foot features, neck features, spine features, left shoulder features, left elbow features, left hand features, left hip features, left knee features, left foot features, and so on.
S202: frame a minimum rectangular area from all the marked joint points using the minimum-bounding-rectangle method.
Understandably, the minimum-bounding-rectangle method uses the smallest rectangular frame to enclose all the joint points of interest in the historical video images. A spatial coordinate map of the same size as the historical video images is established, that is, horizontal-axis and vertical-axis coordinates are set up according to the image size; the joint points of all the historical video images are mapped into the spatial coordinate map according to their positions in the images; an aggregation center is found from the distribution of all the mapped joint points; the interference points far from the aggregation center are removed to obtain the coordinate map to be processed; and the minimum rectangular area is used to frame the coordinate points in the coordinate map to be processed.
In one embodiment, step S202, namely framing a minimum rectangular area from all the marked joint points using the minimum-bounding-rectangle method, includes:
acquiring the coordinate point of each joint point; understandably, the coordinate point of each joint point reflects the position of that joint point in the historical video image;
aggregating all the joint points in a spatial coordinate map of the same size as the historical video images;
understandably, a spatial coordinate map of the same size as the historical video images is constructed, all the joint points are mapped one by one into the map according to their positions in the historical video images, and an aggregation method is used to find an aggregation center whose Euclidean distances to the coordinate points in the map are optimal, giving the coordinate point of the aggregation center;
removing interference points from the spatial coordinate map and determining the resulting map as the coordinate map to be processed;
understandably, a circle of a preset radius is drawn around the aggregation center; as long as mapped coordinate points are still being scanned, the radius is increased by a preset increment and the circle keeps expanding, until no further mapped coordinate points are scanned; the coordinate points outside the circle at that moment are recorded as interference points, all interference points are removed, and the resulting spatial coordinate map is recorded as the coordinate map to be processed;
obtaining the minimum rectangular area from all the coordinate points in the coordinate map to be processed using the minimum-bounding-rectangle method;
understandably, from all coordinate points in the coordinate map to be processed, the minimum and maximum values along the horizontal axis and the minimum and maximum values along the vertical axis are obtained; the rectangular area formed by these four values is determined as the minimum rectangular area.
This application thus acquires the coordinate point of each joint point, aggregates all the joint points in a spatial coordinate map of the same size as the historical video images, removes interference points from the map to obtain the coordinate map to be processed, and applies the minimum-bounding-rectangle method to all coordinate points in that map to obtain the minimum rectangular area. In this way, all joint points can be aggregated automatically, interference points can be identified and removed, and the minimum rectangular area can be determined accurately.
S203: perform prediction expansion processing on the minimum rectangular area to obtain the predicted rectangular area.
Understandably, the surroundings of the minimum rectangular area are expanded by a preset expansion amount, enlarging the range of the minimum rectangular area. The preset expansion amount is a distance output statistically from historically collected magnitudes of person movement; the prediction expansion processing expands each side of the minimum rectangular area by the preset expansion amount, predicting the range into which the person may move.
This application thus performs joint-point recognition on all the historical video images through the joint point detection model in the behavior and posture detection model, marks the joint points in each image, frames the minimum rectangular area from all marked joint points using the minimum-bounding-rectangle method, and performs prediction expansion processing on it to obtain the predicted rectangular area (see the sketch below). In this way, the joint point detection model automatically recognizes the joint points in all historical video images, the minimum-bounding-rectangle method quickly determines the minimum rectangular area, and the prediction expansion processing yields the predicted rectangular area, so the activity range of the person in the historical video images can be located quickly and the activity range in the next frame predicted. Using the joint point detection model within the behavior and posture detection model guarantees the consistency of joint-point recognition and improves the accuracy of the subsequent behavior and posture detection, while recognition through the shared model greatly reduces the capacity of the whole posture detection model, improves its compatibility, and lowers the performance requirements on the client.
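The following minimal Python sketch illustrates steps S202 and S203 under stated assumptions: the joint points are assumed to be already detected and pooled as (x, y) coordinates, the aggregation center is approximated by the mean point (the text only requires a center whose distances to the points are optimal), and the function name and the default radius and increment are invented for illustration.

```python
import numpy as np


def predicted_rectangle(joint_points, pad, r0=10.0, dr=5.0):
    """Frame pooled joint points and apply prediction expansion.

    joint_points: iterable of (x, y) coordinates pooled from every
    historical frame. A circle around the aggregation center grows by
    dr while each new ring still contains points; whatever falls
    outside the final circle is discarded as interference, mirroring
    the circle-growing removal described in the text.
    """
    pts = np.asarray(joint_points, dtype=float)
    center = pts.mean(axis=0)                    # aggregation center (assumed: mean)
    dist = np.linalg.norm(pts - center, axis=1)
    r = r0
    while np.any((dist > r) & (dist <= r + dr)):  # next ring still catches points
        r += dr
    kept = pts[dist <= r]                         # interference points removed
    x_min, y_min = kept.min(axis=0)               # minimum bounding rectangle
    x_max, y_max = kept.max(axis=0)
    # prediction expansion: pad every side by the preset expansion amount
    return (x_min - pad, y_min - pad, x_max + pad, y_max + pad)
```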
S30: compare the image to be detected with the historical video image of the frame immediately preceding it using an image pixel-difference algorithm to obtain a difference rectangular area.
Understandably, the image pixel-difference algorithm subtracts the pixel values of pixels at the same coordinate position in two images of the same size to obtain the pixel difference at that position, and takes the absolute value of that difference. The pixel values of corresponding pixels of the image to be detected and the historical video image of its immediately preceding frame are subtracted and the absolute value taken, giving the absolute difference of each pixel; pixels whose absolute difference exceeds a preset threshold are recorded as points to be processed and filtered by a discrete-point removal method; the points remaining after discrete-point removal are recorded as moving points, and the smallest rectangular area framing all the moving points gives the difference rectangular area.
In one embodiment, as shown in Fig. 4, step S30, namely comparing the image to be detected with the historical video image of the frame immediately preceding it using the image pixel-difference algorithm to obtain a difference rectangular area, includes:
S301: record the historical video image of the frame immediately preceding the image to be detected as the comparison image.
S302: acquire each first pixel value in the image to be detected and each second pixel value in the comparison image.
Understandably, the pixel value of each pixel in the image to be detected is recorded as the first pixel value, and the pixel value of each pixel in the comparison image is recorded as the second pixel value.
S303: obtain, using the image pixel-difference algorithm, the absolute difference between the first pixel value and the second pixel value of each pixel with the same coordinates.
Understandably, the image pixel-difference algorithm subtracts the pixel values of pixels at the same coordinate position in two images of the same size and takes the absolute value of the difference: the first pixel value and the second pixel value of the pixel with the same coordinates are subtracted and the absolute value taken, yielding the absolute difference of that pixel.
S304: record the pixels whose absolute difference exceeds a preset threshold as points to be processed, perform discrete-point removal on all the points to be processed, and record the points remaining after the removal as moving points.
Understandably, recording the pixels whose absolute difference exceeds the preset threshold as points to be processed identifies the pixels with genuinely large pixel differences; the discrete-point removal then discards misjudged pixels. The removal identifies non-aggregated points to be processed from the distribution of all points to be processed and removes them, so that the remaining points are determined as the moving points.
S305: determine the difference rectangular area from all the moving points.
Understandably, from the coordinates of all the moving points, the smallest rectangular area containing them is determined, giving the difference rectangular area.
This application thus records the historical video image of the frame immediately preceding the image to be detected as the comparison image; acquires each first pixel value in the image to be detected and each second pixel value in the comparison image; obtains the absolute difference of the first and second pixel values at the same coordinates using the image pixel-difference algorithm; records pixels whose absolute difference exceeds the preset threshold as points to be processed, removes discrete points, and records the remainder as moving points; and determines the difference rectangular area from all the moving points (see the sketch below). Applying the pixel-difference algorithm and discrete-point removal automatically identifies the rectangular area of genuine difference, reduces interfering pixels, and improves the accuracy of the subsequent posture detection.
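A minimal sketch of steps S301 to S305, assuming grayscale frames of equal size; the threshold value and the neighborhood-density filter used here are one plausible way to realize the discrete-point removal the text describes, not values prescribed by the application.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def difference_rectangle(curr, prev, thresh=30, size=5, min_neighbors=3):
    """Compare the image to be detected with its preceding frame.

    curr, prev: grayscale uint8 arrays of identical shape. Pixels whose
    absolute difference exceeds thresh become points to be processed;
    points with too few candidate neighbors in a size x size window are
    dropped as discrete points. Returns the smallest rectangle
    (x_min, y_min, x_max, y_max) covering the remaining moving points,
    or None if no moving point survives.
    """
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    mask = diff > thresh                                   # points to be processed
    density = uniform_filter(mask.astype(float), size=size)
    moving = mask & (density * size * size >= min_neighbors)  # discrete points removed
    ys, xs = np.nonzero(moving)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```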
S40: determine the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method.
Understandably, the maximum-boundary method expands a determined rectangular area outward on all sides by the length of a preset expansion value and determines the expanded boundary. From the coordinates of the four corner points of the predicted rectangular area and the four corner points of the difference rectangular area, a rectangular area framing all eight corner coordinates can be determined; the maximum-boundary method applies boundary expansion to this rectangle to obtain an expanded area, whose image is extracted from the image to be detected and determined as the image of the region to be recognized. This region image is the image of the true position or region of the person in the image to be detected on which posture detection needs to be performed.
In one embodiment, as shown in Fig. 5, step S40, namely determining the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method, includes:
S401: acquire the four-corner coordinates of the predicted rectangular area and of the difference rectangular area.
Understandably, the four-corner coordinates are the coordinates of the points where two sides of a rectangle intersect, so the four corner coordinates of the predicted rectangular area and the four corner coordinates of the difference rectangular area can be obtained.
S402: determine the four extreme values from all the corner coordinates.
Understandably, extreme-value identification is performed on the eight acquired corner coordinates, that is, the horizontal-axis maximum, horizontal-axis minimum, vertical-axis maximum, and vertical-axis minimum among the eight coordinates are determined, and these are marked as the four extreme values.
S403: apply boundary expansion to all four extreme values by a preset expansion value using the maximum-boundary method to obtain the image of the region to be recognized.
Understandably, the maximum-boundary method expands the determined rectangular area outward on all sides by the length of the preset expansion value and determines the expanded boundary: the preset expansion value is subtracted from the horizontal-axis minimum and the vertical-axis minimum among the four extreme values, giving the expanded horizontal-axis minimum and expanded vertical-axis minimum, and the preset expansion value is added to the horizontal-axis maximum and the vertical-axis maximum, giving the expanded horizontal-axis maximum and expanded vertical-axis maximum. These four expanded values determine a rectangular area; the image corresponding to this rectangle in the image to be detected is extracted and determined as the image of the region to be recognized.
The preset expansion value can be set as required; for example, it can be set from the average distance of the person's direction of movement, or from statistical movement distances collected historically.
This application thus acquires the corner coordinates of the predicted rectangular area and the difference rectangular area, determines the four extreme values from all the corner coordinates, and applies boundary expansion to the extreme values by the preset expansion value using the maximum-boundary method to obtain the image of the region to be recognized (see the sketch below). In this way, the maximum-boundary method automatically identifies the region of the person's true position in the image to be detected and extracts the region image, improving the accuracy and reliability of the subsequent posture detection.
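A minimal sketch of steps S401 to S403, assuming rectangles are given as (x_min, y_min, x_max, y_max) tuples; clipping the expanded boundary to the image is an added safeguard rather than a step named in the text.

```python
def region_to_recognize(image, pred_rect, diff_rect, expand):
    """Apply the maximum-boundary method and crop the region image.

    The eight corner coordinates of the two rectangles reduce to four
    extreme values, which are pushed outward by the preset expansion
    value and clipped to the image so the crop stays valid.
    """
    h, w = image.shape[:2]
    x_min = min(pred_rect[0], diff_rect[0]) - expand
    y_min = min(pred_rect[1], diff_rect[1]) - expand
    x_max = max(pred_rect[2], diff_rect[2]) + expand
    y_max = max(pred_rect[3], diff_rect[3]) + expand
    x_min, y_min = max(0, int(x_min)), max(0, int(y_min))
    x_max, y_max = min(w - 1, int(x_max)), min(h - 1, int(y_max))
    return image[y_min:y_max + 1, x_min:x_max + 1]
```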
In one embodiment, before applying boundary expansion to all four extreme values by the preset expansion value, the method includes:
determining a prediction center from the corner coordinates of the predicted rectangular area, and determining a difference center from the corner coordinates of the difference rectangular area;
understandably, from the corner coordinates of the predicted rectangular area, the coordinate point at the intersection of the diagonals of the predicted rectangular area is determined and taken as the prediction center, and from the corner coordinates of the difference rectangular area, the coordinate point at the intersection of the diagonals of the difference rectangular area is determined and taken as the difference center;
obtaining the center distance between the prediction center and the difference center;
understandably, the Euclidean distance between the prediction center and the difference center is computed and determined as the center distance;
determining the preset expansion value from the center distance and the preset number of frames;
understandably, the center distance is divided by the preset number of frames to obtain the average distance moved by the person in the video clip, and this average distance is determined as the preset expansion value.
This application thus determines the prediction center from the corner coordinates of the predicted rectangular area and the difference center from the corner coordinates of the difference rectangular area, obtains the center distance between them, and determines the preset expansion value from the center distance and the preset number of frames (see the sketch below). The preset expansion value is thereby determined scientifically and objectively to simulate the movement distance of the person in the image to be detected, so that the subsequent expansion yields a region image that is sure to contain the person, improving the accuracy and reliability of the subsequent posture detection.
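The preset expansion value of this embodiment can be computed as below; the rectangle representation and the function name are illustrative assumptions.

```python
import math


def preset_expansion_value(pred_rect, diff_rect, n_frames):
    """Estimate the preset expansion value from the two rectangle centers.

    Each center is the diagonal intersection of its rectangle; their
    Euclidean distance divided by the preset number of frames gives the
    average per-frame movement of the person, used as the expansion value.
    """
    pcx = (pred_rect[0] + pred_rect[2]) / 2
    pcy = (pred_rect[1] + pred_rect[3]) / 2   # prediction center
    dcx = (diff_rect[0] + diff_rect[2]) / 2
    dcy = (diff_rect[1] + diff_rect[3]) / 2   # difference center
    center_distance = math.hypot(pcx - dcx, pcy - dcy)
    return center_distance / n_frames
```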
S50: extract posture features from the image of the region to be recognized through the behavior and posture detection model, and output a behavior result according to the extracted posture features; the behavior result characterizes the behavior and posture in the video clip.
Understandably, the behavior and posture detection model is a trained deep learning model used to detect the behavior and posture of the person in an input image. Its network structure can be set as required, for example a DensePose, OpenPose, or DeepPose network structure. The model extracts the posture features from the input region image, that is, it convolves the image of the region to be recognized and extracts a vector carrying the posture features, which are features related to the postures of the person's actions. The extracted feature vector is passed through fully connected layer activation and then softmax classification to obtain the behavior result of the person in the image to be detected. The behavior result reflects the behavior and posture in the video clip and includes body postures that require attention, such as falling or jumping; when a posture of concern is detected in the behavior result, corresponding measures are taken in time, for example triggering an emergency assistance request or an alarm.
This application thus acquires in real time video clips with a preset number of frames containing an image to be detected and historical video images; frames the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area; compares the image to be detected with the historical video image of its immediately preceding frame using the image pixel-difference algorithm to obtain a difference rectangular area; determines the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method; and extracts posture features from the region image through the behavior and posture detection model, outputting a behavior result according to the extracted features. In this way, no object detection model is needed to perform object detection on the image: simple joint-point framing and image comparison identify the region image containing the person, and merely extracting the posture features of that region image automatically recognizes the behavior and posture exhibited by the person in the video clip. This reduces the running footprint of the whole posture detection model, lowers the performance requirements and operating threshold on the client, improves the model's compatibility, guarantees the performance level of posture detection, shortens the duration of the whole detection, speeds up posture recognition, and achieves smooth recognition and timely response, thereby improving user satisfaction.
In one embodiment, as shown in Fig. 6, step S50, namely extracting posture features from the image of the region to be recognized through the behavior and posture detection model and outputting a behavior result according to the extracted features, includes:
S501: perform size conversion and image preprocessing on the image of the region to be recognized through the behavior and posture detection model to obtain a preprocessed image; the behavior and posture detection model is a DeepPose deep learning model based on cross-layer parameter sharing.
Understandably, the behavior and posture detection model is a DeepPose deep learning model based on cross-layer parameter sharing. The DeepPose model is a DNN model that regresses on body joints and estimates posture in a holistic, whole-person manner; a cross-layer parameter-sharing method is added to it, and training and learning then yield the behavior and posture detection model. The size conversion converts the image of the region to be recognized to a preset size, namely the input image size suitable for posture detection by the model. The image preprocessing includes noise filtering and edge enhancement: the noise filtering is an image-enhancement process that removes noise from the input image and sharpens its pixels, and it includes Gaussian filtering of the size-converted region image, that is, filtering each pixel of the size-converted image with a Gaussian filter. Preferably, the Gaussian kernel is 3×3 with a standard deviation of 1.4. Edge enhancement is then applied to the filtered image, strengthening the edge lines of the objects in it, and the preprocessed image is obtained.
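A minimal Python sketch of this preprocessing, using the 3×3 Gaussian kernel with standard deviation 1.4 named above; the 220×220 input size and the unsharp-mask style edge enhancement are assumptions, since the text leaves the preset size and the exact enhancement method open.

```python
import cv2

INPUT_SIZE = (220, 220)  # assumed preset model input size, for illustration only


def preprocess(region_image):
    """Size conversion and image preprocessing before feature extraction.

    Resizes to the preset input size, applies the 3x3 Gaussian filter
    with standard deviation 1.4, then sharpens edges by adding back the
    detail removed by the blur (an unsharp-mask step, one common way to
    realize the edge enhancement the text describes).
    """
    resized = cv2.resize(region_image, INPUT_SIZE)
    blurred = cv2.GaussianBlur(resized, (3, 3), 1.4)        # noise filtering
    sharpened = cv2.addWeighted(resized, 1.5, blurred, -0.5, 0)  # edge enhancement
    return sharpened
```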
S502: extract the posture features from the preprocessed image through the behavior and posture detection model using the cross-layer shared parameters.
Understandably, cross-layer parameter sharing shares the weight parameters of each layer, either setting the weights of all layers to be identical or composing each layer's weights from shared parameters plus tolerance parameters. Cross-layer parameter sharing greatly compresses the capacity of the per-layer weight parameters, which makes the model easy to deploy on mobile devices. The behavior and posture detection model applies the cross-layer parameter-sharing method during training and learning, so the trained parameters are those obtained through the sharing method; the posture features of the preprocessed image are therefore extracted with the cross-layer shared parameters, greatly reducing the capacity and running footprint of the behavior and posture detection model, moving it in a lightweight direction and greatly lowering the performance requirements for running on the client.
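Cross-layer parameter sharing can be illustrated with the following PyTorch sketch, in which a single convolutional block is reused at every depth so the stored weights do not grow with the number of layers; this is an illustration of the sharing idea only, not the application's actual DeepPose architecture.

```python
import torch
import torch.nn as nn


class SharedBlockExtractor(nn.Module):
    """Minimal sketch of cross-layer parameter sharing.

    One convolutional block is created once and applied repeatedly, so
    every "layer" uses the same weights and the parameter count stays
    that of a single block regardless of depth.
    """

    def __init__(self, channels=64, depth=4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.shared = nn.Sequential(              # parameters shared across depth
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.depth = depth

    def forward(self, x):
        x = torch.relu(self.stem(x))
        for _ in range(self.depth):               # same weights, reused each pass
            x = self.shared(x)
        return x
```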
S503: classify the extracted posture features to obtain the behavior result.
Understandably, the extracted vector carrying the posture features is passed through fully connected layer activation and then softmax classification, which identifies the probability of each posture; this probability indicates the likelihood of the corresponding posture, completing the posture classification. Finally, the posture with the highest probability is determined as the behavior result of the person in the image to be detected.
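A minimal sketch of this classification head, assuming a pooled feature vector as input; the feature dimension and the number of posture classes are illustrative.

```python
import torch
import torch.nn as nn


class PostureHead(nn.Module):
    """Fully connected activation followed by softmax classification.

    Maps a feature vector to a probability per posture class; the class
    with the highest probability becomes the behavior result.
    """

    def __init__(self, feat_dim=64, n_classes=5):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, features):
        logits = self.fc(features)
        probs = torch.softmax(logits, dim=-1)   # probability of each posture
        return probs, probs.argmax(dim=-1)      # behavior result = most likely posture
```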
This application thus performs size conversion and image preprocessing on the image of the region to be recognized through the DeepPose behavior and posture detection model based on cross-layer parameter sharing to obtain a preprocessed image; extracts the posture features from the preprocessed image through the model using the cross-layer shared parameters; and classifies the extracted posture features to obtain the behavior result. Using size conversion, image preprocessing, and cross-layer parameter sharing, the DeepPose-based behavior and posture detection model automatically detects the posture of the person in the video clip, improving the accuracy and reliability of posture detection.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a behavior and posture detection apparatus based on video images is provided, which corresponds one to one to the behavior and posture detection method based on video images in the above embodiments. As shown in Fig. 7, the apparatus includes an acquisition module 11, a framing module 12, a comparison module 13, a determination module 14, and an output module 15. The functional modules are described in detail as follows:
the acquisition module 11 is configured to acquire in real time a video clip with a preset number of frames, the video clip including an image to be detected and historical video images, where the preset number of frames is greater than two, the image to be detected is the last video frame in the video clip, and the historical video images are the video frames preceding the image to be detected in the video clip;
the framing module 12 is configured to frame the joint points in all the historical video images using the minimum-bounding-rectangle method to obtain a predicted rectangular area;
the comparison module 13 is configured to compare the image to be detected with the historical video image of the frame immediately preceding the image to be detected using an image pixel-difference algorithm to obtain a difference rectangular area;
the determination module 14 is configured to determine the image of the region to be recognized from the predicted rectangular area and the difference rectangular area using the maximum-boundary method;
the output module 15 is configured to extract posture features from the image of the region to be recognized through a behavior and posture detection model and output a behavior result according to the extracted posture features, the behavior result characterizing the behavior and posture in the video clip.
For specific limitations on the behavior and posture detection apparatus based on video images, refer to the limitations on the behavior and posture detection method based on video images above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided; the computer device may be a client or a server, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database; the internal memory provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The network interface of the computer device communicates with external terminals through a network connection. When the computer-readable instructions are executed by the processor, a behavior and posture detection method based on video images is implemented. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the behavior and posture detection method based on video images of the above embodiments is implemented.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided; the readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. Computer-readable instructions are stored on the readable storage media, and when executed by one or more processors, cause the one or more processors to implement the behavior and posture detection method based on video images of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes of the methods of the above embodiments can be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a non-volatile computer-readable storage medium or a volatile readable storage medium; when executed, these instructions may include the processes of the embodiments of the methods above. Any reference to memory, storage, database, or other media used in the embodiments provided by this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only used as an example; in practical applications, the functions may be assigned to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The embodiments above are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. A behavior and posture detection method based on video images, comprising:
    acquiring, in real time, a video clip with a preset number of frames, the video clip comprising an image to be detected and historical video images, wherein the preset number of frames is greater than two, the image to be detected is the last video frame image in the video clip, and the historical video images are the video frame images preceding the image to be detected in the video clip;
    applying a minimum rectangular bounding method to frame joint points in all the historical video images to obtain a predicted rectangular area;
    applying an image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain a difference rectangular area;
    applying a maximum boundary method to determine an image of an area to be recognized from the predicted rectangular area and the difference rectangular area; and
    extracting posture features from the image of the area to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, wherein the behavior result characterizes the behavior posture in the video clip.
  2. The behavior and posture detection method based on video images according to claim 1, wherein applying the minimum rectangular bounding method to frame the joint points in all the historical video images to obtain the predicted rectangular area comprises:
    performing joint point recognition on all the historical video images through a joint point detection model in the behavior and posture detection model, to recognize the joint points in each of the historical video images and mark the joint points in each of the historical video images;
    applying the minimum rectangular bounding method to frame a minimum rectangular area according to all the marked joint points; and
    performing prediction expansion processing on the minimum rectangular area to obtain the predicted rectangular area.
  3. The behavior and posture detection method based on video images according to claim 1, wherein applying the minimum rectangular bounding method to frame the minimum rectangular area according to all the marked joint points comprises:
    obtaining the coordinate points of each of the joint points;
    aggregating all the joint points into a spatial coordinate map of the same size as the historical video images;
    removing interference points from the spatial coordinate map, and determining the spatial coordinate map after removal as a coordinate map to be processed; and
    obtaining the minimum rectangular area from all the coordinate points in the coordinate map to be processed by applying the minimum rectangular bounding method.
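By way of illustration only (not part of the claimed subject matter), a minimal NumPy sketch of the minimum rectangular framing recited in claim 3; the centroid-distance criterion used for interference-point removal is an assumption, since the claim does not fix a removal method:

    import numpy as np

    def min_bounding_rect(joint_points, outlier_sigma=2.0):
        # joint_points: (N, 2) coordinates aggregated from all historical
        # video images into a single coordinate map of the same size.
        pts = np.asarray(joint_points, dtype=np.float32)
        # Interference-point removal: drop points far from the centroid
        # (an assumed criterion).
        center = pts.mean(axis=0)
        dist = np.linalg.norm(pts - center, axis=1)
        keep = pts[dist <= dist.mean() + outlier_sigma * dist.std()]
        # Minimum rectangle: the extreme x/y values of the remaining points.
        x_min, y_min = keep.min(axis=0)
        x_max, y_max = keep.max(axis=0)
        return x_min, y_min, x_max, y_max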
  4. The behavior and posture detection method based on video images according to claim 1, wherein applying the image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain the difference rectangular area comprises:
    recording the historical video image one frame before the image to be detected as a comparison image;
    obtaining each first pixel value in the image to be detected and each second pixel value in the comparison image;
    applying the image pixel difference algorithm to obtain the absolute difference between the first pixel value and the second pixel value corresponding to pixel points at the same coordinates;
    recording the pixel points whose absolute differences are greater than a preset threshold as points to be processed, performing discrete point removal on all the points to be processed, and recording the points to be processed after discrete point removal as moving points; and
    determining the difference rectangular area from all the moving points.
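An illustrative sketch of the pixel-difference comparison recited in claim 4; the threshold value and the use of a morphological opening for discrete-point removal are assumptions:

    import cv2
    import numpy as np

    def difference_rect(current, previous, threshold=25):
        # Absolute difference of the first and second pixel values at
        # pixel points with the same coordinates.
        diff = cv2.absdiff(current, previous)
        if diff.ndim == 3:
            diff = diff.max(axis=2)  # collapse color channels (assumed choice)
        moving = (diff > threshold).astype(np.uint8)  # points to be processed
        # Discrete-point removal: a morphological opening drops isolated
        # points (an assumed implementation of this step).
        moving = cv2.morphologyEx(moving, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
        ys, xs = np.nonzero(moving)  # remaining moving points
        if xs.size == 0:
            return None  # no movement between the two frames
        return xs.min(), ys.min(), xs.max(), ys.max()  # difference rectangular area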
  5. The behavior and posture detection method based on video images according to claim 1, wherein applying the maximum boundary method to determine the image of the area to be recognized from the predicted rectangular area and the difference rectangular area comprises:
    obtaining the four-point coordinates of the predicted rectangular area and of the difference rectangular area;
    determining four-point extremes from all the four-point coordinates; and
    applying the maximum boundary method to perform boundary expansion on all the four-point extremes according to a preset expansion value to obtain the image of the area to be recognized.
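An illustrative sketch of the maximum boundary step recited in claim 5, with rectangles given as (x_min, y_min, x_max, y_max); clipping the expanded boundary to the frame size is an assumed safeguard:

    def max_boundary(pred_rect, diff_rect, expand, frame_w, frame_h):
        # Four-point extremes over the corner coordinates of both rectangles.
        x_min = min(pred_rect[0], diff_rect[0])
        y_min = min(pred_rect[1], diff_rect[1])
        x_max = max(pred_rect[2], diff_rect[2])
        y_max = max(pred_rect[3], diff_rect[3])
        # Boundary expansion by the preset expansion value, clipped to the frame.
        return (max(0, x_min - expand), max(0, y_min - expand),
                min(frame_w, x_max + expand), min(frame_h, y_max + expand))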
  6. The behavior and posture detection method based on video images according to claim 5, wherein before performing the boundary expansion on all the four-point extremes according to the preset expansion value, the method comprises:
    determining a prediction center from the four-point coordinates of the predicted rectangular area, and determining a difference center from the four-point coordinates of the difference rectangular area;
    obtaining a center distance between the prediction center and the difference center; and
    determining the preset expansion value from the center distance and the preset number of frames.
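Claim 6 derives the preset expansion value from the center distance and the preset number of frames without fixing the exact mapping; the average per-frame displacement below is therefore only one plausible reading, shown for illustration:

    import math

    def expansion_value(pred_rect, diff_rect, preset_frames):
        # Centers of the predicted and difference rectangles, from their
        # (x_min, y_min, x_max, y_max) corner coordinates.
        pcx, pcy = (pred_rect[0] + pred_rect[2]) / 2, (pred_rect[1] + pred_rect[3]) / 2
        dcx, dcy = (diff_rect[0] + diff_rect[2]) / 2, (diff_rect[1] + diff_rect[3]) / 2
        center_distance = math.hypot(pcx - dcx, pcy - dcy)
        # Assumed mapping: scale the expansion with the average movement per frame.
        return center_distance / max(preset_frames - 1, 1)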
  7. The behavior and posture detection method based on video images according to claim 1, wherein extracting posture features from the image of the area to be recognized through the behavior and posture detection model and outputting the behavior result according to the extracted posture features comprises:
    performing size conversion and image preprocessing on the image of the area to be recognized through the behavior and posture detection model to obtain a preprocessed image, wherein the behavior and posture detection model is a DeepPose-based deep learning model with cross-layer parameter sharing;
    extracting the posture features from the preprocessed image through the behavior and posture detection model using the cross-layer shared parameters; and
    performing posture classification on the extracted posture features to obtain the behavior result.
  8. A behavior and posture detection apparatus based on video images, comprising:
    an acquisition module configured to acquire, in real time, a video clip with a preset number of frames, the video clip comprising an image to be detected and historical video images, wherein the preset number of frames is greater than two, the image to be detected is the last video frame image in the video clip, and the historical video images are the video frame images preceding the image to be detected in the video clip;
    a framing module configured to apply a minimum rectangular bounding method to frame joint points in all the historical video images to obtain a predicted rectangular area;
    a comparison module configured to apply an image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain a difference rectangular area;
    a determination module configured to apply a maximum boundary method to determine an image of an area to be recognized from the predicted rectangular area and the difference rectangular area; and
    an output module configured to extract posture features from the image of the area to be recognized through a behavior and posture detection model and output a behavior result according to the extracted posture features, wherein the behavior result characterizes the behavior posture in the video clip.
  9. A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps: acquiring, in real time, a video clip with a preset number of frames, the video clip comprising an image to be detected and historical video images, wherein the preset number of frames is greater than two, the image to be detected is the last video frame image in the video clip, and the historical video images are the video frame images preceding the image to be detected in the video clip;
    applying a minimum rectangular bounding method to frame joint points in all the historical video images to obtain a predicted rectangular area;
    applying an image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain a difference rectangular area;
    applying a maximum boundary method to determine an image of an area to be recognized from the predicted rectangular area and the difference rectangular area; and
    extracting posture features from the image of the area to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, wherein the behavior result characterizes the behavior posture in the video clip.
  10. The computer device according to claim 9, wherein applying the minimum rectangular bounding method to frame the joint points in all the historical video images to obtain the predicted rectangular area comprises:
    performing joint point recognition on all the historical video images through a joint point detection model in the behavior and posture detection model, to recognize the joint points in each of the historical video images and mark the joint points in each of the historical video images;
    applying the minimum rectangular bounding method to frame a minimum rectangular area according to all the marked joint points; and
    performing prediction expansion processing on the minimum rectangular area to obtain the predicted rectangular area.
  11. The computer device according to claim 9, wherein applying the minimum rectangular bounding method to frame the minimum rectangular area according to all the marked joint points comprises:
    obtaining the coordinate points of each of the joint points;
    aggregating all the joint points into a spatial coordinate map of the same size as the historical video images;
    removing interference points from the spatial coordinate map, and determining the spatial coordinate map after removal as a coordinate map to be processed; and
    obtaining the minimum rectangular area from all the coordinate points in the coordinate map to be processed by applying the minimum rectangular bounding method.
  12. The computer device according to claim 9, wherein applying the image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain the difference rectangular area comprises:
    recording the historical video image one frame before the image to be detected as a comparison image;
    obtaining each first pixel value in the image to be detected and each second pixel value in the comparison image;
    applying the image pixel difference algorithm to obtain the absolute difference between the first pixel value and the second pixel value corresponding to pixel points at the same coordinates;
    recording the pixel points whose absolute differences are greater than a preset threshold as points to be processed, performing discrete point removal on all the points to be processed, and recording the points to be processed after discrete point removal as moving points; and
    determining the difference rectangular area from all the moving points.
  13. The computer device according to claim 9, wherein applying the maximum boundary method to determine the image of the area to be recognized from the predicted rectangular area and the difference rectangular area comprises:
    obtaining the four-point coordinates of the predicted rectangular area and of the difference rectangular area;
    determining four-point extremes from all the four-point coordinates; and
    applying the maximum boundary method to perform boundary expansion on all the four-point extremes according to a preset expansion value to obtain the image of the area to be recognized.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    determining a prediction center from the four-point coordinates of the predicted rectangular area, and determining a difference center from the four-point coordinates of the difference rectangular area;
    obtaining a center distance between the prediction center and the difference center; and
    determining the preset expansion value from the center distance and the preset number of frames.
  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring, in real time, a video clip with a preset number of frames, the video clip comprising an image to be detected and historical video images, wherein the preset number of frames is greater than two, the image to be detected is the last video frame image in the video clip, and the historical video images are the video frame images preceding the image to be detected in the video clip;
    applying a minimum rectangular bounding method to frame joint points in all the historical video images to obtain a predicted rectangular area;
    applying an image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain a difference rectangular area;
    applying a maximum boundary method to determine an image of an area to be recognized from the predicted rectangular area and the difference rectangular area; and
    extracting posture features from the image of the area to be recognized through a behavior and posture detection model, and outputting a behavior result according to the extracted posture features, wherein the behavior result characterizes the behavior posture in the video clip.
  16. The readable storage medium according to claim 15, wherein applying the minimum rectangular bounding method to frame the joint points in all the historical video images to obtain the predicted rectangular area comprises:
    performing joint point recognition on all the historical video images through a joint point detection model in the behavior and posture detection model, to recognize the joint points in each of the historical video images and mark the joint points in each of the historical video images;
    applying the minimum rectangular bounding method to frame a minimum rectangular area according to all the marked joint points; and
    performing prediction expansion processing on the minimum rectangular area to obtain the predicted rectangular area.
  17. The readable storage medium according to claim 15, wherein applying the minimum rectangular bounding method to frame the minimum rectangular area according to all the marked joint points comprises:
    obtaining the coordinate points of each of the joint points;
    aggregating all the joint points into a spatial coordinate map of the same size as the historical video images;
    removing interference points from the spatial coordinate map, and determining the spatial coordinate map after removal as a coordinate map to be processed; and
    obtaining the minimum rectangular area from all the coordinate points in the coordinate map to be processed by applying the minimum rectangular bounding method.
  18. The readable storage medium according to claim 15, wherein applying the image pixel difference algorithm to compare the image to be detected with the historical video image one frame before it to obtain the difference rectangular area comprises:
    recording the historical video image one frame before the image to be detected as a comparison image;
    obtaining each first pixel value in the image to be detected and each second pixel value in the comparison image;
    applying the image pixel difference algorithm to obtain the absolute difference between the first pixel value and the second pixel value corresponding to pixel points at the same coordinates;
    recording the pixel points whose absolute differences are greater than a preset threshold as points to be processed, performing discrete point removal on all the points to be processed, and recording the points to be processed after discrete point removal as moving points; and
    determining the difference rectangular area from all the moving points.
  19. The readable storage medium according to claim 15, wherein applying the maximum boundary method to determine the image of the area to be recognized from the predicted rectangular area and the difference rectangular area comprises:
    obtaining the four-point coordinates of the predicted rectangular area and of the difference rectangular area;
    determining four-point extremes from all the four-point coordinates; and
    applying the maximum boundary method to perform boundary expansion on all the four-point extremes according to a preset expansion value to obtain the image of the area to be recognized.
  20. The readable storage medium according to claim 19, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    determining a prediction center from the four-point coordinates of the predicted rectangular area, and determining a difference center from the four-point coordinates of the difference rectangular area;
    obtaining a center distance between the prediction center and the difference center; and
    determining the preset expansion value from the center distance and the preset number of frames.
     
PCT/CN2022/072290 2021-06-01 2022-01-17 Behavior and posture detection method, apparatus, device, and medium based on video images WO2022252642A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110609422.9A CN113239874B (zh) 2021-06-01 2021-06-01 Behavior and posture detection method, apparatus, device, and medium based on video images
CN202110609422.9 2021-06-01

Publications (1)

Publication Number Publication Date
WO2022252642A1 (zh)

Family ID: 77136291

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072290 WO2022252642A1 (zh) 2022-01-17 Behavior and posture detection method, apparatus, device, and medium based on video images

Country Status (2)

Country Link
CN (1) CN113239874B (zh)
WO (1) WO2022252642A1 (zh)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239874B (zh) 2021-06-01 2024-05-03 平安科技(深圳)有限公司 Behavior and posture detection method, apparatus, device, and medium based on video images
CN114972419B (zh) * 2022-04-12 2023-10-03 中国电信股份有限公司 Fall detection method and apparatus, medium, and electronic device
CN116168313A (zh) * 2022-12-05 2023-05-26 广州视声智能股份有限公司 Control method and apparatus for a smart device, storage medium, and electronic device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738101B (zh) * 2019-09-04 2023-07-25 平安科技(深圳)有限公司 Behavior recognition method and apparatus, and computer-readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146915A (zh) * 2018-08-01 2019-01-04 浙江深眸科技有限公司 Low-computation motion detection method for judging abnormally moving objects
CN110472614A (zh) * 2019-08-22 2019-11-19 四川自由健信息科技有限公司 Method for recognizing fainting behavior
CN111881853A (zh) * 2020-07-31 2020-11-03 中北大学 Method and apparatus for recognizing abnormal behavior in extra-large bridges and tunnels
CN113239874A (zh) 2021-06-01 2021-08-10 平安科技(深圳)有限公司 Behavior and posture detection method, apparatus, device, and medium based on video images

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311542A (zh) * 2023-05-23 2023-06-23 广州英码信息科技有限公司 Human fall detection method and system compatible with crowded and non-crowded scenes
CN116311542B (zh) * 2023-05-23 2023-08-04 广州英码信息科技有限公司 Human fall detection method and system compatible with crowded and non-crowded scenes
CN116503958A (zh) * 2023-06-27 2023-07-28 江西师范大学 Human posture recognition method and system, storage medium, and computer device
CN116503958B (zh) * 2023-06-27 2023-10-03 江西师范大学 Human posture recognition method and system, storage medium, and computer device
CN117132798A (zh) * 2023-10-26 2023-11-28 江西省国土资源测绘工程总院有限公司 Method and device for identifying ecosystem restoration zones in territorial spatial planning
CN117132798B (zh) * 2023-10-26 2024-01-26 江西省国土资源测绘工程总院有限公司 Method and device for identifying ecosystem restoration zones in territorial spatial planning

Also Published As

Publication number Publication date
CN113239874B (zh) 2024-05-03
CN113239874A (zh) 2021-08-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22814695

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE