WO2021097750A1 - Method, device, storage medium, and electronic equipment for recognizing human posture

Info

Publication number
WO2021097750A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
block diagram
key point
posture
image
Prior art date
Application number
PCT/CN2019/119926
Other languages
English (en)
French (fr)
Inventor
郭子亮
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2019/119926 (WO2021097750A1)
Priority to CN201980100467.4A (CN114402369A)
Publication of WO2021097750A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • This application belongs to the field of electronic technology, and in particular relates to a method, device, storage medium, and electronic equipment for recognizing human posture.
  • Human posture recognition means recognizing the posture of the human body in a video frame.
  • The accuracy and speed of human posture recognition directly affect the results of the subsequent work of a video analysis system.
  • the embodiments of the present application provide a method, a device, a storage medium, and an electronic device for recognizing a human posture, which can improve the accuracy of recognizing a human posture.
  • an embodiment of the present application provides a method for recognizing a human body posture, including:
  • the first human body image including at least one human body
  • the posture of the human body in each human body block diagram is determined to obtain the posture of the human body in the first human body image.
  • an embodiment of the present application provides a human body posture recognition device, including:
  • An acquiring module configured to acquire a first human body image, the first human body image including at least one human body;
  • the first determining module is configured to determine at least one human body block diagram according to the first human body image, and each human body block diagram contains only one human body;
  • the second determination module is used to determine multiple key point coordinates of the human body in each human body block diagram
  • the third determining module is used to determine the posture of the human body in each human body block diagram according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the human body posture recognition method provided in this embodiment.
  • an embodiment of the present application provides an electronic device including a memory and a processor, the memory stores a computer program, and the processor invokes the computer program stored in the memory to execute:
  • the first human body image including at least one human body
  • the posture of the human body in each human body block diagram is determined to obtain the posture of the human body in the first human body image.
  • FIG. 1 is a schematic diagram of a first flow of a method for recognizing a human body posture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a first scenario of a method for recognizing a human body posture provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a second flow of a method for recognizing a human body posture provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a second scenario of a method for recognizing a human body posture provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a human body posture recognition device provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a first structure of an electronic device provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a second structure of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of the first method for recognizing a human posture according to an embodiment of the present application.
  • the process of the human body posture recognition method may include:
  • the first human body image refers to an image containing a human body.
  • the first human body image may include at least one human body.
  • the format of the first human body image may be jpg, png, bmp, or the like.
  • the electronic device may first extract the human body image from the video.
  • the human body image may be the first human body image.
  • each human body block diagram includes only one human body.
  • the electronic device may input the first human body image into a preset target detection network model to obtain at least one human body block diagram.
  • each human body block diagram contains only one human body.
  • the electronic device can obtain three human body block diagrams according to the first human body image G1, which are a human body block diagram B1, a human body block diagram B2, and a human body block diagram B3.
  • the electronic device can determine multiple key point coordinates of the human body in each human body block diagram.
  • the key points can include: head, neck, chest, elbow, left wrist, right wrist, left knee or right knee, etc.
  • the number of key points can be 14, 17, or 21, etc., and there is no specific limitation here.
  • the key point coordinates include x and y coordinates, that is, each key point coordinate can be represented by a set of (x, y) coordinates.
  • the upper left corner of the human body block diagram can be the origin, and the two sides intersecting at the upper left corner can be the x-axis and the y-axis, respectively, to establish a plane rectangular coordinate system.
  • the key point coordinates of the human body in the human body block diagram can be represented by the coordinates of a certain point in the rectangular coordinate system of the plane.
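  • The coordinate convention above can be illustrated with a minimal sketch. It assumes that a key point is first known in full-image pixel coordinates and that the y-axis runs downward along the left side of the human body block diagram; the function name and argument layout are hypothetical and not part of the application.

```python
from typing import Tuple


def to_block_coordinates(
    point_in_image: Tuple[float, float],
    box_top_left: Tuple[float, float],
) -> Tuple[float, float]:
    """Express an image-space point relative to the block diagram's upper left corner."""
    x, y = point_in_image
    left, top = box_top_left
    # The upper left corner of the block diagram is the origin (0, 0).
    return x - left, y - top
```

  • For instance, a key point at image position (12, 30) inside a block diagram whose upper left corner is at (10, 20) would have block-diagram coordinates (2, 10).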
  • The electronic device can input the multiple key point coordinates of each human body block diagram into a preset posture recognition model to identify the posture of the human body in each human body block diagram, thereby obtaining the posture of the human body in the first human body image.
  • The preset posture recognition model is a trained model.
  • In this embodiment, a first human body image is acquired; at least one human body block diagram is determined according to the first human body image; multiple key point coordinates of the human body in each human body block diagram are determined; and the posture of the human body in each human body block diagram is determined according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  • The human body posture recognition method provided by the embodiments of this application can intelligently recognize the human body posture in a human body image by using a preset posture recognition model.
  • Because the preset posture recognition model is a trained model, the accuracy of human body posture recognition can be improved.
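  • The four steps of this flow can be sketched as a small pipeline. This is only an illustration under stated assumptions: the three models are passed in as plain callables, the box format (left, top, right, bottom) and the grayscale list-of-rows image type are hypothetical, and none of the names come from the application.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]        # one key point as (x, y)
Box = Tuple[int, int, int, int]    # hypothetical box format: (left, top, right, bottom)
Image = List[List[int]]            # grayscale image as rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut one human body block diagram out of the full image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognize_postures(
    image: Image,
    detect_boxes: Callable[[Image], Sequence[Box]],
    detect_keypoints: Callable[[Image], List[Point]],
    classify_posture: Callable[[List[Point]], str],
) -> List[str]:
    """Image -> block diagrams -> key point coordinates -> one posture per person."""
    postures = []
    for box in detect_boxes(image):
        block = crop(image, box)                      # contains only one human body
        keypoints = detect_keypoints(block)           # multiple (x, y) coordinates
        postures.append(classify_posture(keypoints))  # preset posture recognition model
    return postures
```

  • Passing the models as callables keeps the sketch independent of any particular detection network or classifier.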
  • FIG. 3 is a schematic diagram of the second flow of the method for recognizing a human posture according to an embodiment of the application.
  • the recognition method of the human body posture may include:
  • An electronic device acquires a first human body image, where the first human body image includes at least one human body.
  • the first human body image refers to an image containing a human body.
  • the first human body image may be a color image or a grayscale image.
  • the first human body image may include at least one human body.
  • the format of the first human body image may be jpg, png, bmp, or the like.
  • the electronic device may first extract the human body image from the video.
  • the human body image may be the first human body image.
  • the first human body image may be G2.
  • the electronic device determines at least one human body block diagram according to the first human body image, and each human body block diagram includes only one human body.
  • the electronic device may input the first human body image into a preset target detection network model to obtain at least one human body block diagram.
  • each human body block diagram contains only one human body.
  • The electronic device can input the first human body image G2 into a preset target detection network model to obtain two human body block diagrams.
  • One of the human body block diagrams is B4.
  • the electronic device inputs each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram.
  • the electronic device obtains multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps corresponding to each human body block diagram, where one heat map corresponds to one key point coordinate.
  • the electronic device may train a preset Cascaded Pyramid Network (CPN) model in advance, and use the trained cascaded pyramid network model as the preset key point detection model. After obtaining at least one human body block diagram, the electronic device can input each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram.
  • The electronic device can search for the position of the maximum-probability pixel on each heat map corresponding to each human body block diagram. The position of that pixel is the key point coordinate for that heat map, so that multiple key point coordinates of the human body in each human body block diagram can be obtained.
  • the key points may include: head, neck, chest, left elbow, right elbow, left wrist, right wrist, left knee or right knee, etc.
  • the number of key points can be 14, 17, or 21, etc., and there is no specific limitation here.
  • the key point coordinates include x and y coordinates, that is, each key point coordinate can be represented by a set of (x, y) coordinates.
  • the electronic device can determine the coordinates of the key points of the human body in the human body block diagram B4. For example, the electronic device can determine the head coordinates, left shoulder coordinates, and left elbow coordinates of the human body in the human body block diagram B4. It should be noted that the positions and numbers of the key points marked in the human body block diagram B4 are merely examples provided in the embodiments of the present application, and are not used to limit the present application.
  • the heat map and the key point coordinates are in a one-to-one correspondence. For example, if there are 17 heat maps, 17 key point coordinates can be correspondingly obtained; if there are 21 heat maps, 21 key point coordinates can be correspondingly obtained.
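  • The search for the maximum-probability pixel can be sketched as follows. The sketch assumes each heat map is a non-empty list of rows of probabilities and that ties are resolved in favor of the first pixel encountered; both are assumptions made for illustration.

```python
from typing import List, Tuple

HeatMap = List[List[float]]  # rows of per-pixel probabilities


def keypoint_from_heatmap(heatmap: HeatMap) -> Tuple[int, int]:
    """Return the (x, y) position of the maximum-probability pixel."""
    best_x, best_y, best_p = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, p in enumerate(row):
            if p > best_p:
                best_x, best_y, best_p = x, y, p
    return best_x, best_y


def keypoints_from_heatmaps(heatmaps: List[HeatMap]) -> List[Tuple[int, int]]:
    """One heat map yields exactly one key point coordinate."""
    return [keypoint_from_heatmap(h) for h in heatmaps]
```

  • With 17 heat maps as input, `keypoints_from_heatmaps` returns 17 coordinates, matching the one-to-one correspondence described above.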
  • the electronic device determines the posture of the human body in each human body block diagram according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram to obtain the posture of the human body in the first human body image.
  • The electronic device can input the multiple key point coordinates of each human body block diagram into a preset posture recognition model to identify the posture of the human body in each human body block diagram, thereby obtaining the posture of the human body in the first human body image.
  • The preset posture recognition model is a trained model.
  • The electronic device may input the multiple key point coordinates of the human body in the human body block diagram B4 into the preset posture recognition model, so that the posture of the human body in the human body block diagram B4 is recognized.
  • the posture of the human body in the human body block diagram B4 may be "standing with hands on hips".
  • the process 203 may include:
  • the electronic device inputs each human body block diagram into a preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram, wherein each set of feature maps includes multiple feature maps of different sizes;
  • the electronic device performs fusion processing on the feature maps in each group of feature maps corresponding to each human body block diagram to obtain multiple heat maps corresponding to each human body block diagram, where one set of feature maps corresponds to one heat map.
  • the electronic device may input each human body block diagram into a preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram.
  • Each group of feature maps includes multiple feature maps of different sizes.
  • the electronic device can perform fusion processing on multiple feature maps of different sizes in each group of feature maps corresponding to each human body block diagram to fuse information of different receptive fields to obtain multiple heat maps corresponding to each human body block diagram.
  • a set of feature maps corresponds to a heat map.
  • The electronic device may arrange the multiple feature maps of different scales in each group of feature maps in descending order. Then, the electronic device determines the feature map arranged in the middle of each group of feature maps as the first feature map. Then, the electronic device can use the first feature map as a standard to perform up-sampling or down-sampling processing on the other feature maps in each group of feature maps, so that the size of the other feature maps after the up-sampling or down-sampling processing is the same as the size of the first feature map. Subsequently, the electronic device may perform fusion processing on the first feature map and the other feature maps that have undergone up-sampling or down-sampling processing to obtain a heat map corresponding to each human body block diagram.
  • the up-sampling process is to enlarge the size of the feature map
  • the down-sampling process is to reduce the size of the feature map.
  • the feature maps in each group of feature maps that are smaller than the first feature map can be up-sampled
  • the feature maps in each group of feature maps that are larger than the first feature map can be down-sampled.
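  • The fusion of one group of feature maps around the middle-sized map can be sketched as below. The text does not fix the resampling method or the fusion operator for this variant, so nearest-neighbor resampling and pixel-by-pixel addition are assumptions chosen for illustration, and all names are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def resize_nearest(fmap: FeatureMap, height: int, width: int) -> FeatureMap:
    """Up- or down-sample a feature map to (height, width) by nearest neighbor."""
    src_h, src_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def fuse_group(group: List[FeatureMap]) -> FeatureMap:
    """Fuse one group of differently sized feature maps into a single heat map."""
    # Arrange the maps in descending order of size and take the middle one.
    ordered = sorted(group, key=lambda m: len(m) * len(m[0]), reverse=True)
    first = ordered[len(ordered) // 2]
    h, w = len(first), len(first[0])
    # Bring every map to the size of the first feature map, then add pixel by pixel.
    resized = [resize_nearest(m, h, w) for m in ordered]
    return [[sum(m[y][x] for m in resized) for x in range(w)] for y in range(h)]
```

  • Choosing the middle map as the reference means larger maps are reduced and smaller maps are enlarged, so no single map has to be stretched across the full range of scales.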
  • The electronic device can input each human body block diagram into a preset key point detection model and, through the residual blocks of multiple convolutional layers of the preset key point detection model (such as convolutional layers c2, c3, c4, and c5), obtain multiple sets of second feature maps corresponding to each human body block diagram.
  • each set of second feature maps includes multiple second feature maps
  • each convolutional layer corresponds to one of the second feature maps in each set of second feature maps.
  • the depth of the convolution layer c2 is smaller than the depth of the convolution layer c3, the depth of the convolution layer c3 is smaller than the depth of the convolution layer c4, and the depth of the convolution layer c4 is smaller than the depth of the convolution layer c5.
  • the electronic device may connect the multiple second feature maps in each group of second feature maps corresponding to each human body block diagram to different numbers of bottleneck blocks to obtain multiple sets of feature maps corresponding to each human body block diagram.
  • Each group of feature maps includes multiple feature maps of different sizes. Among them, the deeper the depth of the convolutional layer, the greater the number of bottleneck blocks connected to the feature map.
  • The electronic device can up-sample the feature maps in each group of feature maps to a unified dimension and then perform fusion processing.
  • The feature maps up-sampled to the unified dimension are added pixel by pixel to obtain the multiple heat maps corresponding to each human body block diagram.
  • the process 203 may further include:
  • the electronic device performs Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram;
  • the process 204 may include:
  • the electronic device obtains multiple key point coordinates of the human body in each human body block diagram according to multiple target heat maps corresponding to each human body block diagram, where one target heat map corresponds to one key point coordinate.
  • Each of the multiple heat maps corresponding to each human body block diagram obtained by the electronic device contains some amount of noise.
  • Therefore, the electronic device can perform Gaussian filtering on each heat map corresponding to each human body block diagram to filter out its noise, obtaining multiple target heat maps corresponding to each human body block diagram.
  • the electronic device can obtain multiple key point coordinates of the human body in each human body block diagram according to the multiple target heat maps corresponding to each human body block diagram.
  • a target heat map corresponds to a key point coordinate.
  • noise refers to points that interfere with obtaining key points, that is, the presence of noise may lead to inaccurate determination of key points.
  • Determining the key point coordinates according to the target heat map is more accurate than determining them according to the unfiltered heat map, but obtaining the target heat map also consumes processor resources. Therefore, when processor resources are sufficient, the key point coordinates can be determined according to the target heat map; when processor resources are insufficient, the key point coordinates can be determined according to the heat map.
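  • The filtering step and the resource-dependent choice can be sketched as follows. The 3x3 kernel, the border clamping, and the boolean `resources_sufficient` flag are all assumptions for illustration; the text does not specify a kernel size or how resource availability is measured.

```python
from typing import List

HeatMap = List[List[float]]

# 3x3 Gaussian kernel whose weights sum to 16; the kernel size is an assumption.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def gaussian_filter(heatmap: HeatMap) -> HeatMap:
    """Smooth a heat map with a 3x3 Gaussian kernel, clamping at the borders."""
    h, w = len(heatmap), len(heatmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * heatmap[yy][xx]
            out[y][x] = acc / 16.0
    return out


def select_heatmap(heatmap: HeatMap, resources_sufficient: bool) -> HeatMap:
    """Use the filtered target heat map only when processor resources allow."""
    return gaussian_filter(heatmap) if resources_sufficient else heatmap
```

  • The key point coordinate is then read from whichever map `select_heatmap` returns, so the rest of the flow does not need to know whether filtering took place.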
  • the process 201 may further include:
  • the electronic device obtains the coordinates of multiple key points corresponding to the human body in each sample human body block diagram
  • the electronic device trains the preset neural network model by using multiple sample human body block diagrams and multiple key point coordinates corresponding to the human body in each sample human body block diagram;
  • the electronic device uses the trained neural network model as a preset key point detection model.
  • The electronic device can obtain multiple stored sample human body block diagrams from a database or from other devices.
  • each sample human body block diagram is marked with multiple key point coordinates.
  • the multiple key point coordinates marked in each sample human body block diagram correspond to the human body in each sample human body block diagram.
  • the electronic device may obtain multiple key point coordinates marked on each sample human body block diagram, that is, multiple key point coordinates corresponding to the human body in each sample human body block diagram.
  • The electronic device can use the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram to train the preset neural network model.
  • the trained neural network model is the preset key point detection model.
  • the electronic device may also use the multiple sample human body block diagrams, multiple key point coordinates corresponding to the human body in each sample human body block diagram, and a preset loss function to train the preset neural network model.
  • the trained neural network model is the preset key point detection model.
  • the loss function is usually used to estimate the degree of inconsistency between the predicted value of the model (such as the key point coordinates predicted by the model) and the true value (such as the actual marked key point coordinates). It is a non-negative real-valued function. In general, the smaller the loss function, the better the robustness of the model. The loss function can be set according to actual needs.
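  • As one concrete example of such a non-negative function, the sketch below computes a mean squared error between predicted and labeled key point coordinates. The text leaves the choice of loss open, so mean squared error is an assumption used only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_squared_error(predicted: List[Point], labeled: List[Point]) -> float:
    """Average squared distance between predicted and labeled key points."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled key points must pair up")
    total = 0.0
    for (px, py), (lx, ly) in zip(predicted, labeled):
        total += (px - lx) ** 2 + (py - ly) ** 2
    return total / len(predicted)
```

  • The value is zero only when every predicted coordinate matches its label, and it grows as predictions drift away from the marked key points.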
  • the preset neural network model may be a cascaded pyramid network model.
  • the cascaded pyramid network model may include GlobalNet network and RefineNet network.
  • the GlobalNet network can be used for rough training of all key points of the human body.
  • the RefineNet network can refine the key points that are difficult to train reflected by the GlobalNet network.
  • the preset neural network model may include an inception-v4 network or an attention resnet network and a RefineNet network.
  • the inception-v4 network or attention resnet network can be used for rough training of all key points of the human body.
  • the RefineNet network can refine the key points that are difficult to train reflected by the GlobalNet network.
  • the process 201 may further include:
  • the electronic device obtains multiple sets of key point coordinates, where each set of key point coordinates includes multiple key point coordinates;
  • the electronic device obtains the human body posture corresponding to each group of key point coordinates
  • the electronic device uses multiple sets of key point coordinates and the human posture corresponding to each set of key point coordinates to train the preset shallow neural network model;
  • the electronic device uses the trained shallow neural network model as a preset gesture recognition model.
  • the electronic device can obtain multiple sets of key point coordinates and the posture of the human body corresponding to each set of key point coordinates.
  • each group of key point coordinates includes multiple key point coordinates.
  • the electronic device can train the preset shallow neural network model by using multiple sets of key point coordinates and the human posture corresponding to each set of key point coordinates.
  • The trained shallow neural network model can be used as the preset posture recognition model.
  • The electronic device may also use multiple sets of key point coordinates, the human posture (the real human posture) corresponding to each set of key point coordinates, and a preset loss function to train the preset shallow neural network model.
  • The trained shallow neural network model can be used as the preset posture recognition model.
  • the loss function is usually used to estimate the degree of inconsistency between the predicted value of the model (such as the human posture predicted by the model) and the true value (such as the real human posture). It is a non-negative real-valued function. In general, the smaller the loss function, the better the robustness of the model. The loss function can be set according to actual needs.
  • the preset shallow neural network model may be a resnet 18 network model.
  • The electronic device can obtain multiple sets of key point coordinates and then normalize the key point coordinates in those sets. For example, the following formula can be used to normalize the key point coordinates:
  • N2 represents the normalized x-coordinate or y-coordinate.
  • N1 represents the x-coordinate or y-coordinate before normalization.
  • N min represents the smallest x-coordinate or y-coordinate among the multiple sets of key point coordinates.
  • N max represents the x-coordinate or y-coordinate with the largest value among the multiple sets of key point coordinates.
  • A is a constant, and the value of A can be 240, 264, 293, 320, 335, 370 and so on.
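  • The formula itself is not present in this text. One form consistent with the symbol definitions above is min-max scaling, N2 = A * (N1 - Nmin) / (Nmax - Nmin); this reading is an assumption, not a quotation from the application. The sketch below applies it to x and y separately (also an assumption), with hypothetical names throughout.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_groups(groups: List[List[Point]], a: float = 240.0) -> List[List[Point]]:
    """Rescale all x and y coordinates into [0, a] by min-max scaling (assumed form)."""
    xs = [x for g in groups for x, _ in g]
    ys = [y for g in groups for _, y in g]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(n1: float, n_min: float, n_max: float) -> float:
        if n_max == n_min:  # degenerate case: every value is identical
            return 0.0
        return (n1 - n_min) / (n_max - n_min) * a

    return [
        [(scale(x, x_min, x_max), scale(y, y_min, y_max)) for x, y in g]
        for g in groups
    ]
```

  • Under this assumed form, the smallest coordinate across all sets maps to 0 and the largest maps to A, so every set shares one common scale before training.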
  • The x-coordinate and y-coordinate of the same key point may be placed at the same position in different channels for training. For example, suppose a set of key points includes 5 key points whose coordinates are (x1, y1), (x2, y2), (x3, y3), (x4, y4), and (x5, y5), and the human body posture corresponding to this set of key points is "standing".
  • If the data used to train the preset shallow neural network model has the form (a, b), then [x1, x2, x3, x4, x5] and [y1, y2, y3, y4, y5] can be used as a, and the human body posture "standing" can be used as b.
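  • The arrangement of one training pair can be sketched as follows. The representation of a as a two-element list (an x channel followed by a y channel) and the function name are assumptions for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Sample = Tuple[List[List[float]], str]


def make_sample(keypoints: List[Point], posture: str) -> Sample:
    """Build one (a, b) pair: a holds an x channel and a y channel, b is the label."""
    x_channel = [x for x, _ in keypoints]
    y_channel = [y for _, y in keypoints]
    # Index i in both channels refers to the same key point.
    return [x_channel, y_channel], posture
```

  • Keeping index i aligned across the two channels preserves the pairing of each key point's x and y values.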
  • the first human body image is a frame of human body image in the video to be classified
  • the method for recognizing human body posture may further include:
  • the electronic device extracts at least one second human body image from the video to be classified
  • the electronic device determines the posture of the human body in each frame of the second human body image
  • the electronic device determines the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of the second human body image.
  • the first human body image may be a frame of human body image in the video to be classified.
  • The electronic device can decompose the video to be classified into multiple video frames, that is, multiple frames of images. Then, the electronic device can detect whether any of the multiple frames of images contains a human body. If so, the electronic device can select the images containing a human body from the multiple frames of images, and determine the images containing a human body other than the first human body image as at least one frame of second human body image.
  • the electronic device can determine the posture of the human body in each frame of the second human body image.
  • the electronic device may use the human body posture recognition method provided in the embodiment of the present application to determine the posture of the human body in each frame of the second human body image.
  • the electronic device may determine the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of the second human body image.
  • the electronic device may determine the video to be classified as a dance video.
  • the electronic device determines the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of the second human body image, which may include:
  • The electronic device determines the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determines the category corresponding to each frame of second human body image according to the posture of the human body in that frame, to obtain multiple categories;
  • The electronic device counts how many times each category occurs among the multiple categories;
  • The electronic device determines the category that occurs most often as the category of the video to be classified.
  • The electronic device can determine the category corresponding to the first human body image according to the posture of the human body in the first human body image.
  • The electronic device may determine the category corresponding to each frame of second human body image according to the posture of the human body in that frame, so as to obtain multiple categories. For example, when the posture of the human body in the first human body image or in a certain second human body image is a dance action, the electronic device may determine that first human body image or second human body image to be a dance image.
  • The electronic device can count how many times each category occurs among the multiple categories, and determine the category that occurs most often as the category of the video to be classified. For example, suppose 10 categories are obtained, of which 5 are dancing, 3 are singing, and 2 are basketball. Then, the electronic device may determine the video to be classified as a dance video.
  • the video to be classified can be a dance video, a singing video, or a basketball video.
  • A category that corresponds to at least two images can also be taken as a category of the video. For example, suppose that among the 10 human body images included in a video there are 5 dancing images, 3 singing images, and 2 basketball images. Then the video can belong to dancing videos and singing videos, and can also belong to basketball videos.
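  • Both ways of assigning a video category can be sketched with a frequency count. The function names and the string labels are hypothetical; in the first function, ties go to the category seen first, which is a choice made for this sketch rather than something the text specifies.

```python
from collections import Counter
from typing import List


def majority_category(frame_categories: List[str]) -> str:
    """Return the category that occurs most often across the frames."""
    return Counter(frame_categories).most_common(1)[0][0]


def categories_with_support(frame_categories: List[str], min_count: int = 2) -> List[str]:
    """Return every category backed by at least min_count frames."""
    counts = Counter(frame_categories)
    return [category for category, n in counts.items() if n >= min_count]
```

  • With 5 dancing, 3 singing, and 2 basketball frames, the first rule yields a single dance label, while the second rule with a threshold of two keeps all three categories.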
  • the electronic device extracting at least one second human body image from the video to be classified may include:
  • the electronic device decomposes the video to be classified into multiple frames of images
  • the electronic device selects an image of a human body from multiple frames of images
  • the electronic device determines images other than the first human body image in the images of the human body as the second human body image, and obtains at least one frame of the second human body image.
  • the electronic device may decompose the video to be classified into multiple video frames, that is, multiple frames of images. Then, the electronic device can select an image with a human body from the multiple frames of images, and determine an image other than the first human body image among the images with a human body as the second human body image, to obtain at least one frame of the second human body image.
  • the method for recognizing human posture may further include:
  • the electronic device obtains the user portrait of the user
  • the electronic device judges whether to push the video to be classified to the user according to the user portrait and the category of the video to be classified;
  • if so, the electronic device pushes the video to be classified to the user.
  • the electronic device can obtain the user portrait of the user.
  • the user portrait refers to the abstraction of each specific information of the user into tags, and the use of these tags to concretize the user's image, so as to provide users with targeted services.
  • The user portrait of a user can describe which types of articles the user frequently browses, which types of videos the user frequently watches, which types of items the user frequently buys, and so on. Therefore, after acquiring the user portrait of a certain user, the electronic device can determine which types of videos the user frequently watches. Then, the electronic device can determine whether the category of the video to be classified belongs to one of the categories corresponding to the videos frequently watched by the user. If it does, the electronic device may push the video to be classified to the user for the user to watch.
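  • The push decision can be sketched as a membership check. The representation of a user portrait as a dictionary of tag sets and the key name "frequent_video_categories" are hypothetical and serve only to illustrate the comparison described above.

```python
from typing import Dict, Set


def should_push(user_portrait: Dict[str, Set[str]], video_category: str) -> bool:
    """Push only if the video's category is one the user frequently watches."""
    # "frequent_video_categories" is a hypothetical tag name for this sketch.
    frequent = user_portrait.get("frequent_video_categories", set())
    return video_category in frequent
```

  • A portrait without any video tags results in no push, which keeps the default behavior conservative.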
  • FIG. 5 is a schematic structural diagram of a human body posture recognition device provided by an embodiment of the application.
  • the human body posture recognition device may include: an acquisition module 301, a first determination module 302, a second determination module 303, and a third determination module 304.
  • the acquiring module 301 is configured to acquire a first human body image, and the first human body image includes at least one human body.
  • the first determining module 302 is configured to determine at least one human body block diagram according to the first human body image, and each human body block diagram contains only one human body.
  • the second determining module 303 is used to determine multiple key point coordinates of the human body in each human body block diagram.
  • the third determining module 304 is configured to determine the posture of the human body in each human body block diagram according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  • in some embodiments, the second determining module 303 may be configured to: input each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram; and obtain, according to the multiple heat maps corresponding to each human body block diagram, multiple key point coordinates of the human body in each human body block diagram, where one heat map corresponds to one key point coordinate.
  • in some embodiments, the second determining module 303 may be configured to: input each human body block diagram into a preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram, where each set of feature maps includes multiple feature maps of different sizes; and fuse the feature maps in each set of feature maps corresponding to each human body block diagram to obtain multiple heat maps corresponding to each human body block diagram, where one set of feature maps corresponds to one heat map.
  • in some embodiments, the second determining module 303 may be configured to: perform Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram; and obtain, according to the multiple target heat maps corresponding to each human body block diagram, multiple key point coordinates of the human body in each human body block diagram, where one target heat map corresponds to one key point coordinate.
  • in some embodiments, the acquiring module 301 may be configured to: acquire multiple sample human body block diagrams; acquire multiple key point coordinates corresponding to the human body in each sample human body block diagram; train a preset neural network model using the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram; and use the trained neural network model as the preset key point detection model.
  • in some embodiments, the acquiring module 301 may be configured to: acquire multiple sets of key point coordinates, where each set includes multiple key point coordinates; acquire the human posture corresponding to each set of key point coordinates; train a preset shallow neural network model using the multiple sets of key point coordinates and the human posture corresponding to each set; and use the trained shallow neural network model as the preset posture recognition model.
  • in some embodiments, the third determining module 304 may be configured to: extract at least one frame of second human body image from the video to be classified; determine the posture of the human body in each frame of second human body image; and determine the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
  • in some embodiments, the third determining module 304 may be configured to: determine the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determine the category corresponding to each frame of second human body image according to the posture of the human body in that frame, to obtain multiple categories; count the occurrences of each identical category among the multiple categories; and determine the category with the largest count as the category of the video to be classified.
  • in some embodiments, the third determining module 304 may be configured to: decompose the video to be classified into multiple frames of images; select the images that contain a human body from the multiple frames of images; and determine the images other than the first human body image among the images that contain a human body as second human body images, to obtain at least one frame of second human body image.
  • in some embodiments, the third determining module 304 may be configured to: obtain a user portrait of a user; determine, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user; and if so, push the video to be classified to the user.
  • an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the process in the method for recognizing human posture provided in this embodiment.
  • an embodiment of the present application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, executes the process in the method for recognizing human posture provided in this embodiment.
  • for example, the above-mentioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 6 is a schematic diagram of the first structure of an electronic device provided by an embodiment of the application.
  • the electronic device 400 may include components such as a memory 401 and a processor 402. Those skilled in the art can understand that the structure of the electronic device shown in FIG. 6 does not constitute a limitation on the electronic device, and may include more or fewer components than those shown in the figure, or a combination of certain components, or different component arrangements.
  • the memory 401 can be used to store application programs and data.
  • the application program stored in the memory 401 contains executable code.
  • Application programs can be composed of various functional modules.
  • the processor 402 executes various functional applications and data processing by running application programs stored in the memory 401.
  • the processor 402 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the electronic device as a whole.
  • in this embodiment, the processor 402 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 401 according to the following instructions, and the processor 402 runs the application programs stored in the memory 401, so as to implement the following process:
  • acquire a first human body image, the first human body image including at least one human body;
  • determine at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
  • determine multiple key point coordinates of the human body in each human body block diagram;
  • determine the posture of the human body in each human body block diagram according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  • FIG. 7 is a schematic diagram of the second structure of the electronic device provided by the embodiment of the application.
  • the electronic device 400 may include a memory 401, a processor 402, an input unit 403, an output unit 404, a display screen 405 and other components.
  • the memory 401 can be used to store application programs and data.
  • the application program stored in the memory 401 contains executable code.
  • Application programs can be composed of various functional modules.
  • the processor 402 executes various functional applications and data processing by running application programs stored in the memory 401.
  • the processor 402 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the electronic device as a whole.
  • the input unit 403 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
  • the output unit 404 may be used to display information input by the user or information provided to the user and various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof.
  • the output unit may include a display panel.
  • the display screen 405 can be used to display information such as text and pictures.
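  • in this embodiment, the processor 402 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 401 according to the following instructions, and the processor 402 runs the application programs stored in the memory 401, so as to implement the following process:
  • acquire a first human body image, the first human body image including at least one human body;
  • determine at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
  • determine multiple key point coordinates of the human body in each human body block diagram;
  • determine the posture of the human body in each human body block diagram according to the preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.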
  • in some embodiments, when the processor 402 executes the determining of multiple key point coordinates of the human body in each human body block diagram, it may execute: input each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram; and obtain, according to the multiple heat maps corresponding to each human body block diagram, multiple key point coordinates of the human body in each human body block diagram, where one heat map corresponds to one key point coordinate.
  • in some embodiments, when the processor 402 executes the inputting of each human body block diagram into the preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram, it may execute: input each human body block diagram into the preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram, where each set of feature maps includes multiple feature maps of different sizes; and fuse the feature maps in each set of feature maps corresponding to each human body block diagram to obtain multiple heat maps corresponding to each human body block diagram, where one set of feature maps corresponds to one heat map.
  • in some embodiments, after the processor 402 executes the fusing of the feature maps in each set of feature maps corresponding to each human body block diagram to obtain multiple heat maps corresponding to each human body block diagram, it may also execute: perform Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram. In that case, when the processor 402 executes the obtaining of multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps, it may execute: obtain, according to the multiple target heat maps corresponding to each human body block diagram, multiple key point coordinates of the human body in each human body block diagram, where one target heat map corresponds to one key point coordinate.
  • in some embodiments, before the processor 402 executes the acquiring of the first human body image, it may also execute: acquire multiple sample human body block diagrams; acquire multiple key point coordinates corresponding to the human body in each sample human body block diagram; train a preset neural network model using the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram; and use the trained neural network model as the preset key point detection model.
  • in some embodiments, before the processor 402 executes the acquiring of the first human body image, it may also execute: acquire multiple sets of key point coordinates, where each set includes multiple key point coordinates; acquire the human posture corresponding to each set of key point coordinates; train a preset shallow neural network model using the multiple sets of key point coordinates and the human posture corresponding to each set; and use the trained shallow neural network model as the preset posture recognition model.
  • in some embodiments, the first human body image is a frame of human body image in the video to be classified, and the processor 402 may also execute: extract at least one frame of second human body image from the video to be classified; determine the posture of the human body in each frame of second human body image; and determine the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
  • in some embodiments, when the processor 402 executes the determining of the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image, it may execute: determine the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determine the category corresponding to each frame of second human body image according to the posture of the human body in that frame, to obtain multiple categories; count the occurrences of each identical category among the multiple categories; and determine the category with the largest count as the category of the video to be classified.
  • in some embodiments, when the processor 402 executes the extracting of at least one frame of second human body image from the video to be classified, it may execute: decompose the video to be classified into multiple frames of images; select the images that contain a human body from the multiple frames of images; and determine the images other than the first human body image among the images that contain a human body as second human body images, to obtain at least one frame of second human body image.
  • in some embodiments, the processor 402 may also execute: obtain a user portrait of a user; determine, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user; and if so, push the video to be classified to the user.
  • the device for recognizing human posture provided in the embodiments of the present application belongs to the same concept as the method for recognizing human posture in the above embodiments. Any method provided in the embodiments of the method for recognizing human posture can be run on the device; for the specific implementation process, refer to the embodiments of the method, which are not repeated here.
  • it should be noted that a person of ordinary skill in the art can understand that all or part of the process of implementing the method for recognizing human posture described in the embodiments of this application can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor; the execution may include the processes of the embodiments of the method for recognizing human posture. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), etc.
  • for the device for recognizing human posture of the embodiments of the present application, its functional modules can be integrated in one processing chip, each module can exist alone physically, or two or more modules can be integrated in one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If an integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

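This application discloses a method, device, storage medium, and electronic device for recognizing human posture. The method includes: acquiring a first human body image; determining at least one human body block diagram according to the first human body image; determining multiple key point coordinates of the human body in each human body block diagram; and obtaining the posture of the human body in the first human body image according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram.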

Description

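Method, device, storage medium, and electronic device for recognizing human posture
Technical Field
This application belongs to the field of electronic technology, and in particular relates to a method, device, storage medium, and electronic device for recognizing human posture.
Background
With the development and application of computer technology and artificial intelligence, video analysis technology has risen rapidly and attracted wide attention. One core task of video analysis is human posture recognition, that is, recognizing the posture of the human body in a video frame. The accuracy and speed of human posture recognition directly affect the results of the subsequent work of a video analysis system.
Summary
The embodiments of this application provide a method, device, storage medium, and electronic device for recognizing human posture, which can improve the accuracy of human posture recognition.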
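In a first aspect, an embodiment of this application provides a method for recognizing human posture, including:
acquiring a first human body image, the first human body image including at least one human body;
determining at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
determining multiple key point coordinates of the human body in each human body block diagram;
determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.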
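In a second aspect, an embodiment of this application provides a device for recognizing human posture, including:
an acquiring module, configured to acquire a first human body image, the first human body image including at least one human body;
a first determining module, configured to determine at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
a second determining module, configured to determine multiple key point coordinates of the human body in each human body block diagram;
a third determining module, configured to determine the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.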
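In a third aspect, an embodiment of this application provides a storage medium on which a computer program is stored, where, when the computer program is executed on a computer, the computer is caused to execute the method for recognizing human posture provided in this embodiment.
In a fourth aspect, an embodiment of this application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, is configured to execute:
acquiring a first human body image, the first human body image including at least one human body;
determining at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
determining multiple key point coordinates of the human body in each human body block diagram;
determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.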
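Brief Description of the Drawings
The technical solutions of this application and their beneficial effects will become apparent from the following detailed description of specific embodiments of this application, taken together with the accompanying drawings.
FIG. 1 is a schematic flowchart of the first type of the method for recognizing human posture provided by an embodiment of this application.
FIG. 2 is a schematic diagram of the first scenario of the method for recognizing human posture provided by an embodiment of this application.
FIG. 3 is a schematic flowchart of the second type of the method for recognizing human posture provided by an embodiment of this application.
FIG. 4 is a schematic diagram of the second scenario of the method for recognizing human posture provided by an embodiment of this application.
FIG. 5 is a schematic structural diagram of the device for recognizing human posture provided by an embodiment of this application.
FIG. 6 is a schematic diagram of the first structure of the electronic device provided by an embodiment of this application.
FIG. 7 is a schematic diagram of the second structure of the electronic device provided by an embodiment of this application.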
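Detailed Description
Please refer to the drawings, in which the same reference symbols denote the same components. The principle of this application is illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of this application and should not be regarded as limiting other specific embodiments not detailed here.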
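Please refer to FIG. 1, which is a schematic flowchart of the first type of the method for recognizing human posture provided by an embodiment of this application. The process of the method may include:
101. Acquire a first human body image, the first human body image including at least one human body.
The first human body image is an image containing a human body. It may include at least one human body, and its format may be jpg, png, bmp, or the like.
In the embodiments of this application, when the category of a video needs to be determined, the electronic device may first extract a human body image from the video. This human body image may serve as the first human body image.
102. Determine at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body.
For example, after obtaining the first human body image, the electronic device may input it into a preset target detection network model to obtain at least one human body block diagram, each of which contains only one human body.
For example, as shown in FIG. 2, the electronic device can obtain three human body block diagrams from the first human body image G1: human body block diagram B1, human body block diagram B2, and human body block diagram B3.
103. Determine multiple key point coordinates of the human body in each human body block diagram.
For example, after obtaining at least one human body block diagram, the electronic device can determine multiple key point coordinates of the human body in each human body block diagram. The key points may include the head, neck, chest, elbows, left wrist, right wrist, left knee, right knee, and so on. The number of key points may be 14, 17, 21, or the like, and is not specifically limited here. Each key point coordinate includes an x coordinate and a y coordinate; that is, it can be expressed as a pair (x, y). For example, a plane rectangular coordinate system can be established with the upper-left corner of the human body block diagram as the origin and the two edges that meet at that corner as the x axis and the y axis. The key point coordinates of the human body in the block diagram can then be expressed as the coordinates of points in this coordinate system.
104. Determine the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
For example, after determining the multiple key point coordinates of the human body in each human body block diagram, the electronic device may input the key point coordinates of each human body block diagram into the preset posture recognition model to recognize the posture of the human body in each human body block diagram, thereby obtaining the posture of the human body in the first human body image. The preset posture recognition model is a trained model.
It can be understood that, in the embodiments of this application, a first human body image is acquired; at least one human body block diagram is determined according to the first human body image; multiple key point coordinates of the human body in each human body block diagram are determined; and the posture of the human body in each human body block diagram is determined according to the preset posture recognition model and those key point coordinates, to obtain the posture of the human body in the first human body image. It follows that the method provided by the embodiments of this application recognizes the human posture in a human body image by means of a preset posture recognition model. Because this model is a trained model, the accuracy of human posture recognition can be improved.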
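The four steps 101 to 104 can be sketched as a small pipeline. This is only an illustrative sketch: the callables `detect_people`, `detect_keypoints`, and `classify_posture` are hypothetical stand-ins for the preset target detection model, key point detection model, and posture recognition model, and the `(left, top, right, bottom)` box layout is an assumption, not something the application specifies.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom); assumed layout
Keypoint = Tuple[float, float]    # (x, y) relative to the box's upper-left corner
Image = List[List[int]]           # toy grayscale image stored as rows of pixels


def crop(image: Image, box: Box) -> Image:
    """Cut the region covered by the box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def recognize_postures(
    image: Image,
    detect_people: Callable[[Image], Sequence[Box]],             # hypothetical detector
    detect_keypoints: Callable[[Image], Sequence[Keypoint]],     # hypothetical key point model
    classify_posture: Callable[[Sequence[Keypoint]], str],       # hypothetical posture model
) -> List[str]:
    """Return one posture label per person found in the image."""
    postures = []
    for box in detect_people(image):                    # step 102: one box per person
        keypoints = detect_keypoints(crop(image, box))  # step 103: key points inside the box
        postures.append(classify_posture(keypoints))    # step 104: key points to posture
    return postures
```

Keeping the three models as injected callables mirrors the text, which treats each of them as a preset, separately trained component.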
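Please refer to FIG. 3, which is a schematic flowchart of the second type of the method for recognizing human posture provided by an embodiment of this application. The method may include:
201. The electronic device acquires a first human body image, the first human body image including at least one human body.
The first human body image is an image containing a human body. It may be a color image or a grayscale image, may include at least one human body, and may be in jpg, png, bmp, or a similar format.
In the embodiments of this application, when the category of a video needs to be determined, the electronic device may first extract a human body image from the video. This human body image may serve as the first human body image.
For example, as shown in FIG. 4, the first human body image may be G2.
202. The electronic device determines at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body.
For example, after obtaining the first human body image, the electronic device may input it into a preset target detection network model to obtain at least one human body block diagram, each of which contains only one human body.
For example, as shown in FIG. 4, after obtaining the first human body image G2 (which includes two human bodies), the electronic device may input G2 into the preset target detection network model to obtain two human body block diagrams, one of which is B4.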
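203. The electronic device inputs each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram.
204. The electronic device obtains multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps corresponding to that block diagram, where one heat map corresponds to one key point coordinate.
For example, the electronic device may train a preset Cascaded Pyramid Network (CPN) model in advance and use the trained CPN model as the preset key point detection model. After obtaining at least one human body block diagram, the electronic device may input each human body block diagram into the preset key point detection model to obtain the multiple heat maps corresponding to it.
After obtaining the multiple heat maps corresponding to each human body block diagram, the electronic device may find the position of the maximum-probability pixel on each heat map. That position is the key point coordinate corresponding to the heat map, so the multiple key point coordinates of the human body in each human body block diagram can be obtained.
The key points may include the head, neck, chest, left elbow, right elbow, left wrist, right wrist, left knee, right knee, and so on. The number of key points may be 14, 17, 21, or the like, and is not specifically limited here. Each key point coordinate includes an x coordinate and a y coordinate; that is, it can be expressed as a pair (x, y).
For example, as shown in FIG. 4, after obtaining the human body block diagram B4, the electronic device can determine the key point coordinates of the human body in B4, such as the head coordinates, left shoulder coordinates, and left elbow coordinates. It should be noted that the positions and number of key points marked in B4 are merely an example provided by the embodiments of this application and are not intended to limit this application.
It can be understood that, in the embodiments of this application, heat maps and key point coordinates are in one-to-one correspondence. For example, 17 heat maps yield 17 key point coordinates, and 21 heat maps yield 21 key point coordinates.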
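The heat-map-to-coordinate step described above (take the position of the maximum-probability pixel on each heat map) can be sketched as follows. The representation of a heat map as a list of rows and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of per-pixel probabilities; assumed representation


def heatmap_to_keypoint(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (x, y) of the highest-probability pixel; on ties the first hit wins."""
    best_x, best_y, best_p = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, p in enumerate(row):
            if p > best_p:
                best_x, best_y, best_p = x, y, p
    return best_x, best_y


def heatmaps_to_keypoints(heatmaps: List[Heatmap]) -> List[Tuple[int, int]]:
    """One heat map yields exactly one key point coordinate."""
    return [heatmap_to_keypoint(h) for h in heatmaps]
```

Passing 17 heat maps to `heatmaps_to_keypoints` returns 17 coordinates, which matches the one-to-one correspondence stated in the text.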
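205. The electronic device determines the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
For example, after determining the multiple key point coordinates of the human body in each human body block diagram, the electronic device may input the key point coordinates of each human body block diagram into the preset posture recognition model to recognize the posture of the human body in each human body block diagram, thereby obtaining the posture of the human body in the first human body image. The preset posture recognition model is a trained model.
For example, as shown in FIG. 4, after determining the multiple key point coordinates of the human body in the human body block diagram B4, the electronic device may input them into the preset posture recognition model to recognize the posture of the human body in B4. For instance, that posture may be "standing with hands on hips".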
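In some embodiments, process 203 may include:
The electronic device inputs each human body block diagram into the preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram, where each set of feature maps includes multiple feature maps of different sizes;
The electronic device fuses the feature maps in each set of feature maps corresponding to each human body block diagram to obtain multiple heat maps corresponding to each human body block diagram, where one set of feature maps corresponds to one heat map.
For example, after obtaining at least one human body block diagram, the electronic device may input each human body block diagram into the preset key point detection model to obtain the multiple sets of feature maps corresponding to it, where each set includes multiple feature maps of different sizes.
Then, the electronic device may fuse the multiple feature maps of different sizes in each set, so as to combine information from different receptive fields, and obtain the multiple heat maps corresponding to each human body block diagram. One set of feature maps yields one heat map.
For example, the electronic device may arrange the feature maps of different scales in each set in descending order of size and take the feature map in the middle of the arrangement as the first feature map. Using the first feature map as the reference, the electronic device may then upsample or downsample the other feature maps in the set so that their sizes match that of the first feature map. Finally, the electronic device may fuse the first feature map with the resampled feature maps to obtain the heat map corresponding to each human body block diagram.
It can be understood that upsampling enlarges a feature map and downsampling shrinks it. In the embodiments of this application, the feature maps in each set that are smaller than the first feature map may be upsampled, and those larger than the first feature map may be downsampled.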
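The resample-then-fuse idea above can be illustrated with a small sketch. Two choices here are assumptions rather than details from the application: nearest-neighbor resampling is used for both upsampling and downsampling, and fusion is done by element-wise addition (one embodiment in the text mentions pixel-by-pixel addition, but other fusion operators are possible).

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel feature map as rows; assumed representation


def resize_nearest(fmap: FeatureMap, height: int, width: int) -> FeatureMap:
    """Nearest-neighbor resampling; enlarges or shrinks depending on the target size."""
    src_h, src_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def fuse_group(group: List[FeatureMap]) -> FeatureMap:
    """Resample every map in the group to the middle-sized map and add them up."""
    # Sort from largest to smallest and use the middle map as the reference size.
    ordered = sorted(group, key=lambda m: len(m) * len(m[0]), reverse=True)
    reference = ordered[len(ordered) // 2]
    h, w = len(reference), len(reference[0])
    fused = [[0.0] * w for _ in range(h)]
    for fmap in ordered:
        resized = resize_nearest(fmap, h, w)  # larger maps shrink, smaller maps grow
        for y in range(h):
            for x in range(w):
                fused[y][x] += resized[y][x]
    return fused
```

With a 4x4, a 2x2, and a 1x1 map in one group, the 2x2 map becomes the reference, the 4x4 map is downsampled, and the 1x1 map is upsampled before the three are added.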
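In some embodiments, the electronic device may input each human body block diagram into the preset key point detection model and obtain multiple sets of second feature maps corresponding to each human body block diagram through the residual blocks of multiple convolutional layers of the model (such as convolutional layers c2, c3, c4, and c5). Each set of second feature maps includes multiple second feature maps, and each convolutional layer corresponds to one second feature map in each set. The depth of convolutional layer c2 is less than that of c3, the depth of c3 is less than that of c4, and the depth of c4 is less than that of c5. Then, the electronic device may connect the second feature maps in each set to different numbers of bottleneck blocks to obtain the multiple sets of feature maps corresponding to each human body block diagram, where each set includes multiple feature maps of different sizes and the feature map of a deeper convolutional layer is connected to more bottleneck blocks. Next, the electronic device may upsample the feature maps in each set to a uniform dimension and then fuse them, for example by adding the upsampled feature maps pixel by pixel, to obtain the multiple heat maps corresponding to each human body block diagram.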
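In some embodiments, after process 203, the method may further include:
The electronic device performs Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram;
Process 204 may include:
The electronic device obtains multiple key point coordinates of the human body in each human body block diagram according to the multiple target heat maps corresponding to that block diagram, where one target heat map corresponds to one key point coordinate.
For example, each of the heat maps obtained by the electronic device contains more or less noise. Therefore, after obtaining the multiple heat maps corresponding to each human body block diagram, the electronic device may perform Gaussian filtering on each heat map to filter out its noise points and obtain the multiple target heat maps corresponding to each human body block diagram. The electronic device may then obtain the multiple key point coordinates of the human body in each human body block diagram according to the target heat maps, where one target heat map corresponds to one key point coordinate.
It should be noted that noise points are points that interfere with locating a key point; their presence may cause the key point to be determined inaccurately.
It can be understood that key point coordinates determined from target heat maps are more accurate than those determined from the unfiltered heat maps, but producing the target heat maps also consumes some processor resources. Therefore, key point coordinates may be determined from the target heat maps when processor resources are sufficient, and from the unfiltered heat maps when processor resources are insufficient.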
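A minimal sketch of this trade-off follows. It smooths a heat map with a separable Gaussian kernel before locating the peak, and skips the smoothing when resources are tight. The kernel parameters (`sigma`, `radius`), the edge handling (clamping to the border), and the `cpu_budget_ok` flag are all illustrative assumptions; the application does not specify them.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]  # rows of per-pixel probabilities; assumed representation


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def _blur_rows(grid: Heatmap, kernel: List[float]) -> Heatmap:
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + i - radius, 0), n - 1)] for i, k in enumerate(kernel))
            for x in range(n)
        ])
    return out


def gaussian_filter(heatmap: Heatmap, sigma: float = 1.0, radius: int = 2) -> Heatmap:
    """Separable Gaussian blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(sigma, radius)
    blurred = _blur_rows(heatmap, kernel)
    transposed = [list(col) for col in zip(*blurred)]
    blurred_t = _blur_rows(transposed, kernel)
    return [list(col) for col in zip(*blurred_t)]


def peak(heatmap: Heatmap) -> Tuple[int, int]:
    """(x, y) of the largest value in the heat map."""
    return max(
        ((x, y) for y, row in enumerate(heatmap) for x in range(len(row))),
        key=lambda xy: heatmap[xy[1]][xy[0]],
    )


def locate_keypoint(heatmap: Heatmap, cpu_budget_ok: bool) -> Tuple[int, int]:
    """Use the filtered (target) heat map only when there is processor budget for it."""
    target = gaussian_filter(heatmap) if cpu_budget_ok else heatmap
    return peak(target)
```

On a heat map with a broad response region and a single isolated spike elsewhere, the unfiltered peak lands on the spike, while the filtered peak lands inside the broad region, which is the kind of noise suppression the text describes.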
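In some embodiments, before process 201, the method may further include:
The electronic device acquires multiple sample human body block diagrams;
The electronic device acquires multiple key point coordinates corresponding to the human body in each sample human body block diagram;
The electronic device trains a preset neural network model using the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram;
The electronic device uses the trained neural network model as the preset key point detection model.
For example, the electronic device may acquire multiple sample human body block diagrams stored in a database or in another device. Each sample human body block diagram is labeled with multiple key point coordinates, which correspond to the human body in that sample. In the embodiments of this application, the electronic device may acquire the key point coordinates labeled on each sample human body block diagram, that is, the multiple key point coordinates corresponding to the human body in each sample.
After obtaining the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each of them, the electronic device may use them to train the preset neural network model. The trained neural network model is the preset key point detection model.
In some embodiments, the electronic device may also train the preset neural network model using the multiple sample human body block diagrams, the multiple key point coordinates corresponding to the human body in each sample, and a preset loss function. The trained neural network model is the preset key point detection model.
It should be noted that a loss function is generally used to measure the degree of inconsistency between the model's predicted value (such as the key point coordinates predicted by the model) and the true value (such as the actually labeled key point coordinates). It is a non-negative real-valued function. Generally, the smaller the loss function value, the better the robustness of the model. The loss function can be set according to actual needs.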
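As one concrete instance of such a non-negative loss, the sketch below computes the mean squared distance between predicted and labeled key point coordinates. The choice of mean squared error is an assumption for illustration; the text only says that the loss function can be set according to actual needs.

```python
from typing import Sequence, Tuple

Keypoint = Tuple[float, float]


def keypoint_mse(predicted: Sequence[Keypoint], labeled: Sequence[Keypoint]) -> float:
    """Mean squared distance between predicted and labeled key points (always >= 0)."""
    if len(predicted) != len(labeled):
        raise ValueError("predicted and labeled key points must align one to one")
    squared = [
        (px - lx) ** 2 + (py - ly) ** 2
        for (px, py), (lx, ly) in zip(predicted, labeled)
    ]
    return sum(squared) / len(squared)
```

The value is zero only when every predicted key point coincides with its label, and it grows as predictions drift away from the labels.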
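The preset neural network model may be a cascaded pyramid network model. The cascaded pyramid network model may include a GlobalNet network and a RefineNet network. The GlobalNet network may be used to perform coarse training on all key points of the human body. The RefineNet network may refine the key points that the GlobalNet network shows to be hard to train.
In some embodiments, the preset neural network model may include an inception-v4 network or an attention resnet network, together with a RefineNet network. The inception-v4 network or attention resnet network may be used to perform coarse training on all key points of the human body. The RefineNet network may refine the key points that the inception-v4 network or attention resnet network shows to be hard to train.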
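In some embodiments, before process 201, the method may further include:
The electronic device acquires multiple sets of key point coordinates, where each set includes multiple key point coordinates;
The electronic device acquires the human posture corresponding to each set of key point coordinates;
The electronic device trains a preset shallow neural network model using the multiple sets of key point coordinates and the human posture corresponding to each set;
The electronic device uses the trained shallow neural network model as the preset posture recognition model.
For example, the electronic device may acquire multiple sets of key point coordinates and the human posture corresponding to each set, where each set includes multiple key point coordinates.
After obtaining the multiple sets of key point coordinates and the human posture corresponding to each set, the electronic device may use them to train the preset shallow neural network model. The trained shallow neural network model can serve as the preset posture recognition model.
In some embodiments, the electronic device may also train the preset shallow neural network model using the multiple sets of key point coordinates, the human posture corresponding to each set (the true human posture), and a preset loss function. The trained shallow neural network model can serve as the preset posture recognition model.
It should be noted that a loss function is generally used to measure the degree of inconsistency between the model's predicted value (such as the human posture predicted by the model) and the true value (such as the true human posture). It is a non-negative real-valued function. Generally, the smaller the loss function value, the better the robustness of the model. The loss function can be set according to actual needs.
The preset shallow neural network model may be a resnet 18 network model.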
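In some embodiments, two people in the same posture at different positions in a picture have very different coordinates. To control for this variable, the electronic device may normalize the key point coordinates in the multiple sets after acquiring them. For example, the following formula may be used to normalize the key point coordinates:
Figure PCTCN2019119926-appb-000001
In this formula, N2 denotes the x or y coordinate after normalization, and N1 denotes the x or y coordinate before normalization. N_min denotes the smallest x or y coordinate among the multiple sets of key point coordinates, and N_max denotes the largest. A is a constant whose value may be 240, 264, 293, 320, 335, 370, and so on.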
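The formula itself is only available as a figure that is not reproduced in this text. The sketch below therefore assumes a min-max form, N2 = A * (N1 - N_min) / (N_max - N_min), applied separately to x and y. This form is a guess that is consistent with the symbol definitions above; it is not confirmed by the source, and the handling of the degenerate case N_max = N_min is likewise an assumption.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]


def normalize_groups(
    groups: Sequence[Sequence[Keypoint]], a: float = 240.0
) -> List[List[Keypoint]]:
    """Assumed min-max scaling of x and y to [0, a] over all sets of key points."""
    xs = [x for group in groups for x, _ in group]
    ys = [y for group in groups for _, y in group]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(n1: float, n_min: float, n_max: float) -> float:
        if n_max == n_min:  # all values equal; map to 0 to avoid dividing by zero
            return 0.0
        return a * (n1 - n_min) / (n_max - n_min)

    return [
        [(scale(x, x_min, x_max), scale(y, y_min, y_max)) for x, y in group]
        for group in groups
    ]
```

Under this assumed form, every normalized coordinate falls between 0 and A regardless of where in the picture the person originally appeared.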
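In other embodiments, to reflect the correlation between the x coordinate and the y coordinate of the same key point, the x and y coordinates of the same key point may be placed at the same position in different channels for training. For example, suppose a set of key points includes 5 key points with coordinates (x1, y1), (x2, y2), (x3, y3), (x4, y4), and (x5, y5), and the human posture corresponding to this set is "standing". If the data to be fed into the preset shallow neural network model for training is (a, b), then [x1, x2, x3, x4, x5] and [y1, y2, y3, y4, y5] serve as a, and the human posture "standing" serves as b.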
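The two-channel layout in this example can be written out directly. The function name and the use of plain nested lists are illustrative assumptions; a real training pipeline would likely convert the result into a tensor.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]


def to_training_sample(
    keypoints: Sequence[Keypoint], posture: str
) -> Tuple[List[List[float]], str]:
    """Build (a, b): a holds an x channel and a y channel, b is the posture label."""
    x_channel = [x for x, _ in keypoints]
    y_channel = [y for _, y in keypoints]
    # Index i of both channels refers to the same key point.
    return [x_channel, y_channel], posture
```

Because position i in the x channel and position i in the y channel belong to the same key point, a model that looks across channels at one position sees both coordinates of that key point together.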
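In some embodiments, the first human body image is a frame of human body image in a video to be classified, and the method for recognizing human posture may further include:
The electronic device extracts at least one frame of second human body image from the video to be classified;
The electronic device determines the posture of the human body in each frame of second human body image;
The electronic device determines the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
It can be understood that the first human body image may be a frame of human body image in the video to be classified. The video may also contain other human body images, namely second human body images. For example, the electronic device may decompose the video to be classified into multiple video frames, that is, multiple frames of images. Then, the electronic device may detect whether any of these frames contains a human body. If so, the electronic device may select the images that contain a human body and determine those other than the first human body image as the at least one frame of second human body image.
After obtaining at least one frame of second human body image, the electronic device may determine the posture of the human body in each frame of second human body image, for example by using the method for recognizing human posture provided by the embodiments of this application.
After determining the posture of the human body in each frame of second human body image, the electronic device may determine the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
For example, if the posture of the human body in most of the first human body image and the at least one frame of second human body image is a dance movement, the electronic device may determine that the video to be classified is a dance video.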
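In some embodiments, the electronic device determining the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image may include:
The electronic device determines the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determines the category corresponding to each frame of second human body image according to the posture of the human body in that frame, to obtain multiple categories;
The electronic device counts the occurrences of each identical category among the multiple categories;
The electronic device determines the category with the largest count as the category of the video to be classified.
For example, after determining the posture of the human body in the first human body image and in each frame of second human body image, the electronic device may determine the category corresponding to the first human body image according to the posture of the human body in it. Likewise, the electronic device may determine the category corresponding to each frame of second human body image according to the posture of the human body in that frame, thereby obtaining multiple categories. For instance, when the posture of the human body in the first human body image or in a second human body image is a dance movement, the electronic device may determine that image to be a dance image.
After obtaining the multiple categories, the electronic device may count the occurrences of each identical category and determine the category with the largest count as the category of the video to be classified. For example, suppose 10 categories are obtained: 5 dance, 3 singing, and 2 basketball. The electronic device may then determine that the video to be classified is a dance video.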
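The majority vote in this example can be sketched with `collections.Counter`. The function name and the string labels are illustrative assumptions; how ties are broken is not stated in the text, and this sketch simply keeps the category that was seen first.

```python
from collections import Counter
from typing import Sequence


def classify_video(frame_categories: Sequence[str]) -> str:
    """Return the category that occurs most often among the per-frame categories."""
    if not frame_categories:
        raise ValueError("at least one frame category is required")
    # most_common(1) returns [(category, count)] for the most frequent category.
    return Counter(frame_categories).most_common(1)[0][0]
```

For the 5 dance, 3 singing, 2 basketball example above, `classify_video` returns the dance category.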
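In some embodiments, the video to be classified may have multiple categories. For example, it may be a dance video, a singing video, and a basketball video at the same time. After the categories corresponding to the human body images in a video have been determined, any category shared by at least two images can be a category of the video. For example, if the 10 human body images in a video include 5 dance images, 3 singing images, and 2 basketball images, the video can be a dance video, a singing video, and a basketball video.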
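The multi-category rule (a category counts if at least two images share it) is a small variation on the vote. In this sketch the `min_frames` parameter generalizes the threshold of two and is an assumption added for illustration.

```python
from collections import Counter
from typing import List, Sequence


def video_categories(frame_categories: Sequence[str], min_frames: int = 2) -> List[str]:
    """Return every category shared by at least min_frames frames, in first-seen order."""
    counts = Counter(frame_categories)
    return [category for category, n in counts.items() if n >= min_frames]
```

A category that appears in only one frame is dropped, so a single misclassified frame does not add a category to the video.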
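In some embodiments, the electronic device extracting at least one frame of second human body image from the video to be classified may include:
The electronic device decomposes the video to be classified into multiple frames of images;
The electronic device selects the images that contain a human body from the multiple frames of images;
The electronic device determines the images other than the first human body image among the images that contain a human body as second human body images, to obtain at least one frame of second human body image.
For example, after obtaining the video to be classified, the electronic device may decompose it into multiple video frames, that is, multiple frames of images. Then, the electronic device may select the images that contain a human body from these frames and determine those other than the first human body image as second human body images, obtaining at least one frame of second human body image.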
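The selection of second human body images can be sketched as a filter over the decoded frames. Decoding the video is outside the scope of this sketch; `contains_person` is a hypothetical predicate standing in for whatever human detection the device uses, and identifying the first human body image by its frame index is an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_second_images(
    frames: Sequence[Frame],
    first_index: int,
    contains_person: Callable[[Frame], bool],  # hypothetical human-presence check
) -> List[Frame]:
    """Keep frames that contain a person, excluding the first human body image."""
    return [
        frame
        for index, frame in enumerate(frames)
        if index != first_index and contains_person(frame)
    ]
```

The result can be empty if the first human body image is the only frame with a person, in which case the video would be classified from that single image.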
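In some embodiments, the method for recognizing human posture may further include:
The electronic device obtains a user portrait of a user;
The electronic device determines, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user;
If it is determined, according to the user portrait and the category of the video to be classified, that the video is to be pushed to the user, the electronic device pushes the video to be classified to the user.
For example, after determining the category of the video to be classified, the electronic device may obtain the user portrait of the user. A user portrait abstracts each piece of a user's specific information into tags and uses these tags to build a concrete picture of the user, so that targeted services can be provided. In plain terms, a user portrait can describe which types of articles the user frequently browses, which types of videos the user frequently watches, which types of items the user frequently buys, and so on. Therefore, after obtaining the user portrait of a user, the electronic device can determine which types of videos the user frequently watches. Then, the electronic device can determine whether the category of the video to be classified is one of the categories of videos frequently watched by the user. If it is, the electronic device may push the video to be classified to the user for viewing.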
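The push decision described above reduces to a membership test. In this sketch the user portrait is modeled as a dictionary of tag lists, and the key name `frequently_watched_video_categories` is a hypothetical choice; the application does not define a storage format for user portraits.

```python
from typing import Dict, Iterable, List


def should_push(
    video_categories: Iterable[str], user_portrait: Dict[str, List[str]]
) -> bool:
    """Push only if some category of the video is one the user frequently watches."""
    # Hypothetical portrait key; a missing key is treated as "no known preferences".
    watched = set(user_portrait.get("frequently_watched_video_categories", []))
    return any(category in watched for category in video_categories)
```

Accepting several categories per video keeps the check compatible with the embodiment in which one video can belong to more than one category.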
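Please refer to FIG. 5, which is a schematic structural diagram of the device for recognizing human posture provided by an embodiment of this application. The device may include: an acquiring module 301, a first determining module 302, a second determining module 303, and a third determining module 304.
The acquiring module 301 is configured to acquire a first human body image, the first human body image including at least one human body.
The first determining module 302 is configured to determine at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body.
The second determining module 303 is configured to determine multiple key point coordinates of the human body in each human body block diagram.
The third determining module 304 is configured to determine the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
In some embodiments, the second determining module 303 may be configured to: input each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram; and obtain, according to those heat maps, multiple key point coordinates of the human body in each human body block diagram, where one heat map corresponds to one key point coordinate.
In some embodiments, the second determining module 303 may be configured to: input each human body block diagram into a preset key point detection model to obtain multiple sets of feature maps corresponding to each human body block diagram, where each set includes multiple feature maps of different sizes; and fuse the feature maps in each set to obtain multiple heat maps corresponding to each human body block diagram, where one set of feature maps corresponds to one heat map.
In some embodiments, the second determining module 303 may be configured to: perform Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram; and obtain, according to those target heat maps, multiple key point coordinates of the human body in each human body block diagram, where one target heat map corresponds to one key point coordinate.
In some embodiments, the acquiring module 301 may be configured to: acquire multiple sample human body block diagrams; acquire multiple key point coordinates corresponding to the human body in each sample human body block diagram; train a preset neural network model using the sample human body block diagrams and those key point coordinates; and use the trained neural network model as the preset key point detection model.
In some embodiments, the acquiring module 301 may be configured to: acquire multiple sets of key point coordinates, where each set includes multiple key point coordinates; acquire the human posture corresponding to each set; train a preset shallow neural network model using the sets of key point coordinates and the corresponding human postures; and use the trained shallow neural network model as the preset posture recognition model.
In some embodiments, the third determining module 304 may be configured to: extract at least one frame of second human body image from the video to be classified; determine the posture of the human body in each frame of second human body image; and determine the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
In some embodiments, the third determining module 304 may be configured to: determine the category corresponding to the first human body image according to the posture of the human body in it, and determine the category corresponding to each frame of second human body image according to the posture of the human body in that frame, to obtain multiple categories; count the occurrences of each identical category among the multiple categories; and determine the category with the largest count as the category of the video to be classified.
In some embodiments, the third determining module 304 may be configured to: decompose the video to be classified into multiple frames of images; select the images that contain a human body from the multiple frames of images; and determine the images other than the first human body image among them as second human body images, to obtain at least one frame of second human body image.
In some embodiments, the third determining module 304 may be configured to: obtain a user portrait of a user; determine, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user; and if so, push the video to be classified to the user.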
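An embodiment of this application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the process in the method for recognizing human posture provided in this embodiment.
An embodiment of this application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, executes the process in the method for recognizing human posture provided in this embodiment.
For example, the above electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 6, which is a schematic diagram of the first structure of the electronic device provided by an embodiment of this application.
The electronic device 400 may include components such as a memory 401 and a processor 402. Those skilled in the art can understand that the structure shown in FIG. 6 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 401 can be used to store application programs and data. The application programs stored in the memory 401 contain executable code and can form various functional modules. The processor 402 executes various functional applications and data processing by running the application programs stored in the memory 401.
The processor 402 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the electronic device as a whole.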
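In this embodiment, the processor 402 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 401 according to the following instructions, and the processor 402 runs the application programs stored in the memory 401, so as to implement the following process:
acquiring a first human body image, the first human body image including at least one human body;
determining at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
determining multiple key point coordinates of the human body in each human body block diagram;
determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.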
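Please refer to FIG. 7, which is a schematic diagram of the second structure of the electronic device provided by an embodiment of this application.
The electronic device 400 may include components such as a memory 401, a processor 402, an input unit 403, an output unit 404, and a display screen 405.
The memory 401 can be used to store application programs and data. The application programs stored in the memory 401 contain executable code and can form various functional modules. The processor 402 executes various functional applications and data processing by running the application programs stored in the memory 401.
The processor 402 is the control center of the electronic device. It connects the various parts of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the application programs stored in the memory 401 and calling the data stored in the memory 401, thereby monitoring the electronic device as a whole.
The input unit 403 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control.
The output unit 404 can be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the electronic device. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof. The output unit may include a display panel.
The display screen 405 can be used to display information such as text and pictures.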
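In this embodiment, the processor 402 in the electronic device loads the executable code corresponding to the processes of one or more application programs into the memory 401 according to the following instructions, and the processor 402 runs the application programs stored in the memory 401, so as to implement the following process:
acquiring a first human body image, the first human body image including at least one human body;
determining at least one human body block diagram according to the first human body image, each human body block diagram containing only one human body;
determining multiple key point coordinates of the human body in each human body block diagram;
determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.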
在一些实施方式中,处理器402执行所述确定每个人体框图中的人体的多个关键点坐标时,可以执行:将每个人体框图输入预设的关键点检测模型中,得到每个人体框图对应的多个热力图;根据每个人体框图对应的多个热力图,得到每个人体框图中的人体的多个关键点坐标,其中,一个热力图对应一个关键点坐标。
在一些实施方式中,处理器402执行所述将每个人体框图输入预设的关键点检测模型中,得到每个人体框图对应的多个热力图时,可以执行:将每个人体框图输入预设的关键点检测模型中,得到每个人体框图对应的多组特征图,其中,每组特征图包括多个不同尺寸的特征图;对每个人体框图对应的每组特征图中的特征图进行融合处理,得到每个人体框图对应的多个热力图,其中,一组特征图对应一个热力图。
在一些实施方式中,处理器402执行所述对每个人体框图对应的每组特征图中的特征图进行融合处理,得到每个人体框图对应的多个热力图之后,还可 以执行:对每个人体框图对应的每个热力图进行高斯滤波处理,得到每个人体框图对应的多个目标热力图;则处理器402执行所述根据每个人体框图对应的多个热力图,得到每个人体框图中的人体的多个关键点坐标,其中,一个热力图对应一个关键点坐标时,可以执行:根据每个人体框图对应的多个目标热力图,得到每个人体框图中的人体的多个关键点坐标,其中,一个目标热力图对应一个关键点坐标。
在一些实施方式中,处理器402执行所述获取第一人体图像之前,还可以执行:获取多个样本人体框图;获取每个样本人体框图中的人体对应的多个关键点坐标;利用所述多个样本人体框图和每个样本人体框图中的人体对应的多个关键点坐标对预设的神经网络模型进行训练;将训练后的神经网络模型作为预设的关键点检测模型。
在一些实施方式中,处理器402执行所述获取第一人体图像之前,还可以执行:获取多组关键点坐标,其中,每组关键点坐标包括多个关键点坐标;获取每组关键点坐标对应的人体姿态;利用所述多组关键点坐标和每组关键点坐标对应的人体姿态对预设的浅层神经网络模型进行训练;将训练后的浅层神经网络模型作为预设的姿态识别模型。
在一些实施方式中,所述第一人体图像为待分类视频中的一帧人体图像,处理器402还可以执行:从所述待分类视频中提取出至少一帧第二人体图像;确定每帧第二人体图像中的人体的姿态;根据所述第一人体图像中的人体的姿态和每帧第二人体图像中的人体的姿态,确定所述待分类视频的类别。
在一些实施方式中,处理器402执行所述根据所述第一人体图像中的人体的姿态和每帧第二人体图像中的人体的姿态,确定所述待分类视频的类别时,可以执行:根据所述第一人体图像中的人体的姿态,确定所述第一人体图像对应的类别,并根据每帧第二人体图像中的人体的姿态,确定每帧第二人体图像对应的类别,得到多个类别;从所述多个类别中确定出相同类别的数量;将数量最多的相同类别确定为所述待分类视频的类别。
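In some embodiments, when extracting at least one frame of second human body image from the video to be classified, the processor 402 may perform: decomposing the video to be classified into multiple frames of images; selecting, from the multiple frames of images, the images in which a human body is present; and determining the images in which a human body is present, other than the first human body image, as second human body images, to obtain at least one frame of second human body image.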
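Once the video has been decomposed into frames, the selection of second human body images reduces to a filter. The sketch below assumes the frames are already decoded into a sequence and that a person detector is available as a predicate; `extract_second_images`, `first_index`, and `contains_person` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def extract_second_images(
    frames: Sequence[Frame],
    first_index: int,
    contains_person: Callable[[Frame], bool],
) -> List[Frame]:
    """Keep frames that show a human body, excluding the first human body image."""
    return [
        frame
        for index, frame in enumerate(frames)
        if index != first_index and contains_person(frame)
    ]
```

Identifying the first human body image by its index rather than by pixel comparison avoids dropping a later frame that happens to be identical to it.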
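In some embodiments, the processor 402 may further perform: acquiring a user portrait of a user; determining, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user; and if so, pushing the video to be classified to the user.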
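How the user portrait is compared with the video category is not detailed above. As one simple possibility, the sketch below models the user portrait as a set of interest tags and pushes the video when its category is among them; the data layout and the names `should_push` and `push_if_relevant` are assumptions made for illustration.

```python
from typing import List, Set


def should_push(user_interests: Set[str], video_category: str) -> bool:
    """Decide whether a video of this category suits the user portrait."""
    return video_category in user_interests


def push_if_relevant(
    user_interests: Set[str], video_id: str, video_category: str, outbox: List[str]
) -> bool:
    """Append the video to the user's outbox when the decision is positive."""
    if should_push(user_interests, video_category):
        outbox.append(video_id)
        return True
    return False
```

Here `outbox` stands in for whatever delivery mechanism the device uses; a weighted interest score with a threshold would be an equally plausible decision rule.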
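In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed in a given embodiment, refer to the detailed description of the human posture recognition method above; they are not repeated here.
The human posture recognition device provided by the embodiments of this application belongs to the same concept as the human posture recognition method in the above embodiments. Any method provided in the embodiments of the human posture recognition method can be run on the human posture recognition device. For the specific implementation process, see the embodiments of the human posture recognition method, which are not repeated here.
It should be noted that, for the human posture recognition method of the embodiments of this application, a person of ordinary skill in the art can understand that all or part of the flow of the method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor, and its execution may include the flow of the embodiments of the human posture recognition method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the human posture recognition device of the embodiments of this application, its functional modules may be integrated into one processing chip, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The method, device, storage medium, and electronic device for recognizing human posture provided by the embodiments of this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is intended only to help in understanding the method of this application and its core idea. At the same time, those skilled in the art may, following the idea of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting this application.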

Claims (20)

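  1. A method for recognizing a human posture, comprising:
    acquiring a first human body image, where the first human body image includes at least one human body;
    determining at least one human body block diagram according to the first human body image, where each human body block diagram contains only one human body;
    determining multiple key point coordinates of the human body in each human body block diagram;
    determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  2. The method for recognizing a human posture according to claim 1, wherein the determining multiple key point coordinates of the human body in each human body block diagram comprises:
    inputting each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram;
    obtaining the multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps corresponding to each human body block diagram, where one heat map corresponds to one key point coordinate.
  3. The method for recognizing a human posture according to claim 2, wherein the inputting each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram comprises:
    inputting each human body block diagram into the preset key point detection model to obtain multiple groups of feature maps corresponding to each human body block diagram, where each group of feature maps includes multiple feature maps of different sizes;
    performing fusion processing on the feature maps in each group of feature maps corresponding to each human body block diagram to obtain the multiple heat maps corresponding to each human body block diagram, where one group of feature maps corresponds to one heat map.
  4. The method for recognizing a human posture according to claim 3, wherein after the performing fusion processing on the feature maps in each group of feature maps corresponding to each human body block diagram to obtain the multiple heat maps corresponding to each human body block diagram, the method further comprises:
    performing Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram;
    and the obtaining the multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps corresponding to each human body block diagram, where one heat map corresponds to one key point coordinate, comprises:
    obtaining the multiple key point coordinates of the human body in each human body block diagram according to the multiple target heat maps corresponding to each human body block diagram, where one target heat map corresponds to one key point coordinate.
  5. The method for recognizing a human posture according to claim 2, wherein before the acquiring a first human body image, the method further comprises:
    acquiring multiple sample human body block diagrams;
    acquiring multiple key point coordinates corresponding to the human body in each sample human body block diagram;
    training a preset neural network model using the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram;
    using the trained neural network model as the preset key point detection model.
  6. The method for recognizing a human posture according to claim 1, wherein before the acquiring a first human body image, the method further comprises:
    acquiring multiple groups of key point coordinates, where each group of key point coordinates includes multiple key point coordinates;
    acquiring the human body posture corresponding to each group of key point coordinates;
    training a preset shallow neural network model using the multiple groups of key point coordinates and the human body posture corresponding to each group of key point coordinates;
    using the trained shallow neural network model as the preset posture recognition model.
  7. The method for recognizing a human posture according to claim 1, wherein the first human body image is one frame of human body image in a video to be classified, and the method further comprises:
    extracting at least one frame of second human body image from the video to be classified;
    determining the posture of the human body in each frame of second human body image;
    determining the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
  8. The method for recognizing a human posture according to claim 7, wherein the determining the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image comprises:
    determining the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determining the category corresponding to each frame of second human body image according to the posture of the human body in each frame of second human body image, to obtain multiple categories;
    determining, among the multiple categories, the number of occurrences of each identical category;
    determining the identical category with the largest count as the category of the video to be classified.
  9. The method for recognizing a human posture according to claim 7, wherein the extracting at least one frame of second human body image from the video to be classified comprises:
    decomposing the video to be classified into multiple frames of images;
    selecting, from the multiple frames of images, the images in which a human body is present;
    determining the images in which a human body is present, other than the first human body image, as second human body images, to obtain at least one frame of second human body image.
  10. The method for recognizing a human posture according to claim 7, wherein the method further comprises:
    acquiring a user portrait of a user;
    determining, according to the user portrait and the category of the video to be classified, whether to push the video to be classified to the user;
    if so, pushing the video to be classified to the user.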
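  11. A device for recognizing a human posture, comprising:
    an acquisition module, configured to acquire a first human body image, where the first human body image includes at least one human body;
    a first determination module, configured to determine at least one human body block diagram according to the first human body image, where each human body block diagram contains only one human body;
    a second determination module, configured to determine multiple key point coordinates of the human body in each human body block diagram;
    a third determination module, configured to determine the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  12. A storage medium, wherein a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is caused to perform the method for recognizing a human posture according to any one of claims 1 to 10.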
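  13. An electronic device, wherein the electronic device includes a processor and a memory, a computer program is stored in the memory, and the processor, by calling the computer program stored in the memory, is configured to perform:
    acquiring a first human body image, where the first human body image includes at least one human body;
    determining at least one human body block diagram according to the first human body image, where each human body block diagram contains only one human body;
    determining multiple key point coordinates of the human body in each human body block diagram;
    determining the posture of the human body in each human body block diagram according to a preset posture recognition model and the multiple key point coordinates of the human body in each human body block diagram, to obtain the posture of the human body in the first human body image.
  14. The electronic device according to claim 13, wherein the processor is configured to perform:
    inputting each human body block diagram into a preset key point detection model to obtain multiple heat maps corresponding to each human body block diagram;
    obtaining the multiple key point coordinates of the human body in each human body block diagram according to the multiple heat maps corresponding to each human body block diagram, where one heat map corresponds to one key point coordinate.
  15. The electronic device according to claim 14, wherein the processor is configured to perform:
    inputting each human body block diagram into the preset key point detection model to obtain multiple groups of feature maps corresponding to each human body block diagram, where each group of feature maps includes multiple feature maps of different sizes;
    performing fusion processing on the feature maps in each group of feature maps corresponding to each human body block diagram to obtain the multiple heat maps corresponding to each human body block diagram, where one group of feature maps corresponds to one heat map.
  16. The electronic device according to claim 15, wherein the processor is configured to perform:
    performing Gaussian filtering on each heat map corresponding to each human body block diagram to obtain multiple target heat maps corresponding to each human body block diagram;
    obtaining the multiple key point coordinates of the human body in each human body block diagram according to the multiple target heat maps corresponding to each human body block diagram, where one target heat map corresponds to one key point coordinate.
  17. The electronic device according to claim 14, wherein the processor is configured to perform:
    acquiring multiple sample human body block diagrams;
    acquiring multiple key point coordinates corresponding to the human body in each sample human body block diagram;
    training a preset neural network model using the multiple sample human body block diagrams and the multiple key point coordinates corresponding to the human body in each sample human body block diagram;
    using the trained neural network model as the preset key point detection model.
  18. The electronic device according to claim 13, wherein the processor is configured to perform:
    acquiring multiple groups of key point coordinates, where each group of key point coordinates includes multiple key point coordinates;
    acquiring the human body posture corresponding to each group of key point coordinates;
    training a preset shallow neural network model using the multiple groups of key point coordinates and the human body posture corresponding to each group of key point coordinates;
    using the trained shallow neural network model as the preset posture recognition model.
  19. The electronic device according to claim 13, wherein the processor is configured to perform:
    extracting at least one frame of second human body image from the video to be classified;
    determining the posture of the human body in each frame of second human body image;
    determining the category of the video to be classified according to the posture of the human body in the first human body image and the posture of the human body in each frame of second human body image.
  20. The electronic device according to claim 19, wherein the processor is configured to perform:
    determining the category corresponding to the first human body image according to the posture of the human body in the first human body image, and determining the category corresponding to each frame of second human body image according to the posture of the human body in each frame of second human body image, to obtain multiple categories;
    determining, among the multiple categories, the number of occurrences of each identical category;
    determining the identical category with the largest count as the category of the video to be classified.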
PCT/CN2019/119926 2019-11-21 2019-11-21 Method, device, storage medium, and electronic device for recognizing human posture WO2021097750A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
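PCT/CN2019/119926 WO2021097750A1 (zh) 2019-11-21 2019-11-21 Method, device, storage medium, and electronic device for recognizing human posture
CN201980100467.4A CN114402369A (zh) 2019-11-21 2019-11-21 Method, device, storage medium, and electronic device for recognizing human posture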

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/119926 WO2021097750A1 (zh) 2019-11-21 2019-11-21 Method, device, storage medium, and electronic device for recognizing human posture

Publications (1)

Publication Number Publication Date
WO2021097750A1 (zh) 2021-05-27

Family

ID=75980293

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119926 WO2021097750A1 (zh) 2019-11-21 2019-11-21 Method, device, storage medium, and electronic device for recognizing human posture

Country Status (2)

Country Link
CN (1) CN114402369A (zh)
WO (1) WO2021097750A1 (zh)

Also Published As

Publication number Publication date
CN114402369A (zh) 2022-04-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953323

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19953323

Country of ref document: EP

Kind code of ref document: A1