WO2022127494A1 - Pose recognition model training method and apparatus, pose recognition method and terminal device - Google Patents

Pose recognition model training method and apparatus, pose recognition method and terminal device

Info

Publication number
WO2022127494A1
WO2022127494A1 (PCT/CN2021/131148)
Authority
WO
WIPO (PCT)
Prior art keywords
pose
point
training
recognition model
rgb image
Prior art date
Application number
PCT/CN2021/131148
Other languages
English (en)
French (fr)
Inventor
林灿然
郭渺辰
程骏
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Publication of WO2022127494A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30244 Camera pose

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a pose recognition model training method, device, pose recognition method and terminal device.
  • RGB images are easily disturbed by background noise; when the background changes, the recognition accuracy of a model trained on such data drops sharply.
  • image-based behavior recognition methods also require a large amount of data; generally, a huge volume of data is needed to train a relatively robust model.
  • the present application proposes a pose recognition model training method, device, pose recognition method and terminal device.
  • An embodiment of the present application proposes a method for training a pose recognition model, the method including: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine training RGB images; and
  • training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  • in another embodiment, after the pose point-line diagram consisting of a preset number of pose key point coordinates is extracted from the original RGB image, the training method further includes:
  • performing feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
  • in the above training method, drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color includes:
  • drawing the pose point-line diagram on the background board; and, on the background board, marking the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
  • when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
  • the color of the predetermined background board is black, and the point-lines formed by the pose key point coordinates of the head-and-neck, arm and lower-limb feature categories are marked, in order, one each of green, blue and red.
  • in the above training method, extracting, from the original RGB image, the pose point-line diagram consisting of a preset number of pose key point coordinates by using the preset pose point-line diagram extraction method includes: extracting the position information of a preset number of key points of the original RGB image by using the OpenPose human pose estimation algorithm; and
  • determining the pose point-line diagram according to the position information of the preset number of key points.
  • Yet another embodiment of the present application provides an apparatus for training a pose recognition model, the apparatus comprising:
  • an original RGB image acquisition module, used to obtain an original RGB image;
  • a pose point-line diagram extraction module, used to extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
  • a training RGB image determination module, used to draw the pose point-line diagram on a background board in a target color different from the predetermined background board color, so as to determine training RGB images;
  • a pose recognition model training module, used to train the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  • yet another embodiment proposes a pose recognition method, including: obtaining an initial RGB image to be recognized; extracting, from the initial RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color, so as to determine a point-line RGB image to be recognized; and
  • determining the pose corresponding to the point-line RGB image by using the pose recognition model trained with the pose recognition model training method described in the embodiments of the present application.
  • the embodiments of the present application relate to a terminal device including a memory and a processor, the memory being used to store a computer program which, when run on the processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
  • the embodiments of the present application relate to a readable storage medium storing a computer program which, when run on a processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
  • the pose recognition model training method disclosed in the present application includes: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine training RGB images; and training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  • the background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts; compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
  • FIG. 1 shows a schematic flowchart of a training method for a pose recognition model proposed by an embodiment of the present application
  • FIG. 2 shows a schematic flowchart of another method for training a pose recognition model proposed by an embodiment of the present application
  • FIG. 3 shows a schematic structural diagram of a pose recognition model training device proposed by an embodiment of the present application
  • FIG. 4 shows a schematic flowchart of a pose recognition method proposed by an embodiment of the present application
  • FIG. 5 shows a schematic diagram of a pose point-line diagram composed of 18 key points proposed by an embodiment of the present application
  • FIG. 6 shows a schematic diagram of another pose point-line diagram composed of 25 key points proposed by an embodiment of the present application
  • FIG. 7 shows a schematic diagram of a point-line diagram of a hugging pose proposed by an embodiment of the present application;
  • FIG. 8 shows a schematic diagram of a point-line diagram of a raised-hand pose proposed by an embodiment of the present application;
  • FIG. 9 shows a schematic diagram of a point-line diagram of a squatting pose proposed by an embodiment of the present application;
  • FIG. 10 shows a schematic diagram of a point-line diagram of a standing pose proposed by an embodiment of the present application.
  • 1 - pose recognition model training device; 10 - original RGB image acquisition module; 20 - pose point-line diagram extraction module; 30 - training RGB image determination module; 40 - pose recognition model training module.
  • Body pose and gesture pose are important ways for humans to express and communicate.
  • Human-based behavior recognition is also one of the important research directions of human-computer interaction, which plays an important role in video surveillance, somatosensory games and other fields.
  • human behavior and posture are complex and changeable, and training a behavior recognition model usually requires a large amount of human behavior data.
  • This application proposes a method for training a pose recognition model, which reduces the amount of training data for behavior recognition tasks from the perspective of reducing feature redundancy and improving model attention.
  • the main problems to be solved by this application are: (1) human behavior is complex and changeable, and interfering factors such as the background are usually present; if the originally collected RGB image features are used for pose recognition, the model's recognition performance drops sharply once the scene changes, and in practical applications it is difficult, time-consuming and labor-intensive to customize a design for every scene; (2) training a relatively robust behavior recognition model usually requires a large amount of data and still cannot cover all scenes.
  • to solve these problems, this application proposes a pose recognition model training method that converts the original RGB image, via the pose point-line diagram extraction method, into a skeleton map containing the key point features of the human body, i.e., the pose point-line diagram.
  • the pose point-line diagram is drawn on a background board of a preset color, which decouples the background information and improves the model's focus on the features to be recognized.
  • 1) in the data pre-processing stage, the position information (horizontal and vertical coordinates) of 18 key points is extracted from the original RGB image by the OpenPose pose point-line diagram extraction method and drawn on a black background image, yielding training RGB images, in one-to-one correspondence with the original RGB images, that contain the pose point-line diagram.
  • 2) also in the data pre-processing stage, the human body is decoupled into three parts, namely head and neck, arms and lower limbs, because different behaviors depend on different body parts; the points and lines of the three parts are rendered in different colors.
  • 3) in the training stage, the rendered human skeleton map data is fed into a preset convolutional neural network for feature extraction and training.
  • FIG. 1 shows that a method for training a pose recognition model includes the following steps:
  • it can be understood that the original RGB image includes the behavior pose or gesture pose to be recognized.
  • the original RGB image can be acquired with image acquisition equipment such as video cameras, cameras or thermal imaging devices; it can also be obtained from multiple video frames of a video recording, or imported directly from a storage device on which original RGB images are stored.
  • the storage device can be an electronic device with a storage function, such as a hard disk, a USB flash drive, a tablet (PAD) or a notebook computer; this embodiment does not limit the manner in which each original RGB image is acquired.
  • RGB images are also called full-color images; they have three channels, R (red), G (green) and B (blue), i.e., RGB stands for the red, green and blue channels.
  • the RGB standard covers almost all colors perceptible to human vision and is one of the most widely used color systems.
  • the position information of a preset number of key points of the original RGB image can be extracted by using the OpenPose human body pose estimation algorithm; the pose point-line diagram is determined according to the position information of the preset number of key points.
  • OpenPose is an open-source library based on convolutional neural networks and supervised learning, developed on the Caffe framework; it can be used for pose estimation of human movements, facial expressions, finger movements and the like.
  • OpenPose can extract a pose point-line diagram composed of a preset number of pose key point coordinates from each original RGB image; that is, it can identify each human joint point or key body part in the original RGB image and output its coordinates.
  • the preset number may be 18 or 25.
  • the 18 key points extracted by OpenPose are expressed as coordinates; each key point corresponds to a specific serial number, and each serial number corresponds to a specific key body part, as shown in the table in the description.
  • the 25 key points extracted by OpenPose are likewise expressed as coordinates, and the correspondence between the 25 key points and their serial numbers is also shown in the description.
  • each key point coordinate can be correspondingly connected according to the body structure to determine the pose point line diagram.
  • the pose point-line diagram composed of 18 key points is shown in Figure 5
  • the pose point-line diagram composed of 25 key points is shown in Figure 6.
  • the generated background board has a uniform color, which can be used to make a training RGB image with a uniform background color.
  • each pose point-line diagram can be drawn on its corresponding background board in a target color different from the predetermined background board color, distinguishing the background from the pose point-line diagram to be recognized; this helps the pose recognition model identify the pose point-line diagram quickly and accurately and effectively avoids background interference.
  • multiple training RGB images are required to train the pose recognition model; the background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts. Compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
  • the training method disclosed in this embodiment includes: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine training RGB images; and training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  • in the technical solution of this embodiment, the background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts; compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
  • FIG. 2 shows that another method for training a pose recognition model includes the following steps:
  • S200 Using a preset method for extracting pose point and line diagrams, extract a pose point line diagram consisting of a preset number of pose key point coordinates from each original RGB image.
  • S300 Perform feature classification on the coordinates of each pose key point according to the attributes of the coordinates of each pose key point in the pose point-line diagram.
  • the pose point-line diagram extraction method extracts, from each original RGB image, the preset number of pose key point coordinates and the attribute corresponding to each pose key point coordinate; each coordinate is then feature-classified according to these attributes.
  • in human pose recognition, the attribute of each pose key point coordinate corresponds to a part of the body.
  • different behaviors and postures depend on different body parts: for example, hugging and hand-raising depend more on the key points of the arms, while squatting depends more on the key points of the lower body; therefore, when recognizing human poses, the key body parts can be feature-classified, and the corresponding feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
  • the key points corresponding to the left ear, left eye, nose, right eye and right ear can be connected in sequence; the key points corresponding to the nose and neck are connected; the key points corresponding to the right wrist, right elbow, right shoulder, neck, left shoulder, left elbow and left wrist can be connected in sequence; the key points corresponding to the neck, left hip, left knee and left ankle can be connected in sequence; and the key points corresponding to the neck, right hip, right knee and right ankle can be connected in sequence; the pose point-line diagram is obtained once these connections are completed.
  • if the pose point-line diagram extraction method extracts 18 key points, serial numbers 0, 1 and 14-17 belong to the head-and-neck feature category, serial numbers 2-7 to the arm feature category, and serial numbers 8-13 to the lower-limb feature category; if it extracts 25 key points, serial numbers 0, 1 and 15-18 belong to the head-and-neck feature category, serial numbers 2-7 to the arm feature category, and serial numbers 8-14 and 19-24 to the lower-limb feature category.
  • background boards whose three RGB channels are all set to the predetermined background board color can be generated, the number of background boards being the same as the number of original RGB images.
  • when the background board and the original RGB image have the same length and width, each pose point-line diagram can be drawn on its corresponding background board at equal scale; when their lengths and widths differ, each pose point-line diagram can be drawn on its corresponding background board scaled by the length ratio and the width ratio between the background board and the original RGB image.
  • the point-lines included in the head-and-neck feature category, the arm feature category and the lower-limb feature category can be marked in different colors; for example, they can be labeled green, blue and red in that order.
  • the head and neck are marked in green,
  • the arms are marked in blue, and
  • the lower limbs are marked in red; because the arms and lower limbs are used more frequently and contribute more to the recognition of human posture, using the highly distinguishable blue and red gives the arm and lower-limb features greater separation.
  • the arms can also be marked in red and the lower limbs in blue; any assignment that increases the distinction between the arm and lower-limb features is acceptable, and this embodiment does not limit it here.
  • exemplarily, "color_head=[0,255,0]" can be used to mark the point-lines formed by the pose key point coordinates of the head-and-neck feature category green, "color_arm=[0,0,255]" to mark the point-lines of the arm feature category blue, and "color_leg=[255,0,0]" to mark the point-lines of the lower-limb feature category red.
  • the color of the predetermined background board is black, because drawing the human pose point-line diagram on a black background board reduces background interference and highlights the human pose information to be learned, making the pose recognition model focus on learning that information.
  • exemplarily, "image[:, :, 0:3] = 0" can be used to generate a black background board with the same length and width as the original RGB image; for such an image array the first index is the height, the second the width and the third the channel, "0:3" selects the three channels 0, 1 and 2, representing R (red), G (green) and B (blue), and "= 0" sets them all to zero, producing a black background.
  • the pose recognition model can be a convolutional neural network, for example one of ResNet18, MobileNet v2 and ShuffleNet v2; because convolutional neural networks are sensitive to features such as color and texture, the rendered training RGB images are very suitable for feature extraction and learning by a convolutional neural network, so the amount of training data required by the pose recognition model can be greatly reduced: only a few rounds of training with a small amount of training data are needed to obtain a pose recognition model with good robustness.
  • exemplarily, comparative experiments show that when the original RGB images are used to train a pose recognition model built on the ResNet18 network,
  • at least 500 original RGB images per pose category are required for the model to accurately recognize hugging, raising hands, squatting and standing; even then all scenes cannot be covered, and the recognition performance of the trained model is unsatisfactory.
  • by contrast, training the ResNet18-based pose recognition model with the training RGB images proposed in the embodiments of the present application requires only 50 original RGB images per pose category, and a highly robust pose recognition model is obtained after just a few rounds of training.
  • the point-line diagram of the hugging pose is shown in FIG. 7,
  • the point-line diagram of the raised-hand pose is shown in FIG. 8,
  • the point-line diagram of the squatting pose is shown in FIG. 9, and
  • the point-line diagram of the standing pose is shown in FIG. 10; it can be understood that in FIGS. 7 to 10 the head and neck are marked in green, the arms in blue and the lower limbs in red.
  • a pose recognition model training device 1 includes: an original RGB image acquisition module 10, a pose point-line diagram extraction module 20, a training RGB image determination module 30 and a pose recognition model training module 40.
  • the original RGB image acquisition module 10 is used to obtain the original RGB image; the pose point-line diagram extraction module 20 is used to extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; the training RGB image determination module 30 is used to draw the pose point-line diagram on a background board in a target color different from the predetermined background board color, so as to determine training RGB images; and
  • the pose recognition model training module 40 is configured to train the pose recognition model with the plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  • the pose recognition model training device 1 also includes:
  • the feature classification module is configured to perform feature classification on the coordinates of each pose key point according to the attributes of the coordinates of each pose key point in the pose point-line graph.
  • using the preset pose point-line diagram extraction method to extract the pose point-line diagram consisting of a preset number of pose key point coordinates from the original RGB image includes: extracting the position information of 18 key points of the original RGB image with the OpenPose human pose estimation algorithm, and
  • determining the pose point-line diagram according to the position information of the 18 key points.
  • the training RGB image determination module 30 includes:
  • a pose point-line diagram drawing unit, used to draw each pose point-line diagram on its background board; and
  • a feature category color distinguishing unit, used to mark, on the background board, the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
  • when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
  • the color of the predetermined background board is black;
  • the point-lines of the head-and-neck, arm and lower-limb feature categories are each marked one of green, blue and red, and
  • the marking colors of the point-lines formed by the pose key point coordinates of the different feature categories differ from one another and from the color of the predetermined background board.
  • in the pose recognition model training device, the predetermined pose recognition model includes one of ResNet18, MobileNet v2 and ShuffleNet v2.
  • the pose recognition model training device 1 disclosed in this embodiment uses the original RGB image acquisition module 10, the pose point-line diagram extraction module 20, the training RGB image determination module 30 and the pose recognition model training module 40 in cooperation to execute
  • the pose recognition model training method described in the foregoing embodiments; the implementations and beneficial effects involved in those embodiments also apply in this embodiment and are not repeated here.
  • a pose recognition method includes the following steps: obtaining an initial RGB image to be recognized; extracting, from the initial RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; and drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color, so as to determine a point-line RGB image to be recognized.
  • S41: determine the pose corresponding to the point-line RGB image by using a pose recognition model trained in advance to meet the target.
  • by predicting the point-line RGB image with a pose recognition model trained using the pose recognition model training method of the embodiments of the present application, the probability that each point-line RGB image belongs to each category can be obtained; it can be understood that the category with the largest probability value is the behavior finally recognized by the pose recognition model.
  • step S21 may further include: performing feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
  • when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
  • step S31 includes:
  • drawing each pose point-line diagram on its corresponding background board, and, on each background board, marking the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
  • the color of the predetermined background board is black,
  • the point-lines formed by the pose key point coordinates of the head-and-neck, arm and lower-limb feature categories are marked, in order, one each of green, blue and red, and
  • the marking colors of the point-lines of the different feature categories differ from one another and from the color of the predetermined background board.
  • the embodiments of the present application relate to a terminal device including a memory and a processor, the memory being used to store a computer program which, when run on the processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
  • the embodiments of the present application relate to a readable storage medium storing a computer program which, when run on a processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
  • each block in the flowcharts or block diagrams may represent a module, segment or portion of code containing one or more executable instructions for implementing the specified logical function(s); it should also be noted that, in alternative implementations, the functions noted in a block may occur out of the order noted in the figures.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in them, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • each functional module or unit in each embodiment of the present application may be integrated together to form an independent part, or each module may exist independently, or two or more modules may be integrated to form an independent part.
  • if the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • on this understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a smartphone, a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Abstract

The embodiments of the present application disclose a pose recognition model training method and apparatus, a pose recognition method and a terminal device. The training method includes: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine a training RGB image; and training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges. The technical solution of the present application enables the training of the pose recognition model to be completed with fewer training RGB images, reducing the training time of the pose recognition model.

Description

Pose recognition model training method and apparatus, pose recognition method and terminal device
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202011503576.1, entitled "Pose recognition model training method and apparatus, pose recognition method and terminal device", filed with the China Patent Office on December 18, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a pose recognition model training method and apparatus, a pose recognition method and a terminal device.
Background
With the continuous development of deep-learning technology, more and more research addresses image-based behavior recognition. However, RGB images are easily disturbed by background noise: when the background changes, the recognition accuracy of a model trained on such data drops sharply. At the same time, image-based behavior recognition methods demand relatively large amounts of data, and a huge volume of data is generally required to train a relatively robust model.
Summary
In view of the above problems, the present application proposes a pose recognition model training method and apparatus, a pose recognition method and a terminal device.
An embodiment of the present application proposes a pose recognition model training method, the pose recognition model training method including:
obtaining an original RGB image;
extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine a training RGB image;
training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
In the pose recognition model training method of another embodiment of the present application, after the pose point-line diagram consisting of a preset number of pose key point coordinates is extracted from the original RGB image, the method further includes:
performing feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
In the above pose recognition model training method, drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color includes:
drawing the pose point-line diagram on the background board;
on the background board, marking the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
In the above pose recognition model training method, when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
In the above pose recognition model training method, the predetermined background board color is black, and the point-lines formed by the pose key point coordinates of the head-and-neck, arm and lower-limb feature categories are marked, in order, one each of green, blue and red.
In the above pose recognition model training method, extracting, from the original RGB image, the pose point-line diagram consisting of a preset number of pose key point coordinates by using the preset pose point-line diagram extraction method includes:
extracting the position information of a preset number of key points of the original RGB image by using the OpenPose human pose estimation algorithm;
determining the pose point-line diagram according to the position information of the preset number of key points.
A further embodiment of the present application proposes a pose recognition model training apparatus, the apparatus including:
an original RGB image acquisition module, used to obtain an original RGB image;
a pose point-line diagram extraction module, used to extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
a training RGB image determination module, used to draw the pose point-line diagram on a background board in a target color different from the predetermined background board color, so as to determine a training RGB image;
a pose recognition model training module, used to train the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
Yet another embodiment of the present application proposes a pose recognition method, the pose recognition method including:
obtaining an initial RGB image to be recognized;
extracting, from the initial RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color, so as to determine a point-line RGB image to be recognized;
determining the pose corresponding to the point-line RGB image by using a pose recognition model trained with the pose recognition model training method described in the embodiments of the present application.
The embodiments of the present application relate to a terminal device including a memory and a processor, the memory being used to store a computer program which, when run on the processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
The embodiments of the present application relate to a readable storage medium storing a computer program which, when run on a processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
The pose recognition model training method disclosed in the present application includes: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine training RGB images; and training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges. In the technical solution of the present application, the background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts; compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
Brief Description of the Drawings
To explain the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and should therefore not be regarded as limiting its scope. In the drawings, similar components are given similar reference numbers.
FIG. 1 is a schematic flowchart of a pose recognition model training method proposed by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another pose recognition model training method proposed by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a pose recognition model training apparatus proposed by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a pose recognition method proposed by an embodiment of the present application;
FIG. 5 is a schematic diagram of a pose point-line diagram composed of 18 key points proposed by an embodiment of the present application;
FIG. 6 is a schematic diagram of another pose point-line diagram composed of 25 key points proposed by an embodiment of the present application;
FIG. 7 is a schematic diagram of a point-line diagram of a hugging pose proposed by an embodiment of the present application;
FIG. 8 is a schematic diagram of a point-line diagram of a raised-hand pose proposed by an embodiment of the present application;
FIG. 9 is a schematic diagram of a point-line diagram of a squatting pose proposed by an embodiment of the present application;
FIG. 10 is a schematic diagram of a point-line diagram of a standing pose proposed by an embodiment of the present application.
Description of main reference signs:
1 - pose recognition model training apparatus; 10 - original RGB image acquisition module; 20 - pose point-line diagram extraction module; 30 - training RGB image determination module; 40 - pose recognition model training module.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application.
The components of the embodiments of the present application, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
In the following, the terms "include", "have" and their cognates, which may be used in the various embodiments of the present application, are intended only to indicate specific features, numbers, steps, operations, elements, components or combinations of the foregoing, and should not be understood as excluding the existence or possible addition of one or more other features, numbers, steps, operations, elements, components or combinations of the foregoing.
Furthermore, the terms "first", "second", "third" and so on are used only to distinguish descriptions and cannot be understood as indicating or implying relative importance.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meanings as commonly understood by a person of ordinary skill in the art to which the various embodiments of the present application belong. Such terms (including those defined in commonly used dictionaries) are to be interpreted as having the same meaning as their contextual meaning in the relevant technical field and are not to be interpreted in an idealized or overly formal sense, unless clearly so defined in the various embodiments of the present application.
Body poses and gesture poses are important ways for humans to express themselves and communicate, and human behavior recognition is one of the important research directions of human-computer interaction, playing an important role in fields such as video surveillance and somatosensory games. Human behavior and posture are complex and changeable, and training a behavior recognition model usually requires a large amount of human behavior data. The present application proposes a method for training a pose recognition model that reduces the amount of training data needed for behavior recognition tasks from the perspectives of reducing feature redundancy and improving model attention. Further, the main problems solved by the present application are: (1) human behavior is complex and changeable, and interfering factors such as the background are usually present; if the originally collected RGB image features are used for pose recognition, the model's recognition performance drops sharply once the scene changes, and in practical applications it is difficult, time-consuming and labor-intensive to customize a design for every scene; (2) training a relatively robust behavior recognition model usually requires a large amount of data, which cannot cover all scenes.
To solve the above problems, the present application proposes a pose recognition model training method that converts the original RGB image, via a pose point-line diagram extraction method, into a skeleton map containing the key point features of the human body, i.e., the pose point-line diagram; by drawing the pose point-line diagram on a background board of a preset color, the background information is decoupled and the model's focus on feature recognition is improved.
The present application includes the following key steps:
1) In the data pre-processing stage, the position information (horizontal and vertical coordinates) of 18 key points of the original RGB image is extracted by the OpenPose pose point-line diagram extraction method and drawn on a black background image, yielding training RGB images, in one-to-one correspondence with the original RGB images, that include the pose point-line diagram.
2) In the data pre-processing stage, the human body is decoupled into three parts, namely head and neck, arms and lower limbs, because different behaviors depend on different body parts; the points and lines of the three parts are rendered in different colors.
3) In the training stage, the rendered human skeleton map data is fed into a preset convolutional neural network for feature extraction and training.
Embodiment 1
Referring to FIG. 1, this embodiment shows a pose recognition model training method including the following steps:
S10: obtain an original RGB image.
It can be understood that the original RGB image includes the behavior pose or gesture pose to be recognized. The original RGB image can be acquired with image acquisition equipment such as video cameras, cameras or thermal imaging devices; it can also be obtained from multiple video frames of a video recording, or imported directly from a storage device on which original RGB images are stored. For example, the storage device can be an electronic device with a storage function, such as a hard disk, a USB flash drive, a tablet (PAD) or a notebook computer. This embodiment does not limit the manner in which each original RGB image is acquired.
RGB images are also called full-color images; they have three channels, R (red), G (green) and B (blue). RGB stands for the colors of the red, green and blue channels; the RGB standard covers almost all colors perceptible to human vision and is one of the most widely used color systems.
S20: extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method.
The position information of a preset number of key points of the original RGB image can be extracted with the OpenPose human pose estimation algorithm, and the pose point-line diagram is determined according to the position information of the preset number of key points.
OpenPose is an open-source library based on convolutional neural networks and supervised learning, developed on the Caffe framework; it can be used for pose estimation of human movements, facial expressions, finger movements and the like. OpenPose can extract a pose point-line diagram composed of a preset number of pose key point coordinates from each original RGB image; that is, it can identify each human joint point or key body part in the original RGB image and output its coordinates. The preset number may be 18 or 25.
Exemplarily, the 18 key points extracted by OpenPose are expressed as coordinates; each key point corresponds to a specific serial number, and each serial number corresponds to a specific key body part, as shown in the table below.
Key part    No.    X (abscissa)    Y (ordinate)
Nose    0    379.16229    172.26562
Neck    1    381.96054    270.70312
Right shoulder    2    320.39914    263.67188
Right elbow    3    289.61844    348.04688
Right wrist    4    284.02194    446.48438
Left shoulder    5    435.1272    270.70312
Left elbow    6    457.51315    362.10938
Left wrist    7    457.51315    453.51562
Right hip    8    331.5921    460.54688
Right knee    9    312.00439    608.20312
Right ankle    10    300.8114    748.82812
Left hip    11    401.54825    474.60938
Left knee    12    390.35529    615.23438
Left ankle    13    373.5658    741.79688
Right eye    14    370.76755    158.20312
Left eye    15    387.55704    158.20312
Right ear    16    353.97809    179.29688
Left ear    17    404.3465    179.29688
Exemplarily, the 25 key points extracted by OpenPose are expressed as coordinates; each key point corresponds to a specific serial number, and each serial number corresponds to a specific key body part. The correspondence between the 25 key points and their serial numbers is shown in the table below.
Key part    No.
Nose    0
Neck    1
Right shoulder    2
Right elbow    3
Right wrist    4
Left shoulder    5
Left elbow    6
Left wrist    7
Mid hip    8
Right hip    9
Right knee    10
Right ankle    11
Left hip    12
Left knee    13
Left ankle    14
Right eye    15
Left eye    16
Right ear    17
Left ear    18
Left big toe    19
Left small toe    20
Left heel    21
Right big toe    22
Right small toe    23
Right heel    24
Obviously, using the preset number of pose key point coordinates and the body part corresponding to each pose key point coordinate, the key point coordinates can be connected according to the body structure to determine the pose point-line diagram. The pose point-line diagram composed of 18 key points is shown in FIG. 5, and the pose point-line diagram composed of 25 key points is shown in FIG. 6.
S30: draw the pose point-line diagram on a background board in a target color different from the predetermined background board color, so as to determine a training RGB image.
A background board whose three RGB channels are all the predetermined background board color can be generated in advance; the generated background board has a uniform color and can be used to produce training RGB images with a uniform background color.
Each pose point-line diagram can be drawn on its corresponding background board in a target color different from the predetermined background board color, distinguishing the background from the pose point-line diagram to be recognized. This helps the pose recognition model identify the pose point-line diagram quickly and accurately and effectively avoids background interference.
S40: train the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
Multiple training RGB images are required to train the pose recognition model. The background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts; compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
The pose recognition model training method disclosed in this embodiment includes: obtaining an original RGB image; extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine training RGB images; and training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges. In the technical solution of this embodiment, the background color of each training RGB image is uniform, and the pose point-line diagram on a training RGB image includes only the point-line connection relationships of key body parts; compared with the original training samples, i.e., using the original RGB images directly as training samples, training RGB images effectively reduce sample complexity, allow the training of the pose recognition model to be completed with fewer training RGB images, and reduce the training time of the pose recognition model.
Embodiment 2
Referring to FIG. 2, this embodiment shows another pose recognition model training method including the following steps:
S100: obtain the original RGB images.
S200: extract, from each original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method.
S300: perform feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
The pose point-line diagram extraction method is used to extract, from each original RGB image, the preset number of pose key point coordinates and the attribute corresponding to each pose key point coordinate; each pose key point coordinate is then feature-classified according to these attributes.
Exemplarily, in human pose recognition the attribute of each pose key point coordinate corresponds to a part of the body. Different behaviors and postures depend on different body parts: for example, hugging and hand-raising depend more on the key points of the arms, while squatting depends more on the key points of the lower body. Therefore, when recognizing human poses, the key body parts can be feature-classified, and the corresponding feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
Further, in human pose recognition, the key points corresponding to the left ear, left eye, nose, right eye and right ear can be connected in sequence; the key points corresponding to the nose and neck are connected; the key points corresponding to the right wrist, right elbow, right shoulder, neck, left shoulder, left elbow and left wrist can be connected in sequence; the key points corresponding to the neck, left hip, left knee and left ankle can be connected in sequence; and the key points corresponding to the neck, right hip, right knee and right ankle can be connected in sequence. The pose point-line diagram is obtained once these connections are completed.
Further, if the pose point-line diagram extraction method extracts 18 key points, serial numbers 0, 1 and 14-17 belong to the head-and-neck feature category, serial numbers 2-7 to the arm feature category, and serial numbers 8-13 to the lower-limb feature category; if it extracts 25 key points, serial numbers 0, 1 and 15-18 belong to the head-and-neck feature category, serial numbers 2-7 to the arm feature category, and serial numbers 8-14 and 19-24 to the lower-limb feature category.
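Written out as plain Python dictionaries, the serial-number groupings from the paragraph above look as follows (the grouping is from the description; the dictionary names are ours):

```python
# Serial-number groupings of the feature categories for the 18-key-point
# and 25-key-point extraction modes described above.
FEATURE_CATEGORIES_18 = {
    "head_neck":  [0, 1, 14, 15, 16, 17],
    "arm":        [2, 3, 4, 5, 6, 7],
    "lower_limb": [8, 9, 10, 11, 12, 13],
}
FEATURE_CATEGORIES_25 = {
    "head_neck":  [0, 1, 15, 16, 17, 18],
    "arm":        [2, 3, 4, 5, 6, 7],
    "lower_limb": [8, 9, 10, 11, 12, 13, 14, 19, 20, 21, 22, 23, 24],
}
```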
S400: draw each pose point-line diagram on its corresponding background board.
Background boards whose three RGB channels are all set to the predetermined background board color can be generated; the number of background boards is the same as the number of original RGB images.
It can be understood that when the background board and the original RGB image have the same length and width, each pose point-line diagram can be drawn on its corresponding background board at equal scale; when their lengths and widths differ, each pose point-line diagram can be drawn on its corresponding background board scaled by the length ratio and the width ratio between the background board and the original RGB image.
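A minimal sketch of this proportional mapping, assuming the key points are held in a NumPy array of (x, y) rows (the function name and array layout are our assumptions, not the patent's):

```python
import numpy as np

def scale_keypoints(keypoints: np.ndarray, orig_w: int, orig_h: int,
                    board_w: int, board_h: int) -> np.ndarray:
    """Map (x, y) key points from the original image onto a background board
    of a different size using the width ratio and the length (height) ratio."""
    scaled = keypoints.astype(np.float64).copy()
    scaled[:, 0] *= board_w / orig_w  # width ratio
    scaled[:, 1] *= board_h / orig_h  # length (height) ratio
    return scaled
```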
S500: on each background board, mark the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color, so as to determine the corresponding training RGB images.
The point-lines included in the head-and-neck feature category, the arm feature category and the lower-limb feature category can be marked in different colors; for example, they can be labeled green, blue and red in that order. Exemplarily, the head and neck are marked in green, the arms in blue and the lower limbs in red. Because the arms and lower limbs are used more frequently and contribute more to the recognition of human posture, using the highly distinguishable blue and red gives the arm and lower-limb features greater separation.
It can be understood that the arms can also be marked in red and the lower limbs in blue; any assignment that increases the distinction between the arm and lower-limb features is acceptable, and this embodiment does not limit it here.
Exemplarily, "color_head=[0,255,0]" can be used to mark the point-lines formed by the pose key point coordinates of the head-and-neck feature category green, "color_arm=[0,0,255]" to mark the point-lines of the arm feature category blue, and "color_leg=[255,0,0]" to mark the point-lines of the lower-limb feature category red.
The predetermined background board color is black, because drawing the human pose point-line diagram on a black background board reduces background interference and highlights the human pose information to be learned, making the pose recognition model focus on learning that information.
Exemplarily, "image[:, :, 0:3] = 0" can be used to generate a background board whose three RGB channels are all black and whose length and width are the same as those of the original RGB image. For such an image array the first index is the height, the second the width and the third the channel; "0:3" selects the three channels 0, 1 and 2, representing R (red), G (green) and B (blue), and "= 0" sets them all to zero, producing a background board with a black background.
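Putting S400 and S500 together, the following is a hedged sketch of the rendering under the 18-key-point numbering used above. The three colours and the black board come straight from the description, while the edge list and helper name are our illustration (missing or low-confidence key points would need filtering before drawing):

```python
import numpy as np
import cv2

# colours as written in the description (channel order as given there)
color_head = [0, 255, 0]   # head-and-neck category: green
color_arm  = [0, 0, 255]   # arm category: blue
color_leg  = [255, 0, 0]   # lower-limb category: red

# key point index pairs to connect, grouped by feature category; this edge
# list follows the connection order described in the text above
EDGES = {
    "head_neck":  [(17, 15), (15, 0), (0, 14), (14, 16), (0, 1)],
    "arm":        [(4, 3), (3, 2), (2, 1), (1, 5), (5, 6), (6, 7)],
    "lower_limb": [(1, 11), (11, 12), (12, 13), (1, 8), (8, 9), (9, 10)],
}
COLORS = {"head_neck": color_head, "arm": color_arm, "lower_limb": color_leg}

def render_training_image(keypoints: np.ndarray, h: int, w: int) -> np.ndarray:
    """keypoints: (18, 2) array of (x, y); returns the training RGB image."""
    board = np.zeros((h, w, 3), dtype=np.uint8)  # image[:, :, 0:3] = 0 -> black board
    for category, edges in EDGES.items():
        for a, b in edges:
            pa = tuple(keypoints[a].astype(int))
            pb = tuple(keypoints[b].astype(int))
            cv2.line(board, pa, pb, COLORS[category], thickness=3)
            cv2.circle(board, pa, 4, COLORS[category], thickness=-1)
            cv2.circle(board, pb, 4, COLORS[category], thickness=-1)
    return board
```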
S600: train the pose recognition model with the training RGB images until the loss function corresponding to the pose recognition model converges.
The pose recognition model can be a convolutional neural network, for example one of ResNet18, MobileNet v2 and ShuffleNet v2. Because convolutional neural networks are sensitive to features such as color and texture, the rendered training RGB images are very suitable for feature extraction and learning by a convolutional neural network; the amount of training data required by the pose recognition model can therefore be greatly reduced, and only a few rounds of training with a small amount of training data are needed to obtain a pose recognition model with good robustness.
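For concreteness, here is a minimal PyTorch sketch of this training step with ResNet18 on the four pose categories used in the experiments below; the folder layout, hyperparameters and the convergence test are our assumptions, not the patent's:

```python
# Hedged sketch: fine-tune ResNet18 on rendered training RGB images for a
# 4-class pose task (hug, raise-hand, squat, stand).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# assumed layout: one folder per pose class, holding rendered point-line images
train_set = datasets.ImageFolder("train_rgb_images/", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=None)            # pretrained=False on older torchvision
model.fc = nn.Linear(model.fc.in_features, 4)    # 4 pose categories

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

prev_loss, epoch = float("inf"), 0
while True:  # train until the loss function converges
    epoch_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    epoch += 1
    if abs(prev_loss - epoch_loss) < 1e-3 or epoch >= 50:  # simple convergence test
        break
    prev_loss = epoch_loss
```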
Exemplarily, comparative experiments show that when original RGB images are used to train a pose recognition model built on the ResNet18 network, at least 500 original RGB images per pose category are required for the model to accurately recognize hugging, raising hands, squatting and standing; even then all scenes cannot be covered, and the recognition performance of the trained model is not ideal. By contrast, training the ResNet18-based pose recognition model with the training RGB images proposed in the embodiments of the present application requires only 50 original RGB images per pose category, and a highly robust pose recognition model is obtained after just a few rounds of training.
Exemplarily, the point-line diagram of the hugging pose is shown in FIG. 7, that of the raised-hand pose in FIG. 8, that of the squatting pose in FIG. 9 and that of the standing pose in FIG. 10. It can be understood that in FIGS. 7 to 10 the head and neck are marked in green, the arms in blue and the lower limbs in red.
Embodiment 3
Referring to FIG. 3, this embodiment shows a pose recognition model training apparatus 1 including: an original RGB image acquisition module 10, a pose point-line diagram extraction module 20, a training RGB image determination module 30 and a pose recognition model training module 40.
The original RGB image acquisition module 10 is used to obtain an original RGB image; the pose point-line diagram extraction module 20 is used to extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method; the training RGB image determination module 30 is used to draw the pose point-line diagram on a background board in a target color different from the predetermined background board color, so as to determine training RGB images; and the pose recognition model training module 40 is used to train the pose recognition model with the plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
Further, the pose recognition model training apparatus 1 also includes:
a feature classification module, used to perform feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
Further, extracting the pose point-line diagram consisting of a preset number of pose key point coordinates from the original RGB image using the preset pose point-line diagram extraction method includes:
extracting the position information of 18 key points of the original RGB image using the OpenPose human pose estimation algorithm;
determining the pose point-line diagram according to the position information of the 18 key points.
Further, the training RGB image determination module 30 includes:
a pose point-line diagram drawing unit, used to draw each pose point-line diagram on its background board; and a feature category color distinguishing unit, used to mark, on the background board, the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
Further, in the pose recognition model training apparatus, when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
Further, in the pose recognition model training apparatus, the predetermined background board color is black; the point-lines of the head-and-neck, arm and lower-limb feature categories are each marked one of green, blue and red, and the marking colors of the point-lines formed by the pose key point coordinates of the different feature categories differ from one another and from the predetermined background board color.
Further, in the pose recognition model training apparatus, the predetermined pose recognition model includes one of ResNet18, MobileNet v2 and ShuffleNet v2.
The pose recognition model training apparatus 1 disclosed in this embodiment uses the original RGB image acquisition module 10, the pose point-line diagram extraction module 20, the training RGB image determination module 30 and the pose recognition model training module 40 in cooperation to execute the pose recognition model training method described in the foregoing embodiments; the implementations and beneficial effects involved in those embodiments also apply in this embodiment and are not repeated here.
Embodiment 4
Referring to FIG. 4, this embodiment shows a pose recognition method including the following steps:
S11: obtain an initial RGB image to be recognized.
S21: extract, from the initial RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method.
S31: draw the pose point-line diagram on the background board in a target color different from the predetermined background board color, so as to determine a point-line RGB image to be recognized.
S41: determine the pose corresponding to the point-line RGB image by using a pose recognition model trained in advance to meet the target.
By predicting the point-line RGB image with a pose recognition model trained using the pose recognition model training method of the embodiments of the present application, the probability that each point-line RGB image belongs to each category can be obtained; it can be understood that the category with the largest probability value is the behavior finally recognized by the pose recognition model.
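A short sketch of this prediction step, reusing the `model` and `transform` from the training sketch in Embodiment 2; the class-name order is an illustrative assumption:

```python
import torch
import torch.nn.functional as F
from PIL import Image

CLASSES = ["hug", "raise_hand", "squat", "stand"]  # illustrative label order

point_line_image = Image.open("point_line_rgb.jpg")  # rendered image to recognise
model.eval()
with torch.no_grad():
    logits = model(transform(point_line_image).unsqueeze(0))
    probs = F.softmax(logits, dim=1).squeeze(0)   # one probability per category

pred = CLASSES[int(probs.argmax())]               # largest probability wins
print({c: round(float(p), 3) for c, p in zip(CLASSES, probs)}, "->", pred)
```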
Further, step S21 may also include: performing feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
Further, when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
Further, step S31 includes:
drawing each pose point-line diagram on its corresponding background board; on each background board, marking the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
Further, the predetermined background board color is black; the point-lines formed by the pose key point coordinates of the head-and-neck, arm and lower-limb feature categories are marked, in order, one each of green, blue and red, and the marking colors of the point-lines of the different feature categories differ from one another and from the color of the predetermined background board.
It can be understood that the embodiments of the present application relate to a terminal device including a memory and a processor, the memory being used to store a computer program which, when run on the processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
It can be understood that the embodiments of the present application relate to a readable storage medium storing a computer program which, when run on a processor, performs the pose recognition model training method or the pose recognition method described in the embodiments of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and structural diagrams in the drawings show the possible architectures, functions and operations of the apparatuses, methods and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that in alternative implementations the functions noted in a block may occur out of the order noted in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the structural diagrams and/or flowcharts, and combinations of blocks in the structural diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules or units in the embodiments of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the essence of the technical solution of the present application, or the part that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a smartphone, a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that can readily occur to those skilled in the art within the technical scope disclosed by the present application shall be covered by the protection scope of the present application.

Claims (10)

  1. A pose recognition model training method, characterized in that the pose recognition model training method includes:
    obtaining an original RGB image;
    extracting, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
    drawing the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine a training RGB image;
    training the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  2. The pose recognition model training method according to claim 1, characterized in that, after extracting the pose point-line diagram consisting of a preset number of pose key point coordinates from the original RGB image, the method further includes:
    performing feature classification on each pose key point coordinate according to the attributes of the pose key point coordinates in the pose point-line diagram.
  3. The pose recognition model training method according to claim 2, characterized in that drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color includes:
    drawing the pose point-line diagram on the background board;
    on the background board, marking the point-lines formed by the pose key point coordinates of the same feature category in the same preset color, the point-lines of different feature categories being marked in colors that differ from one another and from the predetermined background board color.
  4. The pose recognition model training method according to claim 3, characterized in that, when recognizing human poses, the feature categories include a head-and-neck feature category, an arm feature category and a lower-limb feature category.
  5. The pose recognition model training method according to claim 4, characterized in that the predetermined background board color is black, and the point-lines formed by the pose key point coordinates of the head-and-neck, arm and lower-limb feature categories are each marked one of green, blue and red.
  6. The pose recognition model training method according to any one of claims 1 to 5, characterized in that extracting, from the original RGB image, the pose point-line diagram consisting of a preset number of pose key point coordinates by using the preset pose point-line diagram extraction method includes:
    extracting the position information of a preset number of key points of the original RGB image by using the OpenPose human pose estimation algorithm;
    determining the pose point-line diagram according to the position information of the preset number of key points.
  7. A pose recognition model training apparatus, characterized in that the apparatus includes:
    an original RGB image acquisition module, used to obtain an original RGB image;
    a pose point-line diagram extraction module, used to extract, from the original RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
    a training RGB image determination module, used to draw the pose point-line diagram on a background board in a target color different from a predetermined background board color, so as to determine a training RGB image;
    a pose recognition model training module, used to train the pose recognition model with a plurality of training RGB images until the loss function corresponding to the pose recognition model converges.
  8. A pose recognition method, characterized in that the pose recognition method includes:
    obtaining an initial RGB image to be recognized;
    extracting, from the initial RGB image, a pose point-line diagram consisting of a preset number of pose key point coordinates by using a preset pose point-line diagram extraction method;
    drawing the pose point-line diagram on the background board in a target color different from the predetermined background board color, so as to determine a point-line RGB image to be recognized;
    determining the pose corresponding to the point-line RGB image by using a pose recognition model trained with the pose recognition model training method according to any one of claims 1 to 5.
  9. A terminal device, characterized in that it includes a memory and a processor, the memory being used to store a computer program which, when run on the processor, performs the pose recognition model training method according to any one of claims 1 to 6 or the pose recognition method according to claim 8.
  10. A readable storage medium, characterized in that it stores a computer program which, when run on a processor, performs the pose recognition model training method according to any one of claims 1 to 6 or the pose recognition method according to claim 8.
PCT/CN2021/131148 2020-12-18 2021-11-17 Pose recognition model training method and apparatus, pose recognition method and terminal device WO2022127494A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011503576.1A CN112489129A (zh) 2020-12-18 2020-12-18 位姿识别模型训练方法、装置、位姿识别方法和终端设备
CN202011503576.1 2020-12-18

Publications (1)

Publication Number Publication Date
WO2022127494A1 (zh)

Family

ID=74914695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/131148 WO2022127494A1 (zh) 2020-12-18 2021-11-17 Pose recognition model training method and apparatus, pose recognition method and terminal device

Country Status (2)

Country Link
CN (1) CN112489129A (zh)
WO (1) WO2022127494A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116509333A (zh) * 2023-05-20 2023-08-01 中国医学科学院阜外医院 Human balance ability assessment method, system and device based on binocular images

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489129A (zh) * 2020-12-18 2021-03-12 深圳市优必选科技股份有限公司 Pose recognition model training method and apparatus, pose recognition method and terminal device
CN113077512B (zh) * 2021-03-24 2022-06-28 浙江中体文化集团有限公司 RGB-D pose recognition model training method and system
CN113362467B (zh) * 2021-06-08 2023-04-07 武汉理工大学 Mobile-end three-dimensional pose estimation method based on point cloud preprocessing and ShuffleNet

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886089A (zh) * 2017-12-11 2018-04-06 深圳市唯特视科技有限公司 Three-dimensional human pose estimation method based on skeleton map regression
CN108846365A (zh) * 2018-06-24 2018-11-20 深圳市中悦科技有限公司 Method, apparatus, storage medium and processor for detecting fighting behavior in video
US20200160046A1 (en) * 2017-06-30 2020-05-21 The Johns Hopkins University Systems and method for action recognition using micro-doppler signatures and recurrent neural networks
CN111274998A (zh) * 2020-02-17 2020-06-12 上海交通大学 Parkinson's disease finger-tapping action recognition method and system, storage medium and terminal
CN112489129A (zh) * 2020-12-18 2021-03-12 深圳市优必选科技股份有限公司 Pose recognition model training method and apparatus, pose recognition method and terminal device

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3721623B2 (ja) * 1995-12-29 2005-11-30 カシオ計算機株式会社 Drawing color changing method and moving image playback device
JP4067957B2 (ja) * 2002-12-20 2008-03-26 富士通株式会社 Boundary detection method, program and image processing device
CN106991449B (zh) * 2017-04-10 2020-10-23 大连大学 Method for assisting blueberry variety identification through life-scene reconstruction
CN108510491B (zh) * 2018-04-04 2020-07-24 深圳市未来媒体技术研究院 Method for filtering human skeleton key point detection results against a blurred background
CN109686198A (zh) * 2019-02-25 2019-04-26 韩嘉言 Swing-motion experimental measurement method and system based on computer vision technology
CN109902659B (zh) * 2019-03-15 2021-08-20 北京字节跳动网络技术有限公司 Method and apparatus for processing human body images
CN110378432B (zh) * 2019-07-24 2022-04-12 阿里巴巴(中国)有限公司 Picture generation method, apparatus, medium and electronic device
CN110659683A (zh) * 2019-09-20 2020-01-07 杭州智团信息技术有限公司 Image processing method, apparatus and electronic device
CN110598675B (zh) * 2019-09-24 2022-10-11 深圳度影医疗科技有限公司 Ultrasonic fetal posture recognition method, storage medium and electronic device
CN111079543B (zh) * 2019-11-20 2022-02-15 浙江工业大学 Efficient vehicle color recognition method based on deep learning
CN111046819B (zh) * 2019-12-18 2023-09-05 浙江大华技术股份有限公司 Behavior recognition processing method and apparatus
CN111832383B (zh) * 2020-05-08 2023-12-08 北京嘀嘀无限科技发展有限公司 Training method for posture key point recognition model, posture recognition method and apparatus
CN111753656B (zh) * 2020-05-18 2024-04-26 深圳大学 Feature extraction method, apparatus and device, and computer-readable storage medium
CN111723687A (zh) * 2020-06-02 2020-09-29 北京的卢深视科技有限公司 Human action recognition method and apparatus based on neural network
CN111915676B (zh) * 2020-06-17 2023-09-22 深圳大学 Image generation method and apparatus, computer device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200160046A1 (en) * 2017-06-30 2020-05-21 The Johns Hopkins University Systems and method for action recognition using micro-doppler signatures and recurrent neural networks
CN107886089A (zh) * 2017-12-11 2018-04-06 深圳市唯特视科技有限公司 Three-dimensional human pose estimation method based on skeleton map regression
CN108846365A (zh) * 2018-06-24 2018-11-20 深圳市中悦科技有限公司 Method, apparatus, storage medium and processor for detecting fighting behavior in video
CN111274998A (zh) * 2020-02-17 2020-06-12 上海交通大学 Parkinson's disease finger-tapping action recognition method and system, storage medium and terminal
CN112489129A (zh) * 2020-12-18 2021-03-12 深圳市优必选科技股份有限公司 Pose recognition model training method and apparatus, pose recognition method and terminal device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116509333A (zh) * 2023-05-20 2023-08-01 中国医学科学院阜外医院 Human balance ability assessment method, system and device based on binocular images
CN116509333B (zh) * 2023-05-20 2024-02-02 中国医学科学院阜外医院 Human balance ability assessment method, system and device based on binocular images

Also Published As

Publication number Publication date
CN112489129A (zh) 2021-03-12

Similar Documents

Publication Publication Date Title
WO2022127494A1 (zh) Pose recognition model training method and apparatus, pose recognition method and terminal device
CN109359538B (zh) Convolutional neural network training method, gesture recognition method, apparatus and device
US11908244B2 (en) Human posture detection utilizing posture reference maps
CN107633207B (zh) AU feature recognition method, apparatus and storage medium
CN104463101B (zh) Answer recognition method and system for text-based test questions
US9020250B2 (en) Methods and systems for building a universal dress style learner
CN107679447A (zh) Facial feature point detection method, apparatus and storage medium
Premaratne et al. Hand gesture tracking and recognition system using Lucas–Kanade algorithms for control of consumer electronics
Agrawal et al. A survey on manual and non-manual sign language recognition for isolated and continuous sign
CN106874826A (zh) Face key point tracking method and apparatus
CN110569756A (zh) Face recognition model construction method, recognition method, device and storage medium
CN112906550B (zh) Static gesture recognition method based on watershed transform
CN110163111A (zh) Face-recognition-based queue-calling method and apparatus, electronic device and storage medium
WO2021223738A1 (zh) Model parameter update method, apparatus, device and storage medium
CN112114675B (zh) Method for using a contactless elevator keyboard based on gesture control
CN110046544A (zh) Digital gesture recognition method based on convolutional neural network
Vishwakarma et al. Simple and intelligent system to recognize the expression of speech-disabled person
Zheng et al. Static Hand Gesture Recognition Based on Gaussian Mixture Model and Partial Differential Equation.
CN109325408A (zh) Gesture judgment method and storage medium
CN112487981A (zh) MA-YOLO dynamic gesture rapid recognition method based on dual-path segmentation
CN111862031A (zh) Face composite image detection method and apparatus, electronic device and storage medium
Manaf et al. Color recognition system with augmented reality concept and finger interaction: Case study for color blind aid system
CN110321009B (zh) AR expression processing method, apparatus, device and storage medium
CN109359543B (zh) Skeletonization-based portrait retrieval method and apparatus
CN113052194A (zh) Deep-learning-based garment color cognition system and cognition method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905423

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21905423

Country of ref document: EP

Kind code of ref document: A1