WO2021217937A1 - Posture model training method and device, and posture recognition method and device - Google Patents


Info

Publication number
WO2021217937A1
WO2021217937A1 (application PCT/CN2020/105898)
Authority
WO
WIPO (PCT)
Prior art keywords
recognition model
human body
posture
gesture recognition
human
Prior art date
Application number
PCT/CN2020/105898
Other languages
English (en)
Chinese (zh)
Inventor
姜沛
曹锋铭
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021217937A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 — Movements or behaviour, e.g. gesture recognition
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method and equipment for training a gesture recognition model, a method and equipment for gesture recognition.
  • With the continuous development of computer vision technology, multi-person gesture recognition technology increasingly appears in people's lives. For example, in elderly-care institutions or home-care scenarios, multi-person gesture recognition can recognize dangerous actions of the elderly and raise alerts, and can evaluate the mobility of the elderly so as to provide better care.
  • Multi-person gesture recognition technology includes two indicators: recognition accuracy and recognition speed.
  • The inventor realized that in related technologies, recognition accuracy is improved by continuously increasing the structural complexity of the gesture recognition model, but this consumes substantial system resources and makes implementation costly. Conversely, if the model is simplified to speed up recognition, recognition accuracy is reduced. Therefore, there is an urgent need for a gesture recognition model whose recognition accuracy and recognition speed both meet application requirements.
  • A method for training a gesture recognition model includes: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first gesture recognition model and a second gesture recognition model respectively, wherein the first gesture recognition model includes a first stacked hourglass network with a first number of hourglass layers, the second gesture recognition model includes a second stacked hourglass network with a second number of hourglass layers, and the first number of layers is greater than the second number of layers; training the first gesture recognition model according to the output of the first gesture recognition model, and training the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model; and completing the training of the first gesture recognition model and the second gesture recognition model when the number of training iterations reaches a preset threshold.
  • A posture recognition method includes: acquiring a current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model, wherein the second posture recognition model is generated by the aforementioned training method for the posture recognition model; generating the predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame; and generating the human posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points.
  • a computer device includes a memory and a processor, the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
  • acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network with a first number of hourglass layers, the second posture recognition model includes a second stacked hourglass network with a second number of hourglass layers, and the first number of layers is greater than the second number of layers;
  • training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and completing the training of the first posture recognition model and the second posture recognition model when the number of training iterations reaches a preset threshold.
  • a computer device includes a memory and a processor, the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
  • acquiring a current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model;
  • wherein the second posture recognition model is generated by the aforementioned training method for the posture recognition model; generating the predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image; and generating the human posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points.
  • a computer-readable storage medium including a stored program, wherein, when the program is running, the device where the storage medium is located is controlled to perform the following steps:
  • acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network with a first number of hourglass layers, the second posture recognition model includes a second stacked hourglass network with a second number of hourglass layers, and the first number of layers is greater than the second number of layers;
  • training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and completing the training of the first posture recognition model and the second posture recognition model when the number of training iterations reaches a preset threshold.
  • a computer-readable storage medium including a stored program, wherein, when the program is running, the device where the storage medium is located is controlled to perform the following steps:
  • acquiring a current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model;
  • wherein the second posture recognition model is generated by the aforementioned training method for the posture recognition model; generating the predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image; and generating the human posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points.
  • FIG. 1 is a schematic flowchart of a method for training a gesture recognition model provided by an embodiment of the application
  • Figure 2 is a schematic diagram of the position distribution of human bone points
  • Figure 3 is a schematic diagram of the structure of the hourglass network
  • Figure 4 is a schematic diagram of the structure of a stacked hourglass network
  • FIG. 5 is a schematic flowchart of a gesture recognition method proposed in an embodiment of this application.
  • FIG. 6 is a schematic structural diagram of a training device for a gesture recognition model proposed in an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of a gesture recognition device proposed in an embodiment of this application.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
  • The terms "first", "second", "third", etc. may be used in the embodiments of the present application to describe preset ranges, but these preset ranges should not be limited by the terms; the terms are only used to distinguish the preset ranges from each other.
  • the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
  • As used herein, the word "if" may be interpreted as "when", "while", "in response to determining", or "in response to detecting".
  • Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
  • the multi-person gesture recognition technology includes two indicators of recognition accuracy and recognition speed.
  • Multi-person gesture recognition specifically involves two steps in its realization: the first step is to detect human body targets, and the second step is to detect the human body pose for each detected target. Pose detection takes up approximately five-sixths of the total time. Therefore, improving the recognition speed of multi-person gesture recognition mainly means simplifying the gesture recognition model so that both recognition accuracy and recognition speed meet application requirements.
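As an illustrative back-of-the-envelope check of why the pose stage dominates, the sketch below follows the five-sixths figure above; the 2× pose-model speedup used in the example is an assumed value, not from the patent:

```python
# Relative runtime split between the two steps: target detection (1/6)
# and per-person pose detection (5/6). Only the pose stage is sped up.

def total_time(detect_frac, pose_frac, pose_speedup):
    """Relative total runtime after the pose stage is sped up."""
    return detect_frac + pose_frac / pose_speedup

baseline = total_time(1/6, 5/6, 1.0)   # original runtime (≈ 1.0)
halved   = total_time(1/6, 5/6, 2.0)   # pose stage twice as fast

overall_speedup = baseline / halved    # 12/7 ≈ 1.71x overall
print(round(overall_speedup, 2))
```

This is why the patent focuses on simplifying the pose model: even a modest 2× simplification of the dominant stage yields a large end-to-end gain.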
  • The embodiment of the present application provides a method for training a gesture recognition model, which uses the output of a first gesture recognition model with a larger number of layers to help train a second gesture recognition model with a smaller number of layers, so that the accuracy of the trained second gesture recognition model is close to that of the first while its data-processing load is much smaller.
  • FIG. 1 is a schematic flowchart of a method for training a gesture recognition model provided by an embodiment of the application. As shown in Figure 1, the method includes the following steps:
  • Step S101 Obtain an image of a human body sample and a corresponding posture of the human body sample.
  • the human body sample image is an image in which the human body posture has been determined, and the correct recognition result is the corresponding human body sample posture. Therefore, it can be used to train the gesture recognition model. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture may also be stored in a node of a blockchain.
  • the posture of the human body includes the positions of the human bone points
  • FIG. 2 is a schematic diagram of the position distribution of the human bone points.
  • various parts of the human body can be determined by human bone points.
  • Each human skeleton point is numbered, and the relative positions between different skeleton points, determined from the coordinates of each point in the image, correspond to different human postures.
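As a minimal sketch of this representation (the point numbering and coordinates below are hypothetical examples, not the actual layout of Figure 2):

```python
# Hypothetical numbered skeleton points with image coordinates.
skeleton = {
    0: (120, 40),   # head
    1: (120, 80),   # neck
    2: (90, 110),   # right shoulder
    3: (150, 110),  # left shoulder
}

def relative_position(points, i, j):
    """Offset vector from skeleton point i to skeleton point j."""
    (xi, yi), (xj, yj) = points[i], points[j]
    return (xj - xi, yj - yi)

# The set of offsets between numbered points characterizes the posture.
print(relative_position(skeleton, 1, 2))  # (-30, 30)
```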
  • Step S102 Input the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
  • the first gesture recognition model includes a first stacked hourglass network
  • the first stacked hourglass network includes a first number of hourglass layers
  • the second gesture recognition model includes a second stacked hourglass network
  • the second stacked hourglass network includes a second number of hourglass layers, and the first number of layers is greater than the second number of layers.
  • Figure 3 is a schematic diagram of the structure of the hourglass network.
  • the input of a single hourglass network is an image
  • the output is an image feature.
  • the image processing process can be divided into two parts: a convolution path and a skip path.
  • the convolution path convolves the image through the convolution path residual module
  • the output of the last convolution path residual module is used as the input of the first upsampling module.
  • the size of the block model in Figure 3 represents the size of the input resolution
  • The output resolution of the first convolution path residual module is half of its input resolution. The input of the second convolution path residual module is the output of the first, so the input resolution of the second module equals the output resolution of the first, i.e., half of the first module's input resolution.
  • the output resolution of each up-sampling module is twice the input resolution, so that the up-sampling module and the convolution path residual module have a one-to-one correspondence.
  • The output resolution of the fourth convolution path residual module in Figure 3 is equal to the input resolution of the first upsampling module, and the input resolution of the fourth convolution path residual module is equal to the output resolution of the first upsampling module.
  • Part of the output of each convolution path residual module is processed by the subsequent convolution path residual modules and the upsampling modules, while the other part is processed by the skip-path residual module and then superimposed with the former at the same resolution.
  • Part of the output of the first residual module is processed by the second, third, fourth, and fifth convolution path residual modules (the input and output resolutions of the fifth convolution path residual module are equal), and then upsampled by the first, second, and third upsampling modules, after which its resolution is the same as the output resolution of the first residual module.
  • The other part of the output of the first residual module is processed by the skip-path residual module, and its resolution remains unchanged, which is also the same as the output resolution of the first residual module. Therefore, after the two parts of the output of the first residual module are processed differently, their resolutions are the same and they can be superimposed; the superimposed result is used as the input of the fourth upsampling module.
  • In this way, the characteristic image output by the hourglass network retains information from all layers, and the human skeleton points can be determined from it.
  • Figure 4 is a schematic diagram of the structure of a stacked hourglass network. As shown in Figure 4, multiple hourglass networks are cascaded (the output of the previous hourglass network serves as the input of the next) to obtain the stacked hourglass network. Each subsequent hourglass network in the stack can use the relationships between human skeleton points determined by the previous hourglass network, making the determination of the human skeleton points in its output more accurate.
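The resolution bookkeeping of a single hourglass and a stack of them can be sketched as follows. This is a schematic that only tracks feature-map sizes (the real modules are convolutions); the depth of 4 halvings is an assumed value for illustration:

```python
# Schematic of the hourglass symmetry: the convolution path halves the
# resolution step by step, the upsampling path doubles it back, and the
# skip path keeps a same-resolution copy for superposition.

def hourglass(resolution, depth=4):
    res = resolution
    skips = []
    for _ in range(depth):       # convolution path residual modules
        skips.append(res)        # skip path keeps this resolution
        res //= 2                # each module halves the resolution
    for _ in range(depth):       # upsampling modules
        res *= 2                 # each module doubles the resolution
        assert res in skips      # superposition happens at equal sizes
    return res                   # output resolution == input resolution

def stacked_hourglass(resolution, num_layers):
    """Cascade: the output of one hourglass feeds the next."""
    for _ in range(num_layers):
        resolution = hourglass(resolution)
    return resolution

print(stacked_hourglass(64, num_layers=4))  # 64
```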
  • The more layers the stacked hourglass network has, the more accurate the determination of human skeleton points. Therefore, the accuracy of the first posture recognition model is higher than that of the second posture recognition model, but the data-processing load of the first posture recognition model in use is also greater, and its recognition speed lower.
  • The embodiments of the present application aim to use the output of the trained first gesture recognition model with a larger number of layers to help train the second gesture recognition model with a smaller number of layers, so that the accuracy of the trained second gesture recognition model is close to that of the first while its data-processing load is much smaller.
  • Step S103 training the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model.
  • In the embodiment of the present application, the first gesture recognition model is trained first; when its recognition accuracy meets the preset condition, its training is completed, and the trained first posture recognition model is then used to train the second posture recognition model.
  • the human body sample image is input into the trained first gesture recognition model and the second gesture recognition model, respectively, to obtain the outputs of the first gesture recognition model and the second gesture recognition model.
  • the second posture recognition model is trained.
  • For example, the first gesture recognition model may include an 8-layer stacked hourglass network and the second gesture recognition model a 4-layer stacked hourglass network, so that when the second gesture recognition model is used, its data-processing load is much smaller than that of the first, improving the recognition speed.
  • The dimension of the feature vector input to the second gesture recognition model should also be smaller than that of the first gesture recognition model: for example, the input feature vector of the first gesture recognition model may be 256-dimensional and that of the second 128-dimensional, so that the data-processing load of the second gesture recognition model is smaller than that of the first.
  • the first gesture recognition model in the embodiment of the present application is trained through the following steps:
  • Step S11 Determine the first difference between the output of the first posture recognition model and the posture of the human body sample.
  • the output of the first posture recognition model is a human skeleton point
  • the human skeleton point is in the form of coordinates.
  • For example, let the coordinates of the k-th human skeleton point output by the first posture recognition model be (x, y), and the coordinates of the k-th human skeleton point in the human body sample posture be (x_k, y_k). The distribution of the k-th skeleton point is calculated as G_k(u, v) = exp(−((u − x_k)² + (v − y_k)²) / (2σ²)), where σ² is the variance of the Gaussian distribution; the first difference between the output of the first posture recognition model and the human body sample posture is then calculated as L_1 = Σ_k Σ_(u,v) (Ĝ_k(u, v) − G_k(u, v))², where Ĝ_k is the corresponding distribution centered on the output coordinates (x, y).
  • step S12 the parameters of the first gesture recognition model are optimized according to the first difference.
  • The parameters of the first gesture recognition model can be optimized so that L_1 gradually decreases.
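A sketch of how such a Gaussian-distribution difference can be computed. The grid size and σ below are illustrative assumptions, and the squared-error sum stands in for the L_1 formula, whose original image is not reproduced in the text:

```python
import math

def gaussian_heatmap(cx, cy, size=8, sigma=1.0):
    """2-D Gaussian distribution centered on a skeleton point (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]

def difference(pred, target):
    """Sum of squared differences between two distributions."""
    return sum((p - t) ** 2
               for prow, trow in zip(pred, target)
               for p, t in zip(prow, trow))

out = gaussian_heatmap(3.0, 3.0)   # distribution around the model output
gt  = gaussian_heatmap(3.0, 3.0)   # distribution around the sample pose
print(difference(out, gt))         # 0.0 when the two points coincide
```

Summing this per-point difference over all K skeleton points yields the scalar that gradient descent then drives down.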
  • Step S103 training the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model, including:
  • Step S21 Determine the second difference between the output of the second posture recognition model and the posture of the human body sample.
  • The output of the second posture recognition model is likewise human skeleton points in coordinate form. With the coordinates of the k-th skeleton point in its output denoted (x, y) and the coordinates of the k-th skeleton point in the human body sample posture denoted (x_k, y_k), the distribution of the k-th skeleton point is calculated with the same Gaussian form G_k(u, v) = exp(−((u − x_k)² + (v − y_k)²) / (2σ²)), where σ² is the variance of the Gaussian distribution, and the second difference L_2 between the output of the second posture recognition model and the human body sample posture is calculated in the same squared-difference form as L_1.
  • Step S22 Determine the third difference between the output of the first gesture recognition model and the output of the second gesture recognition model.
  • step S23 the parameters of the second gesture recognition model are optimized according to the second difference and the third difference.
  • One possible implementation is to compute a weighted sum of the second difference and the third difference to generate the fourth difference, where the sum of the weight corresponding to the second difference and the weight corresponding to the third difference is one; the parameters of the second gesture recognition model are then optimized according to the fourth difference.
  • The parameters of the second gesture recognition model can be optimized so that L_4 gradually decreases.
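The weighted combination of steps S22–S23 can be sketched as follows. The 0.7/0.3 split in the example is an illustrative assumption; the description only requires that the two weights sum to one:

```python
# Combine the second difference (model output vs. sample pose) and the
# third difference (second model's output vs. first model's output)
# into a fourth difference used to optimize the second model.

def fourth_difference(l2, l3, w2=0.5):
    w3 = 1.0 - w2            # the two weights must sum to one
    return w2 * l2 + w3 * l3

l4 = fourth_difference(l2=0.8, l3=0.2, w2=0.7)
print(round(l4, 2))          # 0.7*0.8 + 0.3*0.2 = 0.62
```

Weighting the third difference is what lets the deeper first model's output guide the shallower second model, in the style of knowledge distillation.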
  • Step S104 When the number of training times reaches the preset threshold, the training of the second gesture recognition model is completed.
  • During training, the learning rate in the gradient descent method (that is, the step length of the gradient descent process) needs to be continuously adjusted: the learning rate is progressively reduced to narrow the range of parameter optimization.
  • Initially, the learning rate is set to 0.01. Training on all 29k human sample images and then testing on the remaining 11k to obtain the corresponding recognition accuracy counts as one round of training.
  • The learning rate is subsequently adjusted to 0.001, and then to 0.0001.
  • the training of the second gesture recognition model is completed.
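The stepped schedule above (0.01 → 0.001 → 0.0001) can be sketched as a simple lookup. The epoch boundaries at which the rate drops are assumed values, since the text specifies only the three rates:

```python
# Stepped learning-rate schedule: each drop narrows the range of
# parameter optimization as training progresses.

def learning_rate(epoch, boundaries=(30, 60), rates=(0.01, 0.001, 0.0001)):
    for boundary, rate in zip(boundaries, rates):
        if epoch < boundary:
            return rate
    return rates[-1]

print([learning_rate(e) for e in (0, 45, 90)])  # [0.01, 0.001, 0.0001]
```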
  • the training method of the posture recognition model proposed in the embodiment of the application obtains the human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
  • the first number of layers of the hourglass network corresponding to the first gesture recognition model is greater than the second number of layers of the hourglass network corresponding to the second gesture recognition model.
  • the second posture recognition model is trained. When the number of training times reaches the preset threshold, the training of the second gesture recognition model is completed.
  • The output of the trained first gesture recognition model with a larger number of layers is used to help train the second gesture recognition model with a smaller number of layers, so that the accuracy of the trained second gesture recognition model is close to that of the first while its data-processing load is much smaller.
  • FIG. 5 is a schematic flowchart of the gesture recognition method proposed in an embodiment of this application. As shown in Figure 5, the method includes the following steps:
  • Step S201 Obtain the current frame of human body image to be recognized and the human body posture corresponding to the last frame of human body image.
  • the posture of the human body includes the position of the bone point of the human body. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture can also be stored in a blockchain node.
  • The embodiment of this application uses an optical flow algorithm to compensate for the recognition accuracy of the second gesture recognition model.
  • the optical flow is the instantaneous velocity of the pixel motion of the spatially moving object on the observation imaging plane.
  • The optical flow algorithm uses the changes of pixels in the time domain across an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame.
  • the optical flow algorithm can predict the position of the corresponding human skeleton point in the current frame of the human body image by analyzing the position of the human skeleton point in the previous frame of the human body image.
  • Step S202 Input the current frame of human body image into the second posture recognition model after training.
  • the second gesture recognition model is generated after training by the training method of the aforementioned gesture recognition model.
  • the second gesture model has a smaller amount of data processing, so the recognition speed is faster.
  • Step S203 According to the position of the human skeleton point corresponding to the last frame of the human body image, the predicted position of the human skeleton point corresponding to the current frame of the human body image is generated.
  • Step S204 according to the output of the second posture recognition model and the predicted position of the human skeleton point corresponding to the current frame of the human image, generate the human posture corresponding to the current frame of the human image.
  • the predicted position of the human skeleton point corresponding to the human body image in the current frame is obtained according to the optical flow corresponding to the human body image in the previous frame and the current frame.
  • The human body posture corresponding to the current frame of human body image is then calculated, for each skeleton point, as K_final = α·K_pred + (1 − α)·K_cur, where K_pred is the predicted position of the k-th human skeleton point corresponding to the current frame of human body image, K_cur is the position of the k-th human skeleton point in the output of the second posture recognition model, K_final is the resulting position of the k-th human skeleton point corresponding to the current frame, and α is a correction coefficient, a constant between 0.25 and 0.3. According to the positions of all the human skeleton points, the posture of the human body corresponding to the current frame of human body image can be determined.
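A sketch of this fusion step, assuming a weighted-average blending form (an assumption, since the original formula image is not reproduced in the text); the example positions are hypothetical:

```python
# Blend the optical-flow prediction with the second model's output
# using a correction coefficient alpha in [0.25, 0.3].

def fuse(predicted, model_out, alpha=0.25):
    """Weighted combination of predicted and recognized point positions."""
    (px, py), (mx, my) = predicted, model_out
    return (alpha * px + (1 - alpha) * mx,
            alpha * py + (1 - alpha) * my)

# Optical flow predicts (100, 200); the model outputs (104, 196).
print(fuse((100, 200), (104, 196)))  # (103.0, 197.0)
```

The small α keeps the model output dominant while the flow-based prediction corrects frame-to-frame jitter, compensating for the shallower model's accuracy.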
  • the posture recognition method proposed in the embodiment of the present application obtains the current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image.
  • the current frame of the human body image is input into the trained second posture recognition model, and the predicted position of the human skeleton point corresponding to the current frame of the human body image is generated according to the position of the human skeleton point corresponding to the previous frame of the human body image.
  • the human posture corresponding to the current frame of the human image is generated.
  • the optical flow algorithm is used to compensate the output of the second posture recognition model, and the accuracy of human posture recognition is improved.
  • FIG. 6 is a schematic structural diagram of a training device for a gesture recognition model proposed in an embodiment of this application. As shown in FIG. 6, the device includes: a first acquisition module 310, a first input module 320, a training module 330, and a completion module 340.
  • the first acquisition module 310 is used to acquire a human body sample image and a corresponding human body sample pose.
  • the first input module 320 is configured to input the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
  • the first gesture recognition model includes a first stacked hourglass network
  • the first stacked hourglass network includes a first number of hourglass layers
  • the second gesture recognition model includes a second stacked hourglass network
  • the second stacked hourglass network includes a second number of hourglass layers, and the first number of layers is greater than the second number of layers.
  • the training module 330 is configured to train the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model.
  • the completion module 340 is configured to complete the training of the second gesture recognition model when the number of training times reaches the preset threshold.
  • The device further includes: a determining module 350 for determining the first difference between the output of the first posture recognition model and the human body sample posture.
  • the optimization module 360 is configured to optimize the parameters of the first gesture recognition model according to the first difference.
  • The training module 330 includes: a first determination sub-module 331 for determining the second difference between the output of the second posture recognition model and the human body sample posture.
  • the second determination sub-module 332 is used to determine the third difference between the output of the first gesture recognition model and the output of the second gesture recognition model.
  • the optimization sub-module 333 is used to optimize the parameters of the second gesture recognition model according to the second difference and the third difference.
  • The optimization sub-module 333 includes: a summation unit 333a for performing a weighted summation of the second difference and the third difference to generate the fourth difference, wherein the sum of the weight corresponding to the second difference and the weight corresponding to the third difference is one.
  • the optimization unit 333b is configured to optimize the parameters of the second gesture recognition model according to the fourth difference.
  • when training the gesture recognition model, the training device acquires a human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first gesture recognition model and into the second gesture recognition model, respectively.
  • the first number of layers of the hourglass network corresponding to the first gesture recognition model is greater than the second number of layers of the hourglass network corresponding to the second gesture recognition model.
  • the second posture recognition model is then trained; when the number of training iterations reaches the preset threshold, training of the second gesture recognition model is complete.
  • the output of the trained first gesture recognition model, which has more layers, is used to guide the training of the second gesture recognition model, which has fewer layers, so that the accuracy of the trained second gesture recognition model approaches that of the first posture recognition model while its data-processing cost is far lower.
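The loss combination performed by summation unit 333a can be written out directly. In this sketch the second and third differences are taken as mean-squared errors over the models' outputs, and the 0.7/0.3 split is an arbitrary illustrative weighting; the application only requires that the two weights sum to one.

```python
import numpy as np

def fourth_difference(student_out, teacher_out, gt_posture, w_gt=0.7):
    """Weighted sum of the second difference (second model's output vs. the
    human body sample posture) and the third difference (second model's
    output vs. the first model's output). The two weights sum to one."""
    second = np.mean((student_out - gt_posture) ** 2)   # second difference
    third = np.mean((student_out - teacher_out) ** 2)   # third difference
    return w_gt * second + (1.0 - w_gt) * third
```

The fourth difference is then what optimization unit 333b would minimize to update the second gesture recognition model's parameters.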
  • FIG. 7 is a schematic structural diagram of a gesture recognition device proposed in an embodiment of the application. As shown in FIG. 7, the device includes: a second acquisition module 410, a second input module 420, a first generation module 430, and a second generation module 440.
  • the second acquisition module 410 is configured to acquire the current frame of human body image to be recognized and the human body posture corresponding to the last frame of human body image.
  • the posture of the human body includes the positions of the human skeleton points. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture may also be stored in a node of a blockchain.
  • the second input module 420 is configured to input the current frame of the human body image into the trained second posture recognition model.
  • the second gesture recognition model is generated after being trained by the training device of the aforementioned gesture recognition model.
  • the first generating module 430 is configured to generate the predicted position of the human skeleton point corresponding to the current frame of the human body image according to the position of the human skeleton point corresponding to the previous frame of the human body image.
  • the second generating module 440 is configured to generate the human body pose corresponding to the current frame of the human body image according to the output of the second pose recognition model and the predicted position of the human skeleton point corresponding to the current frame of human body image.
  • the gesture recognition device acquires the current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image when performing gesture recognition.
  • the current frame of the human body image is input into the trained second posture recognition model, and the predicted positions of the human skeleton points corresponding to the current frame are generated according to the positions of the human skeleton points corresponding to the previous frame of the human body image.
  • the human posture corresponding to the current frame of the human image is generated.
  • an optical flow algorithm is used to compensate the output of the second posture recognition model, which improves the accuracy of human posture recognition.
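A minimal sketch of the compensation step described above, assuming the previous frame's skeleton points are carried forward by a dense optical-flow field and then blended with the second model's detections. The equal 50/50 blend and the array layouts are assumptions for illustration; the application does not fix them.

```python
import numpy as np

def predict_skeleton_points(prev_points, flow):
    """Shift last-frame skeleton points by the optical flow sampled at each point.
    prev_points: (J, 2) array of (row, col); flow: (H, W, 2) displacement field."""
    rows = prev_points[:, 0].astype(int)
    cols = prev_points[:, 1].astype(int)
    return prev_points + flow[rows, cols]

def fuse_posture(model_points, predicted_points, alpha=0.5):
    """Blend the second model's detections with the flow-based prediction
    to produce the human posture for the current frame."""
    return alpha * model_points + (1.0 - alpha) * predicted_points
```

In practice the flow field would come from an optical-flow estimator run on the previous and current frames; here it is supplied as a ready-made array.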
  • the embodiments of the present application also propose a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the training method of the gesture recognition model of the foregoing method embodiments.
  • the embodiments of the present application also propose a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the gesture recognition method of the foregoing method embodiments.
  • FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
  • the computer device 50 of this embodiment includes: a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and running on the processor 51.
  • when the computer program 53 is executed by the processor 51, the steps of the training method of the gesture recognition model and of the gesture recognition method in the foregoing embodiments are implemented; to avoid repetition, they are not repeated here.
  • alternatively, when the computer program 53 is executed by the processor 51, the functions of each module/unit in the gesture recognition model training device and the gesture recognition device of the foregoing embodiments are realized; to avoid repetition, they are not repeated here either.
  • the computer device 50 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device may include, but is not limited to, a processor 51 and a memory 52.
  • FIG. 8 is only an example of the computer device 50 and does not constitute a limitation on the computer device 50; it may include more or fewer components than shown, a combination of certain components, or different components.
  • computer equipment may also include input and output devices, network access devices, buses, and so on.
  • the so-called processor 51 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 52 may be an internal storage unit of the computer device 50, such as a hard disk or memory of the computer device 50.
  • the memory 52 may also be an external storage device of the computer device 50, such as a plug-in hard disk equipped on the computer device 50, a smart media card (Smart Media Card, SMC), a secure digital (SD) card, a flash memory card (Flash Card), and so on.
  • the memory 52 may also include both an internal storage unit of the computer device 50 and an external storage device.
  • the memory 52 is used to store computer programs and other programs and data required by the computer equipment.
  • the memory 52 can also be used to temporarily store data that has been output or will be output.
  • the embodiments of the present application also propose a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the steps of the training method of the gesture recognition model of the foregoing method embodiment.
  • the embodiment of the present application also proposes a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of the gesture recognition method of the foregoing method embodiment.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of units is only a logical function division; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A posture recognition model training method, relating to artificial intelligence, comprising: acquiring a human body sample image and a corresponding human body sample posture, and inputting the human body sample image respectively into a trained first posture recognition model and a second posture recognition model, wherein the first number of layers of the hourglass network corresponding to the first posture recognition model is greater than the second number of layers of the hourglass network corresponding to the second posture recognition model; training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and when the number of training iterations reaches a preset threshold, completing the training of the second posture recognition model. The present application also relates to blockchain technology: the human body sample image and the corresponding human body sample posture may be stored in a blockchain.
PCT/CN2020/105898 2020-04-27 2020-07-30 Posture model training method and device, and posture recognition method and device WO2021217937A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010343546.2 2020-04-27
CN202010343546.2A CN111539349A (zh) 2020-04-27 2020-04-27 Training method and device for gesture recognition model, and gesture recognition method and device

Publications (1)

Publication Number Publication Date
WO2021217937A1 true WO2021217937A1 (fr) 2021-11-04

Family

ID=71977326

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105898 WO2021217937A1 (fr) 2020-04-27 2020-07-30 Posture model training method and device, and posture recognition method and device

Country Status (2)

Country Link
CN (1) CN111539349A (fr)
WO (1) WO2021217937A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463856A (zh) * 2022-04-13 2022-05-10 深圳金信诺高新技术股份有限公司 Pose estimation model training and pose estimation method, apparatus, device, and medium
CN114782981A (zh) * 2022-03-07 2022-07-22 奥比中光科技集团股份有限公司 Human pose estimation model, model training method, and human pose estimation method
CN115410137A (zh) * 2022-11-01 2022-11-29 杭州新中大科技股份有限公司 Two-stream worker working-state recognition method based on spatiotemporal features

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101259A (zh) * 2020-09-21 2020-12-18 中国农业大学 Single-pig posture recognition system and method based on stacked hourglass network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160086015A1 (en) * 2007-01-09 2016-03-24 Si Corporation Method and system for automated face detection and recognition
CN107480720A (zh) * 2017-08-18 2017-12-15 成都通甲优博科技有限责任公司 Human posture model training method and device
CN109410240A (zh) * 2018-10-09 2019-03-01 电子科技大学中山学院 Body measurement feature point positioning method, device, and storage medium
CN109409209A (zh) * 2018-09-11 2019-03-01 广州杰赛科技股份有限公司 Human behavior recognition method and device
CN109508688A (zh) * 2018-11-26 2019-03-22 平安科技(深圳)有限公司 Skeleton-based behavior detection method, terminal device, and computer storage medium
CN109753891A (zh) * 2018-12-19 2019-05-14 山东师范大学 Football player posture calibration method and system based on human key point detection
CN109766887A (zh) * 2019-01-16 2019-05-17 中国科学院光电技术研究所 Multi-target detection method based on cascaded hourglass neural network


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782981A (zh) * 2022-03-07 2022-07-22 奥比中光科技集团股份有限公司 Human pose estimation model, model training method, and human pose estimation method
CN114463856A (zh) * 2022-04-13 2022-05-10 深圳金信诺高新技术股份有限公司 Pose estimation model training and pose estimation method, apparatus, device, and medium
CN115410137A (zh) * 2022-11-01 2022-11-29 杭州新中大科技股份有限公司 Two-stream worker working-state recognition method based on spatiotemporal features
CN115410137B (zh) * 2022-11-01 2023-04-14 杭州新中大科技股份有限公司 Two-stream worker working-state recognition method based on spatiotemporal features

Also Published As

Publication number Publication date
CN111539349A (zh) 2020-08-14

Similar Documents

Publication Publication Date Title
WO2021217937A1 (fr) Posture model training method and device, and posture recognition method and device
CN108205655B (zh) Key point prediction method and apparatus, electronic device, and storage medium
US10733431B2 (en) Systems and methods for optimizing pose estimation
WO2019228317A1 (fr) Face recognition method and device, and computer-readable medium
US20190172224A1 (en) Optimizations for Structure Mapping and Up-sampling
JP2019096006A (ja) Information processing device and information processing method
CN110889446A (zh) Face image recognition model training and face image recognition method and device
CN110879982B (zh) Crowd counting system and method
CN111782840A (zh) Image question answering method, apparatus, computer device, and medium
CN114332578A (zh) Image anomaly detection model training method, image anomaly detection method, and apparatus
CN111104925B (zh) Image processing method and apparatus, storage medium, and electronic device
CN113095129B (zh) Pose estimation model training method, pose estimation method, apparatus, and electronic device
CN113642431A (zh) Target detection model training method and apparatus, electronic device, and storage medium
WO2023098912A1 (fr) Image processing method and apparatus, storage medium, and electronic device
TWI803243B (zh) Image augmentation method, computer device, and storage medium
CN111368768A (zh) Employee gesture guidance detection method based on human body key points
CN114495006A (zh) Abandoned object detection method, apparatus, and storage medium
CN111353325A (zh) Key point detection model training method and apparatus
Madane et al. Social distancing detection and analysis through computer vision
CN115239508A (zh) Artificial-intelligence-based scene planning adjustment method, apparatus, device, and medium
CN116453221A (зh) Target object pose determination method, training method, apparatus, and storage medium
Feng Mask RCNN-based single shot multibox detector for gesture recognition in physical education
CN114120454A (зh) Liveness detection model training method and apparatus, electronic device, and storage medium
CN112101185A (зh) Method for training a wrinkle detection model, electronic device, and storage medium
CN111126566A (зh) Abnormal furniture layout data detection method based on GAN model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933544

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15.03.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20933544

Country of ref document: EP

Kind code of ref document: A1