WO2021217937A1 - Posture recognition model training method and device, and posture recognition method and device - Google Patents
- Publication number
- WO2021217937A1 (application PCT/CN2020/105898)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- recognition model
- human body
- posture
- gesture recognition
- human
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a method and device for training a posture recognition model, and a posture recognition method and device.
- With the continuous development of computer vision technology, multi-person posture recognition technology increasingly appears in people's lives. For example, in elderly care institutions or home care scenarios, multi-person posture recognition technology can recognize dangerous actions of the elderly and raise alerts, and can evaluate the mobility of the elderly so that they can be better cared for.
- Multi-person gesture recognition technology includes two indicators: recognition accuracy and recognition speed.
- The inventor realized that in related technologies, recognition accuracy is improved by continuously increasing the structural complexity of the posture recognition model, but this consumes a lot of system resources and makes the technology relatively costly to implement.
- Conversely, if the model is simplified to raise the recognition speed, the recognition accuracy will be reduced. Therefore, there is an urgent need for a posture recognition model whose recognition accuracy and recognition speed can both meet application requirements.
- A method for training a posture recognition model, including: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and, when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
- A posture recognition method, including: acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model, wherein the second posture recognition model is generated by the aforementioned training method of the posture recognition model; generating, according to the positions of the human skeleton points corresponding to the previous frame of human body image, the predicted positions of the human skeleton points corresponding to the current frame of human body image; and generating, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image, the human body posture corresponding to the current frame of human body image.
- a computer device includes a memory and a processor, the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
- acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and, when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
- a computer device includes a memory and a processor, the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
- acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model, wherein the second posture recognition model is generated by the aforementioned training method of the posture recognition model; generating, according to the positions of the human skeleton points corresponding to the previous frame of human body image, the predicted positions of the human skeleton points corresponding to the current frame of human body image; and generating, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image, the human body posture corresponding to the current frame of human body image.
- a computer-readable storage medium including a stored program, wherein, when the program is running, the device where the storage medium is located is controlled to perform the following steps:
- acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and, when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
- a computer-readable storage medium including a stored program, wherein, when the program is running, the device where the storage medium is located is controlled to perform the following steps:
- acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into the trained second posture recognition model, wherein the second posture recognition model is generated by the aforementioned training method of the posture recognition model; generating, according to the positions of the human skeleton points corresponding to the previous frame of human body image, the predicted positions of the human skeleton points corresponding to the current frame of human body image; and generating, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image, the human body posture corresponding to the current frame of human body image.
- FIG. 1 is a schematic flowchart of a method for training a gesture recognition model provided by an embodiment of the application
- Figure 2 is a schematic diagram of the position distribution of human bone points
- Figure 3 is a schematic diagram of the structure of the hourglass network
- Figure 4 is a schematic diagram of the structure of a stacked hourglass network
- FIG. 5 is a schematic flowchart of a gesture recognition method proposed in an embodiment of this application.
- FIG. 6 is a schematic structural diagram of a training device for a gesture recognition model proposed in an embodiment of this application.
- FIG. 7 is a schematic structural diagram of a gesture recognition device proposed in an embodiment of this application.
- FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
- Although the terms first, second, third, etc. may be used in the embodiments of the present application to describe preset ranges, these preset ranges should not be limited by these terms. These terms are only used to distinguish the preset ranges from each other.
- For example, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
- Depending on the context, the word “if” as used herein can be interpreted as “when”, “in response to determination” or “in response to detection”.
- Similarly, the phrase “if determined” or “if detected (statement or event)” can be interpreted as “when determined”, “in response to determination”, “when detected (statement or event)” or “in response to detection (statement or event)”.
- the multi-person gesture recognition technology includes two indicators of recognition accuracy and recognition speed.
- In practice, multi-person posture recognition involves two steps. The first step is to detect the human body targets, and the second step is to detect the human body posture of each human target. Of the two, detecting the human body posture takes up approximately five-sixths of the time. Therefore, improving the recognition speed of multi-person posture recognition mainly means simplifying the posture recognition model, in such a way that both recognition accuracy and recognition speed can meet the application requirements.
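- As a rough worked example of why the pose stage is the one worth simplifying, assume (beyond what the text states) that human-target detection accounts for the remaining one-sixth of the time and is left unchanged, and that pose detection is made s times faster. The symbols s and S are introduced here only for illustration. The overall speed-up S then follows from simple proportion:

```latex
S(s) = \frac{1}{\tfrac{1}{6} + \tfrac{5}{6}\cdot\tfrac{1}{s}} = \frac{6s}{s + 5},
\qquad S(2) = \frac{12}{7} \approx 1.71,
\qquad \lim_{s \to \infty} S(s) = 6 .
```

- Under these assumptions, halving the pose-detection time already gives roughly a 1.7-fold overall gain, and the gain can never exceed 6 because the detection step remains.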
- The embodiment of the present application provides a method for training a posture recognition model, which uses the output of the first posture recognition model, which has a larger number of layers, to help train the second posture recognition model, which has a smaller number of layers. After training, the accuracy of the second posture recognition model is close to that of the first posture recognition model, but its amount of data processing is much smaller than that of the first posture recognition model.
- FIG. 1 is a schematic flowchart of a method for training a gesture recognition model provided by an embodiment of the application. As shown in Figure 1, the method includes the following steps:
- Step S101 Obtain an image of a human body sample and a corresponding posture of the human body sample.
- the human body sample image is an image in which the human body posture has been determined, and the correct recognition result is the corresponding human body sample posture. Therefore, it can be used to train the gesture recognition model. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture may also be stored in a node of a blockchain.
- The posture of the human body includes the positions of the human bone points. FIG. 2 is a schematic diagram of the position distribution of the human bone points.
- As shown in FIG. 2, the various parts of the human body can be determined from the human bone points.
- Each human bone point is numbered, and the relative positions between different human bone points are determined from the coordinates of each human bone point in the image. Different relative positions correspond to different human postures.
- Step S102 Input the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
- The first posture recognition model includes a first stacked hourglass network, and the first stacked hourglass network includes a first number of layers of hourglass networks.
- The second posture recognition model includes a second stacked hourglass network, and the second stacked hourglass network includes a second number of layers of hourglass networks. The first number of layers is greater than the second number of layers.
- Figure 3 is a schematic diagram of the structure of the hourglass network.
- the input of a single hourglass network is an image
- the output is an image feature.
- the image processing process can be divided into two parts: a convolution path and a step-by-step path.
- the convolution path convolves the image through the convolution path residual module
- the output of the last convolution path residual module is used as the input of the first upsampling module.
- The size of each block in Figure 3 represents the input resolution.
- The output resolution of the first convolution path residual module is half of its input resolution.
- The input of the second convolution path residual module is the output of the first convolution path residual module, that is, the input resolution of the second convolution path residual module equals the output resolution of the first convolution path residual module, which is half of the first module's input resolution.
- the output resolution of each up-sampling module is twice the input resolution, so that the up-sampling module and the convolution path residual module have a one-to-one correspondence.
- For example, the output resolution of the fourth convolutional neural network in Figure 3 is equal to the input resolution of the first up-sampling module, and the input resolution of the fourth convolutional neural network is equal to the output resolution of the first up-sampling module.
- One part of the output of a convolution path residual module is processed by multiple convolution path residual modules and multiple up-sampling modules, while the other part is processed by a step-by-step path residual module. The two results have the same resolution and are superimposed.
- For example, one part of the output of the first residual module is processed by the second, third, fourth and fifth residual modules of the convolution path, where the input and output resolutions of the fifth convolution path residual module are equal. After up-sampling by the first, second and third up-sampling modules, the resolution is again the same as the output resolution of the first residual module.
- The other part of the output of the first residual module is processed by the step-by-step path residual module, which leaves the resolution unchanged, so it also matches the output resolution of the first residual module. Therefore, although the two parts of the output of the first residual module are processed differently, their resolutions are the same and they can be superimposed. The superimposed result is used as the input of the fourth up-sampling module.
- the characteristic image output by the hourglass network not only retains the information of all layers, but also can determine the human skeleton points from it.
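- The resolution bookkeeping described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not the network of Figure 3: average pooling stands in for a resolution-halving residual module, nearest-neighbour repetition stands in for an up-sampling module, the identity stands in for the step-by-step path residual module (usually called a skip connection in the hourglass literature), and all function names are hypothetical.

```python
def downsample(img):
    """Halve the resolution by averaging each 2x2 block
    (stand-in for a resolution-halving convolution path residual module)."""
    h, w = len(img), len(img[0])
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1]
              + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w // 2)]
            for i in range(h // 2)]


def upsample(img):
    """Double the resolution by repeating every pixel (nearest neighbour)."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def superimpose(a, b):
    """Element-wise sum of two feature maps with the same resolution."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hourglass(img, depth):
    """Recursive sketch of one hourglass: the skip branch keeps the
    resolution, the main branch goes down, recurses, and comes back up."""
    if depth == 0:
        return img                      # bottleneck: resolution unchanged
    skip = img                          # step-by-step (skip) branch, identity here
    low = hourglass(downsample(img), depth - 1)
    return superimpose(skip, upsample(low))
```

- Because every down-sampling step is paired with one up-sampling step, the two branches always meet at the same resolution, which is what makes the superposition possible.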
- Figure 4 is a schematic diagram of the structure of a stacked hourglass network. As shown in Figure 4, multiple hourglass networks are cascaded (the output of the previous hourglass network is used as the input of the next hourglass network) to obtain the stacked hourglass network. The next hourglass network in the stack can use the relationships between the human bone points determined by the previous hourglass network, which makes the human bone points determined from its output more accurate.
- the more layers of the stacked hourglass network the more accurate the determination of human bone points. Therefore, the accuracy of the first posture recognition model is higher than that of the second posture recognition model, but the data processing amount of the first posture recognition model in use is also greater than that of the second posture recognition model, and the recognition speed is lower.
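- The cascading itself is just function composition. The following minimal sketch, with a hypothetical helper name and with each hourglass network represented by an arbitrary callable, shows the output of one stage becoming the input of the next while every intermediate output is kept:

```python
def stack_hourglasses(stages, image):
    """Run `image` through the stages in order.

    `stages` is a list of callables, one per hourglass network (placeholders
    here). Each stage receives the previous stage's output; all intermediate
    outputs are returned so that later stages can be compared with earlier ones.
    """
    outputs = []
    features = image
    for stage in stages:
        features = stage(features)
        outputs.append(features)
    return outputs
```

- A model with more stages simply passes a longer list, which is why the amount of data processing grows with the number of layers.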
- The embodiments of the present application aim to use the output of the trained first posture recognition model, which has a larger number of layers, to help train the second posture recognition model, which has a smaller number of layers, so that the accuracy of the trained second posture recognition model is close to that of the first posture recognition model while its amount of data processing is much smaller.
- Step S103 training the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model.
- The first posture recognition model is trained first. When the recognition accuracy of the first posture recognition model meets the preset condition, its training is completed, and the trained first posture recognition model is then used to train the second posture recognition model.
- The human body sample image is input into the trained first posture recognition model and the second posture recognition model respectively, to obtain the outputs of the first posture recognition model and the second posture recognition model.
- According to these two outputs, the second posture recognition model is trained.
- The first posture recognition model may include an 8-layer stacked hourglass network.
- The second posture recognition model may include a 4-layer stacked hourglass network, so that when the second posture recognition model is used, its amount of data processing is much smaller than that of the first posture recognition model, which improves the recognition speed.
- The dimension of the feature vector input by the second posture recognition model should also be smaller than that of the first posture recognition model.
- For example, the dimension of the feature vector input by the first posture recognition model can be 256, and that of the second posture recognition model can be 128, so that the data processing amount of the second posture recognition model is smaller than that of the first posture recognition model.
- the first gesture recognition model in the embodiment of the present application is trained through the following steps:
- Step S11 Determine the first difference between the output of the first posture recognition model and the posture of the human body sample.
- the output of the first posture recognition model is a human skeleton point
- the human skeleton point is in the form of coordinates.
- Let (x, y) be the coordinates of the k-th human skeleton point output by the first posture recognition model, and let (x_k, y_k) be the coordinates of the k-th human skeleton point in the human body sample posture. The distribution of the k-th human skeleton point is calculated by a Gaussian formula in which σ² is the variance of the Gaussian distribution, and a further formula then gives the first difference L1 between the output of the first posture recognition model and the human body sample posture.
- Step S12 Optimize the parameters of the first posture recognition model according to the first difference.
- The parameters of the first posture recognition model can be optimized so that L1 is gradually reduced.
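- The two formulas referred to in step S11 are not legible in this text. The sketch below therefore assumes a formulation that is common for hourglass-style models, namely an unnormalised 2D Gaussian centred on the labelled point and a mean squared difference between predicted and target maps. It should be read as one plausible instance rather than the application's exact definition; the function names and the pure-Python list representation are illustrative only.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma):
    """Unnormalised 2D Gaussian centred on the labelled point (cx, cy).

    sigma ** 2 plays the role of the variance of the Gaussian distribution.
    """
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def heatmap_mse(pred_maps, target_maps):
    """Mean squared difference over all skeleton points and all pixels
    (an assumed form of the 'first difference')."""
    total, count = 0.0, 0
    for pred, target in zip(pred_maps, target_maps):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count
```

- With this form, the difference is zero when the predicted maps coincide with the target maps and grows as the predicted peak moves away from the labelled point, which is the quantity step S12 drives down.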
- Step S103 training the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model, including:
- Step S21 Determine the second difference between the output of the second posture recognition model and the posture of the human body sample.
- The output of the second posture recognition model is also the human skeleton points. Let (x, y) be the coordinates of the k-th human skeleton point it outputs, and let (x_k, y_k) be the coordinates of the k-th human skeleton point in the human body sample posture. The distribution of the k-th human skeleton point is calculated by a Gaussian formula in which σ² is the variance of the Gaussian distribution, and a further formula then gives the second difference between the output of the second posture recognition model and the human body sample posture.
- Step S22 Determine the third difference between the output of the first gesture recognition model and the output of the second gesture recognition model.
- Step S23 Optimize the parameters of the second posture recognition model according to the second difference and the third difference.
- One possible implementation is to compute a weighted sum of the second difference and the third difference to generate the fourth difference L4, where the weight corresponding to the second difference and the weight corresponding to the third difference sum to one, and then to optimize the parameters of the second posture recognition model according to the fourth difference.
- The parameters of the second posture recognition model can be optimized so that L4 is gradually reduced.
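- The weighted combination in step S23 can be sketched as follows. The text fixes only that the two weights sum to one; the weight value, the function names and the use of a mean squared difference for the third difference are assumptions made for illustration.

```python
def mean_squared_difference(a, b):
    """Mean squared difference between two equally long lists of numbers
    (assumed form of the third difference between the two models' outputs)."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fourth_difference(second_diff, third_diff, weight_second):
    """Weighted sum of the second and third differences.

    weight_second is the weight of the second difference (against the labelled
    sample posture); the third difference (against the first model's output)
    gets 1 - weight_second, so the two weights always sum to one.
    """
    if not 0.0 <= weight_second <= 1.0:
        raise ValueError("weight_second must lie in [0, 1]")
    return weight_second * second_diff + (1.0 - weight_second) * third_diff
```

- Setting weight_second to 1.0 reduces this to ordinary supervised training, while smaller values make the second model follow the first model's output more closely.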
- Step S104 When the number of training times reaches the preset threshold, the training of the second gesture recognition model is completed.
- During training, the learning rate of the gradient descent method, that is, the step length in the gradient descent process, needs to be adjusted continuously. The learning rate is progressively reduced to narrow the scope of parameter optimization.
- At first, the value of the learning rate is 0.01. Training on all 29k human body sample images and then testing on all of the remaining 11k to obtain the corresponding recognition accuracy counts as one round of training.
- The value of the learning rate is later adjusted to 0.001 and then to 0.0001.
- When the number of training times reaches the preset threshold, the training of the second posture recognition model is completed.
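- A step-wise schedule of this kind can be sketched as below. The three learning-rate values come from the text; the points at which the rate drops are not stated there, so the milestone round numbers used as defaults are purely hypothetical placeholders, as is the function name.

```python
def learning_rate(round_index, milestones=(10, 20),
                  rates=(0.01, 0.001, 0.0001)):
    """Return the learning rate for a given training round (0-based).

    `milestones` are hypothetical round numbers at which the rate drops to
    the next value in `rates`; len(rates) must be len(milestones) + 1.
    """
    if len(rates) != len(milestones) + 1:
        raise ValueError("need exactly one more rate than milestones")
    stage = sum(1 for m in milestones if round_index >= m)
    return rates[stage]
```

- For instance, with the placeholder milestones, rounds 0 to 9 use 0.01, rounds 10 to 19 use 0.001, and later rounds use 0.0001.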
- the training method of the posture recognition model proposed in the embodiment of the application obtains the human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
- the first number of layers of the hourglass network corresponding to the first gesture recognition model is greater than the second number of layers of the hourglass network corresponding to the second gesture recognition model.
- According to the output of the first posture recognition model and the output of the second posture recognition model, the second posture recognition model is trained. When the number of training times reaches the preset threshold, the training of the second posture recognition model is completed.
- In this way, the output of the trained first posture recognition model, which has a larger number of layers, is used to help train the second posture recognition model, which has a smaller number of layers, so that the accuracy of the trained second posture recognition model is close to that of the first posture recognition model while its amount of data processing is much smaller.
- FIG. 5 is a schematic flowchart of the posture recognition method proposed in an embodiment of the application. As shown in Figure 5, the method includes the following steps:
- Step S201 Obtain the current frame of human body image to be recognized and the human body posture corresponding to the last frame of human body image.
- the posture of the human body includes the position of the bone point of the human body. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture can also be stored in a blockchain node.
- The embodiment of this application uses an optical flow algorithm to compensate for the recognition accuracy of the second posture recognition model.
- Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane.
- The optical flow algorithm uses the changes of the pixels in the image sequence in the time domain and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame.
- the optical flow algorithm can predict the position of the corresponding human skeleton point in the current frame of the human body image by analyzing the position of the human skeleton point in the previous frame of the human body image.
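- One simple way to picture this prediction is to move each skeleton point of the previous frame by the optical-flow displacement found at that point. The sketch below assumes that a dense flow field has already been computed by some optical flow algorithm (the text does not say which one) and is given as a nested list; the function name and data layout are hypothetical.

```python
def predict_positions(prev_points, flow):
    """Shift each previous-frame skeleton point by the flow at its pixel.

    prev_points: list of (x, y) positions in the previous frame.
    flow: flow[y][x] is the (dx, dy) displacement of pixel (x, y) from the
          previous frame to the current frame (assumed precomputed).
    """
    predicted = []
    for x, y in prev_points:
        # Look up the displacement at the nearest pixel to the point.
        dx, dy = flow[int(round(y))][int(round(x))]
        predicted.append((x + dx, y + dy))
    return predicted
```

- The result has one predicted position per skeleton point, in the same order as the input, ready to be combined with the model output.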
- Step S202 Input the current frame of human body image into the second posture recognition model after training.
- the second gesture recognition model is generated after training by the training method of the aforementioned gesture recognition model.
- the second gesture model has a smaller amount of data processing, so the recognition speed is faster.
- Step S203 According to the position of the human skeleton point corresponding to the last frame of the human body image, the predicted position of the human skeleton point corresponding to the current frame of the human body image is generated.
- Step S204 according to the output of the second posture recognition model and the predicted position of the human skeleton point corresponding to the current frame of the human image, generate the human posture corresponding to the current frame of the human image.
- the predicted position of the human skeleton point corresponding to the human body image in the current frame is obtained according to the optical flow corresponding to the human body image in the previous frame and the current frame.
- The human body posture corresponding to the current frame of human body image is then calculated. The calculation combines the predicted position of the k-th human skeleton point corresponding to the current frame of human body image with K_cur, the position of the k-th human skeleton point in the output of the second posture recognition model, to give the position of the k-th human skeleton point corresponding to the current frame of human body image. The correction coefficient used in the calculation is a constant between 0.25 and 0.3. According to the positions of all the human skeleton points, the posture of the human body corresponding to the current frame of the human body image can be determined.
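- The combining formula itself is not legible in this text, so the sketch below is only one plausible reading and not the application's formula: it assumes a linear blend in which the correction coefficient weights the optical-flow prediction and the remainder weights the model output. The function name and the default coefficient of 0.25 (the lower end of the stated range) are illustrative choices.

```python
def fuse_positions(model_points, predicted_points, coeff=0.25):
    """Blend model output with optical-flow predictions, point by point.

    ASSUMPTION: final = (1 - coeff) * model + coeff * predicted.
    The text only states that a correction coefficient between 0.25 and 0.3
    is used; the linear form here is a guess made for illustration.
    """
    if len(model_points) != len(predicted_points):
        raise ValueError("both inputs need one entry per skeleton point")
    fused = []
    for (mx, my), (px, py) in zip(model_points, predicted_points):
        fused.append(((1 - coeff) * mx + coeff * px,
                      (1 - coeff) * my + coeff * py))
    return fused
```

- Under this assumed form, a point that the model places at (10, 20) and the optical flow predicts at (14, 20) ends up at (11, 20) for a coefficient of 0.25, so the prediction nudges the model output without overriding it.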
- the posture recognition method proposed in the embodiment of the present application obtains the current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image.
- the current frame of the human body image is input into the trained second posture recognition model, and the predicted position of the human skeleton point corresponding to the current frame of the human body image is generated according to the position of the human skeleton point corresponding to the previous frame of the human body image.
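- According to the output of the second posture recognition model and the predicted position of the human skeleton point corresponding to the current frame of the human body image, the human posture corresponding to the current frame of the human body image is generated.
- In this way, the optical flow algorithm is used to compensate the output of the second posture recognition model, and the accuracy of human posture recognition is improved.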
- FIG. 6 is a schematic structural diagram of a training device for a gesture recognition model proposed in an embodiment of this application. As shown in FIG. 6, the device includes: a first acquisition module 310, a first input module 320, a training module 330, and a completion module 340.
- the first acquisition module 310 is used to acquire a human body sample image and a corresponding human body sample pose.
- the first input module 320 is configured to input the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
- The first posture recognition model includes a first stacked hourglass network, and the first stacked hourglass network includes a first number of layers of hourglass networks.
- The second posture recognition model includes a second stacked hourglass network, and the second stacked hourglass network includes a second number of layers of hourglass networks. The first number of layers is greater than the second number of layers.
- the training module 330 is configured to train the second gesture recognition model according to the output of the first gesture recognition model and the output of the second gesture recognition model.
- the completion module 340 is configured to complete the training of the second gesture recognition model when the number of training times reaches the preset threshold.
- The device further includes: a determining module 350 for determining the first difference between the output of the first posture recognition model and the human body sample posture.
- the optimization module 360 is configured to optimize the parameters of the first gesture recognition model according to the first difference.
- The training module 330 includes: a first determination sub-module 331 for determining the second difference between the output of the second posture recognition model and the human body sample posture.
- the second determination sub-module 332 is used to determine the third difference between the output of the first gesture recognition model and the output of the second gesture recognition model.
- the optimization sub-module 333 is used to optimize the parameters of the second gesture recognition model according to the second difference and the third difference.
- The optimization sub-module 333 includes: a summation unit 333a for computing a weighted sum of the second difference and the third difference to generate the fourth difference, wherein the sum of the weight corresponding to the second difference and the weight corresponding to the third difference is one.
- the optimization unit 333b is configured to optimize the parameters of the second gesture recognition model according to the fourth difference.
- When training the posture recognition model, the training device for the posture recognition model acquires the human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
- the first number of layers of the hourglass network corresponding to the first gesture recognition model is greater than the second number of layers of the hourglass network corresponding to the second gesture recognition model.
- the second posture recognition model is trained according to the output of the first posture recognition model and the output of the second posture recognition model. When the number of training times reaches the preset threshold, the training of the second gesture recognition model is completed.
- the output of the trained first gesture recognition model, which has a larger number of layers, is used to help train the second gesture recognition model, which has a smaller number of layers, so that the accuracy of the trained second gesture recognition model is close to that of the first posture recognition model while its amount of data processing is much smaller.
- FIG. 7 is a schematic structural diagram of a gesture recognition device proposed in an embodiment of the application. As shown in FIG. 7, the device includes: a second acquisition module 410, a second input module 420, a first generation module 430, and a second generation module 440.
- the second acquisition module 410 is configured to acquire the current frame of human body image to be recognized and the human body posture corresponding to the last frame of human body image.
- the posture of the human body includes the position of the bone point of the human body. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, the human body sample image and the corresponding human body sample posture may also be stored in a node of a blockchain.
- the second input module 420 is configured to input the current frame of the human body image into the trained second posture recognition model.
- the second gesture recognition model is generated after being trained by the training device of the aforementioned gesture recognition model.
- the first generating module 430 is configured to generate the predicted position of the human skeleton point corresponding to the current frame of the human body image according to the position of the human skeleton point corresponding to the previous frame of the human body image.
- the second generating module 440 is configured to generate the human body pose corresponding to the current frame of the human body image according to the output of the second pose recognition model and the predicted position of the human skeleton point corresponding to the current frame of human body image.
- the gesture recognition device acquires the current frame of human body image to be recognized and the human body posture corresponding to the previous frame of human body image when performing gesture recognition.
- the current frame of the human body image is input into the trained second posture recognition model, and the predicted position of the human bone point corresponding to the current frame of the human body image is generated according to the position of the human skeleton point corresponding to the previous frame of the human body image.
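- the human body posture corresponding to the current frame of the human body image is generated according to the output of the second posture recognition model and the predicted position of the human skeleton point corresponding to the current frame of the human body image.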
- the optical flow algorithm is used to compensate the output of the second posture recognition model, and the accuracy of human posture recognition is improved.
- in order to implement the above-mentioned embodiments, the embodiments of the present application also propose a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the training method of the gesture recognition model of the foregoing method embodiment.
- the embodiments of the present application also propose a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the gesture recognition method of the foregoing method embodiment.
- FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
- the computer device 50 of this embodiment includes: a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and running on the processor 51.
- when the computer program 53 is executed by the processor 51, the training method of the gesture recognition model and the gesture recognition method of the embodiments are implemented. To avoid repetition, they are not described again here.
- alternatively, when the computer program is executed by the processor 51, the function of each module/unit in the training device for the gesture recognition model and in the gesture recognition device of the embodiments is realized. To avoid repetition, it is not described again here.
- the computer device 50 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
- the computer device may include, but is not limited to, a processor 51 and a memory 52.
- FIG. 8 is only an example of the computer device 50, and does not constitute a limitation on the computer device 50. It may include more or fewer components than those shown in the figure, or a combination of certain components, or different components.
- computer equipment may also include input and output devices, network access devices, buses, and so on.
- the so-called processor 51 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the memory 52 may be an internal storage unit of the computer device 50, such as a hard disk or memory of the computer device 50.
- the memory 52 may also be an external storage device of the computer device 50, such as a plug-in hard disk equipped on the computer device 50, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), and so on.
- the memory 52 may also include both an internal storage unit of the computer device 50 and an external storage device.
- the memory 52 is used to store computer programs and other programs and data required by the computer equipment.
- the memory 52 can also be used to temporarily store data that has been output or will be output.
- the embodiments of the present application also propose a computer-readable storage medium.
- the computer-readable storage medium may be non-volatile or volatile.
- the computer-readable storage medium stores a computer program. Wherein, when the computer program is executed by the processor, the steps of the training method of the gesture recognition model as in the foregoing method embodiment are implemented.
- the embodiment of the present application also proposes a computer-readable storage medium.
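- the computer-readable storage medium stores a computer program. Wherein, when the computer program is executed by the processor, the steps of the gesture recognition method as in the foregoing method embodiment are implemented.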
- the disclosed system, device, and method can be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of units is only a logical function division. In actual implementation, there may be other division methods.
- multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
- the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
- a blockchain, essentially a decentralized database, is a series of data blocks linked using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
- the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
- the above-mentioned integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium.
- the above-mentioned software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute part of the steps of the methods in the various embodiments of the present application.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A posture recognition model training method, relating to artificial intelligence, and comprising: obtaining a human body sample image and a corresponding human body sample posture, and respectively inputting the human body sample image into a trained first posture recognition model and a trained second posture recognition model, wherein a first number of layers of hourglass networks corresponding to the first posture recognition model is greater than a second number of layers of hourglass networks corresponding to the second posture recognition model; training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and when the number of times of training reaches a preset threshold, completing training of the second posture recognition model. In addition, the present invention also relates to blockchain technology, and the human body sample image and the corresponding human body sample posture are stored in a blockchain.
Description
This application claims priority to Chinese patent application No. CN202010343546.2, filed with the Chinese Patent Office on April 27, 2020 and entitled "Training Method and Device for Posture Recognition Model, and Posture Recognition Method and Device", the entire content of which is incorporated into this application by reference.
This application relates to the field of artificial intelligence technology, and in particular to a method and device for training a posture recognition model, and a posture recognition method and device.
With the continuous development of computer vision technology, multi-person posture recognition technology increasingly appears in people's lives. For example, in elderly care institutions or home care scenarios, multi-person posture recognition technology can recognize dangerous actions of the elderly and raise alarms, and can evaluate the mobility of the elderly so that they can be better cared for.
Multi-person posture recognition technology has two indicators: recognition accuracy and recognition speed. The inventor realized that, in the related art, recognition accuracy is improved by continuously increasing the structural complexity of the posture recognition model, but this consumes a large amount of system resources and makes the technology costly to deploy. Conversely, simplifying the model to increase recognition speed and reduce cost lowers recognition accuracy. Therefore, there is an urgent need for a posture recognition model whose recognition accuracy and recognition speed both meet application requirements.
Summary of the invention
A method for training a posture recognition model, including: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
A posture recognition method, including: acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the aforementioned method for training a posture recognition model; generating predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image; and generating the human body posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image.
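The last two steps of the recognition method can be sketched as follows. This is only an illustrative sketch: it assumes the per-point displacement between frames is already available (for example, from an optical flow algorithm, which the device embodiment names as the compensation technique), and it assumes the model output and the prediction are fused by a fixed weighted average. The text does not specify the fusion rule, so `predict_positions`, `fuse_pose`, and `alpha` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) image coordinates of one skeleton point


def predict_positions(prev_points: List[Point], flow: List[Point]) -> List[Point]:
    """Shift each previous-frame skeleton point by its estimated displacement."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(prev_points, flow)]


def fuse_pose(model_points: List[Point], predicted: List[Point],
              alpha: float = 0.5) -> List[Point]:
    """Blend the model output with the predicted positions.

    The weighted average is an assumption made for illustration only.
    """
    return [
        (alpha * mx + (1.0 - alpha) * px, alpha * my + (1.0 - alpha) * py)
        for (mx, my), (px, py) in zip(model_points, predicted)
    ]


# Made-up values: two skeleton points tracked from the previous frame.
prev_pose = [(10.0, 20.0), (30.0, 40.0)]
flow = [(2.0, 0.0), (0.0, -2.0)]            # assumed per-point displacement
model_out = [(14.0, 20.0), (30.0, 36.0)]    # assumed second-model output

predicted = predict_positions(prev_pose, flow)
current_pose = fuse_pose(model_out, predicted, alpha=0.5)
```

With `alpha` closer to 1 the result follows the model output more closely; with `alpha` closer to 0 it follows the motion-based prediction.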
A computer device, including a memory and a processor, wherein the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
A computer device, including a memory and a processor, wherein the memory is used to store information including program instructions, the processor is used to control the execution of the program instructions, and the following steps are implemented when the program instructions are loaded and executed by the processor:
acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the aforementioned method for training a posture recognition model; generating predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image; and generating the human body posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image.
A computer-readable storage medium, the storage medium including a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to perform the following steps:
acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a first posture recognition model and a second posture recognition model respectively, wherein the first posture recognition model includes a first stacked hourglass network, the first stacked hourglass network includes a first number of layers of hourglass networks, the second posture recognition model includes a second stacked hourglass network, the second stacked hourglass network includes a second number of layers of hourglass networks, and the first number of layers is greater than the second number of layers; training the first posture recognition model according to the output of the first posture recognition model, and training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and when the number of training times reaches a preset threshold, completing the training of the first posture recognition model and the second posture recognition model.
A computer-readable storage medium, the storage medium including a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to perform the following steps:
acquiring a current frame of human body image to be recognized and a human body posture corresponding to the previous frame of human body image, wherein the human body posture includes the positions of human skeleton points; inputting the current frame of human body image into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the aforementioned method for training a posture recognition model; generating predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image; and generating the human body posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image.
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for training a posture recognition model provided by an embodiment of this application;
FIG. 2 is a schematic diagram of the position distribution of human skeleton points;
FIG. 3 is a schematic structural diagram of an hourglass network;
FIG. 4 is a schematic structural diagram of a stacked hourglass network;
FIG. 5 is a schematic flowchart of a posture recognition method proposed in an embodiment of this application;
FIG. 6 is a schematic structural diagram of a training device for a posture recognition model proposed in an embodiment of this application;
FIG. 7 is a schematic structural diagram of a posture recognition device proposed in an embodiment of this application; and
FIG. 8 is a schematic diagram of a computer device provided by an embodiment of this application.
In order to better understand the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The terms used in the embodiments of this application are only for the purpose of describing specific embodiments, and are not intended to limit the application. The singular forms "a", "said" and "the" used in the embodiments of the present application and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, A and/or B can mean: A exists alone, A and B exist at the same time, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
It should be understood that, although the terms first, second, third, etc. may be used in the embodiments of the present application to describe preset ranges and the like, these preset ranges should not be limited by these terms. These terms are only used to distinguish the preset ranges from each other. For example, without departing from the scope of the embodiments of the present application, the first preset range may also be referred to as the second preset range, and similarly, the second preset range may also be referred to as the first preset range.
Depending on the context, the word "if" as used herein can be interpreted as "at the time of" or "when" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if determined" or "if (a stated condition or event) is detected" can be interpreted as "when determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
Based on the foregoing description of the prior art, multi-person posture recognition technology has two indicators: recognition accuracy and recognition speed. In practice, multi-person posture recognition involves two steps: step one detects the human body targets, and step two detects the human body posture of each human body target. Detecting the human body posture takes up about five-sixths of the total running time. Therefore, improving the recognition speed of multi-person posture recognition mainly means simplifying the posture recognition model, so that both recognition accuracy and recognition speed can meet application requirements.
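The five-sixths figure bounds how much a faster posture model can help overall. The following small calculation is a sketch only: it assumes the detection step is unchanged and that posture detection alone is sped up by a hypothetical factor `k`, which is the standard Amdahl's-law setting and is not stated in the application.

```python
def overall_speedup(pose_fraction: float, k: float) -> float:
    """Overall speedup when only the posture-detection share of runtime is sped up k times."""
    return 1.0 / ((1.0 - pose_fraction) + pose_fraction / k)


pose_fraction = 5.0 / 6.0  # share of runtime spent on posture detection, from the text

# If the simplified posture model were twice as fast (hypothetical k = 2):
speedup_2x = overall_speedup(pose_fraction, 2.0)

# Upper bound if posture detection cost nothing at all:
speedup_limit = 1.0 / (1.0 - pose_fraction)
```

Under these assumptions a posture model that is twice as fast gives an overall speedup of 12/7, and no simplification of the posture model alone can exceed a sixfold overall speedup.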
Based on this, the embodiments of the present application provide a method for training a posture recognition model. The output of the first posture recognition model, which has a larger number of layers, is used to help train the second posture recognition model, which has a smaller number of layers, so that the accuracy of the trained second posture recognition model is close to that of the first posture recognition model while its amount of data processing is much smaller.
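One way to picture how the first model's output helps train the second is the weighted loss described for the training device: the second difference (second model output versus sample posture) and the third difference (first model output versus second model output) are combined with weights that sum to one. The sketch below assumes mean squared error as the difference measure and flat lists of coordinates; the application does not fix the measure, and `mse`, `fourth_difference`, and `w` are hypothetical names.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally long coordinate lists (assumed measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fourth_difference(student_out: Sequence[float],
                      teacher_out: Sequence[float],
                      sample_pose: Sequence[float],
                      w: float = 0.5) -> float:
    """Weighted sum of the second and third differences; the weights w and 1 - w sum to one."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    second = mse(student_out, sample_pose)   # second model vs. labelled sample posture
    third = mse(teacher_out, student_out)    # first model vs. second model
    return w * second + (1.0 - w) * third


# Made-up coordinates for a two-value output.
student = [1.0, 2.0]   # second (smaller) model output
teacher = [1.0, 3.0]   # first (larger) model output
label = [3.0, 2.0]     # human body sample posture
loss = fourth_difference(student, teacher, label, w=0.5)
```

The parameters of the second model would then be optimized against this combined value, while the first model is optimized only against its own difference from the sample posture.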
FIG. 1 is a schematic flowchart of a method for training a posture recognition model provided by an embodiment of this application. As shown in FIG. 1, the method includes the following steps:
Step S101: acquire a human body sample image and a corresponding human body sample posture.
The human body sample image is an image whose human body posture has already been determined, and the correct recognition result is the corresponding human body sample posture. It can therefore be used to train the posture recognition model. It should be emphasized that, in order to further ensure the privacy and security of the human body sample image and the corresponding human body sample posture, they may also be stored in a node of a blockchain.
Specifically, the human body posture includes the positions of human skeleton points, and FIG. 2 is a schematic diagram of the position distribution of human skeleton points. As shown in FIG. 2, the parts of the human body can be determined through the human skeleton points. Specifically, each human skeleton point is numbered, and the relative positions between different human skeleton points are determined from the coordinates of each skeleton point in the image, so that they correspond to different human body postures.
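A minimal sketch of this representation, assuming a posture is stored as a mapping from skeleton-point number to image coordinates. The point numbers and coordinates below are made up for illustration and do not follow the numbering of FIG. 2; `Pose` and `relative_position` are hypothetical names.

```python
from typing import Dict, Tuple

# Skeleton-point number -> (x, y) coordinates in the image.
Pose = Dict[int, Tuple[float, float]]


def relative_position(pose: Pose, a: int, b: int) -> Tuple[float, float]:
    """Offset of skeleton point b relative to skeleton point a, in pixels."""
    ax, ay = pose[a]
    bx, by = pose[b]
    return (bx - ax, by - ay)


# Made-up posture with three numbered skeleton points.
pose: Pose = {0: (100.0, 50.0), 1: (100.0, 80.0), 2: (130.0, 80.0)}
offset = relative_position(pose, 0, 2)
```

Different postures of the same person then show up as different sets of offsets between the same numbered points.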
Step S102: input the human body sample image into the trained first posture recognition model and the second posture recognition model respectively.
The first posture recognition model includes a first stacked hourglass network, and the first stacked hourglass network includes a first number of layers of hourglass networks. The second posture recognition model includes a second stacked hourglass network, and the second stacked hourglass network includes a second number of layers of hourglass networks. The first number of layers is greater than the second number of layers.
图3为沙漏网络的结构示意图。如图3所示,单个沙漏网络的输入为图像,输出为图像特征,将图像输入沙漏网络后,对图像的处理过程可以分为卷积路和跳级路两部分。其中,卷积路通过卷积路残差模块对图像进行卷积,将最后一个卷积路残差模块的输出作为第一个上采样模块的输入。Figure 3 is a schematic diagram of the structure of the hourglass network. As shown in Figure 3, the input of a single hourglass network is an image, and the output is an image feature. After the image is input into the hourglass network, the image processing process can be divided into two parts: a convolution path and a step-by-step path. Among them, the convolution path convolves the image through the convolution path residual module, and the output of the last convolution path residual module is used as the input of the first upsampling module.
It should be noted that the size of each block in FIG. 3 represents the input resolution. The output resolution of the first convolution-path residual module is half of its input resolution. Because the input of the second convolution-path residual module is the output of the first, the input resolution of the second convolution-path residual module is half of the input resolution of the first.
In addition, the output resolution of each upsampling module is twice its input resolution, so that the upsampling modules correspond one-to-one with the convolution-path residual modules. For example, in FIG. 3 the output resolution of the fourth convolution-path module equals the input resolution of the first upsampling module, and the input resolution of the fourth convolution-path module equals the output resolution of the first upsampling module.
The input resolution and output resolution of each skip-path residual module are equal. One part of the output of a convolution-path residual module is processed by several further convolution-path residual modules and several upsampling modules, the other part is processed by a skip-path residual module, and the two parts are superimposed at the same resolution. For example, one part of the output of the first residual module is processed by the second, third, fourth, and fifth convolution-path residual modules (the input and output resolutions of the fifth are equal) and is then upsampled by the first, second, and third upsampling modules, which brings it back to the output resolution of the first residual module. The other part of the output of the first residual module is processed by the skip-path residual module, which leaves the resolution unchanged, so it also has the output resolution of the first residual module. The two parts therefore have the same resolution after their different processing and can be superimposed, and the superimposed result is used as the input of the fourth upsampling module.
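The resolution bookkeeping described above can be checked with a short sketch. It assumes four halving convolution-path modules followed by a fifth module that keeps the resolution, then four doubling upsampling modules, as in the example; the input resolution of 64 and the function name `hourglass_resolutions` are illustrative assumptions, and the input resolution is assumed to be divisible by 2 to the power of `depth`.

```python
from typing import List, Tuple


def hourglass_resolutions(input_res: int, depth: int = 4) -> Tuple[List[int], int, List[int], List[int]]:
    """Track resolutions through one hourglass (assumed layout, see lead-in)."""
    down_outputs = []
    res = input_res
    for _ in range(depth):
        res //= 2                 # each of the first `depth` modules halves the resolution
        down_outputs.append(res)
    bottleneck = res              # the fifth module keeps the resolution unchanged
    up_inputs, up_outputs = [], []
    for _ in range(depth):
        up_inputs.append(res)
        res *= 2                  # each upsampling module doubles the resolution
        up_outputs.append(res)
    return down_outputs, bottleneck, up_inputs, up_outputs


down, bottleneck, up_in, up_out = hourglass_resolutions(64)
# The skip branch of convolution module i keeps resolution down[i]; it can be
# superimposed with the main branch exactly where up_in matches that value.
```

With these assumptions, the output resolution of the first convolution-path module equals the input resolution of the fourth upsampling module, which is why the two branches can be superimposed there.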
Based on the above analysis of the hourglass structure, the feature image output by the hourglass network retains the information of all layers, and the human skeleton points can be determined from it.
FIG. 4 is a schematic structural diagram of a stacked hourglass network. As shown in FIG. 4, a stacked hourglass network is obtained by cascading multiple hourglass networks, with the output of each hourglass network used as the input of the next. Each hourglass network in the stack can draw on the relationships between human skeleton points determined by the preceding hourglass network, so the skeleton points in its own output are determined more accurately.
It should be understood that the more layers a stacked hourglass network has, the more accurately the human skeleton points are determined. The first posture recognition model is therefore more accurate than the second posture recognition model, but it also processes more data in use and so recognizes more slowly.
The embodiments of the present application aim to use the output of the trained first posture recognition model, which has more layers, to help train the second posture recognition model, which has fewer layers, so that the accuracy of the trained second posture recognition model approaches that of the first while its data processing amount is much smaller.
Step S103: train the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model.
It should be noted that, in the embodiments of the present application, the first posture recognition model is trained first. When its recognition accuracy satisfies a preset condition, its training is complete, and the trained first posture recognition model is then used to train the second posture recognition model. Specifically, the human body sample image is input into the trained first posture recognition model and into the second posture recognition model to obtain the outputs of both models, and the second posture recognition model is trained according to those two outputs.
For example, the first posture recognition model may include an 8-layer stacked hourglass network and the second posture recognition model a 4-layer stacked hourglass network, so that the second model processes much less data in use and therefore recognizes faster. In addition, the dimension of the feature vector input to the second posture recognition model should be smaller than that of the first: for example, 256 dimensions for the first model and 128 dimensions for the second, which further reduces the data processing amount of the second model relative to the first.
Specifically, the first posture recognition model in the embodiments of the present application is trained through the following steps:
Step S11: determine a first difference between the output of the first posture recognition model and the human body sample posture.
Here, the output of the first posture recognition model is human skeleton points in the form of coordinates. Specifically, let $(x, y)$ be the coordinates of the k-th human skeleton point output by the first posture recognition model, and let $(x_k, y_k)$ be the coordinates of the k-th human skeleton point in the human body sample posture. The distribution of the k-th human skeleton point is calculated according to a first formula [formula image not reproduced in the source text], where $\sigma^2$ is the variance of the Gaussian distribution. The first difference between the output of the first posture recognition model and the human body sample posture is then calculated according to a second formula [formula image not reproduced in the source text].
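Because the formulas for step S11 are not reproduced in the text, the following is only a sketch of one common way such a difference is computed, and it should not be read as the application's own formulas. It assumes an unnormalized 2D Gaussian centered on the sample skeleton point and a mean squared difference between predicted and target maps; the function names and the map size are hypothetical.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]


def gaussian_heatmap(width: int, height: int, center: Tuple[float, float], sigma: float) -> Heatmap:
    """Assumed form: exp(-((x - x_k)^2 + (y - y_k)^2) / (2 * sigma^2)), peak value 1."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]


def mean_squared_difference(pred_maps: List[Heatmap], target_maps: List[Heatmap]) -> float:
    """Assumed difference: squared error averaged over all skeleton points and pixels."""
    total, count = 0.0, 0
    for pred, target in zip(pred_maps, target_maps):
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count


target = gaussian_heatmap(8, 8, center=(3, 4), sigma=1.0)   # sample skeleton point at (3, 4)
shifted = gaussian_heatmap(8, 8, center=(5, 4), sigma=1.0)  # a prediction two pixels off
perfect_loss = mean_squared_difference([target], [target])
shifted_loss = mean_squared_difference([shifted], [target])
```

Under these assumptions, a prediction that matches the sample posture gives a difference of zero, and the difference grows as the predicted point moves away from the sample point.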
Step S12: optimize the parameters of the first posture recognition model according to the first difference.
It should be noted that the parameters of the first posture recognition model can be optimized by gradient descent so that the first difference $L_1$ gradually decreases.
Step S103, training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model, includes:
Step S21: determine a second difference between the output of the second posture recognition model and the human body sample posture.
It can be understood that the output of the second posture recognition model is also human skeleton points. Let $(x, y)$ be the coordinates of the k-th human skeleton point output by the second posture recognition model, and let $(x_k, y_k)$ be the coordinates of the k-th human skeleton point in the human body sample posture. The distribution of the k-th human skeleton point is calculated according to a first formula [formula image not reproduced in the source text], where $\sigma^2$ is the variance of the Gaussian distribution. The second difference between the output of the second posture recognition model and the human body sample posture is then calculated according to a second formula [formula image not reproduced in the source text].
Step S22: determine a third difference between the output of the first posture recognition model and the output of the second posture recognition model.
Specifically, the third difference is calculated according to a formula [formula image not reproduced in the source text], in which one term [symbol not reproduced] denotes the output of the trained first posture recognition model.
Step S23: optimize the parameters of the second posture recognition model according to the second difference and the third difference.
In one possible implementation, the second difference and the third difference are weighted and summed to generate a fourth difference, where the weight of the second difference and the weight of the third difference sum to one, and the parameters of the second posture recognition model are optimized according to the fourth difference. Specifically, the fourth difference is calculated as $L_4 = wL_3 + (1 - w)L_2$, where $L_2$ is the second difference, $L_3$ is the third difference, and $w$ is the weight of the third difference. The parameters of the second posture recognition model can then be optimized by gradient descent so that $L_4$ gradually decreases.
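The weighted sum $L_4 = wL_3 + (1 - w)L_2$ can be written directly as a small function. The numeric values below are made up for illustration, and the range check on `w` is an added assumption: the text only states that the two weights sum to one.

```python
def combined_difference(l2: float, l3: float, w: float) -> float:
    """Fourth difference L4 = w * L3 + (1 - w) * L2.

    l2: difference between the second model's output and the sample posture.
    l3: difference between the second model's output and the first model's output.
    w:  weight of the third difference; (1 - w) is the weight of the second.
    """
    if not 0.0 <= w <= 1.0:  # assumption: both weights are non-negative
        raise ValueError("w must lie in [0, 1]")
    return w * l3 + (1.0 - w) * l2


l4 = combined_difference(l2=0.4, l3=0.2, w=0.5)  # illustrative values only
```

Setting `w` to 0 trains the second model on the sample postures alone, while larger values of `w` pull its output toward that of the trained first model.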
Step S104: when the number of training rounds reaches a preset threshold, the training of the second posture recognition model is complete.
It should be noted that, when gradient descent is used to optimize the parameters of the second posture recognition model, the learning rate (the step size of the gradient descent) must be adjusted as training proceeds: as the parameters are optimized, the learning rate is reduced step by step to shrink the size of each parameter update. For example, suppose the human body sample image library contains 40k human body sample images, of which 29k are used as training data and the remaining 11k as test data. At the start of training, the learning rate is set to 0.01. Training on all 29k training images and then testing on all 11k test images to obtain the corresponding recognition accuracy counts as one round of training. When the number of rounds reaches 120, the learning rate is adjusted to 0.001; when it reaches 200, the learning rate is adjusted to 0.0001; and when it reaches 250, the training of the second posture recognition model is complete.
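The learning-rate schedule in this example can be sketched as a step function of the training round. The sketch numbers rounds from 1 and applies each new rate starting with the round at which the threshold is reached; that boundary handling is an assumption, since the text does not say whether the change takes effect during or after the 120th and 200th rounds.

```python
TOTAL_ROUNDS = 250  # training of the second model ends at this round


def learning_rate(round_index: int) -> float:
    """Step schedule from the example: 0.01, then 0.001 from round 120, then 0.0001 from round 200."""
    if round_index < 120:
        return 0.01
    if round_index < 200:
        return 0.001
    return 0.0001


# One entry per training round, rounds numbered 1..250.
schedule = [learning_rate(r) for r in range(1, TOTAL_ROUNDS + 1)]
```

Each round in this example would pass over the 29k training images once and then evaluate on the 11k test images.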
In summary, the training method for a posture recognition model proposed in the embodiments of the present application obtains a human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first posture recognition model and into the second posture recognition model. The first number of layers of the hourglass network of the first posture recognition model is greater than the second number of layers of the hourglass network of the second posture recognition model. The second posture recognition model is trained according to the outputs of the two models, and its training is complete when the number of training rounds reaches the preset threshold. In this way, the output of the trained first posture recognition model, which has more layers, helps train the second posture recognition model, which has fewer layers, so that the accuracy of the trained second model approaches that of the first while its data processing amount is much smaller.
So that the second posture recognition model trained by the above training method can be used for posture recognition on human body images, the embodiments of the present application also propose a posture recognition method. FIG. 5 is a schematic flowchart of the posture recognition method proposed in the embodiments of the present application. As shown in FIG. 5, the method includes the following steps:
Step S201: obtain the current frame of human body image to be recognized and the human posture corresponding to the previous frame of human body image.
Here, the human posture includes the positions of human skeleton points. It should be emphasized that, to further ensure the privacy and security of the above human body sample images and the corresponding human body sample postures, they may also be stored in a node of a blockchain.
It should be noted that, because the second posture recognition model trained in the embodiments of the present application is less accurate than the first posture recognition model, the embodiments of the present application use an optical flow algorithm to compensate for the recognition accuracy of the second posture recognition model.
Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane. An optical flow algorithm uses the change of pixels over time in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and from it calculates the motion of objects between adjacent frames.
In the embodiments of the present application, the optical flow algorithm analyzes the positions of the human skeleton points in the previous frame of human body image and predicts the positions of the corresponding human skeleton points in the current frame of human body image.
Step S202: input the current frame of human body image into the trained second posture recognition model.
Here, the second posture recognition model is generated by training with the aforementioned training method for a posture recognition model.
It should be understood that, compared with the first posture recognition model, the second posture recognition model processes less data and therefore recognizes faster.
Step S203: generate the predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image.
Step S204: generate the human posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image.
Specifically, for the human skeleton points corresponding to the previous frame of human body image, the predicted positions of the human skeleton points corresponding to the current frame are obtained from the optical flow between the previous frame and the current frame. The human posture corresponding to the current frame is then calculated according to a formula [formula image not reproduced in the source text]. In that formula, one term [symbol not reproduced] is the predicted position of the k-th human skeleton point corresponding to the current frame, $K_{cur}$ is the position of the k-th human skeleton point in the output of the second posture recognition model, another term [symbol not reproduced] is the resulting position of the k-th human skeleton point corresponding to the current frame, and $\alpha$ is a correction coefficient, a constant between 0.25 and 0.3. The human posture corresponding to the current frame of human body image can be determined from the positions of all the human skeleton points.
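The fusion formula itself is not reproduced in the text, so the sketch below is a guess at its shape rather than the application's formula. It assumes the predicted position is the previous-frame position displaced by the optical flow at that point, and that the final position is a convex combination in which the correction coefficient α weights the prediction. The function names, the flow vector, and all coordinates are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def predict_from_flow(prev_point: Point, flow: Point) -> Point:
    """Assumed prediction: move the previous-frame skeleton point by its flow vector."""
    return (prev_point[0] + flow[0], prev_point[1] + flow[1])


def fuse(model_point: Point, predicted_point: Point, alpha: float = 0.25) -> Point:
    """Assumed fusion: alpha * prediction + (1 - alpha) * second-model output."""
    return (
        alpha * predicted_point[0] + (1.0 - alpha) * model_point[0],
        alpha * predicted_point[1] + (1.0 - alpha) * model_point[1],
    )


# Hypothetical values for the k-th skeleton point.
predicted = predict_from_flow(prev_point=(100.0, 50.0), flow=(4.0, -2.0))
final = fuse(model_point=(108.0, 44.0), predicted_point=predicted, alpha=0.25)
```

In practice the flow vector would come from an optical flow routine evaluated at the previous-frame skeleton point; with α between 0.25 and 0.3, the second model's output would dominate and the prediction would act as a correction.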
In summary, the posture recognition method proposed in the embodiments of the present application obtains the current frame of human body image to be recognized and the human posture corresponding to the previous frame of human body image. The current frame is input into the trained second posture recognition model, and the predicted positions of the human skeleton points corresponding to the current frame are generated from the positions of the human skeleton points corresponding to the previous frame. The human posture corresponding to the current frame is generated from the output of the second posture recognition model and those predicted positions. In this way, the optical flow algorithm compensates the output of the second posture recognition model and improves the accuracy of human posture recognition.
To implement the foregoing embodiments, the embodiments of the present application also propose a training device for a posture recognition model. FIG. 6 is a schematic structural diagram of this training device. As shown in FIG. 6, the device includes a first acquisition module 310, a first input module 320, a training module 330, and a completion module 340.
The first acquisition module 310 is configured to obtain a human body sample image and the corresponding human body sample posture.
The first input module 320 is configured to input the human body sample image into the trained first posture recognition model and into the second posture recognition model, respectively.
The first posture recognition model includes a first stacked hourglass network, which contains a first number of layers of hourglass networks. The second posture recognition model includes a second stacked hourglass network, which contains a second number of layers of hourglass networks. The first number of layers is greater than the second number of layers.
The training module 330 is configured to train the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model.
The completion module 340 is configured to complete the training of the second posture recognition model when the number of training rounds reaches a preset threshold.
Further, to optimize the parameters of the first posture recognition model, in one possible implementation the device also includes a determination module 350, configured to determine the first difference between the output of the first posture recognition model and the human body sample posture, and an optimization module 360, configured to optimize the parameters of the first posture recognition model according to the first difference.
Further, to optimize the parameters of the second posture recognition model, in one possible implementation the training module 330 includes a first determination sub-module 331, configured to determine the second difference between the output of the second posture recognition model and the human body sample posture; a second determination sub-module 332, configured to determine the third difference between the output of the first posture recognition model and the output of the second posture recognition model; and an optimization sub-module 333, configured to optimize the parameters of the second posture recognition model according to the second difference and the third difference.
Further, to take both the second difference and the third difference into account when optimizing the parameters of the second posture recognition model, in one possible implementation the optimization sub-module 333 includes a summation unit 333a, configured to weight and sum the second difference and the third difference to generate a fourth difference, where the weight of the second difference and the weight of the third difference sum to one; and an optimization unit 333b, configured to optimize the parameters of the second posture recognition model according to the fourth difference.
It should be noted that the foregoing explanation of the embodiment of the training method for a posture recognition model also applies to the training device of this embodiment and is not repeated here.
In summary, when training a posture recognition model, the training device proposed in the embodiments of the present application obtains a human body sample image and the corresponding human body sample posture, and inputs the human body sample image into the trained first posture recognition model and into the second posture recognition model. The first number of layers of the hourglass network of the first posture recognition model is greater than the second number of layers of the hourglass network of the second posture recognition model. The second posture recognition model is trained according to the outputs of the two models, and its training is complete when the number of training rounds reaches the preset threshold. In this way, the output of the trained first posture recognition model, which has more layers, helps train the second posture recognition model, which has fewer layers, so that the accuracy of the trained second model approaches that of the first while its data processing amount is much smaller.
To implement the foregoing embodiments, the embodiments of the present application also propose a posture recognition device. FIG. 7 is a schematic structural diagram of this posture recognition device. As shown in FIG. 7, the device includes a second acquisition module 410, a second input module 420, a first generation module 430, and a second generation module 440.
The second acquisition module 410 is configured to obtain the current frame of human body image to be recognized and the human posture corresponding to the previous frame of human body image.
Here, the human posture includes the positions of human skeleton points. It should be emphasized that, to further ensure the privacy and security of the above human body sample images and the corresponding human body sample postures, they may also be stored in a node of a blockchain.
The second input module 420 is configured to input the current frame of human body image into the trained second posture recognition model.
Here, the second posture recognition model is generated by training with the aforementioned training device for a posture recognition model.
The first generation module 430 is configured to generate the predicted positions of the human skeleton points corresponding to the current frame of human body image according to the positions of the human skeleton points corresponding to the previous frame of human body image.
The second generation module 440 is configured to generate the human posture corresponding to the current frame of human body image according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame of human body image.
It should be noted that the foregoing explanation of the embodiment of the posture recognition method also applies to the posture recognition device of this embodiment and is not repeated here.
In summary, when performing posture recognition, the posture recognition device proposed in the embodiments of the present application obtains the current frame of human body image to be recognized and the human posture corresponding to the previous frame of human body image. The current frame is input into the trained second posture recognition model, and the predicted positions of the human skeleton points corresponding to the current frame are generated from the positions of the human skeleton points corresponding to the previous frame. The human posture corresponding to the current frame is generated from the output of the second posture recognition model and those predicted positions. In this way, the optical flow algorithm compensates the output of the second posture recognition model and improves the accuracy of human posture recognition.
To implement the foregoing embodiments, the embodiments of the present application also propose a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the training method for a posture recognition model of the foregoing method embodiments.
To implement the foregoing embodiments, the embodiments of the present application also propose a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the posture recognition method of the foregoing method embodiments.
FIG. 8 is a schematic diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 8, the computer device 50 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and executable on the processor 51. When executed by the processor 51, the computer program 53 implements the training method for a posture recognition model and the posture recognition method of the embodiments; to avoid repetition, the details are not repeated here. Alternatively, when executed by the processor 51, the computer program implements the functions of the modules and units of the training device for a posture recognition model and of the posture recognition device of the embodiments; to avoid repetition, the details are not repeated here.
The computer device 50 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device may include, but is not limited to, the processor 51 and the memory 52. Those skilled in the art will understand that FIG. 8 is only an example of the computer device 50 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device may also include input/output devices, network access devices, buses, and so on.
The processor 51 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 52 may be an internal storage unit of the computer device 50, such as its hard disk or main memory. The memory 52 may also be an external storage device of the computer device 50, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 50. Further, the memory 52 may include both an internal storage unit and an external storage device of the computer device 50. The memory 52 stores the computer program and any other programs and data required by the computer device. The memory 52 may also temporarily store data that has been output or is about to be output.
To implement the above embodiments, an embodiment of this application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the posture recognition model training method of the foregoing method embodiments.
To implement the above embodiments, an embodiment of this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the posture recognition method of the foregoing method embodiments.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional units.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains the information of a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
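As a rough illustration of data blocks linked by cryptographic methods, the following Python sketch chains blocks through SHA-256 hashes so that tampering with any stored record invalidates the chain. It is a toy under stated assumptions: the block layout, the function names `block_hash`, `append_block`, and `is_valid`, and the all-zero genesis hash are hypothetical, and consensus and point-to-point transmission are not modelled.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # hypothetical placeholder for the first block


def block_hash(index: int, prev_hash: str, records: List[Dict[str, Any]]) -> str:
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict[str, Any]], records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Create a block linked to the current tail and append it to the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    }
    chain.append(block)
    return block


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Check every link and recompute every hash."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, block["prev_hash"], block["records"]):
            return False
        prev_hash = block["hash"]
    return True
```

In this sketch a record could stand for a reference to a human body sample image and its labelled posture; changing it after the fact makes `is_valid` return `False`.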
The integrated unit described above, when implemented in the form of a software functional unit, may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.
Claims (20)
- A training method for a posture recognition model, comprising: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a trained first posture recognition model and into a second posture recognition model, respectively, wherein the first posture recognition model comprises a first stacked hourglass network having a first number of hourglass network layers, the second posture recognition model comprises a second stacked hourglass network having a second number of hourglass network layers, and the first number is greater than the second number; training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and completing the training of the second posture recognition model when the number of training iterations reaches a preset threshold.
- The training method according to claim 1, wherein the first posture recognition model is trained through the following steps: determining a first difference between the output of the first posture recognition model and the human body sample posture; and optimizing the parameters of the first posture recognition model according to the first difference.
- The training method according to claim 2, wherein the human body sample image and the corresponding human body sample posture are stored in a blockchain, and the training of the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model comprises: determining a second difference between the output of the second posture recognition model and the human body sample posture; determining a third difference between the output of the first posture recognition model and the output of the second posture recognition model; and optimizing the parameters of the second posture recognition model according to the second difference and the third difference.
- The training method according to claim 1, wherein the dimension of the feature vector input to the second posture recognition model is smaller than the dimension of the feature vector input to the first posture recognition model.
- The training method according to claim 2, wherein the output of the first posture recognition model is human skeleton points in coordinate form.
- The training method according to claim 1, wherein the first posture recognition model comprises an 8-layer stacked hourglass network, and the second posture recognition model comprises a 4-layer stacked hourglass network.
- A posture recognition method, comprising: acquiring a current frame of a human body image to be recognized and a human body posture corresponding to the previous frame of the human body image, wherein the human body posture comprises positions of human skeleton points; inputting the current frame into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the posture recognition model training method according to any one of claims 1-6; generating predicted positions of the human skeleton points corresponding to the current frame according to the positions of the human skeleton points corresponding to the previous frame; and generating the human body posture corresponding to the current frame according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame.
- The posture recognition method according to claim 7, wherein the calculation formula for generating the human body posture corresponding to the current frame, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame, comprises: [formula missing from the source text], where [symbol missing] denotes the predicted position of the k-th human skeleton point corresponding to the current frame, K_cur denotes the position of the k-th human skeleton point in the output of the second posture recognition model, [symbol missing] denotes the position of the k-th human skeleton point corresponding to the current frame, and α denotes a correction coefficient.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps to train a posture model: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a trained first posture recognition model and into a second posture recognition model, respectively, wherein the first posture recognition model comprises a first stacked hourglass network having a first number of hourglass network layers, the second posture recognition model comprises a second stacked hourglass network having a second number of hourglass network layers, and the first number is greater than the second number; training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and completing the training of the second posture recognition model when the number of training iterations reaches a preset threshold.
- The computer device according to claim 9, wherein the first posture recognition model is trained through the following steps: determining a first difference between the output of the first posture recognition model and the human body sample posture; and optimizing the parameters of the first posture recognition model according to the first difference.
- The computer device according to claim 10, wherein the human body sample image and the corresponding human body sample posture are stored in a blockchain, and the training of the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model comprises: determining a second difference between the output of the second posture recognition model and the human body sample posture; determining a third difference between the output of the first posture recognition model and the output of the second posture recognition model; and optimizing the parameters of the second posture recognition model according to the second difference and the third difference.
- The computer device according to claim 9, wherein the dimension of the feature vector input to the second posture recognition model is smaller than the dimension of the feature vector input to the first posture recognition model.
- A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps to perform posture recognition: acquiring a current frame of a human body image to be recognized and a human body posture corresponding to the previous frame of the human body image, wherein the human body posture comprises positions of human skeleton points; inputting the current frame into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the posture recognition model training method according to any one of claims 1-6; generating predicted positions of the human skeleton points corresponding to the current frame according to the positions of the human skeleton points corresponding to the previous frame; and generating the human body posture corresponding to the current frame according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame.
- The computer device according to claim 13, wherein the calculation formula for generating the human body posture corresponding to the current frame, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame, comprises: [formula missing from the source text], where [symbol missing] denotes the predicted position of the k-th human skeleton point corresponding to the current frame, K_cur denotes the position of the k-th human skeleton point in the output of the second posture recognition model, [symbol missing] denotes the position of the k-th human skeleton point corresponding to the current frame, and α denotes a correction coefficient.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps to train a posture model: acquiring a human body sample image and a corresponding human body sample posture; inputting the human body sample image into a trained first posture recognition model and into a second posture recognition model, respectively, wherein the first posture recognition model comprises a first stacked hourglass network having a first number of hourglass network layers, the second posture recognition model comprises a second stacked hourglass network having a second number of hourglass network layers, and the first number is greater than the second number; training the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model; and completing the training of the second posture recognition model when the number of training iterations reaches a preset threshold.
- The computer-readable storage medium according to claim 15, wherein the first posture recognition model is trained through the following steps: determining a first difference between the output of the first posture recognition model and the human body sample posture; and optimizing the parameters of the first posture recognition model according to the first difference.
- The computer-readable storage medium according to claim 16, wherein the human body sample image and the corresponding human body sample posture are stored in a blockchain, and the training of the second posture recognition model according to the output of the first posture recognition model and the output of the second posture recognition model comprises: determining a second difference between the output of the second posture recognition model and the human body sample posture; determining a third difference between the output of the first posture recognition model and the output of the second posture recognition model; and optimizing the parameters of the second posture recognition model according to the second difference and the third difference.
- The computer-readable storage medium according to claim 15, wherein the dimension of the feature vector input to the second posture recognition model is smaller than the dimension of the feature vector input to the first posture recognition model.
- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps to perform posture recognition: acquiring a current frame of a human body image to be recognized and a human body posture corresponding to the previous frame of the human body image, wherein the human body posture comprises positions of human skeleton points; inputting the current frame into a trained second posture recognition model, wherein the second posture recognition model is generated by training with the posture recognition model training method according to any one of claims 1-6; generating predicted positions of the human skeleton points corresponding to the current frame according to the positions of the human skeleton points corresponding to the previous frame; and generating the human body posture corresponding to the current frame according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame.
- The computer-readable storage medium according to claim 19, wherein the calculation formula for generating the human body posture corresponding to the current frame, according to the output of the second posture recognition model and the predicted positions of the human skeleton points corresponding to the current frame, comprises: [formula missing from the source text], where [symbol missing] denotes the predicted position of the k-th human skeleton point corresponding to the current frame, K_cur denotes the position of the k-th human skeleton point in the output of the second posture recognition model, [symbol missing] denotes the position of the k-th human skeleton point corresponding to the current frame, and α denotes a correction coefficient.
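The training steps recited in the claims above (a second difference against the labelled posture, a third difference against the first model's output, and a stop once the training count reaches a preset threshold) can be sketched in Python as follows. This is a toy under stated assumptions, not the application's implementation: mean squared error as the difference measure, the weighted sum `second + weight * third`, and a "student" whose parameters are directly its predicted coordinates are all hypothetical simplifications, as are the names `mse`, `student_loss`, and `train_toy_student`.

```python
from typing import List


def mse(a: List[float], b: List[float]) -> float:
    """Mean squared difference between two flat coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_loss(student_out: List[float],
                 teacher_out: List[float],
                 label: List[float],
                 weight: float = 0.5) -> float:
    """Combine the second difference (student vs. label) and the third
    difference (student vs. teacher). The weighting is an assumption."""
    second_difference = mse(student_out, label)
    third_difference = mse(student_out, teacher_out)
    return second_difference + weight * third_difference


def train_toy_student(teacher_out: List[float],
                      label: List[float],
                      weight: float = 0.5,
                      lr: float = 0.1,
                      max_steps: int = 200) -> List[float]:
    """Gradient descent on student_loss for a toy student whose parameters
    are its outputs; training ends when the step count reaches max_steps."""
    n = len(label)
    theta = [0.0] * n
    for _ in range(max_steps):  # preset threshold on the number of training steps
        grad = [
            (2.0 / n) * ((t - l) + weight * (t - q))
            for t, l, q in zip(theta, label, teacher_out)
        ]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

In this toy setting the student settles between the label and the teacher output, at `(label + weight * teacher) / (1 + weight)` per coordinate, which shows how the third difference pulls the smaller model toward the larger one.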
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010343546.2 | 2020-04-27 | ||
CN202010343546.2A CN111539349A (en) | 2020-04-27 | 2020-04-27 | Training method and device of gesture recognition model, gesture recognition method and device thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021217937A1 (en) | 2021-11-04 |
Family
ID=71977326
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/105898 WO2021217937A1 (en) | 2020-04-27 | 2020-07-30 | Posture recognition model training method and device, and posture recognition method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111539349A (en) |
WO (1) | WO2021217937A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114463856A (en) * | 2022-04-13 | 2022-05-10 | 深圳金信诺高新技术股份有限公司 | Method, device, equipment and medium for training attitude estimation model and attitude estimation |
CN114782981A (en) * | 2022-03-07 | 2022-07-22 | 奥比中光科技集团股份有限公司 | Human body posture estimation model, model training method and human body posture estimation method |
CN115410137A (en) * | 2022-11-01 | 2022-11-29 | 杭州新中大科技股份有限公司 | Double-flow worker labor state identification method based on space-time characteristics |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112101259A (en) * | 2020-09-21 | 2020-12-18 | 中国农业大学 | Single pig body posture recognition system and method based on stacked hourglass network |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160086015A1 (en) * | 2007-01-09 | 2016-03-24 | Si Corporation | Method and system for automated face detection and recognition |
CN107480720A (en) * | 2017-08-18 | 2017-12-15 | 成都通甲优博科技有限责任公司 | Human body attitude model training method and device |
CN109410240A (en) * | 2018-10-09 | 2019-03-01 | 电子科技大学中山学院 | Method and device for positioning volume characteristic points and storage medium thereof |
CN109409209A (en) * | 2018-09-11 | 2019-03-01 | 广州杰赛科技股份有限公司 | A kind of Human bodys' response method and apparatus |
CN109508688A (en) * | 2018-11-26 | 2019-03-22 | 平安科技(深圳)有限公司 | Behavioral value method, terminal device and computer storage medium based on skeleton |
CN109753891A (en) * | 2018-12-19 | 2019-05-14 | 山东师范大学 | Football player's orientation calibration method and system based on human body critical point detection |
CN109766887A (en) * | 2019-01-16 | 2019-05-17 | 中国科学院光电技术研究所 | A kind of multi-target detection method based on cascade hourglass neural network |
2020
- 2020-04-27 CN CN202010343546.2A (CN111539349A): active, Pending
- 2020-07-30 WO PCT/CN2020/105898 (WO2021217937A1): active, Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114782981A (en) * | 2022-03-07 | 2022-07-22 | 奥比中光科技集团股份有限公司 | Human body posture estimation model, model training method and human body posture estimation method |
CN114463856A (en) * | 2022-04-13 | 2022-05-10 | 深圳金信诺高新技术股份有限公司 | Method, device, equipment and medium for training attitude estimation model and attitude estimation |
CN115410137A (en) * | 2022-11-01 | 2022-11-29 | 杭州新中大科技股份有限公司 | Double-flow worker labor state identification method based on space-time characteristics |
CN115410137B (en) * | 2022-11-01 | 2023-04-14 | 杭州新中大科技股份有限公司 | Double-flow worker labor state identification method based on space-time characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN111539349A (en) | 2020-08-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021217937A1 (en) | Posture recognition model training method and device, and posture recognition method and device | |
CN108205655B (en) | Key point prediction method and device, electronic equipment and storage medium | |
US10733431B2 (en) | Systems and methods for optimizing pose estimation | |
WO2019228317A1 (en) | Face recognition method and device, and computer readable medium | |
US20190172224A1 (en) | Optimizations for Structure Mapping and Up-sampling | |
JP2019096006A (en) | Information processing device, and information processing method | |
CN110889446A (en) | Face image recognition model training and face image recognition method and device | |
CN110879982B (en) | Crowd counting system and method | |
CN111782840A (en) | Image question-answering method, image question-answering device, computer equipment and medium | |
CN114332578A (en) | Image anomaly detection model training method, image anomaly detection method and device | |
CN111104925B (en) | Image processing method, image processing apparatus, storage medium, and electronic device | |
CN113095129B (en) | Gesture estimation model training method, gesture estimation device and electronic equipment | |
CN113642431A (en) | Training method and device of target detection model, electronic equipment and storage medium | |
WO2023098912A1 (en) | Image processing method and apparatus, storage medium, and electronic device | |
TWI803243B (en) | Method for expanding images, computer device and storage medium | |
CN111368768A (en) | Human body key point-based employee gesture guidance detection method | |
CN114495006A (en) | Detection method and device for left-behind object and storage medium | |
CN111353325A (en) | Key point detection model training method and device | |
Madane et al. | Social distancing detection and analysis through computer vision | |
CN115239508A (en) | Scene planning adjustment method, device, equipment and medium based on artificial intelligence | |
CN116453221A (en) | Target object posture determining method, training device and storage medium | |
Feng | Mask RCNN-based single shot multibox detector for gesture recognition in physical education | |
CN114120454A (en) | Training method and device of living body detection model, electronic equipment and storage medium | |
CN112101185A (en) | Method for training wrinkle detection model, electronic device and storage medium | |
CN111126566A (en) | Abnormal furniture layout data detection method based on GAN model |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20933544; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15.03.2023) |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20933544; Country of ref document: EP; Kind code of ref document: A1 |