CN115601828A - Dance detection method, equipment, storage medium and device


Info

Publication number
CN115601828A
CN115601828A (application CN202110781921.6A)
Authority
CN
China
Prior art keywords
dance
samples
key point
human body
detected
Prior art date
Legal status
Pending
Application number
CN202110781921.6A
Other languages
Chinese (zh)
Inventor
殷雅俊
杜平杰
Current Assignee
Beijing Mijinghefeng Technology Co ltd
Original Assignee
Beijing Mijinghefeng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Mijinghefeng Technology Co ltd filed Critical Beijing Mijinghefeng Technology Co ltd
Priority to CN202110781921.6A
Publication of CN115601828A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses a dance detection method, equipment, a storage medium and a device, applied to the technical field of the Internet. The method comprises the following steps: performing image interception on a video to be detected to obtain a plurality of images to be detected; performing feature extraction on the plurality of images to be detected respectively to obtain a plurality of human body posture features; and performing dance detection through a preset dance prediction model according to the plurality of human body posture features.

Description

Dance detection method, equipment, storage medium and device
Technical Field
The invention relates to the technical field of internet, in particular to a dance detection method, equipment, a storage medium and a device.
Background
At present, when dance behaviors in videos are identified, the dance behaviors are often identified manually. However, the manual identification method has the defects of slow identification speed and low accuracy.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a dance detection method, equipment, a storage medium and a device, and aims to solve the technical problem that, in the prior art, manual identification of dance behaviors in videos is slow and inaccurate.
In order to achieve the above object, the present invention provides a dance detection method, comprising the following steps:
carrying out image interception on a video to be detected to obtain a plurality of images to be detected;
respectively extracting the features of the images to be detected to obtain a plurality of human body posture features;
and carrying out dance detection through a preset dance prediction model according to the human body posture characteristics.
Optionally, before the step of performing image capturing on the video to be detected to obtain a plurality of images to be detected, the dance detection method further includes:
carrying out image interception on the video samples to obtain a plurality of image samples;
respectively extracting the features of the image samples to obtain a plurality of human posture feature samples;
and performing model training on the initial dance prediction model according to the plurality of human posture characteristic samples to obtain a preset dance prediction model.
Optionally, the step of respectively performing feature extraction on the plurality of image samples to obtain a plurality of human posture feature samples includes:
respectively extracting key points of the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples;
and determining a plurality of human posture characteristic samples according to the plurality of target human key point samples and the connection relation samples.
Optionally, the step of extracting key points from the plurality of image samples respectively to obtain a plurality of target human body key point samples and a connection relation sample between each target human body key point sample includes:
respectively extracting key points of the plurality of image samples to obtain key point heat map samples and component connection affinity field samples;
and determining a plurality of target human body key point samples and connection relation samples between each target human body key point sample according to the key point heat map sample and the component connection affinity field sample.
Optionally, the step of determining a plurality of target human body key point samples and connection relationship samples between each target human body key point sample according to the key point heat map sample and the component connection affinity field sample includes:
obtaining a plurality of initial human body key point samples corresponding to the key point heat map samples, and screening the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
determining a component connection candidate set sample according to the keypoint candidate set sample and the component connection affinity field sample;
and determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
Optionally, the step of performing model training on the initial dance prediction model according to the plurality of human posture feature samples to obtain a preset dance prediction model includes:
inputting a plurality of human body posture characteristic samples into an initial dancing prediction model to obtain an initial prediction result;
searching dance marking information corresponding to the video sample;
and adjusting parameters of the initial dancing prediction model according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model.
Optionally, after the step of adjusting parameters of the initial dancing prediction model according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model, the dancing detection method further includes:
extracting actual dancing information from the dancing marking information, and matching the actual dancing information with the initial prediction result;
and when the matching fails, performing parameter adjustment on the initial dancing prediction model to obtain a preset dancing prediction model.
Optionally, the step of respectively performing feature extraction on a plurality of images to be detected to obtain a plurality of human posture features includes:
respectively extracting key points of the images to be detected to obtain a plurality of target human body key points and the connection relation between the target human body key points;
and determining a plurality of human posture characteristics according to the plurality of target human key points and the connection relation.
Optionally, the step of extracting key points from the multiple images to be detected respectively to obtain a connection relationship between the multiple target human body key points and each target human body key point includes:
respectively extracting key points of the images to be detected to obtain a key point heat map and a component connection affinity field;
and determining the connection relation between a plurality of target human key points and each target human key point according to the key point heat map and the component connection affinity field.
Optionally, the step of determining a connection relationship between a plurality of target human body key points and each target human body key point according to the key point heat map and the component connection affinity field includes:
obtaining a plurality of initial human body key points corresponding to the key point heat map, and screening the plurality of initial human body key points based on the key point heat map to obtain a key point candidate set;
determining a component connection candidate set according to the key point candidate set and the component connection affinity field;
and determining the connection relation between the plurality of target human body key points and each target human body key point according to the key point candidate set and the component connection candidate set.
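For ease of understanding, the screening of initial human body key points based on the key point heat map can be pictured as confidence thresholding: each initial key point whose heat map response falls below a threshold is dropped from the candidate set. The sketch below is an illustration only; the threshold value and all names are assumptions, not part of the patent.

```python
def screen_keypoints(candidates, heatmap_scores, threshold=0.5):
    """Keep only key points whose heat map score meets the threshold."""
    return [kp for kp, score in zip(candidates, heatmap_scores) if score >= threshold]

# Three candidate key points (pixel coordinates) with their heat map confidences.
points = [(120, 80), (200, 150), (90, 300)]
scores = [0.92, 0.31, 0.77]
kept = screen_keypoints(points, scores)
print(kept)  # [(120, 80), (90, 300)]
```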
Optionally, the step of performing image capture on the video to be detected to obtain a plurality of images to be detected includes:
extracting a video to be detected frame by frame to obtain an initial image set;
and acquiring frame information of the video to be detected, and performing image extraction on the initial image set based on the frame information to obtain a plurality of images to be detected.
Optionally, after the step of detecting dancing according to the plurality of human posture features by using a preset dancing prediction model, the dancing detection method further includes:
when the detection result is that the dancing behavior exists, searching for the live broadcast room information corresponding to the video to be detected;
and generating reminding information according to the live broadcast room information, and sending the reminding information to a target terminal.
In addition, to achieve the above object, the present invention further provides a dance detection apparatus, including a memory, a processor, and a dance detection program stored in the memory and executable on the processor, wherein the dance detection program is configured to implement the dance detection method as described above.
Furthermore, to achieve the above object, the present invention further provides a storage medium having a dance detection program stored thereon, the dance detection program implementing the dance detection method as described above when executed by a processor.
In addition, in order to achieve the above object, the present invention further provides a dance detection apparatus, comprising: the device comprises an image intercepting module, a feature extraction module and a dance detection module;
the image intercepting module is used for intercepting images of a video to be detected to obtain a plurality of images to be detected;
the characteristic extraction module is used for respectively extracting the characteristics of the images to be detected to obtain a plurality of human body posture characteristics;
and the dance detection module is used for detecting dance through a preset dance prediction model according to a plurality of human posture characteristics.
Optionally, the dance detection apparatus further comprises: a model training module;
the model training module is used for carrying out image interception on the video samples to obtain a plurality of image samples;
the model training module is also used for respectively carrying out feature extraction on the plurality of image samples to obtain a plurality of human posture feature samples;
the model training module is further used for carrying out model training on the initial dancing prediction model according to the plurality of human posture characteristic samples to obtain a preset dancing prediction model.
Optionally, the model training module is further configured to perform key point extraction on the plurality of image samples respectively to obtain a plurality of target human body key point samples and connection relationship samples between each target human body key point sample;
the model training module is further used for determining a plurality of human posture characteristic samples according to the target human key point samples and the connection relation samples.
Optionally, the model training module is further configured to perform key point extraction on the plurality of image samples respectively to obtain a key point heat map sample and a component connection affinity field sample;
the model training module is further used for determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point heat map samples and the component connection affinity field samples.
Optionally, the model training module is further configured to obtain a plurality of initial human body key point samples corresponding to the key point heat map samples, and screen the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
the model training module is further used for determining a component connection candidate set sample according to the keypoint candidate set sample and the component connection affinity field sample;
the model training module is further used for determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
Optionally, the model training module is further configured to input a plurality of human posture feature samples into an initial dance prediction model to obtain an initial prediction result;
the model training module is also used for searching dance mark information corresponding to the video sample;
and the model training module is also used for carrying out parameter adjustment on the initial dancing prediction model according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model.
According to the invention, the images of the video to be detected are intercepted to obtain a plurality of images to be detected, the plurality of images to be detected are respectively subjected to feature extraction to obtain a plurality of human body posture features, and dance detection is carried out through the preset dance prediction model according to the plurality of human body posture features, so that dance behavior detection can be automatically completed based on the human body posture feature extraction and the preset dance prediction model, and the accuracy and the recognition speed of dance detection are improved.
Drawings
FIG. 1 is a schematic structural diagram of a dancing detection device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a dance detection method of the present invention;
FIG. 3 is a schematic structural diagram of an ST-GCN model according to an embodiment of the dance detection method of the present invention;
FIG. 4 is a flowchart illustrating a dancing detection method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a third embodiment of a dance detection method according to the present invention;
FIG. 6 is a schematic structural diagram of an OpenPose model according to an embodiment of the dance detection method of the present invention;
FIG. 7 is a schematic flowchart of a fourth embodiment of a dance detection method of the present invention;
fig. 8 is a block diagram showing the structure of the dance detection apparatus according to the first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a dance detection apparatus in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the dance detection apparatus may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to implement connection communication among these components. The user interface 1003 may include a display screen (Display); the optional user interface 1003 may further include a standard wired interface and a wireless interface, and in the present invention the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in FIG. 1 does not constitute a limitation of the dance detection apparatus, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a dance detection program.
In the dance detection apparatus shown in FIG. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting user equipment; the dance detection apparatus calls a dance detection program stored in the memory 1005 through the processor 1001 and performs a dance detection method according to an embodiment of the present invention.
Based on the hardware structure, the embodiment of the dance detection method is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the dance detection method of the present invention, and the first embodiment of the dance detection method of the present invention is provided.
In a first embodiment, the dance detection method comprises the steps of:
step S10: and carrying out image interception on the video to be detected to obtain a plurality of images to be detected.
It should be understood that the main execution body of this embodiment is the dance detection apparatus, wherein the dance detection apparatus may be an electronic apparatus such as a personal computer or a server, or may also be another apparatus capable of implementing the same or similar functions.
It should be noted that the video to be detected may be a video of a live broadcast room that needs to detect whether there is a dance behavior. The video storage of the live broadcast room can be stored in a database in advance, and the dance detection equipment can acquire the video of the live broadcast room from the database in advance to serve as the video to be detected.
It will be appreciated that a video is typically composed of a plurality of images arranged in a temporal sequence. For example, when the frame rate of a video is 60 frames per second (FPS), the video is composed of 60 images per second.
It should be understood that, the image capturing is performed on the video to be detected, the obtaining of the plurality of images to be detected may be obtaining the video duration of the video to be detected, and the images with the preset number of frames are extracted from the video to be detected as the plurality of images to be detected according to the video duration of the video to be detected. For example, the video duration of the video to be detected is 7 seconds, 5 frames of images are cut from the video to be detected per second to serve as the image to be detected, and 35 images can be obtained in total to serve as the image to be detected.
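For ease of understanding, the frame sampling described above can be sketched as follows. This is an illustrative sketch only: the function name, the even-spacing strategy, and the parameters are assumptions, not prescribed by the patent.

```python
def sample_frame_indices(duration_s: int, video_fps: int, capture_per_s: int) -> list:
    """Pick `capture_per_s` evenly spaced frame indices from each second of video."""
    indices = []
    step = video_fps // capture_per_s  # spacing between captured frames within one second
    for second in range(duration_s):
        base = second * video_fps      # first frame index of this second
        indices.extend(base + i * step for i in range(capture_per_s))
    return indices

# A 7-second video at 60 FPS, capturing 5 frames per second, yields 35 images.
frames = sample_frame_indices(duration_s=7, video_fps=60, capture_per_s=5)
print(len(frames))  # 35
```

In practice the selected indices would be passed to a video decoding library to extract the corresponding images.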
Step S20: and respectively extracting the features of the images to be detected to obtain a plurality of human body posture features.
It should be understood that performing feature extraction on the multiple images to be detected respectively to obtain the multiple human posture features may be performing feature extraction on the multiple images to be detected respectively through a preset feature extraction model. The preset feature extraction model can be used for extracting human body posture features in an image. For example, the preset feature extraction model may be a pre-trained OpenPose model.
It should be noted that the human body posture feature may include a plurality of human body key points and a connection relationship between the human body key points.
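For ease of understanding, such a human body posture feature (key points plus their connection relations) may be represented as a small graph structure, for example as follows. The data layout below is a hypothetical illustration, not prescribed by the patent.

```python
from dataclasses import dataclass

@dataclass
class Keypoint:
    x: float      # horizontal position in the image
    y: float      # vertical position in the image
    score: float  # detection confidence from the key point heat map

@dataclass
class PoseFeature:
    keypoints: list  # list[Keypoint], e.g. one entry per body joint
    edges: list      # list[tuple[int, int]] connecting key point indices

# A toy two-joint pose: key point 0 connected to key point 1.
pose = PoseFeature(
    keypoints=[Keypoint(0.50, 0.20, 0.97), Keypoint(0.42, 0.25, 0.91)],
    edges=[(0, 1)],
)
print(len(pose.keypoints))  # 2
```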
Step S30: and carrying out dance detection through a preset dance prediction model according to the human body posture characteristics.
It should be noted that the preset dance prediction model can be used for performing a graph convolution operation that treats the human posture features as a directed graph, and a time convolution operation on the human posture features input in sequence, so that dance detection can be performed based on both the human posture features and their time sequence features, improving the accuracy of online dance detection while guaranteeing the timeliness of the online service.
In a particular implementation, for example, the preset dance prediction model may be a pre-trained ST-GCN model.
For ease of understanding, the description will be made with reference to fig. 3, but this scheme is not limited thereto. FIG. 3 is a schematic structural diagram of an ST-GCN model, in which the input data (i.e., the human posture features) are first regularized and then passed through 9 spatio-temporal graph convolutional layers, each spatio-temporal graph convolutional layer being composed of an attention layer, a graph convolutional layer, and a time convolutional layer in sequence; finally, the data pass through a global pooling layer, a fully connected layer, and a softmax layer to obtain a detection result. The detection result indicates whether a dancing behavior exists in the video to be detected.
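For ease of understanding, the time convolution operation can be illustrated in isolation: given one feature value per captured image, a one-dimensional convolution over the time axis aggregates neighbouring frames. This is a toy NumPy sketch, not the actual ST-GCN implementation.

```python
import numpy as np

def temporal_conv1d(features: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'valid'-mode 1-D convolution over the time axis of per-frame features."""
    t = len(features) - len(kernel) + 1
    return np.array([features[i:i + len(kernel)] @ kernel for i in range(t)])

# Five frames of a single scalar feature, smoothed by a 3-frame averaging kernel.
per_frame = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
out = temporal_conv1d(per_frame, np.array([1 / 3, 1 / 3, 1 / 3]))
print(out)  # averages of each 3-frame window
```

In the real model, this aggregation is applied channel-wise to the graph-convolved key point features, so motion over time contributes to the prediction.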
It should be understood that in an actual scene, a dancing video is a process lasting from tens of seconds to minutes, during which the posture of the human body changes continuously. Therefore, when performing dance detection, both the human posture features and the time sequence features should be considered. When the OpenPose model is used alone for dance detection, it only identifies the posture of a human body based on the key point information in a static picture, so dynamic dance behavior cannot be accurately identified; when an Inception-ResNet model is used alone, it attends to all visual information in a picture, so dance detection suffers from excessive interference information and inaccurate detection results.
In the embodiment, image interception is performed on a video to be detected to obtain a plurality of images to be detected, feature extraction is performed on the plurality of images to be detected through an OpenPose model to obtain a plurality of human posture features, and the plurality of human posture features are input into an ST-GCN model in sequence to obtain a dance detection result. Because the OpenPose model extracts the features of each image to be detected, interference factors in the picture features can be eliminated; and because the ST-GCN model performs the dance detection, the detection can be based on both the human posture features and their time sequence features, improving the accuracy of online dance detection while guaranteeing the timeliness of the online service.
The first embodiment is used for carrying out image interception on a video to be detected to obtain a plurality of images to be detected, carrying out feature extraction on the plurality of images to be detected respectively to obtain a plurality of human body posture features, and carrying out dance detection through a preset dance prediction model according to the plurality of human body posture features, so that dance behavior detection can be automatically completed based on the human body posture features and the preset dance prediction model, and the accuracy and the recognition speed of the dance detection are improved.
Referring to fig. 4, fig. 4 is a schematic flowchart of a second embodiment of the dance detection method of the present invention, and the second embodiment of the dance detection method of the present invention is proposed based on the first embodiment shown in fig. 2.
In the second embodiment, before the step S10, the method further includes:
step S01: and carrying out image interception on the video samples to obtain a plurality of image samples.
It should be noted that the video samples may be stored in a database in advance, and the dance detection apparatus may obtain the video samples from the database. The video samples may be used to train an initial dance prediction model.
It should be understood that the image capturing the video samples, obtaining the plurality of image samples may be obtaining the video duration of the video samples, and extracting the images of the preset number of frames from the video samples as the plurality of image samples according to the video duration of the video samples. For example, the video duration of a video sample is 7 seconds, 5 frames of images are cut out of the video sample per second as image samples, and 35 images can be obtained as image samples in total.
Step S02: and respectively carrying out feature extraction on the plurality of image samples to obtain a plurality of human body posture feature samples.
It can be understood that performing feature extraction on the plurality of image samples respectively to obtain the plurality of human posture feature samples may be performing feature extraction on the plurality of image samples through a preset feature extraction model. The preset feature extraction model can be used for extracting human body posture features in an image. For example, the preset feature extraction model may be a pre-trained OpenPose model.
Step S03: and performing model training on the initial dancing prediction model according to the plurality of human posture characteristic samples to obtain a preset dancing prediction model.
It should be noted that the initial dance prediction model may be an untrained dance prediction model, and may be preset by a manager of the dance detection apparatus. For example, the initial dance prediction model may be an untrained ST-GCN model.
It should be understood that, model training is performed on the initial dancing prediction model according to the plurality of human posture feature samples, and obtaining the preset dancing prediction model may be inputting the plurality of human posture feature samples into the initial dancing prediction model, obtaining an initial prediction result, and performing model optimization on the initial dancing prediction model based on the initial prediction result to obtain the preset dancing prediction model.
It can be understood that the initial dancing prediction model is subjected to model optimization based on the initial prediction result, and the obtaining of the preset dancing prediction model may be the initial dancing prediction model is subjected to model optimization through a preset optimization algorithm based on the initial prediction result to obtain the preset dancing prediction model. Wherein the preset optimization algorithm may be a random gradient descent algorithm.
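For ease of understanding, a single update step of the stochastic gradient descent algorithm mentioned above can be sketched in plain Python. This is a toy illustration of the update rule only; actual model optimization would use a deep learning framework.

```python
def sgd_step(params, grads, lr=0.1):
    """Basic stochastic gradient descent update: params <- params - lr * grads."""
    return [p - lr * g for p, g in zip(params, grads)]

# Two model parameters and their gradients from one training batch.
params = [1.0, -2.0]
grads = [0.5, -1.0]
updated = sgd_step(params, grads)
print(updated)  # approximately [0.95, -1.9]
```

Repeating such steps over many batches of human posture feature samples drives the initial dance prediction model toward the preset dance prediction model.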
The second embodiment obtains a plurality of image samples by performing image interception on the video sample, performs feature extraction on the plurality of image samples respectively to obtain a plurality of human posture feature samples, and performs model training on the initial dance prediction model according to the human posture feature samples to obtain a preset dance prediction model. In this way, the preset dance prediction model can be trained in advance of its use for dance detection, improving the reliability and accuracy of the preset dance prediction model so that more reliable and accurate dance detection results can be obtained.
In the second embodiment, after the step S30, the method further includes:
step S40: and searching the live broadcast room information corresponding to the video to be detected when the detection result shows that the dancing behavior exists.
The live room information may be information such as a live room ID and a main title.
It should be understood that when the detection result indicates that the dance behavior exists, it indicates that the dance behavior exists in the corresponding live broadcast room in the video to be detected. At this time, the user needs to be reminded to watch.
It can be understood that the searching for the live broadcast room information corresponding to the video to be detected may be searching for the live broadcast room information corresponding to the video to be detected in a database. The database comprises a corresponding relation between the video to be detected and the information of the live broadcast room, and the corresponding relation between the video to be detected and the information of the live broadcast room can be automatically stored when the video to be detected is recorded in the database.
Step S50: and generating reminding information according to the live broadcast room information, and sending the reminding information to a target terminal.
It should be noted that the reminding information is used for reminding the user to watch the dancing live broadcast in the live broadcast room, and the reminding information may be at least one of text reminding information, voice reminding information, and video reminding information, which is not limited in this embodiment.
It should be understood that, in order to remind the user to watch dancing live broadcast, a target live broadcast room can be determined according to the live broadcast room ID, and a flag of 'dancing' can be added to the target live broadcast room to remind the user to watch live broadcast of the target live broadcast room.
Alternatively, reminding information may be generated according to the ID of the live broadcast room and the name of the anchor, and sent to the target terminal to remind the user to watch. For example, a short message with the content "Anchor A of live broadcast room XX is dancing live, come and watch" is generated and sent to the user. This embodiment is not limited in this regard.
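For ease of understanding, generating the reminding information from the live broadcast room information amounts to simple message assembly. The field names and message template below are hypothetical, not prescribed by the patent.

```python
def build_reminder(room_id: str, anchor_name: str) -> str:
    """Compose a text reminder from the live broadcast room ID and anchor name."""
    return f"Anchor {anchor_name} of live broadcast room {room_id} is dancing live, come and watch"

# Example: room "XX" hosted by anchor "A".
msg = build_reminder("XX", "A")
print(msg)
```

The resulting string would then be delivered to the target terminal as a short message, voice prompt, or video notification, as described above.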
In the second embodiment, when the detection result indicates that the dance behavior exists, the live broadcast room information corresponding to the video to be detected is searched for, reminding information is generated according to the live broadcast room information, and the reminding information is sent to the target terminal. In this way, reminding information can be generated automatically whenever dance behavior is detected, the user is reminded to watch the live broadcast room corresponding to the video to be detected, the user is ensured not to miss the dance live broadcast, and the user experience is improved.
Referring to fig. 5, fig. 5 is a schematic flow chart of a third embodiment of the dance detection method of the present invention, and the third embodiment of the dance detection method of the present invention is proposed based on the second embodiment shown in fig. 4.
In a third embodiment, the step S02 includes:
step S021: and respectively extracting key points of the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples.
It should be understood that respectively extracting key points from the plurality of image samples to obtain a plurality of target human body key point samples and the connection relation samples among the target human body key point samples may be performed through a preset feature extraction model. The preset feature extraction model may be an OpenPose model trained in advance.
Further, in order to improve the accuracy of the target human body key point sample and the connection relation sample, the step S021 includes:
respectively extracting key points of the plurality of image samples to obtain key point heat map samples and component connection affinity field samples;
and determining a plurality of target human body key point samples and connection relation samples between each target human body key point sample according to the key point heat map sample and the component connection affinity field sample.
For ease of understanding, the description will be made with reference to fig. 6, but this solution is not limited thereto. Fig. 6 is a schematic structural diagram of an OpenPose model: a picture (i.e., an image sample) is input, feature extraction is performed on the input picture through the first 10 layers of the OpenPose model, and a key point heat map (i.e., a key point heat map sample) and a component connection affinity field (i.e., a component connection affinity field sample, also known as a part affinity field) for the human body key points in the input picture are determined through two multilayer convolutional neural network branches of the OpenPose model.
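The two-branch output described above can be sketched as follows. The 18-key-point and 19-limb channel counts follow the original OpenPose paper and are assumptions here, since this embodiment only names the model:

```python
import numpy as np

# Illustrative stand-in for the two multilayer CNN branches: one branch
# outputs per-key-point heat maps, the other outputs component connection
# affinity fields (PAFs, one x and one y channel per limb). Channel counts
# and the feature-map resolution are assumptions for illustration.
H, W = 46, 46            # feature-map resolution after the shared backbone
NUM_KEYPOINTS = 18
NUM_LIMBS = 19

def openpose_heads(feature_map: np.ndarray):
    """Return (heat maps, PAFs) with the shapes the two branches produce."""
    heatmaps = np.zeros((NUM_KEYPOINTS, H, W))
    pafs = np.zeros((2 * NUM_LIMBS, H, W))   # x and y components per limb
    return heatmaps, pafs

hm, paf = openpose_heads(np.zeros((128, H, W)))
```

The heat maps localize individual key points, while the PAFs encode the direction of each limb, which is what allows the later matching of key points into whole bodies.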
Further, in order to screen out target human body key point samples with high reliability and the connection relation samples between them, determining a plurality of target human body key point samples and the connection relation samples between the target human body key point samples according to the key point heat map sample and the component connection affinity field sample includes:
obtaining a plurality of initial human body key point samples corresponding to the key point heat map samples, and screening the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
determining a component connection candidate set sample according to the keypoint candidate set sample and the component connection affinity field sample;
and determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
It should be understood that screening the plurality of initial human body key point samples based on the key point heat map samples to obtain the key point candidate set samples may be performed by first applying Gaussian smoothing to the key point heat map samples, then performing confidence filtering on them, then performing non-maximum suppression, and finally screening the initial human body key points according to the processed key point heat map samples.
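The smoothing, confidence-filtering, and non-maximum-suppression pipeline above can be sketched as follows; the sigma and threshold values are illustrative assumptions, not values from this embodiment:

```python
import numpy as np

def candidate_keypoints(heatmap: np.ndarray, sigma: float = 1.0, thresh: float = 0.1):
    """Sketch of the screening pipeline: smooth the heat map, drop
    low-confidence responses, then keep only local maxima (a simple
    non-maximum suppression). Parameter values are assumptions."""
    # 1. Gaussian smoothing with a separable 1-D kernel
    radius = int(3 * sigma)
    xs = np.arange(-radius, radius + 1)
    kernel = np.exp(-xs**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smoothed = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, heatmap)
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, smoothed)
    # 2. confidence filtering
    mask = smoothed > thresh
    # 3. non-maximum suppression: a pixel survives only if it is the
    #    maximum of its 3x3 neighbourhood
    padded = np.pad(smoothed, 1, mode="constant", constant_values=-np.inf)
    neigh = np.stack([padded[dy:dy + heatmap.shape[0], dx:dx + heatmap.shape[1]]
                      for dy in range(3) for dx in range(3)])
    is_peak = smoothed >= neigh.max(axis=0)
    ys, xs_ = np.nonzero(mask & is_peak)
    return list(zip(ys.tolist(), xs_.tolist()))

hm = np.zeros((20, 20))
hm[5, 5] = 1.0           # one synthetic key-point response
peaks = candidate_keypoints(hm)
```

The surviving (row, column) positions form the key point candidate set sample for one key-point channel.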
It can be understood that determining the component connection candidate set sample according to the key point candidate set sample and the component connection affinity field sample may be performed by determining the component connection relationship samples between the candidate human body key point samples according to the key point candidate set sample and the component connection affinity field sample, and then performing confidence filtering on these component connection relationship samples to obtain the component connection candidate set sample.
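The confidence of one candidate component connection can be computed as in the OpenPose paper: sample the affinity field along the segment between two candidate key points and average the projection onto the limb direction. The coordinate convention (row, col) and all numeric values below are assumptions for illustration:

```python
import numpy as np

def limb_score(paf_y, paf_x, p1, p2, n_samples=10):
    """Average projection of the affinity field onto the segment p1->p2.
    High scores indicate a plausible component connection."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    d = p2 - p1
    norm = np.linalg.norm(d)
    if norm == 0:
        return 0.0
    u = d / norm                                  # unit limb direction (dy, dx)
    ts = np.linspace(0.0, 1.0, n_samples)
    pts = p1[None, :] + ts[:, None] * d[None, :]  # sample points on the segment
    iy, ix = pts[:, 0].astype(int), pts[:, 1].astype(int)
    vecs = np.stack([paf_y[iy, ix], paf_x[iy, ix]], axis=1)
    return float(np.mean(vecs @ u))

paf_x = np.zeros((10, 10)); paf_x[5, :] = 1.0     # field points in +x on row 5
paf_y = np.zeros((10, 10))
score = limb_score(paf_y, paf_x, (5, 2), (5, 8))  # limb aligned with the field
```

Connections whose score falls below a confidence threshold would be filtered out, yielding the component connection candidate set sample.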
It should be understood that determining the plurality of target human body key point samples and the connection relation samples between them according to the key point candidate set samples and the component connection candidate set samples may proceed as follows: a plurality of candidate human body key point samples are extracted from the key point candidate set samples, and the component connection relationship samples between the candidate human body key point samples are extracted from the component connection candidate set samples. Human body matching is then performed according to the candidate human body key point samples and the component connection relationship samples between them. The candidate human body key point samples that are successfully matched are taken as the target human body key point samples, and the component connection relationship samples between the successfully matched candidate human body key point samples are taken as the connection relation samples between the target human body key point samples.
Step S022: and determining a plurality of human posture characteristic samples according to the target human key point samples and the connection relation samples.
It should be understood that determining the plurality of human posture feature samples according to the plurality of target human body key point samples and the connection relation samples may be to use the target human body key point samples and the connection relation samples as human posture feature samples.
In the third embodiment, the key points of the plurality of image samples are extracted respectively to obtain a plurality of target human body key point samples and a connection relation sample between the target human body key point samples, and a plurality of human body posture characteristic samples are determined according to the target human body key point samples and the connection relation sample, so that the accuracy of the human body posture characteristic samples can be further improved, and the reliability of the dance prediction model training can be further ensured.
In a third embodiment, the step S03 includes:
step S031: and inputting a plurality of human posture characteristic samples into an initial dancing prediction model to obtain an initial prediction result.
It should be noted that the initial dance prediction model may be an untrained dance prediction model, and may be set by a manager of the dance detection apparatus in advance. For example, the initial dance prediction model may be an untrained ST-GCN model.
Step S032: and searching dance mark information corresponding to the video sample.
It should be noted that the dance flag information may be preset by a manager of the dance detection apparatus, and the dance flag information is used to flag whether there is a dance behavior in the video sample.
It should be understood that searching for the dance flag information corresponding to the video sample may be searching for it in a database. The database contains the correspondence between video samples and dance flag information, and this correspondence can be set when the video sample is entered into the database.
Step S033: and adjusting parameters of the initial dancing prediction model according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model.
It can be understood that adjusting the parameters of the initial dance prediction model according to the initial prediction result and the dance flag information to obtain the preset dance prediction model may be performed by matching the initial prediction result against the dance flag information; when the matching fails, the parameters of the initial dance prediction model are adjusted to obtain the preset dance prediction model.
It should be appreciated that the parametric adjustment to the initial dance prediction model may be a parametric adjustment to at least one of a convolution kernel size, a regularization coefficient, and a learning rate of the initial dance prediction model.
Further, in order to improve the accuracy of parameter adjustment, step S033 includes:
extracting actual dancing information from the dancing marking information, and matching the actual dancing information with the initial prediction result;
and when the matching fails, adjusting parameters of the initial dance prediction model to obtain a preset dance prediction model.
It should be noted that the actual dance information may be information of whether there is dance behavior in the video sample.
It will be appreciated that when a match fails, the model's prediction disagrees with the actual dance information, for example a video sample that contains dance behavior is misjudged as containing none. At this time, parameter adjustment needs to be performed on the initial dance prediction model to improve the accuracy of the dance prediction model.
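The match-then-adjust step can be sketched minimally as follows. A logistic classifier stands in for the initial dance prediction model (an assumption purely for illustration), "matching fails" means the thresholded prediction disagrees with the actual dance information, and the parameter adjustment is a single gradient step on the weights:

```python
import numpy as np

def train_step(w, x, label, lr=0.1):
    """One match-then-adjust step on a stand-in logistic classifier."""
    pred = 1.0 / (1.0 + np.exp(-(w @ x)))          # initial prediction result
    matched = bool(pred >= 0.5) == bool(label)     # match against dance flag
    if not matched:                                # adjust parameters on failure
        w = w + lr * (label - pred) * x
    return w, matched

w = np.zeros(3)                                    # untrained initial model
x = np.array([1.0, 2.0, -1.0])                     # one pose feature sample
w, matched = train_step(w, x, label=0)             # model wrongly predicts "dance"
```

For the actual ST-GCN model, the adjusted quantities would instead be those named in the text, such as convolution kernel size, regularization coefficient, or learning rate.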
According to the third embodiment, a plurality of human posture characteristic samples are input into the initial dancing prediction model to obtain an initial prediction result, the dancing mark information corresponding to the video sample is searched, the initial dancing prediction model is subjected to parameter adjustment according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model, and therefore the reliability and accuracy of the dancing prediction model training can be improved.
Referring to fig. 7, fig. 7 is a schematic flow chart of a fourth embodiment of the dance detection method of the present invention, and the fourth embodiment of the dance detection method of the present invention is provided based on the first embodiment shown in fig. 2.
In a fourth embodiment, the step S10 includes:
step S101: and extracting the video to be detected frame by frame to obtain an initial image set.
It should be noted that the initial image set may be an image set composed of a plurality of frames of images.
It should be understood that a video is typically composed of a plurality of frames of images arranged in a temporal sequence. Therefore, an initial image set can be obtained by extracting the video to be detected frame by frame.
Step S102: and acquiring frame information of the video to be detected, and performing image extraction on the initial image set based on the frame information to obtain a plurality of images to be detected.
It should be noted that the frame information of the video to be detected may be the number of frames transmitted per second of the video to be detected.
It can be understood that the higher the number of frames transmitted per second of the video to be detected, the more images can be extracted per second as images to be detected, and the more reliable the resulting images to be detected are.
It should be understood that, performing image extraction on the initial image set based on the frame information to obtain a plurality of images to be detected may be to find an image extraction policy corresponding to the frame information in a preset policy table, and performing image extraction on the initial image set based on the image extraction policy to obtain a plurality of images to be detected. The preset policy table includes a corresponding relationship between the frame information and the image extraction policy, and the corresponding relationship between the frame information and the image extraction policy may be preset.
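The policy-table lookup above can be sketched as follows. The FPS thresholds and sampling strides are invented for illustration; the embodiment only states that the table maps frame information to an image extraction policy:

```python
# Hypothetical preset policy table: (minimum fps, keep every Nth frame).
# Thresholds and strides are assumptions, not values from this embodiment.
POLICY_TABLE = [
    (60, 12),   # >= 60 fps: keep every 12th frame (5 images per second)
    (30, 6),    # >= 30 fps: keep every 6th frame
    (0, 3),     # otherwise: keep every 3rd frame
]

def extract_frames(frames, fps):
    """Look up the extraction policy for this fps and subsample the frames."""
    stride = next(s for threshold, s in POLICY_TABLE if fps >= threshold)
    return frames[::stride]

imgs = extract_frames(list(range(120)), fps=60)   # two seconds of 60-fps video
```

A higher frame rate thus maps to a larger stride while still yielding a similar number of images per second of video.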
The fourth embodiment extracts the video to be detected frame by frame to obtain an initial image set, obtains frame information of the video to be detected, extracts the image of the initial image set based on the frame information, and obtains a plurality of images to be detected, so that the image extraction strategy can be adaptively adjusted based on the frame information of the video to be detected, more accurate images to be detected can be obtained, and the accuracy of dance detection is ensured.
In a fourth embodiment, the step S20 includes:
step S201: and respectively extracting key points of the images to be detected to obtain a plurality of target human body key points and the connection relation between the target human body key points.
It should be understood that respectively extracting key points from the plurality of images to be detected to obtain the plurality of target human body key points and the connection relationships between them may be performed through a preset feature extraction model. The preset feature extraction model may be an OpenPose model trained in advance.
Further, in order to improve the accuracy of the target human body key points and the connection relationships, the step S201 includes:
respectively extracting key points of the images to be detected to obtain a key point heat map and a component connection affinity field;
and determining the connection relation between a plurality of target human key points and each target human key point according to the key point heat map and the component connection affinity field.
For ease of understanding, the description will be made with reference to fig. 6, but this scheme is not limited thereto. Fig. 6 is a schematic structural diagram of an OpenPose model: a picture (i.e., an image to be detected) is input, feature extraction is performed on the input picture through the first 10 layers of the OpenPose model, and a key point heat map and a component connection affinity field for the human body key points in the input picture are determined through two multilayer convolutional neural network branches of the OpenPose model.
Further, in order to screen out target human body key points with high reliability and the connection relationships between them, determining the connection relationships between the plurality of target human body key points and each target human body key point according to the key point heat map and the component connection affinity field includes:
obtaining a plurality of initial human body key points corresponding to the key point heat map, and screening the plurality of initial human body key points based on the key point heat map to obtain a key point candidate set;
determining a component connection candidate set according to the key point candidate set and the component connection affinity field;
and determining the connection relation between the plurality of target human body key points and each target human body key point according to the key point candidate set and the component connection candidate set.
It should be understood that screening the plurality of initial human body key points based on the key point heat map to obtain the key point candidate set may be to first perform Gaussian smoothing on the key point heat map, then perform confidence filtering on it, then perform non-maximum suppression, and finally screen the initial human body key points according to the processed key point heat map to obtain the key point candidate set.
It can be understood that determining the component connection candidate set according to the key point candidate set and the component connection affinity field may be determining component connection relationships between the candidate human key points according to the key point candidate set and the component connection affinity field, and performing confidence filtering on the component connection relationships between the candidate human key points to obtain the component connection candidate set.
It should be understood that determining the connection relationships between the plurality of target human body key points and each target human body key point according to the key point candidate set and the component connection candidate set may proceed as follows: a plurality of candidate human body key points are extracted from the key point candidate set, and the component connection relationships between the candidate human body key points are extracted from the component connection candidate set. Human body matching is then performed according to the candidate human body key points and the component connection relationships between them. The candidate human body key points that are successfully matched are taken as the target human body key points, and the component connection relationships between the successfully matched candidate human body key points are taken as the connection relationships between the target human body key points.
Step S202: and determining a plurality of human posture characteristics according to the plurality of target human key points and the connection relation.
It should be understood that determining the plurality of human posture features from the plurality of target human body key points and the connection relationships may be taking the target human body key points and the connection relationships as the human posture features.
The fourth embodiment respectively extracts key points of a plurality of images to be detected to obtain the connection relation between a plurality of target human key points and each target human key point, and determines a plurality of human posture characteristics according to the plurality of target human key points and the connection relation, so that the accuracy of the human posture characteristics can be further improved, and the accuracy of dance detection can be further improved.
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a dance detection program, and the dance detection program, when executed by a processor, implements the dance detection method as described above.
In addition, referring to fig. 8, an embodiment of the present invention further provides a dance detection apparatus, where the dance detection apparatus includes: the image capturing module 10, the feature extraction module 20 and the dance detection module 30;
the image capture module 10 is configured to capture an image of a video to be detected to obtain a plurality of images to be detected.
It should be noted that the video to be detected may be a video of a live broadcast room that needs to be checked for dance behavior. Videos of live broadcast rooms can be stored in a database in advance, and the dance detection device can acquire a live broadcast room video from the database as the video to be detected.
It will be appreciated that a video is typically composed of a plurality of images arranged in a temporal sequence. For example, when the number of transmission Frames Per Second (FPS) of a video is 60, the video is composed of 60 images per second.
It should be understood that capturing images from the video to be detected to obtain the plurality of images to be detected may be performed by obtaining the video duration of the video to be detected, and extracting a preset number of frames from the video to be detected according to that duration as the plurality of images to be detected. For example, if the video duration of the video to be detected is 7 seconds and 5 frames are captured from the video every second, 35 images in total can be obtained as the images to be detected.
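The worked example above can be sketched as follows; the (second, index) pairs stand in for the actual decoded frames, which is an assumption for illustration:

```python
def capture_images(duration_seconds, frames_per_second_kept=5):
    """Capture a fixed number of frames from each second of the video.
    Returns (second, index) pairs standing in for decoded frames."""
    return [(sec, i) for sec in range(duration_seconds)
                     for i in range(frames_per_second_kept)]

images = capture_images(7)   # 7-second video, 5 frames kept per second
```

This reproduces the 7 seconds x 5 frames = 35 images of the example.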
The feature extraction module 20 is configured to perform feature extraction on the multiple images to be detected respectively to obtain multiple human posture features.
It should be understood that respectively performing feature extraction on the plurality of images to be detected to obtain the plurality of human posture features may be performed through a preset feature extraction model. The preset feature extraction model can be used for extracting human posture features from an image; for example, it may be a pre-trained OpenPose model.
It should be noted that the human body posture feature may include a plurality of human body key points and a connection relationship between the human body key points.
The dance detection module 30 is configured to perform dance detection through a preset dance prediction model according to a plurality of human posture characteristics.
It should be noted that the preset dance prediction model can treat the human posture features as a directed graph and perform graph convolution operations on it, while performing temporal convolution operations on the human posture features input in sequence. Dance detection can therefore be based on both the human posture features and their time sequence characteristics, improving the accuracy of online dance detection while guaranteeing the timeliness of the online service.
In a particular implementation, for example, the preset dance prediction model may be a pre-trained ST-GCN model.
For ease of understanding, the description will be made with reference to fig. 3, but this scheme is not limited thereto. FIG. 3 is a schematic structural diagram of an ST-GCN model: the input data (i.e., the human posture features) are first normalized, then passed through 9 spatio-temporal graph convolutional layers, each of which is composed in sequence of an attention layer, a graph convolutional layer, and a temporal convolutional layer, and finally passed through a global pooling layer, a fully connected layer, and a softmax layer to obtain the detection result. The detection result indicates whether dance behavior exists in the video to be detected.
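One spatio-temporal graph convolutional layer (graph convolution over the joints, then temporal convolution over the frame sequence) can be sketched as a toy computation. The dimensions, the trivial adjacency matrix, and the kernel are invented; the real ST-GCN stacks 9 such layers plus the attention, pooling, and softmax stages, all omitted here:

```python
import numpy as np

def st_gcn_block(x, adj, w_graph, t_kernel):
    """Toy spatio-temporal graph convolution.
    x: (T, V, C) tensor of frames x joints x channels."""
    # Graph convolution over joints: aggregate neighbours via the adjacency
    # matrix, then mix channels with a weight matrix.
    h = np.einsum("uv,tvc,cd->tud", adj, x, w_graph)
    h = np.maximum(h, 0)                              # ReLU
    # Temporal convolution: a 1-D kernel slid over the frame axis.
    k = len(t_kernel)
    out = np.stack([
        sum(t_kernel[i] * h[t + i] for i in range(k))
        for t in range(h.shape[0] - k + 1)
    ])
    return out

T, V, C = 8, 2, 3                  # 8 frames, 2 joints, 3 channels
x = np.ones((T, V, C))             # constant pose sequence
adj = np.eye(V)                    # trivial adjacency (no cross-joint edges)
w = np.eye(C)                      # identity channel mixing
out = st_gcn_block(x, adj, w, t_kernel=[0.5, 0.5])
```

The temporal axis shrinks by the kernel length minus one, which is why each input sequence must supply several frames per detection.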
It should be understood that in an actual scene, a dance video is a process lasting from tens of seconds to minutes, during which the posture of the human body changes continuously. Therefore, when performing dance detection, both the human posture features and their timing characteristics should be considered. When the OpenPose model is used alone for dance detection, it only identifies the posture of the human body from the key point information in a static picture, so dynamic dance behavior cannot be accurately identified; when an Inception-ResNet model is used alone, it attends to all the visual information in the picture, so there is too much interfering information in the dance detection and the detection result is inaccurate.
In this embodiment, images are captured from the video to be detected to obtain a plurality of images to be detected, feature extraction is performed on them through an OpenPose model to obtain a plurality of human posture features, and the human posture features are input into an ST-GCN model in sequence to obtain the dance detection result. Using the OpenPose model to extract features from each image to be detected eliminates interfering factors in the picture features, and using the ST-GCN model for dance detection allows the detection to be based on both the human posture features and their time sequence characteristics, improving the accuracy of online dance detection while guaranteeing the timeliness of the online service.
In this embodiment, images are captured from the video to be detected to obtain a plurality of images to be detected, feature extraction is performed on the plurality of images to be detected respectively to obtain a plurality of human posture features, and dance detection is performed through a preset dance prediction model according to the plurality of human posture features. Dance behavior detection can thus be completed automatically based on the extracted human posture features and the preset dance prediction model, improving the accuracy and recognition speed of dance detection.
Other embodiments or specific implementation manners of the dance detection apparatus of the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in a process, method, article, or system that comprises the element.
The above serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any ordering; these words may be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
The invention discloses A1 dancing detection method, which comprises the following steps:
carrying out image interception on a video to be detected to obtain a plurality of images to be detected;
respectively extracting the features of the images to be detected to obtain a plurality of human body posture features;
and carrying out dance detection through a preset dance prediction model according to the human body posture characteristics.
A2, the dance detection method according to A1, wherein before the step of carrying out image interception on the video to be detected to obtain a plurality of images to be detected, the dance detection method further comprises:
carrying out image interception on the video samples to obtain a plurality of image samples;
respectively performing feature extraction on the plurality of image samples to obtain a plurality of human body posture feature samples;
and performing model training on the initial dance prediction model according to the plurality of human posture characteristic samples to obtain a preset dance prediction model.
A3, the dance detection method according to A2, wherein the step of respectively performing feature extraction on the plurality of image samples to obtain a plurality of human posture feature samples includes:
respectively extracting key points of the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples;
and determining a plurality of human posture characteristic samples according to the target human key point samples and the connection relation samples.
A4, the dance detection method according to A3, wherein the step of respectively extracting key points from the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples comprises:
respectively extracting key points of the plurality of image samples to obtain key point heat map samples and component connection affinity field samples;
and determining a plurality of target human body key point samples and connection relation samples between each target human body key point sample according to the key point heat map sample and the component connection affinity field sample.
A5, the dance detection method according to A4, wherein the step of determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point heat map samples and the component connection affinity field samples includes:
obtaining a plurality of initial human body key point samples corresponding to the key point heat map samples, and screening the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
determining a component connection candidate set sample according to the keypoint candidate set sample and the component connection affinity field sample;
and determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
A6, the dance detection method according to the A2, wherein the step of performing model training on the initial dance prediction model according to the plurality of human posture feature samples to obtain a preset dance prediction model comprises the following steps:
inputting a plurality of human body posture characteristic samples into an initial dancing prediction model to obtain an initial prediction result;
searching dance marking information corresponding to the video sample;
and adjusting parameters of the initial dancing prediction model according to the initial prediction result and the dancing mark information to obtain a preset dancing prediction model.
A7, the dance detection method according to A6, wherein after the step of adjusting parameters of the initial dance prediction model according to the initial prediction result and the dance marking information to obtain the preset dance prediction model, the dance detection method further comprises:
extracting actual dance information from the dance marking information, and matching the actual dance information with the initial prediction result;
and when the matching fails, adjusting parameters of the initial dance prediction model again to obtain the preset dance prediction model.
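The parameter-adjustment and re-adjustment loop described above can be sketched with a minimal stand-in model. Logistic regression here replaces the actual dance prediction network purely for illustration; the feature dimension, learning rate, and round budget are all assumed values, not specified by this disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DancePredictor:
    """Minimal stand-in for the dance prediction model: logistic
    regression over flattened pose-feature vectors."""
    def __init__(self, feature_dim, lr=0.5):
        self.w = np.zeros(feature_dim)
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, X):
        return sigmoid(X @ self.w + self.b)

    def adjust(self, X, y):
        # one round of parameter adjustment from prediction vs. label
        err = self.predict_proba(X) - y
        self.w -= self.lr * (X.T @ err) / len(y)
        self.b -= self.lr * err.mean()

def fit(model, X, y, max_rounds=2000):
    """Keep adjusting parameters while the predictions fail to match
    the dance marking information (the re-adjustment step above)."""
    for _ in range(max_rounds):
        preds = (model.predict_proba(X) > 0.5).astype(float)
        if np.array_equal(preds, y):  # matching succeeded
            break
        model.adjust(X, y)
    return model
```

On a small separable set of labelled features, the loop terminates once every thresholded prediction matches its label.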
A8, the dance detection method according to any one of A1-A7, wherein the step of respectively performing feature extraction on the plurality of images to be detected to obtain a plurality of human body posture features comprises:
respectively extracting key points of the images to be detected to obtain a plurality of target human body key points and the connection relation between the target human body key points;
and determining a plurality of human posture characteristics according to the plurality of target human key points and the connection relation.
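One plausible way to turn target keypoints and their connection relations into a posture feature is to concatenate normalised coordinates with per-connection limb angles. The layout below is an illustrative assumption; the disclosure does not fix a feature encoding.

```python
import numpy as np

def pose_feature(keypoints, connections):
    """Build one human body posture feature from target keypoints and
    their connection relations: centred, scale-normalised coordinates
    plus the orientation angle of each connected limb.
    keypoints: {name: (x, y)}; connections: [(name_a, name_b), ...]."""
    pts = np.array(list(keypoints.values()), dtype=float)
    center = pts.mean(axis=0)
    scale = pts.std() or 1.0          # guard against a degenerate pose
    normalized = ((pts - center) / scale).ravel()
    angles = []
    for a, b in connections:
        dx, dy = np.subtract(keypoints[b], keypoints[a])
        angles.append(np.arctan2(dy, dx))  # limb orientation in radians
    return np.concatenate([normalized, angles])
```

For two keypoints and one vertical connection, the feature holds four normalised coordinates plus one angle of pi/2.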
A9, the dance detection method according to A8, wherein the step of respectively extracting key points of the images to be detected to obtain a plurality of target human body key points and the connection relations between the target human body key points comprises:
respectively extracting key points of the images to be detected to obtain a key point heat map and a component connection affinity field;
and determining a plurality of target human body key points and the connection relations between the target human body key points according to the key point heat map and the component connection affinity field.
A10, the dance detection method according to A9, wherein the step of determining a plurality of target human body key points and the connection relations between the target human body key points according to the key point heat map and the component connection affinity field comprises:
obtaining a plurality of initial human body key points corresponding to the key point heat map, and screening the plurality of initial human body key points based on the key point heat map to obtain a key point candidate set;
determining a component connection candidate set according to the key point candidate set and the component connection affinity field;
and determining a plurality of target human body key points and the connection relations between the target human body key points according to the key point candidate set and the component connection candidate set.
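The screening of initial keypoints into a candidate set can be sketched as non-maximum suppression on the heat map: keep only local maxima whose confidence clears a threshold. The 8-neighbour window and the threshold value are illustrative assumptions.

```python
import numpy as np

def keypoint_candidates(heatmap, threshold=0.3):
    """Screen initial human body keypoints from one part's heat map:
    local maxima are the initial keypoints, and those above the
    confidence threshold form the keypoint candidate set.
    Returns a list of (x, y, confidence) tuples."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant")
    center = padded[1:-1, 1:-1]
    # a pixel is a local maximum if it is >= all 8 neighbours
    peak = np.ones_like(center, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            peak &= center >= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    ys, xs = np.nonzero(peak & (center > threshold))
    return [(int(x), int(y), float(heatmap[y, x])) for y, x in zip(ys, xs)]
```

A single bright pixel in an otherwise flat heat map yields exactly one candidate at that position.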
A11, the dance detection method according to any one of A1-A7, wherein the step of performing image interception on a video to be detected to obtain a plurality of images to be detected comprises:
extracting the video to be detected frame by frame to obtain an initial image set;
and acquiring frame information of the video to be detected, and performing image extraction on the initial image set based on the frame information to obtain a plurality of images to be detected.
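Extracting images from the frame-by-frame set based on frame information amounts to choosing frame indices from the frame count and frame rate. The sketch below assumes a fixed samples-per-second rate, which the disclosure does not specify; with OpenCV, the frame information would come from `CAP_PROP_FRAME_COUNT` and `CAP_PROP_FPS`.

```python
def sample_indices(total_frames, fps, samples_per_second=2):
    """Choose which frames of the frame-by-frame extraction to keep,
    given the frame information (frame count and frame rate) of the
    video to be detected. The sampling rate is an assumed value."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    step = max(1, round(fps / samples_per_second))
    return list(range(0, total_frames, step))
```

A 30 fps video sampled at 2 images per second keeps every 15th frame; very low frame rates degrade to keeping every frame.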
A12, the dance detection method according to any one of A1-A7, wherein after the step of detecting dance through a preset dance prediction model according to the plurality of human posture features, the dance detection method further comprises:
when the detection result is that dancing behavior exists, searching for live broadcast room information corresponding to the video to be detected;
and generating reminding information according to the live broadcast room information, and sending the reminding information to a target terminal.
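The reminding step above can be sketched as assembling a message from the live broadcast room information whenever the detection result shows dancing behaviour. The field names and message format here are illustrative assumptions, not the patent's actual data structures.

```python
def build_reminder(detection_result, room_info):
    """Generate reminding information from the live broadcast room
    info when the detection result indicates dancing behaviour;
    return None when there is nothing to send to the target terminal."""
    if not detection_result.get("dancing"):
        return None
    return {
        "room_id": room_info["room_id"],
        "anchor": room_info.get("anchor", "unknown"),
        "message": "Dancing behaviour detected in live room %s" % room_info["room_id"],
    }
```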
The invention also discloses B13, a dance detection device, the dance detection device comprising: a memory, a processor, and a dance detection program stored on the memory and executable on the processor, the dance detection program, when executed by the processor, implementing the dance detection method described above.
The invention also discloses C14, a storage medium having a dance detection program stored thereon, the dance detection program, when executed by a processor, implementing the dance detection method described above.
The invention also discloses D15, a dance detection apparatus, the dance detection apparatus comprising: an image intercepting module, a feature extraction module and a dance detection module;
the image intercepting module is used for intercepting images of a video to be detected to obtain a plurality of images to be detected;
the feature extraction module is used for respectively performing feature extraction on the plurality of images to be detected to obtain a plurality of human body posture features;
and the dance detection module is used for detecting dance through a preset dance prediction model according to a plurality of human posture characteristics.
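The three-module apparatus above can be sketched as a small pipeline class. The concrete interception, extraction, and detection implementations are injected as callables, since the disclosure describes the modules only by their roles.

```python
class DanceDetectionApparatus:
    """Sketch of the apparatus: image interception, feature
    extraction, and dance detection composed into one pipeline."""
    def __init__(self, intercept, extract, detect):
        self.intercept = intercept  # video -> images to be detected
        self.extract = extract      # image -> human body posture feature
        self.detect = detect        # features -> dance detection result

    def run(self, video):
        images = self.intercept(video)
        features = [self.extract(img) for img in images]
        return self.detect(features)
```

With stub callables, the pipeline threads a video through interception, per-image extraction, and detection in order.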
D16, the dance detection apparatus according to D15, the dance detection apparatus further comprising: a model training module;
the model training module is used for carrying out image interception on the video samples to obtain a plurality of image samples;
the model training module is also used for respectively carrying out feature extraction on the plurality of image samples to obtain a plurality of human posture feature samples;
the model training module is further used for carrying out model training on the initial dance prediction model according to the plurality of human posture characteristic samples to obtain a preset dance prediction model.
D17, the dance detection device according to D16, wherein the model training module is further used for performing key point extraction on the plurality of image samples respectively to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples;
the model training module is further used for determining a plurality of human posture characteristic samples according to the target human key point samples and the connection relation samples.
D18, the dance detection device according to D17, wherein the model training module is further used for extracting key points of the image samples respectively to obtain key point heat map samples and component connection affinity field samples;
and the model training module is also used for determining a plurality of target human body key point samples and connection relation samples between each target human body key point sample according to the key point heat map samples and the component connection affinity field samples.
D19, the dance detection apparatus according to D18, wherein the model training module is further configured to obtain a plurality of initial human body key point samples corresponding to the key point heat map samples, and screen the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
the model training module is also used for determining a component connection candidate set sample according to the key point candidate set sample and the component connection affinity field sample;
and the model training module is also used for determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
D20, the dance detection device according to D16, wherein the model training module is further used for inputting the plurality of human body posture feature samples into an initial dance prediction model to obtain an initial prediction result;
the model training module is also used for searching dance marking information corresponding to the video sample;
and the model training module is also used for adjusting parameters of the initial dance prediction model according to the initial prediction result and the dance marking information to obtain the preset dance prediction model.

Claims (10)

1. A dance detection method, characterized in that the dance detection method comprises the following steps:
carrying out image interception on a video to be detected to obtain a plurality of images to be detected;
respectively extracting the features of the images to be detected to obtain a plurality of human body posture features;
and carrying out dance detection through a preset dance prediction model according to the human body posture characteristics.
2. The dance detection method according to claim 1, wherein before the step of performing image capturing on the video to be detected to obtain a plurality of images to be detected, the dance detection method further comprises:
carrying out image interception on the video samples to obtain a plurality of image samples;
respectively performing feature extraction on the plurality of image samples to obtain a plurality of human body posture feature samples;
and performing model training on the initial dance prediction model according to the plurality of human posture characteristic samples to obtain a preset dance prediction model.
3. The dance detection method of claim 2, wherein the step of performing feature extraction on the plurality of image samples to obtain a plurality of human posture feature samples comprises:
respectively extracting key points of the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples among the target human body key point samples;
and determining a plurality of human posture characteristic samples according to the target human key point samples and the connection relation samples.
4. The dance detection method of claim 3, wherein the step of respectively extracting key points of the plurality of image samples to obtain a plurality of target human body key point samples and connection relation samples between the target human body key point samples comprises:
respectively extracting key points of the plurality of image samples to obtain key point heat map samples and component connection affinity field samples;
and determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point heat map samples and the component connection affinity field samples.
5. The dance detection method of claim 4, wherein the step of determining a plurality of target human keypoint samples and connection relationship samples between each target human keypoint sample according to the keypoint heat map sample and the component connection affinity field sample comprises:
obtaining a plurality of initial human body key point samples corresponding to the key point heat map samples, and screening the plurality of initial human body key point samples based on the key point heat map samples to obtain key point candidate set samples;
determining a component connection candidate set sample according to the keypoint candidate set sample and the component connection affinity field sample;
and determining a plurality of target human body key point samples and connection relation samples between the target human body key point samples according to the key point candidate set samples and the component connection candidate set samples.
6. The dance detection method of claim 2, wherein the step of performing model training on an initial dance prediction model according to a plurality of human posture feature samples to obtain a preset dance prediction model comprises:
inputting the plurality of human body posture feature samples into an initial dance prediction model to obtain an initial prediction result;
searching dance marking information corresponding to the video sample;
and adjusting parameters of the initial dance prediction model according to the initial prediction result and the dance marking information to obtain the preset dance prediction model.
7. The dance detection method according to claim 6, wherein after the step of adjusting parameters of the initial dance prediction model according to the initial prediction result and the dance marking information to obtain a preset dance prediction model, the dance detection method further comprises:
extracting actual dance information from the dance marking information, and matching the actual dance information with the initial prediction result;
and when the matching fails, adjusting parameters of the initial dance prediction model again to obtain the preset dance prediction model.
8. A dance detection device, comprising: a memory, a processor and a dance detection program stored on the memory and executable on the processor, the dance detection program, when executed by the processor, implementing the dance detection method according to any one of claims 1 to 7.
9. A storage medium having stored thereon a dance detection program, the dance detection program implementing the dance detection method according to any one of claims 1 to 7 when executed by a processor.
10. A dance detection apparatus, comprising: the device comprises an image intercepting module, a feature extraction module and a dance detection module;
the image intercepting module is used for intercepting images of a video to be detected to obtain a plurality of images to be detected;
the feature extraction module is used for respectively performing feature extraction on the plurality of images to be detected to obtain a plurality of human body posture features;
and the dance detection module is used for detecting dance through a preset dance prediction model according to a plurality of human posture characteristics.
CN202110781921.6A 2021-07-09 2021-07-09 Dance detection method, equipment, storage medium and device Pending CN115601828A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110781921.6A CN115601828A (en) 2021-07-09 2021-07-09 Dance detection method, equipment, storage medium and device

Publications (1)

Publication Number Publication Date
CN115601828A true CN115601828A (en) 2023-01-13

Family

ID=84840507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110781921.6A Pending CN115601828A (en) 2021-07-09 2021-07-09 Dance detection method, equipment, storage medium and device

Country Status (1)

Country Link
CN (1) CN115601828A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination