CN113392744A - Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113392744A
CN113392744A (application number CN202110626028.6A)
Authority
CN
China
Prior art keywords
target
key point
video
aesthetic feeling
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110626028.6A
Other languages
Chinese (zh)
Inventor
赵勇 (Zhao Yong)
夏鹏飞 (Xia Pengfei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gelingshentong Information Technology Co ltd
Original Assignee
Beijing Gelingshentong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gelingshentong Information Technology Co ltd filed Critical Beijing Gelingshentong Information Technology Co ltd
Priority to CN202110626028.6A priority Critical patent/CN113392744A/en
Publication of CN113392744A publication Critical patent/CN113392744A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The embodiment of the application provides a dance motion aesthetic feeling confirmation method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a video segment corresponding to a target action in a dance video; acquiring the target human skeleton key points corresponding to each frame of image in the video segment to obtain a key point image group; inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment; and determining whether the target action is aesthetic according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action; converting the video segment corresponding to the target action into a key point image group allows the human posture to be analyzed accurately through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.

Description

Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a dance movement aesthetic feeling confirmation method, apparatus, electronic device, and storage medium.
Background
The aesthetic feeling of dance movements is an important content for evaluating the whole dance, and in dance teaching, a professional dance teacher usually evaluates the aesthetic feeling of dance movements of students manually according to the dance experience of the teacher. However, the number of dance teachers is small, and the aesthetic evaluation of dance movements of the trainee is inefficient.
Disclosure of Invention
The embodiment of the application provides a dance motion aesthetic feeling confirmation method and device, electronic equipment and a storage medium, and can effectively solve the problem of low efficiency of evaluating the dance motion aesthetic feeling.
According to a first aspect of embodiments of the present application, there is provided a dance motion aesthetic feeling confirmation method, including: acquiring a video segment corresponding to the target action in the dance video; acquiring a target human skeleton key point corresponding to each frame of image in the video clip to obtain a key point image group; inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, wherein the aesthetic feeling prediction model is obtained by training a sample segment and a target marking score corresponding to the sample segment; determining whether the target action is aesthetic according to the prediction score.
According to a second aspect of embodiments of the present application, there is provided a dance motion aesthetic feeling confirmation apparatus, including: the first acquisition module is used for acquiring a video clip corresponding to the target action in the dance video; the second acquisition module is used for acquiring a target human skeleton key point corresponding to each frame of image in the video clip to obtain a key point image group; the prediction module is used for inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, and the aesthetic feeling prediction model is obtained by training a sample segment and a target marking score corresponding to the sample segment; a determination module to determine whether the target action is aesthetic according to the prediction score.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising one or more processors, a memory, and one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to perform the method described above.
According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a program code stored therein, wherein the method described above is performed when the program code runs.
By adopting the dance motion aesthetic feeling confirmation method provided by the embodiment of the application, a video segment corresponding to the target motion in the dance video is obtained; target human skeleton key points corresponding to each frame of image in the video clip are acquired to obtain a key point image group; the key point image group is input into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being obtained by training on sample segments and their corresponding target annotation scores; and whether the target action is aesthetic is determined according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action; converting the video segment corresponding to the target action into a key point image group allows the human posture to be analyzed accurately through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a dance movement aesthetics validation method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an aesthetic feeling prediction model provided by an embodiment of the present application;
FIG. 3 is a flow chart of a dance movement aesthetics validation method according to another embodiment of the present application;
FIG. 4 is a functional block diagram of an apparatus for determining an aesthetic feeling of a dance movement according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device for performing a dance motion aesthetic feeling confirmation method according to an embodiment of the present application.
Detailed Description
The aesthetic feeling of dance movements is an important part of evaluating a dance as a whole. In dance teaching, a professional dance teacher usually evaluates the aesthetic feeling of students' dance movements manually, based on the teacher's own dance experience. This method depends heavily on the teacher's subjective judgment, so the evaluation of the aesthetic feeling of dance movements is neither objective nor accurate enough; moreover, because dance teachers are few, evaluating trainees' dance movements in this way is inefficient.
The inventor found through research that a dance video can be analyzed with computer vision algorithms: video segments corresponding to fixed actions are extracted from the dance video, and the fixed actions are scored with a neural network model, so that the aesthetic feeling of dance actions can be evaluated objectively and accurately, replacing the existing manual evaluation.
In order to solve the above problem, an embodiment of the present application provides a dance motion aesthetic feeling confirmation method, which acquires a video segment corresponding to a target motion in a dance video; acquires the target human skeleton key points corresponding to each frame of image in the video segment to obtain a key point image group; inputs the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being obtained by training on sample segments and their corresponding target annotation scores; and determines whether the target action is aesthetic according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action; converting the video segment corresponding to the target action into a key point image group allows the human posture to be analyzed accurately through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.
The scheme in the embodiment of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java, or interpreted scripting languages such as JavaScript and Python.
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present application, not an exhaustive list. It should be noted that, in the absence of conflict, the embodiments and the features of the embodiments in the present application may be combined with each other.
Referring to fig. 1, an embodiment of the present application provides a dance motion aesthetic feeling confirmation method, which is applicable to an electronic device, where the electronic device may be a smart phone, a computer, a server, or the like, and the method may specifically include the following steps.
And step 110, acquiring a video segment corresponding to the target action in the dance video.
The dance video is image data obtained through an image acquisition device. In some embodiments, the dance video may be uploaded by a user to a designated application through which the dance video is obtained. In some embodiments, the dance video may be directly sent to the electronic device for subsequent processing after being acquired by the image acquisition device.
Dance is composed of a plurality of dance movements, and in a dance video, there are a plurality of different dance movements. The electronic equipment can extract a video segment corresponding to the target action from the dance video. The target action refers to any one of a plurality of dance actions.
When the electronic equipment extracts the video segment corresponding to the target action from the dance video, the key points of the human skeleton of each frame of image in the dance video can be identified; constructing a feature vector of each frame of image based on the human skeleton key points; and extracting a video segment corresponding to the target action from the dance video according to the feature vector.
In one implementation, each frame of image in the dance video is detected through a human skeleton key point detection algorithm to obtain the human skeleton key points corresponding to each frame, and feature vectors are constructed based on these key points; the feature vectors are then clustered with a clustering algorithm, and a video segment corresponding to the target action is finally extracted from the video data based on the clustering result.
In another implementation, after the feature vector corresponding to each frame of image is obtained in the same way, an image that includes the target motion may be designated as the target image, and its feature vector as the target vector. The similarity between each feature vector and the target vector is then calculated in turn; images whose feature vectors have a similarity greater than a preset value are taken as candidate images, and finally the candidate images and the target image are extracted from the dance video and combined into the video segment corresponding to the target action.
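The similarity-based implementation above can be sketched roughly as follows. All function names are illustrative, and the choice of cosine similarity is an assumption — the embodiment does not fix a particular similarity measure:

```python
import numpy as np

def frame_feature(keypoints):
    """Flatten per-frame skeleton keypoints (N x 2) into a feature vector."""
    return np.asarray(keypoints, dtype=float).ravel()

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (one possible measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def select_segment_frames(features, target_vector, threshold=0.9):
    """Return indices of frames whose feature vector is similar enough to the
    target action's vector; these frames form the extracted video segment."""
    return [i for i, f in enumerate(features)
            if cosine_similarity(f, target_vector) > threshold]
```

The keypoints themselves would come from a pose-estimation algorithm, as described in the text; the threshold corresponds to the "preset value" of the embodiment.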
And 120, acquiring target human skeleton key points corresponding to each frame of image in the video clip to obtain a key point image group.
After the video clip is obtained, the key points of the target human skeleton corresponding to each frame of image in the video clip can be obtained, and a key point image group is obtained.
And detecting each frame of image in the video clip through a human skeleton key point detection algorithm to obtain a target human skeleton key point corresponding to each frame of image. After the corresponding key points of the target human skeleton in each frame of image are obtained, each frame of image can be converted into a key point image, and the key point image is an image formed by the key points of the target human skeleton. Furthermore, each key point image can be adjusted to a specified visual angle according to the target human skeleton key point, and each frame of key point image adjusted to the specified visual angle is combined to obtain a key point image group corresponding to the video clip.
Specifically, when the key point images are obtained, the target human skeleton key points and their three-dimensional coordinates can be acquired. Using these three-dimensional coordinates and projection matrices, the front view, side view, and top view of the target human skeleton key points in each frame of key point image can be calculated. Each frame of key point image is then adjusted to a specified viewing angle, which may be one of the front view, side view, and top view and can be set according to actual needs; this is not specifically limited here. After the viewing angle adjustment, the key point images corresponding to the frames of the video clip are taken as the key point image group.
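The view computation can be sketched with orthographic projection matrices. The axis convention below (x right, y up, z toward the camera) is an assumption, as the embodiment does not specify one:

```python
import numpy as np

# Orthographic projection matrices for the three canonical views
# (axis convention is an assumption: x right, y up, z toward camera).
FRONT = np.array([[1, 0, 0],
                  [0, 1, 0]])   # keep (x, y)
SIDE = np.array([[0, 0, 1],
                 [0, 1, 0]])    # keep (z, y)
TOP = np.array([[1, 0, 0],
                [0, 0, 1]])     # keep (x, z)

def project(keypoints_3d, view):
    """Project N x 3 skeleton keypoints to a 2-D view via a projection matrix."""
    return np.asarray(keypoints_3d, dtype=float) @ view.T
```

Applying one of the three matrices to every frame's 3-D keypoints yields the key point images at the specified viewing angle.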
Before obtaining the target human skeleton key points corresponding to each frame of image in the video clip, the target human skeleton key points can be determined according to the action type of the target action. The action type of the target action may be determined manually. The motion type may be an arm motion, a leg motion, or a limb motion.
After the action type corresponding to the target action is acquired, the action type can be used as the target type. After the target type is determined, the target human skeleton key points corresponding to the target type can be inquired through an information table. The information table may be preset and stored in the electronic device, and includes a corresponding relationship between the action type and the key points of the human skeleton. For example, when the action type is a type a, the corresponding human skeleton key points may be key point a and key point B; when the action type is B type, the corresponding human skeleton key points can be key point A and key point C. If the target type is determined to be type A, the key points of the target human skeleton can be determined to be key points A and key points B by inquiring the information table.
Specifically, if the motion type is an arm motion, the corresponding human skeleton key points may be human skeleton key points on the arm, such as an elbow, a wrist, and the like; if the motion type is a leg motion, the corresponding human skeletal key points may be human skeletal key points on the leg, such as toes, ankles, etc.
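The information-table lookup described above can be sketched as a simple mapping. The action-type and keypoint names below are illustrative placeholders, not taken from the embodiment:

```python
# Hypothetical information table mapping action types to the skeleton
# keypoints relevant for that type (all names are illustrative).
INFO_TABLE = {
    "arm": ["shoulder", "elbow", "wrist"],
    "leg": ["hip", "knee", "ankle", "toe"],
    "limb": ["shoulder", "elbow", "wrist", "hip", "knee", "ankle"],
}

def target_keypoints(action_type):
    """Look up the target human skeleton keypoints for a given action type."""
    return INFO_TABLE[action_type]
```

In practice the table would be preset and stored on the electronic device, as the text describes.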
It should be noted that the human skeleton key points used in the video clip corresponding to the target motion extracted in step 110 may be different from the target human skeleton key points. When the video clip corresponding to the target action is extracted, preset human skeleton key points can be extracted, and the preset human skeleton key points can be the whole human skeleton key points including the key points of the arm, the leg and the head, so that the video clip corresponding to the target action can be accurately extracted.
Step 130, inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, wherein the aesthetic feeling prediction model is obtained by training a sample segment and a target marking score corresponding to the sample segment.
After the key point image group corresponding to the video clip is obtained, it may be input into the aesthetic feeling prediction model to obtain a prediction score corresponding to the key point image group. The prediction score represents the aesthetic feeling of the target action: a higher prediction score indicates that the target action is more aesthetically pleasing. Since the key point image group is derived from the video segment, the prediction score can be regarded as the prediction score of the video segment.
The aesthetic feeling prediction model is obtained by training a neural network by using a sample segment and a score corresponding to the sample segment. Specifically, a sample set may be obtained, where the sample set includes a plurality of sample segments and target labeling scores corresponding to the sample segments; and training a neural network model by using the sample segment and the target marking score corresponding to the sample segment to obtain the aesthetic feeling prediction model.
The structure of the aesthetic feeling prediction model is shown in fig. 2: the key point image group corresponding to a video clip passes sequentially through a convolutional neural network and a fully connected layer, and finally a prediction score corresponding to the key point image group is output.
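Since fig. 2 is not reproduced here, the following minimal NumPy sketch illustrates the convolution-then-fully-connected scoring pipeline in principle; the kernel size, ReLU activation, and weights are assumptions and do not reflect the actual network of the embodiment:

```python
import numpy as np

def conv_layer(x, kernels):
    """Tiny valid 2-D convolution over one keypoint image, followed by ReLU."""
    kh, kw = kernels.shape[1:]
    h, w = x.shape
    out = np.zeros((kernels.shape[0], h - kh + 1, w - kw + 1))
    for c, k in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0)  # ReLU

def predict_score(keypoint_image, kernels, fc_weights):
    """CNN features -> fully connected layer -> scalar prediction score."""
    features = conv_layer(keypoint_image, kernels).ravel()
    return float(features @ fc_weights)
```

A production model would of course use a deep-learning framework and process the whole key point image group, but the data flow — convolutional features flattened into a fully connected scoring layer — is the same.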
Step 140, determining whether the target action is aesthetic according to the prediction score.
After the prediction score is obtained, whether the target action is aesthetic may be determined according to it. As one embodiment, a preset score may be set in advance and compared with the prediction score; when the prediction score is greater than the preset score, the target action is considered aesthetic.
According to the dance motion aesthetic feeling confirmation method provided by the embodiment of the application, a video segment corresponding to a target motion in a dance video is obtained; target human skeleton key points corresponding to each frame of image in the video clip are acquired to obtain a key point image group; the key point image group is input into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being obtained by training on sample segments and their corresponding target annotation scores; and whether the target action is aesthetic is determined according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action; converting the video segment corresponding to the target action into a key point image group allows the human posture to be analyzed accurately through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.
Referring to fig. 3, another embodiment of the present application provides a dance motion aesthetic feeling confirmation method, which focuses on the process of obtaining an aesthetic feeling prediction model based on the foregoing embodiment, and specifically, the method may include the following steps.
Step 210, obtaining a sample set, where the sample set includes a plurality of sample segments and target annotation scores corresponding to the sample segments.
Before training the neural network model, a sample set needs to be constructed to train the neural network model through the sample set. The sample set comprises a plurality of sample segments and target annotation scores corresponding to the sample segments.
When the sample set is obtained, a plurality of sample videos can be obtained firstly; extracting a sample segment corresponding to a target action from the sample video; and acquiring a target marking score corresponding to the sample fragment. The sample segment corresponding to the target action is extracted from the sample video, which may refer to the description in step 110 in the foregoing embodiment, and is not repeated herein for avoiding repetition.
After the sample fragment is obtained, a plurality of marking scores obtained after a plurality of marking personnel mark the sample fragment can be obtained; and determining the median of the plurality of labeling scores as the target labeling score corresponding to the sample segment.
If n professional dance teachers serve as marking personnel to score the sample segments, n marking scores can be obtained corresponding to each sample segment, and the median of the n marking scores is used as the target marking score corresponding to the sample segment. The median is data which is arranged according to the size sequence to form a sequence and is positioned in the middle of the sequence. In the embodiment of the application, the n marked scores are arranged according to the size sequence to form a number sequence; if n is an odd number, taking the labeling score at the middle of the number series as the target labeling score; and if n is an even number and two labeling scores are arranged in the middlemost of the number series, taking the average value of the two labeling scores as the target labeling score.
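The median rule for the target annotation score can be written directly (Python's standard `statistics.median` implements the same rule):

```python
def target_annotation_score(scores):
    """Median of n annotators' scores: the middle value when n is odd,
    the mean of the two middle values when n is even."""
    s = sorted(scores)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])
    return (s[mid - 1] + s[mid]) / 2.0
```

Using the median rather than the mean makes the target score robust to a single annotator scoring far from the others.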
And step 220, training a neural network model by using the sample set to obtain the aesthetic feeling prediction model.
After the sample set is obtained, a neural network model can be trained with it to obtain the aesthetic feeling prediction model. This may involve acquiring the key point image group corresponding to each sample segment; inputting the key point image group into the neural network model to obtain a prediction score for the sample segment; and adjusting the parameters of the neural network model according to the difference between the prediction score and the annotation score until the two are consistent.
It should be noted that the specific manner of obtaining the key point image group corresponding to the sample fragment may be the same as the manner described in step 120 of the foregoing embodiment, and is not repeated herein.
The key point image group is input into the neural network model, which outputs a corresponding prediction score. When the prediction score is inconsistent with the annotation score, the model's prediction is inaccurate, so the parameters of the neural network model are adjusted. Once the prediction score is consistent with the annotation score, the neural network model is considered to have the correct prediction capability, and the aesthetic feeling prediction model is obtained.
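The adjust-until-consistent loop can be sketched with a toy linear model and squared-error gradient steps; the learning rate, tolerance, and model form are illustrative assumptions, not the embodiment's actual neural network training procedure:

```python
import numpy as np

def train_step(w, x, label, lr=0.1):
    """One parameter update: move the model's prediction for sample x
    toward the annotators' target score (squared-error gradient step)."""
    pred = float(x @ w)
    grad = 2.0 * (pred - label) * x
    return w - lr * grad

def train(w, samples, epochs=200, tol=1e-3):
    """Repeat updates until every prediction matches its annotation score
    to within tol, i.e. until prediction and annotation are 'consistent'."""
    for _ in range(epochs):
        for x, label in samples:
            w = train_step(w, x, label)
        if all(abs(float(x @ w) - label) < tol for x, label in samples):
            break
    return w
```

In practice the model parameters are the weights of the convolutional network and fully connected layer of fig. 2, but the stopping criterion — predictions consistent with annotation scores — is the one the text describes.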
If a sample segment in the sample set is the target action a and the target label score corresponding to the sample segment, the aesthetic feeling prediction model trained based on the sample set may have the capability of predicting the aesthetic feeling of the target action a. If a sample segment in the sample set is a target action B and a target label score corresponding to the sample segment, the aesthetic feeling prediction model trained based on the sample set may have an ability to perform aesthetic feeling prediction on the target action B.
And step 230, acquiring a video segment corresponding to the target action in the dance video.
And 240, acquiring target human skeleton key points corresponding to each frame of image in the video clip to obtain a key point image group.
Step 250, inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, wherein the aesthetic feeling prediction model is obtained by training a sample segment and a target marking score corresponding to the sample segment.
Step 260, determining whether the target action has an aesthetic feeling according to the prediction score.
The steps 230 to 260 can refer to the corresponding parts of the previous embodiments, and are not described herein again.
According to the dance action aesthetic feeling confirmation method of this embodiment, a sample set is obtained, the sample set including a plurality of sample segments and the target annotation scores corresponding to the sample segments, and the neural network is trained with the sample set to obtain an aesthetic feeling prediction model. Because the neural network is trained in advance, when the aesthetic feeling of a target motion is to be predicted, the corresponding video clip is converted into a key point image group, the key point image group is input into the aesthetic feeling prediction model to obtain a prediction score for the video clip, and whether the target motion is aesthetic is determined from the prediction score, so the aesthetic feeling of the target motion can be evaluated objectively and accurately.
Referring to fig. 4, an embodiment of the present application provides an apparatus 300 for determining a dance motion aesthetic feeling, where the apparatus 300 includes a first obtaining module 310, a second obtaining module 320, a predicting module 330, and a determining module 340. The first obtaining module 310 is configured to obtain a video segment corresponding to a target motion in a dance video; the second obtaining module 320 is configured to obtain a key point of a target human skeleton corresponding to each frame of image in the video clip to obtain a key point image group; the prediction module 330 is configured to input the key point image group into an aesthetic prediction model to obtain a prediction score corresponding to the video segment, where the aesthetic prediction model is obtained by training a sample segment and a target annotation score corresponding to the sample segment; the determining module 340 is configured to determine whether the target action is aesthetic according to the prediction score.
Further, the first obtaining module 310 is further configured to identify key points of human bones of each frame of image in the dance video; constructing a feature vector of each frame of image based on the human skeleton key points; and extracting a video segment corresponding to the target action from the dance video according to the characteristic vector.
Further, the dance motion aesthetic feeling confirmation device 300 further includes a target key point confirmation module, configured to, before obtaining a target human skeleton key point corresponding to each frame of image in the video clip and obtaining a key point image group, obtain a motion type corresponding to the target motion as a target type; and inquiring an information table according to the target type, and determining target human skeleton key points corresponding to the target type, wherein the information table comprises the corresponding relation between the action type and the human skeleton key points.
Further, the second obtaining module 320 is further configured to obtain key points of a target human skeleton of each frame of image in the video segment, so as to obtain key point images corresponding to each frame of image; and adjusting each key point image to a specified visual angle according to the key points of the target human skeleton to obtain a key point image group corresponding to the video clip.
Further, the prediction module 330 is further configured to obtain a plurality of sample videos; extracting a sample segment corresponding to a target action from the sample video; acquiring a target marking score corresponding to the sample fragment; and training a neural network model by using the sample segment and the target marking score corresponding to the sample segment to obtain the aesthetic feeling prediction model.
Further, the prediction module 330 is further configured to obtain a plurality of labeling scores obtained after a plurality of labeling personnel label the sample segment; and determining the median of the plurality of labeling scores as the target labeling score corresponding to the sample segment.
Further, the prediction module 330 is further configured to acquire the key point image group corresponding to the sample segment; input the key point image group into the neural network model to obtain a prediction score corresponding to the sample segment; and adjust parameters of the neural network model according to the difference between the prediction score and the target annotation score until the prediction score converges to the target annotation score.
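The publication does not disclose the network architecture or training schedule. As a toy stand-in, the loop below fits a linear scorer to (feature, annotation score) pairs by gradient descent on mean squared error — the same "adjust parameters according to the difference between prediction and annotation" idea, with a fixed iteration budget in place of a convergence test:

```python
import numpy as np

def train_aesthetic_model(features, scores, lr=0.1, epochs=5000):
    """Toy stand-in for the patent's neural network: fit a linear scorer
    y = X @ w + b to (keypoint-group feature, annotation score) pairs by
    gradient descent on the mean squared error."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(scores, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        pred = X @ w + b
        err = pred - y                     # difference: prediction vs label
        w -= lr * (X.T @ err) / len(y)     # gradient of MSE w.r.t. w
        b -= lr * err.mean()               # gradient of MSE w.r.t. b
    return w, b
```

A production model would be a deep network over the key point image group, but the parameter-update principle is identical.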
The dance action aesthetic feeling confirmation device provided by the embodiment of the application acquires a video segment corresponding to a target action in a dance video; acquires the target human skeleton key points corresponding to each frame of image in the video segment to obtain a key point image group; inputs the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being trained on sample segments and their corresponding target annotation scores; and determines whether the target action is aesthetic according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action. Moreover, because the video segment corresponding to the target action is converted into a key point image group, the human body posture can be accurately analyzed through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working process of the above-described apparatus may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Referring to fig. 5, an embodiment of the present application provides a block diagram of an electronic device 400, which includes a processor 410, a memory 420, and one or more applications, wherein the one or more applications are stored in the memory 420 and configured to be executed by the one or more processors 410, the one or more applications being configured to perform the dance motion aesthetic feeling confirmation method described above.
The electronic device 400 may be a terminal device capable of running an application, such as a smart phone or a tablet computer, or may be a server. The electronic device 400 in the present application may include one or more of the following components: a processor 410, a memory 420, and one or more applications, wherein the one or more applications may be stored in the memory 420 and configured to be executed by the one or more processors 410, the one or more programs configured to perform the methods as described in the aforementioned method embodiments.
Processor 410 may include one or more processing cores. The processor 410 connects various components throughout the electronic device 400 using various interfaces and circuits, and performs the various functions of the electronic device 400 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 420 and invoking data stored in the memory 420. Optionally, the processor 410 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 410 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 410 but instead be implemented by a separate communication chip.
The Memory 420 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 420 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 420 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The stored data area may store data created by the electronic device 400 during use (e.g., phone books, audio and video data, chat log data), and the like.
The electronic device provided by the embodiment of the application acquires a video segment corresponding to a target action in a dance video; acquires the target human skeleton key points corresponding to each frame of image in the video segment to obtain a key point image group; inputs the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being trained on sample segments and their corresponding target annotation scores; and determines whether the target action is aesthetic according to the prediction score. Compared with manual evaluation, using the aesthetic feeling prediction model effectively improves the efficiency of determining the aesthetic feeling of the target action. Moreover, because the video segment corresponding to the target action is converted into a key point image group, the human body posture can be accurately analyzed through the human skeleton key points, so the prediction score obtained based on the key point image group and the aesthetic feeling prediction model is more accurate.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A dance motion aesthetic confirmation method, comprising:
acquiring a video segment corresponding to the target action in the dance video;
acquiring a target human skeleton key point corresponding to each frame of image in the video clip to obtain a key point image group;
inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, wherein the aesthetic feeling prediction model is obtained by training with a sample segment and a target annotation score corresponding to the sample segment;
determining whether the target action is aesthetic according to the prediction score.
2. The method of claim 1, wherein the obtaining of the video segment corresponding to the target action in the dance video comprises:
identifying human skeleton key points of each frame of image in the dance video;
constructing a feature vector of each frame of image based on the human skeleton key points;
and extracting the video segment corresponding to the target action from the dance video according to the feature vectors.
3. The method according to claim 1, wherein before the obtaining a target human skeleton key point corresponding to each frame of image in the video clip to obtain a key point image group, the method further comprises:
acquiring an action type corresponding to the target action as a target type;
and querying an information table according to the target type to determine target human skeleton key points corresponding to the target type, wherein the information table includes the correspondence between action types and human skeleton key points.
4. The method of claim 3, wherein the obtaining key points of the target human skeleton corresponding to each frame of image in the video clip to obtain a key point image group comprises:
acquiring a target human skeleton key point of each frame of image in the video clip to obtain a key point image corresponding to each frame of image;
and adjusting each key point image to a specified visual angle according to the key points of the target human skeleton to obtain a key point image group corresponding to the video clip.
5. The method of claim 1, wherein the aesthetic prediction model is obtained by:
acquiring a plurality of sample videos;
extracting a sample segment corresponding to a target action from the sample video;
acquiring a target annotation score corresponding to the sample segment;
and training a neural network model by using the sample segment and the target annotation score corresponding to the sample segment to obtain the aesthetic feeling prediction model.
6. The method of claim 5, wherein the acquiring the target annotation score corresponding to the sample segment comprises:
acquiring a plurality of annotation scores given by a plurality of annotators for the sample segment;
and determining the median of the plurality of annotation scores as the target annotation score corresponding to the sample segment.
7. The method of claim 5, wherein the training a neural network model by using the sample segment and the target annotation score corresponding to the sample segment to obtain the aesthetic feeling prediction model comprises:
acquiring a key point image group corresponding to the sample segment;
inputting the key point image group into the neural network model to obtain a prediction score corresponding to the sample segment;
and adjusting parameters of the neural network model according to the difference between the prediction score and the target annotation score until the prediction score converges to the target annotation score.
8. A dance motion aesthetic confirmation apparatus, comprising:
the first acquisition module is used for acquiring a video clip corresponding to the target action in the dance video;
the second acquisition module is used for acquiring a target human skeleton key point corresponding to each frame of image in the video clip to obtain a key point image group;
the prediction module is used for inputting the key point image group into an aesthetic feeling prediction model to obtain a prediction score corresponding to the video segment, the aesthetic feeling prediction model being obtained by training with a sample segment and a target annotation score corresponding to the sample segment;
a determination module to determine whether the target action is aesthetic according to the prediction score.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory electrically connected with the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
10. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 7.
CN202110626028.6A 2021-06-04 2021-06-04 Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium Pending CN113392744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626028.6A CN113392744A (en) 2021-06-04 2021-06-04 Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113392744A true CN113392744A (en) 2021-09-14

Family

ID=77618444

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626028.6A Pending CN113392744A (en) 2021-06-04 2021-06-04 Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113392744A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198601A (en) * 2017-12-27 2018-06-22 广东欧珀移动通信有限公司 Motion scores method, apparatus, equipment and storage medium
CN108665492A (en) * 2018-03-27 2018-10-16 北京光年无限科技有限公司 A kind of Dancing Teaching data processing method and system based on visual human
CN109508656A (en) * 2018-10-29 2019-03-22 重庆中科云丛科技有限公司 A kind of dancing grading automatic distinguishing method, system and computer readable storage medium
CN111680562A (en) * 2020-05-09 2020-09-18 北京中广上洋科技股份有限公司 Human body posture identification method and device based on skeleton key points, storage medium and terminal
CN112489036A (en) * 2020-12-14 2021-03-12 Oppo(重庆)智能科技有限公司 Image evaluation method, image evaluation device, storage medium, and electronic apparatus


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843643A (en) * 2023-07-03 2023-10-03 北京语言大学 Video aesthetic quality evaluation data set construction method
CN116843643B (en) * 2023-07-03 2024-01-16 北京语言大学 Video aesthetic quality evaluation data set construction method

Similar Documents

Publication Publication Date Title
CN109741309B (en) Bone age prediction method and device based on deep regression network
CN105868715B (en) Gesture recognition method and device and gesture learning system
CN113392742A (en) Abnormal action determination method and device, electronic equipment and storage medium
US9183431B2 (en) Apparatus and method for providing activity recognition based application service
CN109815776B (en) Action prompting method and device, storage medium and electronic device
Chaudhari et al. Yog-guru: Real-time yoga pose correction system using deep learning methods
CN111291629A (en) Method and device for recognizing text in image, computer equipment and computer storage medium
KR20220028654A (en) Apparatus and method for providing taekwondo movement coaching service using mirror dispaly
CN115019240B (en) Grading method, device and equipment for chemical experiment operation and readable storage medium
CN113870395A (en) Animation video generation method, device, equipment and storage medium
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
CN113409651B (en) Live broadcast body building method, system, electronic equipment and storage medium
WO2023108842A1 (en) Motion evaluation method and system based on fitness teaching training
CN115331314A (en) Exercise effect evaluation method and system based on APP screening function
CN113822254B (en) Model training method and related device
WO2022174544A1 (en) Action comparison method, apparatus, electronic device, storage medium, computer program product and computer program
CN113392744A (en) Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium
CN113392741A (en) Video clip extraction method and device, electronic equipment and storage medium
Guo et al. PhyCoVIS: A visual analytic tool of physical coordination for cheer and dance training
Zahan et al. Learning sparse temporal video mapping for action quality assessment in floor gymnastics
CN114782994A (en) Gesture recognition method and device, storage medium and electronic equipment
Hoang et al. Poses Classification in a Taekwondo Lesson Using Skeleton Data Extracted from Videos with Shallow and Deep Learning Architectures
WO2023041181A1 (en) Electronic device and method for determining human height using neural networks
CN109886123B (en) Method and terminal for identifying human body actions
CN113392743A (en) Abnormal action detection method, abnormal action detection device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914
