CN113392746A - Action standard mining method and device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN113392746A
Authority
CN
China
Prior art keywords
video data
action
frame
image
motion
Legal status
Pending
Application number
CN202110626807.6A
Other languages
Chinese (zh)
Inventor
赵勇 (Zhao Yong)
夏鹏飞 (Xia Pengfei)
Current Assignee
Beijing Gelingshentong Information Technology Co., Ltd.
Original Assignee
Beijing Gelingshentong Information Technology Co., Ltd.
Application filed by Beijing Gelingshentong Information Technology Co., Ltd.
Priority to CN202110626807.6A
Publication of CN113392746A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide an action standard mining method and apparatus, an electronic device, and a computer storage medium. The method comprises: acquiring video data and annotation results, where the video data comprises a plurality of pieces of target action video data and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard; segmenting the video data to obtain an action pose vector for each frame of image in the target action video data, where the action pose vector represents the pose of the human body; and inputting the action pose vectors and the annotation results into a preset action standard mining model to generate action evaluation data. With the action standard mining method provided by the present application, standardized, quantifiable, and meaningful action evaluation data can be mined automatically from the video data and the annotation results based on a machine learning algorithm, thereby improving the standardization of sports action evaluation.

Description

Action standard mining method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a method and an apparatus for mining an action standard, an electronic device, and a computer storage medium.
Background
In scenarios such as online and offline physical education and sports examinations, the evaluation of sports actions depends on the personal experience of professional teachers, who often give evaluations that are difficult to quantify, for example, "the arms are not raised high enough" or "the legs need to step out further".
Problems existing in the prior art:
Evaluation based on a professional teacher's subjective judgment cannot provide detailed, standardized evaluation data, and the teacher's descriptions cannot be quantified, so the evaluation of sports actions is not objective enough and cannot be widely adopted.
Disclosure of Invention
Embodiments of the present application provide an action standard mining method and apparatus, an electronic device, and a computer storage medium, so as to solve the problem in the prior art that the evaluation of sports actions is neither standardized nor sufficiently objective.
According to a first aspect of embodiments of the present application, there is provided an action standard mining method, the method including:
acquiring video data and annotation results; the video data comprises a plurality of pieces of target action video data, and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard;
segmenting the video data to obtain an action pose vector for each frame of image in the target action video data; the action pose vector of each frame of image corresponds one-to-one with the annotation result of that frame, and the action pose vector represents the pose of the human body; and
inputting the action pose vectors and the annotation results into a preset action standard mining model, and generating action evaluation data based on the trained action standard mining model; the action evaluation data serves as the basis for judging whether the actions in the target action video data are standard.
According to a second aspect of embodiments of the present application, there is provided an action standard mining apparatus, the apparatus comprising:
an acquisition module, configured to acquire video data and annotation results; the video data comprises a plurality of pieces of target action video data, and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard;
a segmentation processing module, configured to segment the video data to obtain an action pose vector for each frame of image in the target action video data; the action pose vector of each frame of image corresponds one-to-one with the annotation result of that frame, and the action pose vector represents the pose of the human body; and
a rule generation module, configured to input the action pose vectors and the annotation results into a preset action standard mining model and to generate action evaluation data based on the trained action standard mining model; the action evaluation data serves as the basis for judging whether the actions in the target action video data are standard.
According to a third aspect of embodiments of the present application, there is provided an electronic device comprising one or more processors, and memory for storing one or more programs; the one or more programs, when executed by the one or more processors, implement the steps of the action standard mining method as described above.
According to a fourth aspect of embodiments of the present application, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the action standard mining method as described above.
With the action standard mining method and apparatus, electronic device, and computer storage medium provided by the embodiments of the present application, the video data is segmented to obtain an action pose vector for each frame of image in the target action video data, the action pose vectors and the annotation results are input into a preset action standard mining model, and action evaluation data is generated based on the trained action standard mining model. In this way, standardized, quantifiable, and meaningful action evaluation data can be mined automatically from the video data and the annotation results based on a machine learning algorithm, thereby improving the standardization of sports action evaluation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an action standard mining method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another action standard mining method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another action standard mining method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of another action standard mining method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a decision tree according to an embodiment of the present application;
fig. 7 is a schematic flowchart of another action standard mining method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an action standard excavating device according to an embodiment of the present application.
Detailed Description
In the process of implementing the present application, the inventors found that in scenarios such as online and offline physical education and sports examinations, the evaluation of sports actions depends on the personal experience of professional teachers, who often give evaluations that are difficult to quantify. Evaluation based on a teacher's subjective judgment cannot provide detailed, standardized evaluation data, and the teacher's descriptions cannot be quantified, so the evaluation of sports actions is not objective enough and cannot be widely adopted.
In view of the above problems, embodiments of the present application provide an action standard mining method and apparatus, an electronic device, and a computer storage medium, which can automatically mine standardized, quantifiable, and meaningful action evaluation data from video data and annotation results based on a machine learning algorithm, thereby improving the standardization of sports action evaluation.
The solutions in the embodiments of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java or the interpreted scripting language JavaScript.
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Clearly, the described embodiments are only some, not all, of the embodiments of the present application. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict.
As shown in fig. 1, a schematic structural diagram of an electronic device 100 provided in an embodiment of the present application is shown, where the electronic device 100 includes a memory 101, a processor 102, and a communication interface 103. The memory 101, processor 102 and communication interface 103 are electrically connected to each other, directly or indirectly, to enable the transfer or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 101 may be used for storing software programs and modules, such as program instructions/modules corresponding to the action standard mining method provided in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 101, thereby executing various functional applications and data processing. The communication interface 103 may be used for communication of signaling or data with the node apparatus 300 and the client 200. The electronic device 100 may have a plurality of communication interfaces 103 in this application.
The memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc.
Next, on the basis of the electronic device 100 shown in fig. 1, an embodiment of the present application provides an action standard mining method. Referring to fig. 2, which shows a flowchart of an action standard mining method provided in an embodiment of the present application, the method may include the following steps:
s201, video data and a labeling result are obtained.
The video data comprise a plurality of target action video data, and the annotation result is used for representing whether the action corresponding to each frame of image in the target action video data is standard or not.
It should be understood that the plurality of target motion video data is video data of a motion a performed by a plurality of persons, and the plurality of target motion video data includes a standard motion a and a non-standard motion a. The action a may be a push-up action, a sit-up action, a flat-bed action, a dance action, and the like. The annotation result can represent whether the action corresponding to a certain frame of image of the target action video data is nonstandard or not.
S202, the video data is segmented to obtain an action pose vector for each frame of image in the target action video data.
The action pose vector of each frame of image corresponds one-to-one with the annotation result of that frame, and the action pose vector represents the pose of the human body.
It should be understood that each frame of image in the target action video data may carry a timestamp, and the action pose vector of each frame can be matched one-to-one with its annotation result according to the timestamp. Alternatively, each frame of image in the target action video data may carry a sequence number, and the action pose vector of each frame can likewise be matched one-to-one with its annotation result according to the sequence number.
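By way of illustration only, the one-to-one matching described above may be sketched in Python as follows; the dictionary-based representation and the names are assumptions for illustration, not part of the embodiments:

```python
# Minimal sketch: pair each frame's action pose vector with its annotation
# result by timestamp (a frame sequence number would work identically).
def pair_by_timestamp(pose_vectors, annotations):
    """Both arguments are dicts keyed by frame timestamp; returns
    (pose_vector, annotation_result) pairs in temporal order."""
    return [(pose_vectors[ts], annotations[ts])
            for ts in sorted(pose_vectors) if ts in annotations]
```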
S203, the action pose vectors and the annotation results are input into a preset action standard mining model, and action evaluation data is generated based on the trained action standard mining model.
The action evaluation data serves as the basis for judging whether the actions in the target action video data are standard. It can be appreciated that a professional teacher can then make an objective evaluation based on the action evaluation data.
It should be understood that the preset action standard mining model automatically analyzes the action pose vector of each frame of image and its corresponding annotation result, so that the action evaluation data can be generated automatically.
To facilitate understanding of how the annotation results are obtained, referring to fig. 3, on the basis of the action standard mining method shown in fig. 2, the method further includes the following steps:
S301, a plurality of scoring results are obtained for each frame of image in the target action video data.
The plurality of scoring results are produced by different professionals scoring independently. It should be understood that each of a plurality of professionals scores the plurality of pieces of target action video data, and each professional scores every frame of image independently, without discussion among the professionals, so as to ensure the objectivity of the results. A scoring result may be represented by 0 for a standard action and 1 for a non-standard action, i.e., each professional marks a frame as 0 or 1; of course, a scoring result may instead be represented by a number in the interval 60 to 100 or in the interval 0 to 59. This is not limited here and may be set according to the actual circumstances.
For example, suppose the video data includes three pieces of target action video data: first, second, and third target action video data, each comprising three frames of images. The first piece includes a first frame image a1, a second frame image b1, and a third frame image c1; the second piece includes a2, b2, and c2; and the third piece includes a3, b3, and c3. If the professionals are professional A, professional B, and professional C, then professional A scores the first frame images a1-a3, the second frame images b1-b3, and the third frame images c1-c3, producing 9 scoring results; professional B scores the same frames, also producing 9 scoring results; and professional C scores the same frames, likewise producing 9 scoring results.
S302, the annotation result of each frame of image in the target action video data is obtained from that frame's plurality of scoring results.
It should be understood that since several professionals score the same frame of image of the target action video data, each frame has several scoring results. For example, with three professionals scoring, the first frame image a1 of the first target action video data has three corresponding scoring results, from which the annotation result of the first frame image a1 can be obtained.
To facilitate understanding of how the annotation result of each frame of image is obtained from its plurality of scoring results, refer to fig. 4, which shows the sub-steps of S302. S302 includes the following sub-steps:
S302a, the plurality of scoring results of each frame of image in the target action video data are summed to obtain the processing result of that frame.
S302b, the processing result of each frame of image in the target action video data is compared with a preset standard value to obtain the annotation result of that frame.
For convenience of description, let a scoring result of 0 denote a standard action and 1 denote a non-standard action. Suppose the video data includes three pieces of target action video data, each comprising m frames of images, and the number of professionals is n. The n scoring results of the t-th frame image of the first target action video data are summed to obtain the processing result of the t-th frame, and this processing result is compared with a preset standard value to obtain the annotation result of the t-th frame.
With n professionals, the preset standard value is correspondingly set to 2n/3. If the processing result of the t-th frame image is greater than or equal to the preset standard value, the annotation result indicates that the corresponding action is non-standard and the t-th frame is a non-standard frame; if the processing result is less than the preset standard value, the annotation result indicates that the action is standard and the t-th frame is a standard frame.
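By way of illustration, the summation and comparison of S302a and S302b may be sketched as follows, a minimal Python sketch under the 0/1 scoring convention above; the function name is an assumption:

```python
def annotate_frame(scores):
    """Annotation of one frame from its n scoring results, where each score
    is 0 (standard) or 1 (non-standard), per S302a/S302b above."""
    n = len(scores)                       # number of professionals
    processing_result = sum(scores)       # S302a: summation
    threshold = 2 * n / 3                 # preset standard value 2n/3
    # S302b: >= threshold means the frame is a non-standard frame.
    return 1 if processing_result >= threshold else 0

# Three professionals, two of whom marked the frame non-standard:
# the processing result 2 >= 2n/3 = 2, so the frame is non-standard.
print(annotate_frame([1, 1, 0]))          # -> 1
```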
To facilitate understanding of how the video data is segmented to obtain the action pose vector of each frame of image in the target action video data, refer to fig. 5, which shows the sub-steps of S202. S202 includes the following sub-steps:
s202a, detecting a human skeleton key point posture image corresponding to each frame of image in the target action video data according to a human skeleton key point detection algorithm.
The human skeleton key point posture graph is a posture graph formed by human skeleton key points. It should be understood that the three-dimensional coordinates of the skeletal key points of the human body in each frame of image of the target motion video data can be detected by the human skeletal key point detection algorithm, and the three-dimensional coordinates of all the skeletal key points in each frame of image of the target motion video data form the posture graph of the skeletal key points of the human body. For example, if the skeletal key points of the human body in the target motion video data include pelvic joints, left shoulder joints, right shoulder joints, left elbow joints, right elbow joints, left wrist joints, right wrist joints, left hip joints, right hip joints, left knee joints, right knee joints, etc., the skeletal key point posture map of the human body includes three-dimensional coordinates of the pelvic joints, the left shoulder joints, the right shoulder joints, the left elbow joints, the right elbow joints, the left wrist joints, the right wrist joints, the left hip joints, the right hip joints, the left knee joints, and the right knee joints.
S202b, the action pose vector is calculated from the human skeleton keypoint pose map.
It should be understood that the action pose vector can be described by the angles formed between skeleton keypoints. Specifically, it can be described by the angles formed among any three different skeleton keypoints, where the angle for a given triple of keypoints is calculated from their coordinates. The action pose vector then includes all angles formed by every combination of three skeleton keypoints in one frame of image of the target action video data. For example, if the human skeleton keypoint pose map of one frame includes four keypoints, any three of which can be combined into 12 angles, then the action pose vector includes those 12 angles.
In an alternative embodiment, the action pose vector may instead be described by the angles formed among a preset set of keypoint triples, rather than by the angles of all possible triples.
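By way of illustration, the angle computation and the all-combinations construction of the action pose vector may be sketched as follows, a minimal Python sketch in which the keypoint ordering is an assumption:

```python
import numpy as np
from itertools import combinations

def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, formed by segments b->a and b->c,
    computed from the 3-D coordinates in the skeleton keypoint pose map."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pose_vector(keypoints):
    """All angles formed by every combination of three keypoints, taking
    each of the three in turn as the vertex; four keypoints thus yield the
    12 angles mentioned above (4 triples x 3 vertex choices)."""
    angles = []
    for trio in combinations(range(len(keypoints)), 3):
        for vertex in trio:
            a, c = (i for i in trio if i != vertex)
            angles.append(joint_angle(keypoints[a], keypoints[vertex],
                                      keypoints[c]))
    return angles
```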
In the embodiments of the present application, the plurality of pieces of target action video data may be cut out of the video data, with invalid noise video segments removed, and the human skeleton keypoint pose map corresponding to each frame of the target action video data is then detected by the human skeleton keypoint detection algorithm. Alternatively, the method may detect the human skeleton keypoint pose map corresponding to each frame of image of the video data by the keypoint detection algorithm and calculate the action pose vector of each frame from the pose map; a clustering algorithm is then applied to the action pose vectors of all frames of the video data to obtain a clustering result, the video data is segmented based on the clustering result, and the plurality of pieces of target action video data are extracted so as to remove the invalid noise video segments. A sketch of such clustering-based segmentation is given below.
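The following Python sketch uses k-means as an assumed choice of clustering algorithm; how the noise clusters are identified (for example, by matching cluster centers against known action poses) is also an assumption, since the embodiments do not fix it:

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_by_cluster(frame_pose_vectors, n_clusters, noise_clusters):
    """Cluster the per-frame pose vectors, then keep contiguous runs of
    frames whose cluster is not a noise cluster, dropping noise segments."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        np.asarray(frame_pose_vectors))
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] not in noise_clusters:
                segments.append((start, i))   # frame range [start, i)
            start = i
    return segments
```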
To facilitate understanding of how the action standard mining model generates the action evaluation data, its implementation principle is explained below. The mining model is a decision tree model, and the action pose vectors and the annotation results are input into the decision tree model to generate the action evaluation data.
It should be understood that the decision tree model includes a root node, a plurality of edges, and a plurality of child nodes. The action pose vectors and the annotation results are input at the root node. The root node is connected one-to-one with a plurality of primary child nodes through a plurality of primary edges; each primary edge represents a primary evaluation basis formed from dimensions of the action pose vector. The action pose vectors and annotation results are screened according to the primary evaluation bases to obtain a plurality of primary screening groups, where different primary screening groups contain different action pose vectors together with their corresponding annotation results. If the annotation results within a primary screening group are all the same, the primary evaluation basis corresponding to that group is determined to be a piece of action evaluation data, the primary evaluation basis being determined from the primary edge connected to that group's primary child node. If the annotation results within a primary screening group differ, the action pose vectors and annotation results in that group are screened again according to second evaluation bases to obtain a plurality of second screening groups: the primary child node corresponding to the group is connected one-to-one with a plurality of second-level child nodes through a plurality of second-level edges, each second-level edge representing a second evaluation basis formed from dimensions of the action pose vectors in that primary screening group; the number of second screening groups corresponds to the number of second-level child nodes, and different second screening groups contain different action pose vectors and their corresponding annotation results. If the annotation results within a second screening group are all the same, the primary evaluation basis and second evaluation basis traversed from the root node to that group's second-level child node are together determined to be a piece of action evaluation data. If the annotation results within a second screening group differ, the above steps are repeated and the child nodes continue to split until the annotation results in the screening groups at all child nodes are the same.
That a primary or second evaluation basis is formed from dimensions of the action pose vector can be understood as follows: if an action pose vector has three dimensions, these are the first, second, and third dimensions, where the first dimension is the angle formed by one particular triple of skeleton keypoints, the second dimension is the angle formed by another triple, and the third dimension is the angle formed by yet another triple. In other words, different dimensions of the action pose vector represent different aspects of the body pose.
Fig. 6 is a schematic diagram of a decision tree according to the present application. Suppose the action pose vectors and annotation results corresponding to t frame images of the target action video data are input at the root node, where the t frames are all frames of the plurality of pieces of target action video data; the annotation results of m1 of these frames indicate standard actions, the annotation results of n1 frames indicate non-standard actions, and m1 + n1 = t.
The annotation results of the t frames are screened according to primary evaluation bases r1 and r2 to obtain two primary screening groups, a first and a second primary screening group. The root node is connected to the first primary child node by the primary edge corresponding to r1 and to the second primary child node by the primary edge corresponding to r2. The first primary screening group contains the annotation results of the m1 frames indicating standard actions, the annotation results of n2 frames indicating non-standard actions, and the action pose vectors of those m1 + n2 frames; the second primary screening group contains the annotation results of n3 frames indicating non-standard actions and the action pose vectors of those n3 frames, with n2 + n3 = n1. The primary evaluation basis r1 is formed from the first dimension of the action pose vectors of the t frames, and r2 is formed from their second dimension. Because the annotation results in the second primary screening group all indicate non-standard actions, the corresponding second primary child node is a pure node and need not be split; the primary evaluation basis r2 is therefore a piece of action evaluation data, namely an evaluation basis for non-standard actions.
Because the annotation results in the first primary screening group differ, the first primary child node continues to split. A second evaluation basis r3 is formed from the first dimension of the action pose vectors of the m1 + n2 frames, and a second evaluation basis r4 from their second dimension; correspondingly, the first primary child node is connected to second-level child node a1 by the edge for r3 and to second-level child node a2 by the edge for r4. Screening the annotation results of the m1 + n2 frames according to r3 and r4 yields two second screening groups, a1 and a2. Group a1 contains the annotation results of n4 frames indicating non-standard actions and the action pose vectors of those n4 frames; group a2 contains the annotation results of the m1 frames indicating standard actions, the annotation results of n5 frames indicating non-standard actions, and the action pose vectors of those m1 + n5 frames, with n4 + n5 = n2. Because the annotation results in group a1 all indicate non-standard actions, second-level child node a1 is a pure node and need not be split; the combination of primary evaluation basis r1 and second evaluation basis r3 is determined to be a piece of action evaluation data, namely an evaluation basis for non-standard actions.
Because the annotation results in second screening group a2 differ, second-level child node a2 continues to split. A third evaluation basis r5 is formed from the first dimension of the action pose vectors of the m1 + n5 frames, and a third evaluation basis r6 from their second dimension; correspondingly, second-level child node a2 is connected to third-level child node b1 by the edge for r5 and to third-level child node b2 by the edge for r6. Screening the annotation results of the m1 + n5 frames according to r5 and r6 yields two third screening groups, b1 and b2, where group b1 contains the annotation results of the m1 frames indicating standard actions and their action pose vectors, and group b2 contains the annotation results of the n5 frames indicating non-standard actions and their action pose vectors. Because the annotation results in group b1 all indicate standard actions and those in group b2 all indicate non-standard actions, third-level child nodes b1 and b2 are both pure nodes and neither splits further. The combination of primary evaluation basis r1, second evaluation basis r4, and third evaluation basis r5 is determined to be a piece of action evaluation data, serving as an evaluation basis for standard actions; the combination of r1, r4, and r6 is determined to be another piece of action evaluation data, serving as an evaluation basis for non-standard actions. At this point no splittable child node remains in the decision tree, and all of the action evaluation bases have been generated.
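By way of illustration, the node-splitting procedure above corresponds to training an ordinary decision tree; a minimal Python sketch with synthetic data follows. The scikit-learn implementation, the feature names, and the toy labeling rule are all assumptions for illustration, not the embodiments' implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic pose vectors: 5 assumed angle dimensions per frame, in degrees;
# label 1 means non-standard, per the annotation convention above.
feature_names = ["left_leg_angle", "right_leg_angle",
                 "left_arm_angle", "right_arm_angle", "trunk_angle"]
rng = np.random.default_rng(42)
X = rng.uniform(60.0, 180.0, size=(300, len(feature_names)))
y = ((X[:, 0] < 160) | (X[:, 4] < 160)).astype(int)   # toy labeling rule

# Splitting continues until every leaf is a pure node, mirroring the
# procedure above; each root-to-leaf path (the conjunction of angle
# thresholds on its edges) is one piece of action evaluation data.
clf = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(export_text(clf, feature_names=feature_names))
```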
After the decision tree model is trained, referring to fig. 7, the action standard mining method further includes the following steps:
s401, obtaining a test motion attitude vector.
And the test motion attitude vector is used for representing the motion attitude of the human body in the test motion video data.
It can be understood that the test motion attitude vector of each frame of image in the test motion video data can be obtained by performing segmentation processing on the test motion video data. The specific division processing principle can refer to the content of S202 and its sub-steps, which will not be described herein.
S402, the test action pose vector is input into the trained decision tree model to obtain test action evaluation data and a test action evaluation result.
The test action evaluation data is generated from the traversal path from the root node of the decision tree model to a target child node, and the test action evaluation result is generated from the target child node.
It should be understood that in the trained decision tree model, the action pose vectors and annotation results at the nodes are known, the number and types of the pure nodes produced by splitting are determined, the traversal path from the root node to each pure node is determined, and the evaluation bases formed from the action pose vector dimensions represented by the edges along each traversal path are likewise known. The trained decision tree model can therefore determine a traversal path and a target child node from the action pose vectors at the nodes and the test action pose vector; the test action evaluation data is obtained from the edges along the traversal path, and the test action evaluation result is obtained from the annotation result at the target child node, the target child node being a pure node.
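Continuing the training sketch above (so `clf` and `feature_names` are the fitted tree and the assumed angle names from that sketch), the traversal for a test action pose vector may be sketched as follows:

```python
import numpy as np

def evaluate_test_pose(clf, feature_names, x):
    """Return the test action evaluation data (the thresholds along the
    root-to-leaf traversal path) and the test action evaluation result
    (the label at the target pure node)."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    node_path = clf.decision_path(x).indices      # node ids, root to leaf
    tree = clf.tree_
    rules = []
    for node in node_path[:-1]:                   # internal nodes only
        feat, thr = tree.feature[node], tree.threshold[node]
        op = "<=" if x[0, feat] <= thr else ">"
        rules.append(f"{feature_names[feat]} {op} {thr:.1f}")
    result = int(clf.predict(x)[0])               # 1 = non-standard
    return rules, result

rules, result = evaluate_test_pose(clf, feature_names,
                                   [150.0, 178.0, 95.0, 92.0, 175.0])
print(rules, "non-standard" if result else "standard")
```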
To facilitate understanding of the action standard mining method of the present application, an example is now given in which action A in the target action video data is a plank. First, video data and annotation results are acquired, where the video data comprises a plurality of pieces of plank action video data and the annotation results indicate whether the action corresponding to each frame of image in the plank action video data is standard. The video data is segmented to obtain an action pose vector for each frame of image in the plank action video data; the action pose vector may include the angle formed by the left ankle, left knee, and left hip joints; the angle formed by the right ankle, right knee, and right hip joints; the angle formed by the left wrist, left elbow, and left shoulder joints; the angle formed by the right wrist, right elbow, and right shoulder joints; and the angle formed by the vertex of the head, the neck, and the pelvis joint. The action pose vectors and the annotation results are then input into the preset action standard mining model to generate the action evaluation data.
The action pose vector corresponding to each frame of image in the plank action video data thus has 5 dimensions: a first dimension representing the angle formed by the left ankle, left knee, and left hip joints; a second dimension representing the angle formed by the right ankle, right knee, and right hip joints; a third dimension representing the angle formed by the left wrist, left elbow, and left shoulder joints; a fourth dimension representing the angle formed by the right wrist, right elbow, and right shoulder joints; and a fifth dimension representing the angle formed by the vertex of the head, the neck, and the pelvis joint. If the angles of the first, second, and fifth dimensions in a frame are all 180 degrees and the angles of the third and fourth dimensions are both 90 degrees, the corresponding annotation result of that frame is a standard action; if at least one of the first, second, and fifth dimensions is not 180 degrees, or at least one of the third and fourth dimensions is not 90 degrees, the corresponding annotation result of that frame is a non-standard action.
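By way of illustration, the plank labeling rule above may be sketched as follows; the 180-degree and 90-degree targets are as stated above, while the tolerance is an added assumption, since measured angles rarely equal the targets exactly:

```python
# Dimension targets in the order listed above: legs (dims 1-2), arms
# (dims 3-4), vertex-neck-pelvis line (dim 5).
PLANK_TARGETS = [180.0, 180.0, 90.0, 90.0, 180.0]

def plank_is_standard(pose_vec, tol=10.0):
    """True if every dimension is within `tol` degrees of its target."""
    return all(abs(angle - target) <= tol
               for angle, target in zip(pose_vec, PLANK_TARGETS))

print(plank_is_standard([179.0, 181.0, 92.0, 88.0, 176.0]))   # -> True
```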
In an alternative embodiment, action A in the target action video data may be a push-up. First, video data and annotation results are acquired, where the video data comprises a plurality of pieces of push-up action video data and the annotation results indicate whether the action corresponding to each frame of image in the push-up action video data is standard. The video data is segmented to obtain an action pose vector for each frame of image in the push-up action video data; the action pose vector may include the angle formed by the left ankle, left knee, and left hip joints; the angle formed by the right ankle, right knee, and right hip joints; the angle formed by the left knee, left hip, and pelvis joints; the angle formed by the right knee, right hip, and pelvis joints; the angle formed by the left wrist, left elbow, and left shoulder joints; the angle formed by the right wrist, right elbow, and right shoulder joints; and the angle formed by the vertex of the head, the neck, and the pelvis joint. The action pose vectors and the annotation results are then input into the preset action standard mining model to generate the action evaluation data.
The action pose vector corresponding to each frame of image in the push-up action video data thus has 7 dimensions: a first dimension representing the angle formed by the left ankle, left knee, and left hip joints; a second dimension representing the angle formed by the right ankle, right knee, and right hip joints; a third dimension representing the angle formed by the left wrist, left elbow, and left shoulder joints; a fourth dimension representing the angle formed by the right wrist, right elbow, and right shoulder joints; a fifth dimension representing the angle formed by the vertex of the head, the neck, and the pelvis joint; a sixth dimension representing the angle formed by the left knee, left hip, and pelvis joints; and a seventh dimension representing the angle formed by the right knee, right hip, and pelvis joints.
Suppose the push-up action video data is divided into 3 frames of images according to the start moment, the middle moment, and the end moment: a first frame image representing the start, a second frame image representing the middle, and a third frame image representing the end. If the angles of the first, second, fifth, sixth, and seventh dimensions in the first frame image of a piece of push-up action video data are all 180 degrees and the angles of the third and fourth dimensions are both 30 degrees, the corresponding annotation result of the first frame image is a standard action; if at least one of the first, second, fifth, sixth, and seventh dimensions is not 180 degrees, or at least one of the third and fourth dimensions is not 30 degrees, the corresponding annotation result of the first frame image is a non-standard action. If the angles of the first, second, fifth, sixth, and seventh dimensions in the second frame image are all 180 degrees and the angles of the third and fourth dimensions are both 110 degrees, the corresponding annotation result of the second frame image is a standard action; if at least one of the first, second, fifth, sixth, and seventh dimensions is not 180 degrees, or at least one of the third and fourth dimensions is not 110 degrees, the corresponding annotation result of the second frame image is a non-standard action. If the angles of the first, second, fifth, sixth, and seventh dimensions in the third frame image are all 180 degrees and the angles of the third and fourth dimensions are both 180 degrees, the corresponding annotation result of the third frame image is a standard action; if at least one of the first, second, fifth, sixth, and seventh dimensions is not 180 degrees, or at least one of the third and fourth dimensions is not 180 degrees, the corresponding annotation result of the third frame image is a non-standard action.
To implement the action standard mining method corresponding to S201 to S203, S301 to S302, S401 to S402, and their possible sub-steps, an embodiment of the present application provides an action standard mining apparatus. Referring to fig. 8, fig. 8 is a block diagram of an action standard mining apparatus 400 provided in an embodiment of the present application, the apparatus 400 including: an acquisition module 401, a segmentation processing module 402, a data generation module 403, and a result generation module 404.
The acquisition module 401 is configured to acquire video data and annotation results; the video data comprises a plurality of pieces of target action video data, and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard.
In an optional embodiment, the acquisition module 401 is further configured to obtain a plurality of scoring results for each frame of image in the target action video data, the plurality of scoring results being produced by different professionals scoring independently.
In an optional embodiment, the acquisition module 401 is further configured to acquire a test action pose vector, the test action pose vector representing the pose of the human body in the test action video data.
The segmentation processing module 402 is configured to segment the video data to obtain an action pose vector for each frame of image in the target action video data; the action pose vector of each frame of image corresponds one-to-one with the annotation result of that frame, and the action pose vector represents the pose of the human body.
In an optional embodiment, the segmentation processing module 402 is further configured to detect, by a human skeleton keypoint detection algorithm, the human skeleton keypoint pose map corresponding to each frame of image in the target action video data, the pose map being formed by the human skeleton keypoints, and to calculate the action pose vector from the human skeleton keypoint pose map.
The data generation module 403 is configured to input the action pose vectors and the annotation results into a preset action standard mining model and to generate action evaluation data based on the trained action standard mining model.
In an optional embodiment, the data generation module 403 is further configured to input the test action pose vector into the trained decision tree model to obtain test action evaluation data and a test action evaluation result, the test action evaluation data being generated from the traversal path from the root node of the decision tree model to a target child node and the test action evaluation result being generated from the target child node.
The result generation module 404 is configured to obtain the annotation result of each frame of image in the target action video data according to that frame's plurality of scoring results.
In an optional embodiment, the result generation module 404 is further configured to sum the plurality of scoring results of each frame of image in the target action video data to obtain the processing result of that frame, and to compare the processing result of each frame with a preset standard value to obtain the annotation result of that frame. It should be understood that the acquisition module 401, the segmentation processing module 402, the data generation module 403, and the result generation module 404 may cooperatively implement the above-described S201 to S203, S301 to S302, S401 to S402, and their possible sub-steps.
In summary, the present application provides an action standard mining method and apparatus, an electronic device, and a computer storage medium. The action standard mining method includes: acquiring video data and annotation results, where the video data comprises a plurality of pieces of target action video data and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard; segmenting the video data to obtain an action pose vector for each frame of image in the target action video data, where the action pose vector of each frame corresponds one-to-one with the annotation result of that frame and represents the pose of the human body; and inputting the action pose vectors and the annotation results into a preset action standard mining model and generating action evaluation data based on the trained action standard mining model. With the action standard mining method provided by the present application, standardized, quantifiable, and meaningful action evaluation data can be mined automatically from the video data and the annotation results based on a machine learning algorithm, thereby improving the standardization of sports action evaluation.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (11)

1. An action standard mining method, the method comprising:
acquiring video data and annotation results; the video data comprises a plurality of pieces of target action video data, and the annotation results indicate whether the action corresponding to each frame of image in the target action video data is standard;
segmenting the video data to obtain an action pose vector for each frame of image in the target action video data; the action pose vector of each frame of image corresponds one-to-one with the annotation result of that frame, and the action pose vector represents the pose of the human body; and
inputting the action pose vectors and the annotation results into a preset action standard mining model for model training, and generating action evaluation data based on the trained action standard mining model; the action evaluation data serves as the basis for judging whether the actions in the target action video data are standard.
2. The method of claim 1, wherein before acquiring the annotation results, the method further comprises:
obtaining a plurality of scoring results for each frame of image in the target action video data, wherein the plurality of scoring results are produced by different professionals scoring independently; and
obtaining the annotation result of each frame of image in the target action video data according to the plurality of scoring results of that frame.
3. The method according to claim 2, wherein obtaining the annotation result of each frame of image in the target action video data according to the plurality of scoring results of that frame comprises:
summing the plurality of scoring results of each frame of image in the target action video data to obtain the processing result of that frame; and
comparing the processing result of each frame of image in the target action video data with a preset standard value to obtain the annotation result of that frame.
4. The method of any one of claims 1-3, wherein the action standard mining model is a decision tree model;
the method further comprising:
acquiring a test action pose vector, the test action pose vector representing the pose of the human body in test action video data; and
inputting the test action pose vector into the trained decision tree model to obtain test action evaluation data and a test action evaluation result, wherein the test action evaluation data is generated from the traversal path from the root node of the decision tree model to a target child node, and the test action evaluation result is generated from the target child node.
5. The method according to any one of claims 1 to 3, wherein segmenting the video data to obtain an action pose vector for each frame of image in the target action video data comprises:
detecting, by a human skeleton keypoint detection algorithm, a human skeleton keypoint pose map corresponding to each frame of image in the target action video data, the pose map being formed by the human skeleton keypoints; and
calculating the action pose vector from the human skeleton keypoint pose map.
6. The method according to any one of claims 1-3, wherein, if the target action video data are plate hinge action video data, the motion posture vector corresponding to each frame of image in the plate hinge action video data comprises: the angle formed by the left ankle, left knee and left hip joints; the angle formed by the right ankle, right knee and right hip joints; the angle formed by the left wrist, left elbow and left shoulder joints; the angle formed by the right wrist, right elbow and right shoulder joints; and the angle formed by the vertex (top of the head), the neck and the pelvis.
7. An action standard mining apparatus, the apparatus comprising:
an acquisition module for acquiring video data and an annotation result; wherein the video data comprise a plurality of pieces of target action video data, and the annotation result indicates whether the action corresponding to each frame of image in the target action video data is standard;
a segmentation module for segmenting the video data to obtain a motion posture vector for each frame of image in the target action video data; wherein the motion posture vector of each frame of image corresponds one-to-one to the annotation result of that frame, and the motion posture vector represents the motion posture of a human body; and
a rule generation module for inputting the motion posture vectors and the annotation results into a preset action standard mining model and generating action evaluation data based on the trained action standard mining model; wherein the action evaluation data serve as the basis for judging whether an action in the target action video data is standard.
8. The apparatus of claim 7, wherein the acquisition module is further configured to obtain a plurality of scoring results for each frame of image in the target action video data, the plurality of scoring results being given independently by different professionals; and
the apparatus further comprises a result generation module for obtaining the annotation result of each frame of image in the target action video data from the plurality of scoring results of each frame of image in the target action video data.
9. The apparatus according to claim 8, wherein the result generation module is further configured to sum the plurality of scoring results of each frame of image in the target action video data to obtain a processing result for each frame of image, and to compare the processing result of each frame of image with a preset standard value to obtain the annotation result of each frame of image in the target action video data.
10. An electronic device, comprising one or more processors and a memory storing one or more programs; wherein the one or more programs, when executed by the one or more processors, implement the method of any one of claims 1-6.
11. A computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1-6.
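By way of illustration only, the training flow of claim 1 can be sketched in a few lines of Python. This is a non-limiting sketch, not the claimed implementation: the scikit-learn decision tree stands in for the preset action standard mining model (claim 4 names a decision tree as one embodiment), and every name here (mine_action_standard, posture_vectors, labels) is hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def mine_action_standard(posture_vectors, labels):
    # posture_vectors: (n_frames, n_features) array of per-frame joint-angle features
    # labels: (n_frames,) annotation results, 1 = standard action, 0 = non-standard
    model = DecisionTreeClassifier(max_depth=4)  # a shallow tree keeps the mined rules readable
    model.fit(np.asarray(posture_vectors), np.asarray(labels))
    return model  # the fitted tree structure encodes the action evaluation data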
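Claims 2-3 derive the annotation from multiple professionals' scores by summation and comparison against a preset standard value. A minimal sketch, assuming binary 0/1 scores and treating a sum at or above the standard value as standard (the claims do not fix the direction of the comparison):

def label_frames(scores_per_frame, standard_value):
    # scores_per_frame: one list of professional scores per frame, e.g. [[1, 1, 0], [0, 0, 1]]
    # returns one annotation result per frame: 1 = standard, 0 = non-standard
    return [1 if sum(scores) >= standard_value else 0 for scores in scores_per_frame]

With three scorers and standard_value=2, label_frames([[1, 1, 0], [0, 0, 1]], 2) returns [1, 0]: a frame is annotated standard when at least two of the three professionals judged it standard.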
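The traversal path of claim 4 maps naturally onto scikit-learn's decision_path API, which returns the node indices visited from the root to the reached leaf. A hedged sketch (feature_names and the textual rule format are assumptions; the claim does not specify how the evaluation data are rendered):

import numpy as np

def evaluate(model, test_vector, feature_names):
    x = np.asarray(test_vector, dtype=float).reshape(1, -1)
    node_ids = model.decision_path(x).indices  # nodes on the root-to-leaf path
    tree = model.tree_
    rules = []
    for node in node_ids:
        if tree.children_left[node] == tree.children_right[node]:
            continue  # leaf node reached: no splitting rule to report
        feat, thr = tree.feature[node], tree.threshold[node]
        op = "<=" if x[0, feat] <= thr else ">"
        rules.append(f"{feature_names[feat]} {op} {thr:.1f}")
    result = int(model.predict(x)[0])  # class of the target child node: 1 = standard
    return rules, result  # test action evaluation data, test action evaluation result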
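For claims 5-6, each component of the motion posture vector is the angle at a middle joint formed by three skeleton key points read off the detected key point posture map. A sketch assuming 2-D (x, y) key points; the key point detector itself is any off-the-shelf human skeleton key point algorithm and is outside the claims' detail:

import numpy as np

def joint_angle(a, b, c):
    # angle at key point b, in degrees, formed by key points a-b-c,
    # e.g. a = left ankle, b = left knee, c = left hip (claim 6's first component)
    v1, v2 = np.subtract(a, b), np.subtract(c, b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

A fully extended leg gives roughly 180 degrees: joint_angle((120, 400), (130, 320), (140, 240)) evaluates to about 180.0.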
CN202110626807.6A 2021-06-04 2021-06-04 Action standard mining method and device, electronic equipment and computer storage medium Pending CN113392746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626807.6A CN113392746A (en) 2021-06-04 2021-06-04 Action standard mining method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN113392746A 2021-09-14

Family

ID=77618307

Country Status (1)

Country Link
CN (1) CN113392746A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108198601A (en) * 2017-12-27 2018-06-22 广东欧珀移动通信有限公司 Motion scores method, apparatus, equipment and storage medium
CN109635644A (en) * 2018-11-01 2019-04-16 北京健康有益科技有限公司 A kind of evaluation method of user action, device and readable medium
CN110781771A (en) * 2019-10-08 2020-02-11 北京邮电大学 Abnormal behavior real-time monitoring method based on deep learning
CN110941990A (en) * 2019-10-22 2020-03-31 泰康保险集团股份有限公司 Method and device for evaluating human body actions based on skeleton key points
CN112819852A (en) * 2019-11-15 2021-05-18 微软技术许可有限责任公司 Evaluating gesture-based motion
CN112489036A (en) * 2020-12-14 2021-03-12 Oppo(重庆)智能科技有限公司 Image evaluation method, image evaluation device, storage medium, and electronic apparatus
CN112668531A (en) * 2021-01-05 2021-04-16 重庆大学 Motion posture correction method based on motion recognition

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115131879A (en) * 2022-08-31 2022-09-30 飞狐信息技术(天津)有限公司 Action evaluation method and device

Similar Documents

Publication Title
TWI748242B (en) Method and system for scanning wafer
CN108205654B (en) Action detection method and device based on video
US20130071816A1 (en) Methods and systems for building a universal dress style learner
JP2019050376A5 (en)
CN113890821B (en) Log association method and device and electronic equipment
CN112101315B (en) Deep learning-based exercise judgment guidance method and system
CN110688929A (en) Human skeleton joint point positioning method and device
US20220207266A1 (en) Methods, devices, electronic apparatuses and storage media of image processing
CN111709966A (en) Fundus image segmentation model training method and device
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111814573A (en) Face information detection method and device, terminal equipment and storage medium
CN113392746A (en) Action standard mining method and device, electronic equipment and computer storage medium
CN109102486B (en) Surface defect detection method and device based on machine learning
JP6786015B1 (en) Motion analysis system and motion analysis program
Madrid et al. Recognition of dynamic Filipino Sign language using MediaPipe and long short-term memory
CN112734693A (en) Pipeline weld defect detection method and related device
CN115908988A (en) Defect detection model generation method, device, equipment and storage medium
CN104298997B (en) data classification method and device
CN110443277A (en) A small amount of sample classification method based on attention model
CN115861540A (en) Three-dimensional reconstruction method, device and equipment for two-dimensional face and storage medium
CN114882242A (en) Violation image identification method and system based on computer vision
CN113392744A (en) Dance motion aesthetic feeling confirmation method and device, electronic equipment and storage medium
Sutopo et al. Dance gesture recognition using laban movement analysis with j48 classification
US20230005248A1 (en) Machine learning (ml) quality assurance for data curation
CN112906438B (en) Human body action behavior prediction method and computer equipment

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914