CN115578786A - Motion video detection method, device, equipment and storage medium - Google Patents

Motion video detection method, device, equipment and storage medium

Info

Publication number
CN115578786A
CN115578786A (application CN202211093058.6A)
Authority
CN
China
Prior art keywords
user
key point
motion
angle
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211093058.6A
Other languages
Chinese (zh)
Inventor
贾泽华
卓钰博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202211093058.6A
Publication of CN115578786A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention relates to artificial intelligence and provides a motion video detection method, device, equipment and storage medium. The method detects each user motion frame with a pose key point model to obtain user key points, and with a ground plane detection model to obtain the motion ground plane; identifies a plurality of key point combinations from the user key points; calculates user key point angles from the key point coordinate information and the ground plane coordinate information of the motion ground plane in each user motion frame, the user key point angles comprising the combination angle of each key point combination and the user plane angle between a motion key point edge and the motion ground plane; obtains the corresponding standard key point angles from each standard motion frame in a standard video; generates an action coincidence degree from the user key point angles and the standard key point angles; and generates a video score from the action coincidence degrees. The invention further relates to blockchain technology: the video score can be stored in a blockchain.

Description

Motion video detection method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a motion video detection method, a motion video detection device, motion video detection equipment and a storage medium.
Background
As online exercise grows in popularity year by year, so does the need to check, through online detection, whether user movements are performed correctly. In current schemes for scoring sports videos online, differences in body type between the training user and the standard user mean that the degree of coincidence between a user action and the standard action cannot be accurately quantified, so the quality of the movement in the video cannot be accurately evaluated and the user cannot be effectively assisted in completing sports training.
Disclosure of Invention
In view of the foregoing, there is a need to provide a motion video detection method, device, equipment and storage medium that can solve the technical problem that the motion quality in a video cannot be accurately evaluated.
In one aspect, the present invention provides a motion video detection method, where the motion video detection method includes:
acquiring a video to be detected and a standard video, wherein the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
detecting the plurality of user motion frames based on a pre-trained pose key point model to obtain user key points of each user motion frame, and detecting the plurality of user motion frames based on a pre-trained ground plane detection model to obtain a motion ground plane of each user motion frame;
identifying a plurality of key point combinations from the user key points;
calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, wherein the user key point angle comprises a combination angle of each key point combination and a user plane angle between a motion key point edge and the motion ground plane;
acquiring a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle;
and generating the video score of the video to be detected according to the action coincidence degrees.
According to a preferred embodiment of the present invention, the detecting the plurality of user motion frames based on the ground plane detection model trained in advance to obtain the motion ground plane of each user motion frame includes:
for each user motion frame, acquiring pixel information of each frame pixel point in the user motion frame on a preset channel;
predicting the pixel information based on the ground plane detection model to obtain the prediction probability of each frame pixel point;
and determining the area formed by the frame pixel points with the prediction probability greater than the configuration probability as the motion ground plane.
According to a preferred embodiment of the present invention, said identifying a plurality of key point combinations from said user key points comprises:
for any user key point, acquiring a connection key point of the user key point from a plurality of user key points;
and if the connection number of the connection key points is greater than the preset number, generating the plurality of key point combinations according to any key point pair in the connection key points and any user key point.
According to a preferred embodiment of the present invention, each keypoint combination includes a first combination edge and a second combination edge, the motion keypoint edge includes an initial keypoint and a target keypoint, and the calculating a user keypoint angle based on the keypoint coordinate information of the user keypoint in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame includes:
constructing a coordinate system according to any user motion frame;
obtaining coordinate values corresponding to the key points of the users at the key point pixel positions of each user motion frame from the coordinate system as the key point coordinate information, and obtaining coordinate values corresponding to the plane pixel positions of the motion ground plane at each user motion frame from the coordinate system as the ground plane coordinate information;
based on the key point coordinate information and the ground plane coordinate information, selecting a user key point with a distance to the motion ground plane smaller than a preset height as the target key point, and identifying a user key point connected with the target key point as the initial key point;
calculating first coordinate information corresponding to a first unit vector of the first combined edge based on the coordinate information of the key point, and calculating second coordinate information corresponding to a second unit vector of the second combined edge based on the coordinate information of the key point;
and generating the combined angle based on the first coordinate information and the second coordinate information, and calculating the user plane angle based on the key point coordinate information and the ground plane coordinate information.
According to a preferred embodiment of the present invention, the calculation formula of the combination angle is:
if sin θ < 0, then θ = 2π - arccos(cos θ);
if sin θ ≥ 0, then θ = arccos(cos θ);
sin θ = c1·s2 - c2·s1
cos θ = c1·c2 + s1·s2
wherein θ represents the combination angle, the first coordinate information is (c1, s1), and the second coordinate information is (c2, s2).
According to a preferred embodiment of the present invention, the generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle includes:
acquiring a weight threshold value of each user key point angle;
calculating a total weight of the plurality of weight thresholds;
calculating the action coincidence degree based on the weight threshold, the total weight, the user key point angle and the standard key point angle, wherein the calculation formula of the action coincidence degree is:
S_i = (w_i / sum(W)) · (1 - |m_i - t_i| / m_i)
wherein S_i represents the action coincidence degree of the ith user key point angle, w_i represents the weight threshold of the ith user key point angle, sum(W) represents the total weight, m_i represents the standard key point angle corresponding to the ith user key point angle, and t_i represents the ith user key point angle.
According to a preferred embodiment of the present invention, the generating the video score of the video to be detected according to the action coincidence degrees includes:
calculating the sum of the action coincidence degrees;
comparing the coincidence sum with a preset coincidence degree;
if the coincidence sum is greater than the preset coincidence degree, calculating the product of the coincidence sum and a first configuration score to obtain the video score; or
if the coincidence sum is less than or equal to the preset coincidence degree, setting the video score as a second configuration score, wherein the first configuration score is greater than the second configuration score.
According to a preferred embodiment of the present invention, after generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle, the motion video detection method further includes:
if the coincidence degree of any action is smaller than a first preset value, generating first prompt information; or
if the coincidence degree of any action is greater than a second preset value, generating second prompt information, wherein the second preset value is greater than the first preset value.
In another aspect, the present invention further provides a motion video detection apparatus, including:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a video to be detected and a standard video, the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
the detection unit is used for detecting the plurality of user motion frames based on the pre-trained pose key point model to obtain the user key points of each user motion frame, and detecting the plurality of user motion frames based on the pre-trained ground plane detection model to obtain the motion ground plane of each user motion frame;
an identifying unit, configured to identify a plurality of key point combinations from the user key points;
a calculating unit, configured to calculate a user key point angle based on key point coordinate information of the user key point in each user motion frame and ground plane coordinate information of the motion ground plane in each user motion frame, where the user key point angle includes a combination angle of each key point combination and a user plane angle of a motion key point edge and the motion ground plane;
the acquiring unit is further configured to acquire a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
the generating unit is used for generating the action contact ratio of each user key point angle based on the user key point angle and the standard key point angle;
the generating unit is further used for generating the video score of the video to be detected according to the action coincidence degrees.
In another aspect, the present invention further provides an electronic device, including:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the motion video detection method.
In another aspect, the present invention also provides a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the motion video detection method.
According to the technical scheme, a plurality of key point combinations for analyzing the normativity of the user action can be identified from the user key points, the ground plane detection model can accurately identify the motion ground plane of each user motion frame, and the user key point angle can be calculated from the key point coordinate information and the ground plane coordinate information.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the method for detecting motion video according to the present invention.
FIG. 2 is a visual depiction of user key points in the present invention.
Fig. 3 is a functional block diagram of a preferred embodiment of the motion video detection apparatus of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device implementing a motion video detection method according to a preferred embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a method for detecting motion video according to a preferred embodiment of the invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The motion video detection method can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The motion video detection method is applied to one or more electronic devices, which are devices capable of automatically performing numerical calculation and/or information processing according to computer readable instructions set or stored in advance; their hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), a smart wearable device, and the like.
The electronic device may include a network device and/or a user device. Wherein the network device includes, but is not limited to, a single network electronic device, an electronic device group consisting of a plurality of network electronic devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network electronic devices.
The network where the electronic device is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
101, acquiring a video to be detected and a standard video, wherein the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames.
In at least one embodiment of the present invention, the video to be detected refers to a video shot while a training user exercises, where the training user may be doing yoga, gymnastics or other exercises. The standard video is typically an instruction video made by a sports coach for a certain sport. The video to be detected can be obtained from the terminal equipment of the training user, which shoots it; the standard video can be obtained from the terminal equipment of the sports coach, which shoots it.
The plurality of user motion frames may be some or all of the video frames in the video to be detected, and the plurality of standard motion frames are the video frames corresponding to them. It can be understood that the video duration of the video to be detected equals the video duration of the standard video, and that the user motion frames are extracted from the video to be detected in the same manner as the standard motion frames are extracted from the standard video; for example, both may be the video frames corresponding to the same time points in their respective videos.
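Because the two videos have equal duration and are sampled in the same manner, time-aligned frame pairs can be obtained by extracting the same frame indices from both. A minimal pure-Python sketch of such index sampling (the function name and the 0.5 s interval are illustrative assumptions, not from the patent):

```python
def sample_frame_indices(total_frames, fps, interval_s=0.5):
    """Return the frame indices to extract, one every `interval_s` seconds.

    Applying the same sampling to the user video and the standard video
    (which have equal duration) yields time-aligned frame pairs.
    """
    step = max(1, int(round(fps * interval_s)))
    return list(range(0, total_frames, step))

# Example: a 30 fps clip of 150 frames sampled every 0.5 s
indices = sample_frame_indices(150, 30, 0.5)
# -> [0, 15, 30, 45, 60, 75, 90, 105, 120, 135]
```

In practice a video library (e.g. OpenCV's VideoCapture) would supply `total_frames` and `fps` and decode the frames at these indices.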
And 102, detecting the plurality of user motion frames based on the pre-trained pose key point model to obtain the user key points of each user motion frame, and detecting the plurality of user motion frames based on the pre-trained ground plane detection model to obtain the motion ground plane of each user motion frame.
In at least one embodiment of the present invention, the pose key point model can identify the human pose key points of the training user from any user motion frame. The pose key point model is trained and tested on crawled pictures of people. The user key points generally include, but are not limited to: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip joint, right hip joint, left knee, right knee, left ankle, right ankle, and the like.
The ground plane detection model stores ground plane pixel values of a plurality of different ground planes on a preset channel. The sports ground plane refers to the ground plane for training the user to finish sports training.
As shown in fig. 2, fig. 2 is a visual diagram of the user key points in the present invention. Specifically, 0 in fig. 2 represents the nose, 1 the left eye, 2 the right eye, 3 the left ear, 4 the right ear, 5 the left shoulder, 6 the right shoulder, 7 the left elbow, 8 the right elbow, 9 the left wrist, 10 the right wrist, 11 the left hip joint, 12 the right hip joint, 13 the left knee, and 14 the right knee.
In at least one embodiment of the present invention, the detecting, by the electronic device, the plurality of user motion frames based on the ground plane detection model trained in advance, and obtaining the motion ground plane of each user motion frame includes:
for each user motion frame, acquiring pixel information of each frame pixel point in the user motion frame on a preset channel;
predicting the pixel information based on the ground plane detection model to obtain the prediction probability of each frame pixel point;
and determining the area formed by the frame pixel points with the prediction probability larger than the configuration probability as the motion ground plane.
Wherein the preset channels may be the RGB (red, green, blue) channels.
The prediction probability is generated according to the difference value mapping of the pixel information and the ground plane pixel value prestored in the ground plane detection model. The larger the difference between the pixel information and the ground plane pixel value prestored in the ground plane detection model is, the smaller the prediction probability is.
The configuration probability is set according to actual requirements.
The prediction probability can be rapidly generated from the pixel information and the ground plane pixel values prestored in the ground plane detection model, and the motion ground plane can be accurately identified by comparing the prediction probability with the configuration probability.
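The thresholding step above can be sketched as follows. The mapping from colour difference to prediction probability is an assumption for illustration (the patent states only that a larger difference yields a smaller probability; in practice the ground plane detection model supplies it), and the function and parameter names are illustrative:

```python
import math

def ground_plane_mask(pixels, reference_rgb, config_prob=0.5, scale=255.0):
    """Classify each frame pixel as belonging to the motion ground plane.

    `pixels` maps (row, col) -> (r, g, b). A pixel is kept when its
    prediction probability exceeds the configuration probability.
    """
    mask = set()
    for pos, (r, g, b) in pixels.items():
        # Euclidean colour difference to the stored ground-plane pixel value
        diff = math.sqrt((r - reference_rgb[0]) ** 2 +
                         (g - reference_rgb[1]) ** 2 +
                         (b - reference_rgb[2]) ** 2)
        # Larger difference -> smaller probability (assumed linear mapping)
        prob = max(0.0, 1.0 - diff / scale)
        if prob > config_prob:
            mask.add(pos)
    return mask
```

The region formed by the retained pixel positions is then taken as the motion ground plane.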
And 103, identifying a plurality of key point combinations from the user key points.
In at least one embodiment of the present invention, each keypoint combination comprises a first combined edge and a second combined edge, wherein the first combined edge and the second combined edge comprise the same user keypoint.
In at least one embodiment of the present invention, the electronic device identifying a plurality of keypoint combinations from the user keypoints comprises:
for any user key point, acquiring a connection key point of the user key point from a plurality of user key points;
and if the connection number of the connection key points is larger than the preset number, generating a plurality of key point combinations according to any key point pair in the connection key points and any user key point.
Wherein the connection key point refers to another user key point connected with the any user key point. For example, in a human body configuration, the left elbow is connected with the left wrist, and when any one of the user key points is the left elbow, the connected key point may be the left wrist.
The preset number is usually set to 1 according to actual requirements.
Through the identification of the connection key points and the comparison of the connection number and the preset number, the situation that any key point combination is not comprehensive enough due to insufficient connection number can be avoided, and therefore the accuracy of the combination of the key points is improved.
Specifically, the generating, by the electronic device, the multiple key point combinations according to any key point pair of the connection key points and the any user key point includes:
determining a connecting line between any user key point and each arbitrary key point as a combined edge;
and determining a plurality of combined edges as key point combinations taking any user key point as a common point.
In other embodiments, if the number of connected keypoints is less than or equal to the preset number, it is determined that there is no keypoint combination that takes any user keypoint as a common point.
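The combination-generation rule above (each user key point whose connection number exceeds the preset number, usually 1, becomes the common point of one key point combination per pair of its combined edges) can be sketched as:

```python
from itertools import combinations

# Undirected skeleton edges between key point indices (illustrative subset,
# following the Fig. 2 numbering: 5 = left shoulder, 7 = left elbow,
# 9 = left wrist, 11 = left hip joint, 13 = left knee)
SKELETON = [(5, 7), (7, 9), (5, 11), (11, 13)]

def keypoint_combinations(skeleton, preset_number=1):
    """For every key point with more than `preset_number` connection key
    points, form one combination per pair of combined edges sharing it."""
    neighbours = {}
    for a, b in skeleton:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    combos = []
    for centre, conn in neighbours.items():
        if len(conn) > preset_number:
            for p, q in combinations(conn, 2):
                combos.append((p, centre, q))  # two edges sharing `centre`
    return combos
```

For the sample skeleton this yields the combinations centred on the left shoulder (elbow-shoulder-hip), the left elbow (shoulder-elbow-wrist) and the left hip (shoulder-hip-knee); key points with only one connection produce no combination.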
And 104, calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, wherein the user key point angle comprises a combination angle of each key point combination and a user plane angle of a motion key point edge and the motion ground plane.
In at least one embodiment of the present invention, the key point coordinate information is determined based on the key point pixel position where the user key point is located. The ground plane coordinate information is determined based on the position of a plane pixel point where the moving ground plane is located. The user keypoint angle refers to an angle on the same plane as any user motion frame.
Each key point combination comprises a first combined edge and a second combined edge, and the combination angle refers to the angle formed by the combined edges in the key point combination. The motion key point edge comprises an initial key point and a target key point, wherein the target key point is a user key point whose distance from the motion ground plane is smaller than a preset height, and the initial key point is a user key point connected with the target key point. The motion key point edge refers to the body part formed by the initial key point and the target key point, for example a lower leg.
In at least one embodiment of the present invention, the electronic device calculating a user keypoint angle based on the keypoint coordinate information of the user keypoint in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame comprises:
constructing a coordinate system according to any user motion frame;
obtaining coordinate values corresponding to the pixel point positions of the user key points in each user motion frame from the coordinate system as the key point coordinate information, and obtaining coordinate values corresponding to the planar pixel point positions of the motion ground plane in each user motion frame from the coordinate system as the ground plane coordinate information;
based on the key point coordinate information and the ground plane coordinate information, selecting a user key point with a distance to the motion ground plane smaller than a preset height as the target key point, and identifying a user key point connected with the target key point as the initial key point;
calculating first coordinate information corresponding to a first unit vector of the first combined edge based on the coordinate information of the key point, and calculating second coordinate information corresponding to a second unit vector of the second combined edge based on the coordinate information of the key point;
and generating the combined angle based on the first coordinate information and the second coordinate information, and calculating the user plane angle based on the key point coordinate information and the ground plane coordinate information.
The coordinate system may be a planar rectangular coordinate system constructed with the image width side of any user motion frame as the x axis and the image height side as the y axis.
The key point pixel position refers to the position of the user key point in each user motion frame. For example, if the coordinate system is a planar rectangular coordinate system constructed with the lower left corner of any user motion frame as the origin, the image width side as the x axis and the image height side as the y axis, and the pixel of user key point A in the user motion frame is the 5th pixel horizontally and the 8th pixel vertically, the key point coordinate information of user key point A may be (5, 8).
By combining the key point pixel position and the coordinate system, the key point coordinate information and the ground plane coordinate information of each user key point can be accurately obtained, and the key point coordinate information and the ground plane coordinate information can be obtained from the same dimension, so that the identification accuracy of the user key point angle can be improved.
Specifically, the first combined edge includes a first key point and a second key point, and the first coordinate information is calculated as:
c1 = (x1 - x0) / √((x1 - x0)² + (y1 - y0)²)
s1 = (y1 - y0) / √((x1 - x0)² + (y1 - y0)²)
wherein the first coordinate information is (c1, s1), the coordinate information of the first key point is (x0, y0), and the coordinate information of the second key point is (x1, y1).
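The unit-vector computation for a combined edge can be sketched as follows (the function name and the point format are illustrative):

```python
import math

def unit_vector(p0, p1):
    """Unit vector (c, s) of the combined edge from key point p0 to p1,
    i.e. the normalized direction of the edge."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    norm = math.hypot(dx, dy)  # edge length; assumed non-zero
    return dx / norm, dy / norm

c1, s1 = unit_vector((0, 0), (3, 4))  # edge of length 5 -> (0.6, 0.8)
```

The second coordinate information (c2, s2) is obtained the same way from the second combined edge.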
In other embodiments, the generation manner of the second coordinate information is similar to the generation manner of the first coordinate information, and details are not repeated herein.
Specifically, the calculation formula of the combination angle is:
if sin θ < 0, then θ = 2π - arccos(cos θ);
if sin θ ≥ 0, then θ = arccos(cos θ);
sin θ = c1·s2 - c2·s1
cos θ = c1·c2 + s1·s2
wherein θ represents the combination angle, the first coordinate information is (c1, s1), and the second coordinate information is (c2, s2).
By calculating the user key point angle in different ways according to the sign of sin θ, the rotation direction is kept consistent, so the accuracy of the user key point angle can be improved.
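A sketch of the combination-angle computation. The two branch expressions were rendered as images in the source; the sketch assumes the common convention θ = arccos(cos θ) when sin θ ≥ 0 and θ = 2π - arccos(cos θ) otherwise, which keeps the angle in [0, 2π) with a consistent rotation direction:

```python
import math

def combined_angle(u1, u2):
    """Angle from unit vector u1 = (c1, s1) to u2 = (c2, s2) in [0, 2*pi),
    using the sign of sin(theta) to fix the rotation direction."""
    c1, s1 = u1
    c2, s2 = u2
    sin_t = c1 * s2 - c2 * s1          # cross product of the unit vectors
    cos_t = c1 * c2 + s1 * s2          # dot product of the unit vectors
    angle = math.acos(max(-1.0, min(1.0, cos_t)))  # clamp for float safety
    return angle if sin_t >= 0 else 2 * math.pi - angle
```

For example, rotating from (1, 0) to (0, 1) gives π/2, while rotating from (1, 0) to (0, -1) gives 3π/2 rather than π/2, because sin θ is negative there.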
Specifically, the generation manner of the user plane angle is similar to the generation manner of the combination angle, which is not described herein again.
And 105, acquiring a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video.
In at least one embodiment of the present invention, a calculation manner of the standard keypoint angle is similar to a calculation manner of the user keypoint angle, and details thereof are not repeated herein.
In at least one embodiment of the present invention, after the calculation of the standard keypoint angle is completed, the standard keypoint angle may be stored, which is convenient for subsequent retrieval.
And 106, generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle.
In at least one embodiment of the invention, the action coincidence degree is used to measure how closely the training user imitates the sports actions of the sports coach.
In at least one embodiment of the present invention, the electronic device generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle comprises:
acquiring a weight threshold value of each user key point angle;
calculating a total weight of the plurality of weight thresholds;
calculating the action contact ratio based on the weight threshold, the total weight, the user key point angle and the standard key point angle, wherein the calculation formula of the action contact ratio is as follows:
Sᵢ = (wᵢ / Sum(W)) × (1 − |tᵢ − mᵢ| / mᵢ)

wherein Sᵢ represents the action coincidence degree of the ith user key point angle, wᵢ represents the weight threshold of the ith user key point angle, Sum(W) represents the total weight, mᵢ represents the standard key point angle corresponding to the ith user key point angle, and tᵢ represents the ith user key point angle.
Wherein W = [w₁, w₂, …, wᵢ, …, wₙ].
By setting different weight thresholds for each key point combination, the accuracy of the action overlap ratio can be improved.
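As a hedged sketch of this weighting scheme (the patent gives the exact coincidence formula only as an image, so the similarity term below, one minus the relative angle difference, is an assumption for illustration):

```python
def action_coincidence(user_angles, standard_angles, weights):
    """Per-angle action coincidence degree, weighted by each angle's weight threshold.

    user_angles:     list of user key point angles t_i
    standard_angles: list of corresponding standard key point angles m_i
    weights:         list of weight thresholds w_i (W in the text)
    """
    total = sum(weights)  # Sum(W)
    scores = []
    for t, m, w in zip(user_angles, standard_angles, weights):
        similarity = max(0.0, 1.0 - abs(t - m) / m)  # 1.0 when the angles match exactly
        scores.append(w / total * similarity)
    return scores
```

Normalizing by Sum(W) keeps the sum of all coincidence degrees at most 1, which fits the later scoring step where the sum is multiplied by a configuration score such as 100.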
In at least one embodiment of the present invention, after generating the action overlap ratio of each user keypoint angle based on the user keypoint angle and the standard keypoint angle, the motion video detection method further includes:
if any action coincidence degree is smaller than a first preset value, generating first prompt information; or

And if any action coincidence degree is greater than a second preset value, generating second prompt information, wherein the second preset value is greater than the first preset value.
The first preset value and the second preset value can be set according to actual requirements.
The first prompt information is used for indicating that the gesture of the user key point corresponding to the user key point angle needs to be adjusted.
The second prompt information is used for indicating that the gesture of the user key point corresponding to the user key point angle is close to the standard gesture.
Through the embodiment, the training condition of the training user can be intuitively displayed.
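The two prompt branches can be illustrated as follows; the threshold values and message strings are placeholders chosen for the example, since the patent leaves both preset values to actual requirements:

```python
def prompt_for(coincidence, first_preset=0.3, second_preset=0.8):
    """Map an action coincidence degree to a prompt message (thresholds illustrative)."""
    if coincidence < first_preset:
        return "adjust posture"              # first prompt information
    if coincidence > second_preset:
        return "close to standard posture"   # second prompt information
    return None                              # no prompt in the middle band
```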
And 107, generating a video score of the video to be detected according to the action coincidence degrees.
In at least one embodiment of the invention, the video score is used to measure the exercise achievement of the training user.
It is emphasized that the video score may also be stored in a node of a blockchain in order to further ensure privacy and security of the video score.
In at least one embodiment of the present invention, the generating, by the electronic device, the video score of the video to be detected according to the action overlap ratio includes:
calculating the contact ratio sum of the action contact ratios;
comparing the sum of the contact ratios with a preset contact ratio;
if the sum of the contact ratios is greater than the preset contact ratio, calculating the product of the sum of the contact ratios and a first configuration score to obtain the video score, or
And if the sum of the contact ratios is less than or equal to the preset contact ratio, setting the video score as a second configuration score, wherein the first configuration score is greater than the second configuration score.
The first configuration score is a score corresponding to a motion standard in the user motion frames, the second configuration score is a score corresponding to a motion non-standard in the user motion frames, and the preset overlap ratio, the first configuration score and the second configuration score may be set according to actual requirements, for example, the preset overlap ratio is usually set to 0, the first configuration score is usually set to 100, and the second configuration score is usually set to 0.
By comparing the sum of the contact ratios with the preset contact ratio, different scoring modes can be adopted, and the video score can be determined reasonably.
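A minimal sketch of the scoring rule in step 107, using the example values mentioned above (preset contact ratio 0, first configuration score 100, second configuration score 0) as defaults:

```python
def video_score(coincidences, preset=0.0, first_score=100, second_score=0):
    """Score the video from the per-angle action coincidence degrees."""
    total = sum(coincidences)                 # sum of the contact ratios
    if total > preset:
        return total * first_score            # actions meet the standard
    return second_score                       # actions do not meet the standard
```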
According to the technical scheme, the present invention can identify a plurality of key point combinations used to analyze whether the user's actions are standard, can accurately identify the motion ground plane of each user motion frame through the ground plane detection model, and can then calculate the user key point angles from the key point coordinate information and the ground plane coordinate information.
Fig. 3 is a functional block diagram of a motion video detection apparatus according to a preferred embodiment of the present invention. The motion video detection apparatus 11 includes an acquisition unit 110, a detection unit 111, an identification unit 112, a calculation unit 113, and a generation unit 114. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13 and perform a fixed function and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
The obtaining unit 110 obtains a video to be detected and a standard video, where the video to be detected includes a plurality of user motion frames, and the standard video includes a plurality of standard motion frames.
In at least one embodiment of the present invention, the video to be detected refers to a video shot while a training user exercises, where the training user may be doing yoga, gymnastics, or other sports. The standard video is typically an instruction video made by a sports coach for a certain sport. The video to be detected can be obtained from the terminal device of the training user, which shot it, and the standard video can be obtained from the terminal device of the sports coach, which shot it.
The plurality of user motion frames may be some or all of the video frames in the video to be detected, and the plurality of standard motion frames are the video frames corresponding to the plurality of user motion frames. It can be understood that the video duration of the video to be detected is equal to that of the standard video, and that the user motion frames are extracted from the video to be detected in the same manner as the standard motion frames are extracted from the standard video, for example, by taking the video frame corresponding to a certain time point in each video.
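Assuming both videos are sampled at the same time points, the correspondence between user motion frames and standard motion frames reduces to computing matching frame indices; a simple illustration (names assumed for the example):

```python
def aligned_frame_indices(duration_s, fps, sample_times_s):
    """Frame indices to extract at the same time points from both videos.

    Because the two videos have equal duration and frames are taken at the
    same time points, the extracted user motion frames and standard motion
    frames correspond one to one.  Times beyond the duration are ignored.
    """
    return [int(t * fps) for t in sample_times_s if t <= duration_s]
```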
The detection unit 111 detects the plurality of user motion frames based on the pre-trained pose key point model to obtain the user key points of each user motion frame, and detects the plurality of user motion frames based on the pre-trained ground plane detection model to obtain the motion ground plane of each user motion frame.
In at least one embodiment of the present invention, the pose keypoint model may identify the human pose key points of the training user from any user motion frame. The pose key point model is generated by training and testing with crawled pictures of human figures. The user key points generally include, but are not limited to: nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left crotch joint, right crotch joint, left knee, right knee, left ankle, right ankle, and the like.
The ground plane detection model stores ground plane pixel values of a plurality of different ground planes on a preset channel. The sports ground plane refers to the ground plane for training the user to finish sports training.
As shown in fig. 2, fig. 2 is a visual diagram of key points of a user in the present invention, and specifically, 0 in fig. 2 represents a nose, 1 represents a left eye, 2 represents a right eye, 3 represents a left ear, 4 represents a right ear, 5 represents a left shoulder, 6 represents a right shoulder, 7 represents a left elbow, 8 represents a right elbow, 9 represents a left wrist, 10 represents a right wrist, 11 represents a left crotch joint, 12 represents a right crotch joint, 13 represents a left knee, and 14 represents a right knee.
In at least one embodiment of the present invention, the detecting unit 111 detects the multiple user motion frames based on a ground plane detection model trained in advance, and obtaining the motion ground plane of each user motion frame includes:
for each user motion frame, acquiring pixel information of each frame pixel point in the user motion frame on a preset channel;
predicting the pixel information based on the ground plane detection model to obtain the prediction probability of each frame pixel point;
and determining the area formed by the frame pixel points with the prediction probability greater than the configuration probability as the motion ground plane.
Wherein the preset channel may be a (red, green, blue) channel.
The prediction probability is generated according to the difference value mapping of the pixel information and the ground plane pixel value prestored in the ground plane detection model. The larger the difference between the pixel information and the ground plane pixel value prestored in the ground plane detection model is, the smaller the prediction probability is.
The configuration probability is set according to actual requirements.
The prediction probability can be rapidly generated through the ground plane pixel value and the pixel information prestored in the ground plane detection model, and the moving ground plane can be accurately identified according to the difference value between the prediction probability and the configuration probability.
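The three detection steps above can be sketched as follows; the mapping from pixel difference to prediction probability is an illustrative linear choice, since the patent only states that a larger difference yields a smaller probability:

```python
def motion_ground_plane(pixels, plane_rgb, config_prob=0.5, scale=255.0):
    """Pixels whose predicted ground-plane probability exceeds the configuration probability.

    pixels:    mapping from (x, y) position to (r, g, b) pixel information
    plane_rgb: a ground plane pixel value prestored in the detection model
    """
    region = []
    for (x, y), rgb in pixels.items():
        # larger difference from the stored plane value -> smaller probability
        diff = sum(abs(a - b) for a, b in zip(rgb, plane_rgb)) / (3 * scale)
        prob = 1.0 - diff
        if prob > config_prob:
            region.append((x, y))
    return region
```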
The identifying unit 112 identifies a plurality of key point combinations from the user key points.
In at least one embodiment of the present invention, each keypoint combination comprises a first combined edge and a second combined edge, wherein the first combined edge and the second combined edge comprise the same user keypoint.
In at least one embodiment of the present invention, the identifying unit 112 identifies a plurality of key point combinations from the user key points, including:
for any user key point, acquiring a connection key point of the user key point from a plurality of user key points;
and if the connection number of the connection key points is greater than the preset number, generating the plurality of key point combinations according to any key point pair in the connection key points and any user key point.
Wherein the connection key point refers to another user key point connected with the arbitrary user key point. For example, in the human-body configuration the left elbow is connected to the left wrist, so when the arbitrary user key point is the left elbow, the connection key point may be the left wrist.
The preset number is usually set to 1 according to actual requirements.
Through the identification of the connection key points and the comparison of the connection number and the preset number, the situation that any key point combination is not comprehensive enough due to insufficient connection number can be avoided, and the accuracy of the combination of the key points is improved.
Specifically, the generating, by the identifying unit 112, the plurality of key point combinations according to any key point pair of the connection key points and the any user key point includes:
determining a connecting line of any user key point and each arbitrary key point as a combined edge;
and determining a plurality of combined edges as key point combinations taking any user key point as a common point.
In other embodiments, if the number of connected keypoints is less than or equal to the preset number, the identifying unit 112 determines that there is no keypoint combination with any user keypoint as a common point.
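The combination rule can be sketched as follows, using a hypothetical subset of the skeleton connections and the preset number of 1 mentioned above; the skeleton contents and names are assumptions for the example:

```python
from itertools import combinations

SKELETON = {  # illustrative subset of the connections among user key points
    "left_shoulder": ["left_elbow", "right_shoulder", "left_crotch"],
    "left_elbow": ["left_shoulder", "left_wrist"],
    "left_wrist": ["left_elbow"],
}

def keypoint_combinations(skeleton, preset_number=1):
    """Key point combinations: pairs of combined edges sharing a common key point."""
    combos = []
    for point, connected in skeleton.items():
        if len(connected) > preset_number:  # enough connections to form two edges
            for a, b in combinations(connected, 2):
                combos.append(((point, a), (point, b)))  # first and second combined edge
    return combos
```

A key point with only one connection (here the left wrist) yields no combination, matching the branch where the connection number is not greater than the preset number.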
The calculating unit 113 calculates a user keypoint angle based on the keypoint coordinate information of the user keypoint in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, where the user keypoint angle includes a combination angle of each keypoint combination and a user plane angle of a motion keypoint edge and the motion ground plane.
In at least one embodiment of the present invention, the key point coordinate information is determined based on the key point pixel position where the user key point is located. The ground plane coordinate information is determined based on the position of a plane pixel point where the moving ground plane is located. The user keypoint angle refers to an angle on the same plane as any user motion frame.
Each key point combination comprises a first combined edge and a second combined edge, and the combination angle refers to the angle formed by the combined edges in the key point combination. The motion key point edge comprises an initial key point and a target key point, wherein the target key point is a user key point whose distance to the motion ground plane is smaller than a preset height, and the initial key point is a user key point connected with the target key point. The motion key point edge thus corresponds to a body segment formed by the initial key point and the target key point, for example a lower leg.
In at least one embodiment of the present invention, the calculating unit 113 calculating the user keypoint angle based on the keypoint coordinate information of the user keypoint in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame comprises:
constructing a coordinate system according to any user motion frame;
obtaining coordinate values corresponding to the pixel point positions of the user key points in each user motion frame from the coordinate system as the key point coordinate information, and obtaining coordinate values corresponding to the planar pixel point positions of the motion ground plane in each user motion frame from the coordinate system as the ground plane coordinate information;
based on the coordinate information of the key points and the coordinate information of the ground plane, selecting user key points with a distance to the moving ground plane smaller than a preset height as the target key points, and identifying the user key points connected with the target key points as the initial key points;
calculating first coordinate information corresponding to a first unit vector of the first combined edge based on the coordinate information of the key points, and calculating second coordinate information corresponding to a second unit vector of the second combined edge based on the coordinate information of the key points;
and generating the combined angle based on the first coordinate information and the second coordinate information, and calculating the user plane angle based on the key point coordinate information and the ground plane coordinate information.
The coordinate system may be a planar linear coordinate system constructed by taking an image width side of any one of the user motion frames as an x-axis and an image height side as a y-axis.
The keypoint pixel location refers to the location of the user keypoint in each user motion frame. For example, the coordinate system is a planar linear coordinate system constructed by taking the lower left corner in any user motion frame as an origin, the image width side as an x axis, and the image height side as a y axis, and if the key point pixel point of the user key point a in the user motion frame is a horizontal 5 th pixel point and a vertical 8 th pixel point, the key point coordinate information of the user key point a may be (5, 8).
By combining the key point pixel position and the coordinate system, the key point coordinate information and the ground plane coordinate information of each user key point can be accurately obtained, and the key point coordinate information and the ground plane coordinate information can be obtained from the same dimension, so that the identification accuracy of the user key point angle can be improved.
Specifically, the first combined edge includes a first key point and a second key point, and the calculation formula of the first coordinate information is as follows:
c₁ = (x₁ − x₀) / √((x₁ − x₀)² + (y₁ − y₀)²)

s₁ = (y₁ − y₀) / √((x₁ − x₀)² + (y₁ − y₀)²)

wherein the first coordinate information is (c₁, s₁), the coordinate information of the first key point is (x₀, y₀), and the coordinate information of the second key point is (x₁, y₁).
In other embodiments, the generation manner of the second coordinate information is similar to the generation manner of the first coordinate information, and details are not repeated herein.
Specifically, the calculation formula of the combination angle is as follows:

if sinθ < 0, then θ = 2π − arccos(cosθ);

if sinθ ≥ 0, then θ = arccos(cosθ);

sinθ = c₁s₂ − c₂s₁

cosθ = c₁c₂ + s₁s₂

wherein θ represents the combined angle, the first coordinate information is (c₁, s₁), and the second coordinate information is (c₂, s₂).
Because the computation branch is selected according to the sign of sinθ, the rotation direction is kept consistent, which improves the accuracy of the user key point angle.
Specifically, the generating manner of the user plane angle is similar to the generating manner of the combined angle, which is not described herein again.
The obtaining unit 110 obtains a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video.
In at least one embodiment of the present invention, a calculation manner of the standard keypoint angle is similar to a calculation manner of the user keypoint angle, and details thereof are not repeated herein.
In at least one embodiment of the present invention, after the calculation of the standard keypoint angle is completed, the standard keypoint angle may be stored, which is convenient for subsequent retrieval.
The generating unit 114 generates the action overlap ratio of each user keypoint angle based on the user keypoint angle and the standard keypoint angle.
In at least one embodiment of the invention, the action overlap ratio is used to measure how closely the training user's movements match the athletic actions of the sports coach.
In at least one embodiment of the present invention, the generating unit 114 generates the action overlap ratio of each user keypoint angle based on the user keypoint angle and the standard keypoint angle, including:
acquiring a weight threshold value of each user key point angle;
calculating a total weight of the plurality of weight thresholds;
calculating the action contact ratio based on the weight threshold, the total weight, the user key point angle and the standard key point angle, wherein the calculation formula of the action contact ratio is as follows:
Sᵢ = (wᵢ / Sum(W)) × (1 − |tᵢ − mᵢ| / mᵢ)

wherein Sᵢ represents the action coincidence degree of the ith user key point angle, wᵢ represents the weight threshold of the ith user key point angle, Sum(W) represents the total weight, mᵢ represents the standard key point angle corresponding to the ith user key point angle, and tᵢ represents the ith user key point angle.
Wherein W = [w₁, w₂, …, wᵢ, …, wₙ].
By setting different weight thresholds for each key point combination, the accuracy of the action overlap ratio can be improved.
In at least one embodiment of the present invention, after generating an action overlap ratio of each user keypoint angle based on the user keypoint angle and the standard keypoint angle, if any action overlap ratio is smaller than a first preset value, the generating unit 114 generates a first prompt message; or
If the contact ratio of any action is greater than a second preset value, the generating unit 114 generates a second prompt message, where the second preset value is greater than the first preset value.
The first preset value and the second preset value can be set according to actual requirements.
The first prompt information is used for indicating that the gesture of the user key point corresponding to the user key point angle needs to be adjusted.
The second prompt information is used for indicating that the gesture of the user key point corresponding to the user key point angle is close to the standard gesture.
Through the embodiment, the training condition of the training user can be intuitively displayed.
The generating unit 114 generates a video score of the video to be detected according to the action overlap ratio.
In at least one embodiment of the invention, the video score is used to measure the exercise achievement condition of the training user.
It is emphasized that the video scores may also be stored in nodes of a blockchain in order to further ensure privacy and security of the video scores.
In at least one embodiment of the present invention, the generating unit 114 generates the video score of the video to be detected according to a plurality of the action overlap ratios, including:
calculating the coincidence sum of the action coincidence degrees;
comparing the sum of the contact ratio with a preset contact ratio;
if the sum of the contact ratios is greater than the preset contact ratio, calculating the product of the sum of the contact ratios and a first configuration score to obtain the video score, or
And if the sum of the contact degrees is less than or equal to the preset contact degree, setting the video score as a second configuration score, wherein the first configuration score is greater than the second configuration score.
The first configuration score is a score corresponding to the action standard in the motion frames of the users, the second configuration score is a score corresponding to the action non-standard in the motion frames of the users, and the predetermined overlap ratio, the first configuration score and the second configuration score may be set according to actual requirements, for example, the predetermined overlap ratio is usually set to 0, the first configuration score is usually set to 100, and the second configuration score is usually set to 0.
By comparing the sum of the contact ratios with the preset contact ratio, different scoring modes can be adopted, and the video score can be determined reasonably.
According to the technical scheme, the present invention can identify a plurality of key point combinations used to analyze whether the user's actions are standard, can accurately identify the motion ground plane of each user motion frame through the ground plane detection model, and can then calculate the user key point angles from the key point coordinate information and the ground plane coordinate information.
Fig. 4 is a schematic structural diagram of an electronic device implementing a motion video detection method according to a preferred embodiment of the invention.
In one embodiment of the present invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as a motion video detection program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of the electronic device 1; the device may comprise more or fewer components than shown, some components may be combined, or different components may be used. For example, the electronic device 1 may further comprise input/output devices, network access devices, a bus, etc.
The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the electronic device 1, and is connected with various parts of the whole electronic device 1 by various interfaces and lines, and executes an operating system of the electronic device 1 and various installed application programs, program codes and the like.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be divided into an acquisition unit 110, a detection unit 111, a recognition unit 112, a calculation unit 113, and a generation unit 114.
The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by executing or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory in a physical form, such as a memory stick, a TF Card (Trans-flash Card), and the like.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by instructing the relevant hardware through computer readable instructions, which may be stored in a computer readable storage medium; when the computer readable instructions are executed by a processor, the steps of the above method embodiments may be implemented.
Wherein the computer readable instructions comprise computer readable instruction code which may be in source code form, object code form, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying said computer readable instruction code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM).
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In conjunction with fig. 1, the memory 12 of the electronic device 1 stores computer-readable instructions to implement a motion video detection method, and the processor 13 can execute the computer-readable instructions to implement:
acquiring a video to be detected and a standard video, wherein the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
detecting the plurality of user motion frames based on the pre-trained attitude key point model to obtain a user key point of each user motion frame, and detecting the plurality of user motion frames based on the pre-trained ground plane detection model to obtain a motion ground plane of each user motion frame;
identifying a plurality of key point combinations from the user key points;
calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, wherein the user key point angle comprises a combination angle of each key point combination and a user plane angle of a motion key point edge and the motion ground plane;
acquiring a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
generating the action coincidence degree of each user key point angle based on the user key point angle and the standard key point angle;
and generating the video score of the video to be detected according to the action contact ratio.
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer readable instructions, which is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
acquiring a video to be detected and a standard video, wherein the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
detecting the plurality of user motion frames based on the pre-trained attitude key point model to obtain user key points of each user motion frame, and detecting the plurality of user motion frames based on the pre-trained ground plane detection model to obtain a motion ground plane of each user motion frame;
identifying a plurality of key point combinations from the user key points;
calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, wherein the user key point angle comprises a combination angle of each key point combination and a user plane angle of a motion key point edge and the motion ground plane;
acquiring a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
generating the action contact ratio of each user key point angle based on the user key point angle and the standard key point angle;
and generating the video score of the video to be detected according to the action contact ratio.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to these technical solutions without departing from their spirit and scope.

Claims (11)

1. A method for detecting motion video, the method comprising:
acquiring a video to be detected and a standard video, wherein the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
detecting the plurality of user motion frames based on a pre-trained pose key point model to obtain user key points of each user motion frame, and detecting the plurality of user motion frames based on a pre-trained ground plane detection model to obtain a motion ground plane of each user motion frame;
identifying a plurality of key point combinations from the user key points;
calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame, wherein the user key point angle comprises a combination angle of each key point combination and a user plane angle between a motion key point edge and the motion ground plane;
acquiring a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
generating an action coincidence degree for each user key point angle based on the user key point angle and the standard key point angle;
and generating a video score for the video to be detected according to the action coincidence degrees.
2. The motion video detection method of claim 1, wherein the detecting the plurality of user motion frames based on the pre-trained ground plane detection model to obtain the motion ground plane of each user motion frame comprises:
for each user motion frame, acquiring pixel information of each frame pixel point in the user motion frame on a preset channel;
predicting the pixel information based on the ground plane detection model to obtain the prediction probability of each frame pixel point;
and determining the area formed by the frame pixel points whose prediction probability is greater than a configured probability as the motion ground plane.
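The ground-plane step of claim 2 amounts to thresholding a per-pixel probability map. A sketch, assuming the ground plane detection model has already produced one probability per frame pixel (the function name and threshold value are illustrative):

```python
import numpy as np

def motion_ground_plane_mask(pred_probs, config_prob=0.5):
    """Boolean mask of the motion ground plane for one user motion frame.

    pred_probs: 2-D array of per-pixel prediction probabilities from
    the ground plane detection model. Pixels whose probability exceeds
    the configured probability form the motion ground plane region.
    """
    return pred_probs > config_prob
```

The resulting mask (or its bounding coordinates) supplies the ground plane coordinate information used later when computing user plane angles.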
3. The motion video detection method of claim 1, wherein said identifying a plurality of keypoint combinations from said user keypoints comprises:
for any user key point, acquiring the connection key points of the user key point from the plurality of user key points;
and if the number of the connection key points is greater than a preset number, generating the plurality of key point combinations from the user key point and each pair of its connection key points.
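One reading of claim 3 is that a key point with more than a preset number of skeleton connections becomes the shared vertex of one combination per pair of its connected key points. A hedged sketch under that assumption:

```python
from itertools import combinations

def keypoint_combinations(center, connected, preset_number=1):
    """Key point combinations centred on one user key point.

    If `center` has more than `preset_number` connection key points,
    each pair of connections plus the centre forms one combination,
    i.e. two combined edges sharing the vertex `center`.
    """
    if len(connected) <= preset_number:
        return []
    return [(a, center, b) for a, b in combinations(connected, 2)]
```

For example, a knee connected to the hip and ankle yields the single combination (hip, knee, ankle), whose combination angle is the knee angle.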
4. The motion video detection method of claim 1, wherein each key point combination comprises a first combined edge and a second combined edge, the motion key point edge comprises an initial key point and a target key point, and the calculating a user key point angle based on the key point coordinate information of the user key point in each user motion frame and the ground plane coordinate information of the motion ground plane in each user motion frame comprises:
constructing a coordinate system according to any user motion frame;
obtaining, from the coordinate system, coordinate values corresponding to the key point pixel positions of the user key points in each user motion frame as the key point coordinate information, and obtaining, from the coordinate system, coordinate values corresponding to the plane pixel positions of the motion ground plane in each user motion frame as the ground plane coordinate information;
based on the key point coordinate information and the ground plane coordinate information, selecting a user key point whose distance to the motion ground plane is smaller than a preset height as the target key point, and identifying the user key point connected with the target key point as the initial key point;
calculating first coordinate information corresponding to a first unit vector of the first combined edge based on the key point coordinate information, and calculating second coordinate information corresponding to a second unit vector of the second combined edge based on the key point coordinate information;
and generating the combined angle based on the first coordinate information and the second coordinate information, and calculating the user plane angle based on the key point coordinate information and the ground plane coordinate information.
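The unit-vector and target-key-point steps of claim 4 can be sketched as follows; the helper names are hypothetical, and a horizontal ground plane at a single height `ground_y` is assumed for simplicity (the patent's plane is given by ground plane coordinate information):

```python
import math

def unit_vector(p_from, p_to):
    """First/second coordinate information (c, s): the unit vector of a
    combined edge from p_from to p_to in the frame's coordinate system."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    n = math.hypot(dx, dy)
    return dx / n, dy / n

def pick_target_keypoints(keypoints, ground_y, preset_height):
    """User key points whose distance to the motion ground plane is
    smaller than preset_height become target key points (assuming a
    horizontal plane at height ground_y)."""
    return {name: (x, y) for name, (x, y) in keypoints.items()
            if abs(y - ground_y) < preset_height}
```

A key point connected to a selected target key point would then be taken as the initial key point of the motion key point edge.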
5. The motion video detection method of claim 4, wherein the combination angle is calculated by the formula:
if sin θ<0, then
Figure FDA0003837788710000021
If sin θ is greater than or equal to 0, then
Figure FDA0003837788710000022
sinθ=C 1 s 2 -c 2 s 1
cosθ=c 1 c 2 +s 1 s 2
Wherein θ represents the combined angle, and the first coordinate information is (c) 1 ,s 1 ) The second coordinate information is (c) 2 ,s 2 )。
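In the published text the formula bodies of claim 5 are image placeholders; given the branch on the sign of sin θ and the cross/dot products of the two unit vectors, the standard mapping of the angle into [0, 2π) is assumed in this sketch:

```python
import math

def combined_angle(c1, s1, c2, s2):
    """Combination angle in [0, 2*pi) between unit vectors (c1, s1)
    and (c2, s2), following the sin/cos identities of claim 5."""
    sin_t = c1 * s2 - c2 * s1   # cross product: signed sine
    cos_t = c1 * c2 + s1 * s2   # dot product: cosine
    if sin_t < 0:
        return 2 * math.pi - math.acos(cos_t)
    return math.acos(cos_t)
```

Unlike a plain arccos of the dot product, the sign branch preserves orientation, distinguishing a joint bent one way from the other.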
6. The motion video detection method of claim 1, wherein the generating an action coincidence degree for each user key point angle based on the user key point angle and the standard key point angle comprises:
acquiring a weight threshold value of each user key point angle;
calculating a total weight of the plurality of weight thresholds;
calculating the action coincidence degree based on the weight threshold, the total weight, the user key point angle and the standard key point angle, wherein the calculation formula of the action coincidence degree is as follows:
Figure FDA0003837788710000023
wherein Si represents the action coincidence degree of the i-th user key point angle, wi represents the weight threshold of the i-th user key point angle, sum(W) represents the total weight, mi represents the standard key point angle corresponding to the i-th user key point angle, and ti represents the i-th user key point angle.
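The exact coincidence formula of claim 6 is an image placeholder in the published text, so only its variables (wi, sum(W), mi, ti) are known. One plausible weighted form, the weight share times the relative closeness of the user angle to the standard angle, is assumed here purely for illustration:

```python
def action_coincidence(w_i, total_w, m_i, t_i):
    """Assumed coincidence degree S_i for one user key point angle.

    NOTE: the patent's actual formula is not reproduced in the text;
    this (w_i / sum(W)) * (1 - |t_i - m_i| / m_i) form is a guess that
    merely uses the same variables: weight threshold w_i, total weight
    total_w, standard angle m_i, and user angle t_i.
    """
    return (w_i / total_w) * (1 - abs(t_i - m_i) / m_i)
```

Under this assumption, a perfect angle match contributes its full weight share, and larger deviations from the standard angle reduce the contribution.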
7. The motion video detection method of claim 1, wherein the generating the video score of the video to be detected according to the plurality of action coincidence degrees comprises:
calculating the coincidence sum of the plurality of action coincidence degrees;
comparing the coincidence sum with a preset coincidence degree;
if the coincidence sum is greater than the preset coincidence degree, calculating the product of the coincidence sum and a first configured score to obtain the video score; or
if the coincidence sum is less than or equal to the preset coincidence degree, setting the video score to a second configured score, wherein the first configured score is greater than the second configured score.
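The scoring rule of claim 7 is a simple threshold on the coincidence sum; a sketch with hypothetical parameter names:

```python
def video_score(coincidences, preset_sum, first_score, second_score):
    """Video score per claim 7: if the coincidence sum exceeds the
    preset value, the score is the sum times the first configured
    score; otherwise it falls back to the (smaller) second configured
    score."""
    total = sum(coincidences)
    if total > preset_sum:
        return total * first_score
    return second_score
```

So a user whose motions largely coincide with the standard video is scored proportionally to the coincidence sum, while a poor match receives a fixed low score.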
8. The motion video detection method of claim 1, wherein after generating the action coincidence degree for each user key point angle based on the user key point angle and the standard key point angle, the motion video detection method further comprises:
generating first prompt information if any action coincidence degree is smaller than a first preset value; or
generating second prompt information if any action coincidence degree is greater than a second preset value, wherein the second preset value is greater than the first preset value.
9. A motion video detection apparatus, characterized in that the motion video detection apparatus comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a video to be detected and a standard video, the video to be detected comprises a plurality of user motion frames, and the standard video comprises a plurality of standard motion frames;
the detection unit is used for detecting the plurality of user motion frames based on a pre-trained pose key point model to obtain the user key points of each user motion frame, and detecting the plurality of user motion frames based on a pre-trained ground plane detection model to obtain the motion ground plane of each user motion frame;
the identification unit is used for identifying a plurality of key point combinations from the user key points;
a calculating unit, configured to calculate a user key point angle based on key point coordinate information of the user key point in each user motion frame and ground plane coordinate information of the motion ground plane in each user motion frame, where the user key point angle includes a combination angle of each key point combination and a user plane angle between a motion key point edge and the motion ground plane;
the acquiring unit is further configured to acquire a standard key point angle corresponding to the user key point angle based on each standard motion frame in the standard video;
the generating unit is used for generating an action coincidence degree for each user key point angle based on the user key point angle and the standard key point angle;
the generating unit is further used for generating the video score of the video to be detected according to the action coincidence degrees.
10. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the motion video detection method of any of claims 1 to 8.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions that are executed by a processor in an electronic device to implement the motion video detection method according to any one of claims 1 to 8.
CN202211093058.6A 2022-09-08 2022-09-08 Motion video detection method, device, equipment and storage medium Pending CN115578786A (en)

Publications (1)

Publication Number Publication Date
CN115578786A true CN115578786A (en) 2023-01-06



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination