CN113593053A

CN113593053A - Video frame correction method and related product

Info

Publication number: CN113593053A
Application number: CN202110784826.1A
Authority: CN
Inventors: 刘山源; 刘文韬
Original assignee: Beijing Sensetime Technology Development Co Ltd
Current assignee: Beijing Sensetime Technology Development Co Ltd
Priority date: 2021-07-12
Filing date: 2021-07-12
Publication date: 2021-11-02

Abstract

Embodiments of the present application provide a video frame correction method and a related product. The video frame correction method comprises: determining error frames in a video according to a detection result matrix; determining a current correction strategy; and correcting the error frames according to the current correction strategy to obtain corrected 3D skeleton data of the video. Embodiments of the present application can improve the correction effect for video frames.

Description

Video frame correction method and related product

Technical Field

The present application relates to the field of video processing technologies, and in particular, to a video frame correction method and a related product.

Background

Three-dimensional (3D) pose modeling of human video is an important issue in the field of Augmented Reality (AR) and Virtual Reality (VR). 3D pose modeling has important applications in many areas, such as virtual idols, VR games, and so forth.

3D skeleton data is key input for 3D pose modeling and is constructed from key points detected in the character video. When 3D skeleton data is computed for a human video, both two-dimensional (2D) key point detection and 3D skeleton generation may introduce errors, so the computed 3D skeleton data is difficult to use directly. Currently, the 3D skeleton data of video frames is generally optimized by indiscriminate smoothing, filtering, and similar means. However, indiscriminate correction also alters the video frames that were already well estimated, which degrades the correction effect for the video frames.

Disclosure of Invention

The embodiment of the application provides a video frame correction method and a related product, and the correction effect of a video frame is improved.

A first aspect of an embodiment of the present application provides a video frame correction method, including:

determining error frames in the video according to the detection result matrix;

determining a current correction strategy;

and correcting the error frame according to the current correction strategy to obtain corrected 3D skeleton data of the video.

According to the embodiments of the present application, the error frames in the video are determined according to the detection result matrix and only those error frames are corrected. Video frames that were already well estimated are not affected, which improves the correction effect for the video frames.

Optionally, before determining an erroneous frame in the video according to the detection result matrix, the method further includes:

performing the current error frame detection to obtain a current detection result;

and generating the detection result matrix according to the current detection result.

According to the embodiment of the application, the detection result can be obtained through error frame detection, the detection result matrix is generated according to the detection result, and the detection result matrix can be used for repairing the subsequent video frame, so that the effect of repairing the video frame is improved.

Optionally, after the error frame is corrected according to the current correction policy, the method further includes:

determining that the video completes video frame correction under the condition that the accumulated correction times of the video reach a first threshold value;

performing next error frame detection in a case where the accumulated number of corrections of the video is less than the first threshold.

According to the embodiment of the application, the error frames in the video can be eliminated step by step through a mode of alternately executing the video frame correction and the error frame detection, so that the error frame optimization result is smoother.
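The alternation between error frame detection and video frame correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `correct_video`, `detect`, `correct`, and `first_threshold` are hypothetical names, and the detection and correction routines are passed in as callables with assumed signatures.

```python
from typing import Callable, List, Sequence, Tuple


def correct_video(
    skeleton: object,
    detect: Callable[[object, int], Sequence[int]],
    correct: Callable[[object, List[int], int], object],
    first_threshold: int,
) -> Tuple[object, int]:
    """Alternate detection and correction until the cumulative
    correction count reaches first_threshold (hypothetical driver)."""
    corrections = 0
    while corrections < first_threshold:
        # Assumed convention: one flag per frame, 1 = error, 0 = correct.
        flags = detect(skeleton, corrections)
        error_frames = [i for i, flag in enumerate(flags) if flag == 1]
        # The correction routine receives the 1-based number of this pass.
        skeleton = correct(skeleton, error_frames, corrections + 1)
        corrections += 1
    return skeleton, corrections
```

The loop ends exactly when the cumulative correction count equals the first threshold; otherwise the next detection runs on the most recently corrected data.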

Optionally, after performing the current-time error frame detection and obtaining the current-time detection result, the method further includes:

if the number of the continuous correct frames in the current detection result is less than or equal to a second threshold value, modifying the continuous correct frames into error frames to obtain a modified current detection result;

the generating the detection result matrix according to the current detection result includes:

and generating the detection result matrix according to the modified current detection result.

In the embodiment of the present application, consecutive correct frames are a run of frames between two error frames in which every frame is a correct frame. If such a run is short, the current detection result obtained by the current error frame detection is likely to be inaccurate and needs to be modified. Modifying it improves the accuracy of the current detection result and of the detection result matrix, and therefore the accuracy with which error frames in the video are determined from the detection result matrix.
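The modification of short runs of correct frames can be sketched as below. The function name `absorb_short_correct_runs` and the 0/1 flag encoding (0 for a correct frame, 1 for an error frame) are assumptions made for illustration. Only runs bounded by an error frame on both sides are considered, following the definition above.

```python
from typing import List, Sequence


def absorb_short_correct_runs(flags: Sequence[int], second_threshold: int) -> List[int]:
    """Relabel runs of correct frames (0) that lie between two error
    frames (1) and whose length is <= second_threshold as error frames."""
    out = list(flags)
    n = len(flags)
    i = 0
    while i < n:
        if flags[i] == 0:
            j = i
            while j < n and flags[j] == 0:
                j += 1
            # The run is flags[i:j]; it lies between two error frames
            # only if it touches neither end of the video.
            bounded = i > 0 and j < n
            if bounded and (j - i) <= second_threshold:
                for k in range(i, j):
                    out[k] = 1
            i = j
        else:
            i += 1
    return out
```

For example, with a second threshold of 2, the flags `[0, 1, 0, 0, 1, 0, 0, 0, 1, 0]` become `[0, 1, 1, 1, 1, 0, 0, 0, 1, 0]`: the two-frame run between error frames is relabeled, while the three-frame run and the runs at either end of the video are kept.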

Optionally, the generating the detection result matrix according to the current detection result includes:

determining the cumulative number of times error frame detection has been performed on the video;

updating the current detection result in a case where the cumulative number of times error frame detection has been performed is greater than M;

generating the detection result matrix according to the updated current detection result, where M is an integer greater than or equal to 3;

and executing the step of generating the detection result matrix according to the current detection result in a case where the cumulative number of times error frame detection has been performed is less than or equal to M.

The embodiment of the application can update the current detection result, and the updating can avoid influencing the previous correct frame.

Optionally, the performing the current error frame detection to obtain the current detection result includes:

acquiring 3D bone data of the video; the 3D bone data comprises original 3D bone data or revised 3D bone data;

calculating key skeleton vectors of two continuous frames in the video according to the 3D skeleton data of the video;

calculating the similarity of corresponding key skeleton vectors between the two continuous frames;

if the similarity is smaller than a preset threshold corresponding to the key skeleton vector, judging that the current detection result of the two continuous frames is an error frame;

and if the similarity is greater than a preset threshold corresponding to the key skeleton vector, judging that the current detection result of the two continuous frames is a correct frame.

Different key skeleton vectors can correspond to different preset thresholds, so that the accuracy of error frame detection can be improved, and the error rate of error frame detection can be reduced.
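A minimal sketch of this detection step follows. Several points are assumptions and not stated in the text: cosine similarity is used as the similarity measure, the later frame of each consecutive pair is the one flagged, a similarity exactly equal to the threshold is treated as correct, and the names `bone_vector`, `cosine_similarity`, and `detect_error_vectors` are hypothetical. Each frame is a list of key points given as (x, y, z) tuples, and each key bone is a pair of key point indices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def bone_vector(frame: Sequence[Point], start: int, end: int) -> Point:
    """Vector from key point `start` to key point `end` in one frame."""
    return (
        frame[end][0] - frame[start][0],
        frame[end][1] - frame[start][1],
        frame[end][2] - frame[start][2],
    )


def cosine_similarity(u: Point, v: Point) -> float:
    """Cosine of the angle between u and v (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate bone: treat as dissimilar
    return dot / (norm_u * norm_v)


def detect_error_vectors(
    frames: Sequence[Sequence[Point]],
    key_bones: Sequence[Tuple[int, int]],
    thresholds: Sequence[float],
) -> List[List[int]]:
    """Return one row per frame and one column per key bone:
    1 = error vector, 0 = correct vector."""
    flags = [[0] * len(key_bones) for _ in frames]
    for t in range(1, len(frames)):
        for b, (start, end) in enumerate(key_bones):
            sim = cosine_similarity(
                bone_vector(frames[t - 1], start, end),
                bone_vector(frames[t], start, end),
            )
            if sim < thresholds[b]:
                flags[t][b] = 1  # assumption: flag the later frame
    return flags
```

Passing one threshold per key bone reflects the statement that different key skeleton vectors can correspond to different preset thresholds.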

Optionally, the updating the current detection result includes:

maintaining the detection result of the video frame which is the same as the first round detection result in the current detection result; the first round of detection results comprise all error frames participating in correction after the previous M times of error frame detection and correct frames except all the error frames;

updating the detection result of the video frame of which the current detection result is an error frame and the first round detection result is a correct frame;

and maintaining the detection result of the video frame of which the current detection result is a correct frame and the first round detection result is an error frame.

In the embodiment of the application, for a video frame whose first-round detection result is a correct frame and whose current detection result is an error frame, the current detection result is updated from error frame to correct frame, so that the first-round verdict of correct frame is retained. By comparing the current detection result with the first-round detection result, only frames that are error frames in the first-round detection result are corrected; correct frames in the first-round detection result are not corrected, and the set of error frames does not grow. Optimization is thus confined to the error frames contained in the first-round detection result, so that overly strict thresholds in later detections do not affect correct frames outside that set, which improves the overall video frame correction effect.
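The three update rules amount to keeping a frame as an error frame only when both the current detection and the first-round detection mark it as an error frame. The sketch below illustrates this together with the comparison against M; the function names and the 0/1 encoding (0 for a correct frame, 1 for an error frame) are assumptions for illustration.

```python
from typing import List, Sequence


def update_detection_result(current: Sequence[int], first_round: Sequence[int]) -> List[int]:
    """Keep a frame flagged as error (1) only if it is an error frame
    in both the current result and the first-round result."""
    if len(current) != len(first_round):
        raise ValueError("results must cover the same frames")
    return [1 if (c == 1 and f == 1) else 0 for c, f in zip(current, first_round)]


def result_for_matrix(
    current: Sequence[int],
    first_round: Sequence[int],
    detection_count: int,
    m: int = 3,
) -> List[int]:
    """Choose the per-frame result used to build the detection result
    matrix: updated if more than m detections have run, else as-is."""
    if m < 3:
        raise ValueError("M is an integer greater than or equal to 3")
    if detection_count > m:
        return update_detection_result(current, first_round)
    return list(current)
```

With `current = [1, 1, 0, 0]` and `first_round = [1, 0, 1, 0]`, the updated result is `[1, 0, 0, 0]`: the second frame is reset to correct, and the third frame stays correct.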

Optionally, a preset threshold corresponding to the key bone vector is determined based on the key bone vector and the number of times of performing error frame detection cumulatively;

wherein, for a given key bone vector, the preset threshold is positively correlated with the cumulative number of times error frame detection has been performed.

According to the embodiment of the application, the more times error frame detection has been performed, the larger the preset threshold. The threshold is raised gradually as the detection count grows, so error frames are optimized step by step with a coarse-to-fine strategy, which improves the error frame optimization effect.

Optionally, the key bone vector includes any one of a first class vector, a second class vector and a third class vector;

the first class of vectors comprises left and right shoulder vectors;

the second class of vectors comprises crotch bone center root node to spine center node vectors;

the third class of vectors includes: at least one of a left forearm vector, a left upper arm vector, a left calf vector, a left thigh vector, a right forearm vector, a right upper arm vector, a right calf vector, and a right thigh vector;

for a given cumulative number of times error frame detection has been performed, the preset threshold corresponding to the first class of vectors is smaller than the preset threshold corresponding to the third class of vectors, and the preset threshold corresponding to the third class of vectors is smaller than the preset threshold corresponding to the second class of vectors.

The embodiment of the application can classify the key bone vectors, so that which bone vectors in the key bone vectors are wrong bone vectors can be better determined, and the error reasons of wrong frames can be better analyzed. The error frame correction is better performed according to the error reason of the error frame. During the same error frame detection, different key skeleton vectors can correspond to different preset thresholds, so that the accuracy of error frame detection can be improved, and the error rate of error frame detection can be reduced.
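One way to satisfy both constraints on the preset threshold (it grows with the detection count, and at any given count the first-class threshold is below the third-class threshold, which is below the second-class threshold) is sketched below. The numeric values, the geometric schedule, and the names `INITIAL_GAP`, `DECAY`, and `preset_threshold` are purely hypothetical; the text does not specify how the thresholds are computed.

```python
# Hypothetical distance of each class threshold below 1.0 at the first
# detection. A larger gap means a smaller (looser) threshold.
INITIAL_GAP = {"first": 0.20, "third": 0.15, "second": 0.10}

# Hypothetical factor by which the gap shrinks at each later detection.
DECAY = 0.8


def preset_threshold(vector_class: str, detection_count: int) -> float:
    """Similarity threshold for a key bone vector class at the given
    1-based cumulative detection count (illustrative schedule only)."""
    if detection_count < 1:
        raise ValueError("detection_count starts at 1")
    gap = INITIAL_GAP[vector_class] * DECAY ** (detection_count - 1)
    return 1.0 - gap
```

Because every class shrinks its gap by the same factor, the ordering between classes is preserved at each detection count while every threshold rises toward 1.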

Optionally, the determining the current correction policy according to the current correction times includes:

if the current correction count is (3N+1), the current correction strategy comprises a rotation error completion strategy;

if the current correction count is (3N+2), the current correction strategy comprises a forward leaning and backward leaning completion strategy;

and if the current correction count is (3N+3), the current correction strategy comprises a hand-foot jump completion strategy, where N is an integer greater than or equal to 0.

In the embodiment of the application, the bone vectors whose errors have a larger influence are corrected first, using the rotation error completion strategy. This reduces their interference with the bone vectors whose errors have a smaller influence, so that the latter can be corrected better afterwards.
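The mapping from correction count to strategy is a cycle of length three, which can be sketched as follows. The identifier strings and the function name `current_strategy` are hypothetical labels chosen for illustration.

```python
# Hypothetical labels for the three strategies, in cycle order.
STRATEGIES = (
    "rotation_error_completion",         # counts 1, 4, 7, ... (3N+1)
    "forward_backward_lean_completion",  # counts 2, 5, 8, ... (3N+2)
    "hand_foot_jump_completion",         # counts 3, 6, 9, ... (3N+3)
)


def current_strategy(correction_count: int) -> str:
    """Return the strategy label for a 1-based correction count."""
    if correction_count < 1:
        raise ValueError("correction_count starts at 1")
    return STRATEGIES[(correction_count - 1) % 3]
```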

Optionally, the correcting the error frame according to the current correction policy includes:

if the current correction strategy comprises a rotation error completion strategy, performing spherical interpolation processing on other skeleton vectors except the first class vector, the second class vector and the third class vector in all skeleton vectors of the error frame, and performing spherical interpolation processing on the error skeleton vectors in the key skeleton vectors.

According to the embodiment of the application, spherical interpolation can first be performed on the vectors related to the trunk (the bone vectors other than the first-class, second-class and third-class vectors), so that the trunk moves smoothly. Spherical interpolation is then performed on all erroneous bone vectors among the key bone vectors, while all correct bone vectors among the key bone vectors are kept unchanged. Ensuring smooth trunk motion first and then correcting the limbs of the error frame improves the correction effect for the error frame.
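Spherical interpolation of one bone vector across a run of error frames can be sketched as below. The sketch assumes that each bone direction is stored as a unit vector, that a run of error frames is filled from the nearest correct frame on each side, and that runs touching the start or end of the video are left unchanged; none of these details is stated in the text, and `slerp` and `fill_error_runs` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def slerp(u: Vec, v: Vec, t: float) -> Vec:
    """Spherical linear interpolation between unit vectors u and v."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    if dot > 1.0 - 1e-9:
        return (u[0], u[1], u[2])  # directions coincide
    if dot < -1.0 + 1e-9:
        raise ValueError("slerp is undefined for opposite directions")
    omega = math.acos(dot)
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return (w0 * u[0] + w1 * v[0], w0 * u[1] + w1 * v[1], w0 * u[2] + w1 * v[2])


def fill_error_runs(directions: Sequence[Vec], flags: Sequence[int]) -> List[Vec]:
    """Replace flagged directions (flag 1) by slerp between the nearest
    correct frames (flag 0) on both sides of each error run."""
    out = [(d[0], d[1], d[2]) for d in directions]
    n = len(flags)
    i = 0
    while i < n:
        if flags[i] == 1:
            j = i
            while j < n and flags[j] == 1:
                j += 1
            left, right = i - 1, j
            if left >= 0 and right < n:  # run has a correct frame on each side
                for k in range(i, j):
                    t = (k - left) / (right - left)
                    out[k] = slerp(out[left], out[right], t)
            i = j
        else:
            i += 1
    return out
```

Correct frames are never modified, which matches the requirement that correct bone vectors are kept unchanged.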

if the current correction strategy comprises a forward leaning and backward leaning completion strategy, performing spherical interpolation processing on other skeleton vectors except the first type vector, the second type vector and the third type vector in all skeleton vectors of the error frame, and performing spherical interpolation processing on error skeleton vectors in the first type vector, the second type vector and the third type vector;

if the second-class vector in the key skeleton vectors is an error skeleton vector, calculating a rotation matrix of the second-class vector in the key skeleton vectors and a target vector, wherein the target vector is the second-class vector in a correct frame with the minimum difference with the frame number of the error frame in the video;

and correcting all bone vectors of the error frame according to the rotation matrix.

According to the embodiment of the application, spherical interpolation is performed on the vectors related to the trunk (the bone vectors other than the first-class, second-class and third-class vectors), and then on all erroneous bone vectors among the key bone vectors. A rotation matrix between the second-class vector among the key bone vectors and the target vector is then calculated, and the overall posture of the character is restored through the rotation matrix. Because the other vectors are adjusted into place before the posture is restored, the restoration of the character's overall posture is improved.
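The rotation step can be sketched with the standard Rodrigues construction of the rotation that takes one direction onto another. The text does not say how the rotation matrix is computed, so this construction is an assumption, as are the direction of the rotation (from the erroneous second-class vector to the target vector), the tie-breaking toward the earlier frame, and the names `rotation_between`, `apply_rotation`, and `nearest_correct_frame`.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def normalize(v: Vec) -> Vec:
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def rotation_between(source: Vec, target: Vec) -> List[List[float]]:
    """3x3 rotation matrix that maps the direction of `source` onto the
    direction of `target` (Rodrigues form, undefined for opposite vectors)."""
    a, b = normalize(source), normalize(target)
    vx = a[1] * b[2] - a[2] * b[1]  # v = a x b
    vy = a[2] * b[0] - a[0] * b[2]
    vz = a[0] * b[1] - a[1] * b[0]
    c = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]  # cos of the angle
    if c < -1.0 + 1e-9:
        raise ValueError("rotation axis is ambiguous for opposite vectors")
    k = 1.0 / (1.0 + c)
    # R = I + [v]x + k * [v]x^2
    return [
        [1.0 - k * (vy * vy + vz * vz), -vz + k * vx * vy, vy + k * vx * vz],
        [vz + k * vx * vy, 1.0 - k * (vx * vx + vz * vz), -vx + k * vy * vz],
        [-vy + k * vx * vz, vx + k * vy * vz, 1.0 - k * (vx * vx + vy * vy)],
    ]


def apply_rotation(r: Sequence[Sequence[float]], v: Vec) -> Vec:
    """Multiply the 3x3 matrix r by the column vector v."""
    return (
        r[0][0] * v[0] + r[0][1] * v[1] + r[0][2] * v[2],
        r[1][0] * v[0] + r[1][1] * v[1] + r[1][2] * v[2],
        r[2][0] * v[0] + r[2][1] * v[1] + r[2][2] * v[2],
    )


def nearest_correct_frame(flags: Sequence[int], error_index: int) -> int:
    """Index of the correct frame (flag 0) closest in frame number to
    error_index; ties go to the earlier frame (assumption)."""
    candidates = [i for i, f in enumerate(flags) if f == 0]
    if not candidates:
        raise ValueError("no correct frame available")
    return min(candidates, key=lambda i: abs(i - error_index))
```

Applying the resulting matrix to every bone vector of the error frame rotates the whole skeleton rigidly, which is consistent with restoring the overall posture rather than adjusting individual limbs.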

if the current correction strategy comprises a hand-foot jump completion strategy, a third class vector in the key skeleton vectors contains an error skeleton vector, and spherical interpolation processing is carried out on the error skeleton vector contained in the third class vector in the error frame;

and performing spherical interpolation processing on other skeleton vectors except the third type vector in the error frame.

According to the embodiment of the application, error frame detection has already been performed before the hand-foot jump completion strategy is applied, so the vectors related to the trunk may have changed slightly. Performing spherical interpolation on the bone vectors other than the third-class vectors in the error frame interpolates the trunk-related vectors, so that the trunk moves smoothly.

A second aspect of an embodiment of the present application provides a video frame correction apparatus, including:

the determining unit is used for determining error frames in the video according to the detection result matrix;

the determining unit is further configured to determine a current correction strategy;

and the correcting unit is used for correcting the error frame according to the current correction strategy to obtain corrected 3D skeleton data of the video.

A third aspect of embodiments of the present application provides an electronic device, comprising a processor and a memory, the memory being configured to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform some or all of the steps as described in the first aspect of embodiments of the present application.

A fourth aspect of embodiments of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program comprises program instructions, which, when executed by a processor, cause the processor to perform some or all of the steps as described in the first aspect of embodiments of the present application.

A fifth aspect of embodiments of the present application provides a computer program product, wherein the computer program product comprises a computer program comprising program instructions that, when executed by a processor, cause the processor to perform some or all of the steps as described in the first aspect of embodiments of the present application. The computer program product may be a software installation package.

In the embodiment of the application, when the video frame is corrected, the error frame in the video is determined according to the detection result matrix; determining a current correction strategy; and correcting the error frame according to the current correction strategy to obtain corrected 3D skeleton data of the video. According to the method and the device, the error frame in the video is determined according to the detection result matrix, the error frame is corrected, the correct frame is not processed, the video frame with better original performance in the video frame is not affected, and therefore the correction effect of the video frame is improved.

Drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a video frame correction method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another video frame correction method according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the key points and skeleton vectors of a video frame according to an embodiment of the present application;

Fig. 4 is a schematic diagram of spherical interpolation provided in an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a video frame correction apparatus according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

The electronic devices referred to in the embodiments of the present application may include devices with image and video processing capabilities, such as mobile phones, tablet computers, desktop computers, and the like.

Three-dimensional (3D) pose modeling of video characters is an important issue in the field of Augmented Reality (AR) and Virtual Reality (VR). 3D pose modeling has important applications in many areas, such as virtual idols and VR games. Currently, the 3D skeleton data of video frames is generally optimized by indiscriminate smoothing, filtering, and similar means to improve the visual result (that is, jittering 3D coordinates are smoothed until the jitter is no longer visible to the naked eye). Conventional filter-based smoothing typically has these problems: it changes all frames globally, it loses motion amplitude, and it cannot fully correct falsely detected large motions, so large-amplitude error frames and a slow-motion effect remain. The slow-motion effect arises when several error frames occur and are smoothed: the motion becomes overly smooth and looks like slow motion, with no contrast in speed or amplitude.

The 3D skeleton data is generated by detecting the coordinates of the 2D key points of the human body with 2D pose detection and then obtaining the coordinates of the 3D spatial key points of the human body through a neural network (e.g., a neural network model). Because the neural network model generally computes 3D coordinates frame by frame, with no relationship between frames, oscillation, jitter, and similar artifacts occur easily.

Especially when the video character turns around or moves suddenly and quickly, 2D pose detection may output several completely wrong points, so the smoothed result still contains erroneous movements; this often occurs in dance videos. For example, when a person turns around, an elbow may be occluded by the body in the image, so that effectively no such point exists, but the neural network model still has to output a point and may therefore output a completely wrong one. Similarly, during fast motion the hand is blurred, so the neural network model has difficulty locating the key points and outputs wrong results.

The following method steps implemented by the present application may solve the above-mentioned problems.

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a video frame correction method according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes the following steps.

101. The electronic device determines error frames in the video according to the detection result matrix.

In the embodiment of the present application, the detection result matrix may be obtained by performing error frame detection before step 101. The videos mentioned in the embodiment of the application may include a person video or an animal video.

The detection result matrix may be an M × P matrix, M may represent the number of video frames included in the video, and P may represent whether the corresponding video frame is an error frame. For example, P may be a 1-dimensional vector, and may represent a correct frame by 0 and an error frame by 1; or a correct frame may be represented by 1 and an erroneous frame may be represented by 0.

For example, suppose M is 5 and P is 1, with 0 indicating a correct frame and 1 indicating an error frame. If the detection result matrix is $[0, 1, 1, 0, 0]^T$, the video has 5 frames in total, the second and third frames are error frames, and the others are correct frames. The 5 frames in the video are consecutive.

In one embodiment, P may also be a multi-dimensional vector used to indicate which skeleton vectors in the corresponding video frame are error vectors and which are correct vectors. A correct vector can be represented by 0 and an error vector by 1; or a correct vector may be represented by 1 and an error vector by 0. A skeleton vector is a vector, in three-dimensional space, of the 3D skeleton of a person or animal.

For example, suppose M is 5 and P is 10 (for 10 different skeleton vectors in a video frame), with 0 representing a correct vector and 1 representing an error vector. If the detection result matrix is:

$$
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

As can be seen from the above detection result matrix, there are 5 frames (each row in the matrix represents one frame). The first, fourth and fifth frames are error frames (an error frame contains at least one error vector), and the second and third frames are correct frames (all bone vectors in a correct frame are correct vectors). In the first frame, the first and sixth skeleton vectors are error vectors and the others are correct vectors; in the fourth frame, the third, sixth, seventh and ninth skeleton vectors are error vectors and the others are correct vectors; in the fifth frame, the fourth skeleton vector is an error vector and the others are correct vectors. The 5 frames in the video are consecutive.

According to the method and the device, the error frame in the video can be determined according to the detection result matrix, and which skeleton vectors in the error frame are error vectors can be determined according to the detection result matrix.
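Reading error frames and error vectors out of a detection result matrix can be sketched as follows, using the 5 × 10 example described above (0 for a correct vector, 1 for an error vector). The function names are hypothetical, and indices are 1-based to match the wording "first frame" and "first skeleton vector".

```python
from typing import Dict, List, Sequence


def error_frames(matrix: Sequence[Sequence[int]]) -> List[int]:
    """1-based numbers of frames that contain at least one error vector."""
    return [i + 1 for i, row in enumerate(matrix) if any(flag == 1 for flag in row)]


def error_vectors(matrix: Sequence[Sequence[int]]) -> Dict[int, List[int]]:
    """Map each error frame number to the 1-based numbers of its error vectors."""
    result: Dict[int, List[int]] = {}
    for i, row in enumerate(matrix):
        bad = [j + 1 for j, flag in enumerate(row) if flag == 1]
        if bad:
            result[i + 1] = bad
    return result


# The example from the text: frames 1, 4 and 5 are error frames.
EXAMPLE_MATRIX = [
    [1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]
```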

102. The electronic device determines a current correction strategy.

In the embodiment of the present application, steps 101 to 103 are the steps of the video frame correction method, and for the same video, steps 101 to 103 may be executed repeatedly. The electronic device may determine the current correction strategy according to the current correction count, which is the number of times video frame correction has been performed for the same video (i.e., the number of times step 103 has been performed for that video). The initial value of the current correction count is 0, and it is incremented by 1 each time step 103 is executed.

Each execution of the correction is identified by its correction count, and different correction strategies may be employed at different counts. The embodiment of the application can adopt a strategy of first correcting the erroneous bone vectors of the trunk and then correcting the bone vectors of the limbs. An erroneous trunk bone vector may affect the limb bone vectors, whereas an erroneous limb bone vector generally does not affect the trunk bone vectors. Correcting the erroneous trunk bone vectors first therefore reduces their interference with the limb bone vectors, so that the limb bone vectors can be corrected better afterwards.

The embodiment of the application can also correct the bone vectors whose errors have a larger influence first, and then those whose errors have a smaller influence. A bone vector whose error has a larger influence may seriously affect a bone vector whose error has a smaller influence, whereas the reverse effect is small. Correcting the larger-influence bone vectors first therefore reduces their interference with the smaller-influence bone vectors, so that the latter can be corrected better afterwards.

The embodiment of the application can also first correct the erroneous bone vectors of the trunk, then correct the bone vectors of the limbs, and then repeat the two corrections alternately in a cycle. Cycling multiple times, rather than correcting only twice, makes the error frame optimization result smoother. The number of cycles can be set based on empirical values.

The embodiment of the application can also first correct the bone vectors whose errors have a larger influence, then correct those whose errors have a smaller influence, and then repeat the two corrections alternately in a cycle. Cycling multiple times, rather than correcting only twice, makes the error frame optimization result smoother. The number of cycles can be set based on empirical values.

The bone vector with a larger error influence may include bone vectors related to the overall posture, such as: left and right shoulder vectors, crotch center root node to spine center node vectors, and the like. Bone vectors with less error impact may include extremity-related vectors such as: a left forearm vector, a left upper arm vector, a left calf vector, a left thigh vector, a right forearm vector, a right upper arm vector, a right calf vector, a right thigh vector, and the like.

The execution order of step 101 and step 102 is not limited. Step 101 may be performed before step 102, step 102 may be performed before step 101, or steps 101 and 102 may be performed simultaneously.

103. The electronic device corrects the error frame according to the current correction strategy to obtain corrected 3D skeleton data of the video.

In the embodiment of the application, an error frame may be corrected by correcting the bone vectors in it. Because the bone vectors are generated from the 3D bone data, correcting the error frame is equivalent to correcting the 3D bone data, which yields the corrected 3D bone data.

Referring to Fig. 2, Fig. 2 is a schematic flowchart of another video frame correction method according to an embodiment of the present disclosure. As shown in Fig. 2, the method includes the following steps.

201. The electronic device performs the current error frame detection to obtain the current detection result.

202. The electronic device generates a detection result matrix according to the current detection result.

In this embodiment of the present application, step 201 may be to perform error frame detection for a video for the first time, or to perform error frame detection after performing video frame correction.

In one possible embodiment, step 201 performs error frame detection on the video for the first time. Specifically, step 201 may obtain the input original 3D bone data, which may be calculated by a neural network model. For example, the original 3D bone data has a size of 1000 × 17 × 3, where 1000 is the number of video frames, 17 is the number of key points, and 3 corresponds to the three-dimensional spatial coordinates (x, y, z) of each key point. The neural network model can detect 17 key points in each video frame, and multiple bone vectors can be formed among the 17 key points. Referring to fig. 3, fig. 3 is a schematic diagram of the key points and bone vectors of a video frame according to an embodiment of the present disclosure. As shown in fig. 3, the 17 key points are numbered 0 to 16. In fig. 3, 0 denotes the crotch bone center root node, 1 denotes the spine center node, 2 denotes the cervical lower end node, 3 denotes the cervical upper end node, 4 denotes the head node, 5 denotes the left shoulder and left upper arm connecting node, 6 denotes the left upper arm and left forearm connecting node, 7 denotes the left wrist node, 8 denotes the right shoulder and right upper arm connecting node, 9 denotes the right upper arm and right forearm connecting node, 10 denotes the right wrist node, 11 denotes the crotch bone and left thigh connecting node, 12 denotes the left patella node, 13 denotes the left ankle node, 14 denotes the crotch bone and right thigh connecting node, 15 denotes the right patella node, and 16 denotes the right ankle node.

Fig. 3 shows some of the bone vectors, where: ① is the vector between key points 6 and 7, also referred to as the left forearm vector; ② is the vector between key points 9 and 10, also referred to as the right forearm vector; ③ is the vector between key points 5 and 6, also referred to as the left upper arm vector; ④ is the vector between key points 8 and 9, also referred to as the right upper arm vector; ⑤ is the vector between key points 12 and 13, also referred to as the left calf vector; ⑥ is the vector between key points 15 and 16, also referred to as the right calf vector; ⑦ is the vector between key points 11 and 12, also referred to as the left thigh vector; ⑧ is the vector between key points 14 and 15, also referred to as the right thigh vector; ⑨ is the vector between key points 5 and 8, also referred to as the left and right shoulder vector; and ⑩ is the vector between key points 0 and 1, also referred to as the crotch bone center root node to spine center node vector. There are also other bone vectors not shown in fig. 3, such as the vector between key points 11 and 14, the vector between key points 1 and 2, the vector between key points 2 and 3, the vector between key points 3 and 4, and so on. Although fig. 3 is drawn in two dimensions, the key points are actually three-dimensional, and the bone vectors are likewise three-dimensional vectors.
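The mapping from key points to the ten bone vectors of fig. 3 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the names `BONE_PAIRS` and `bone_vectors` are hypothetical, one frame is assumed to be given as 17 (x, y, z) tuples, and the start-to-end direction of each vector is an arbitrary choice that only has to stay the same for every frame.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Vector = Tuple[float, float, float]

# (start, end) key point indices for bone vectors 1 to 10 of fig. 3.
# The start -> end direction is an assumed, fixed convention.
BONE_PAIRS: List[Tuple[int, int]] = [
    (6, 7),    # 1: left forearm
    (9, 10),   # 2: right forearm
    (5, 6),    # 3: left upper arm
    (8, 9),    # 4: right upper arm
    (12, 13),  # 5: left calf
    (15, 16),  # 6: right calf
    (11, 12),  # 7: left thigh
    (14, 15),  # 8: right thigh
    (5, 8),    # 9: left and right shoulder vector
    (0, 1),    # 10: crotch bone center root node to spine center node
]


def bone_vectors(frame: Sequence[Point]) -> List[Vector]:
    """Return the 10 bone vectors of one frame from its 17 (x, y, z) key points."""
    if len(frame) != 17:
        raise ValueError("expected 17 key points per frame")
    return [
        (frame[e][0] - frame[s][0],
         frame[e][1] - frame[s][1],
         frame[e][2] - frame[s][2])
        for s, e in BONE_PAIRS
    ]
```

Applying `bone_vectors` to each of the 1000 frames of a 1000 × 17 × 3 input would give 1000 × 10 three-dimensional vectors for the later similarity comparison.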

Observation of the distribution of error frames shows that unsmooth frames and most erroneous actions appear as hand-foot jumping (a large change of the hands or feet between consecutive frames), erroneous rotation, or erroneous forward or backward leaning. Therefore, 10 bone vector segments (segments ① to ⑩ shown in fig. 3) can be selected as the basis for judging error frames. The first 8 segments (segments ① to ⑧ shown in fig. 3) are the left forearm vector, the right forearm vector, the left upper arm vector, the right upper arm vector, the left calf vector, the right calf vector, the left thigh vector and the right thigh vector, respectively, and correspond to hand-foot jumping; segment ⑨ is the left and right shoulder vector (the two shoulders are generally not connected in a skeleton point sequence diagram; the term refers to the vector between them) and corresponds to erroneous rotation; segment ⑩ is the vector from the crotch bone center root node to the spine center node and corresponds to erroneous forward or backward leaning.

In this embodiment of the application, when the electronic device performs the current error frame detection, it can take the bone vectors of two consecutive frames in the video, calculate the similarity of the corresponding bone vectors of the two frames, and determine whether the two consecutive frames are error frames by comparing the similarity with a set threshold. Let 0 represent a correct frame and 1 represent an error frame. If the similarity is smaller than the set threshold, the bone vector changes greatly between the two consecutive frames, the current detection result of the two frames is determined to be an error frame, and their corresponding values in the detection result matrix are 1. If the similarity is greater than the set threshold, the bone vector changes little between the two consecutive frames, the current detection result of the two frames is determined to be a correct frame, and their corresponding values in the detection result matrix are 0. After the current detection result of each frame in the video has been calculated, a 1000 × 1 detection result matrix can be generated according to the current detection result.

Optionally, after step 201 is executed, the following steps may also be executed:

and if the number of the continuous correct frames in the current detection result is less than or equal to a second threshold value, the electronic equipment modifies the continuous correct frames into error frames to obtain a modified current detection result.

After each execution of step 201, if the number of consecutive correct frames in the current detection result is less than or equal to the second threshold, the electronic device may modify the consecutive correct frames into error frames to obtain a modified current detection result.

In step 202, the step of generating, by the electronic device, a detection result matrix according to the current detection result may include the following steps:

and the electronic equipment generates the detection result matrix according to the modified current detection result.

In this embodiment of the application, a run of consecutive correct frames that is too short is likely to be a false detection. The current detection result can therefore be modified based on this judgment rule to obtain the modified current detection result, from which a more accurate detection result matrix can be generated.

Different second thresholds can be adopted depending on which bone vector caused a frame to be judged as an error frame, thereby improving the accuracy of the modified current detection result.

In one embodiment, if the bone vector on which the error frame was judged is the 9th segment vector (i.e., the first class vector), the error frame is a rotation error frame. If the number of consecutive correct frames between two runs of consecutive rotation error frames in the video is less than or equal to the second threshold, the consecutive correct frames are modified into error frames, and the current detection result is thereby modified. This embodiment covers the pattern: consecutive rotation error frames, consecutive correct frames (which may be only 1 frame or at least 2 frames), consecutive rotation error frames. For example, if the second threshold is equal to 4 and, in 50 consecutive frames, the 1st to 20th frames are rotation error frames, the 21st to 23rd frames are correct frames, and the 24th to 50th frames are rotation error frames, then the 21st to 23rd frames are modified into error frames. A run of consecutive correct frames that is too short is likely to be a false detection, so modifying the current detection result based on this judgment rule yields a more accurate modified current detection result.

In one embodiment, if the bone vector on which the error frame was judged is the 10th segment vector (i.e., the second class vector), the error frame is a forward and backward leaning frame (leaning forward, leaning backward, or both). If the number of consecutive correct frames between two runs of consecutive forward and backward leaning frames in the video is less than or equal to the second threshold, the consecutive correct frames are modified into error frames, and the current detection result is thereby modified. This embodiment covers the pattern: consecutive forward and backward leaning frames, consecutive correct frames (which may be only 1 frame or at least 2 frames), consecutive forward and backward leaning frames. For example, if the second threshold is equal to 1 and, in 50 consecutive frames, the 1st to 20th frames are forward and backward leaning frames, the 21st frame is a correct frame, and the 22nd to 50th frames are forward and backward leaning frames, then the 21st frame is modified into an error frame. A run of consecutive correct frames that is too short is likely to be a false detection, so modifying the current detection result based on this judgment rule yields a more accurate modified current detection result.

In one embodiment, if the bone vector on which the error frame was judged is one of the first 8 segment vectors (i.e., a third class vector), the error frame is a hand-foot jumping frame. If the number of consecutive correct frames between two runs of consecutive hand-foot jumping frames in the video is less than or equal to the second threshold, the consecutive correct frames are modified into error frames, and the current detection result is thereby modified. This embodiment covers the pattern: consecutive hand-foot jumping frames, consecutive correct frames (which may be only 1 frame or at least 2 frames), consecutive hand-foot jumping frames. For example, if the second threshold is equal to 2 and, in 50 consecutive frames, the 1st to 20th frames are hand-foot jumping frames, the 21st to 22nd frames are correct frames, and the 23rd to 50th frames are hand-foot jumping frames, then the 21st to 22nd frames are modified into error frames. A run of consecutive correct frames that is too short is likely to be a false detection, so modifying the current detection result based on this judgment rule yields a more accurate modified current detection result. The second threshold differs according to the bone vector on which the error frame was judged. Specifically, when the error frame was judged on the first class vector, the second threshold is the largest; when it was judged on a third class vector, the second threshold is the second largest; when it was judged on the second class vector, the second threshold is the smallest. The second threshold may be set to be greater than or equal to 1.
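The rule of turning a short run of correct frames into error frames can be illustrated with a small sketch. It assumes the per-frame detection result for one error type is a list of 0 (correct) and 1 (error); the function name `fill_short_correct_runs` is hypothetical, and the choice to leave correct runs at the very start or end of the video untouched is an assumption, since such runs do not lie between two runs of error frames.

```python
from typing import List, Sequence


def fill_short_correct_runs(flags: Sequence[int], second_threshold: int) -> List[int]:
    """Mark as error (1) every run of correct frames (0) that lies between
    two error frames and is at most second_threshold frames long."""
    out = list(flags)
    n = len(flags)
    i = 0
    while i < n:
        if flags[i] != 0:
            i += 1
            continue
        # Find the end (exclusive) of the current run of correct frames.
        j = i
        while j < n and flags[j] == 0:
            j += 1
        enclosed = i > 0 and j < n  # error frames on both sides
        if enclosed and (j - i) <= second_threshold:
            for k in range(i, j):
                out[k] = 1
        i = j
    return out


# Hand-foot jumping example from the text: second threshold 2,
# frames 21 to 22 correct between two runs of error frames.
example = [1] * 20 + [0] * 2 + [1] * 28
assert fill_short_correct_runs(example, 2) == [1] * 50
```

With a second threshold of 1 the same input would be left unchanged, because the run of two correct frames is longer than the threshold.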

The similarity calculation of the embodiment of the application can be calculated by means of cosine distance, Euclidean distance, Manhattan distance, Pearson correlation coefficient, cosine similarity and the like of vectors.

In another possible embodiment, step 201 may obtain corrected 3D bone data as input; in this case, step 201 is error frame detection performed after video frame correction has been performed.

203, the electronic device determines the error frames in the video according to the detection result matrix.

204, the electronic device determines the current correction strategy according to the current number of corrections.

205, the electronic device corrects the error frames according to the current correction strategy to obtain corrected 3D bone data of the video.

For the specific implementation of steps 203 to 205, refer to the embodiment of steps 101 to 103 in fig. 1, which is not described herein again.

Optionally, after step 205 is executed, the following steps may also be executed:

(11) under the condition that the accumulated correction times of the video reach a first threshold value, the electronic equipment determines that the video completes video frame correction;

(12) and in the case that the accumulated correction times of the video are smaller than the first threshold value, the electronic equipment executes next error frame detection.

In the embodiment of the application, when the accumulated correction times of the video reach a certain number, the video is determined to finish the video frame correction. For example, the first threshold may be set to a multiple of 3, for example, 15 times. The first threshold may be an empirical value.

If the cumulative number of times of correction of the video is smaller than the first threshold, the electronic device continues to perform the next detection of the error frame, and continues to perform step 201. Steps 201 to 205 may be performed in a loop. After step 205 is executed, the above steps (11) and (12) may be executed, thereby determining whether to continue executing step 201.
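The alternation of steps 201 to 205 until the first threshold is reached can be outlined as below. This is only a control-flow sketch: `detect` and `correct` are hypothetical callables standing in for error frame detection and error frame correction, and the default of 15 is the example value mentioned above.

```python
from typing import Callable, List, TypeVar

Skeleton = TypeVar("Skeleton")


def correct_video(
    skeleton: Skeleton,
    detect: Callable[[Skeleton], List[int]],
    correct: Callable[[Skeleton, List[int], int], Skeleton],
    first_threshold: int = 15,
) -> Skeleton:
    """Alternate detection and correction until the cumulative number of
    corrections reaches first_threshold."""
    corrections = 0
    while corrections < first_threshold:
        flags = detect(skeleton)            # steps 201 to 203
        corrections += 1
        # Steps 204 and 205: the strategy depends on the correction count.
        skeleton = correct(skeleton, flags, corrections)
    return skeleton
```

Passing the running correction count to `correct` lets the caller pick the correction strategy for that pass without the loop needing to know what the strategies are.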

Optionally, after step 201 is executed and before step 202 is executed, the following steps may also be executed:

(21) the electronic device determines the cumulative number of times error frame detection has been performed on the video;

(22) in the case that the cumulative number of times error frame detection has been performed is less than or equal to M, the electronic device performs step 202; M is an integer greater than or equal to 3;

(23) under the condition that the number of times of performing error frame detection in an accumulated mode is larger than M, the electronic equipment updates the current detection result;

in step 202, the electronic device generates a detection result matrix according to the current detection result, which may specifically include the following steps:

and the electronic equipment generates the detection result matrix according to the updated current detection result.

In this embodiment of the application, if the current correction strategy is one of three correction strategies, namely the rotation error completion strategy, the forward and backward leaning completion strategy, and the hand-foot jumping completion strategy, M may be set to 3. For example, the three correction strategies can be applied in turn: if the cumulative number of error frame detections is 1, the rotation error completion strategy may be adopted when the error frames are corrected for the first time; if it is 2, the forward and backward leaning completion strategy may be adopted when the error frames are corrected for the second time; if it is 3, the hand-foot jumping completion strategy may be adopted when the error frames are corrected for the third time. Similarly, the fourth to sixth corrections apply the rotation error completion strategy, the forward and backward leaning completion strategy, and the hand-foot jumping completion strategy, in that order. The first three error frame detections and the first three corrections may be regarded as the first round of correction, the fourth to sixth corrections as the second round of correction, and so on. During the first round of correction, the three different types of errors (rotation error, forward and backward leaning, and hand-foot jumping) are corrected without updating the current detection result.

During the second and subsequent rounds of correction, the three types of errors have already been corrected once in the first round. To keep the current detection results of these later error frame detections from going wrong, the current detection result can be updated during the second and subsequent rounds. The update preserves the correct-frame results of the first round of detection, which prevents frames previously detected as correct from being re-detected as error frames and prevents the range of error frames from expanding. As a result, the error frame detection results can converge, and the efficiency of video frame correction is improved. This embodiment of the application gradually eliminates the error frames in the video by alternately performing video frame correction and error frame detection, so that the error frame optimization result is smoother.

Optionally, in step 201, the electronic device performs current error frame detection to obtain a current detection result, which may specifically include the following steps:

(31) the electronic equipment acquires original 3D bone data or corrected 3D bone data of the video;

wherein the original 3D bone data can be calculated by a neural network model, and the corrected 3D bone data is the data obtained from the most recent correction of the error frames.

(32) The electronic device calculates the key bone vectors of two consecutive frames in the video according to the original 3D bone data or the corrected 3D bone data of the video;

the key bone vectors can be used to measure whether a video frame is an error frame. For example, the key bone vectors may include any of the segment vectors ① to ⑩ shown in fig. 3. As long as the similarity of any one segment vector between two consecutive frames is smaller than its corresponding set threshold, the two consecutive frames can be judged to be error frames.

The 3D bone data of the video may include three-dimensional coordinate data of M key points for each frame. The M key points are key points of the skeleton. For example, as shown in fig. 3, the 3D bone data of a video may include three-dimensional coordinate data of 17 key points per frame. Bone vector ① can be calculated from the three-dimensional coordinates of key points 6 and 7 in fig. 3 (the direction of bone vector ① may be from key point 6 to key point 7, that is, key point 6 is the starting point and key point 7 is the end point, or from key point 7 to key point 6, that is, key point 7 is the starting point and key point 6 is the end point; the remaining bone vectors are treated analogously). Bone vector ② can be calculated from the three-dimensional coordinates of key points 9 and 10 in fig. 3, bone vector ③ from those of key points 5 and 6, bone vector ④ from those of key points 8 and 9, bone vector ⑤ from those of key points 12 and 13, bone vector ⑥ from those of key points 15 and 16, bone vector ⑦ from those of key points 11 and 12, bone vector ⑧ from those of key points 14 and 15, bone vector ⑨ from those of key points 5 and 8, and bone vector ⑩ from those of key points 0 and 1. To calculate the similarity from the bone vectors, the bone vectors of every frame of the video must be calculated in the same manner; for example, bone vector ① of every frame is the vector between key points 6 and 7, directed from key point 6 to key point 7.

(33) The electronic equipment calculates the similarity of corresponding key skeleton vectors between the two continuous frames;

in this embodiment of the application, the corresponding key bone vectors between two consecutive frames are the same segment vector in the two frames. For example, the similarity between bone vector ① of the first of the two consecutive frames and bone vector ① of the second can be calculated. The similarity can be calculated by means of the cosine distance, Euclidean distance, Manhattan distance, Pearson correlation coefficient, cosine similarity, and the like of the vectors. For example, the cosine distance of the corresponding key bone vectors between two consecutive frames can be calculated and taken as the similarity of the corresponding key bone vectors between the two frames. Here, the cosine distance refers to the cosine of the included angle between the two vectors, which ranges from -1 to 1. The larger the cosine distance, the higher the similarity of the two vectors; the smaller the cosine distance, the lower the similarity of the two vectors.

(34) If the similarity is smaller than a preset threshold corresponding to the key skeleton vector, the electronic equipment judges that the current detection result of the two continuous frames is an error frame;

(35) and if the similarity is greater than a preset threshold corresponding to the key skeleton vector, the electronic equipment judges that the current detection result of the two continuous frames is a correct frame.

In this embodiment of the application, different key bone vectors vary by different amounts during the actual motion of a person. Taking fig. 3 as an example, when the person rotates, bone vector ⑨ may change greatly, and therefore the preset threshold corresponding to bone vector ⑨ may be set relatively small, for example 0.3. When the person rotates or moves vigorously, bone vector ⑩ does not change much, and therefore the preset threshold corresponding to bone vector ⑩ may be set relatively large, such as 0.9 or 0.97. Because different key bone vectors can correspond to different preset thresholds, the accuracy of error frame detection can be improved and its error rate reduced.
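Steps (33) to (35) with per-vector thresholds can be sketched as follows. The sketch assumes the cosine of the included angle is used as the similarity and that both frames of a failing pair are flagged; the function names are hypothetical, and treating a zero-length bone vector as a similarity of -1 is an assumption made only to keep the function total.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the included angle between u and v, in the range -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return -1.0  # assumption: a degenerate bone counts as dissimilar
    return dot / (norm_u * norm_v)


def detect_error_frames(
    bones: Sequence[Sequence[Vector]], thresholds: Sequence[float]
) -> List[int]:
    """bones[t][k] is key bone vector k of frame t.
    Returns one flag per frame: 0 = correct frame, 1 = error frame."""
    flags = [0] * len(bones)
    for t in range(len(bones) - 1):
        for k, threshold in enumerate(thresholds):
            if cosine_similarity(bones[t][k], bones[t + 1][k]) < threshold:
                # Any single segment below its threshold flags both frames.
                flags[t] = flags[t + 1] = 1
                break
    return flags


# Example thresholds using values quoted in the description:
# 0.7 for segments 1 to 8, 0.3 for segment 9, 0.97 for segment 10.
EXAMPLE_THRESHOLDS = [0.7] * 8 + [0.3, 0.97]
```

The returned list plays the role of the detection result matrix, with one 0 or 1 entry per frame.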

The reliability of the key bone vectors calculated from the 3D bone data of the video is relatively high, and therefore the detection result matrix can be generated directly from the detection results determined in steps (34) and (35). For example, if 0 indicates a correct frame, 1 indicates an error frame, the video has 10 frames in total, the 6th, 7th and 8th frames are detected as error frames, and the other frames are detected as correct frames, then the detection result matrix of the video is [0, 0, 0, 0, 0, 1, 1, 1, 0, 0].

Optionally, in step (23), the electronic device updates the current detection result, which may specifically include the following steps:

(41) the electronic equipment maintains the detection result of the video frame which is the same as the first round of detection result in the current detection result; the first round of detection results comprise all error frames participating in correction after the previous M times of error frame detection and correct frames except all the error frames;

(42) the electronic equipment updates the detection result of the video frame of which the current detection result is an error frame and the first round detection result is a correct frame;

(43) and the electronic equipment maintains the detection result of the video frame of which the current detection result is a correct frame and the first round detection result is an error frame.

In the embodiment of the present application, the first round of detection results include all error frames participating in correction after performing error frame detection M times before and correct frames except for all error frames. The following description will be given by taking M as 3.

For example, let 0 represent a correct frame and 1 represent an error frame, and let the video have 10 frames in total. If the detection result matrix generated from the first error frame detection is [0, 0, 0, 0, 0, 1, 1, 1, 0, 0], the error frames participating in correction after the first error frame detection are the 6th to 8th frames. If the detection result matrix generated from the second error frame detection is [0, 0, 0, 0, 0, 0, 1, 1, 0, 0], the error frames participating in correction after the second error frame detection are the 7th to 8th frames. If the detection result matrix generated from the third error frame detection is [0, 0, 0, 1, 1, 1, 0, 0, 0, 0], the error frames participating in correction after the third error frame detection are the 4th to 6th frames. All error frames participating in correction after the first three error frame detections are therefore the 4th to 8th frames, and the correct frames are the 1st to 3rd and 9th to 10th frames. The first round of detection results is thus: the 1st to 3rd and 9th to 10th frames are correct frames; the 4th to 8th frames are error frames. The detection result matrix generated from the first round of detection results is [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]. If the current (fourth or later) detection result is that the 1st to 7th frames are correct frames and the 8th to 10th frames are error frames, the detection result matrix generated from the current detection result is [0, 0, 0, 0, 0, 0, 0, 1, 1, 1], and the current detection result is then updated.

Specifically, the video frames whose current detection result is the same as the first round of detection results are the 1st to 3rd and 8th frames; the video frames whose current detection result is an error frame and whose first round detection result is a correct frame are the 9th to 10th frames; and the video frames whose current detection result is a correct frame and whose first round detection result is an error frame are the 4th to 7th frames. Therefore, according to steps (41) to (43), the detection results of the 1st to 3rd and 8th frames in the current detection result are maintained (i.e., they are not changed), the detection results of the 9th to 10th frames in the current detection result are updated (i.e., they are changed from error frame to correct frame), and the detection results of the 4th to 7th frames in the current detection result are maintained (i.e., they are not changed). The updated current detection result is: the 1st to 7th and 9th to 10th frames are correct frames, and the 8th frame is an error frame. The detection result matrix generated from the updated current detection result is [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], which is the intersection (element-wise AND) of [0, 0, 0, 1, 1, 1, 1, 1, 0, 0] and [0, 0, 0, 0, 0, 0, 0, 1, 1, 1].
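The update in steps (41) to (43) amounts to two element-wise operations on 0/1 detection result matrices, which the following sketch reproduces for the 10-frame example. The function names are hypothetical; the first round result is taken as the union of the first M detection matrices and the update as its intersection with the current matrix, as the example describes.

```python
from typing import List, Sequence


def first_round_result(detections: Sequence[Sequence[int]]) -> List[int]:
    """Union of the first M detection matrices: a frame is an error frame in
    the first round if any of the first M detections flagged it."""
    return [int(any(column)) for column in zip(*detections)]


def update_current_result(current: Sequence[int], first_round: Sequence[int]) -> List[int]:
    """Keep a frame as an error frame only if it is flagged both in the
    current detection and in the first round (element-wise AND)."""
    return [c & f for c, f in zip(current, first_round)]


first_three = [
    [0, 0, 0, 0, 0, 1, 1, 1, 0, 0],  # first detection: frames 6 to 8
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],  # second detection: frames 7 to 8
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],  # third detection: frames 4 to 6
]
first_round = first_round_result(first_three)
assert first_round == [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

current = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # fourth or later detection
assert update_current_result(current, first_round) == [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```

Because the update can only clear flags, the set of error frames after the first round can shrink but never grow, which is what confines later corrections to the first round's error range.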

Video frame correction and error frame detection can be performed alternately in a loop: video frame correction is performed after each error frame detection, and the next error frame detection is performed after each video frame correction. In this embodiment of the application, when a video frame is a correct frame in the first round of detection results but an error frame in the current detection result, the current detection result of that video frame is updated from error frame to correct frame, so that the correct-frame results of the first round of detection are preserved. By comparing the current detection result with the first round of detection results, only frames that are error frames in the first round of detection results are corrected. The optimization of error frames is thus confined to the range of error frames included in the first round of detection results; the correct frames of the first round are not corrected, and the range of error frames does not expand. This prevents an overly strict threshold set in later detections from affecting correct frames outside the error frames, so the video as a whole is not adversely affected and the overall video frame correction effect is improved.

For a given cumulative number of error frame detections, the preset threshold corresponding to the first class vector is smaller than the preset threshold corresponding to the third class vector, and the preset threshold corresponding to the third class vector is smaller than the preset threshold corresponding to the second class vector. The first class vector, the second class vector and the third class vector are described below.

In this embodiment of the application, within the same error frame detection, different key bone vectors can correspond to different preset thresholds, so that the accuracy of error frame detection can be improved and its error rate reduced. For the same key bone vector segment, the set threshold can be raised gradually across successive error frame detections, so that error frames are optimized gradually with a coarse-to-fine strategy, which improves the error frame optimization effect. If a high threshold were set at the outset, the first error frame detection would detect too many error frames and the subsequent error frame detections would be difficult to converge, which would greatly reduce the error frame optimization effect.

For example, if the number of cycles is 5, the preset threshold corresponding to the third-class vector may be adjusted according to the parameters [0.7, 0.7, 0.8, 0.9, 0.9], that is, the preset threshold corresponding to the third-class vector is 0.7 for the first time and the second time, the preset threshold corresponding to the third-class vector is 0.8 for the third time, and the preset threshold corresponding to the third-class vector is 0.9 for the fourth time and the fifth time. For the first type of vector, the corresponding preset threshold value can be adjusted according to the parameters of [0.3, 0.3, 0.4, 0.5, 0.5 ]. For the second class of vectors, the corresponding preset threshold may be adjusted according to the parameters of [0.97, 0.97, 0.98, 0.98, 0.99 ].
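The coarse-to-fine schedule in this example can be written down as a small lookup table. The sketch below uses exactly the five-cycle values quoted above; the names `THRESHOLD_SCHEDULE` and `thresholds_for_cycle` are hypothetical, and rejecting cycle numbers outside 1 to 5 is an assumption, since the text does not say what happens beyond the listed cycles.

```python
from typing import Dict, List

# Preset thresholds per cycle (cycle 1 first), taken from the example above.
THRESHOLD_SCHEDULE: Dict[str, List[float]] = {
    "first_class": [0.3, 0.3, 0.4, 0.5, 0.5],       # left and right shoulder vector
    "third_class": [0.7, 0.7, 0.8, 0.9, 0.9],       # limb vectors
    "second_class": [0.97, 0.97, 0.98, 0.98, 0.99], # crotch root to spine center
}


def thresholds_for_cycle(cycle: int) -> Dict[str, float]:
    """Return the preset threshold of each vector class for a 1-based cycle."""
    n = len(THRESHOLD_SCHEDULE["first_class"])
    if not 1 <= cycle <= n:
        raise ValueError(f"cycle must be between 1 and {n}")
    return {name: values[cycle - 1] for name, values in THRESHOLD_SCHEDULE.items()}


# In every cycle: first class < third class < second class.
for c in range(1, 6):
    t = thresholds_for_cycle(c)
    assert t["first_class"] < t["third_class"] < t["second_class"]
```

Each list is non-decreasing, which matches the strategy of loosening detection first and tightening it in later cycles.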

Optionally, the key bone vector includes any one of a first class vector, a second class vector and a third class vector; the first class of vectors comprises left and right shoulder vectors; the second class of vectors comprises crotch bone center root node to spine center node vectors; the third class of vectors includes: at least one of a left forearm vector, a left upper arm vector, a left calf vector, a left thigh vector, a right forearm vector, a right upper arm vector, a right calf vector, and a right thigh vector.

For example, as shown in fig. 3, the first type of vector may include bone vector ninc, the second type of vector may include bone vector ninc, and the third type of vector may include at least one of bone vectors (r) to (r).

The embodiment of the application classifies the key bone vectors, so that it can better determine which of the key bone vectors are erroneous and better analyze the cause of the error in an error frame. The error frame can then be corrected according to that cause. For example, if at least one of bone vectors ① to ⑧ in the error frame is erroneous, the cause of the error is a hand and foot jump. If bone vector ⑨ in the error frame is erroneous, the cause of the error is a rotation error. If bone vector ⑩ in the error frame is erroneous, the cause of the error is forward or backward leaning (the body of the person abnormally leans forward or backward).
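The mapping from an erroneous vector class to an error cause can be written as a small table. The sketch below is illustrative only; the class labels and the function `error_causes` are hypothetical names, and the mapping itself follows the classification described above (shoulder vectors to rotation error, root-to-spine vector to forward or backward leaning, limb vectors to hand and foot jump).

```python
# Error cause per vector class, following the description above.
# The string labels are hypothetical identifiers chosen for this sketch.
ERROR_CAUSE_BY_CLASS = {
    "first_class": "rotation error",
    "second_class": "forward or backward leaning",
    "third_class": "hand and foot jump",
}


def error_causes(erroneous_classes):
    """Return the error causes for the given erroneous vector classes, without duplicates."""
    causes = []
    for vector_class in erroneous_classes:
        cause = ERROR_CAUSE_BY_CLASS[vector_class]
        if cause not in causes:
            causes.append(cause)
    return causes
```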

Optionally, if the current correction is the (3N+1)-th correction, the current correction strategy includes a rotation error completion strategy; if the current correction is the (3N+2)-th correction, the current correction strategy includes a forward lean and backward lean completion strategy; and if the current correction is the (3N+3)-th correction, the current correction strategy includes a hand and foot jump completion strategy, where N is an integer greater than or equal to 0.

In the embodiment of the application, different correction counts correspond to different correction strategies: the rotation error completion strategy is used for the first correction, the forward lean and backward lean completion strategy for the second, and the hand and foot jump completion strategy for the third, after which the three strategies are cycled in the same order. Among the error types of an error frame, a rotation error has the largest influence on the skeleton vectors, forward and backward leaning the second largest, and a hand and foot jump the smallest. The skeleton vectors with the larger error influence are therefore corrected first, using the rotation error completion strategy, which reduces their interference with the skeleton vectors with the smaller error influence and allows the latter to be corrected better afterwards. In other words, the vectors with larger error influence are corrected first, the vectors with smaller error influence are corrected next, and the corrections then alternate cyclically. Cycling multiple times makes the error frame optimization result smoother. The number of cycles can be set based on empirical values.
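The selection of a strategy from the correction count is a modulo-3 lookup. A minimal sketch follows; the strategy labels and the function name `current_strategy` are hypothetical, and the count is assumed to be 1-based as in the (3N+1), (3N+2), (3N+3) wording.

```python
# Strategy order for the (3N+1)-th, (3N+2)-th and (3N+3)-th corrections.
# The string labels are hypothetical identifiers chosen for this sketch.
STRATEGY_ORDER = (
    "rotation_error_completion",
    "forward_backward_lean_completion",
    "hand_foot_jump_completion",
)


def current_strategy(correction_count: int) -> str:
    """Return the strategy for a 1-based correction count."""
    if correction_count < 1:
        raise ValueError("correction_count starts at 1")
    return STRATEGY_ORDER[(correction_count - 1) % 3]
```

For example, `current_strategy(4)` selects the rotation error completion strategy again, because 4 = 3 × 1 + 1.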

The order of the rotation error completion and the forward lean and backward lean completion can be interchanged.

Optionally, if the current correction is the (3N+1)-th correction, the current correction strategy includes a forward lean and backward lean completion strategy; if the current correction is the (3N+2)-th correction, the current correction strategy includes a rotation error completion strategy; and if the current correction is the (3N+3)-th correction, the current correction strategy includes a hand and foot jump completion strategy, where N is an integer greater than or equal to 0.

In the embodiment of the application, different correction counts correspond to different correction strategies: the forward lean and backward lean completion strategy is used for the first correction, the rotation error completion strategy for the second, and the hand and foot jump completion strategy for the third, after which the three strategies are cycled in the same order.

Optionally, in step 204, the electronic device correcting the error frame according to the current correction strategy may specifically include: the electronic device corrects the error frame according to the rotation error completion strategy.

The electronic device correcting the error frame according to the rotation error completion strategy specifically includes the following steps:

(51) the electronic device performs spherical interpolation processing on the skeleton vectors other than the first-class, second-class, and third-class vectors among all skeleton vectors of the error frame;

(52) the electronic device performs spherical interpolation processing on the erroneous skeleton vectors among the key skeleton vectors.

In the embodiment of the present application, performing spherical interpolation processing on an erroneous skeleton vector among the key skeleton vectors of the error frame means removing the erroneous skeleton vector from the error frame and performing spherical interpolation according to the corresponding skeleton vectors in the N frames before and the N frames after the error frame, where N is a positive integer (for example, if the erroneous skeleton vectors of the error frame are the left and right shoulder vectors, spherical interpolation is performed on the left and right shoulder vectors in the N frames before and the N frames after the error frame). For example, spherical interpolation may be performed according to the corresponding skeleton vectors in the frame immediately before and the frame immediately after the error frame. Similarly, performing spherical interpolation processing on the other skeleton vectors of the error frame, that is, those other than the first-class, second-class, and third-class vectors, means removing those other skeleton vectors from the error frame and performing spherical interpolation according to the corresponding skeleton vectors in the N frames before and the N frames after the error frame.

In the rotation error completion strategy, spherical interpolation processing is first performed on the other skeleton vectors in the error frame, that is, the skeleton vectors other than the first-class, second-class, and third-class vectors (the other skeleton vectors are skeleton vectors related to the trunk, such as the vectors between key points 11 and 14, key points 1 and 2, key points 2 and 3, and key points 3 and 4 shown in fig. 3), and spherical interpolation processing is then performed on the erroneous skeleton vectors among the key skeleton vectors. In this way, smooth movement of the trunk in the error frame is ensured first, and the limbs of the error frame are corrected afterwards, which improves the correction effect of the error frame.

Because the length of a human bone remains constant, direct linear interpolation easily changes the bone length; and if the bone length is adjusted after linear interpolation, the bone no longer moves through a uniform angle between frames, so the frame still counts as an abrupt-change error frame. Compared with linear interpolation, spherical interpolation has the following benefits: (1) the modulus of the interpolated vector does not change abruptly; (2) the angular velocity is uniform, so the error frame is optimized better.

Referring to fig. 4, fig. 4 is a schematic diagram of spherical interpolation according to an embodiment of the present application. As shown in fig. 4, v1 is the skeleton vector of the previous frame, v2 is the skeleton vector of the next frame, v(t) is the skeleton vector inserted for the current frame by spherical interpolation, and v3 is the skeleton vector inserted for the current frame by linear interpolation. It can be seen that v(t) moves along the arc formed by the end points of v1 and v2. Because v1 and v2 belong to adjacent frames, they can be considered to be of equal length. With spherical interpolation, the bone length does not change abruptly, the interpolated vector can lie on the angular bisector of v1 and v2, and the angular velocity is uniform, so the error frame is optimized better.
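The difference between the two interpolation schemes can be checked numerically. The following is a minimal sketch using the standard spherical linear interpolation formula, which is assumed here to correspond to the spherical interpolation of fig. 4; the function names `slerp` and `lerp` are illustrative, and the inputs are assumed to be non-zero 3D vectors of equal length.

```python
import math


def lerp(v1, v2, t):
    """Linear interpolation: the result lies on the chord between v1 and v2."""
    return [(1.0 - t) * a + t * b for a, b in zip(v1, v2)]


def slerp(v1, v2, t):
    """Spherical interpolation: the result lies on the arc between v1 and v2."""
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    cos_omega = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    cos_omega = max(-1.0, min(1.0, cos_omega))  # guard against rounding
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        # (Anti)parallel vectors: the arc is not uniquely defined, fall back to lerp.
        return lerp(v1, v2, t)
    w1 = math.sin((1.0 - t) * omega) / sin_omega
    w2 = math.sin(t * omega) / sin_omega
    return [w1 * a + w2 * b for a, b in zip(v1, v2)]


def length(v):
    return math.sqrt(sum(x * x for x in v))
```

For two perpendicular unit vectors and t = 0.5, `slerp` returns a vector of length 1 on the angular bisector, while `lerp` returns a vector on the same bisector whose length has shrunk to about 0.707, which is the bone-length change the text warns about.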

Optionally, in step 204, the electronic device correcting the error frame according to the current correction strategy may specifically include: the electronic device corrects the error frame according to the forward lean and backward lean completion strategy.

The electronic device correcting the error frame according to the forward lean and backward lean completion strategy specifically includes the following steps:

(61) the electronic device performs spherical interpolation processing on the skeleton vectors other than the first-class, second-class, and third-class vectors among all skeleton vectors of the error frame, and performs spherical interpolation processing on the erroneous skeleton vectors among the first-class, second-class, and third-class vectors;

(62) if the second-class vector among the key skeleton vectors is an erroneous skeleton vector, the electronic device calculates the rotation matrix between the second-class vector among the key skeleton vectors and a target vector, where the target vector is the second-class vector in the correct frame of the video whose frame number differs least from that of the error frame;

(63) the electronic device corrects all skeleton vectors of the error frame according to the rotation matrix.

In the embodiment of the present application, for the specific implementation of spherical interpolation, reference may be made to the description of steps (51) and (52), which is not repeated here. Because error frame detection is performed again before the forward lean and backward lean completion strategy, the vectors related to the trunk may have changed slightly. In this strategy, spherical interpolation processing is therefore first performed on the skeleton vectors of the error frame other than the first-class, second-class, and third-class vectors, and on the erroneous skeleton vectors among the first-class, second-class, and third-class vectors, so that the trunk moves smoothly. Then the rotation matrix between the erroneous second-class vector among the key skeleton vectors and the target vector is calculated, the correct skeleton vectors among the key skeleton vectors are retained, and all skeleton vectors of the error frame are corrected by the rotation matrix, which restores the overall posture of the person and completes the forward lean and backward lean completion. Because the other vectors are adjusted into place by spherical interpolation before the posture is restored by the rotation matrix, the restoration of the overall posture of the person is improved.

The rotation matrix is a matrix that, when multiplied by a vector, changes the direction of the vector without changing its magnitude, and preserves chirality.

The target vector is the second-class vector in the correct frame of the video whose frame number differs least from that of the error frame. For example, suppose the error frame is the 100th frame, the 95th to 99th frames are error frames, the 90th to 94th frames are correct frames, the 101st to 102nd frames are error frames, and the 103rd to 105th frames are correct frames. The correct frame whose frame number differs least from that of the error frame (the 100th frame) is then the 103rd frame, so the target vector is the second-class vector in the 103rd frame.
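The frame selection in this example can be sketched as a nearest-neighbour search over the correct frames. The function name `nearest_correct_frame` and the frame-number-to-flag mapping are hypothetical; preferring the earlier frame when two correct frames are equally close is an assumption, since the source does not state a tie-breaking rule.

```python
from typing import Mapping


def nearest_correct_frame(error_frame: int, is_error: Mapping[int, bool]) -> int:
    """Return the number of the correct frame closest to error_frame."""
    correct = [frame for frame, flag in is_error.items() if not flag]
    if not correct:
        raise ValueError("the video contains no correct frame")
    # Sort key: distance first, then frame number (assumed tie-break: earlier frame).
    return min(correct, key=lambda frame: (abs(frame - error_frame), frame))


# Frame layout from the example above: 90-94 correct, 95-102 error, 103-105 correct.
flags = {frame: False for frame in range(90, 95)}
flags.update({frame: True for frame in range(95, 103)})
flags.update({frame: False for frame in range(103, 106)})
print(nearest_correct_frame(100, flags))  # prints 103
```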

Optionally, in step 204, the electronic device correcting the error frame according to the current correction strategy may specifically include: the electronic device corrects the error frame according to the hand and foot jump completion strategy.

The electronic device correcting the error frame according to the hand and foot jump completion strategy specifically includes the following steps:

(71) if the third-class vectors among the key skeleton vectors contain erroneous skeleton vectors, the electronic device performs spherical interpolation processing on the erroneous skeleton vectors contained in the third-class vectors of the error frame;

(72) the electronic device performs spherical interpolation processing on the skeleton vectors of the error frame other than the third-class vectors.

In the embodiment of the present application, for the specific implementation of spherical interpolation, reference may be made to the description of steps (51) and (52), which is not repeated here. In the hand and foot jump completion strategy, spherical interpolation processing is first performed on the erroneous skeleton vectors contained in the third-class vectors of the error frame, while the correct skeleton vectors contained in the third-class vectors are kept unchanged. Because the rotation error completion and the forward lean and backward lean completion have already been performed, interpolating the erroneous third-class vectors, which relate to the hands and feet, completes the hand and foot jump. Because error frame detection is performed again before the hand and foot jump completion strategy, the vectors related to the trunk may have changed slightly; spherical interpolation processing is therefore also performed on the skeleton vectors of the error frame other than the third-class vectors, so that the trunk moves smoothly.

In one embodiment, the pseudo code of the overall flow of the embodiments of the present application is as follows:

i ← 0
while i < 5
    Error frame matrix detection (corresponding to steps 201 and 202)
    Rotation error completion (corresponding to steps (51) and (52))
    Error frame matrix detection (corresponding to steps 201 and 202)
    Forward lean and backward lean completion (corresponding to steps (61), (62) and (63))
    Error frame matrix detection (corresponding to steps 201 and 202)
    Hand and foot jump completion (corresponding to steps (71) and (72))
    i ← i + 1
end while

According to the embodiment of the application, cycling multiple times while gradually raising the preset threshold makes the error frame optimization result smoother.
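The control flow of the pseudo code can be expressed as a small driver in which the detection and the three completion strategies are passed in as callables. This is a sketch of the loop structure only: `run_correction`, `detect`, and `strategies` are hypothetical names, and the callables stand in for steps 201 and 202 and steps (51) to (72), which are not implemented here.

```python
from typing import Callable, Sequence


def run_correction(
    detect: Callable[[int], None],
    strategies: Sequence[Callable[[], None]],
    cycles: int = 5,
) -> int:
    """Run detection before every strategy, for the given number of cycles.

    detect receives the 1-based cycle number so that it can pick the
    preset thresholds for that cycle. Returns the number of corrections.
    """
    corrections = 0
    for cycle in range(1, cycles + 1):
        for strategy in strategies:
            detect(cycle)   # error frame matrix detection
            strategy()      # one completion strategy
            corrections += 1
    return corrections


# Usage with stand-in callables that only record the call order.
log = []
total = run_correction(
    detect=lambda cycle: log.append(("detect", cycle)),
    strategies=[
        lambda: log.append("rotation"),
        lambda: log.append("lean"),
        lambda: log.append("hand_foot"),
    ],
)
```

With the default of five cycles this performs fifteen detections and fifteen corrections, alternating in the order shown in the pseudo code.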

The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It can be understood that, to realize the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that the present application can be implemented in hardware or in a combination of hardware and computer software, in conjunction with the illustrative units and algorithm steps described in the embodiments provided herein. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation should not be interpreted as going beyond the scope of the present application.

In the embodiment of the present application, the electronic device may be divided into the functional units according to the method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present application is schematic, and is only a logic function division, and there may be another division manner in actual implementation.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a video frame correction apparatus 500 according to an embodiment of the present disclosure. The video frame correction apparatus 500 is applied to an electronic device and may include a determining unit 501 and a correcting unit 502, where:

a determining unit 501, configured to determine an error frame in a video according to the detection result matrix;

the determining unit 501 is further configured to determine a current correction strategy;

a correcting unit 502, configured to correct the error frame according to the current correction strategy, so as to obtain corrected 3D skeleton data of the video.

Optionally, the video frame correction apparatus 500 may further include an error frame detection unit 503 and a matrix generation unit 504;

the error frame detection unit 503 is configured to execute current error frame detection before the determination unit 501 determines an error frame in the video according to the detection result matrix, so as to obtain a current detection result;

a matrix generating unit 504, configured to generate the detection result matrix according to the current detection result.

Optionally, the determining unit 501 is configured to determine that the video completes video frame correction when the cumulative number of times of correction of the video reaches a first threshold after the correcting unit 502 corrects the error frame according to the current correction policy;

the error frame detection unit 503 is further configured to perform next error frame detection when the accumulated correction number of times of the video is smaller than the first threshold.

Optionally, the video frame correction apparatus 500 may further include a detection result modification unit 505.

The detection result modifying unit 505 is configured to modify, after the error frame detecting unit 503 performs current-time error frame detection to obtain a current-time detection result, the continuous correct frames into error frames under the condition that the number of continuous correct frames in the current-time detection result is less than or equal to a second threshold, so as to obtain a modified current-time detection result;

the matrix generating unit 504 generates the detection result matrix according to the current detection result, including: and generating the detection result matrix according to the modified current detection result.
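The modification performed by the detection result modifying unit 505 amounts to relabeling short runs of correct frames as error frames. A minimal sketch follows; the function name `absorb_short_correct_runs` and the boolean-list representation are hypothetical, and treating runs at the start or end of the video like any other run is an assumption, since the source does not single them out.

```python
from typing import List


def absorb_short_correct_runs(is_error: List[bool], second_threshold: int) -> List[bool]:
    """Relabel every run of consecutive correct frames no longer than second_threshold."""
    result = list(is_error)
    n = len(is_error)
    i = 0
    while i < n:
        if is_error[i]:
            i += 1
            continue
        # Find the end of the current run of correct frames.
        j = i
        while j < n and not is_error[j]:
            j += 1
        if j - i <= second_threshold:
            for k in range(i, j):
                result[k] = True  # short run: treat these frames as error frames
        i = j
    return result
```

For example, with a second threshold of 2, the result `[E, C, C, E, C, C, C, C, E]` (E for error, C for correct) becomes `[E, E, E, E, C, C, C, C, E]`: the two-frame run is absorbed and the four-frame run is kept.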

Optionally, the video frame correction apparatus 500 may further include an updating unit 506.

The determining unit 501 is further configured to determine the number of times of performing error frame detection cumulatively for the video after the error frame detecting unit 503 performs current error frame detection to obtain a current detection result;

the matrix generating unit 504 is further configured to generate the detection result matrix according to the current detection result when the number of times of performing error frame detection in the accumulated manner is less than or equal to M; m is an integer greater than or equal to 3;

the updating unit 506 is configured to update the current detection result when the number of times of performing the error frame detection cumulatively is greater than M;

the matrix generating unit 504 generates the detection result matrix according to the current detection result, including: and generating the detection result matrix according to the updated current detection result.

Optionally, the performing, by the error frame detecting unit 503, a current error frame detection to obtain a current detection result includes: acquiring 3D bone data of the video; the 3D bone data comprises original 3D bone data or revised 3D bone data; calculating key skeleton vectors of two continuous frames in the video according to the 3D skeleton data of the video; calculating the similarity of corresponding key skeleton vectors between the two continuous frames; under the condition that the similarity is smaller than a preset threshold corresponding to the key skeleton vector, judging that the current detection result of the two continuous frames is an error frame; and under the condition that the similarity is greater than a preset threshold corresponding to the key skeleton vector, judging that the current detection result of the two continuous frames is a correct frame.
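The per-vector detection described here can be sketched as follows. Several points are assumptions not fixed by the source: cosine similarity is used as the similarity measure, the later frame of each consecutive pair is the one flagged, the first frame is treated as correct, and a similarity exactly equal to the threshold counts as correct. The names `detect_error_matrix` and `cosine_similarity` and the vector name `left_forearm` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def detect_error_matrix(
    frames: List[Dict[str, Vector]],
    thresholds: Dict[str, float],
) -> Dict[str, List[bool]]:
    """Return, per key bone vector, one error flag per frame.

    frames[i] maps a key bone vector name to its 3D vector in frame i.
    """
    matrix = {name: [False] * len(frames) for name in thresholds}
    for i in range(1, len(frames)):
        for name, threshold in thresholds.items():
            similarity = cosine_similarity(frames[i - 1][name], frames[i][name])
            matrix[name][i] = similarity < threshold
    return matrix


frames = [
    {"left_forearm": [1.0, 0.0, 0.0]},
    {"left_forearm": [1.0, 0.1, 0.0]},
    {"left_forearm": [0.0, 1.0, 0.0]},
]
matrix = detect_error_matrix(frames, {"left_forearm": 0.7})
```

In this usage example the forearm barely moves between the first two frames and then turns by almost 90 degrees, so only the third frame is flagged.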

Optionally, the updating unit 506 updates the current detection result, including: maintaining the detection result of the video frame which is the same as the first round detection result in the current detection result; the first round of detection results comprise all error frames participating in correction after the previous M times of error frame detection and correct frames except all the error frames; updating the detection result of the video frame of which the current detection result is an error frame and the first round detection result is a correct frame; and maintaining the detection result of the video frame of which the current detection result is a correct frame and the first round detection result is an error frame.

the first class of vectors comprises left and right shoulder vectors;

Optionally, if the current correction times are (3N +1) times, the current correction strategy includes a rotation error completion strategy;

Optionally, the correcting unit 502 correcting the error frame according to the current correction strategy includes: in a case where the current correction strategy includes a rotation error completion strategy, performing spherical interpolation processing on the skeleton vectors other than the first-class, second-class, and third-class vectors among all skeleton vectors of the error frame, and performing spherical interpolation processing on the erroneous skeleton vectors among the key skeleton vectors.

Optionally, the correcting unit 502 correcting the error frame according to the current correction strategy includes: in a case where the current correction strategy includes a forward lean and backward lean completion strategy, performing spherical interpolation processing on the skeleton vectors other than the first-class, second-class, and third-class vectors among all skeleton vectors of the error frame, and performing spherical interpolation processing on the erroneous skeleton vectors among the first-class, second-class, and third-class vectors; in a case where the second-class vector among the key skeleton vectors is an erroneous skeleton vector, calculating the rotation matrix between the second-class vector among the key skeleton vectors and a target vector, where the target vector is the second-class vector in the correct frame of the video whose frame number differs least from that of the error frame; and correcting all skeleton vectors of the error frame according to the rotation matrix.

Optionally, the correcting unit 502 correcting the error frame according to the current correction strategy includes: in a case where the current correction strategy includes a hand and foot jump completion strategy and the third-class vectors among the key skeleton vectors contain erroneous skeleton vectors, performing spherical interpolation processing on the erroneous skeleton vectors contained in the third-class vectors of the error frame; and performing spherical interpolation processing on the skeleton vectors of the error frame other than the third-class vectors.

The determining unit 501, the correcting unit 502, the error frame detecting unit 503, the matrix generating unit 504, the detection result modifying unit 505, and the updating unit 506 in the embodiment of the present application may be a processor of the electronic device.

In the embodiment of the application, the error frames in the video can be determined according to the detection result matrix and only those error frames are corrected, so that the video frames that were already well represented are not affected, which improves the correction effect of the video frames.

Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, as shown in fig. 6, the electronic device 600 includes a processor 601 and a memory 602, and the processor 601 and the memory 602 may be connected to each other through a communication bus 603. The communication bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 603 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus. The memory 602 is used for storing a computer program comprising program instructions, the processor 601 being configured for invoking the program instructions, the program comprising instructions for performing the method shown in fig. 1 or 2.

The processor 601 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.

The memory 602 may be, but is not limited to, a Read-Only Memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or another type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integrated with the processor.

The electronic device 600 may further include a display screen, a speaker, and the like, and may further include a radio frequency circuit, an antenna, and the like.

Embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any one of the methods as described in the above method embodiments.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division of logical functions, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.

The integrated unit, if implemented in the form of a software program module and sold or used as a stand-alone product, may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, which may include: flash memory disks, read-only memory, random access memory, magnetic or optical disks, and the like.

The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method for video frame modification, comprising:

determining error frames in the video according to the detection result matrix;

determining a current correction strategy;

2. The method of claim 1, wherein before determining the erroneous frame in the video according to the detection result matrix, the method further comprises:

3. The method according to claim 1 or 2, wherein after the correcting the error frame according to the current correction strategy, the method further comprises:

4. The method of claim 2, wherein after performing the current-time error frame detection and obtaining the current-time detection result, the method further comprises:

5. The method of claim 2, wherein after performing the current-time error frame detection and obtaining the current-time detection result, the method further comprises:

executing the step of generating the detection result matrix according to the current detection result when the cumulative number of error frame detections performed is less than or equal to M, where M is an integer greater than or equal to 3;

and generating the detection result matrix according to the updated current detection result.

6. The method according to any one of claims 2, 4, and 5, wherein the performing the current-time error frame detection to obtain the current-time detection result comprises:

7. The method of claim 5, wherein the updating the current detection result comprises:

8. The method of claim 6, wherein the predetermined threshold corresponding to the key bone vector is determined based on the key bone vector and the cumulative number of error frame detections performed;

9. The method of claim 8, wherein the key bone vectors comprise any of a first class of vectors, a second class of vectors, and a third class of vectors;

the first class of vectors comprises left and right shoulder vectors;

10. The method of claim 9,

11. The method of claim 10, wherein the correcting the error frame according to the current correction strategy comprises:

12. The method of claim 10, wherein the correcting the error frame according to the current correction strategy comprises:

13. The method of claim 10, wherein the correcting the error frame according to the current correction strategy comprises:

14. A video frame correction apparatus, comprising:

15. An electronic device comprising a processor and a memory, the memory for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1 to 13.

16. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method according to any one of claims 1 to 13.