CN113780206B - Video image analysis processing method - Google Patents

Video image analysis processing method

Info

Publication number
CN113780206B
CN113780206B
Authority
CN
China
Prior art keywords
video
data
frame
point
ball
Prior art date
Legal status
Active
Application number
CN202111084916.6A
Other languages
Chinese (zh)
Other versions
CN113780206A (en)
Inventor
周环
吴泽徐
王书琪
Current Assignee
Fujian Pingtan Ruiqian Intelligent Technology Co ltd
Original Assignee
Fujian Pingtan Ruiqian Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Pingtan Ruiqian Intelligent Technology Co ltd
Priority to CN202111084916.6A
Publication of CN113780206A
Application granted
Publication of CN113780206B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video image analysis processing method, belonging to the technical field of video image processing. The method comprises the following steps: 1) arranging a video acquisition device on the field, with the user wearing a sensing device; 2) acquiring video data and sensing data of the whole action with the video acquisition device and the sensing device, and preprocessing the video data; 3) extracting joint points from the preprocessed video, and obtaining force application points and force application times from the sensing data; 4) performing inter-frame pairing between the video and the sensing device, and performing visualization processing to derive a visualization model. By combining the two modalities of sensing data and video analysis and fusing the sensing data with the video data, the invention achieves multi-source data fusion, provides richer motion data for users, compensates for uncertainty during motion, and remedies the defects of the prior art, in which video data alone is too limited, lacks precision, cannot be quantified, and cannot support recommendations for customized training plans.

Description

Video image analysis processing method
Technical Field
The invention belongs to the technical field of video image processing, and particularly relates to a video image analysis processing method.
Background
Baseball traces its origins to rounders, a bat-and-ball game popular in 15th-century Britain, and later spread to the United States. In 1839, the first baseball game was held in Cooperstown, New York. The international baseball federation was recognized by the International Olympic Committee in 1978, and in 1994 the International Baseball Association established its headquarters in Lausanne, Switzerland; baseball became an Olympic event in 1992. As for player training, traditional baseball training is coach-centered: the coach customizes a training plan from the player's movements, postures, hitting distances and the like. This approach is overly subjective, many data and parameters are limited by human subjective factors, and the effort often yields only half the result. Nowadays more and more researchers at home and abroad are devoted to developing auxiliary training systems, among which golf accounts for the largest share. The auxiliary training systems developed for golf are applicable to baseball, but the shortcomings of the prior art are obvious.
For example, two prior-art references, CN201810555225.1 (a system, method, device and medium for movement assistance based on student movement) and CN201910683497.4 (a method for estimating golf swing scores based on posture comparison), are both based on video processing: an algorithm scores the user's posture against standard postures in a standard database. This exploits the strengths of the algorithm but has a significant drawback: it ignores individual differences. Muscle strength and posture dynamics differ from person to person, so the scores actually obtained differ, and if everyone is evaluated against a standard database, no one can be trained according to his or her own characteristics. The prior-art Zepp device solves the problem of data feedback to a certain extent, but it uses a three-axis gyroscope, so its data precision is limited. Moreover, the prior art can only match postures against a standard library; data and video are not fused, and no effective data feedback is provided, so prior-art schemes can only mechanically indicate the difference between the user's posture and the standard posture and offer video playback. Meanwhile, the standard posture is often the best result achieved by one particular person under particular conditions; it cannot be quantized into data and applied to customized training for the user.
Disclosure of Invention
Technical problem to be solved
The invention addresses the defects of the prior art, in which video data alone is too limited, lacks precision, cannot be quantified, and cannot support recommendations for customized training plans.
(II) technical scheme
The invention is realized by the following technical scheme: a video-based image analysis processing method comprising the following steps:
step 100: arranging a video acquisition device on a field, and wearing a sensing device by a user;
step 200: the video acquisition device and the sensing device acquire video data and sensing data of the whole action and preprocess the video data;
step 300: extracting joint points of the preprocessed video, and acquiring force application points and force application time according to the sensing data;
step 400: and carrying out interframe pairing on the video and the sensing device, and carrying out visual processing to derive a visual model.
As a further explanation of the above scheme, the video acquisition device in step 100 includes a depth camera and a high-speed camera;
the depth camera comprises an infrared emitter, an RGB camera and a 3D structured light depth sensor.
As a further explanation of the above solution, the sensing devices in step 100 are inertial measurement units, arranged on the user's hand and waist respectively;
the sensing devices are used to acquire the rotation speed, angular velocity and acceleration of the user's hand and waist during movement, as well as the instantaneous velocity and instantaneous position of the main force application point.
As a further description of the above scheme, the preprocessing of the video data in step 200 specifically uses a background extraction algorithm to extract the moving target: the first frame in which the target starts to move is taken as the start frame of the video, the last frame in which the target finishes moving is taken as the end frame, and the video segment between the start frame and the end frame is extracted to complete preprocessing.
As a further illustration of the above solution, the joint point extraction in step 300 comprises the following steps:
step 310: extracting the preprocessed video;
step 320: detecting a target by using an algorithm;
step 330: openCV detection key points;
step 340: and outputting the result.
As a further description of the above scheme, the acquisition of the force application point and the force application time in step 300 is specifically divided into the following two parts:
1) Batting: when the user swings the bat to hit the ball, the bat is affected by the impulse of the ball at the moment of impact, so the swing speed and the acceleration of the hand change. This change locates the force application time: the time point at which the change begins is taken as the force application point and as the start of the force application time, and the time point at which the user's applied force offsets the impulse of the ball is taken as the end of the force application time;
2) Pitching: momentum is imparted to the ball when the user throws it; the speed and acceleration of the hand change at the moment of release, and the time point at which the change begins is taken as the force application point of the pitch.
As a further illustration of the above scheme, the inter-frame pairing in step 400 specifically comprises:
Step 410: carrying out static judgment before the user starts to move to obtain an initial value of the sensing device;
step 420: when the sensing device detects the change, the change is used as a starting point of sensing data; pairing the starting point with the first frame in the preprocessed video, and taking the last frame of the preprocessed video as the end;
step 430: and clearing the data acquired after the last frame of the preprocessed video.
As a further explanation of the above scheme, the visualization processing in step 400 specifically pairs and fuses the obtained sensing data with the skeletal joint point data; after the above steps are completed, fused data of the sensing data and the skeletal joint points and fused data of the video data and the skeletal joint points are obtained; both are transmitted to the user terminal.
(III) advantageous effects
Compared with the prior art, the invention has the following beneficial effects:
the invention utilizes two modes of sensing data and video analysis and processing to fuse the sensing data and the video data, realizes multivariate data fusion, provides more motion data for users, makes up uncertainty during motion, and solves the defects that the video data is too single, the precision is insufficient, the data can not be quantized, and suggestions are provided for customized schemes in the prior art.
Detailed Description
Examples
Step 100: arranging a video acquisition device on a field, and wearing a sensing device by a user;
the video acquisition device comprises a depth camera and a high-speed camera; the depth camera comprises an infrared emitter, an RGB camera and a 3D structured light depth sensor.
It should be further noted that this embodiment uses the Kinect V2 released by Microsoft Corporation, whose depth module is composed of an infrared emitter, a color RGB camera and an infrared CMOS camera. The infrared emitter actively projects modulated near-infrared light; when the light strikes an object in the field of view it is reflected, and the infrared camera receives the reflection. Depth is measured with time-of-flight (TOF) technology: the travel time of the light is calculated (usually from the phase difference), and from it the depth of the object (i.e., its distance from the depth camera) is obtained.
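As an illustration of the TOF relation just described, the following minimal Python sketch converts a measured phase difference into a depth; the continuous-wave model and the 80 MHz modulation frequency are illustrative assumptions, not Kinect V2 specifications.

import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_shift_rad: float, f_mod_hz: float = 80e6) -> float:
    """Depth from the phase difference between emitted and received light.

    The light travels to the object and back, hence the factor of 2.
    """
    round_trip_time = phase_shift_rad / (2 * math.pi * f_mod_hz)
    return C * round_trip_time / 2

# Example: a phase shift of pi/2 at 80 MHz modulation gives about 0.468 m
print(round(tof_depth(math.pi / 2), 3))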
The sensing devices are arranged on the user's hand and waist respectively; they are used to acquire the rotation speed, angular velocity and acceleration of the user's hand and waist during movement, as well as the instantaneous velocity and instantaneous position of the main force application point.
It should be further noted that the inertial measurement unit adopted in this embodiment is the model TAS601 released by the No. 707 Research Institute of China State Shipbuilding Corporation. The device outputs the pitch angle and roll angle relative to the horizontal plane in a geodetic coordinate system, the angular rates about the three IMU axes, and data for the three motion directions, and it has the characteristics of small size, high precision and low power consumption. Measuring 30 mm long, 30 mm wide and 11 mm high and weighing less than 10 g, its small size and low weight minimize any influence on the user. It should be further explained that the inertial measurement unit integrates a communication unit, a power supply unit and a storage unit; the specific integration method is prior art and is not described in detail.
Step 200: the video acquisition device and the sensing device acquire video data and sensing data of the whole action and preprocess the video data; specifically, the video data is preprocessed by extracting a moving target by using a background extraction algorithm, taking a first frame when the moving target starts moving as a starting frame of a video, taking a last frame when the moving target finishes moving as an ending frame of the video, and intercepting a video segment between the starting frame and the ending frame to finish preprocessing.
It should be further noted that a characteristic of baseball is that when the ball flies toward the batter, the batter focuses on the ball and is therefore relatively static, or moving with very small amplitude; this embodiment therefore uses the inter-frame difference method for extraction. Its advantage is that when there is no moving object in the scene, the change between consecutive frames is very weak, while the change is obvious when there is, so a moving target can be detected quickly and the first frame of the video established. The algorithm principle is as follows:
A_i = |B_i - B_(i-1)|

where A_i is the difference image used to identify the first frame of the video, B_i is the i-th frame and B_(i-1) is the (i-1)-th frame. To ensure that the first frame is extracted accurately, a threshold C is set: when |B_i - B_(i-1)| > C, frame i is set as the first frame of the video, and the end frame of the video is established in the same way;
it should be further noted that, generally, the time from when the ball enters the picture to when the user swings the stick is short, but the angle of the device placement varies from person to person, so that the problem of erroneous judgment caused by the fact that the ball enters the picture but the user has not yet made a response is solved, the method adopted in this embodiment further sets the threshold C, because the principle of the inter-frame difference method is that when abnormal object motion occurs in the monitored scene, a relatively obvious difference occurs between frames, two frames are subtracted to obtain an absolute value of the brightness difference of the two frames, whether the absolute value is greater than the threshold is judged to analyze the motion characteristic of the video or image sequence, whether there is object motion in the image sequence is determined, and the area occupied by the person in one image is several times of the ball, so that the threshold is directly further constrained, and the influence of the ball can be avoided; it should be further explained that the above method is not the only method, and the setting of the start-stop frame can be processed by optical flow field algorithm, video object segmentation algorithm, etc.;
step 300: extracting joint points of the preprocessed video, and acquiring force application points and force application time according to the sensing data; as a further illustration of the above solution, the step 300 of extracting the joint point includes the following steps:
step 310: extracting the preprocessed video;
step 320: detecting a target by using an algorithm;
step 330: openCV detection key points;
step 340: and outputting the result.
It should be further explained that in this step the Kinect skeleton images are extracted with the OpenCV tool. The stored images correspond to each frame of the preprocessed video and are spliced back into a video whose frame count and bit rate match those of the preprocessed video, which facilitates subsequent processing;
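A minimal sketch of step 330 is given below, detecting skeletal key points per frame with OpenCV's DNN module; it assumes an OpenPose-style Caffe model has been downloaded locally, and the file names, input size and confidence threshold are illustrative.

import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("pose_deploy.prototxt", "pose_iter_440000.caffemodel")

def detect_keypoints(frame: np.ndarray, n_points: int = 18, thresh: float = 0.1):
    """Return a list of (x, y) key points, or None where confidence is low."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 255, (368, 368), (0, 0, 0),
                                 swapRB=False, crop=False)
    net.setInput(blob)
    heatmaps = net.forward()  # shape: (1, parts, H', W')
    points = []
    for i in range(n_points):
        heatmap = heatmaps[0, i, :, :]
        _, conf, _, peak = cv2.minMaxLoc(heatmap)
        # rescale the peak from heatmap coordinates to image coordinates
        x = int(w * peak[0] / heatmap.shape[1])
        y = int(h * peak[1] / heatmap.shape[0])
        points.append((x, y) if conf > thresh else None)
    return points

Running this on every frame of the preprocessed video and splicing the annotated frames back together yields the skeleton video described above.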
the step 300 of obtaining the force point and the force time is specifically divided into the following two parts:
1) Batting: when the user swings the bat to hit the ball, the bat is affected by the impulse of the ball at the moment of impact, so the swing speed and the acceleration of the hand change. This change locates the force application time: the time point at which the change begins is taken as the force application point and as the start of the force application time, and the time point at which the user's applied force offsets the impulse of the ball is taken as the end of the force application time;
2) Pitching: momentum is imparted to the ball when the user throws it; the speed and acceleration of the hand change at the moment of release, and the time point at which the change begins is taken as the force application point of the pitch.
The above principle is as follows:
since the speed value obtained by the sensor is continuous and is used as the point of attack and the starting point of attack when the bat hits a baseball, the following steps are required to complete the extraction of data:
step 350: discretizing data;
step 360: searching a wave trough;
step 370: extracting time points;
it should be further noted that, in this embodiment, the frame number of the video data acquired by the KinectV2 is 30fps, so that in step 350, the discretization is performed by using an equal-width method, and the distribution interval of the equal-width method is performed by using 1/30s of each interval to perform equal-width discretization, so as to facilitate the subsequent pairing with the video data;
after discretization, searching a peak valley in the discrete data by utilizing a second derivative, and obtaining a point y of a minimum numerical value when a user strikes a ball at the moment, wherein the point of the peak valley is used as an end point of a force application time when the user applies force to offset the impulse of the ball;
and when the ball is hit, the corresponding speed data is obtained by the following formula:
v(x)<v(x-1)&&v(x-1)<v(x-2)&&v(x-2)<v(x-3)
v (x) is the speed at the x-th time point, and F (x) is the force point when the above formula is satisfied;
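A minimal Python sketch of steps 350-370 under the description above; the sample-timestamp layout, the averaging policy within each 1/30 s bin, and the reduction of the trough search to a minimum after the onset are illustrative assumptions.

import numpy as np

FRAME_DT = 1.0 / 30.0  # bin width matching the 30 fps video

def discretize(timestamps: np.ndarray, speeds: np.ndarray) -> np.ndarray:
    """Step 350: average the speed samples that fall into each 1/30 s bin."""
    bins = ((timestamps - timestamps[0]) // FRAME_DT).astype(int)
    v = np.zeros(bins.max() + 1)
    for b in range(len(v)):
        mask = bins == b
        if mask.any():
            v[b] = speeds[mask].mean()
    return v  # v[x] is the speed at the x-th frame-aligned time point

def find_force_window(v: np.ndarray):
    """Steps 360-370: return (x, y), the force application point and end point."""
    for i in range(3, len(v)):
        # v(x) < v(x-1) < v(x-2) < v(x-3): onset of the deceleration
        if v[i] < v[i - 1] < v[i - 2] < v[i - 3]:
            x = i  # F(x): first index satisfying the condition above
            y = x + int(np.argmin(v[x:]))  # trough taken as the minimum after x
            return x, y
    return None

With x and y in hand, the force application time of the next formula is simply T = (y - x)/30 s.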
the force application time can be calculated by the following formula:
T = T(y) - T(x)

where T denotes the duration of force application, T(x) the time of the force application point, and T(y) the time of the force application end point;
having obtained the T variable from the above description, the amount of impulse and force applied to the ball after being hit can be calculated when the weight of the baseball is known, and the formula is as follows:
FT = p_1 - p_2 = m·v_1 - m·v_2

where F is the resultant of all external forces on the ball at the moment of impact, including gravity; T is the acting time of the external force; p_1 is the initial momentum of the ball and p_2 its final momentum; m is the mass of the ball; v_1 is the initial velocity of the ball, i.e. its velocity before being hit; and v_2 is its final velocity;
the initial velocity v_1 and final velocity v_2 of the ball are captured by the high-speed camera, and redundant description is not repeated;
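As a worked sketch of the impulse relation above: the regulation baseball mass of about 0.145 kg and the example numbers are illustrative assumptions; v_1 and v_2 come from the high-speed camera and T from the sensor-derived window.

def impulse_and_force(v1: float, v2: float, T: float, m: float = 0.145):
    """Return (impulse, mean_force) from the change in the ball's momentum.

    Following the formula in the text, FT = p_1 - p_2 = m*v_1 - m*v_2;
    velocities measured in opposite directions carry opposite signs.
    """
    impulse = m * v1 - m * v2
    mean_force = impulse / T if T else float("nan")
    return impulse, mean_force

# Example: ball arrives at 35 m/s, leaves at -40 m/s, over T = 0.05 s
print(impulse_and_force(35.0, -40.0, 0.05))  # (10.875, 217.5)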
step 400: and carrying out interframe pairing on the video and the sensing device, and carrying out visualization processing to derive a visualization model.
Step 410: carrying out static judgment before the user starts to move to obtain an initial value of the sensing device;
step 420: when the sensing device detects the change, the sensing device is used as a starting point of sensing data; pairing the starting point with the first frame in the preprocessed video, and taking the last frame of the preprocessed video as the end;
step 430: and clearing the data acquired after the last frame of the preprocessed video.
It should be further noted that the static judgment means that the inertial measurement unit determines its initial position and initial attitude in its own coordinate system; the details of this part of the algorithm are not repeated here.
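A minimal sketch of steps 410-430 follows, assuming a one-dimensional stream of sensor magnitudes, at least one second of initial stillness for the static judgment, and a 3-sigma deviation threshold; all of these are illustrative choices.

import numpy as np

def pair_sensor_to_frames(samples: np.ndarray, n_frames: int,
                          sample_rate: float, fps: float = 30.0,
                          k_sigma: float = 3.0) -> np.ndarray:
    """Return the sensor samples aligned to the preprocessed video frames."""
    # step 410: static judgment, baseline mean/std from the initial rest period
    base = samples[: int(sample_rate)]  # assume at least 1 s of stillness
    mu, sigma = base.mean(), base.std()
    # step 420: the first sample deviating from the baseline starts the data
    moving = np.abs(samples - mu) > k_sigma * sigma
    if not moving.any():
        return samples[:0]  # no motion detected
    start = int(np.argmax(moving))
    # step 430: discard everything recorded after the last video frame
    n_keep = int(n_frames * sample_rate / fps)
    return samples[start : start + n_keep]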
The visualization processing of step 400 specifically pairs and fuses the obtained sensing data with the skeletal joint point data; after the above steps are completed, fused data of the sensing data and the skeletal joint points and fused data of the video data and the skeletal joint points are obtained, and both are transmitted to the user terminal. It should be further explained that the data transmission modes include wireless communication and wired communication. The purpose of visualization is to interact directly with the user through a visual interface and to display on different devices and in different scenes, which facilitates user exchange and learning.
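A minimal sketch of the pairing fusion just described, combining each video frame's joint points with the sensor bin covering the same 1/30 s window into one record; the record fields and the JSON serialization are illustrative assumptions, not a format defined by the method.

import json

def fuse(joints_per_frame, sensor_bins, fps: float = 30.0) -> str:
    """Zip frame-aligned joint points and sensor bins into one record each."""
    records = []
    for i, (joints, sensor) in enumerate(zip(joints_per_frame, sensor_bins)):
        records.append({
            "frame": i,
            "t": i / fps,         # seconds from the start frame
            "joints": joints,     # e.g. list of (x, y) key points
            "sensor": sensor,     # e.g. {"speed": ..., "accel": ...}
        })
    return json.dumps(records)  # serialized for transmission to the user terminal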
While there have been shown and described what are at present considered the fundamental principles and essential features of the invention and its advantages, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although the present description is set out in terms of embodiments, not every embodiment contains only a single technical solution; this manner of description is merely for clarity, and those skilled in the art should take the description as a whole. The technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (5)

1. A video-based image analysis processing method, characterized in that the method comprises:
step 100: arranging a video acquisition device on a field, and wearing a sensing device by a user;
step 200: the video acquisition device and the sensing device acquire video data and sensing data of the whole action and preprocess the video data;
in the step 200, the video data is preprocessed specifically by using a background extraction algorithm to extract a moving target, a first frame when the moving target starts moving is used as a starting frame of the video, a last frame when the moving target finishes moving is used as an ending frame of the video, and video segment preprocessing between the starting frame and the ending frame is intercepted and ended;
the starting frame and the ending frame are extracted by adopting an interframe difference method, a threshold value C is set, the ith frame is defined as the starting frame, when the absolute value of the image brightness difference between the ith frame and the (i-1) th frame is greater than C, the ith frame is set as the first frame of the video, and the ending frame of the video can be established according to the principle;
step 300: extracting joint points of the preprocessed video, and acquiring force application points and force application time according to the sensing data;
the point of exertion and the time of exertion are obtained by the following two parts:
1) When a user swings a bat to hit the ball, the bat is influenced by the impulse of the ball at the moment of hitting the ball, so that the swing speed and the acceleration of the hand are changed, the change range is used as a time point for positioning the force, namely, the time point of starting the change is used as a force point and a starting point of the force time, and when the user exerts force to offset the impulse of the ball, the time point is used as an end point of the force time;
2) Throwing the ball, wherein momentum is applied to the ball when a user throws the ball, and when the speed and the acceleration of a hand at the moment of throwing the ball are changed, the time point of the change start is used as the force application point of throwing the ball;
step 400: carrying out interframe pairing on the video and the sensing device, and carrying out visualization processing to derive a visualization model;
the step 400 of inter-frame pairing specifically comprises
Step 410: carrying out static judgment before the user starts to move to obtain an initial value of the sensing device;
step 420: when the sensing device detects the change, the change is used as a starting point of sensing data; pairing the starting point with the first frame in the preprocessed video, and taking the last frame of the preprocessed video as the end;
step 430: and clearing the data acquired after the last frame of the preprocessed video.
2. A video-based image analysis processing method as claimed in claim 1, characterized by: the video acquisition device in the step 100 comprises a depth camera and a high-speed camera;
the depth camera comprises an infrared emitter, an RGB camera and a 3D structured light depth sensor.
3. A video-based image analysis processing method as claimed in claim 1, characterized by: the sensing devices in step 100 are inertial measurement units, arranged on the user's hand and waist respectively;
the sensing devices are used to acquire the rotation speed, angular velocity and acceleration of the user's hand and waist during movement, as well as the instantaneous velocity and instantaneous position of the main force application point.
4. A method for video-based image analysis processing as claimed in claim 1, wherein: the step 300 of joint point extraction comprises the following steps:
step 310: extracting the preprocessed video;
step 320: detecting a target by using an algorithm;
step 330: openCV detection key points;
step 340: and outputting the result.
5. A video-based image analysis processing method as claimed in claim 1, characterized by: the visualization processing in step 400 specifically pairs and fuses the obtained sensing data with the skeletal joint point data; after the above steps are completed, fused data of the sensing data and the skeletal joint points and fused data of the video data and the skeletal joint points are obtained, and both are transmitted to the user terminal.
CN202111084916.6A 2021-09-16 2021-09-16 Video image analysis processing method Active CN113780206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111084916.6A CN113780206B (en) 2021-09-16 2021-09-16 Video image analysis processing method

Publications (2)

Publication Number Publication Date
CN113780206A CN113780206A (en) 2021-12-10
CN113780206B (en) 2022-11-04

Family

ID=78844506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111084916.6A Active CN113780206B (en) 2021-09-16 2021-09-16 Video image analysis processing method

Country Status (1)

Country Link
CN (1) CN113780206B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751100A (en) * 2019-10-22 2020-02-04 北京理工大学 Auxiliary training method and system for stadium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588582B2 (en) * 2013-09-17 2017-03-07 Medibotics Llc Motion recognition clothing (TM) with two different sets of tubes spanning a body joint
CN103706106B (en) * 2013-12-30 2015-12-30 南京大学 A kind of self adaptation continuous action training method based on Kinect
CN109583366B (en) * 2018-11-28 2022-04-08 哈尔滨工业大学 Sports building evacuation crowd trajectory generation method based on video images and WiFi positioning
CN111957024A (en) * 2019-05-19 2020-11-20 郑州大学 Wearable Taiji motion gait evaluation and training system based on cloud platform
CN111144217B (en) * 2019-11-28 2022-07-01 重庆邮电大学 Motion evaluation method based on human body three-dimensional joint point detection
CN111144262B (en) * 2019-12-20 2023-05-16 北京容联易通信息技术有限公司 Process anomaly detection method based on monitoring video
CN111444890A (en) * 2020-04-30 2020-07-24 汕头市同行网络科技有限公司 Sports data analysis system and method based on machine learning
CN113063411A (en) * 2020-06-29 2021-07-02 河北工业大学 Exoskeleton evaluation system and method of use thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant