CN107103298A - Pull-up counting system and counting method based on image processing - Google Patents

Pull-up counting system and counting method based on image processing

Info

Publication number
CN107103298A
Authority
CN
China
Prior art keywords
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710264022.2A
Other languages
Chinese (zh)
Other versions
CN107103298B (en)
Inventor
黄知超
赵文明
赵华荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN201710264022.2A
Publication of CN107103298A
Application granted
Publication of CN107103298B
Legal status: Expired - Fee Related


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63B - APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00 - Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06 - Indicating or scoring devices for games or players, or for other sports activities
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Psychiatry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Image Analysis (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

The pull-up counting system and counting method of the invention are based on image processing. Under given conditions of horizontal bar height, camera parameters and shooting environment, the positions of the horizontal bar, the human face and the human hands are obtained from the captured image sequence by straight-line detection, face detection and skin-color detection from digital image processing theory, and, according to rotation-invariant moment theory, the zeroth-order and first-order moments of the binary image of each detected region are computed, accurately determining the centroid positions of the face and hands and the face area. The changes in face and hand position and the gray-value changes of specific regions are then tracked and recorded over the whole image sequence, and the athlete's number of pull-ups is obtained by program analysis. Because non-contact measurement is used, resistance to environmental interference is strong. The position detection technique is mature and the counting error is low. The method requires little computing power and execution time, meeting the real-time requirements of fast, reliable and accurate counting. When athletes of different heights and sexes are tested and counted, the program does not need to be changed.

Description

Pull-up counting system and method based on image processing
Technical Field
The invention relates to the field of image processing and machine vision, in particular to a pull-up counting system and a pull-up counting method based on image processing.
Background
The pull-up is a selection item in middle school and high school physical education examinations. It mainly tests the development level of upper-limb muscle strength, is an examination item of male upper-limb strength, and is a suspension strength exercise performed against one's own body weight; it is one of the most basic back exercises and an important reference standard for measuring male physique. Counting pull-ups quickly and accurately is therefore of great significance.
At present, counting for the pull-up test item is mainly done manually, which is time-consuming, labor-intensive and costly. Long one-to-many test sessions easily reduce the examiner's attention and cause fatigue, leading to counting errors.
Moreover, the pull-up action standard is strict, manual counting is relatively subjective, and non-uniform action standards easily lead to misjudgment and unfairness.
In addition, student scores are generally recorded by hand by the examiner and then manually entered into a score management system, a procedure that is cumbersome and error-prone.
Therefore, an automatic pull-up counting device is urgently needed for the pull-up test item, in order to relieve the labor burden, improve detection efficiency, maintain examination fairness and realize automated counting.
At present, a number of related pull-up counting devices have appeared domestically, but they are all structural improvements to the horizontal bar, mostly based on mounted sensors such as infrared probes and ultrasonic sensors. Counting devices of this kind have simple algorithms and structures, still cannot completely solve the problems caused by manual counting, are strongly affected by external interference and have a high misjudgment rate, so they do not meet practical requirements. Some counting devices use image sensors, but the dedicated equipment is expensive, the algorithms are complex, and the real-time requirement cannot be met.
Chinese patent CN105944363A discloses an ultrasonic pull-up tester and a control method thereof. The tester includes an ultrasonic probe, an infrared probe, a voice broadcaster, an electronic display screen, etc. connected to it; the ultrasonic probe measures the displacement of the top of the head in real time and records the number of pull-ups, and the infrared probe detects whether the front end of the wrist or forearm is on the detection path and for how long. The method converts the number of repetitions of the effective-distance change into electrical signals and combines the infrared probe for human-body sensing and timing to record the number of pull-ups. Because it relies on ultrasonic and infrared probes, it is strongly affected by interference from the external environment; applying the same standard to different testees is unfair; and the operation is difficult for the examiners, placing a burden on them.
Zhao Su Fang describes an automatic pull-up test system based on depth images. The system acquires depth images and combines them with color images according to the depth information, performing head tracking and image segmentation to determine the positions of the horizontal bar and the chin. Before the test, the height of the horizontal bar is determined from the depth information, and the distance from the chin to the head skeleton point of each subject is determined by face tracking and skeleton tracking; during the test, the faster skeleton tracking determines the head skeleton point, from which the chin-to-head distance is subtracted, so that whether the chin has risen above the horizontal bar can be judged effectively. The head skeleton tracking result is also used to judge arm straightening: at the start of the test, the maximum height difference between the shoulder joint and the horizontal bar is measured by shoulder-joint positioning and multiplied by a coefficient to obtain the tester's arm-extension threshold; during the test, the height difference between the chin position obtained by head skeleton tracking and the horizontal bar is calculated, and if it exceeds the threshold the arms are judged to be straight. The system frees manpower to a certain extent and improves the degree of automation of pull-up testing, but its image sensor is Microsoft's Kinect, which is expensive; the depth-image data sets acquired by the Kinect are also very large, so the requirement on computer performance is high and real-time performance is difficult to achieve.
In summary, current pull-up counting devices have improved the degree of automation and counting accuracy to a certain extent, but still cannot reach a practical level.
Disclosure of Invention
In order to solve the above problems, the present invention provides a pull-up counting system and a counting method based on image processing.
The technical scheme for realizing the purpose of the invention is as follows:
The pull-up counting system based on image processing comprises a main controller, an image acquisition module, a data display module, a discrimination information counting module and an action counting module, wherein the main controller is respectively connected with the data display module, the discrimination information counting module and the action counting module, and the image acquisition module is connected with the main controller through a preprocessing module, a human face and hand recognition module and a coordinate extraction module.
Furthermore, an external data import module is also arranged and connected with the main controller.
Furthermore, a light source is also arranged and connected with the main controller.
The counting method adopting the counting system comprises the following steps:
(1) acquiring a specific motion image: collecting an image of a tested person in a specific action at a specified position;
(2) image preprocessing: converting a color RGB image into an 8-bit gray image, then filtering and smoothing the gray image to remove noise interference, carrying out edge detection on the gray image by using a Canny operator to generate an edge binary image, and establishing an image coordinate system by taking the upper left corner of the image as a coordinate origin, the vertical downward direction as an X axis and the horizontal rightward direction as a Y axis;
(3) detecting a straight line segment: in a polar coordinate system, a straight line is represented by the parameters polar radius r and polar angle θ, with the expression:
y = (-cosθ/sinθ)x + r/sinθ   (1)
where -cosθ/sinθ is the slope of the straight line and r/sinθ is its intercept;
since more than one straight line is detected in the image, constraints are added on the slope and on the length of the detected lines; finally, the two most obvious edge lines at the upper and lower edges of the horizontal bar are taken, their center values x_1 and x_2 on the x axis are obtained, and the average is computed,
so the horizontal bar height is:
x_0 = (x_1 + x_2)/2   (2)
(4) human face and human hand detection: collecting a front face image and a hand image as a training set, then representing a face and a hand by using haar-like rectangular features, calculating a feature value of each pixel point, selecting the rectangular features which can represent the face and the hand most, and finding the face and the hand by continuously adjusting the position and the proportion of a detection window;
(5) rotation-invariant moment calculation: the approximate regions of the human face and the human hand found in step (4) are enclosed by their corresponding outer contours, and the image is converted into a binary image, regarded as a two-dimensional density distribution f(x, y),
the geometric moment of order (p + q) of the image f(x, y) is:
M_pq = Σ_(x,y)∈Ω x^p · y^q · f(x, y)   (3)
where f(x, y) is the brightness value of the image at point (x, y), Ω is the pixel region defined by the image brightness f(x, y), x^p·y^q is the transformation kernel, and M_pq is called the geometric moment of order p + q,
the zeroth-order geometric moment m_00 represents the total brightness of the image; in a binary image, m_00 is the geometric area of the target region;
the first-order geometric moments m_10, m_01 are the brightness moments of the image about the X and Y axes, and the brightness centroid (X_0, Y_0) is given by:
X_0 = m_10/m_00,  Y_0 = m_01/m_00   (4)
in a binary image, the point (X_0, Y_0) is the geometric center, i.e. the centroid,
so the zeroth-order moment m_00 of the binary image gives the face area, and dividing the first-order moments m_10, m_01 by m_00 gives the centroids of the human face and the human hand;
(6) the above steps yield the horizontal bar height x_0, the face area s_1, the face centroid coordinates (x_1face, y_1face) and the hand centroid coordinates (x_1hand, y_1hand); the horizontal bar height x_0, the face area s_1 and the distance between the face and hand centroids h = x_1face - x_1hand are used as the three thresholds for judging whether a pull-up action is qualified;
(7) pull-up image sequence acquisition: collecting and recording an image sequence of the tested person during the whole pull-up counting test until the whole counting test is finished;
(8) image sequence preprocessing: graying and Gaussian blurring are carried out on the image sequence; converting a color RGB image into an 8-bit gray image, then filtering and smoothing the gray image to remove noise interference, carrying out edge detection on the gray image by using a Canny operator to generate an edge binary image, and establishing an image coordinate system by taking the upper left corner of the image as a coordinate origin, the vertical downward direction as an X axis and the horizontal rightward direction as a Y axis;
(9) moving human body detection and foreground extraction: modeling the background by taking the moving object as the foreground and the rest as the background, subtracting the background by using the current frame through a subtraction method, and extracting the moving human body from the whole image sequence for analysis and counting; in order to ensure better results of the foreground image, the specific area is subjected to morphological transformation so as to eliminate noise interference of the specific area. Calculating the gray sum of foreground pixel points in the region of interest;
(10) detecting the skin color of the human body: converting the image sequence from the RGB color space to the YCrCb color space with the conversion function:
Y = 0.257·R + 0.564·G + 0.098·B + 16
C_b = -0.148·R - 0.291·G + 0.439·B + 128
C_r = 0.439·R - 0.368·G - 0.071·B + 128   (5)
where, in the YCrCb color space, Y is the luminance information and C_b and C_r are the two chrominance components; in the RGB color space, R is the red component, G the green component and B the blue component;
the skin pixel points are approximately distributed in an ellipse in the CrCb two-dimensional space: if the coordinates (Cr, Cb) lie inside the elliptical distribution the pixel is identified as a skin pixel, and if they lie outside it the pixel is identified as a non-skin pixel;
the collected skin color sample points are projected onto the CbCr plane, a nonlinear K-L transformation is applied to the projected plane, and statistical analysis of the skin pixels yields the elliptical skin color model;
the elliptical skin color model is described as:
(x - ec_x)^2 / a^2 + (y - ec_y)^2 / b^2 = 1   (6)
[x; y] = [cosθ, sinθ; -sinθ, cosθ] · [C_b' - c_x; C_r' - c_y]   (7)
where c_x = 109.38, c_y = 152.02, θ = 2.53, ec_x = 1.6, ec_y = 2.41, a = 25.39, b = 14.03; with the corresponding limiting conditions added, the human face in the image sequence is detected;
(11) rotation-invariant moment calculation: the human face detected in step (10) is enclosed by its corresponding outer contour, and the image is converted into a binary image, regarded as a two-dimensional density distribution f(x, y),
the geometric moment of order (p + q) of the image f(x, y) is:
M_pq = Σ_(x,y)∈Ω x^p · y^q · f(x, y)   (8)
where f(x, y) is the brightness value of the image at point (x, y), Ω is the pixel region defined by the image brightness f(x, y), x^p·y^q is the transformation kernel, and M_pq is called the geometric moment of order p + q,
the zeroth-order geometric moment m_00 represents the total brightness of the image; in a binary image, m_00 is the geometric area of the target region;
the first-order geometric moments m_10, m_01 are the brightness moments of the image about the X and Y axes, and the brightness centroid (X_0, Y_0) is given by:
X_0 = m_10/m_00,  Y_0 = m_01/m_00   (9)
in a binary image, the point (X_0, Y_0) is the geometric center, i.e. the centroid,
so the zeroth-order moment m_00 of the binary image gives the face area, and dividing the first-order moments m_10, m_01 by m_00 gives the centroid (x_2, y_2) of the face in the image sequence;
(12) Pull-up action determination and counting
The distance between the face centroid in the image sequence and the horizontal bar is as follows:
h_1 = x_2 - x_0   (10)
as the image sequence changes, s and h_1 also change continuously;
when s ≥ s_1, the face (chin) has crossed the horizontal bar and the athlete has completed a qualified upward pull; after the upward pull is finished, when h_1 ≥ h the athlete has completed a qualified arm-straightening action, and one qualified pull-up is recorded;
(13) when the whole image sequence is finished, the whole counting process is stopped, and the number of the chin-up actions is stored and output.
The image preprocessing step (2) includes:
1) the color RGB image is converted to an 8-bit grayscale image with the conversion function:
Grey = (299·R + 587·G + 114·B + 500) / 1000   (11)
where R, G, B are the color components of each pixel of the color image (R the red component, G the green, B the blue), and Grey is the gray value of the corresponding pixel of the grayscale image;
2) the filtering smoothing operation is:
scanning each pixel in the image by using a 5-by-5 Gaussian template, performing weighted average on pixel values in a neighborhood determined by the template, and replacing the value of the pixel at the central point by using the value of the weighted average;
wherein: the value in the template is the weighting coefficient of the pixel in the neighborhood of the central pixel point.
The convolution function is:
g(i, j) = Σ_(k,l) f(i + k, j + l) · h(k, l)   (12)
wherein: g (i, j) is an output pixel value, f (i + k, j + l) is an input pixel value, h (k, l) is a filter weighting coefficient, and k and l take values of 1, 2, 3, 4 and 5;
3) the edge detection operation is:
the grayscale image is convolved with a pair of convolution templates acting in the x and y directions respectively, giving the first-order gradients G_x and G_y of the grayscale image in the x and y directions;
the selected convolution templates are the x- and y-direction templates p_x and p_y;
the gradient magnitude and direction are then calculated as
G(x, y) = sqrt(G_x^2 + G_y^2),  θ(x, y) = arctan(G_y / G_x)
where G_x, G_y are the gradient values in the x and y directions, p_x, p_y are the convolution templates in the x and y directions, and G(x, y) and θ(x, y) are the gradient magnitude and direction at the point (x, y), with θ in the range [0, 180] degrees;
then, according to the double-threshold method, each pixel is assigned the value 1 or 0 according to the set thresholds, and the candidate pixels are linked into contours, generating the binary edge image.
Simplifying the straight-line expression in step (3) gives:
r = x·cosθ + y·sinθ   (17)
so for a point (x_0, y_0), the family of straight lines passing through this point can be written uniformly as:
r_θ = x_0·cosθ + y_0·sinθ   (18)
where each pair (r, θ) represents one straight line passing through the point (x_0, y_0);
therefore, a straight line detected in the image corresponds to a parameter pair (r, θ) for which the number of image points voting for it exceeds the threshold; the line is obtained by connecting those points;
since more than one straight line is detected in the image, constraints are added on the slope and on the length of the detected lines; finally, the two most obvious edge lines at the upper and lower edges of the horizontal bar are taken, their center values x_1 and x_2 on the x axis are obtained, and the average is computed;
so the horizontal bar height is:
x_0 = (x_1 + x_2)/2   (19)
the human face and human hand detection in the step (4) comprises the following steps:
1) a frontal face image set is collected as the training set, with face images of size 24 × 24; the face is then represented by rectangular features, and the feature value of each pixel is computed with an integral image;
the value of the integral image at (x, y) is the sum of all pixels above and to the left of the coordinate (x, y);
the integral image function is:
ii(x, y) = Σ_(x'≤x, y'≤y) i(x', y')   (20)
where ii(x, y) is the value of the integral image and i(x', y') is the value of the original image at the point (x', y'): a color value for a color image and a gray value for a grayscale image;
2) selecting some rectangular features which can represent the face most, namely weak classifiers by using a boosting algorithm;
the weak classifier function is:
h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise
where x is a detection sub-window, f is a rectangular feature, f(x) is the Haar-like feature value of the sub-window, p ∈ {+1, -1} indicates the direction of the inequality, and θ is the threshold; for each rectangular feature f, training a weak classifier h(x, f, p, θ) essentially means finding the optimal θ, so that the classification error of h(x, f, p, θ) over all training samples is lowest;
3) training and combining the weak classifiers into a strong classifier; the process is as follows:
A) given training samples (x_1, y_1), ..., (x_n, y_n), where n is the number of sample images, y_i = 1 denotes a positive sample and y_i = 0 a negative sample, with i ≤ n; the number of iterations is set to T;
B) the sample weights are initialized as
w_1,i = 1/(2m) for negative samples and w_1,i = 1/(2l) for positive samples,
where m is the number of negative samples and l is the number of positive samples;
C) for iterations t = 1 to T:
a. normalize the weights:
w_t,i = w_t,i / Σ_j w_t,j
b. select the best weak classifier according to the minimum weighted error rate:
ε_t = min_(f,p,θ) Σ_i w_i · |h(x_i, f, p, θ) - y_i|   (23)
c. update the weights:
w_(t+1),i = w_t,i · β_t^(1-e_i)   (24)
where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t);
D) after T rounds, the weak classifiers are cascaded into a strong classifier:
C(x) = 1 if Σ_(t=1..T) α_t·h_t(x) ≥ (1/2)·Σ_(t=1..T) α_t, and 0 otherwise   (25)
where α_t = log(1/β_t);
E) the face is found by continuously adjusting the position and scale of the detection window;
in addition, the human face detection and human hand detection processes are similar, except that the image used for training is a palm image.
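As an illustration of the cascading step D) above, the following Python sketch combines T trained weak classifiers into a strong classifier using the standard AdaBoost voting rule (α_t = log(1/β_t)); the function name and the representation of the weak classifiers as callables are assumptions made for the example, not part of the patent.

```python
import numpy as np

def strong_classify(x, weak_clfs, betas):
    """Weighted vote of the T trained weak classifiers (a sketch of step D above).

    weak_clfs: list of callables h_t(x) returning 0 or 1
    betas    : the beta_t = eps_t / (1 - eps_t) values kept from each boosting round
    """
    alphas = np.log(1.0 / np.asarray(betas, dtype=float))        # alpha_t = log(1 / beta_t)
    votes = np.array([h(x) for h in weak_clfs], dtype=float)     # weak classifier outputs
    return int(np.dot(alphas, votes) >= 0.5 * alphas.sum())      # 1 = face, 0 = non-face
```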
The weak classifier training process comprises the following steps:
A) calculating the characteristic values of all training samples for each rectangular characteristic f through an integral graph;
B) sorting the characteristic values;
C) for each feature value, compute: the total weight of all negative samples, T-; the total weight of all positive samples, T+; the sum of the weights of the positive samples preceding this feature value, S+; and the sum of the weights of the negative samples preceding this feature value, S-;
D) a value between the current feature value and the feature value preceding it is chosen as the threshold; the weak classifier obtained in this way has the lowest classification error, and the weak classifier corresponding to this threshold classifies the samples whose feature values lie before and after the current feature value into face and non-face; the classification error for this threshold is:
e = min(S+ + (T- - S-), S- + (T+ - S+))   (26).
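The threshold selection in steps A)-D) can be sketched in Python/NumPy as follows; the function name best_threshold and the handling of the boundary case at the smallest feature value are illustrative choices, not taken from the patent.

```python
import numpy as np

def best_threshold(features, labels, weights):
    """Exhaustive threshold search for one rectangular feature (steps A-D above).

    features: this feature's value for every training sample
    labels  : 1 for face samples, 0 for non-face samples
    weights : current AdaBoost sample weights
    Returns (threshold, polarity, error) minimizing e = min(S+ + (T- - S-), S- + (T+ - S+)).
    """
    f = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(f)                                   # step B: sort the feature values
    f, y, w = f[order], y[order], w[order]
    t_pos, t_neg = w[y == 1].sum(), w[y == 0].sum()         # T+ and T-
    s_pos = np.cumsum(w * (y == 1)) - w * (y == 1)          # S+: positive weight strictly before i
    s_neg = np.cumsum(w * (y == 0)) - w * (y == 0)          # S-: negative weight strictly before i
    err_lo_neg = s_pos + (t_neg - s_neg)                    # samples below threshold labelled non-face
    err_lo_pos = s_neg + (t_pos - s_pos)                    # samples below threshold labelled face
    errors = np.minimum(err_lo_neg, err_lo_pos)
    i = int(np.argmin(errors))
    polarity = 1 if err_lo_neg[i] <= err_lo_pos[i] else -1
    threshold = f[0] - 1.0 if i == 0 else 0.5 * (f[i - 1] + f[i])   # between current and preceding value
    return threshold, polarity, float(errors[i])
```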
the modeling algorithm flow in the step (9) is as follows:
1) each newly observed pixel value x_t is compared with the current k Gaussian models according to formula (27); if the deviation of the pixel value from a model mean is within 2.5σ, that model is considered the distribution model matched by the new pixel value:
|x_t - u_i,t| ≤ 2.5·σ_i,t-1   (27)
2) if the matched model satisfies the background requirement, the pixel is regarded as meaningless background; otherwise it is regarded as meaningful foreground;
3) the weight of each constituent single Gaussian model is updated according to formula (28), where α is the learning rate, M_k,t = 1 for the matched model and M_k,t = 0 otherwise; the weights of all models are then normalized:
w_k,t = (1 - α)·w_k,t-1 + α·M_k,t   (28)
4) the mean μ and standard deviation σ of the unmatched models remain unchanged, while the parameters of the matched model are updated as follows:
ρ = α·η(x_t | u_k, σ_k)   (29)
u_t = (1 - ρ)·μ_t-1 + ρ·x_t   (30)
5) if no model is matched in step 1), the model with the smallest weight is replaced: its standard deviation is set to a large initial value, its weight to the minimum value, and its mean to the current pixel value;
6) the models are sorted in descending order of ω/σ²; models with large weight and small standard deviation are placed first;
7) the first B models are selected as the background, where B satisfies:
B = arg min_b ( Σ_(k=1..b) w_k > T )
where the parameter T represents the proportion of the background.
8) the background obtained by modeling is subtracted from the current frame, and what remains is the foreground;
8) Subtracting the background obtained by modeling by using the current frame, wherein the rest is the foreground;
in order to ensure better result of the foreground image, the specific area is subjected to morphological transformation to eliminate noise interference of the specific area,
the selected morphological operation is erosion.
The erosion algorithm is as follows: each pixel of the image is scanned with the structuring element (erosion kernel), and an AND operation is performed between the structuring element and the binary image it covers; if all the covered values are 1, the pixel value of the resulting image is 1, otherwise it is 0.
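A minimal per-pixel sketch of the above mixture-model update, written in Python/NumPy, is given below; the learning rate, the background proportion T, the replacement value of the standard deviation and the choice of updating only the first matched mode are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def update_pixel(x_t, w, mu, sigma, alpha=0.01, T=0.7):
    """One update of a pixel's k-mode Gaussian mixture; w, mu, sigma have shape (k,)."""
    match = np.abs(x_t - mu) <= 2.5 * sigma                  # (27) matching test
    M = match.astype(float)
    w = (1 - alpha) * w + alpha * M                          # (28) weight update
    w /= w.sum()                                             # normalise the weights
    if match.any():
        i = int(np.argmax(match))                            # update the first matched mode
        rho = alpha * np.exp(-0.5 * ((x_t - mu[i]) / sigma[i]) ** 2) / (np.sqrt(2.0 * np.pi) * sigma[i])  # (29)
        mu[i] = (1.0 - rho) * mu[i] + rho * x_t              # (30) mean update
    else:
        i = int(np.argmin(w))                                # no match: replace the weakest mode
        mu[i], sigma[i], w[i] = x_t, 30.0, w.min()           # large initial sigma (illustrative value)
    order = np.argsort(-(w / sigma ** 2))                    # sort modes by w / sigma^2, descending
    w, mu, sigma, match = w[order], mu[order], sigma[order], match[order]
    B = int(np.searchsorted(np.cumsum(w), T, side="right")) + 1   # first B modes form the background
    is_background = bool(match[:B].any())                    # pixel matched a background mode?
    return w, mu, sigma, (not is_background)                 # last value: True = foreground pixel
```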
The invention is characterized in that:
1. the camera is selected as a pull-up detection counting sensor, and a non-contact measurement technology is utilized, so that the environmental interference resistance is high.
2. The image position detection technology is mature, and the counting error is low.
3. The method requires little computing power and execution time, meeting the real-time requirements of fast, reliable and accurate counting.
4. When athletes with different heights and sexes are tested and counted, the program does not need to be changed.
5. The equipment has simple structure and is easy to install and operate.
Drawings
Fig. 1 is a schematic diagram of a physical structure according to an embodiment of the present invention.
Fig. 2 is an overall method flow diagram of an embodiment of the invention.
Fig. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
In the figure: 1. horizontal bar; 2. tested person; 3. camera; 4. light source; 5. camera support; 6. host; 7. data transmission line; 8. hand position mark line; 9. photographing standing point mark.
Detailed Description
Example (b):
In order to ensure the counting accuracy and fairness of the counting device across different testees and at different times, the whole counting device is arranged according to the physical structure diagram of Fig. 1, with the following strict arrangement requirements:
(1) the camera is mounted at a height of approximately 1.7 m to 1.8 m, centered with respect to the horizontal bar, with its optical axis perpendicular to the plane of the horizontal bar.
(2) The background opposite to the camera is a fixed background with a single color, preferably white, so that the background color is prevented from appearing in a color similar to the skin color of a person.
(3) The testee should perform image video acquisition according to the position and the action specified by the device.
The specific implementation flow is as follows:
1. specific action image acquisition: an image of the testee performing the specified action at the designated (marked) position is collected. The action standard is that the athlete stands with both arms stretched straight upwards, the fingers of both hands held together with the palms facing straight ahead, looking at the camera.
2. Image preprocessing: the method comprises the steps of converting a color RGB image into an 8-bit gray image, then conducting filtering smoothing on the gray image to remove noise interference, conducting edge detection on the gray image by using a Canny operator to generate an edge binary image, and establishing an image coordinate system by taking the upper left corner of the image as a coordinate origin, the vertical downward direction as an x axis and the horizontal rightward direction as a y axis.
The method comprises the following specific steps:
(1) the color RGB image is converted to an 8-bit grayscale image with the conversion function:
Grey = (299·R + 587·G + 114·B + 500) / 1000
where R, G, B are the color components of each pixel of the color image (R the red component, G the green, B the blue), and Grey is the gray value of the corresponding pixel of the grayscale image.
(2) The filtering smoothing operation is:
each pixel in the image is scanned by a 5 x 5 gaussian template, the pixel values in the neighborhood determined by the template are weighted and averaged, and the value of the pixel at the center point is replaced by the value of the weighted average.
The template adopted by the device is a 5 × 5 Gaussian template:
k = [ 2  4  5  4  2
      4  9 12  9  4
      5 12 15 12  5
      4  9 12  9  4
      2  4  5  4  2 ]
wherein: the value in the template is the weighting coefficient of the pixel in the neighborhood of the central pixel point.
The convolution function is:
g(i, j) = Σ_(k,l) f(i + k, j + l) · h(k, l)   (35)
wherein: g (i, j) is the output pixel value, f (i + k, j + l) is the input pixel value, h (k, l) is the filter weighting coefficient, and k, l take on values of 1, 2, 3, 4, 5.
(3) The edge detection operation is:
The grayscale image is convolved with a pair of convolution templates acting in the x and y directions respectively, giving the first-order gradients G_x and G_y of the grayscale image in the x and y directions.
The selected convolution templates are the x- and y-direction templates p_x and p_y.
The gradient magnitude and direction are then calculated as
G(x, y) = sqrt(G_x^2 + G_y^2),  θ(x, y) = arctan(G_y / G_x)
where G_x, G_y are the gradient values in the x and y directions, p_x, p_y are the convolution templates in the x and y directions, and G(x, y) and θ(x, y) are the gradient magnitude and direction at the point (x, y), with θ in the range [0, 180] degrees.
Then, according to the double-threshold method, each pixel is assigned the value 1 or 0 according to the set thresholds, and the candidate pixels are linked into contours, generating the binary edge image.
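As an illustration of this preprocessing chain, the following Python/OpenCV sketch performs the grayscale conversion, 5 × 5 Gaussian smoothing and Canny edge detection described above; the Canny thresholds are illustrative values and are not specified by the patent.

```python
import cv2

def preprocess(bgr_frame):
    """Grayscale -> 5x5 Gaussian smoothing -> Canny edge map (a sketch of step 2)."""
    # 8-bit grayscale; OpenCV uses Grey = 0.299R + 0.587G + 0.114B, matching the formula above.
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    # 5x5 Gaussian template; sigma=0 lets OpenCV derive it from the kernel size.
    smooth = cv2.GaussianBlur(gray, (5, 5), 0)
    # Canny double-threshold edge detection -> binary edge image (0 / 255).
    edges = cv2.Canny(smooth, 50, 150)   # thresholds are illustrative, not from the patent
    return gray, smooth, edges
```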
3. Detecting a straight line segment:
In a polar coordinate system, a straight line can be represented by the parameters polar radius r and polar angle θ, with the expression:
y = (-cosθ/sinθ)x + r/sinθ
where -cosθ/sinθ is the slope of the straight line and r/sinθ is its intercept.
Simplifying gives:
r = x·cosθ + y·sinθ   (41)
so for a point (x_0, y_0), the family of straight lines passing through this point can be written uniformly as:
r_θ = x_0·cosθ + y_0·sinθ   (42)
where each pair (r, θ) represents one straight line passing through the point (x_0, y_0).
Therefore, a straight line detected in the image corresponds to a parameter pair (r, θ) for which the number of image points voting for it exceeds the threshold; the line is obtained by connecting those points.
Since more than one straight line is detected in the image, constraints are added on the slope and on the length of the detected lines; finally, the two most obvious edge lines at the upper and lower edges of the horizontal bar are taken, their center values x_1 and x_2 on the x axis are obtained, and the average is computed.
Thus the horizontal bar height is:
x_0 = (x_1 + x_2)/2   (43)
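The line-segment detection and bar-height estimation can be sketched with OpenCV's probabilistic Hough transform as follows; the slope and length constraints and the way the two bar edges are picked (outermost near-horizontal lines) are simplified, illustrative choices rather than the patent's exact criteria.

```python
import cv2
import numpy as np

def detect_bar_height(edges, min_len=200, max_slope=0.05):
    """Estimate the bar row x0 from the two near-horizontal edge lines of the bar."""
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=min_len, maxLineGap=10)
    rows = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            # OpenCV returns (col, row); the patent's X axis is the image row (downward).
            if abs(y2 - y1) <= max_slope * max(abs(x2 - x1), 1):   # keep near-horizontal lines
                rows.append((y1 + y2) / 2.0)
    if len(rows) < 2:
        return None
    rows.sort()
    # average the rows of the upper and lower bar edges: x0 = (x1 + x2) / 2
    return (rows[0] + rows[-1]) / 2.0
```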
4. human face hand detection
The process is mainly completed by three steps:
(1) A frontal face image set is collected as the training set, with face images of size 24 × 24; the face is then represented by rectangular features, and the feature value of each pixel is computed with an integral image.
The value of the integral image at (x, y) is the sum of all pixels above and to the left of the coordinate (x, y).
The integral image function is:
ii(x, y) = Σ_(x'≤x, y'≤y) i(x', y')   (44)
where ii(x, y) is the value of the integral image and i(x', y') is the value of the original image at the point (x', y'): a color value for a color image and a gray value for a grayscale image.
(2) Using boosting algorithm to pick some rectangular features (weak classifiers) which can represent the face most.
The weak classifier function is:
h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise   (45)
where x is a detection sub-window, f is a rectangular feature, f(x) is the Haar-like feature value of the sub-window, p ∈ {+1, -1} indicates the direction of the inequality, and θ is the threshold. For each rectangular feature f, training a weak classifier h(x, f, p, θ) essentially means finding the optimal θ, so that the classification error of h(x, f, p, θ) over all training samples is lowest.
The weak classifier training process can be divided into the following steps:
A) for each rectangular feature f, the feature values of all training samples are calculated by an integral graph.
B) The eigenvalues are sorted.
C) For each feature value, compute: the total weight of all negative samples, T-; the total weight of all positive samples, T+; the sum of the weights of the positive samples preceding this feature value, S+; and the sum of the weights of the negative samples preceding this feature value, S-.
D) A value between the current feature value and the feature value preceding it is chosen as the threshold; the weak classifier obtained in this way has the lowest classification error, and the weak classifier corresponding to this threshold classifies the samples whose feature values lie before and after the current feature value into face and non-face. The classification error for this threshold is:
e = min(S+ + (T- - S-), S- + (T+ - S+))   (46)
(3) weak classifiers are trained and combined into a strong classifier.
The specific operation is as follows:
A) Given training samples (x_1, y_1), ..., (x_n, y_n), where n is the number of sample images, y_i = 1 denotes a positive sample and y_i = 0 a negative sample, with i ≤ n; the number of iterations is set to T.
B) The sample weights are initialized as
w_1,i = 1/(2m) for negative samples and w_1,i = 1/(2l) for positive samples,
where m is the number of negative samples and l is the number of positive samples.
C) For iterations t = 1 to T:
1. normalize the weights:
w_t,i = w_t,i / Σ_j w_t,j
2. select the best weak classifier according to the minimum weighted error rate:
ε_t = min_(f,p,θ) Σ_i w_i · |h(x_i, f, p, θ) - y_i|   (48)
3. update the weights:
w_(t+1),i = w_t,i · β_t^(1-e_i)
where e_i = 0 if x_i is classified correctly and e_i = 1 otherwise, and β_t = ε_t / (1 - ε_t).
D) After T rounds, the weak classifiers are cascaded into a strong classifier:
C(x) = 1 if Σ_(t=1..T) α_t·h_t(x) ≥ (1/2)·Σ_(t=1..T) α_t, and 0 otherwise
where α_t = log(1/β_t).
E) The face is found by continuously adjusting the position and scale of the detection window.
In addition, the human face detection and human hand detection processes are similar, except that the image used for training is a palm image.
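For the detection itself, an OpenCV cascade classifier can be used; the sketch below loads OpenCV's pretrained frontal-face cascade as a stand-in for the classifiers trained above (a palm cascade trained in the same way would be loaded from its own XML file), so the cascade file and the detection parameters are assumptions, not the patent's own classifier.

```python
import cv2

# Stand-in for the trained face cascade: OpenCV's bundled frontal-face model.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(gray):
    """Scan the image at multiple window positions and scales and return face rectangles."""
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                         minSize=(24, 24))   # 24x24 matches the training size above
```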
5. Calculation of rotational moment
In flow (4), the approximate locations of the human face and human hand have been found and are enclosed by the corresponding outer contours.
The gray value of the area in the outline is set to be 0, the gray value of the area outside the outline is set to be 255, and the image is converted into a binary image.
The image is considered as a two-dimensional density distribution f (x, y).
The geometric moment of order (p + q) of the image f(x, y) is:
M_pq = Σ_(x,y)∈Ω x^p · y^q · f(x, y)
where f(x, y) is the brightness value of the image at point (x, y), Ω is the pixel region defined by the image brightness f(x, y), x^p·y^q is the transformation kernel, and M_pq is called the geometric moment of order p + q.
The zeroth-order geometric moment m_00 represents the total brightness of the image; in a binary image, m_00 is the geometric area of the target region.
The first-order geometric moments m_10, m_01 are the brightness moments of the image about the X and Y axes, and the brightness centroid (X_0, Y_0) is given by:
X_0 = m_10/m_00,  Y_0 = m_01/m_00
In a binary image, the point (X_0, Y_0) is the geometric center, i.e. the centroid, of the image region.
Therefore the zeroth-order moment m_00 of the binary image gives the face area, and dividing the first-order moments m_10, m_01 by m_00 gives the centroids of the human face and the human hand.
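A short Python/OpenCV sketch of this moment computation is given below; note that OpenCV's m10 and m01 refer to the column and row axes, which correspond to the patent's Y axis (horizontal right) and X axis (vertical down) respectively.

```python
import cv2

def area_and_centroid(mask):
    """Zeroth- and first-order moments of a binary region: area m00 and centroid (m10/m00, m01/m00)."""
    m = cv2.moments(mask, binaryImage=True)
    if m["m00"] == 0:
        return 0.0, None
    col = m["m10"] / m["m00"]   # column coordinate (the patent's Y axis)
    row = m["m01"] / m["m00"]   # row coordinate (the patent's X axis)
    return m["m00"], (row, col)  # (area, centroid in the patent's row-first convention)
```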
6. The above steps yield the horizontal bar height x_0, the face area s_1, the face centroid coordinates (x_1face, y_1face) and the hand centroid coordinates (x_1hand, y_1hand). The horizontal bar height x_0, the face area s_1 and the distance between the face and hand centroids h = x_1face - x_1hand are used as the three thresholds for judging whether a pull-up action is qualified.
7. Pull-up image sequence acquisition
The image sequence of the athlete during the entire pull-up counting test is collected and recorded until the whole counting test is finished.
8. Image sequence pre-processing
And graying and Gaussian blurring are carried out on the image sequence. The selected graying function and gaussian template are the same as in step 2.
In order to avoid the slow program processing and long running time caused by processing the whole image, a rectangular area located above the horizontal bar height x_0 and limited to a portion of the image width is selected as the region of interest of the entire image sequence.
9. Moving body detection and foreground extraction
In the invention, the moving human body is the center of detection, and the judgment and counting of the action standard are established on the analysis of the motion of the moving human body and are independent of other objects appearing in the image sequence. Therefore, we only need to extract the moving human body from the whole image sequence for analysis and counting.
In image processing, meaningful moving objects are foreground, and others are meaningless background. Therefore, the idea of modeling is adopted, firstly, the background is modeled, then the background is subtracted by the current frame through a subtraction method, and the rest is the moving human body (foreground).
For each pixel point in the image sequence, the gray value in the whole sequence image can be regarded as a constantly changing random process, and the gray value change rule of the pixel point can be described by Gaussian distribution. Therefore, the change condition of each pixel point of the image sequence can be modeled according to the superposition of a plurality of Gaussian distributions with different weights and parameters, each Gaussian distribution corresponds to a state which can possibly generate the gray value change presented by the pixel point, and the weight and the distribution parameters of each Gaussian distribution are updated along with time.
For the observation data set {x_1, x_2, ..., x_n} of the gray value x of an image-sequence pixel, where x_t is the sample of the pixel at time t, the probability density function obeyed by each sample point x_t is:
P(x_t) = Σ_(i=1..k) w_i,t · η(x_t, u_i,t, τ_i,t),  with τ_i,t = σ_i,t²·I   (55)
where k is the total number of distribution modes, η(x_t, u_i,t, τ_i,t) is the i-th Gaussian distribution at time t, u_i,t is its mean, τ_i,t its covariance matrix, σ_i,t² its variance, I a three-dimensional identity matrix, and w_i,t the weight of the i-th Gaussian distribution at time t.
The detailed modeling algorithm flow is as follows:
(1) Each newly observed pixel value x_t is compared with the current k Gaussian models according to formula (56); if the deviation of the pixel value from a model mean is within 2.5σ, that model is considered the distribution model matched by the new pixel value:
|x_t - u_i,t| ≤ 2.5·σ_i,t-1   (56)
(2) If the matched model satisfies the background requirement, the pixel is regarded as meaningless background; otherwise it is regarded as meaningful foreground.
(3) The weight of each constituent single Gaussian model is updated according to formula (57), where α is the learning rate, M_k,t = 1 for the matched model and M_k,t = 0 otherwise; the weights of all models are then normalized.
w_k,t = (1 - α)·w_k,t-1 + α·M_k,t   (57)
(4) The mean μ and standard deviation σ of the unmatched models remain unchanged, while the parameters of the matched model are updated as follows:
ρ = α·η(x_t | u_k, σ_k)   (58)
u_t = (1 - ρ)·μ_t-1 + ρ·x_t   (59)
(5) If no model is matched in step (1), the model with the smallest weight is replaced: its standard deviation is set to a large initial value, its weight to the minimum value, and its mean to the current pixel value.
(6) The models are sorted in descending order of ω/σ²; models with large weight and small standard deviation are placed first.
(7) The first B models are selected as the background, where B satisfies:
B = arg min_b ( Σ_(k=1..b) w_k > T )
where the parameter T represents the proportion of the background.
(8) And subtracting the background obtained by modeling by using the current frame, and obtaining the remaining part as the foreground.
In order to ensure better results of the foreground image, the specific area is subjected to morphological transformation so as to eliminate noise interference of the specific area.
The selected morphological operation is erosion.
The erosion algorithm is as follows: each pixel of the image is scanned with the structuring element (erosion kernel), and an AND operation is performed between the structuring element and the binary image it covers; if all the covered values are 1, the pixel value of the resulting image is 1, otherwise it is 0.
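In practice this background modeling and erosion step can be implemented with OpenCV's Gaussian-mixture background subtractor, as sketched below; MOG2, its parameters and the 3 × 3 erosion kernel are stand-ins chosen for the example, not the exact model or kernel of the patent.

```python
import cv2
import numpy as np

# Adaptive mixture-of-Gaussians background model (MOG2 used as a readily available implementation).
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=False)
erode_kernel = np.ones((3, 3), np.uint8)   # illustrative kernel; the patent's kernel is not reproduced here

def foreground_mask(frame_bgr):
    """Background subtraction followed by erosion to suppress small foreground noise."""
    fg = subtractor.apply(frame_bgr)      # 0 = background, 255 = foreground
    fg = cv2.erode(fg, erode_kernel)      # pixel stays 255 only if all pixels under the kernel are 255
    return fg
```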
10. Human skin tone detection
A color space is used to describe and represent the colors of an image. The color space most commonly used in face detection is the YCrCb color model, because compared with the RGB color space the influence of the luminance Y in the YCrCb color space is small and can even be neglected, and skin color forms a good cluster in this color space. The three-dimensional YCrCb color space can therefore be reduced to the two-dimensional CrCb color space, in which the human skin color points form an elliptical distribution in the coordinates Cr and Cb. The image sequence is thus first converted from the RGB color space to the YCrCb color space. The conversion function is:
Y = 0.257·R + 0.564·G + 0.098·B + 16
C_b = -0.148·R - 0.291·G + 0.439·B + 128
C_r = 0.439·R - 0.368·G - 0.071·B + 128
where, in the YCrCb color space, Y is the luminance information and C_b and C_r are the two chrominance components; in the RGB color space, R is the red component, G the green component and B the blue component.
It can be known from a large amount of skin statistical information that if skin information is mapped from an RGB color space to a YCrCb space, the skin pixel points will be approximately distributed in an ellipse in a CrCb two-dimensional space. If the coordinates (Cr, Cb) are within the elliptical distribution, the pixel is identified as a skin pixel, and if the coordinates (Cr, Cb) are outside the elliptical distribution, the pixel is identified as a non-skin pixel.
And projecting the collected skin color sample points to a CbCr plane, performing nonlinear K-L transformation on the CbCr plane after projection, and performing statistical analysis on skin pixel points to form a skin color elliptic model.
The elliptical skin color model is described as:
(x - ec_x)^2 / a^2 + (y - ec_y)^2 / b^2 = 1
[x; y] = [cosθ, sinθ; -sinθ, cosθ] · [C_b' - c_x; C_r' - c_y]
where c_x = 109.38, c_y = 152.02, θ = 2.53, ec_x = 1.6, ec_y = 2.41, a = 25.39, b = 14.03.
With the corresponding limiting conditions added through this process, the human face in the image sequence can be detected.
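The elliptical skin model above can be evaluated directly on an image, as in the following sketch; the constants are those given above, and the only additional assumption is OpenCV's (Y, Cr, Cb) channel ordering after color conversion.

```python
import math
import cv2
import numpy as np

# Constants of the elliptical skin model given above.
CX, CY, THETA = 109.38, 152.02, 2.53
ECX, ECY, A, B = 1.6, 2.41, 25.39, 14.03
COS_T, SIN_T = math.cos(THETA), math.sin(THETA)

def skin_mask(bgr):
    """Return a 0/255 mask of pixels whose (Cb, Cr) coordinates fall inside the skin ellipse."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    cr, cb = ycrcb[:, :, 1], ycrcb[:, :, 2]          # OpenCV channel order is (Y, Cr, Cb)
    # rotate the (Cb, Cr) coordinates into the ellipse frame, as in the rotation equation above
    x = COS_T * (cb - CX) + SIN_T * (cr - CY)
    y = -SIN_T * (cb - CX) + COS_T * (cr - CY)
    # inside-ellipse test from the ellipse equation above
    inside = ((x - ECX) ** 2) / (A ** 2) + ((y - ECY) ** 2) / (B ** 2) <= 1.0
    return (inside.astype(np.uint8)) * 255
```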
11. Calculation of rotational moment
This process follows flow (5), except that the object sought is the centroid of the face in the image sequence.
From this process, the centroid (x_2, y_2) of the face in the image sequence is obtained.
12. Pull-up action determination and counting
The distance between the face centroid in the image sequence and the horizontal bar is as follows:
h_1 = x_2 - x_0   (64)
As the image sequence changes, s and h_1 also change continuously.
When s ≥ s_1, the face (chin) has crossed the horizontal bar and the athlete has completed a qualified upward pull; after the upward pull is finished, when h_1 ≥ h the athlete has completed a qualified arm-straightening action, and one qualified pull-up is recorded. Only when both the upward pull and the arm-straightening action are fully completed is one qualified pull-up counted.
13. When the whole image sequence is finished, the whole counting process is stopped, and the number of the chin-up actions is stored and output.
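The per-frame decision rule of steps 12-13 can be sketched as a small state machine; s_values and h1_values are assumed to be the per-frame measurements produced by the earlier steps, and the function name is illustrative.

```python
def count_pullups(s_values, h1_values, s1_face, h):
    """Count qualified pull-ups from per-frame measurements.

    s_values  : per-frame foreground measure s in the region of interest above the bar
    h1_values : per-frame distance h1 = x2 - x0 between the face centroid and the bar
    s1_face, h: thresholds obtained from the calibration image (steps 1-6)
    """
    count = 0
    chin_over_bar = False                # set when s >= s1_face (chin has crossed the bar)
    for s, h1 in zip(s_values, h1_values):
        if not chin_over_bar and s >= s1_face:
            chin_over_bar = True         # qualified upward phase
        elif chin_over_bar and h1 >= h:
            count += 1                   # qualified arm-straightening completes one pull-up
            chin_over_bar = False
    return count
```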

Claims (9)

1. A pull-up counting system based on image processing, characterized in that: the system comprises a main controller, an image acquisition module, a data display module, a discrimination information counting module and an action counting module, wherein the main controller is respectively connected with the data display module, the discrimination information counting module and the action counting module, and the image acquisition module is connected with the main controller through a preprocessing module, a human face and hand recognition module and a coordinate extraction module.
2. The pull-up counting system of claim 1, wherein: an external data import module is further provided and connected with the main controller.
3. The pull-up counting system of claim 1, wherein: a light source is further provided and connected with the main controller.
4. A counting method using the image-processing-based pull-up counting system according to any one of claims 1 to 3, characterized in that the method comprises the following steps:
(1) acquiring a specific motion image: collecting an image of a tested person in a specific action at a specified position;
(2) image preprocessing: converting a color RGB image into an 8-bit gray image, then filtering and smoothing the gray image to remove noise interference, carrying out edge detection on the gray image by using a Canny operator to generate an edge binary image, and establishing an image coordinate system by taking the upper left corner of the image as a coordinate origin, the vertical downward direction as an X axis and the horizontal rightward direction as a Y axis;
(3) detecting a straight line segment: in a polar coordinate system, a straight line is represented by parameters of a polar diameter r and a polar angle theta, and the expression is as follows:
y = (-cosθ/sinθ)x + r/sinθ   (1)
where -cosθ/sinθ is the slope of the straight line and r/sinθ is its intercept;
since more than one straight line is detected in the image, constraints are added on the slope and on the length of the detected lines; finally, the two most obvious edge lines at the upper and lower edges of the horizontal bar are taken, their center values x_1 and x_2 on the x axis are obtained, and the average is computed,
so the horizontal bar height is:
x_0 = (x_1 + x_2)/2   (2)
(4) human face and human hand detection: collecting a front face image and a hand image as a training set, then representing a face and a hand by using haar-like rectangular features, calculating a feature value of each pixel point, selecting the rectangular features which can represent the face and the hand most, and finding the face and the hand by continuously adjusting the position and the proportion of a detection window;
(5) rotation-invariant moment calculation: the approximate regions of the human face and the human hand found in step (4) are enclosed by their corresponding outer contours, and the image is converted into a binary image, regarded as a two-dimensional density distribution f(x, y),
the geometric moment of order (p + q) of the image f(x, y) is:
M_pq = Σ_(x,y)∈Ω x^p · y^q · f(x, y)   (3)
where f(x, y) is the brightness value of the image at point (x, y), Ω is the pixel region defined by the image brightness f(x, y), x^p·y^q is the transformation kernel, and M_pq is called the geometric moment of order p + q,
the zeroth-order geometric moment m_00 represents the total brightness of the image; in a binary image, m_00 is the geometric area of the target region;
the first-order geometric moments m_10, m_01 are the brightness moments of the image about the X and Y axes, and the brightness centroid (X_0, Y_0) is given by:
X_0 = m_10/m_00,  Y_0 = m_01/m_00   (4)
in a binary image, the point (X_0, Y_0) is the geometric center, i.e. the centroid,
so the zeroth-order moment m_00 of the binary image gives the face area, and dividing the first-order moments m_10, m_01 by m_00 gives the centroids of the human face and the human hand;
(6) the above steps yield the horizontal bar height x_0, the face area s_1, the face centroid coordinates (x_1face, y_1face) and the hand centroid coordinates (x_1hand, y_1hand); the horizontal bar height x_0, the face area s_1 and the distance between the face and hand centroids h = x_1face - x_1hand are used as the three thresholds for judging whether a pull-up action is qualified;
(7) pull-up image sequence acquisition: collecting and recording an image sequence of the tested person during the whole pull-up counting test until the whole counting test is finished;
(8) image sequence preprocessing: graying and Gaussian blurring are carried out on the image sequence; converting a color RGB image into an 8-bit gray image, then filtering and smoothing the gray image to remove noise interference, carrying out edge detection on the gray image by using a Canny operator to generate an edge binary image, and establishing an image coordinate system by taking the upper left corner of the image as a coordinate origin, the vertical downward direction as an X axis and the horizontal rightward direction as a Y axis;
(9) moving human body detection and foreground extraction: modeling the background by taking the moving object as the foreground and the rest as the background, subtracting the background by using the current frame through a subtraction method, and extracting the moving human body from the whole image sequence for analysis and counting; in order to ensure better results of the foreground image, the specific area is subjected to morphological transformation so as to eliminate noise interference of the specific area. Calculating the gray sum of foreground pixel points in the region of interest;
(10) detecting the skin color of a human body: converting the image sequence from an RGB color space to a YCrCb color space, the conversion function being:
<mrow> <mtable> <mtr> <mtd> <mrow> <mi>Y</mi> <mo>=</mo> <mn>0.257</mn> <mo>*</mo> <mi>R</mi> <mo>+</mo> <mn>0.564</mn> <mo>*</mo> <mi>G</mi> <mo>+</mo> <mn>0.098</mn> <mo>*</mo> <mi>B</mi> <mo>+</mo> <mn>16</mn> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msub> <mi>C</mi> <mi>b</mi> </msub> <mo>=</mo> <mo>-</mo> <mn>0.148</mn> <mo>*</mo> <mi>R</mi> <mo>-</mo> <mn>0.291</mn> <mo>*</mo> <mi>G</mi> <mo>+</mo> <mn>0.439</mn> <mo>*</mo> <mi>B</mi> <mo>+</mo> <mn>128</mn> </mrow> </mtd> </mtr> <mtr> <mtd> <mrow> <msub> <mi>C</mi> <mi>r</mi> </msub> <mo>=</mo> <mn>0.439</mn> <mo>*</mo> <mi>R</mi> <mo>-</mo> <mn>0.368</mn> <mo>*</mo> <mi>G</mi> <mo>-</mo> <mn>0.071</mn> <mo>*</mo> <mi>B</mi> <mo>+</mo> <mn>128</mn> </mrow> </mtd> </mtr> </mtable> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>5</mn> <mo>)</mo> </mrow> </mrow>
wherein: in YCrCb color space, Y refers to luminance information, CbFinger color tone, CrRefers to the degree of saturation. In the RGB color space, R represents a red component, G represents a green component, and B represents a blue component;
the skin pixel points in the CrCb two-dimensional space are approximately distributed in an ellipse. If the coordinates (Cr, Cb) are within the elliptical distribution, the pixel is identified as a skin pixel, and if the coordinates (Cr, Cb) are outside the elliptical distribution, the pixel is identified as a non-skin pixel.
Projecting the collected skin color sample points to a CbCr plane, performing nonlinear K-L transformation on the CbCr plane after projection, and performing statistical analysis on skin pixel points to form a skin color elliptic model;
the elliptical skin color model is described as:
(x - ecx)^2 / a^2 + (y - ecy)^2 / b^2 = 1    (6)

[x; y] = [cosθ  sinθ; -sinθ  cosθ] · [Cb' - cx; Cr' - cy]    (7)

wherein cx = 109.38, cy = 152.02, θ = 2.53, ecx = 1.6, ecy = 2.41, a = 25.39, b = 14.03; corresponding limiting conditions are added to detect the human face in the image sequence;
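For illustration only, a minimal Python/OpenCV sketch of this skin-color test; the ellipse parameters are those quoted above, the function and variable names are illustrative, OpenCV's built-in YCrCb conversion (whose coefficients differ slightly from equation (5)) stands in for the conversion step, and the nonlinear K-L transform is omitted so that the raw Cb, Cr values approximate Cb', Cr':

import cv2
import numpy as np

# Ellipse parameters quoted in the claim (equations (6)-(7)).
CX, CY = 109.38, 152.02        # cx, cy: centre used by the rotation of equation (7)
THETA = 2.53                   # rotation angle (assumed to be in radians)
ECX, ECY = 1.6, 2.41           # ellipse centre after rotation
A, B = 25.39, 14.03            # semi-axes a, b

def skin_mask(bgr_image):
    """Binary mask of pixels whose (Cb, Cr) fall inside the skin-color ellipse."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    cr, cb = ycrcb[:, :, 1], ycrcb[:, :, 2]    # OpenCV channel order is Y, Cr, Cb
    cos_t, sin_t = np.cos(THETA), np.sin(THETA)
    # Rotation of equation (7), using raw Cb, Cr as an approximation of Cb', Cr'.
    x = cos_t * (cb - CX) + sin_t * (cr - CY)
    y = -sin_t * (cb - CX) + cos_t * (cr - CY)
    # Inside-ellipse test of equation (6).
    inside = (x - ECX) ** 2 / A ** 2 + (y - ECY) ** 2 / B ** 2 <= 1.0
    return inside.astype(np.uint8) * 255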
(11) and (3) calculating a rotation invariant moment: the human face detected in the step (10) is surrounded by the corresponding outer contour, and the image is converted into a binary gray image which presents a two-dimensional density distribution f (x, y),
the (p + q) order geometrical moment function of the image f (x, y) is:
Mpq = ∬ x^p · y^q · f(x, y) dx dy    (8)
where f(x, y) is the brightness value of the image at point (x, y), the integration runs over the pixel region defined by the image intensity f(x, y), x^p·y^q is the transformation kernel, and Mpq is called the geometric moment of order (p + q),
the zero-order geometric moment m00 represents the total brightness of the image; in a binary image, m00 is the geometric area of the target region;
the first-order geometric moments m10 and m01 are the brightness moments of the image about the x axis and the y axis, and the brightness centroid (X0, Y0) is given by:
X0 = m10 / m00,  Y0 = m01 / m00    (9)
in a binary image, the point (X0, Y0) is the geometric center, i.e. the centroid,
therefore, the zero-order moment m00 of the binary image gives the face area, and dividing the first-order moments m10 and m01 by it gives the centroid (x2, y2) of the face in the image sequence;
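As an illustrative sketch, the centroid computation can be done with OpenCV's moment routine, which stands in for equations (8)-(9); the function name is illustrative:

import cv2

def face_centroid(binary_face_mask):
    """Centroid (X0, Y0) from the zero- and first-order moments, equation (9)."""
    m = cv2.moments(binary_face_mask, binaryImage=True)
    m00, m10, m01 = m["m00"], m["m10"], m["m01"]
    if m00 == 0:                        # no face pixels in the mask
        return None
    return m10 / m00, m01 / m00         # (X0, Y0); m00 is the face area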
(12) Pull-up action determination and counting
The distance between the face centroid in the image sequence and the horizontal bar is as follows:
h1 = x2 - x0    (10)
as the image sequence changes, s and h1 also change continuously;
when s ≥ s1, the athlete completes a qualified upward pull and the chin crosses the horizontal bar; after the pull is finished, when h1 ≥ h, the athlete completes a qualified arm-straightening action, and at that moment one qualified pull-up action is recorded;
(13) when the whole image sequence is finished, the whole counting process is stopped, and the number of the chin-up actions is stored and output.
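A short Python sketch of the counting logic of steps (12)-(13), for illustration; it assumes a helper measure(frame) returning the distances s and h1 defined in the claims, with s1 and h being the thresholds set earlier in the method, and all names are illustrative:

def count_pull_ups(frames, s1, h, measure):
    """Count qualified pull-ups over an image sequence.

    measure(frame) is assumed to return (s, h1) for each frame.
    """
    count = 0
    pulled_up = False                 # chin has crossed the bar in the current rep
    for frame in frames:
        s, h1 = measure(frame)
        if not pulled_up and s >= s1:
            pulled_up = True          # qualified upward pull completed
        elif pulled_up and h1 >= h:
            count += 1                # arms straightened: one qualified pull-up
            pulled_up = False
    return count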
5. The method of claim 4, wherein: the image preprocessing step (2) includes:
1) the color RGB image is converted to an 8-bit grayscale image with the following conversion function:
Grey = (299*R + 587*G + 114*B + 500) / 1000    (11)
wherein: r, G, B are color components of each pixel of the color image, R represents a red component, G represents a green component, B represents a blue component, and Grey is a gray value of each pixel of the gray image;
2) the filtering smoothing operation is:
scanning each pixel in the image by using a 5-by-5 Gaussian template, performing weighted average on pixel values in a neighborhood determined by the template, and replacing the value of the pixel at the central point by using the value of the weighted average;
k =
[  2   4   5   4   2
   4   9  12   9   4
   5  12  15  12   5
   4   9  12   9   4
   2   4   5   4   2 ]
wherein: the value in the template is the weighting coefficient of the pixel in the neighborhood of the central pixel point.
The convolution function is:
g(i, j) = Σk,l f(i + k, j + l)·h(k, l)    (12)
wherein: g (i, j) is an output pixel value, f (i + k, j + l) is an input pixel value, h (k, l) is a filter weighting coefficient, and k and l take values of 1, 2, 3, 4 and 5;
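For illustration, a minimal Python/OpenCV sketch of this smoothing step; the 5×5 template is the one given above, and dividing it by its sum (159) so that the output is a weighted average is an assumption, since the claim does not state the divisor:

import cv2
import numpy as np

# 5x5 Gaussian template from the claim, normalized by its sum (159) so that
# filtering yields a weighted average of the neighbourhood (divisor assumed).
K = np.array([[2,  4,  5,  4, 2],
              [4,  9, 12,  9, 4],
              [5, 12, 15, 12, 5],
              [4,  9, 12,  9, 4],
              [2,  4,  5,  4, 2]], dtype=np.float32)
K /= K.sum()

def gaussian_smooth(grey):
    """Weighted average over the 5x5 neighbourhood of each pixel, equation (12)."""
    return cv2.filter2D(grey, -1, K)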
3) the edge detection operation is:
the gray-level image is convolved with a pair of convolution kernels acting in the x direction and the y direction respectively, giving the first-order gradients Gx and Gy of the gray-level image in the x and y directions:
Gx = ∂f/∂x,  Gy = ∂f/∂y    (13)
The selected convolution templates were:
px = [ -1  0  +1          py = [ -1  -2  -1
       -2  0  +2                  0   0   0
       -1  0  +1 ]               +1  +2  +1 ]    (14)
then the gradient amplitude and direction are calculated by using the following formula;
G(x, y) = sqrt(Gx^2 + Gy^2)    (15)
θ(x, y) = arctan(Gy / Gx)    (16)
wherein Gx and Gy are the gradient values in the x direction and the y direction respectively, px and py are the convolution templates in the x direction and the y direction respectively, G(x, y) and θ(x, y) are the gradient amplitude and direction at the point (x, y), and θ ranges over [0, 180] degrees;
then, according to the double-threshold method, each pixel is assigned the value 1 or 0 based on the set thresholds, and the candidate pixel points are spliced into contours so as to generate a binary edge image.
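For illustration, a minimal Python/OpenCV sketch of the gradient and edge computation of equations (13)-(16); cv2.Canny stands in for the double-threshold step, and the threshold values are illustrative:

import cv2
import numpy as np

def edge_map(grey, low=50, high=150):
    """Sobel gradients (equations (14)-(16)) and a double-threshold edge image."""
    gx = cv2.Sobel(grey, cv2.CV_32F, 1, 0, ksize=3)     # px of equation (14)
    gy = cv2.Sobel(grey, cv2.CV_32F, 0, 1, ksize=3)     # py of equation (14)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)              # equation (15)
    direction = np.degrees(np.arctan2(gy, gx)) % 180    # equation (16), in [0, 180)
    edges = cv2.Canny(grey, low, high)                  # double-threshold binary edges
    return magnitude, direction, edges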
6. The method of claim 4, wherein: simplifying the linear segment detection expression in the step (3) to obtain:
r=xcosθ+ysinθ (17)
so, for a point (x0, y0), every straight line passing through it can be uniformly defined as:
rθ = x0·cosθ + y0·sinθ    (18)
wherein each pair (r, θ) represents a straight line passing through the point (x0, y0);
therefore, a straight line is detected in the image wherever the number of points voting for the same (r, θ) pair exceeds the threshold;
since more than one straight line is detected in the image, constraints on the slope and on the length of the detected lines are added; finally, the two most prominent edge lines at the upper and lower edges of the horizontal bar are taken, their center coordinates x1 and x2 on the x axis are read off, and the average is calculated;
thus, the horizontal bar height is:
x0 = (x1 + x2) / 2    (19).
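One possible Python/OpenCV sketch of this bar-position estimate, for illustration; the probabilistic Hough transform stands in for the (r, θ) voting described above, the claim's x axis is taken to be the image row index, and all parameter values and names are illustrative:

import cv2
import numpy as np

def bar_height(edges, min_len=200, max_tilt=5):
    """Average row position of the two longest near-horizontal edge segments,
    as in equation (19)."""
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                               minLineLength=min_len, maxLineGap=10)
    if segments is None:
        return None
    rows = []
    for c1, r1, c2, r2 in segments[:, 0]:        # (column, row) endpoints
        if abs(r2 - r1) <= max_tilt:             # slope constraint
            length = abs(c2 - c1)                # length constraint via sorting
            rows.append((length, (r1 + r2) / 2)) # centre row of the segment
    rows.sort(reverse=True)                      # longest segments first
    if len(rows) < 2:
        return None
    return (rows[0][1] + rows[1][1]) / 2         # x0 = (x1 + x2) / 2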
7. the method of claim 4, wherein: the human face and human hand detection in the step (4) comprises the following steps:
1) frontal face images are collected as the training set, each of size 24 × 24; the face is then represented by rectangular features, and the feature value at each pixel point is calculated;
the integral image at coordinate (x, y) is the sum of all pixels above and to the left of (x, y);
the integral image function is:
ii(x, y) = Σx′≤x, y′≤y i(x′, y′)    (20)
wherein ii(x, y) is the value of the integral image and i(x′, y′) is the value of the original image at the point (x′, y′): a color value for a color image and a gray value for a gray image;
2) using a boosting algorithm, the rectangular features that best represent the face, namely the weak classifiers, are selected;
the weak classifier function is:
h(x, f, p, θ) = 1 if p·f(x) < p·θ, and 0 otherwise    (21)
wherein x is a detection sub-window, f is a rectangular feature, f(x) is the Haar-like feature value of the sub-window, p can only be +1 or -1 and indicates the direction of the inequality, and θ is the threshold; for each rectangular feature f a weak classifier h(x, f, p, θ) is trained, which in fact amounts to finding the optimal θ so that the classification error of the weak classifier h(x, f, p, θ) over all training samples is lowest;
3) training and combining the weak classifiers into a strong classifier; the process is as follows:
A) given training samples (x1, y1), ……, (xn, yn) of n sample images, where yi = 1 denotes a positive sample and yi = 0 a negative sample, with i ≤ n, and the number of iterations set to T;
B) the negative and positive sample weights are initialized as w1,i = 1/(2m) for negative samples and w1,i = 1/(2l) for positive samples,
wherein m represents the number of negative samples and l represents the number of positive samples;
C) from iteration 1 to iteration T:
a. normalization weight:
wt,i ← wt,i / Σj=1..n wt,j    (22)
b. screening the optimal weak classifier according to the minimum weighted error rate:
εt = minf,p,θ Σi wi·|h(xi, f, p, θ) - yi|    (23)
c. updating the weight:
wt+1,i = wt,i·βt^(1-ei)    (24)
wherein βt = εt / (1 - εt), ei = 0 if xi is classified correctly, and ei = 1 otherwise,
D) after T rounds, the weak classifiers are cascaded into a strong classifier:
C(x) = 1 if Σt=1..T αt·ht(x) ≥ (1/2)·Σt=1..T αt, and 0 otherwise    (25)
wherein αt = log(1/βt);
E) the position and the scale of the detection window are adjusted continuously to find the face;
the human hand detection process is similar to the human face detection process, except that the images used for training are palm images.
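A minimal Python/OpenCV sketch of the detection stage, for illustration; a pre-trained frontal-face Haar cascade shipped with OpenCV stands in for the classifier trained in steps 1)-3), a separate palm cascade (not shipped with OpenCV) would be needed for the hand, and the parameter values are illustrative:

import cv2

# Pre-trained cascade standing in for the trained strong classifier.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(grey):
    """Scan the detection window over positions and scales (step E))."""
    # detectMultiScale internally builds the integral image of equation (20)
    # and evaluates the cascaded classifiers on each sub-window.
    return face_cascade.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)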
8. The method of claim 7, wherein: the weak classifier training process comprises the following steps:
A) for each rectangular feature f, the feature values of all training samples are calculated through the integral image;
B) sorting the characteristic values;
C) for each feature value, calculate: the weight sum of all negative examples, T-; the weight sum of all positive examples, T+; the weight sum of the positive examples preceding the feature value, S+; and the weight sum of the negative examples preceding the feature value, S-;
D) a number between the feature value of the current element and the feature value immediately preceding it is selected as the threshold; the weak classifier obtained in this way is the weak classifier with the lowest classification error, and it classifies the samples whose feature values lie before and after the current feature value into face and non-face; the classification error of this threshold is:
e = min( S+ + (T- - S-), S- + (T+ - S+) )    (26).
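A compact NumPy sketch of this threshold search, for illustration; the array layout and names are illustrative, and the error expression is the one in equation (26):

import numpy as np

def best_threshold(feature_values, labels, weights):
    """Pick the threshold minimising e = min(S+ + (T- - S-), S- + (T+ - S+)).

    feature_values, labels (1 = face, 0 = non-face) and weights are 1-D arrays
    of equal length; returns (threshold, error).
    """
    order = np.argsort(feature_values)
    f = feature_values[order]
    w = weights[order]
    pos = labels[order] == 1

    t_plus = w[pos].sum()                       # total weight of positive examples
    t_minus = w[~pos].sum()                     # total weight of negative examples
    s_plus = np.cumsum(w * pos) - w * pos       # positive weight before each element
    s_minus = np.cumsum(w * ~pos) - w * ~pos    # negative weight before each element

    errors = np.minimum(s_plus + (t_minus - s_minus),
                        s_minus + (t_plus - s_plus))
    i = int(np.argmin(errors))
    # Threshold lies between the current feature value and the one before it.
    thr = f[i] if i == 0 else (f[i] + f[i - 1]) / 2
    return thr, float(errors[i])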
9. the method of claim 4, wherein: the modeling algorithm flow in the step (9) is as follows:
1) each newly detected pixel value xt is compared with the current k Gaussian models according to formula (27); if the deviation of the pixel value from a model's mean is within 2.5σ, that model is regarded as the distribution model matched with the new pixel value:
|xt - μi,t| ≤ 2.5·σi,t-1    (27)
2) if the matched model meets the background requirement, the pixel is regarded as insignificant background; otherwise the pixel is regarded as meaningful foreground;
3) the weight of each constituent single Gaussian model is updated according to equation (28), wherein α is the learning rate and Mk,t = 1 for the matched model and Mk,t = 0 otherwise; the weights of the models are then normalized:
wk,t = (1 - α)·wk,t-1 + α·Mk,t    (28)
4) the mean μ and standard deviation σ of the unmatched models remain unchanged, while the parameters of the matched model are updated as follows:
ρ = α·η(xt | μk, σk)    (29)
μt = (1 - ρ)·μt-1 + ρ·xt    (30)
σt² = (1 - ρ)·σt-1² + ρ·(xt - μt)ᵀ·(xt - μt)    (31)
5) if no model is matched in step 1), the model with the smallest weight is replaced: its standard deviation is set to a large initial value, its weight to the minimum value, and its mean to the current pixel value;
6) the models are arranged in descending order of ω/σ²; models with large weight and small standard deviation come first;
7) selecting the first B modes as background, B satisfies the following formula:
B = arg min_b ( Σk=1..b wk > T )    (32)
wherein the parameter T represents the proportion of the background.
8) the background obtained by modeling is subtracted from the current frame, and what remains is the foreground;
in order to ensure a better foreground image, the specific area is subjected to a morphological transformation to eliminate its noise interference,
the morphological transformation selected is erosion, and the erosion kernel is:
b = [ 0  1  0
      1  1  1
      0  1  0 ]    (33)
the erosion algorithm is as follows: each pixel point in the image is scanned with the structuring element (erosion kernel), and an AND operation is performed between the structuring element and the binary image it covers; if all covered values are 1, the pixel value of the resulting image is 1, otherwise it is 0.
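For illustration, a minimal Python/OpenCV sketch of this foreground extraction; cv2.createBackgroundSubtractorMOG2 implements a per-pixel Gaussian-mixture model of the kind described in steps 1)-7), the erosion kernel is the one in equation (33), and the parameter values and names are illustrative:

import cv2
import numpy as np

# OpenCV's MOG2 stands in for the per-pixel Gaussian-mixture model of steps 1)-7).
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

# Cross-shaped erosion kernel of equation (33).
kernel = np.array([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]], dtype=np.uint8)

def foreground(frame):
    """Subtract the modelled background from the frame and erode the mask."""
    mask = subtractor.apply(frame)       # step 8): current frame minus background
    return cv2.erode(mask, kernel)       # morphological noise suppression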
CN201710264022.2A 2017-04-21 2017-04-21 Pull-up counting system and method based on image processing Expired - Fee Related CN107103298B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710264022.2A CN107103298B (en) 2017-04-21 2017-04-21 Pull-up counting system and method based on image processing

Publications (2)

Publication Number Publication Date
CN107103298A true CN107103298A (en) 2017-08-29
CN107103298B CN107103298B (en) 2020-08-14

Family

ID=59657479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710264022.2A Expired - Fee Related CN107103298B (en) 2017-04-21 2017-04-21 Pull-up counting system and method based on image processing

Country Status (1)

Country Link
CN (1) CN107103298B (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108242061A (en) * 2018-02-11 2018-07-03 南京亿猫信息技术有限公司 A kind of supermarket shopping car hard recognition method based on Sobel operators
CN108355340A (en) * 2018-02-06 2018-08-03 浙江大学 A kind of method of counting of bouncing the ball based on video information
CN109829928A (en) * 2018-12-29 2019-05-31 芜湖哈特机器人产业技术研究院有限公司 A kind of extracting method of target image extracting method and picture position feature
CN109859186A (en) * 2019-01-31 2019-06-07 江苏理工学院 A kind of lithium battery mould group positive and negative anodes detection method based on halcon
CN110210360A (en) * 2019-05-24 2019-09-06 浙江大学 A kind of rope skipping method of counting based on video image target identification
CN110222555A (en) * 2019-04-18 2019-09-10 江苏图云智能科技发展有限公司 The detection method and device of area of skin color
CN110732121A (en) * 2019-09-09 2020-01-31 南京起跑线数据科技有限公司 Pull-up motion posture analysis system and operation method thereof
CN111167107A (en) * 2020-03-19 2020-05-19 中国人民解放军国防科技大学 Pull-up test system based on face recognition and human body posture estimation
CN111311560A (en) * 2020-02-10 2020-06-19 中国铁道科学研究院集团有限公司基础设施检测研究所 Method and device for detecting state of steel rail fastener
CN111368791A (en) * 2020-03-18 2020-07-03 南通大学 Pull-up test counting method and system based on Quick-OpenPose model
CN113011344A (en) * 2021-03-23 2021-06-22 安徽一视科技有限公司 Pull-up quantity calculation method based on machine vision
CN113158729A (en) * 2020-12-31 2021-07-23 杭州拓深科技有限公司 Pull-up counting method and device, electronic device and storage medium
CN113457110A (en) * 2021-07-13 2021-10-01 北京理工大学 Counting method, system and device in intelligent playground
CN114209309A (en) * 2021-12-14 2022-03-22 天津科技大学 Movement behavior analysis method based on visual technology
CN114623400A (en) * 2022-03-22 2022-06-14 广东卫明眼视光研究院 Sitting posture identification desk lamp system based on remote intelligent monitoring and identification method
CN114821090A (en) * 2022-06-24 2022-07-29 南京可信区块链与算法经济研究院有限公司 Image feature extraction method based on improved image moment affine invariant
CN116846789A (en) * 2023-09-01 2023-10-03 国网四川省电力公司信息通信公司 Operation and maintenance management system for communication link
WO2023225774A1 (en) * 2022-05-23 2023-11-30 京东方科技集团股份有限公司 Image processing method and apparatus, and electronic device and computer-readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426025A (en) * 2013-08-23 2013-12-04 华南理工大学 Non-contact push-up count method based on smart phone platform
CN105597294A (en) * 2014-11-21 2016-05-25 中国移动通信集团公司 Lying-prostrating movement parameter estimation and evaluation method, device and intelligent terminal
CN105608467A (en) * 2015-12-16 2016-05-25 西北工业大学 Kinect-based non-contact type student physical fitness evaluation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAO Sufang: "Development of an Automatic Pull-up Test System Incorporating Depth Images", China Master's Theses Full-text Database, Information Science and Technology Series *
ZHAO Sufang et al.: "Development of a Kinect-based Automatic Pull-up Test System", Journal of Anshun University *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108355340A (en) * 2018-02-06 2018-08-03 浙江大学 A kind of method of counting of bouncing the ball based on video information
CN108242061B (en) * 2018-02-11 2022-04-08 南京亿猫信息技术有限公司 Supermarket shopping cart hand identification method based on Sobel operator
CN108242061A (en) * 2018-02-11 2018-07-03 南京亿猫信息技术有限公司 A kind of supermarket shopping car hard recognition method based on Sobel operators
CN109829928A (en) * 2018-12-29 2019-05-31 芜湖哈特机器人产业技术研究院有限公司 A kind of extracting method of target image extracting method and picture position feature
CN109859186A (en) * 2019-01-31 2019-06-07 江苏理工学院 A kind of lithium battery mould group positive and negative anodes detection method based on halcon
CN110222555A (en) * 2019-04-18 2019-09-10 江苏图云智能科技发展有限公司 The detection method and device of area of skin color
CN110222555B (en) * 2019-04-18 2022-12-20 灏图科技(上海)有限公司 Method and device for detecting skin color area
CN110210360B (en) * 2019-05-24 2021-01-08 浙江大学 Rope skipping counting method based on video image target recognition
CN110210360A (en) * 2019-05-24 2019-09-06 浙江大学 A kind of rope skipping method of counting based on video image target identification
CN110732121A (en) * 2019-09-09 2020-01-31 南京起跑线数据科技有限公司 Pull-up motion posture analysis system and operation method thereof
CN111311560A (en) * 2020-02-10 2020-06-19 中国铁道科学研究院集团有限公司基础设施检测研究所 Method and device for detecting state of steel rail fastener
CN111311560B (en) * 2020-02-10 2023-09-12 中国铁道科学研究院集团有限公司基础设施检测研究所 Method and device for detecting state of steel rail fastener
CN111368791B (en) * 2020-03-18 2020-09-29 南通大学 Pull-up test counting method and system based on Quick-OpenPose model
CN111368791A (en) * 2020-03-18 2020-07-03 南通大学 Pull-up test counting method and system based on Quick-OpenPose model
CN111167107B (en) * 2020-03-19 2023-04-18 中国人民解放军国防科技大学 Pull-up test system based on face recognition and human body posture estimation
CN111167107A (en) * 2020-03-19 2020-05-19 中国人民解放军国防科技大学 Pull-up test system based on face recognition and human body posture estimation
CN113158729A (en) * 2020-12-31 2021-07-23 杭州拓深科技有限公司 Pull-up counting method and device, electronic device and storage medium
CN113011344A (en) * 2021-03-23 2021-06-22 安徽一视科技有限公司 Pull-up quantity calculation method based on machine vision
CN113457110A (en) * 2021-07-13 2021-10-01 北京理工大学 Counting method, system and device in intelligent playground
CN113457110B (en) * 2021-07-13 2022-03-11 北京理工大学 Counting method, system and device in intelligent playground
CN114209309A (en) * 2021-12-14 2022-03-22 天津科技大学 Movement behavior analysis method based on visual technology
CN114209309B (en) * 2021-12-14 2024-06-11 天津市卓越新中新龙腾科技发展有限公司 Movement behavior analysis method based on visual technology
CN114623400A (en) * 2022-03-22 2022-06-14 广东卫明眼视光研究院 Sitting posture identification desk lamp system based on remote intelligent monitoring and identification method
CN114623400B (en) * 2022-03-22 2023-10-20 广东卫明眼视光研究院 Sitting posture identification desk lamp system and identification method based on remote intelligent monitoring
WO2023225774A1 (en) * 2022-05-23 2023-11-30 京东方科技集团股份有限公司 Image processing method and apparatus, and electronic device and computer-readable storage medium
CN114821090A (en) * 2022-06-24 2022-07-29 南京可信区块链与算法经济研究院有限公司 Image feature extraction method based on improved image moment affine invariant
CN116846789A (en) * 2023-09-01 2023-10-03 国网四川省电力公司信息通信公司 Operation and maintenance management system for communication link
CN116846789B (en) * 2023-09-01 2023-11-14 国网四川省电力公司信息通信公司 Operation and maintenance management system for communication link

Also Published As

Publication number Publication date
CN107103298B (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN107103298B (en) Pull-up counting system and method based on image processing
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
CN103942577B (en) Based on the personal identification method for establishing sample database and composite character certainly in video monitoring
CN105631439B (en) Face image processing process and device
CN105335725B (en) A kind of Gait Recognition identity identifying method based on Fusion Features
CN108108684A (en) A kind of attention detection method for merging line-of-sight detection
CN109635875A (en) A kind of end-to-end network interface detection method based on deep learning
CN104123543A (en) Eyeball movement identification method based on face identification
CN110148162A (en) A kind of heterologous image matching method based on composition operators
CN101339607A (en) Human face recognition method and system, human face recognition model training method and system
CN106886216A (en) Robot automatic tracking method and system based on RGBD Face datections
CN104598888B (en) A kind of recognition methods of face gender
CN110232308A (en) Robot gesture track recognizing method is followed based on what hand speed and track were distributed
WO2017161734A1 (en) Correction of human body movements via television and motion-sensing accessory and system
CN109409298A (en) A kind of Eye-controlling focus method based on video processing
WO2019085060A1 (en) Method and system for detecting waving of robot, and robot
CN111539911A (en) Mouth breathing face recognition method, device and storage medium
CN104091173A (en) Gender recognition method and device based on network camera
CN114022554A (en) Massage robot acupuncture point detection and positioning method based on YOLO
CN109993116B (en) Pedestrian re-identification method based on mutual learning of human bones
KR102667880B1 (en) beauty educational content generating apparatus and method therefor
CN111639562A (en) Intelligent positioning method for palm region of interest
CN107944453A (en) Based on Hu not bushing detection methods of bending moment and support vector machines
CN117542121A (en) Computer vision-based intelligent training and checking system and method
Agrawal et al. A Tutor for the hearing impaired (developed using Automatic Gesture Recognition)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200814