CN112270220A - Sewing gesture recognition method based on deep learning - Google Patents
Info
- Publication number
- CN112270220A (application CN202011096967.6A)
- Authority
- CN
- China
- Prior art keywords
- sewing
- gesture data
- formula
- information
- calculating
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Sewing Machines And Sewing (AREA)
Abstract
The invention discloses a sewing gesture recognition method based on deep learning, which is implemented according to the following steps: step 1, data set collection and preprocessing; step 2, feeding the pictures in the preprocessed data set into a GRU neural network as RGB picture frames for data training; step 3, taking the output of the GRU network as the input of a DNN neural network for further feature extraction, forming a GRU-DNN network for recognizing sewing gestures; and step 4, sending the features extracted in step 3 into an SVM classifier for action classification. The invention solves the problems that, in the prior art, a DNN cannot handle changes over a time sequence during behavior detection and the basic RNN structure suffers from vanishing gradients during detection, which makes the recognition inaccurate.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a sewing gesture recognition method based on deep learning.
Background
With rising labor costs and advances in computer technology, "man + machine + environment" systems have become an irreversible trend. Deep-learning techniques have achieved remarkable results in the field of behavior detection: they overcome the limitation of traditional hand-crafted feature methods, which can only recognize behavior in simple scenes, optimize the classification task more effectively, and extract feature information from data more efficiently.
Existing sewing gesture recognition mainly relies on recurrent neural networks, whose representative models are the RNN (recurrent neural network), the LSTM, and the GRU (gated recurrent unit). The RNN model can connect the current process with past states and has a certain memory capability. The LSTM and GRU models are structural variants of the RNN; compared with the RNN, the LSTM enables the recurrent network to memorize past information while selectively forgetting unimportant information. Compared with the LSTM structure, the GRU neural network can exploit all of the picture information during recognition, alleviates the vanishing-gradient problem on long sequences on the basis of the LSTM, and has a simpler structure and a better recognition effect than the LSTM structural model. The DNN (deep neural network), a feedforward artificial neural network, is also widely used in the field of behavior recognition; it can handle deep-level problems and extract deep features well. However, a DNN cannot handle changes over a time sequence during behavior detection, the basic RNN structure suffers from vanishing gradients during detection, and the RNN is weaker than the DNN at capturing deep-level image information.
Disclosure of Invention
The invention aims to provide a sewing gesture recognition method based on deep learning, solving the problems that, in the prior art, the DNN cannot handle changes over a time sequence during behavior detection and the basic RNN network structure suffers from vanishing gradients during detection, which makes the recognition inaccurate.
The invention adopts the technical scheme that a sewing gesture recognition method based on deep learning is implemented according to the following steps:
step 1, data set collection and preprocessing;
step 2, feeding the pictures in the preprocessed data set into a GRU neural network as RGB picture frames for data training;
step 3, taking the output result of the GRU network as the input of the DNN neural network for further feature extraction to form the GRU-DNN network for identifying the sewing gesture;
and 4, sending the features extracted in the step 3 into an SVM classifier for action classification.
The present invention is also characterized in that,
the step 1 specifically comprises the following steps:
step 1.1, collecting sewing gesture data pictures, and carrying out color correction on the collected sewing gesture data pictures through a dynamic threshold method so as to eliminate the influence of illumination on color rendering;
step 1.2, adjusting the brightness of the sewing gesture data picture processed in the step 1.1 to be 0.6 to 1.5 times of the original brightness;
and step 1.3, leaving the brightness-adjusted sewing gesture data picture from step 1.2 unrotated or randomly rotating it by 90, 180 or 270 degrees, to obtain the preprocessed sewing gesture data pictures serving as the training set.
The step 1.1 specifically comprises the following steps:
step 1.1.1, dividing each sewing gesture data picture in a training set into a plurality of areas;
step 1.1.2, calculating Cb and Cr for the pixel points in each region, and the averages Mb and Mr of Cb and Cr over all pixel points in each region, where Cb denotes the color saturation of a pixel point and Cr denotes the hue of a pixel point;
Cb=-0.169×R-0.331×G+0.500×B (1)
Cr=0.500×R-0.419×G-0.081×B (2)
where R, G and B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, Cb(n) is the color saturation of the nth pixel point in the corresponding region, Cr(n) is the hue of the nth pixel point in the corresponding region, and N is the number of pixel points in the corresponding region;
step 1.1.3, separately calculating for each region the cumulative values Db and Dr of the absolute differences of the Cb and Cr components; together with the averages Mb and Mr of step 1.1.2, the calculation formulas are as follows:
Mb=(ΣCb(n))/N (3)
Mr=(ΣCr(n))/N (4)
Db=(Σ|Cb(n)-Mb|)/N (5)
Dr=(Σ|Cr(n)-Mr|)/N (6)
wherein the sums run over n=1…N, N is the number of pixel points in each region, Cb(n) is the color saturation of the nth pixel point in the corresponding region, and Cr(n) is the hue of the nth pixel point in the corresponding region;
step 1.1.4, examining the Db/Dr value for each pixel point; if the Db/Dr value is smaller than the Mb/Mr value of the corresponding region, the pixel points of the corresponding region are ignored;
step 1.1.5, for each sewing gesture data picture, after removing the pixel points ignored by the judgment of step 1.1.4, re-solving Mb, Mr, Db and Dr for each region according to formulas (3) to (6); then respectively summing the Mb, Mr, Db and Dr of all regions and taking the averages as the MB, MR, DB and DR values of the corresponding sewing gesture data picture, wherein MB is the average color saturation of the whole sewing gesture data picture, MR is the average hue of the whole sewing gesture data picture, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.6, if the pixel point in each region simultaneously satisfies the formulas (7) and (8), the pixel point is preliminarily determined as a white reference point:
|Cb(n)-(Mb+Db×sign(Mb))|<1.5×DB (7)
|Cr(n)-(1.5×Mr×sign(Mr))|<1.5×DR (8)
in the formula, Mb and Mr are the averages of the saturation and hue components (Cb and Cr) of the sewing gesture data picture, Db and Dr are the calculated cumulative values of the absolute differences of the saturation and hue components for each small region, sign is the sign (signum) function, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.7, sorting the preliminarily determined white reference points in each area by brightness and taking the brightest 10% as the finally determined white reference points;
step 1.1.8, calculating the averages Raver, Gaver and Baver of the brightness of all white reference points in each area:
Raver=(R1+R2+……+Rm)/m (9)
Gaver=(G1+G2+……+Gm)/m (10)
Baver=(B1+B2+……+Bm)/m (11)
wherein m is the number of finally determined white reference points in the corresponding area, R1, R2……Rm are the red-channel color components of the white reference points, G1, G2……Gm are the green-channel color components of the white reference points, and B1, B2……Bm are the blue-channel color components of the white reference points;
step 1.1.9, calculating the gain of each channel, wherein the calculation formula is as follows:
Rgain=Ymax/Raver (12)
Ggain=Ymax/Gaver (13)
Bgain=Ymax/Baver (14)
Y=0.299×R+0.587×G+0.114×B (15)
in the formula: y ismaxIs the maximum value of the Y component in the color space in the whole image, Raver、Gaver、BaverThe average value of the brightness of the white reference point is R, G, B, and the component values of red, green and blue of each pixel point in the collected sewing gesture data image are R, G, B;
step 1.1.10, calculate the final color for each channel:
R′=R×Rgain (16)
G′=G×Ggain (17)
B′=B×Bgain (18)
in the formula, R, G, B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, and R′, G′ and B′ are the red, green and blue components of the pixel points in the corrected sewing gesture data image.
The step 2 specifically comprises the following steps:
step 2.1, storing the red, green and blue channel color components of the corrected sewing gesture data picture obtained in step 1.1.10 in the computer in matrix form, then converting the three matrices into a column vector X, which is fed into the GRU network structure as the feature vector;
step 2.2, calculating the value of an update gate in the GRU network structure, specifically:
determining how much information from the previous moment is carried over to the current moment; the calculation formula is as follows:
Zt=σ(W×Xt+U×ht-1) (19)
in the formula, Xt is the t-th component of the input feature vector X, ht-1 is the stored information of step t-1, σ is the logistic sigmoid function, and W and U are weight matrices; the update gate adds the two parts of information and passes the sum through the sigmoid activation function, which compresses the result to between 0 and 1; the update gate controls the degree to which the state of the previous moment is brought into the current state, i.e., how much information from the previous moment is applied to the current moment;
step 2.3, calculating the reset gate, which determines how much past information needs to be forgotten; the calculation formula is as follows:
r(t)=σ(W×Xt+U×ht-1) (20)
in the formula: w and U are weight matrices, XtFor the t-th component of the input feature vector X, ht-1The information of the t-1 step is stored;
step 2.4, calculating the current memory content, and storing the current memory content in a reset gate, wherein the calculation formula is as follows:
h′t=tanh(Wxt+rt⊙Uht-1) (21)
in the formula, rt is the output value of the reset gate, Xt is the t-th component of the input sequence X, and ht-1 is the stored information of step t-1;
step 2.5, the final output of the gated recurrent unit is obtained by adding the information retained from the previous moment into the final memory to the information retained from the current memory content into the final memory; the calculation formula is as follows:
ht=Zt⊙ht-1+(1-Zt)⊙h′t (22)
in the formula, Zt is the result of the update gate, ht-1 is the saved information of step t-1, Zt⊙ht-1 denotes the information from the previous step that is retained in the final memory, h′t is the current memory content, and (1-Zt)⊙h′t denotes the information from the current memory content that is retained in the final memory; this completes the data training.
The step 3 specifically comprises the following steps:
step 3.1, taking the final memory information stored in the GRU network structure as the input of the DNN neural network, and then initializing the parameters, namely the weights W and the biases b;
step 3.2, calculating the activation function, whose calculation formula is as follows:
σ(z)=1/(1+e^(-z)) (23)
wherein z is the independent variable (the pre-activation input);
and 3.3, carrying out forward propagation to obtain an output result, wherein an output formula is as follows:
al=σ(Wl×al-1+bl) (24)
wherein l denotes the layer number, al-1 is the output of layer l-1 of the neural network, al is the output of layer l of the neural network, Wl is the weight matrix of layer l, and bl is the bias of layer l;
step 3.4, calculating the loss function, whose calculation formula is as follows:
J(W,b,x,y)=1/2×||al-y||² (25)
in the formula, al is the output of layer l of the neural network, x is the sequence output by the trained GRU neural network, and y is the true training sample output;
and step 3.5, performing back propagation, wherein the formulas for updating the parameters W and b of each layer are as follows:
Zl=Wl×al-1+bl (26)
wherein Zl is the pre-activation output of layer l; taking the partial derivative of the loss function with respect to Zl gives:
∂J/∂Zl=∂J/∂al⊙σ′(Zl) (27)
taking the partial derivative of the loss function with respect to Wl gives:
∂J/∂Wl=∂J/∂Zl×(al-1)^T (28)
taking the partial derivative of the loss function with respect to bl gives:
∂J/∂bl=∂J/∂Zl (29)
wherein al-1 is the output of layer l-1 of the neural network and bl is the bias of layer l;
solving formulas (24) to (29) jointly yields Wl and bl, realizing the continuous updating of Wl and bl.
And step 3.6, computing layer by layer from the input layer until the output layer is reached, obtaining the final feature extraction result.
The invention has the beneficial effects that:
the invention discloses a sewing gesture recognition method based on deep learning, which combines a GRU network structure and a DNN network structure for behavior detection according to strong relevance of the GRU network structure in time and space during detection and effectiveness of the DNN network structure in extracting deep features. And carrying out color correction on the input data by using a dynamic threshold value method so as to eliminate the influence of illumination on color rendering. The pictures are rotated by 90 degrees, 180 degrees and 270 degrees to enhance the robustness of each angle in the imaging process. The preprocessed data are trained by utilizing the GRU network structure, the output result is sent into the DNN network structure as the input data of the DNN network structure for further feature extraction, compared with a single DNN network structure, the GRU-DNN network structure fully utilizes information on a time sequence and can obtain information of a deeper image when behavior detection is carried out, and the recognition effect is more accurate compared with the single network structure.
Drawings
FIG. 1 is an overall flow chart of a sewing gesture recognition method based on deep learning according to the present invention;
FIG. 2 is a flow chart of color correction during data preprocessing in the sewing gesture recognition method based on deep learning according to the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention relates to a sewing gesture recognition method based on deep learning, the flow of which is shown in figure 1 and is implemented according to the following steps:
step 1, data set collection and preprocessing, which specifically comprises the following steps:
step 1.1, collecting sewing gesture data pictures and performing color correction on them with a dynamic threshold method to eliminate the influence of illumination on color rendering; color correction is needed mainly because there is a certain deviation between the acquired image and the real image, and the dynamic threshold algorithm eliminates the influence of illumination on color rendering, as shown in FIG. 2, specifically:
step 1.1.1, dividing each sewing gesture data picture in a training set into a plurality of areas;
step 1.1.2, calculating Cb and Cr for the pixel points in each region, and the averages Mb and Mr of Cb and Cr over all pixel points in each region, where Cb denotes the color saturation of a pixel point and Cr denotes the hue of a pixel point;
Cb=-0.169×R-0.331×G+0.500×B (1)
Cr=0.500×R-0.419×G-0.081×B (2)
where R, G and B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, Cb(n) is the color saturation of the nth pixel point in the corresponding region, Cr(n) is the hue of the nth pixel point in the corresponding region, and N is the number of pixel points in the corresponding region;
step 1.1.3, separately calculating for each region the cumulative values Db and Dr of the absolute differences of the Cb and Cr components; together with the averages Mb and Mr of step 1.1.2, the calculation formulas are as follows:
Mb=(ΣCb(n))/N (3)
Mr=(ΣCr(n))/N (4)
Db=(Σ|Cb(n)-Mb|)/N (5)
Dr=(Σ|Cr(n)-Mr|)/N (6)
wherein the sums run over n=1…N, N is the number of pixel points in each region, Cb(n) is the color saturation of the nth pixel point in the corresponding region, and Cr(n) is the hue of the nth pixel point in the corresponding region;
step 1.1.4, examining the Db/Dr value for each pixel point; if the Db/Dr value is smaller than the Mb/Mr value of the corresponding region, the pixel points of the corresponding region are ignored;
step 1.1.5, for each sewing gesture data picture, after removing the pixel points ignored by the judgment of step 1.1.4, re-solving Mb, Mr, Db and Dr for each region according to formulas (3) to (6); then respectively summing the Mb, Mr, Db and Dr of all regions and taking the averages as the MB, MR, DB and DR values of the corresponding sewing gesture data picture, wherein MB is the average color saturation of the whole sewing gesture data picture, MR is the average hue of the whole sewing gesture data picture, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.6, if the pixel point in each region simultaneously satisfies the formulas (7) and (8), the pixel point is preliminarily determined as a white reference point:
|Cb(n)-(Mb+Db×sign(Mb))|<1.5×DB (7)
|Cr(n)-(1.5×Mr×sign(Mr))|<1.5×DR (8)
in the formula, Mb and Mr are the averages of the saturation and hue components (Cb and Cr) of the sewing gesture data picture, Db and Dr are the calculated cumulative values of the absolute differences of the saturation and hue components for each small region, sign is the sign (signum) function, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.7, sorting the preliminarily determined white reference points in each area by brightness and taking the brightest 10% as the finally determined white reference points;
step 1.1.8, calculating the averages Raver, Gaver and Baver of the brightness of all white reference points in each area:
Raver=(R1+R2+……+Rm)/m (9)
Gaver=(G1+G2+……+Gm)/m (10)
Baver=(B1+B2+……+Bm)/m (11)
wherein m is the number of finally determined white reference points in the corresponding area, R1, R2……Rm are the red-channel color components of the white reference points, G1, G2……Gm are the green-channel color components of the white reference points, and B1, B2……Bm are the blue-channel color components of the white reference points;
step 1.1.9, calculating the gain of each channel, wherein the calculation formula is as follows:
Rgain=Ymax/Raver (12)
Ggain=Ymax/Gaver (13)
Bgain=Ymax/Baver (14)
Y=0.299×R+0.587×G+0.114×B (15)
in the formula: y ismaxIs the maximum value of the Y component in the color space in the whole image, Raver、Gaver、BaverThe average value of the brightness of the white reference point is R, G, B, and the component values of red, green and blue of each pixel point in the collected sewing gesture data image are R, G, B;
step 1.1.10, calculate the final color for each channel:
R′=R×Rgain (16)
G′=G×Ggain (17)
B′=B×Bgain (18)
in the formula, R, G, B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, and R′, G′ and B′ are the red, green and blue components of the pixel points in the corrected sewing gesture data image;
step 1.2, adjusting the brightness of the sewing gesture data picture processed in the step 1.1 to be 0.6 to 1.5 times of the original brightness;
step 1.3, leaving the brightness-adjusted sewing gesture data picture from step 1.2 unrotated or randomly rotating it by 90, 180 or 270 degrees to enhance robustness to different imaging angles, obtaining the preprocessed sewing gesture data pictures serving as the training set; a code sketch of this preprocessing step is given below;
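As a concrete illustration of step 1, the following Python/NumPy sketch implements the dynamic-threshold color correction of steps 1.1.1 to 1.1.10 together with the brightness adjustment of step 1.2 and the random rotation of step 1.3. It is only a minimal sketch under stated assumptions: the 4×4 region grid, the per-region thresholds used in the near-white test, the clipping to [0, 255] and all function names are choices of this example rather than requirements of the method.

```python
import numpy as np

def dynamic_threshold_white_balance(img, grid=(4, 4)):
    """White-balance an RGB image (H, W, 3), values in [0, 255], in the spirit of
    steps 1.1.1-1.1.10; the grid size and per-region thresholds are assumptions."""
    img = img.astype(np.float64)
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    Cb = -0.169 * R - 0.331 * G + 0.500 * B              # formula (1)
    Cr = 0.500 * R - 0.419 * G - 0.081 * B               # formula (2)
    Y = 0.299 * R + 0.587 * G + 0.114 * B                # formula (15)

    H, W = Y.shape
    hs, ws = H // grid[0], W // grid[1]
    cand = np.zeros((H, W), dtype=bool)                  # candidate white points
    for i in range(grid[0]):
        for j in range(grid[1]):
            sl = (slice(i * hs, (i + 1) * hs), slice(j * ws, (j + 1) * ws))
            cb, cr = Cb[sl], Cr[sl]
            Mb, Mr = cb.mean(), cr.mean()                # per-region means, formulas (3)-(4)
            Db = np.abs(cb - Mb).mean()                  # per-region deviations, formulas (5)-(6)
            Dr = np.abs(cr - Mr).mean()
            # near-white test in the spirit of formulas (7)-(8)
            cand[sl] = (np.abs(cb - (Mb + Db * np.sign(Mb))) < 1.5 * Db) & \
                       (np.abs(cr - (1.5 * Mr * np.sign(Mr))) < 1.5 * Dr)
    if not cand.any():
        return img                                       # nothing to correct against
    ref = cand & (Y >= np.percentile(Y[cand], 90))       # brightest 10 %, step 1.1.7
    Raver, Gaver, Baver = R[ref].mean(), G[ref].mean(), B[ref].mean()   # (9)-(11)
    gains = Y.max() / np.array([Raver, Gaver, Baver])    # formulas (12)-(14)
    return np.clip(img * gains, 0, 255)                  # formulas (16)-(18)

def augment(img, rng=np.random.default_rng()):
    """Brightness jitter of step 1.2 and random 0/90/180/270 degree rotation of step 1.3."""
    img = np.clip(img * rng.uniform(0.6, 1.5), 0, 255)
    return np.rot90(img, k=int(rng.integers(0, 4)))
```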
step 2, feeding the pictures in the preprocessed data set into a GRU neural network as RGB picture frames for data training, which specifically comprises the following steps:
step 2.1, storing the red, green and blue channel color components of the corrected sewing gesture data picture obtained in step 1.1.10 in the computer in matrix form, then converting the three matrices into a column vector X, which is fed into the GRU network structure as the feature vector; for example:
suppose that the R′, G′ and B′ obtained in step 1.1.10 are each stored in the computer as a 3×3 matrix. The three matrices represent the preprocessed image, and the values in the matrices correspond to the red, green and blue intensity values of the image. For convenient feature extraction by the neural network, the 3 matrices are converted into 1 vector X; in this example the R′, G′ and B′ matrices are each of size 3×3, so the total dimension of the vector X is 3×3×3=27. In the field of artificial intelligence, each value fed into the neural network is called a feature, so the above example has 27 features; the 27-dimensional vector is also called a feature vector, and the neural network receives this feature vector as input for prediction;
the converted feature vectors are sent into a GRU network structure, and values of an update gate and a reset gate in the GRU network structure are calculated respectively;
step 2.2, calculating the value of an update gate in the GRU network structure, specifically:
determining how much information from the previous moment is carried over to the current moment; the calculation formula is as follows:
Zt=σ(W×Xt+U×ht-1) (19)
in the formula, Xt is the t-th component of the input feature vector X, ht-1 is the stored information of step t-1, σ is the logistic sigmoid function, and W and U are weight matrices; the update gate adds the two parts of information and passes the sum through the sigmoid activation function, which compresses the result to between 0 and 1; the update gate controls the degree to which the state of the previous moment is brought into the current state, i.e., how much information from the previous moment is applied to the current moment; the larger Zt is, the more information from the previous moment is brought in;
step 2.3, calculating the reset gate, which determines how much past information needs to be forgotten; the calculation formula is as follows:
r(t)=σ(W×Xt+U×ht-1) (20)
in the formula: w and U are weight matrices, XtFor the t-th component of the input feature vector X,ht-1the information of the t-1 step is stored;
step 2.4, calculating the current memory content, and storing the current memory content in a reset gate, wherein the calculation formula is as follows:
h′t=tanh(Wxt+rt⊙Uht-1) (21)
in the formula, rt is the output value of the reset gate, Xt is the t-th component of the input sequence X, and ht-1 is the stored information of step t-1;
step 2.5, the final output of the gated recurrent unit is obtained by adding the information retained from the previous moment into the final memory to the information retained from the current memory content into the final memory; the calculation formula is as follows:
ht=Zt⊙ht-1+(1-Zt)⊙h′t (22)
in the formula, Zt is the result of the update gate, ht-1 is the saved information of step t-1, Zt⊙ht-1 denotes the information from the previous step that is retained in the final memory, h′t is the current memory content, and (1-Zt)⊙h′t denotes the information from the current memory content that is retained in the final memory; this completes the data training (a sketch of this GRU computation is given below);
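To make step 2 concrete, the sketch below first flattens the corrected R′, G′ and B′ matrices into the feature vector X of step 2.1 and then applies one gated recurrent unit step per frame following formulas (19) to (22). The use of separate weight matrices per gate, the 27-dimensional input, the 16-dimensional hidden state and the random stand-in frames are assumptions of this sketch; the patent itself writes a single W and U for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def image_to_feature_vector(r, g, b):
    """Step 2.1: concatenate the corrected R', G', B' matrices into one vector X;
    three 3x3 matrices give a 27-dimensional feature vector."""
    return np.concatenate([r.ravel(), g.ravel(), b.ravel()])

def gru_step(x_t, h_prev, p):
    """One GRU step following formulas (19)-(22); p holds the gate weight matrices."""
    z_t = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev)              # update gate, formula (19)
    r_t = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev)              # reset gate, formula (20)
    h_cand = np.tanh(p["Wh"] @ x_t + r_t * (p["Uh"] @ h_prev))   # current memory, formula (21)
    return z_t * h_prev + (1.0 - z_t) * h_cand                   # final memory h_t, formula (22)

# usage: run a short sequence of frame feature vectors through the GRU
rng = np.random.default_rng(0)
dim_x, dim_h = 27, 16                                            # 27 features per frame (step 2.1 example)
params = {name: rng.normal(scale=0.1,
                           size=(dim_h, dim_x if name.startswith("W") else dim_h))
          for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
h = np.zeros(dim_h)
for frame in rng.normal(size=(5, dim_x)):                        # 5 stand-in RGB frame vectors
    h = gru_step(frame, h, params)                               # h is fed to the DNN in step 3
```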
step 3, taking the output result of the GRU network as the input of the DNN neural network for further feature extraction to form the GRU-DNN network for identifying the sewing gesture; the method specifically comprises the following steps:
step 3.1, taking the final memory information stored in the GRU network structure as the input of the DNN neural network, and then initializing the parameters, namely the weights W and the biases b;
step 3.2, calculating the activation function, whose calculation formula is as follows:
σ(z)=1/(1+e^(-z)) (23)
wherein z is the independent variable (the pre-activation input);
step 3.3, performing forward propagation: the forward propagation algorithm uses the weight coefficient matrices W, the bias vectors b and the input value vector X to carry out a series of linear operations and activation operations, computing layer by layer from the input layer until the output layer is reached, and the output formula of each layer is as follows:
al=σ(Wl×al-1+bl) (24)
wherein l denotes the layer number, al-1 is the output of layer l-1 of the neural network, al is the output of layer l of the neural network, Wl is the weight matrix of layer l, and bl is the bias of layer l;
step 3.4, calculating the loss function, whose calculation formula is as follows:
J(W,b,x,y)=1/2×||al-y||² (25)
in the formula, al is the output of layer l of the neural network, x is the sequence output by the trained GRU neural network, and y is the true training sample output;
and step 3.5, performing back propagation to continuously update the parameters W and b: the back propagation algorithm finds suitable linear coefficient matrices W and bias vectors b so that the outputs computed from all input training samples are equal to, or as close as possible to, the sample outputs; the formulas for updating the parameters W and b of each layer are as follows:
Zl=Wl×al-1+bl (26)
wherein Zl is the pre-activation output of layer l; taking the partial derivative of the loss function with respect to Zl gives:
∂J/∂Zl=∂J/∂al⊙σ′(Zl) (27)
taking the partial derivative of the loss function with respect to Wl gives:
∂J/∂Wl=∂J/∂Zl×(al-1)^T (28)
taking the partial derivative of the loss function with respect to bl gives:
∂J/∂bl=∂J/∂Zl (29)
wherein al-1 is the output of layer l-1 of the neural network and bl is the bias of layer l;
solving formulas (24) to (29) jointly yields Wl and bl, realizing the continuous updating of Wl and bl.
Step 3.6, computing layer by layer from the input layer until the output layer is reached; through this computation, the outputs computed from the training samples are made as close as possible to the true training sample outputs, and the outputs computed from the training samples at this point serve as the finally extracted features, as illustrated by the sketch below;
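The following sketch mirrors step 3 under stated assumptions: a small fully connected network whose forward pass follows formula (24), trained by back propagation and gradient descent consistent with formulas (26) to (29), with a quadratic loss and sigmoid activations assumed throughout. The layer sizes, learning rate and dummy target are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DNNFeatureExtractor:
    """Minimal DNN for step 3: forward pass per formula (24), back propagation and
    gradient-descent updates of W and b consistent with formulas (26)-(29)."""

    def __init__(self, sizes, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(m) for m in sizes[1:]]
        self.lr = lr

    def forward(self, x):
        acts = [x]
        for W, b in zip(self.W, self.b):
            acts.append(sigmoid(W @ acts[-1] + b))   # Z^l = W^l a^(l-1) + b^l (26); a^l = sigma(Z^l) (24)
        return acts

    def train_step(self, x, y):
        acts = self.forward(x)
        # assumed quadratic loss J = 1/2 * ||a^L - y||^2; delta = dJ/dZ at the output layer
        delta = (acts[-1] - y) * acts[-1] * (1.0 - acts[-1])
        for l in range(len(self.W) - 1, -1, -1):
            dW = np.outer(delta, acts[l])            # dJ/dW^l = delta (a^(l-1))^T, formula (28)
            db = delta                               # dJ/db^l = delta, formula (29)
            if l > 0:                                # propagate delta to the previous layer
                delta = (self.W[l].T @ delta) * acts[l] * (1.0 - acts[l])
            self.W[l] -= self.lr * dW                # continuous update of W^l (step 3.5)
            self.b[l] -= self.lr * db                # continuous update of b^l (step 3.5)
        return acts[-1]                              # output-layer values = extracted features

# usage: map the 16-dimensional GRU state to an 8-dimensional feature vector
net = DNNFeatureExtractor(sizes=[16, 32, 8])
features = net.train_step(np.random.default_rng(1).normal(size=16), np.zeros(8))
```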
and 4, sending the features extracted in the step 3 into an SVM classifier for action classification.
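Step 4 can be realized, for example, with an off-the-shelf support vector machine such as scikit-learn's SVC. The sketch below is purely illustrative: the feature array, the label array, the assumed three gesture classes and the kernel/C hyper-parameters are placeholders, not values specified by the method.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_feat = rng.normal(size=(40, 8))      # stand-in GRU-DNN feature vectors for 40 clips (step 3 output)
y_act = rng.integers(0, 3, size=40)    # stand-in labels for 3 assumed sewing-action classes

clf = SVC(kernel="rbf", C=1.0)         # assumed kernel and regularization strength
clf.fit(X_feat, y_act)                 # train the SVM classifier on the extracted features
pred = clf.predict(X_feat[:5])         # step 4: classify sewing gestures from their features
```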
The sewing gesture recognition method based on deep learning disclosed by the invention combines a GRU network structure and a DNN network structure for behavior detection, exploiting the strong temporal and spatial correlation captured by the GRU network structure during detection and the effectiveness of the DNN network structure at extracting deep features. Color correction is applied to the input data with a dynamic threshold method to eliminate the influence of illumination on color rendering. The pictures are rotated by 90, 180 and 270 degrees to enhance robustness to different imaging angles. The preprocessed data are trained with the GRU network structure, and the output is fed into the DNN network structure as its input for further feature extraction. Compared with a single DNN network structure, the GRU-DNN network structure makes full use of the information in the time sequence and obtains deeper image information during behavior detection, so its recognition is more accurate than that of a single network structure.
Claims (5)
1. A sewing gesture recognition method based on deep learning is characterized by comprising the following steps:
step 1, data set collection and preprocessing;
step 2, feeding the pictures in the preprocessed data set into a GRU neural network as RGB picture frames for data training;
step 3, taking the output result of the GRU network as the input of the DNN neural network for further feature extraction to form the GRU-DNN network for identifying the sewing gesture;
and 4, sending the features extracted in the step 3 into an SVM classifier for action classification.
2. The sewing gesture recognition method based on deep learning according to claim 1, wherein the step 1 specifically comprises:
step 1.1, collecting sewing gesture data pictures, and carrying out color correction on the collected sewing gesture data pictures through a dynamic threshold method so as to eliminate the influence of illumination on color rendering;
step 1.2, adjusting the brightness of the sewing gesture data picture processed in the step 1.1 to be 0.6 to 1.5 times of the original brightness;
and step 1.3, leaving the brightness-adjusted sewing gesture data picture from step 1.2 unrotated or randomly rotating it by 90, 180 or 270 degrees, to obtain the preprocessed sewing gesture data pictures serving as the training set.
3. The sewing gesture recognition method based on deep learning according to claim 2, wherein the step 1.1 is specifically as follows:
step 1.1.1, dividing each sewing gesture data picture in a training set into a plurality of areas;
step 1.1.2, calculating Cb and Cr for the pixel points in each region, and the averages Mb and Mr of Cb and Cr over all pixel points in each region, where Cb denotes the color saturation of a pixel point and Cr denotes the hue of a pixel point;
Cb=-0.169×R-0.331×G+0.500×B (1)
Cr=0.500×R-0.419×G-0.081×B (2)
where R, G and B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, Cb(n) is the color saturation of the nth pixel point in the corresponding region, Cr(n) is the hue of the nth pixel point in the corresponding region, and N is the number of pixel points in the corresponding region;
step 1.1.3, separately calculating for each region the cumulative values Db and Dr of the absolute differences of the Cb and Cr components; together with the averages Mb and Mr of step 1.1.2, the calculation formulas are as follows:
Mb=(ΣCb(n))/N (3)
Mr=(ΣCr(n))/N (4)
Db=(Σ|Cb(n)-Mb|)/N (5)
Dr=(Σ|Cr(n)-Mr|)/N (6)
wherein the sums run over n=1…N, N is the number of pixel points in each region, Cb(n) is the color saturation of the nth pixel point in the corresponding region, and Cr(n) is the hue of the nth pixel point in the corresponding region;
step 1.1.4, examining the Db/Dr value for each pixel point; if the Db/Dr value is smaller than the Mb/Mr value of the corresponding region, the pixel points of the corresponding region are ignored;
step 1.1.5, for each sewing gesture data picture, after removing the pixel points ignored by the judgment of step 1.1.4, re-solving Mb, Mr, Db and Dr for each region according to formulas (3) to (6); then respectively summing the Mb, Mr, Db and Dr of all regions and taking the averages as the MB, MR, DB and DR values of the corresponding sewing gesture data picture, wherein MB is the average color saturation of the whole sewing gesture data picture, MR is the average hue of the whole sewing gesture data picture, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.6, if the pixel point in each region simultaneously satisfies the formulas (7) and (8), the pixel point is preliminarily determined as a white reference point:
|Cb(n)-(Mb+Db×sign(Mb))|<1.5×DB (7)
|Cr(n)-(1.5×Mr×sign(Mr))|<1.5×DR (8)
in the formula, Mb and Mr are the averages of the saturation and hue components (Cb and Cr) of the sewing gesture data picture, Db and Dr are the calculated cumulative values of the absolute differences of the saturation and hue components for each small region, sign is the sign (signum) function, DB is the cumulative absolute-difference value of the color saturation of the whole sewing gesture data picture, and DR is the cumulative absolute-difference value of the hue of the whole sewing gesture data picture;
step 1.1.7, sorting the preliminarily determined white reference points in each area by brightness and taking the brightest 10% as the finally determined white reference points;
step 1.1.8, calculating the averages Raver, Gaver and Baver of the brightness of all white reference points in each area:
Raver=(R1+R2+……+Rm)/m (9)
Gaver=(G1+G2+……+Gm)/m (10)
Baver=(B1+B2+……+Bm)/m (11)
wherein m is the number of finally determined white reference points in the corresponding area, R1, R2……Rm are the red-channel color components of the white reference points, G1, G2……Gm are the green-channel color components of the white reference points, and B1, B2……Bm are the blue-channel color components of the white reference points;
step 1.1.9, calculating the gain of each channel, wherein the calculation formula is as follows:
Rgain=Ymax/Raver (12)
Ggain=Ymax/Gaver (13)
Bgain=Ymax/Baver (14)
Y=0.299×R+0.587×G+0.114×B (15)
in the formula: y ismaxIs the maximum value of the Y component in the color space in the whole image, Raver、Gaver、BaverThe average value of the brightness of the white reference point is R, G, B, and the component values of red, green and blue of each pixel point in the collected sewing gesture data image are R, G, B;
step 1.1.10, calculate the final color for each channel:
R′=R×Rgain (16)
G′=G×Ggain (17)
B′=B×Bgain (18)
in the formula, R, G, B are the red, green and blue component values of each pixel point in the collected sewing gesture data image, and R′, G′ and B′ are the red, green and blue components of the pixel points in the corrected sewing gesture data image.
4. The sewing gesture recognition method based on deep learning according to claim 3, wherein the step 2 specifically comprises:
step 2.1, storing the red, green and blue channel color components of the corrected sewing gesture data picture obtained in step 1.1.10 in the computer in matrix form, then converting the three matrices into a column vector X, which is fed into the GRU network structure as the feature vector;
step 2.2, calculating the value of an update gate in the GRU network structure, specifically:
determining how much information from the previous moment is carried over to the current moment; the calculation formula is as follows:
Zt=σ(W×Xt+U×ht-1) (19)
in the formula, Xt is the t-th component of the input feature vector X, ht-1 is the stored information of step t-1, σ is the logistic sigmoid function, and W and U are weight matrices; the update gate adds the two parts of information and passes the sum through the sigmoid activation function, which compresses the result to between 0 and 1; the update gate controls the degree to which the state of the previous moment is brought into the current state, i.e., how much information from the previous moment is applied to the current moment;
step 2.3, calculating the reset gate, which determines how much past information needs to be forgotten; the calculation formula is as follows:
r(t)=σ(W×Xt+U×ht-1) (20)
in the formula: w and U are weight matrices, XtFor the t-th component of the input feature vector X, ht-1The information of the t-1 step is stored;
step 2.4, calculating the current memory content, and storing the current memory content in a reset gate, wherein the calculation formula is as follows:
h′t=tanh(Wxt+rt⊙Uht-1) (21)
in the formula, rt is the output value of the reset gate, Xt is the t-th component of the input sequence X, and ht-1 is the stored information of step t-1;
step 2.5, the final output of the gated recurrent unit is obtained by adding the information retained from the previous moment into the final memory to the information retained from the current memory content into the final memory; the calculation formula is as follows:
ht=Zt⊙ht-1+(1-Zt)⊙h′t (22)
in the formula, Zt is the result of the update gate, ht-1 is the saved information of step t-1, Zt⊙ht-1 denotes the information from the previous step that is retained in the final memory, h′t is the current memory content, and (1-Zt)⊙h′t denotes the information from the current memory content that is retained in the final memory; this completes the data training.
5. The sewing gesture recognition method based on deep learning according to claim 4, wherein the step 3 specifically comprises:
step 3.1, taking the final memory information stored in the GRU network structure as the input of the DNN neural network, and then initializing the parameters, namely the weights W and the biases b;
step 3.2, calculating the activation function, whose calculation formula is as follows:
σ(z)=1/(1+e^(-z)) (23)
wherein z is the independent variable (the pre-activation input);
and 3.3, carrying out forward propagation to obtain an output result, wherein an output formula is as follows:
al=σ(Wl×al-1+bl) (24)
wherein l denotes the layer number, al-1 is the output of layer l-1 of the neural network, al is the output of layer l of the neural network, Wl is the weight matrix of layer l, and bl is the bias of layer l;
step 3.4, calculating the loss function, whose calculation formula is as follows:
J(W,b,x,y)=1/2×||al-y||² (25)
in the formula, al is the output of layer l of the neural network, x is the sequence output by the trained GRU neural network, and y is the true training sample output;
and step 3.5, performing back propagation, wherein the formulas for updating the parameters W and b of each layer are as follows:
Zl=Wl×al-1+bl (26)
wherein Zl is the pre-activation output of layer l; taking the partial derivative of the loss function with respect to Zl gives:
∂J/∂Zl=∂J/∂al⊙σ′(Zl) (27)
taking the partial derivative of the loss function with respect to Wl gives:
∂J/∂Wl=∂J/∂Zl×(al-1)^T (28)
taking the partial derivative of the loss function with respect to bl gives:
∂J/∂bl=∂J/∂Zl (29)
wherein al-1 is the output of layer l-1 of the neural network and bl is the bias of layer l;
solving formulas (24) to (29) jointly yields Wl and bl, realizing the continuous updating of Wl and bl.
And step 3.6, computing layer by layer from the input layer until the output layer is reached, obtaining the final feature extraction result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011096967.6A CN112270220B (en) | 2020-10-14 | 2020-10-14 | Sewing gesture recognition method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011096967.6A CN112270220B (en) | 2020-10-14 | 2020-10-14 | Sewing gesture recognition method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112270220A true CN112270220A (en) | 2021-01-26 |
CN112270220B CN112270220B (en) | 2022-02-25 |
Family
ID=74337505
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011096967.6A Active CN112270220B (en) | 2020-10-14 | 2020-10-14 | Sewing gesture recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112270220B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230107097A1 (en) * | 2021-10-06 | 2023-04-06 | Fotonation Limited | Method for identifying a gesture |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103208126A (en) * | 2013-04-17 | 2013-07-17 | 同济大学 | Method for monitoring moving object in natural environment |
CN105427261A (en) * | 2015-11-27 | 2016-03-23 | 努比亚技术有限公司 | Method and apparatus for removing image color noise and mobile terminal |
CN105812762A (en) * | 2016-03-23 | 2016-07-27 | 武汉鸿瑞达信息技术有限公司 | Automatic white balance method for processing image color cast |
CN108052884A (en) * | 2017-12-01 | 2018-05-18 | 华南理工大学 | A kind of gesture identification method based on improvement residual error neutral net |
CN108205671A (en) * | 2016-12-16 | 2018-06-26 | 浙江宇视科技有限公司 | Image processing method and device |
CN108537147A (en) * | 2018-03-22 | 2018-09-14 | 东华大学 | A kind of gesture identification method based on deep learning |
US10134421B1 (en) * | 2016-08-04 | 2018-11-20 | Amazon Technologies, Inc. | Neural network based beam selection |
CN108846356A (en) * | 2018-06-11 | 2018-11-20 | 南京邮电大学 | A method of the palm of the hand tracing and positioning based on real-time gesture identification |
CN108965609A (en) * | 2018-08-31 | 2018-12-07 | 南京宽塔信息技术有限公司 | The recognition methods of mobile terminal application scenarios and device |
CN109378064A (en) * | 2018-10-29 | 2019-02-22 | 南京医基云医疗数据研究院有限公司 | Medical data processing method, device electronic equipment and computer-readable medium |
CN109584186A (en) * | 2018-12-25 | 2019-04-05 | 西北工业大学 | A kind of unmanned aerial vehicle onboard image defogging method and device |
CN110827218A (en) * | 2019-10-31 | 2020-02-21 | 西北工业大学 | Airborne image defogging method based on image HSV transmissivity weighted correction |
CN110852960A (en) * | 2019-10-25 | 2020-02-28 | 江苏荣策士科技发展有限公司 | Image enhancement device and method for removing fog |
CN110929769A (en) * | 2019-11-14 | 2020-03-27 | 保定赛瑞电力科技有限公司 | Reactor mechanical fault joint detection model, method and device based on vibration and sound |
- 2020-10-14: CN application CN202011096967.6A granted as CN112270220B (Active)
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103208126A (en) * | 2013-04-17 | 2013-07-17 | 同济大学 | Method for monitoring moving object in natural environment |
CN105427261A (en) * | 2015-11-27 | 2016-03-23 | 努比亚技术有限公司 | Method and apparatus for removing image color noise and mobile terminal |
CN105812762A (en) * | 2016-03-23 | 2016-07-27 | 武汉鸿瑞达信息技术有限公司 | Automatic white balance method for processing image color cast |
US10134421B1 (en) * | 2016-08-04 | 2018-11-20 | Amazon Technologies, Inc. | Neural network based beam selection |
CN108205671A (en) * | 2016-12-16 | 2018-06-26 | 浙江宇视科技有限公司 | Image processing method and device |
CN108052884A (en) * | 2017-12-01 | 2018-05-18 | 华南理工大学 | A kind of gesture identification method based on improvement residual error neutral net |
CN108537147A (en) * | 2018-03-22 | 2018-09-14 | 东华大学 | A kind of gesture identification method based on deep learning |
CN108846356A (en) * | 2018-06-11 | 2018-11-20 | 南京邮电大学 | A method of the palm of the hand tracing and positioning based on real-time gesture identification |
CN108965609A (en) * | 2018-08-31 | 2018-12-07 | 南京宽塔信息技术有限公司 | The recognition methods of mobile terminal application scenarios and device |
CN109378064A (en) * | 2018-10-29 | 2019-02-22 | 南京医基云医疗数据研究院有限公司 | Medical data processing method, device electronic equipment and computer-readable medium |
CN109584186A (en) * | 2018-12-25 | 2019-04-05 | 西北工业大学 | A kind of unmanned aerial vehicle onboard image defogging method and device |
CN110852960A (en) * | 2019-10-25 | 2020-02-28 | 江苏荣策士科技发展有限公司 | Image enhancement device and method for removing fog |
CN110827218A (en) * | 2019-10-31 | 2020-02-21 | 西北工业大学 | Airborne image defogging method based on image HSV transmissivity weighted correction |
CN110929769A (en) * | 2019-11-14 | 2020-03-27 | 保定赛瑞电力科技有限公司 | Reactor mechanical fault joint detection model, method and device based on vibration and sound |
Non-Patent Citations (2)
Title |
---|
GYEOWOON JUNG et al.: "DNN-GRU multiple layers for VAD in PC Game Café", 2018 IEEE International Conference on Consumer Electronics - Asia (ICCE-ASIA) *
WANG Xiaohua et al.: "Sewing gesture detection based on an improved POLO deep convolutional neural network", Journal of Textile Research *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20230107097A1 (en) * | 2021-10-06 | 2023-04-06 | Fotonation Limited | Method for identifying a gesture |
US11983327B2 (en) * | 2021-10-06 | 2024-05-14 | Fotonation Limited | Method for identifying a gesture |
Also Published As
Publication number | Publication date |
---|---|
CN112270220B (en) | 2022-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021253939A1 (en) | Rough set-based neural network method for segmenting fundus retinal vascular image | |
CN108830157B (en) | Human behavior identification method based on attention mechanism and 3D convolutional neural network | |
CN107844795B (en) | Convolutional neural network feature extraction method based on principal component analysis | |
Zhang et al. | Plant disease recognition based on plant leaf image. | |
CN106204779B (en) | Check class attendance method based on plurality of human faces data collection strategy and deep learning | |
Varga et al. | Fully automatic image colorization based on Convolutional Neural Network | |
CN108009493B (en) | Human face anti-cheating recognition method based on motion enhancement | |
CN108268859A (en) | A kind of facial expression recognizing method based on deep learning | |
US20180130186A1 (en) | Hybrid machine learning systems | |
CN108388896A (en) | A kind of licence plate recognition method based on dynamic time sequence convolutional neural networks | |
CN109657612B (en) | Quality sorting system based on facial image features and application method thereof | |
Xu et al. | Recurrent convolutional neural network for video classification | |
CN109543632A (en) | A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features | |
CN106778785A (en) | Build the method for image characteristics extraction model and method, the device of image recognition | |
CN108509920A (en) | The face identification method of the multichannel combined feature selecting study of more patch based on CNN | |
CN107862680B (en) | Target tracking optimization method based on correlation filter | |
CN110969171A (en) | Image classification model, method and application based on improved convolutional neural network | |
CN109902613A (en) | A kind of human body feature extraction method based on transfer learning and image enhancement | |
CN112487981A (en) | MA-YOLO dynamic gesture rapid recognition method based on two-way segmentation | |
CN107516083A (en) | A kind of remote facial image Enhancement Method towards identification | |
CN112766021A (en) | Method for re-identifying pedestrians based on key point information and semantic segmentation information of pedestrians | |
Yang et al. | A Face Detection Method Based on Skin Color Model and Improved AdaBoost Algorithm. | |
Li et al. | A self-attention feature fusion model for rice pest detection | |
Gurrala et al. | A new segmentation method for plant disease diagnosis | |
CN105825234A (en) | Superpixel and background model fused foreground detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||