CN116645732B - Site dangerous activity early warning method and system based on computer vision - Google Patents

Site dangerous activity early warning method and system based on computer vision

Info

Publication number
CN116645732B
Authority
CN
China
Prior art keywords
feature
human body
image
matrix
body posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310885602.9A
Other languages
Chinese (zh)
Other versions
CN116645732A (en)
Inventor
赵树升
林燕芬
赵晓华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Institute of Technology
Original Assignee
Xiamen Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Institute of Technology filed Critical Xiamen Institute of Technology
Priority to CN202310885602.9A
Publication of CN116645732A
Application granted
Publication of CN116645732B

Classifications

    • G06V40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
    • G06V10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
    • G06V10/771: Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/70: Scenes; scene-specific elements; labelling scene content, e.g. deriving syntactic or semantic representations
    • G08B21/02: Alarms for ensuring the safety of persons
    • Y02T10/40: Engine management systems (climate change mitigation technologies related to transportation)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a site dangerous activity early warning method and system based on computer vision. The invention belongs to the technical field of intelligent monitoring and early warning, and particularly relates to a site dangerous activity early warning method and system based on computer vision.

Description

Site dangerous activity early warning method and system based on computer vision
Technical Field
The invention relates to the technical field of intelligent monitoring and early warning, in particular to a site dangerous activity early warning method and system based on computer vision.
Background
Unsafe factors and potentially dangerous activities commonly exist on a site and seriously affect site order and the personal safety of the public. Traditional manual inspection misses incidents because its coverage is limited, produces false alarms because it relies on subjective judgment, consumes manpower and cost, and suffers from fatigue and declining attention during long-term inspection, which degrades early-warning accuracy and efficiency. General feature extraction methods face a contradiction: too many processing parameters weaken accuracy and expressive power, while too few parameters lead to a high error rate in the posture estimation result. General labeling methods face a similar contradiction: considering too many factors makes the algorithm overly complex and inefficient in operation, while considering too few factors lowers the accuracy of labeling human postures and impairs subsequent evaluation and early warning.
Disclosure of Invention
In view of the above situation, and in order to overcome the defects of the prior art, the invention provides a site dangerous activity early warning method and system based on computer vision. Against the problems that traditional manual inspection misses incidents because of limited coverage, produces false alarms because of subjective judgment, consumes manpower and cost, and suffers from fatigue and declining attention during long-term inspection, which degrade early-warning accuracy and efficiency, the scheme uses a machine vision algorithm to monitor in real time, process data automatically, and identify posture types with high accuracy and extensibility, so that potential dangerous activities are discovered in time. Against the contradiction in general feature extraction methods that too many processing parameters weaken accuracy and expressive power while too few parameters cause a high error rate in the posture estimation result, the scheme evaluates different feature combinations, selects the most representative features, improves the accuracy and expressive power of the features, and compensates the human posture so that the result is more accurate and reliable. Against the contradiction in general labeling methods that too many factors make the algorithm overly complex and inefficient while too few factors lower the labeling accuracy and impair subsequent evaluation and early warning, the labeling method of the scheme adapts to different features and considers multiple relation factors based on an objective function, giving stronger adaptability and robustness.
The technical scheme adopted by the invention is as follows: the invention provides a site dangerous activity early warning method based on computer vision, which comprises the following steps:
step S1: the method comprises the steps of data acquisition, acquiring a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, and the label is of a human body posture type;
step S2: extracting features, namely extracting features of the human body posture image;
step S3: posture compensation, namely performing posture compensation on the human body posture;
step S4: labeling, namely iteratively optimizing the low-dimensional matrix and the relation matrix based on the objective function, and finally selecting the characteristics to label the gesture type of the human gesture;
step S5: step S2 to step S4 constitute the process of building a human body posture recognition model; images in a test sample library are input into the model built in the above steps, and the model performance is evaluated based on the operation data and the output labeling result.
Further, in step S2, feature extraction is performed on the human body posture image, specifically including the following steps:
step S21: feature evaluation: constructing the acquired human body posture image public data set into a sample library, randomly selecting 70% of the sample library as a training sample library and the remaining 30% as a test sample library; obtaining the annotation matrix of the images through basic annotation and setting it as Q; determining the basic structure of the image set according to the Laplacian matrix of the training sample library and mapping the basic structure to the feature vectors of the matrix Q; combining the calculation result with an elastic network algorithm to obtain a feature selection calculation equation, and selecting, among different feature combinations, the combination with the minimum feature evaluation, wherein the adopted formula is as follows:
wherein α is the motion feature calculation parameter according to which a shared feature subspace is constructed, T is the multi-label feature mapping matrix of the image, L is the preprocessed image annotation matrix, S is the training sample library, and i is the i-th feature;
step S22: feature combination selection, with the maximum number of features preset, proceeding as follows:
step S221: setting the selected feature combination as an empty set;
step S222: for each feature i, calculating the reduction in the feature evaluation value after adding feature i to the selected feature combination, i.e. the difference between the original feature evaluation value and the new feature evaluation value obtained after adding feature i;
step S223: selecting the feature with the largest reduction as the new feature and adding it to the selected feature combination;
step S224: repeating step S222 and step S223 until the preset maximum number of features is reached or the objective function value can no longer be reduced;
step S225: selecting the feature combination that reduces the objective function value the most as the final feature selection result (a minimal code sketch of this greedy loop follows).
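The greedy loop of steps S221 to S225 can be sketched as follows. This is a minimal illustration, assuming a caller-supplied evaluate() function that returns the feature evaluation value of a candidate combination (for example, computed with the elastic-network-based equation of step S21); the function names and types are illustrative assumptions, not part of the patent.

```python
from typing import Callable, List, Set

def select_features(all_features: List[int],
                    evaluate: Callable[[Set[int]], float],
                    max_features: int) -> Set[int]:
    """Greedy forward selection (steps S221-S225): repeatedly add the feature
    whose addition reduces the evaluation value the most."""
    selected: Set[int] = set()            # S221: start from the empty set
    current = evaluate(selected)          # evaluation value of the empty combination
    while len(selected) < max_features:   # S224: stop at the preset maximum number
        best_feature, best_drop = None, 0.0
        for f in all_features:
            if f in selected:
                continue
            drop = current - evaluate(selected | {f})   # S222: reduction after adding f
            if drop > best_drop:
                best_feature, best_drop = f, drop
        if best_feature is None:          # S224: the value can no longer be reduced
            break
        selected.add(best_feature)        # S223: keep the feature with the largest reduction
        current -= best_drop
    return selected                       # S225: combination with the largest total reduction
```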
Further, in step S3, posture compensation is performed on the human body posture, specifically including the following steps:
step S31: constructing an activation function: combining the characteristic values of the image with the excitation function, reclassifying the images according to the previously obtained image characteristic values, and setting the CNN activation function σ according to the image classification result, using the following formula:
where o is the excitation function of the neural network, i is the index of the neurons, z is the real-valued sequence in the neural network, and the summation is taken over absolute values;
step S32: two-dimensional estimation: inputting the image processing result and the characteristic values into the neural network and obtaining the two-dimensional estimate Eg_{i,j} through convolution calculation, using the following formula:
Eg_{i,j} = f||(X_{i,j})|| * X*_{i,j};
where X_{i,j} is the two-dimensional motion posture feature, X*_{i,j} is the characteristic value of X_{i,j}, and f is the calculated norm;
step S33: constructing a regulator and adding it to control the calculation accuracy of the estimation, using the following formula:
where y(k) is the output result of the regulator, k is the discrete number of time steps, θ is the weight coefficient of the multi-label feature, C'(t) is the time error in the estimation calculation process, and R' is the integral adjustment process of the estimation calculation;
step S34: filtering: constructing the filter transfer function Lc and adding a low-pass filter and a high-pass filter to complete the filtering process, using the following formula:
where A' is the low-frequency image data in the image and E' is the high-frequency data in the image;
step S35: posture compensation, after which the images are returned to their respective sample libraries, using the following formula:
ΣΔe' = ΣWl + Δω;
where ΣΔe' is the posture compensation, Wl is the estimation-result error vector, and Δω is the spatial error vector obtained from the historical data.
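The arithmetic of steps S32 and S35 can be illustrated with a short NumPy sketch. The array shapes, the choice of the Euclidean norm for f, and the way the error vectors are summed are assumptions made for illustration only, not the patent's exact definitions.

```python
import numpy as np

def two_dimensional_estimate(X: np.ndarray, X_star: np.ndarray) -> np.ndarray:
    """Step S32 (assumed form): Eg_{i,j} = f||X_{i,j}|| * X*_{i,j}, with f taken
    here as the Euclidean norm of each two-dimensional motion posture feature."""
    norms = np.linalg.norm(X, axis=-1, keepdims=True)   # f||X_{i,j}|| per feature vector
    return norms * X_star                                # scale the characteristic values

def posture_compensation(Wl: np.ndarray, delta_omega: np.ndarray) -> float:
    """Step S35 (assumed form): sum(delta e') = sum(Wl) + delta omega, combining the
    estimation-result error vectors with the historical spatial error vector."""
    return float(np.sum(Wl) + np.sum(delta_omega))
```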
Further, in step S4, the low-dimensional matrix and the relation matrix are iteratively optimized based on the objective function, and finally, the feature is selected to label the gesture type for the human gesture, which specifically includes the following steps:
step S41: the fitting regularization term is calculated using the following formula:
where X^(s) is the training sample library, X^(t) is the test sample library, the computed term is the fitting regularization term of the training and test samples, (x_i, x_j) is a pairing between training samples, (x_i^(c), x_j^(c)) is a pairing between training samples carrying the same label c, φ is the adaptation degree, m and n are the numbers of samples in the training sample library and the test sample library respectively, θ_c is the weight coefficient of the features of label c, and C is the number of labels;
step S42: the adaptation matrix M is calculated using the following formula:
M = M_0 + θM_c;
where θ is the weight coefficient of the multi-label feature, M_0 and M_c are the initial matrix and the multi-label feature mapping matrix respectively, (M_0)_{i,j} and (M_c)_{i,j} denote their matrix elements, m^(c) and n^(c) are the numbers of training samples containing label c and of test samples containing label c, and the corresponding subsets denote the training sample library containing label c and the test sample library containing label c respectively;
step S43: the hypergraph regularization term is calculated using the following formula:
where φ is the manifold learning function, μ_i and μ_j are different hypergraph regularization term parameters, and W_{Eij} is the hypergraph parameter;
step S44: the objective function is calculated using the following formula:
where X is the sample library, U is a low-dimensional matrix of features sharing a feature subspace with U ∈ R^{n'×n}, W is the relationship matrix between X and U with W ∈ R^{n×n'}, n' is the dimension of the shared feature subspace, tr is the trace of a matrix, a complexity control function with complexity F is applied, γ_1 and γ_2 are different matrix factorization coefficients, and λ is a non-negative constant;
step S45: iterative optimization, comprising the following steps (an illustrative code sketch of this alternating scheme is given after step S46):
step S451: initializing the initial values of U and W;
step S452: optimizing U based on the objective function by minimizing the following formula:
where the superscript T denotes the matrix transpose;
step S453: optimizing W based on the objective function by minimizing the following formula:
step S454: iterating the optimization process until the convergence condition is reached;
step S46: labeling, namely selecting characteristics related to the human body gesture according to the optimized U and W, and labeling gesture types for the human body gesture according to the selected characteristics.
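A minimal sketch of the alternating optimization of step S45 follows, assuming for illustration a reconstruction-style objective ||X - WU||_F^2 + γ1||U||_F^2 + γ2||W||_F^2; the fitting, adaptation and hypergraph terms of steps S41 to S43 are omitted and the matrix shapes are assumptions, so this only illustrates the fix-one-matrix-and-optimize-the-other iteration, not the patent's full objective.

```python
import numpy as np

def alternate_optimize(X: np.ndarray, n_sub: int, gamma1: float = 0.1,
                       gamma2: float = 0.1, iters: int = 100, tol: float = 1e-6):
    """Alternately update U and W (steps S451-S454): fix one matrix and solve a
    ridge-regularized least-squares problem for the other until convergence."""
    n, d = X.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n_sub, d))      # S451: initialize the shared-subspace features
    W = rng.standard_normal((n, n_sub))      # S451: initialize the relationship matrix
    prev = np.inf
    for _ in range(iters):
        # S452: optimize U with W fixed:  U = (W^T W + gamma1 I)^-1 W^T X
        U = np.linalg.solve(W.T @ W + gamma1 * np.eye(n_sub), W.T @ X)
        # S453: optimize W with U fixed:  W^T = (U U^T + gamma2 I)^-1 U X^T
        W = np.linalg.solve(U @ U.T + gamma2 * np.eye(n_sub), U @ X.T).T
        loss = (np.linalg.norm(X - W @ U) ** 2
                + gamma1 * np.linalg.norm(U) ** 2
                + gamma2 * np.linalg.norm(W) ** 2)
        if abs(prev - loss) < tol:           # S454: convergence condition reached
            break
        prev = loss
    return U, W
```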
Further, in step S5, steps S2 to S4 constitute the process of building a human body posture recognition model; the images in the test sample library are input into the model built in the above steps, and the model performance is evaluated based on the operation data and the output labeling result, specifically including the following steps (an illustrative code sketch of the evaluation and warning logic follows step S58):
step S51: the correct recognition rate of a single image is calculated using the following formula:
where y(k) is the correct recognition rate of a single image, k is the index of the image, A is the number of images of the specified action in the experimental set, and k(i) represents the number of images of the specified action;
step S52: the super recognition rate is calculated using the following formula:
where the remaining quantity in the formula is the number of images of unspecified actions in the experimental data set;
step S53: the image annotation measurement is calculated using the following formula:
where g is the set of information points that can be marked in the image without analysis, and h is the standard information point set;
step S54: the similarity a is calculated using the following formula:
where ij denotes the indices of different frames of the same human action in two cycles, and d_{ij} is the Euclidean distance of the same human action after moving in the two cycles;
step S55: the estimation accuracy D is calculated using the following formula:
where f(b) is the joint Euclidean distance of the target image, f(c1) is the Euclidean distance estimate of the joints of the target image, and n1 represents the number of human joints in the image;
step S56: the estimation efficiency T is calculated using the following formula:
where t_i represents the time consumption of the i-th estimation item and N represents the total number of estimation items;
step S57: performance judgment: the performance of the model is evaluated based on the above six indexes; if the performance does not reach the expected target, the parameters used to establish the human body posture recognition model are reselected and the model is rebuilt; if the performance reaches the expected target, the establishment of the human body posture recognition model is complete;
step S58: operation: the human body posture types included in dangerous actions are preset, real-time human body motion images are collected and input into the human body posture recognition model, and if the posture type labeled by the model for an image belongs to a dangerous action, an early warning is issued.
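The evaluation and early-warning flow of steps S51 to S58 can be sketched as below. The exact index formulas are not reproduced above, so the set-overlap annotation measure, the averaged timing, and the preset set of dangerous posture types are hedged stand-ins for illustration, not the patent's definitions.

```python
import numpy as np

def annotation_measure(g: set, h: set) -> float:
    """Step S53 (assumed form): overlap between the points markable without
    analysis (g) and the standard information point set (h)."""
    return len(g & h) / max(len(h), 1)

def estimation_efficiency(times: list) -> float:
    """Step S56 (assumed form): average time consumption over the N estimation items."""
    return float(np.mean(times))

# Hypothetical preset posture types counted as dangerous actions (step S58).
DANGEROUS_POSTURES = {"climbing", "falling", "fighting"}

def warn_if_dangerous(labeled_posture: str) -> bool:
    """Step S58: issue an early warning when the labeled posture type is dangerous."""
    if labeled_posture in DANGEROUS_POSTURES:
        print(f"EARLY WARNING: dangerous activity detected ({labeled_posture})")
        return True
    return False
```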
The invention provides a site dangerous activity early warning system based on computer vision, which comprises a data acquisition module, a feature extraction module, an attitude compensation module, a labeling module and an evaluation module;
the data acquisition module acquires a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, the label is of a human body posture type, and the acquired data is sent to the feature extraction module;
the feature extraction module receives the data sent by the data acquisition module, selects the feature combination with the minimum feature evaluation to determine the finally selected feature, and sends the data to the posture compensation module;
the gesture compensation module receives the data sent by the feature extraction module, performs gesture compensation on the human gesture by using a convolutional neural network, and sends the data to the labeling module;
the labeling module receives the data sent by the gesture compensation module, determines an objective function by calculating a fitting regular term, an adaptation matrix and a hypergraph regular term, optimizes the matrix based on the objective function to determine the labeled gesture type, and sends the data to the evaluation module;
the evaluation module receives the data sent by the labeling module, determines the finally established human body posture recognition model based on the model's operation data on the test sample library and the performance evaluation derived from the labeling results, recognizes real-time human body motion images with the model, and issues an early warning if the labeled posture type belongs to a dangerous action.
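A compact sketch of how the five modules could be chained is given below; the class and method names are illustrative assumptions rather than the reference implementation of the system.

```python
class EarlyWarningSystem:
    """Wires the modules described above: data acquisition -> feature extraction
    -> posture compensation -> labeling -> evaluation and real-time warning."""

    def __init__(self, acquisition, extraction, compensation, labeling, evaluation):
        self.acquisition = acquisition    # loads the public posture image data set
        self.extraction = extraction      # minimum-evaluation feature combination
        self.compensation = compensation  # CNN-based posture compensation
        self.labeling = labeling          # objective-function-driven labeling
        self.evaluation = evaluation      # six-index evaluation and warning

    def build_model(self):
        data = self.acquisition.load()
        features = self.extraction.select(data)
        compensated = self.compensation.apply(features)
        model = self.labeling.fit(compensated)
        return self.evaluation.validate(model)   # rebuild if targets are not met

    def monitor(self, model, frame):
        posture = model.label(frame)
        return self.evaluation.warn_if_dangerous(posture)
```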
By adopting the scheme, the beneficial effects obtained by the invention are as follows:
(1) Against the problems that traditional manual inspection misses incidents because of limited coverage, produces false alarms because of subjective judgment, consumes manpower and cost, and suffers from fatigue and declining attention during long-term inspection, which degrade early-warning accuracy and efficiency, the scheme uses a machine vision algorithm to monitor in real time, process data automatically, and identify posture types with high accuracy and extensibility, so that potential dangerous activities are discovered in time.
(2) Against the contradiction in general feature extraction methods that too many processing parameters weaken accuracy and expressive power while too few parameters cause a high error rate in the posture estimation result, the scheme evaluates different feature combinations, selects the most representative features, improves the accuracy and expressive power of the features, and compensates the human posture so that the result is more accurate and reliable.
(3) Against the contradiction in general labeling methods that too many factors make the algorithm overly complex and inefficient while too few factors lower the accuracy of labeling human postures and impair subsequent evaluation and early warning, the labeling method of the scheme adapts to different features and considers multiple relation factors based on an objective function, giving stronger adaptability and robustness.
Drawings
FIG. 1 is a schematic flow chart of a site dangerous activity early warning method based on computer vision;
FIG. 2 is a schematic diagram of a site hazard activity early warning system based on computer vision provided by the invention;
FIG. 3 is a flow chart of step S3;
fig. 4 is a flow chart of step S4;
fig. 5 is a flow chart of step S5.
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate orientation or positional relationships based on those shown in the drawings, merely to facilitate description of the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention.
Referring to fig. 1, the method for early warning of dangerous activities in a site based on computer vision provided by the invention comprises the following steps:
step S1: the method comprises the steps of data acquisition, acquiring a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, and the label is of a human body posture type;
step S2: extracting features, namely extracting features of the human body posture image;
step S3: posture compensation, namely performing posture compensation on the human body posture;
step S4: labeling, namely iteratively optimizing the low-dimensional matrix and the relation matrix based on the objective function, and finally selecting the characteristics to label the gesture type of the human gesture;
step S5: step S2 to step S4 constitute the process of building a human body posture recognition model; images in a test sample library are input into the model built in the above steps, and the model performance is evaluated based on the operation data and the output labeling result.
In the second embodiment, referring to fig. 1 and based on the above embodiment, feature extraction is performed on the human body posture image in step S2, specifically including the following steps:
step S21: feature evaluation: constructing the acquired human body posture image public data set into a sample library, randomly selecting 70% of the sample library as a training sample library and the remaining 30% as a test sample library; obtaining the annotation matrix of the images through basic annotation and setting it as Q; determining the basic structure of the image set according to the Laplacian matrix of the training sample library and mapping the basic structure to the feature vectors of the matrix Q; combining the calculation result with an elastic network algorithm to obtain a feature selection calculation equation, and selecting, among different feature combinations, the combination with the minimum feature evaluation, wherein the adopted formula is as follows:
wherein α is the motion feature calculation parameter according to which a shared feature subspace is constructed, T is the multi-label feature mapping matrix of the image, L is the preprocessed image annotation matrix, S is the training sample library, and i is the i-th feature;
step S22: feature combination selection, with the maximum number of features preset, proceeding as follows:
step S221: setting the selected feature combination as an empty set;
step S222: for each feature i, calculating the reduction in the feature evaluation value after adding feature i to the selected feature combination, i.e. the difference between the original feature evaluation value and the new feature evaluation value obtained after adding feature i;
step S223: selecting the feature with the largest reduction as the new feature and adding it to the selected feature combination;
step S224: repeating step S222 and step S223 until the preset maximum number of features is reached or the objective function value can no longer be reduced;
step S225: selecting the feature combination that reduces the objective function value the most as the final feature selection result.
In the third embodiment, referring to fig. 1 and 3, based on the above embodiment, in step S3, posture compensation is performed on a human body posture, and specifically includes the following steps:
step S31: constructing an activation function: combining the characteristic values of the image with the excitation function, reclassifying the images according to the previously obtained image characteristic values, and setting the CNN activation function σ according to the image classification result, using the following formula:
where o is the excitation function of the neural network, i is the index of the neurons, z is the real-valued sequence in the neural network, and the summation is taken over absolute values;
step S32: two-dimensional estimation: inputting the image processing result and the characteristic values into the neural network and obtaining the two-dimensional estimate Eg_{i,j} through convolution calculation, using the following formula:
Eg_{i,j} = f||(X_{i,j})|| * X*_{i,j};
where X_{i,j} is the two-dimensional motion posture feature, X*_{i,j} is the characteristic value of X_{i,j}, and f is the calculated norm;
step S33: constructing a regulator and adding it to control the calculation accuracy of the estimation, using the following formula:
where y(k) is the output result of the regulator, k is the discrete number of time steps, θ is the weight coefficient of the multi-label feature, C'(t) is the time error in the estimation calculation process, and R' is the integral adjustment process of the estimation calculation;
step S34: filtering: constructing the filter transfer function Lc and adding a low-pass filter and a high-pass filter to complete the filtering process, using the following formula:
where A' is the low-frequency image data in the image and E' is the high-frequency data in the image;
step S35: posture compensation, after which the images are returned to their respective sample libraries, using the following formula:
ΣΔe' = ΣWl + Δω;
where ΣΔe' is the posture compensation, Wl is the estimation-result error vector, and Δω is the spatial error vector obtained from the historical data.
By executing the above operations, and against the contradiction in general feature extraction methods that too many processing parameters weaken accuracy and expressive power while too few parameters cause a high error rate in the posture estimation result, the scheme evaluates different feature combinations, selects the most representative features, improves the accuracy and expressive power of the features, and compensates the human posture so that the result is more accurate and reliable.
In a fourth embodiment, referring to fig. 1 and fig. 4, the embodiment is based on the above embodiment, and in step S4, the low-dimensional matrix and the relation matrix are iteratively optimized based on the objective function, and finally the feature is selected to label the gesture type for the human body gesture, which specifically includes the following steps:
step S41: the fitting regularization term is calculated using the following formula:
where X^(s) is the training sample library, X^(t) is the test sample library, the computed term is the fitting regularization term of the training and test samples, (x_i, x_j) is a pairing between training samples, (x_i^(c), x_j^(c)) is a pairing between training samples carrying the same label c, φ is the adaptation degree, m and n are the numbers of samples in the training sample library and the test sample library respectively, θ_c is the weight coefficient of the features of label c, and C is the number of labels;
step S42: the adaptation matrix M is calculated using the following formula:
M = M_0 + θM_c;
where θ is the weight coefficient of the multi-label feature, M_0 and M_c are the initial matrix and the multi-label feature mapping matrix respectively, (M_0)_{i,j} and (M_c)_{i,j} denote their matrix elements, m^(c) and n^(c) are the numbers of training samples containing label c and of test samples containing label c, and the corresponding subsets denote the training sample library containing label c and the test sample library containing label c respectively;
step S43: the hypergraph regularization term is calculated using the following formula:
where φ is the manifold learning function, μ_i and μ_j are different hypergraph regularization term parameters, and W_{Eij} is the hypergraph parameter;
step S44: the objective function is calculated using the following formula:
where X is the sample library, U is a low-dimensional matrix of features sharing a feature subspace with U ∈ R^{n'×n}, W is the relationship matrix between X and U with W ∈ R^{n×n'}, n' is the dimension of the shared feature subspace, tr is the trace of a matrix, a complexity control function with complexity F is applied, γ_1 and γ_2 are different matrix factorization coefficients, and λ is a non-negative constant;
step S45: iterative optimization, comprising the following steps:
step S451: initializing the initial values of U and W;
step S452: optimizing U based on the objective function by minimizing the following formula:
where the superscript T denotes the matrix transpose;
step S453: optimizing W based on the objective function by minimizing the following formula:
step S454: iterating the optimization process until the convergence condition is reached;
step S46: labeling, namely selecting characteristics related to the human body gesture according to the optimized U and W, and labeling gesture types for the human body gesture according to the selected characteristics.
By executing the above operations, the contradictory problems of general labeling methods are addressed: too many factors make the algorithm overly complex and inefficient in operation, while too few factors lower the accuracy of labeling human postures and impair subsequent evaluation and early warning.
In step S5, steps S2 to S4 are processes of building a human body posture recognition model, the images in the test sample library are input into the model built in the above steps, and the model performance is evaluated based on the operation data and the output labeling result, which specifically includes the following steps:
step S51: the correct recognition rate of a single image is calculated using the following formula:
where y(k) is the correct recognition rate of a single image, k is the index of the image, A is the number of images of the specified action in the experimental set, and k(i) represents the number of images of the specified action;
step S52: the super recognition rate is calculated using the following formula:
where the remaining quantity in the formula is the number of images of unspecified actions in the experimental data set;
step S53: the image annotation measurement is calculated using the following formula:
where g is the set of information points that can be marked in the image without analysis, and h is the standard information point set;
step S54: the similarity a is calculated using the following formula:
where ij denotes the indices of different frames of the same human action in two cycles, and d_{ij} is the Euclidean distance of the same human action after moving in the two cycles;
step S55: the estimation accuracy D is calculated using the following formula:
where f(b) is the joint Euclidean distance of the target image, f(c1) is the Euclidean distance estimate of the joints of the target image, and n1 represents the number of human joints in the image;
step S56: the estimation efficiency T is calculated using the following formula:
where t_i represents the time consumption of the i-th estimation item and N represents the total number of estimation items;
step S57: performance judgment: the performance of the model is evaluated based on the above six indexes; if the performance does not reach the expected target, the parameters used to establish the human body posture recognition model are reselected and the model is rebuilt; if the performance reaches the expected target, the establishment of the human body posture recognition model is complete;
step S58: operation: the human body posture types included in dangerous actions are preset, real-time human body motion images are collected and input into the human body posture recognition model, and if the posture type labeled by the model for an image belongs to a dangerous action, an early warning is issued.
By executing the above operations, the problems that traditional manual inspection misses incidents because of limited coverage, produces false alarms because of subjective judgment, consumes manpower and cost, and suffers from fatigue and declining attention during long-term inspection, which degrade early-warning accuracy and efficiency, are addressed.
In a sixth embodiment, referring to fig. 2, the field dangerous activity early warning system based on computer vision provided by the invention is based on the above embodiment, and includes a data acquisition module, a feature extraction module, a gesture compensation module, a labeling module and an evaluation module;
the data acquisition module acquires a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, the label is of a human body posture type, and the acquired data is sent to the feature extraction module;
the feature extraction module receives the data sent by the data acquisition module, selects the feature combination with the minimum feature evaluation to determine the finally selected feature, and sends the data to the posture compensation module;
the gesture compensation module receives the data sent by the feature extraction module, performs gesture compensation on the human gesture by using a convolutional neural network, and sends the data to the labeling module;
the labeling module receives the data sent by the gesture compensation module, determines an objective function by calculating a fitting regular term, an adaptation matrix and a hypergraph regular term, optimizes the matrix based on the objective function to determine the labeled gesture type, and sends the data to the evaluation module;
the evaluation module receives the data sent by the labeling module, determines the finally established human body posture recognition model based on the model's operation data on the test sample library and the performance evaluation derived from the labeling results, recognizes real-time human body motion images with the model, and issues an early warning if the labeled posture type belongs to a dangerous action.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
The invention and its embodiments have been described above without limitation, and the actual construction is not limited to the embodiments shown in the drawings. In summary, if a person of ordinary skill in the art, informed by this disclosure, devises structural arrangements and embodiments similar to this technical solution without creative effort and without departing from the gist of the invention, they shall fall within the protection scope of the invention.

Claims (2)

1. A site dangerous activity early warning method based on computer vision is characterized in that: the method comprises the following steps:
step S1: the method comprises the steps of data acquisition, acquiring a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, and the label is of a human body posture type;
step S2: extracting features, namely extracting features of the human body posture image;
step S3: posture compensation, namely performing posture compensation on the human body posture;
step S4: labeling, namely iteratively optimizing the low-dimensional matrix and the relation matrix based on the objective function, and finally selecting the characteristics to label the gesture type of the human gesture;
step S5: step S2 to step S4 constitute the process of establishing a human body posture recognition model; images in a test sample library are input into the model established in the above steps, and the performance of the model is evaluated based on the operation data and the output labeling result;
in step S4, the low-dimensional matrix and the relation matrix are iteratively optimized based on the objective function, and finally, the feature is selected to label the gesture type for the human gesture, which specifically includes the following steps:
step S41: the fitting regularization term is calculated using the following formula:
where X^(s) is the training sample library, X^(t) is the test sample library, the computed term is the fitting regularization term of the training and test samples, (x_i, x_j) is a pairing between training samples, (x_i^(c), x_j^(c)) is a pairing between training samples carrying the same label c, φ is the adaptation degree, m and n are the numbers of samples in the training sample library and the test sample library respectively, θ_c is the weight coefficient of the features of label c, and C is the number of labels;
step S42: the adaptation matrix M is calculated using the following formula:
M = M_0 + θM_c;
where θ is the weight coefficient of the multi-label feature, M_0 and M_c are the initial matrix and the multi-label feature mapping matrix respectively, (M_0)_{i,j} and (M_c)_{i,j} denote their matrix elements, m^(c) and n^(c) are the numbers of training samples containing label c and of test samples containing label c, and the corresponding subsets denote the training sample library containing label c and the test sample library containing label c respectively;
step S43: the hypergraph regularization term is calculated using the following formula:
where φ is the manifold learning function, μ_i and μ_j are different hypergraph regularization term parameters, and W_{Eij} is the hypergraph parameter;
step S44: the objective function is calculated using the following formula:
where X is the sample library, U is a low-dimensional matrix of features sharing a feature subspace with U ∈ R^{n'×n}, W is the relationship matrix between X and U with W ∈ R^{n×n'}, n' is the dimension of the shared feature subspace, tr is the trace of a matrix, a complexity control function with complexity F is applied, γ_1 and γ_2 are different matrix factorization coefficients, and λ is a non-negative constant;
step S45: iterative optimization, comprising the following steps:
step S451: initializing the initial values of U and W;
step S452: optimizing U based on the objective function by minimizing the following formula:
where the superscript T denotes the matrix transpose;
step S453: optimizing W based on the objective function by minimizing the following formula:
step S454: iterating the optimization process until the convergence condition is reached;
step S46: labeling, namely selecting characteristics related to the human body gesture according to the optimized U and W, and labeling gesture types for the human body gesture according to the selected characteristics;
in step S2, feature extraction is performed on the human body posture image, specifically including the following steps:
step S21: feature evaluation: constructing the acquired human body posture image public data set into a sample library, randomly selecting 70% of the sample library as a training sample library and the remaining 30% as a test sample library; obtaining the annotation matrix of the images through basic annotation and setting it as Q; determining the basic structure of the image set according to the Laplacian matrix of the training sample library and mapping the basic structure to the feature vectors of the matrix Q; combining the calculation result with an elastic network algorithm to obtain a feature selection calculation equation, and selecting, among different feature combinations, the combination with the minimum feature evaluation, wherein the adopted formula is as follows:
wherein α is the motion feature calculation parameter according to which a shared feature subspace is constructed, T is the multi-label feature mapping matrix of the image, L is the preprocessed image annotation matrix, S is the training sample library, and i is the i-th feature;
step S22: feature combination selection, with the maximum number of features preset, proceeding as follows:
step S221: setting the selected feature combination as an empty set;
step S222: for each feature i, calculating the reduction in the feature evaluation value after adding feature i to the selected feature combination, i.e. the difference between the original feature evaluation value and the new feature evaluation value obtained after adding feature i;
step S223: selecting the feature with the largest reduction as the new feature and adding it to the selected feature combination;
step S224: repeating step S222 and step S223 until the preset maximum number of features is reached or the objective function value can no longer be reduced;
step S225: selecting the feature combination that reduces the objective function value the most as the final feature selection result;
in step S3, posture compensation is performed on the human body posture, specifically including the following steps:
step S31: constructing an activation function: combining the characteristic values of the image with the excitation function, reclassifying the images according to the previously obtained image characteristic values, and setting the CNN activation function σ according to the image classification result, using the following formula:
where o is the excitation function of the neural network, i is the index of the neurons, z is the real-valued sequence in the neural network, and the summation is taken over absolute values;
step S32: two-dimensional estimation: inputting the image processing result and the characteristic values into the neural network and obtaining the two-dimensional estimate Eg_{i,j} through convolution calculation, using the following formula:
Eg_{i,j} = f||(X_{i,j})|| * X*_{i,j};
where X_{i,j} is the two-dimensional motion posture feature, X*_{i,j} is the characteristic value of X_{i,j}, and f is the calculated norm;
step S33: constructing a regulator and adding it to control the calculation accuracy of the estimation, using the following formula:
where y(k) is the output result of the regulator, k is the discrete number of time steps, θ is the weight coefficient of the multi-label feature, C'(t) is the time error in the estimation calculation process, and R' is the integral adjustment process of the estimation calculation;
step S34: filtering: constructing the filter transfer function Lc and adding a low-pass filter and a high-pass filter to complete the filtering process, using the following formula:
where A' is the low-frequency image data in the image and E' is the high-frequency data in the image;
step S35: posture compensation, after which the images are returned to their respective sample libraries, using the following formula:
ΣΔe' = ΣWl + Δω;
where ΣΔe' is the posture compensation, Wl is the estimation-result error vector, and Δω is the spatial error vector obtained from the historical data;
in step S5, step S2 to step S4 constitute the process of building a human body posture recognition model; images in the test sample library are input into the model built in the above steps, and the performance of the model is evaluated based on the operation data and the output labeling result, specifically including the following steps:
step S51: the correct recognition rate of a single image is calculated using the following formula:
where y(k) is the correct recognition rate of a single image, k is the index of the image, A is the number of images of the specified action in the experimental set, and k(i) represents the number of images of the specified action;
step S52: the super recognition rate is calculated using the following formula:
where the remaining quantity in the formula is the number of images of unspecified actions in the experimental data set;
step S53: the image annotation measurement is calculated using the following formula:
where g is the set of information points that can be marked in the image without analysis, and h is the standard information point set;
step S54: the similarity a is calculated using the following formula:
where ij denotes the indices of different frames of the same human action in two cycles, and d_{ij} is the Euclidean distance of the same human action after moving in the two cycles;
step S55: the estimation accuracy D is calculated using the following formula:
where f(b) is the joint Euclidean distance of the target image, f(c1) is the Euclidean distance estimate of the joints of the target image, and n1 represents the number of human joints in the image;
step S56: the estimation efficiency T is calculated using the following formula:
where t_i represents the time consumption of the i-th estimation item and N represents the total number of estimation items;
step S57: performance judgment: the performance of the model is evaluated based on the above six indexes; if the performance does not reach the expected target, the parameters used to establish the human body posture recognition model are reselected and the model is rebuilt; if the performance reaches the expected target, the establishment of the human body posture recognition model is complete;
step S58: operation: the human body posture types included in dangerous actions are preset, real-time human body motion images are collected and input into the human body posture recognition model, and if the posture type labeled by the model for an image belongs to a dangerous action, an early warning is issued.
2. A site dangerous activity early warning system based on computer vision, for implementing a site dangerous activity early warning method based on computer vision as described in claim 1, characterized in that: the system comprises a data acquisition module, a feature extraction module, a gesture compensation module, a labeling module and an evaluation module;
the data acquisition module acquires a human body posture image public data set, wherein the human body posture image public data set comprises a human body posture image and a corresponding label, the label is of a human body posture type, and the acquired data is sent to the feature extraction module;
the feature extraction module receives the data sent by the data acquisition module, selects the feature combination with the minimum feature evaluation to determine the finally selected feature, and sends the data to the posture compensation module;
the gesture compensation module receives the data sent by the feature extraction module, performs gesture compensation on the human gesture by using a convolutional neural network, and sends the data to the labeling module;
the labeling module receives the data sent by the gesture compensation module, determines an objective function by calculating a fitting regular term, an adaptation matrix and a hypergraph regular term, optimizes the matrix based on the objective function to determine the labeled gesture type, and sends the data to the evaluation module;
the evaluation module receives the data sent by the labeling module, determines the finally established human body posture recognition model based on the model's operation data on the test sample library and the performance evaluation derived from the labeling results, recognizes real-time human body motion images with the model, and issues an early warning if the labeled posture type belongs to a dangerous action.
CN202310885602.9A 2023-07-19 2023-07-19 Site dangerous activity early warning method and system based on computer vision Active CN116645732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310885602.9A CN116645732B (en) 2023-07-19 2023-07-19 Site dangerous activity early warning method and system based on computer vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310885602.9A CN116645732B (en) 2023-07-19 2023-07-19 Site dangerous activity early warning method and system based on computer vision

Publications (2)

Publication Number Publication Date
CN116645732A CN116645732A (en) 2023-08-25
CN116645732B true CN116645732B (en) 2023-10-10

Family

ID=87625050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310885602.9A Active CN116645732B (en) 2023-07-19 2023-07-19 Site dangerous activity early warning method and system based on computer vision

Country Status (1)

Country Link
CN (1) CN116645732B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7925081B2 (en) * 2007-12-12 2011-04-12 Fuji Xerox Co., Ltd. Systems and methods for human body pose estimation

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170053069A (en) * 2015-11-05 2017-05-15 수원대학교산학협력단 A robust face recognition method for pose variations based on pose estimation
KR20180028198A (en) * 2016-09-08 2018-03-16 연세대학교 산학협력단 Image processing method, apparatus for predicting dangerous situation and method, server for predicting dangerous situation using thereof
CN107045623A (en) * 2016-12-30 2017-08-15 厦门瑞为信息技术有限公司 A kind of method of the indoor dangerous situation alarm based on human body attitude trace analysis
CN110399767A (en) * 2017-08-10 2019-11-01 北京市商汤科技开发有限公司 Occupant's dangerous play recognition methods and device, electronic equipment, storage medium
CN108335285A (en) * 2018-01-16 2018-07-27 华侨大学 A kind of diamond abrasive grain wear rate assay method based on image procossing
CN109325469A (en) * 2018-10-23 2019-02-12 北京工商大学 A kind of human posture recognition method based on deep neural network
CN110688980A (en) * 2019-10-12 2020-01-14 南京工程学院 Human body posture classification method based on computer vision
WO2023283268A1 (en) * 2021-07-06 2023-01-12 KinTrans, Inc. Automatic body movement recognition and association system including smoothing, segmentation, similarity, pooling, and dynamic modeling
CN113642499A (en) * 2021-08-23 2021-11-12 中国人民解放军火箭军工程大学 Human behavior recognition method based on computer vision
CN116229560A (en) * 2022-09-08 2023-06-06 广东省泰维思信息科技有限公司 Abnormal behavior recognition method and system based on human body posture
CN115527269A (en) * 2022-10-10 2022-12-27 动自由(北京)科技有限公司 Intelligent human body posture image identification method and system
CN115880720A (en) * 2022-11-28 2023-03-31 北京工业大学 Non-labeling scene self-adaptive human body posture and shape estimation method based on confidence degree sharing
CN115798042A (en) * 2022-11-29 2023-03-14 中国计量大学 Escalator passenger abnormal behavior data construction method based on digital twins
CN115860582A (en) * 2023-02-28 2023-03-28 山东科技大学 Intelligent impact risk early warning method based on self-adaptive lifting algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yunchu Zhang; Chongtao Yang; Zhaobin Wang; Xiao Han; Shusheng Zhao. A method for judging fall behavior of solitary elderly people based on motion tracking. 2017 Chinese Automation Congress (CAC), 2018, full text. *
Zhao Jing. Research on information processing and data fusion algorithms based on wearable technology. China Master's Theses Full-text Database, Information Science and Technology, 2015, full text. *

Also Published As

Publication number Publication date
CN116645732A (en) 2023-08-25

Similar Documents

Publication Publication Date Title
JP6837597B2 (en) Active learning systems and methods
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
Carneiro et al. Combining multiple dynamic models and deep learning architectures for tracking the left ventricle endocardium in ultrasound data
CN105678332B (en) Converter steelmaking end point judgment method and system based on flame image CNN recognition modeling
CN110363337B (en) Oil measuring method and system of oil pumping unit based on data driving
CN100573100C (en) Method for Discriminating Gas-liquid Two Phase Flow based on digital image processing techniques
CN106201849B (en) Longevity prediction technique more than a kind of long-life component of finite data driving
CN109623489B (en) Improved machine tool health state evaluation method and numerical control machine tool
CN108021930A (en) A kind of adaptive multi-view image sorting technique and system
KR102639558B1 (en) Growth analysis prediction apparatus using bone maturity distribution by interest area and method thereof
CN108921872B (en) Robust visual target tracking method suitable for long-range tracking
CN117237902B (en) Robot character recognition system based on deep learning
CN117476214A (en) Data management method and system based on hospital information
CN114879628B (en) Multi-mode industrial process fault diagnosis method based on antagonism local maximum mean difference
Magistri et al. Towards in-field phenotyping exploiting differentiable rendering with self-consistency loss
CN117034003A (en) Full life cycle self-adaptive fault diagnosis method, system, equipment and medium for aerospace major product manufacturing equipment
Zhou et al. Adaptive weighted locality-constrained sparse coding for glaucoma diagnosis
CN116645732B (en) Site dangerous activity early warning method and system based on computer vision
CN113313179B (en) Noise image classification method based on l2p norm robust least square method
CN117854011A (en) Intelligent AI camera recognition comparison method and system
Sajitha et al. Smart farming application using knowledge embedded-graph convolutional neural network (KEGCNN) for banana quality detection
CN110633679B (en) Automatic pointer instrument indicating identification method and system based on genetic algorithm
CN117253192A (en) Intelligent system and method for silkworm breeding
CN114080644A (en) System and method for diagnosing small bowel cleanliness
He et al. A Probabilistic Approach Based on Combination of Distance Metrics and Distribution Functions for Human Postures Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant