CN116115239A - Awkward working posture recognition method for construction workers based on multi-modal data fusion - Google Patents

Awkward working posture recognition method for construction workers based on multi-modal data fusion

Info

Publication number
CN116115239A
CN116115239A
Authority
CN
China
Prior art keywords
data, awkward, working, steps, features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211474969.3A
Other languages
Chinese (zh)
Inventor
夏侯遐迩
李子睿
夏吉康
李启明
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202211474969.3A
Publication of CN116115239A
Legal status: Pending

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 Measuring for diagnostic purposes; Identification of persons
        • A61B 5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
            • A61B 5/316 Modalities, i.e. specific diagnostic methods
            • A61B 5/369 Electroencephalography [EEG]
        • A61B 5/103 Detecting, measuring or recording devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
            • A61B 5/11 Measuring movement of the entire body or parts thereof, e.g. head or hand tremor, mobility of a limb
            • A61B 5/1116 Determining posture transitions
        • A61B 5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
            • A61B 5/7203 ... for noise prevention, reduction or removal
            • A61B 5/7225 Details of analog processing, e.g. isolation amplifier, gain or sensitivity adjustment, filtering, baseline or drift compensation
            • A61B 5/7235 Details of waveform analysis
            • A61B 5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
            • A61B 5/7267 ... involving training the classification device
    • A61B 2503/00 Evaluating a particular growth phase or type of persons or animals
        • A61B 2503/20 Workers

Abstract

The invention discloses a method, based on multi-modal data fusion, for recognizing awkward working postures of construction workers. The method comprises: collecting raw electroencephalogram (EEG) data, raw behavior data and posture images of a monitored person; preprocessing the raw EEG data and extracting time-domain, frequency-domain and nonlinear features; standardizing the raw behavior data and extracting mean values as behavior-data features; extracting the spatial coordinates of the principal points of the human body from the posture images as posture-state features; fusing the extracted features under an early-fusion strategy; and feeding the fused feature set into a trained BP neural network, which outputs the awkward-posture work category of the monitored person. By fusing multi-modal features from EEG data, behavior data and posture images, the method automates the recognition of awkward working postures and overcomes the limited accuracy of posture recognition based on single-modal data.

Description

Awkward working posture recognition method for construction workers based on multi-modal data fusion
Technical Field
The invention relates to the technical field of construction safety and health management, and in particular to a method for recognizing awkward working postures of construction workers based on multi-modal data fusion.
Background
The safety situation of China's construction industry remains severe: its accident rate is three times the cross-industry average, and accidents caused by unsafe behaviors of construction workers account for more than 80% of all construction-industry safety accidents. Awkward working postures account for more than 13.1% of the unsafe behaviors involved in workers' operations, making them one of the most important categories. Construction workers who hold awkward working postures for long periods while executing tasks can develop a series of musculoskeletal disorders that directly threaten occupational health and safety; for this reason the construction industry is also classed among the industries most at risk of musculoskeletal disorders and injuries.
At present, many cutting-edge techniques from computer science and automation are applied to recognizing and warning against awkward working postures of construction workers. From the activity perspective, computer vision combined with deep learning is applied to recognizing postures, actions and behaviors directly, while various motion sensors extract human behavior data, so that awkward working postures are recognized and warned against in real time. From the psychological perspective, wearable devices (EEG helmets, smart wristbands, eye trackers and the like) measure physiological data that reflect cognitive and emotional states, such as electrocardiogram, EEG, skin conductance and eye-movement trajectories, so that awkward working postures are monitored indirectly.
Existing recognition techniques tend to interpret and classify from single-modal data, yet workers' actions and cognition are complex and interacting, so postures are hard to recognize objectively and accurately from a single modality alone. Multi-modal fusion performs collaborative reasoning over multiple heterogeneous modalities; by collecting and fusing multi-modal data through several techniques, awkward working postures can be recognized more accurately. Given that the construction field currently lacks an application framework for awkward-working-posture recognition based on multi-modal fusion, the invention provides a method for recognizing awkward working postures of construction workers based on multi-modal data fusion.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a method for recognizing awkward working postures of construction workers based on multi-modal data fusion, comprising the following steps:
S1, acquiring multi-modal data of a monitored person, comprising raw EEG data, raw behavior data and posture images;
S2, performing artifact elimination and feature extraction on the raw EEG data to obtain EEG data features;
S3, performing standardization, window division and mean-feature extraction on the raw behavior data to obtain behavior data features;
S4, performing human-body point identification and spatial-coordinate extraction on the posture images to obtain posture-state features;
S5, fusing the EEG data features, behavior data features and posture-state features under an early-fusion strategy to obtain multi-modal fused features;
S6, inputting the multi-modal fused features into a trained BP neural network;
S7, outputting the awkward-working-posture category of the monitored person.
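At inference time, the S1-S7 pipeline reduces to concatenating the per-window feature vectors of the three modalities (the early-fusion step S5) and passing the result to the classifier. A minimal sketch of the fusion step follows; the 154 EEG features and the 132 posture values (33 points × 4 coordinates) match figures given later in this document, while the 25-dimensional behavior vector is a hypothetical width not fixed by the patent.

```python
import numpy as np

def fuse_features(eeg_feat, behavior_feat, posture_feat):
    """Early-fusion step S5: concatenate the per-window feature vectors
    of the three modalities into one fused vector for the classifier."""
    return np.concatenate([eeg_feat, behavior_feat, posture_feat])

# Illustrative per-window widths: 154 EEG features, a hypothetical
# 25-dimensional behavior vector, 132 posture values (33 points x 4).
fused = fuse_features(np.zeros(154), np.zeros(25), np.zeros(132))
assert fused.shape == (311,)
```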
The technical scheme of the invention is as follows:
further, in step S2, artifact removal is performed on the original electroencephalogram data, including external artifact removal and internal artifact removal, and external artifact removal is performed through a finite impulse response band-pass filter; and distinguishing and screening out internal artifacts in the original electroencephalogram data through independent component analysis.
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, in step S2, multi-dimensional EEG feature data of the monitored person are extracted by fixed-window division. They comprise time-domain, frequency-domain and nonlinear features: the time-domain features are the standard deviation, fluctuation index and kurtosis; the frequency-domain features are the power spectral densities of the Delta, Theta, Alpha, Beta and Gamma bands; the nonlinear features are the approximate entropy, fuzzy entropy and Hurst exponent.
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, the standard deviation among the time-domain features is computed as

$$ s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} $$

where $n$ is the total number of data points collected on the channel, $x_i$ is the $i$-th data point collected on the channel, and $\bar{x}$ is the average of the $n$ data points;

the fluctuation index among the time-domain features is computed as

$$ F = \frac{1}{n-1}\sum_{i=1}^{n-1}\left|x(i+1)-x(i)\right| $$

where $n$ is the total number of data points collected on the channel, $x(i)$ is the $i$-th and $x(i+1)$ the $(i+1)$-th data point collected on the channel;

the kurtosis among the time-domain features is computed as

$$ K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - 3 $$

where $n$ is the total number of data points collected on the channel, $s$ is the standard deviation of the $n$ data points, $x_i$ is the $i$-th data point, and $\bar{x}$ is the average of the $n$ data points.
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, the calculation of the approximate entropy comprises the following steps:
S2.1.1, take the raw EEG data as x(1), x(2), ..., x(n) and form the m-dimensional vectors X(i) = [x(i), x(i+1), ..., x(i+m-1)] in sequence order, where i = 1, 2, ..., n-m+1;
S2.1.2, define the distance between the i-th and j-th vectors X(i) and X(j) as

$$ d_{ij} = \max_{0\le k\le m-1}\left|x(i+k)-x(j+k)\right| $$

S2.1.3, given a threshold r, count for each vector X(i) the number of j for which $d_{ij} \le r\cdot Std$, where Std is the standard deviation of the sequence data, and record the ratio of this count to the total number of vectors n-m+1 as $C_i^m(r)$;
S2.1.4, take the logarithm of $C_i^m(r)$ and average over all i, recording the result as

$$ \phi^m(r) = \frac{1}{n-m+1}\sum_{i=1}^{n-m+1}\ln C_i^m(r) $$

S2.1.5, increase m by 1 and repeat steps S2.1.1 to S2.1.4 in dimension m+1 to obtain $C_i^{m+1}(r)$ and $\phi^{m+1}(r)$, and obtain the approximate entropy as

$$ ApEn = \lim_{n\to\infty}\left[\phi^m(r)-\phi^{m+1}(r)\right] $$

where ApEn denotes the approximate entropy.
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, the calculation of the fuzzy entropy comprises the following steps:
S2.2.1, take the raw EEG data as x(1), x(2), ..., x(n), n points in total;
S2.2.2, define the embedding dimension m and the similarity tolerance r, reconstruct the phase space, and generate a set of m-dimensional vectors $X(i) = [x(i), x(i+1), \ldots, x(i+m-1)] - x_0(i)$, each representing m consecutive data points starting from x(i) with their baseline removed, where i = 1, 2, ..., n-m+1 and

$$ x_0(i) = \frac{1}{m}\sum_{k=0}^{m-1}x(i+k) $$

is the mean of the m data points;
S2.2.3, define a fuzzy membership function

$$ A(x) = \exp\left(-\frac{x^2}{r}\right) $$

where r denotes the similarity tolerance;
S2.2.4, according to the expression of A(x), rewrite it as

$$ D_{ij}^{m} = \exp\left(-\frac{\left(d_{ij}^{m}\right)^2}{r}\right) $$

where j = 1, 2, ..., n-m+1 with j ≠ i, and $d_{ij}^{m}$ denotes the maximum absolute distance between the window vectors X(i) and X(j), computed as

$$ d_{ij}^{m} = \max_{0\le k\le m-1}\left|\left(x(i+k)-x_0(i)\right)-\left(x(j+k)-x_0(j)\right)\right| $$

S2.2.5, define the function

$$ \phi^{m}(r) = \frac{1}{n-m}\sum_{i=1}^{n-m}\left(\frac{1}{n-m-1}\sum_{j=1,\,j\ne i}^{n-m}D_{ij}^{m}\right) $$

S2.2.6, repeat steps S2.2.1 to S2.2.5, reconstruct the (m+1)-dimensional vectors in sequence order, and define the function

$$ \phi^{m+1}(r) = \frac{1}{n-m}\sum_{i=1}^{n-m}\left(\frac{1}{n-m-1}\sum_{j=1,\,j\ne i}^{n-m}D_{ij}^{m+1}\right) $$

S2.2.7, on the basis of step S2.2.5, define the fuzzy entropy as

$$ FuzzyEn(m,r) = \lim_{N\to\infty}\left[\ln\phi^{m}(r)-\ln\phi^{m+1}(r)\right] $$

For finite time-series data consisting of N data points, the fuzzy entropy can finally be expressed as

$$ FuzzyEn(m,r,N) = \ln\phi^{m}(r)-\ln\phi^{m+1}(r) $$

where FuzzyEn(m, r, N) denotes the fuzzy entropy.
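Steps S2.2.1-S2.2.7 can be sketched as follows for a finite sequence, using the exponential membership function above; taking the tolerance as a fraction of the sequence's standard deviation is a conventional choice rather than one the patent specifies.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2):
    """Fuzzy entropy of a 1-D sequence (sketch of steps S2.2.1-S2.2.7).
    r is the similarity tolerance, applied here as a fraction of the
    sequence's standard deviation (a common convention)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r * np.std(x)

    def phi(m):
        # Baseline-removed window vectors X(i) = x(i..i+m-1) - mean
        count = n - m
        X = np.array([x[i:i + m] - x[i:i + m].mean() for i in range(count)])
        # Maximum absolute distance between every pair of windows
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        # Fuzzy membership D_ij = exp(-d^2 / r), excluding i == j
        D = np.exp(-(d ** 2) / r)
        np.fill_diagonal(D, 0.0)
        return D.sum() / (count * (count - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

As expected for an entropy of this family, a regular signal (e.g. a sine wave) scores lower than white noise.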
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, the calculation of the Hurst exponent comprises the following steps:
S2.3.1, from the raw EEG data sequence x(1), x(2), ..., x(n) composed of n data points, compute the mean

$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x(i) $$

S2.3.2, compute the cumulative deviation of the first t values from the mean,

$$ w_t = \sum_{i=1}^{t}\left(x(i)-\bar{x}\right) $$

S2.3.3, compute the range between the maximum and minimum of the cumulative deviations,

$$ R(n) = \max(0, w_1, w_2, \ldots, w_n) - \min(0, w_1, w_2, \ldots, w_n) $$

S2.3.4, compute the Hurst exponent as

$$ H = \frac{\ln\left(R(n)/S(n)\right)}{\ln n} $$

where S(n) is the standard deviation of the raw EEG data sequence.
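Steps S2.3.1-S2.3.4 amount to a single-scale rescaled-range (R/S) estimate; a sketch follows. Practical Hurst estimators usually regress log(R/S) over several window sizes, which the single-scale formula above omits.

```python
import numpy as np

def hurst_rs(x):
    """Hurst exponent by rescaled-range analysis over the full sequence,
    following steps S2.3.1-S2.3.4 (single-scale sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    w = np.cumsum(x - x.mean())                  # cumulative deviations w_t
    R = max(0.0, w.max()) - min(0.0, w.min())    # range R(n)
    S = x.std()                                  # standard deviation S(n)
    return np.log(R / S) / np.log(n)
```

White noise yields an estimate near 0.5, while a strongly trending (persistent) sequence scores higher.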
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, in step S4 the posture-state features are obtained as follows:
S4.1, identify the 33 principal points of the human body in the posture image corresponding to each time window;
S4.2, identify the edge of the human body and crop the picture along that edge;
S4.3, locate the human-body center point as the centroid, select the smallest square frame containing all 33 principal points, and position the cropped posture image in three-dimensional space;
S4.4, establish a three-dimensional rectangular coordinate system with the upper-right corner of the smallest square frame as the origin;
S4.5, determine the spatial coordinates (x, y, z) and the visibility coordinate (v) of the 33 principal points, and take these as the posture-state features of the monitored person.
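Step S4.5 yields, per time window, 33 points × 4 values, i.e. a 132-dimensional posture-state vector. The sketch below assembles it, assuming the key points arrive as a (33, 4) array of (x, y, z, v) values, for instance from a pose estimator such as MediaPipe Pose, whose 33-landmark output matches the point count used here (an assumption; the patent does not name the detector). The centroid re-centering mirrors step S4.3.

```python
import numpy as np

def posture_features(landmarks):
    """Flatten 33 body key points into the posture-state feature vector.
    `landmarks` is assumed to be a (33, 4) array of (x, y, z, v) values."""
    landmarks = np.asarray(landmarks, dtype=float)
    assert landmarks.shape == (33, 4), "expect 33 points x (x, y, z, v)"
    # Re-express coordinates relative to the body centroid (cf. step S4.3)
    centroid = landmarks[:, :3].mean(axis=0)
    rel = landmarks.copy()
    rel[:, :3] -= centroid
    return rel.reshape(-1)   # 132-dimensional posture-state feature
```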
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, in step S6 the training of the BP neural network comprises preparing a model training data set and building and training the network. The training data set is prepared as follows:
S6.1.1, recruit construction workers on site and fit them with the devices and sensors that acquire the multi-modal data;
S6.1.2, have the construction workers perform simulated construction operations in different awkward working postures;
S6.1.3, have each device and sensor collect and synchronously export the multi-modal data;
S6.1.4, preprocess, feature-extract and fuse the acquired multi-modal data to obtain the multi-modal fused features of construction workers in awkward working postures;
S6.1.5, obtain the training data set for training the BP neural network.
In the foregoing method for recognizing awkward working postures of construction workers based on multi-modal data fusion, the BP neural network is built and trained as follows:
S6.2.1, take the multi-modal fused features as the input layer, set the number of hidden-layer neurons to 500, and set a label for each awkward working posture;
S6.2.2, initialize the hyper-parameters of the BP neural network, with the weights of the unidirectional full connections between layers set to random numbers in [-1, 1];
S6.2.3, input the labeled training data for the different awkward working postures into the BP neural network;
S6.2.4, continually adjust the learning rate α and the other hyper-parameters until the minimum error value is obtained, yielding the learning rate corresponding to that minimum;
S6.2.5, finally obtain a trained BP neural network suitable for awkward-working-posture recognition.
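Steps S6.2.1-S6.2.5 describe a standard one-hidden-layer back-propagation network. The sketch below follows the stated choices (500 hidden neurons by default, inter-layer weights initialized uniformly in [-1, 1]); the tanh activation, softmax output and cross-entropy loss are assumptions, since the patent does not fix them.

```python
import numpy as np

class TinyBP:
    """Minimal one-hidden-layer BP network (sketch of S6.2.1-S6.2.5).
    Hidden size 500 and uniform [-1, 1] weight init follow the patent;
    tanh/softmax/cross-entropy are illustrative assumptions."""
    def __init__(self, n_in, n_out, n_hidden=500, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.uniform(-1, 1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.uniform(-1, 1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def _forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)
        z = self.h @ self.W2 + self.b2
        z -= z.max(axis=1, keepdims=True)        # numerically stable softmax
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def fit(self, X, y, epochs=200):
        Y = np.eye(self.W2.shape[1])[y]          # one-hot posture labels
        for _ in range(epochs):
            P = self._forward(X)
            dZ = (P - Y) / len(X)                # softmax + cross-entropy grad
            dW2 = self.h.T @ dZ
            dH = dZ @ self.W2.T * (1 - self.h ** 2)
            self.W1 -= self.lr * (X.T @ dH)
            self.b1 -= self.lr * dH.sum(axis=0)
            self.W2 -= self.lr * dW2
            self.b2 -= self.lr * dZ.sum(axis=0)

    def predict(self, X):
        return self._forward(X).argmax(axis=1)
```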
The beneficial effects of the invention are as follows:
EEG data of the monitored person are collected with an electroencephalograph, behavior data (acceleration, angular velocity and angle) with a gyroscope accelerometer and pressure sensors, and posture images with computer vision. The acquired EEG data then undergo artifact elimination and feature extraction, the behavior data undergo standardization, window division and mean-feature extraction, and the posture images undergo human-body point identification and spatial-coordinate extraction, yielding EEG data features, behavior data features and posture-state features. Under the early-fusion strategy, the extracted features are fused, input into the trained BP neural network, and the awkward-posture work category of the monitored person is output, so that awkward working postures of on-site construction workers can be recognized automatically, accurately and objectively.
Drawings
FIG. 1 is an overall flow chart of an identification method in an embodiment of the invention;
FIG. 2 is a flow chart of the processing of raw electroencephalogram data in an embodiment of the present invention;
FIG. 3 is a flow chart of a process of gesture image in an embodiment of the present invention;
FIG. 4 is a flowchart of awkward-working-posture recognition based on multi-modal data fusion in an embodiment of the present invention;
FIG. 5 is a flow chart of the fusion of multi-modal data features in an embodiment of the present invention;
fig. 6 is a flowchart of the construction and training of a BP neural network in an embodiment of the present invention.
Detailed Description
The method for recognizing awkward working postures of construction workers based on multi-modal data fusion provided by this embodiment, as shown in FIG. 1, comprises the following steps:
S1, acquiring multi-modal data of a monitored person, comprising raw EEG data, raw behavior data and posture images.
An Emotiv EPOC X electroencephalograph with 14 electrodes (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4) is worn; EEG data on the channels corresponding to the 14 electrodes are collected at a sampling rate of 128 Hz, and the Emotiv Pro software records, stores and exports the EEG data in real time, yielding the raw EEG data of the monitored person.
A plantar-pressure sensing insole based on an array of distributed flexible thin-film pressure sensors measures the plantar pressure distribution in real time; the readings of the 16 sensing points on each insole are recorded at a rate of two averaged readings every 0.5 s to reflect the monitored person's real-time plantar pressure distribution and fluctuation, and the recorded pressure data are exported in real time to obtain the raw plantar-pressure data of the monitored person.
Based on a six-axis gyroscope accelerometer, the components of the monitored person's acceleration, angular velocity and angle along the X, Y and Z axes of three-dimensional space are recorded and exported in real time at a 100 Hz sampling rate; the resulting acceleration, angular-velocity and angle data are combined with the raw plantar-pressure data and organized into the raw behavior data.
Cameras arranged on site photograph the monitored person's posture at a recording frequency of one image every 0.5 s; real-time export and upload yield the posture images of the monitored person.
S2, performing artifact elimination and feature extraction on the raw EEG data to obtain EEG data features.
As shown in FIG. 2, the acquired raw EEG data of the monitored person are preprocessed to eliminate the internal and external artifacts they contain, so that the artifact-free EEG data accurately and objectively reflect real brain activity; the EEG data are divided with a fixed window of 0.5 s; and time-domain, frequency-domain and nonlinear features are extracted from the artifact-free EEG data as measures of its information content and measurable properties.
External artifact elimination: the invention adopts a finite impulse response (FIR) band-pass filter with the high and low cut-off frequencies set to 50 Hz and 1 Hz respectively, so as to remove external artifacts above 50 Hz and below 1 Hz and thereby avoid fast and slow drifts in the EEG data.
The high cut-off frequency respects the upper limit given by the Nyquist frequency, which equals half the sampling rate, i.e. 64 Hz, and is therefore set to 50 Hz; the low cut-off frequency takes into account the frequency range of the Delta wave, the lowest-frequency band acquired (1-4 Hz), and is therefore set to 1 Hz.
Internal artifact elimination: the invention performs independent component analysis (ICA) on the EEG data and screens out the distinguished artifact components, such as eye-movement and muscle artifacts, obtaining the monitored person's EEG data with both internal and external artifacts eliminated.
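Assuming SciPy is available, the external-artifact stage above can be sketched as a zero-phase FIR band-pass; the filter order is an illustrative choice, and the subsequent ICA screening of internal artifacts is noted in a comment rather than implemented.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 128                 # Emotiv sampling rate (Hz)
LOW, HIGH = 1.0, 50.0    # cut-off frequencies from the description

def bandpass_eeg(raw, numtaps=513):
    """Zero-phase FIR band-pass (1-50 Hz) for external-artifact removal.
    numtaps is an illustrative filter order, not fixed by the patent.
    The internal-artifact stage (ICA screening of eye/muscle components)
    would follow this step and is omitted here."""
    taps = firwin(numtaps, [LOW, HIGH], pass_zero=False, fs=FS)
    return filtfilt(taps, 1.0, raw, axis=-1)
```

Applying the filter to a signal with a constant drift plus a 10 Hz rhythm removes the drift while preserving the in-band rhythm.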
To ensure that the extracted EEG features are discriminative enough to fully reflect and measure the monitored person's cognitive and emotional states, and considering the chaotic character of EEG data as a nonlinear, non-stationary time series, three classes of EEG features are selected and extracted: time-domain, frequency-domain and nonlinear.
According to the invention, fixed-window division is selected, and 154-dimensional EEG feature data of the monitored person are extracted with a 0.5 s window as a quantitative index of the person's real-time cognitive and emotional states under the 0.5 s recording frequency of the posture images; Table 1 lists the 154-dimensional EEG feature data.
Table 1. Summary of the extracted EEG data features
[14 channels × 11 features per channel (3 time-domain, 5 frequency-domain, 3 nonlinear) = 154 dimensions]
The invention adopts time-domain analysis to extract three statistical parameters, standard deviation, fluctuation index and kurtosis, as the time-domain features of the EEG data.
The standard deviation (Std) is the square root of the variance, i.e. the square root of the arithmetic mean of the squared deviations of each value from the sequence mean. It reflects the dispersion of the data well and is a widely used measure of dispersion in the EEG time domain. It is computed as

$$ s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} $$

where $n$ is the total number of data points collected on the channel, $x_i$ is the $i$-th data point collected on the channel, and $\bar{x}$ is the average of the $n$ data points.
The fluctuation index is widely used in EEG and ECG signal processing; it characterizes the fluctuation intensity of time-series data by the mean of the sums of differences between adjacent samples. It is computed as

$$ F = \frac{1}{n-1}\sum_{i=1}^{n-1}\left|x(i+1)-x(i)\right| $$

where $n$ is the total number of data points collected on the channel and $x(i)$ is the $i$-th data point collected on the channel.
Kurtosis measures how flat or peaked the data distribution is: a kurtosis of 0 indicates the sequence is as steep as a normal distribution, a kurtosis above 0 a steeper distribution, and a kurtosis below 0 a flatter one. It is computed as

$$ K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^4 - 3 $$

where $n$ is the total number of data points collected on the channel, $s$ is the standard deviation of the $n$ data points, $x_i$ is the $i$-th data point, and $\bar{x}$ is the average of the $n$ data points.
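The three time-domain statistics can be computed directly from one channel's windowed samples; a minimal sketch following the formulas above:

```python
import numpy as np

def time_domain_features(x):
    """Standard deviation, fluctuation index and (excess) kurtosis of one
    channel's windowed EEG samples, following the formulas above."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    std = np.sqrt(((x - mean) ** 2).sum() / n)
    fluctuation = np.abs(np.diff(x)).sum() / (n - 1)
    kurtosis = (((x - mean) / std) ** 4).sum() / n - 3.0
    return std, fluctuation, kurtosis
```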
The invention adopts spectral analysis to extract the power spectral density as the frequency-domain feature of the monitored person's EEG data: the Fourier transform converts the time-series EEG data into a sum of sine and cosine functions, realizing the time-to-frequency-domain conversion and thereby revealing how the EEG power is distributed over frequency.
The invention selects the windowed-average periodogram method (Welch's method) to estimate the power spectral density of the sequence, and computes the power spectral densities of the 5 bands Delta (1-3 Hz), Theta (4-8 Hz), Alpha (9-13 Hz), Beta (14-30 Hz) and Gamma (>30 Hz) on the 14 channels as the EEG frequency-domain features.
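Assuming SciPy is available, the five band powers can be estimated per channel with `scipy.signal.welch`; the 1-second segment length (`nperseg=fs`) and the upper Gamma edge at the Nyquist frequency are illustrative parameter choices.

```python
import numpy as np
from scipy.signal import welch

FS = 128
BANDS = {"delta": (1, 3), "theta": (4, 8), "alpha": (9, 13),
         "beta": (14, 30), "gamma": (31, 64)}

def band_powers(x, fs=FS):
    """Mean power spectral density of one channel in the five EEG bands,
    estimated with Welch's windowed-average periodogram."""
    f, pxx = welch(x, fs=fs, nperseg=fs)
    return {name: pxx[(f >= lo) & (f <= hi)].mean()
            for name, (lo, hi) in BANDS.items()}
```

For a pure 10 Hz test tone, the Alpha band dominates the estimate, as expected.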
The invention adopts nonlinear dynamics algorithms to extract the approximate entropy, fuzzy entropy and Hurst exponent as nonlinear features of the EEG data, reflecting the complexity of the construction worker's brain activity.
The invention regards the approximate entropy as a nonlinear statistical feature of signal regularity that reflects the fluctuation of the monitored person's EEG data and serves as input to the machine-learning classifier. Its calculation comprises the following steps:
S2.1.1, take the raw EEG data as x(1), x(2), ..., x(n) and form the m-dimensional vectors X(i) = [x(i), x(i+1), ..., x(i+m-1)] in sequence order, where i = 1, 2, ..., n-m+1;
S2.1.2, define the distance between the i-th and j-th vectors X(i) and X(j) as

$$ d_{ij} = \max_{0\le k\le m-1}\left|x(i+k)-x(j+k)\right| $$

S2.1.3, given a threshold r, count for each vector X(i) the number of j for which $d_{ij} \le r\cdot Std$, where Std is the standard deviation of the sequence data, and record the ratio of this count to the total number of vectors n-m+1 as $C_i^m(r)$;
S2.1.4, take the logarithm of $C_i^m(r)$ and average over all i, recording the result as

$$ \phi^m(r) = \frac{1}{n-m+1}\sum_{i=1}^{n-m+1}\ln C_i^m(r) $$

S2.1.5, increase m by 1 and repeat steps S2.1.1 to S2.1.4 in dimension m+1 to obtain $C_i^{m+1}(r)$ and $\phi^{m+1}(r)$, and obtain the approximate entropy as

$$ ApEn = \lim_{n\to\infty}\left[\phi^m(r)-\phi^{m+1}(r)\right] $$

where ApEn denotes the approximate entropy.
In general the sequence length n is finite, and the result obtained from steps S2.1.1 to S2.1.5 is the estimate of the approximate entropy at sequence length n.
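For a finite sequence, steps S2.1.1-S2.1.5 can be sketched as follows; following common practice, the tolerance is taken as a fraction of the standard deviation, and the self-match (j = i) is included in the count, as in the original ApEn definition.

```python
import numpy as np

def approximate_entropy(x, m=2, r_frac=0.2):
    """Approximate entropy per steps S2.1.1-S2.1.5 for a finite sequence.
    The tolerance is r_frac times the standard deviation Std, and the
    self-match is counted, as in the original definition."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r_frac * np.std(x)

    def phi(m):
        count = n - m + 1
        X = np.array([x[i:i + m] for i in range(count)])
        # d_ij: Chebyshev distance between every pair of m-dim vectors
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        C = (d <= tol).sum(axis=1) / count
        return np.log(C).mean()

    return phi(m) - phi(m + 1)
```

As with fuzzy entropy, a regular signal scores lower than white noise.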
Fuzzy entropy (FuzzyEn) is similar in function to the approximate entropy: it is a measurement algorithm for accurately analyzing the complexity of a chaotic sequence, and the complexity of the sequence affects the randomness and recoverability of the signal. Compared with algorithms such as the approximate entropy and sample entropy, fuzzy entropy offers a more effective complexity measure, lower parameter sensitivity and dependency, and better robustness and measurement continuity, and it is widely applied in physiological signal analysis fields such as electroencephalography. The invention selects the fuzzy entropy algorithm to extract the fuzzy entropy of the 14-channel sequence data as a nonlinear feature of the electroencephalogram data, in the following steps:
S2.2.1, denote the original electroencephalogram data as x(1), x(2), ..., x(n), n points in total;
S2.2.2, define the embedding dimension m and the similarity tolerance r, reconstruct the phase space, and generate a set of m-dimensional vectors $X(i) = [x(i), x(i+1), \ldots, x(i+m-1)] - x_0(i)$, each representing m consecutive data points starting from x(i) with their mean removed, where i = 1, 2, ..., n-m+1 and
$x_0(i) = \frac{1}{m} \sum_{k=0}^{m-1} x(i+k)$
is the mean of the m data points;
S2.2.3, define the fuzzy membership function A(x),
$A(x) = \begin{cases} 1, & x = 0 \\ \exp\left[ -\ln 2 \cdot (x/r)^2 \right], & x > 0 \end{cases}$
where r represents the similarity tolerance;
S2.2.4, according to the A(x) expression, transform it into the fuzzy similarity degree
$D_{ij}^m = A(d_{ij}^m) = \exp\left[ -\ln 2 \cdot (d_{ij}^m / r)^2 \right]$
where j = 1, 2, ..., n-m+1 and j ≠ i, and $d_{ij}^m$ represents the maximum absolute distance between the window vectors X(i) and X(j), calculated as
$d_{ij}^m = \max_{0 \le k \le m-1} \left| \left( x(i+k) - x_0(i) \right) - \left( x(j+k) - x_0(j) \right) \right|$
S2.2.5, define the function
$\phi^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} \left( \frac{1}{n-m-1} \sum_{j=1, j \ne i}^{n-m} D_{ij}^m \right)$
S2.2.6, repeat steps S2.2.1 to S2.2.5, reconstructing the (m+1)-dimensional vectors in sequence order, and define the function
$\phi^{m+1}(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} \left( \frac{1}{n-m-1} \sum_{j=1, j \ne i}^{n-m} D_{ij}^{m+1} \right)$
S2.2.7, on the basis of step S2.2.5, define the fuzzy entropy as
$FuzzyEn(m, r) = \lim_{N \to \infty} \left[ \ln \phi^m(r) - \ln \phi^{m+1}(r) \right]$
For finite time-series data consisting of N data points, the fuzzy entropy can finally be expressed as
$FuzzyEn(m, r, N) = \ln \phi^m(r) - \ln \phi^{m+1}(r)$
Wherein FuzzyEn (m, r, N) represents fuzzy entropy.
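A minimal implementation of these steps might look as follows; since the membership function A(x) appears only as an image in the original, the common Gaussian-type membership exp(-ln 2 · (x/r)²) is assumed here, along with the usual m = 2 and tolerance factor 0.2.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r_factor=0.2):
    """Fuzzy entropy per steps S2.2.1-S2.2.7 (assumed membership function)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def phi(m):
        # mean-removed delay vectors X(i) = [x(i),...,x(i+m-1)] - x0(i)
        vecs = np.array([x[i:i + m] for i in range(n - m + 1)])
        vecs = vecs - vecs.mean(axis=1, keepdims=True)
        # maximum absolute distance between window vectors
        d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
        sim = np.exp(-np.log(2.0) * (d / r) ** 2)   # fuzzy similarity D_ij
        np.fill_diagonal(sim, 0.0)                  # exclude j == i
        return sim.sum() / (len(vecs) * (len(vecs) - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

As with the approximate entropy, a regular sequence scores lower than noise, which is the behaviour the classifier exploits.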
The Hurst exponent (Hurst index) is a time-series analysis method that measures the smoothness of a fractal time series. The invention uses the Hurst exponent as a nonlinear feature of the electroencephalogram data to measure the non-stationarity of the acquired signal, characterizing the psychological fluctuation of the subject, and feeds it as structural information into the machine-learning classifier. The calculation of the Hurst exponent comprises the following steps:
S2.3.1, from the original electroencephalogram data sequence x(1), x(2), ..., x(n) of n data points, calculate the mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x(i)$
S2.3.2, calculate the cumulative deviation $w_t$ of the first t values from the mean $\bar{x}$:
$w_t = \sum_{i=1}^{t} \left( x(i) - \bar{x} \right), \quad t = 1, 2, \ldots, n$
S2.3.3, calculate the range R(n), the difference between the maximum and the minimum of the cumulative deviations:
$R(n) = \max(0, w_1, w_2, \ldots, w_n) - \min(0, w_1, w_2, \ldots, w_n)$
S2.3.4, calculate the Hurst exponent H from the rescaled range as
$H = \frac{\ln \left( R(n) / S(n) \right)}{\ln n}$
where S(n) is the standard deviation of the original electroencephalogram data sequence.
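Steps S2.3.1 to S2.3.4 can be sketched as a single-window rescaled-range estimate; note that practical Hurst estimation usually fits log(R/S) against log(n) over many window sizes, a detail the patent does not cover.

```python
import numpy as np

def hurst_rs(x):
    """Single-window rescaled-range (R/S) Hurst estimate, steps S2.3.1-S2.3.4."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()                              # S2.3.1: sequence mean
    w = np.cumsum(x - mean)                      # S2.3.2: cumulative deviations
    r = max(0.0, w.max()) - min(0.0, w.min())    # S2.3.3: range R(n)
    s = x.std()                                  # S(n): std of the raw sequence
    return np.log(r / s) / np.log(n)             # S2.3.4: H = ln(R/S) / ln(n)
```

For uncorrelated noise the estimate falls near 0.5; persistent (trending) sequences score higher and anti-persistent ones lower.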
S3, carrying out standardization, window division and mean value feature extraction processing on the original behavior data to obtain behavior data features;
For the time-series data collected by the sensors (comprising plantar pressure, acceleration, angular velocity and angle data), the variance is calculated within each time window of 0.5 seconds. If the variance lies within the allowed error range, the behavior data collected in that window are considered valid and are then standardized and used for behavior feature extraction; otherwise, the data collected in that window are discarded, and the variance calculation and judgment are carried out on the behavior data collected in the next 0.5-second window.
From the plantar pressure data (time-series data) measured and output in real time by the pressure-sensing insoles, and using the same 0.5-second time window as in the electroencephalogram feature extraction, the arithmetic mean of the 2 pressure readings collected every 0.5 seconds is calculated for each of the 32 pressure-sensing points distributed over the two insoles and taken as the pressure feature data for that window.
The acceleration, angular velocity and angle data (time-series data) are measured and output in real time by a six-axis gyroscope accelerometer with its sampling rate set to 100 Hz. Using the same 0.5-second window as in the electroencephalogram feature extraction, the 50 readings collected every 0.5 seconds for each of 9 quantities, namely the accelerations $a_x$, $a_y$, $a_z$ (in g), the angular velocities $w_x$, $w_y$, $w_z$, and the angles $\theta_x$, $\theta_y$, $\theta_z$ about the x, y and z directions, are averaged and combined with the pressure feature data as the behavior features of that window.
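The validity check and mean-feature extraction described above can be sketched as follows; the variance tolerance max_var is a free parameter here, because the text does not give the allowed error range.

```python
import numpy as np

def window_features(samples, max_var):
    """Mean features for one 0.5 s window of behavior data.

    samples: array of shape (n_samples, n_channels), e.g. (50, 9) for the
    100 Hz gyroscope accelerometer or (2, 32) for the insole readings.
    The window is kept only if every channel's variance stays within the
    tolerance max_var (an assumed parameter); otherwise None is returned
    and the caller moves on to the next window.
    """
    samples = np.asarray(samples, dtype=float)
    if np.any(samples.var(axis=0) > max_var):
        return None                      # discard invalid window
    return samples.mean(axis=0)          # arithmetic mean per channel
```

Calling this per 0.5-second window yields the 32 pressure means and 9 inertial means that form the behavior feature block.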
S4, carrying out human body point location identification and space coordinate extraction on the gesture image to obtain gesture state characteristics;
Using computer-vision techniques based on the MediaPipe framework, the gesture images of the monitored person, recorded at a rate of one image every 0.5 seconds, are processed with the BlazePose pose-estimation model: 33 principal points of the human body (including the nose, eyes, ears, hips, knees, etc.) are identified in each image and their spatial coordinates are extracted as feature indices reflecting the posture state of the monitored person, in the following steps: S4.1, identify the 33 principal points of the human body in the gesture image corresponding to each time window;
S4.2, identify the edges of the human body and crop the picture along those edges;
S4.3, locate the center point of the human body as the centroid, select the smallest square frame containing all 33 principal points, and position the cropped gesture image in three-dimensional space;
For the gesture image corresponding to each 0.5-second time window, the human posture is estimated with the BlazePose pose-detection model and the 33 principal points of the human body (including the nose, eyes, ears, hips, knees, etc.) are identified; the edges of the human body are then identified and the picture cropped along them; finally the center point of the human body is located as the centroid, the smallest square frame containing all principal points is selected, and the cropped gesture image is positioned in three-dimensional space, completing the preprocessing of the gesture image. The specific code options involved are as follows:
static_image_mode: if set to false, the solution treats the input images as a video stream: it attempts to detect the most prominent person in the first image and, after successful detection, localizes the pose landmarks; in subsequent images it simply tracks those landmarks without invoking another detection until it loses track, which reduces computation and latency. If set to true, person detection runs on every input image, which is better suited to processing a batch of static, possibly unrelated images. Defaults to false.
model_complexity: the complexity of the pose-landmark model, 0, 1 or 2; landmark accuracy and inference latency generally increase with model complexity. Defaults to 1.
smooth_landmarks: if set to true, the solution filters landmarks across different input images to reduce jitter; ignored if static_image_mode is also set to true. Defaults to true.
enable_segmentation: if set to true, the solution generates a segmentation mask in addition to the pose landmarks. Defaults to false.
smooth_segmentation: if set to true, the solution filters the segmentation masks of different input images to reduce jitter; ignored if enable_segmentation is set to false or static_image_mode is set to true. Defaults to true.
min_detection_confidence: the minimum confidence value, in [0.0, 1.0], from the person-detection model for the detection to be considered successful. Defaults to 0.5.
min_tracking_confidence: the minimum confidence value, in [0.0, 1.0], from the landmark-tracking model; setting it to a higher value can improve the robustness of the solution at the cost of higher latency. Ignored if static_image_mode is set to true, where person detection simply runs on every image. Defaults to 0.5.
S4.4, establishing a three-dimensional rectangular coordinate system by taking the upper right corner of the minimum square frame as an origin;
s4.5, further determining the space coordinates (x, y, z) and the visibility coordinates (v) of the 33 main points, and taking the space coordinates as the posture state characteristics of the monitored person;
On the basis of this preprocessing of each monitored gesture image, i.e. with the cropped gesture image positioned in three-dimensional space, a three-dimensional rectangular coordinate system is established with the upper-right corner of the square as the origin, the spatial coordinates (x, y, z) and visibility value (v) of the 33 points are determined, and the spatial coordinates are taken as the posture state features of the monitored person. The relevant outputs are configured as follows:
pose_landmarks: a list of pose coordinates, where x and y are landmark coordinates normalized by the image width and height; z is the landmark depth with the depth at the midpoint of the hips as the origin (the smaller the value, the closer the landmark is to the camera; the scale of z used by the invention is roughly the same as that of x); v is the likelihood that the point is visible (present and not occluded) in the image.
pose_world_landmarks: another list of pose landmarks in world coordinates, where x, y and z are real-world three-dimensional coordinates in meters with the origin at the center between the hips; v is defined as in pose_landmarks.
In this way, the method identifies the important points by analyzing the gesture image of the monitored person, presents the model's result on screen through a real-time prediction window using OpenCV's cv2 module, represents each point of the photo as a landmark with four coordinates (x, y, z, visibility), and outputs the coordinates to a CSV file as numeric tuples (x, y, z, v), where v denotes the visibility of the specific point in the gesture image and ranges from 0 to 1.
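The conversion from the exported (x, y, z, v) landmark records to the 99-dimensional posture feature vector used later can be sketched as follows; dropping v from the feature vector follows the text, which takes only the spatial coordinates as posture state features.

```python
import numpy as np

def landmarks_to_features(landmarks):
    """Flatten 33 pose landmarks into the 99-dimensional (x, y, z)
    posture feature vector described in the text.

    landmarks: sequence of 33 (x, y, z, v) tuples, as exported to CSV.
    """
    arr = np.asarray(landmarks, dtype=float)
    assert arr.shape == (33, 4), "expected 33 landmarks of (x, y, z, v)"
    return arr[:, :3].reshape(-1)        # drop v, flatten to 99 values
```

One such vector is produced per 0.5-second window, matching the image recording rate.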
The multi-modal data analysis process, i.e. the recognition of the monitored person's embarrassing working gesture, is shown in fig. 4: on the basis of the extracted electroencephalogram data features, behavior data features and posture state features, the multi-modal data fusion features are obtained through feature fusion and input into the trained BP neural network, which then outputs the embarrassing-gesture recognition result for the monitored person, covering 8 preset embarrassing working postures and the non-embarrassing working posture.
S5, carrying out feature fusion on the electroencephalogram data features, the behavior data features and the posture state features based on a pre-fusion strategy to obtain multi-mode data fusion features;
As shown in fig. 5, the invention selects a pre-fusion strategy, fusing before decision-making: the three classes of extracted multi-modal data features (electroencephalogram data features, behavior data features and posture state features) are fused so that the correlations among features from different modalities can be exploited and mined early.
S6, inputting the multi-mode data fusion characteristics into a trained BP neural network;
The multi-modal data set input to the BP neural network is constructed from the three classes of extracted multi-modal data features: the spatial x, y, z coordinates of the 33 principal points identified by computer vision (99 dimensions), the time-domain, frequency-domain and nonlinear electroencephalogram data features (154 dimensions), the plantar pressure features collected by the pressure sensors (32 dimensions), and the acceleration, angular-velocity and angle features collected by the gyroscope accelerometer (9 dimensions), 294 dimensions in total. For an unknown posture state of the monitored person, the multi-modal data set $\{x^{(1)}, x^{(2)}, \ldots, x^{(294)}\}$ is constructed and input into the trained BP neural network, which finally outputs the recognition result of the monitored person's embarrassing working posture; the 8 preset embarrassing working postures and the non-embarrassing working posture are shown in Table 2.
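The pre-fusion (early fusion) step then reduces to concatenating the per-window feature groups; a sketch, with the group sizes taken from the dimensions listed above:

```python
import numpy as np

def fuse_features(pose_xyz, eeg_feats, pressure, imu):
    """Early (pre-decision) fusion: concatenate the per-window feature
    groups into one 294-dimensional vector (99 + 154 + 32 + 9 = 294)."""
    parts = [np.asarray(p, dtype=float).ravel()
             for p in (pose_xyz, eeg_feats, pressure, imu)]
    expected = (99, 154, 32, 9)
    for part, dim in zip(parts, expected):
        assert part.size == dim, f"expected {dim} features, got {part.size}"
    return np.concatenate(parts)         # fused vector of shape (294,)
```

The resulting vector is what the trained BP neural network consumes per window.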
TABLE 2 embarrassing working gesture recognition results
And S7, outputting the embarrassing working posture type of the monitored person.
The embarrassing working posture recognition method provided by the invention is built on a trained BP neural network; training the network comprises preparing the model training data set and building and training the BP neural network.
As shown in fig. 6, the preparation method of the model training data set includes the following steps: construction workers on site are recruited for a multi-modal data acquisition experiment and fitted with the Emotiv Epoc X electroencephalograph, the pressure-sensing insoles and the gyroscope accelerometer; the workers are then required to simulate construction operations in 8 different embarrassing working postures, comprising: working with the hands above the top of the head (e.g. plastering operations), working in a squat with bent knees or ankles, bending the neck backward or forward, bending the waist, twisting the body backward, climbing a ladder with the body twisted, working in a narrow space with the body excessively folded in a seated position, and reaching forward with single-foot support.
While the construction workers simulate construction operations in the embarrassing working postures, the electroencephalogram data and behavior data (pressure data, acceleration, angular velocity and angle data) of the subject workers are collected and synchronously exported, and their posture images are captured by camera, finally yielding multi-modal data of construction workers in the 8 embarrassing working postures.
And carrying out corresponding preprocessing and feature extraction on the acquired multi-modal data to finally obtain multi-modal data features of a construction worker under an embarrassing working posture, wherein the multi-modal data features comprise electroencephalogram data features, behavior data features and posture state features, and further realizing fusion of the multi-modal data features through a pre-fusion strategy to obtain training data for training the BP neural network.
As shown in fig. 6, the method for constructing and training the BP neural network includes the following steps:
Building the BP neural network: the 294 features of the multi-modal feature data set $\{x^{(1)}, x^{(2)}, \ldots, x^{(294)}\}$ are set as the input layer, the number of hidden-layer neurons is designed to be 500, and Y1 to Y8 are the labels corresponding to the 8 embarrassing working postures;
initializing super parameters of a neural network: the weight of the unidirectional full link between the layers is set as a random number between [ -1,1 ];
Inputting training samples: the labeled multi-modal fused feature data (training data) corresponding to the different embarrassing working postures are input into the BP neural network model and divided into a training set and a test set in proportion;
Adjusting the learning rate α of the neural network to obtain the minimum error value: the learning rate and the other hyper-parameters are adjusted continuously until the minimum error value is reached at α = 0.09, and this tuning and optimization process yields the trained BP neural network suitable for embarrassing-working-posture recognition.
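A minimal pure-NumPy sketch of such a BP (backpropagation) network, using the 500 hidden neurons, [-1, 1] weight initialization and learning rate α = 0.09 from the text; the sigmoid hidden units, softmax output and number of epochs are assumptions, as the patent does not specify its activation functions or training schedule.

```python
import numpy as np

def train_bp_network(X, y, n_hidden=500, lr=0.09, epochs=300, seed=0):
    """Train a one-hidden-layer BP network; returns a predict function."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    T = np.eye(len(classes))[np.searchsorted(classes, y)]  # one-hot targets
    # unidirectional full links with weights uniform in [-1, 1]
    W1 = rng.uniform(-1, 1, (X.shape[1], n_hidden))
    W2 = rng.uniform(-1, 1, (n_hidden, len(classes)))
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-X @ W1))            # sigmoid hidden layer
        Z = H @ W2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)             # softmax probabilities
        G2 = (P - T) / len(X)                         # output-layer error
        G1 = (G2 @ W2.T) * H * (1.0 - H)              # backpropagated error
        W2 -= lr * (H.T @ G2)                         # gradient-descent updates
        W1 -= lr * (X.T @ G1)

    def predict(Xn):
        Hn = 1.0 / (1.0 + np.exp(-np.asarray(Xn, dtype=float) @ W1))
        return classes[np.argmax(Hn @ W2, axis=1)]

    return predict
```

In the patent's setting, X would be the (N, 294) matrix of fused features and y the posture labels Y1 to Y8.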
In addition to the embodiments described above, other embodiments of the invention are possible. All technical solutions formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (10)

1. A method for identifying embarrassing working posture of a construction worker based on multi-mode data fusion is characterized by comprising the following steps of: comprises the following steps
S1, acquiring multi-modal data of a monitored person, wherein the multi-modal data comprises original electroencephalogram data, original behavior data and a gesture image;
s2, performing artifact elimination and feature extraction processing on the original electroencephalogram data to obtain electroencephalogram data features;
s3, carrying out standardization, window division and mean value feature extraction processing on the original behavior data to obtain behavior data features;
s4, carrying out human body point location identification and space coordinate extraction on the gesture image to obtain gesture state characteristics;
s5, carrying out feature fusion on the electroencephalogram data features, the behavior data features and the posture state features based on a pre-fusion strategy to obtain multi-mode data fusion features;
s6, inputting the multi-mode data fusion characteristics into a trained BP neural network;
and S7, outputting the embarrassing working posture type of the monitored person.
2. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 1, wherein the method comprises the following steps: in the step S2, artifact removal is performed on the original electroencephalogram data, including external artifact removal and internal artifact removal, and external artifact removal is performed through a finite impulse response band-pass filter; and distinguishing and screening out internal artifacts in the original electroencephalogram data through independent component analysis.
3. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 1, wherein the method comprises the following steps: in the step S2, multi-dimensional electroencephalogram characteristic data of the monitored person is extracted by a fixed window dividing method, wherein the multi-dimensional electroencephalogram characteristic data comprises time domain characteristics, frequency domain characteristics and nonlinear characteristics, and the time domain characteristics comprise standard deviation, fluctuation indexes and kurtosis; the frequency domain features include Delta band power spectral density, theta band power spectral density, alpha band power spectral density, beta band power spectral density, and Gamma band power spectral density; the nonlinear features include approximate entropy, fuzzy entropy, and hurst index.
4. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 3, wherein the method comprises the following steps: the standard deviation in the time domain features is calculated as follows,
Figure FDA0003959078660000021
where n represents the total number of data points collected under the channel, x i Representing the i-th data point acquired under the channel,
Figure FDA0003959078660000024
representing an average of n data points collected under the channel;
the fluctuation index in the time-domain features is calculated as
$F = \frac{1}{n-1} \sum_{i=1}^{n-1} \left| x(i+1) - x(i) \right|$
where n represents the total number of data points collected under the channel, x(i) represents the i-th data point collected under the channel, and x(i+1) represents the (i+1)-th data point collected under the channel;
the kurtosis in the time-domain features is calculated as
$K = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4$
where n represents the total number of data points collected under the channel, s represents the standard deviation of the n data points collected under the channel, $x_i$ represents the i-th data point collected under the channel, and $\bar{x}$ represents the average of the n data points collected under the channel.
5. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 3, wherein the method comprises the following steps: the calculation of the approximate entropy in the nonlinear feature comprises the following steps
S2.1.1, denote the original electroencephalogram data as x(1), x(2), ..., x(n), and form the m-dimensional vectors $X(i) = [x(i), x(i+1), \ldots, x(i+m-1)]$ in sequence order, where i = 1, 2, ..., n-m+1;
S2.1.2, define the distance $d_{ij}$ between the i-th and j-th vectors X(i) and X(j) as
$d_{ij} = \max_{0 \le k \le m-1} |x(i+k) - x(j+k)|$
S2.1.3, given a threshold r, for each vector X(i) count the number of j for which $d_{ij} \le r \cdot Std$, where Std is the standard deviation of the sequence data; the ratio of this count to the total number of distances n-m is recorded as $C_i^m(r)$;
S2.1.4, take the logarithm of $C_i^m(r)$ and average over all i, recording the result as $\phi^m(r)$:
$\phi^m(r) = \frac{1}{n-m+1} \sum_{i=1}^{n-m+1} \ln C_i^m(r)$
S2.1.5, increase m by 1 and repeat steps S2.1.1 to S2.1.4 in m+1 dimensions to obtain $C_i^{m+1}(r)$ and $\phi^{m+1}(r)$, and find the approximate entropy as
$ApEn = \lim_{n \to \infty} \left[ \phi^m(r) - \phi^{m+1}(r) \right]$
Where ApEn represents the approximate entropy.
6. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 3, wherein the method comprises the following steps: the algorithm of the fuzzy entropy in the nonlinear characteristic comprises the following steps of
S2.2.1, denote the original electroencephalogram data as x(1), x(2), ..., x(n), n points in total;
S2.2.2, define the embedding dimension m and the similarity tolerance r, reconstruct the phase space, and generate a set of m-dimensional vectors $X(i) = [x(i), x(i+1), \ldots, x(i+m-1)] - x_0(i)$, each representing m consecutive data points starting from x(i) with their mean removed, where i = 1, 2, ..., n-m+1 and
$x_0(i) = \frac{1}{m} \sum_{k=0}^{m-1} x(i+k)$
is the mean of the m data points;
S2.2.3, define the fuzzy membership function A(x),
$A(x) = \begin{cases} 1, & x = 0 \\ \exp\left[ -\ln 2 \cdot (x/r)^2 \right], & x > 0 \end{cases}$
where r represents the similarity tolerance;
S2.2.4, according to the A(x) expression, transform it into the fuzzy similarity degree
$D_{ij}^m = A(d_{ij}^m) = \exp\left[ -\ln 2 \cdot (d_{ij}^m / r)^2 \right]$
where j = 1, 2, ..., n-m+1 and j ≠ i, and $d_{ij}^m$ represents the maximum absolute distance between the window vectors X(i) and X(j), calculated as
$d_{ij}^m = \max_{0 \le k \le m-1} \left| \left( x(i+k) - x_0(i) \right) - \left( x(j+k) - x_0(j) \right) \right|$
S2.2.5, define the function
$\phi^m(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} \left( \frac{1}{n-m-1} \sum_{j=1, j \ne i}^{n-m} D_{ij}^m \right)$
S2.2.6, repeat steps S2.2.1 to S2.2.5, reconstructing the (m+1)-dimensional vectors in sequence order, and define the function
$\phi^{m+1}(r) = \frac{1}{n-m} \sum_{i=1}^{n-m} \left( \frac{1}{n-m-1} \sum_{j=1, j \ne i}^{n-m} D_{ij}^{m+1} \right)$
S2.2.7, on the basis of step S2.2.5, define the fuzzy entropy as
$FuzzyEn(m, r) = \lim_{N \to \infty} \left[ \ln \phi^m(r) - \ln \phi^{m+1}(r) \right]$
For finite time-series data consisting of N data points, the fuzzy entropy can finally be expressed as
$FuzzyEn(m, r, N) = \ln \phi^m(r) - \ln \phi^{m+1}(r)$
Wherein FuzzyEn (m, r, N) represents fuzzy entropy.
7. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 3, wherein the method comprises the following steps: the calculation of the hurst index in the nonlinear feature comprises the following steps of
S2.3.1, from the original electroencephalogram data sequence x(1), x(2), ..., x(n) of n data points, calculate the mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x(i)$
S2.3.2, calculate the cumulative deviation $w_t$ of the first t values from the mean $\bar{x}$:
$w_t = \sum_{i=1}^{t} \left( x(i) - \bar{x} \right), \quad t = 1, 2, \ldots, n$
S2.3.3, calculate the range R(n), the difference between the maximum and the minimum of the cumulative deviations:
$R(n) = \max(0, w_1, w_2, \ldots, w_n) - \min(0, w_1, w_2, \ldots, w_n)$
S2.3.4, calculate the Hurst exponent H from the rescaled range as
$H = \frac{\ln \left( R(n) / S(n) \right)}{\ln n}$
where S(n) is the standard deviation of the original electroencephalogram data sequence.
8. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 1, wherein the method comprises the following steps: in the step S4, the method for obtaining the gesture state features comprises the following steps
S4.1, identifying 33 main points of the human body for the gesture image corresponding to each time window;
s4.2, identifying the edge of the human body, and cutting the size of the picture along the edge;
s4.3, positioning a human body center point as a centroid, selecting a minimum square frame containing all 33 main points, and positioning the cut gesture image in a three-dimensional space;
s4.4, establishing a three-dimensional rectangular coordinate system by taking the upper right corner of the minimum square frame as an origin;
s4.5, further determining the space coordinates (x, y, z) and the visibility coordinates (v) of the 33 main points, and taking the space coordinates as the posture state characteristics of the monitored person.
9. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 1, wherein the method comprises the following steps: in the step S6, the training method of the BP neural network includes a preparation method of a model training data set and a building and training method of the BP neural network, wherein the preparation method of the model training data set includes the following steps
S6.1.1, summoning construction site construction workers, wearing various devices and sensors for acquiring multi-mode data for the construction workers;
s6.1.2, performing simulated construction operation by construction workers under different embarrassing working postures;
s6.1.3, each device for collecting multi-mode data and the sensor collect and synchronously export the multi-mode data;
s6.1.4, performing corresponding preprocessing, feature extraction and fusion on the acquired multi-mode data to finally obtain multi-mode data fusion features of construction workers in an embarrassing working posture;
s6.1.5, obtaining a training data set for training the BP neural network.
10. The method for identifying embarrassing working gestures of a construction worker based on multi-modal data fusion according to claim 9, wherein the method comprises the following steps: the BP neural network building and training method comprises the following steps of
S6.2.1, taking the multi-mode data fusion characteristic as an input layer, setting the neuron number of a hidden layer as 500, and setting labels corresponding to each embarrassing working posture;
s6.2.2 initializing hyper-parameters of the BP neural network, wherein the weight of the unidirectional full link between layers is set as a random number between [ -1,1 ];
s6.2.3, inputting training data corresponding to marked different embarrassing working postures into the BP neural network;
s6.2.4, continuously adjusting the learning rate alpha of the neural network and each super parameter until the minimum error value is obtained, and obtaining the learning rate corresponding to the minimum error value;
s6.2.5 finally obtaining the trained BP neural network suitable for the recognition of the embarrassing working gesture.
CN202211474969.3A 2022-11-23 2022-11-23 Embarrassing working gesture recognition method for construction workers based on multi-mode data fusion Pending CN116115239A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211474969.3A CN116115239A (en) 2022-11-23 2022-11-23 Embarrassing working gesture recognition method for construction workers based on multi-mode data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211474969.3A CN116115239A (en) 2022-11-23 2022-11-23 Embarrassing working gesture recognition method for construction workers based on multi-mode data fusion

Publications (1)

Publication Number Publication Date
CN116115239A true CN116115239A (en) 2023-05-16

Family

ID=86296283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211474969.3A Pending CN116115239A (en) 2022-11-23 2022-11-23 Embarrassing working gesture recognition method for construction workers based on multi-mode data fusion

Country Status (1)

Country Link
CN (1) CN116115239A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095465A (en) * 2023-10-19 2023-11-21 华夏天信智能物联(大连)有限公司 Coal mine safety supervision method and system
CN117095465B (en) * 2023-10-19 2024-02-06 华夏天信智能物联(大连)有限公司 Coal mine safety supervision method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination