CN115841651B - Constructor intelligent monitoring system based on computer vision and deep learning - Google Patents

Constructor intelligent monitoring system based on computer vision and deep learning

Info

Publication number
CN115841651B
CN115841651B CN202211602196.2A
Authority
CN
China
Prior art keywords
module
state
constructors
analysis
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211602196.2A
Other languages
Chinese (zh)
Other versions
CN115841651A (en)
Inventor
陈祺荣
陈科宇
杭世杰
林俊
汤序霖
陈钰开
李晨慧
李卫勇
朱东烽
杨哲
杨健明
聂勤文
张华健
邬学文
汪爽
练月荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jishi Construction Group Co ltd
Guangdong Yuncheng Architectural Technology Co ltd
Hainan University
Original Assignee
Guangzhou Jishi Construction Group Co ltd
Guangdong Yuncheng Architectural Technology Co ltd
Hainan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jishi Construction Group Co ltd, Guangdong Yuncheng Architectural Technology Co ltd, Hainan University filed Critical Guangzhou Jishi Construction Group Co ltd
Priority to CN202211602196.2A priority Critical patent/CN115841651B/en
Publication of CN115841651A publication Critical patent/CN115841651A/en
Application granted granted Critical
Publication of CN115841651B publication Critical patent/CN115841651B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a constructor intelligent monitoring system based on computer vision and deep learning, which comprises an intelligent statistics module and an intelligent monitoring module. The intelligent statistics module comprises a video acquisition module, a deep learning algorithm analysis module and an analysis result display module; the intelligent monitoring module comprises a safety helmet identification module, a working clothes identification module, a state identification module and an alarm reminding module. The invention can intelligently count the numbers of persons and vehicles entering and exiting a construction site and the totals currently within the site, and can also use image identification technology to identify the dress and the state of constructors passing through the site entrance and remind those who do not meet the standard. This effectively prevents safety accidents caused by constructors not wearing safety helmets, not wearing working clothes or being in poor mental state, and thereby effectively improves the construction safety of the site.

Description

Constructor intelligent monitoring system based on computer vision and deep learning
Technical Field
The invention relates to the technical field of intelligent statistics, in particular to a constructor intelligent monitoring system based on computer vision and deep learning.
Background
At present, construction sites are in principle required to implement closed management and to establish an access control system for entering and exiting the site. The access control systems currently used on building sites mainly combine channel gates such as tripod turnstiles, swing gates, wing gates and stainless-steel gates with an induction card reader-writer. All managers and constructors must register their face or second-generation identity card in advance before they can freely enter and exit; unregistered and unauthorized personnel cannot enter the site. Meanwhile, monitoring cameras are installed at the entrances and exits of the site, and video recordings over a period of time are stored on a local server for reference.
However, the current access control system has the following defects in practical application:
1) Personnel pass frequently through the site entrances and exits, personnel flow is disordered, and on-site management is difficult; 2) Attendance data are recorded on paper, so actual working hours are difficult to count accurately and the data are easy to tamper with; 3) The project manager cannot clearly and timely grasp the number of construction workers on site, the numbers of the various trades, and the number of specialized subcontracting units, which hinders site management efficiency; 4) When wage disputes with construction labor staff occur, the supervision department finds it difficult to obtain evidence and workers find it difficult to defend their rights; 5) Vehicles, and the personnel inside them, are difficult to manage as they enter and exit the site.
For the problems in the related art, no effective solution has been proposed at present.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a constructor intelligent monitoring system based on computer vision and deep learning so as to overcome the technical problems in the related art.
For this purpose, the invention adopts the following specific technical scheme:
the constructor intelligent monitoring system based on computer vision and deep learning comprises an intelligent statistics module and an intelligent monitoring module;
the intelligent statistics module is used for detecting constructors passing through the construction site entrances with a trained deep learning algorithm, tracking targets with a tracking algorithm, counting each target when it collides with the detection line, and displaying the results in real time on a client;
the intelligent monitoring module is used for identifying, with a preset image identification technology, the dress and the state of constructors passing through the site entrance respectively, and reminding those constructors who do not meet the standard.
Further, the intelligent statistics module comprises a video acquisition module, a deep learning algorithm analysis module and an analysis result display module;
the video acquisition module is used for acquiring real-time monitoring pictures from the monitoring cameras erected at the entrances and exits of the construction site, and feeding them into a POE switch to obtain the converted initial video material;
the deep learning algorithm analysis module is used for outputting video analysis pictures and personnel statistics data corresponding to the initial video materials in real time through a trained deep learning algorithm, and outputting the analysis pictures and statistics results to the client;
the analysis result display module is used for displaying the video analysis pictures and the personnel and vehicle statistics on the client, and also for letting management personnel switch, as required, among the analysis pictures and statistics obtained by the cameras at different entrances and exits.
Further, the deep learning algorithm analysis module comprises a detection area setting module, a video analysis module, a data statistics module and a data transmission module;
the detection area setting module is used for controlling the actual detection range in each picture by adjusting the corresponding detection range according to the picture layout of different entrances and exits so as to realize the setting of the detection area;
the video analysis module is used for analyzing the images in the initial video material through the coordinates of the line-collision detection points, thereby realizing the analysis and counting of personnel and vehicles in the detection area;
the data statistics module is used for determining and counting whether a target enters or exits based on whether the target frame collides with the line and on the color of the area where it collides;
the data transmission module is used for outputting the analysis picture and the statistical result to the client through the RTSP server and the HTTP push service.
Further, the setting of the detection area includes the following steps:
the positions of all the endpoints are determined in sequence according to the shape of the required area and expressed as an array; each element of the array is a two-element array representing an endpoint of the polygon; the detection area is adjusted and set by adjusting this array.
Further, the analyzing of images in the initial video material through the coordinates of the line-collision detection point, realizing the analysis and counting of personnel and vehicles in the detection area, comprises the following steps:
acquiring the position parameters and categories of all objects in the current image;
judging whether the offset between the geometric center of an object in the new frame image and the geometric center of some object in the previous frame image is within a preset offset; if so, judging the two objects to be the same object with the same ID; if not, judging that a new object appears in the new frame image and assigning it a new ID;
using the image with known object IDs in which a rectangle represents each object range, taking the parameters of a certain object range as x1, y1, x2, y2, where x1 < x2 and y1 < y2, the coordinates of the line-collision detection point are (check_point_x, check_point_y), with check_point_x = x1 and check_point_y = int[y1 + (y2 - y1) * 0.6], where int[·] denotes rounding the operation result;
and judging whether the object's line-collision detection point lies within the judgment area; if so, performing the statistical operation on the object, and if not, performing no statistical operation.
Further, the determining and counting of target entry and exit based on whether the target frame collides with the line and on the color of the collision area comprises the following steps:
defining the entire picture captured by the current camera as the detection area, setting a blue-and-yellow strip-shaped area as the judgment area, recording an entry when a target moves up and hits the yellow line, and recording an exit when a target moves down and hits the blue line;
acquiring the real-time monitoring pictures collected by the monitoring cameras at the entrances and exits of the construction site, and reducing the size of the acquired pictures;
judging whether a target appears in the detection area of the reduced real-time monitoring picture; if not, treating the picture as invalid and ignoring it; if so, framing the target and outputting the frame;
detecting whether the target frame collides with the line and the color of the collision area, and judging and counting target entry and exit according to that color.
Further, the intelligent monitoring module comprises a safety helmet identification module, a working clothes identification module, a state identification module and an alarm reminding module;
the safety helmet identification module is used for identifying, by means of image identification technology, constructors passing through the site entrance who are not wearing a safety helmet;
the work clothes identification module is used for identifying constructors who do not wear work clothes and pass through the worksite entrance by utilizing an image identification technology;
the state recognition module is used for recognizing the mental state of constructors passing through the worksite entrance by utilizing an image recognition technology;
the alarm reminding module is used for reminding constructors who do not wear a safety helmet, do not wear working clothes, or whose mental state does not meet the standard.
Further, the state recognition module comprises a constructor face image acquisition module, a fusion feature recognition module and a state recognition result output module;
the constructor face image acquisition module is used for acquiring face images of constructors in real-time monitoring pictures of all entrances and exits of a construction site;
the fusion characteristic recognition module is used for recognizing the facial image of the constructor by utilizing a facial state recognition algorithm based on independent characteristic fusion, so as to recognize the mental state of the constructor entering the construction site;
the state identification result output module is used for outputting mental state information of constructors in real-time monitoring pictures of all entrances and exits of the construction site.
Further, the fusion feature recognition module comprises a global state feature extraction module, a local state feature extraction module, a state feature fusion module and a state feature analysis recognition module;
the global state feature extraction module is used for extracting global state features of the facial image through discrete cosine transformation, and removing correlation of the global state features by utilizing an independent component analysis technology to obtain independent global state features;
the local state feature extraction module is used for extracting the features of the eye and mouth regions in the image sequence, and performing Gabor wavelet transformation and feature fusion on the eye and mouth regions respectively to obtain the dynamic multi-scale features of the two local regions as the local state features of the facial image;
the state feature fusion module is used for fusing the independent global state features and the local state features, and adding local detail information into the global features to obtain face state fusion features;
the state characteristic analysis and recognition module is used for analyzing and recognizing the obtained facial state fusion characteristics through a preset classifier to obtain the mental state information of the constructor, wherein the mental state of the constructor comprises an awake state, a mild fatigue state, a moderate fatigue state and a severe fatigue state.
Further, the preset classifier is obtained by selecting part of the features through an AdaBoost algorithm, removing redundant features and training; its calculation formula is:

H(X) = sign( Σ_{t=1}^{T} a_t h_t(X) )

wherein T represents the final number of algorithm loops, a_t represents the weight of the selected weak classifier h_t(X), determined by AdaBoost algorithm learning, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected facial image sequence.
The beneficial effects of the invention are as follows:
1) By detecting constructors passing through the construction site entrances with a trained deep learning algorithm, tracking targets with a tracking algorithm, and counting each target when it collides with the detection line, the invention realizes intelligent statistics of the numbers of personnel and vehicles entering and exiting the site and of the totals currently within the site. It can also use image identification technology to identify the dress and the state of constructors passing through the site entrance and remind those who do not meet the standard, effectively preventing safety accidents caused by constructors not wearing safety helmets, not wearing working clothes or being in poor mental state, and thereby effectively improving the construction safety of the site.
2) The invention achieves a recognition accuracy of more than 95% under various illumination conditions and, while ensuring the accuracy of the statistical data, keeps the time difference between the analysis picture and the original picture below 1 second; it thus offers both high recognition accuracy and high recognition speed.
3) Thanks to the algorithm used by the invention and the flexibility of its UI, different functions such as face recognition and work-type recognition can be added according to different requirements, so the system is highly extensible; in addition, the required hardware is inexpensive, saving cost while improving site management efficiency, and thereby yielding greater economic benefit.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the present invention;
FIG. 2 is a block diagram of a deep learning algorithm analysis module in a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the invention;
FIG. 3 is a block diagram of a state identification module in a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the present invention;
fig. 4 is a block diagram of a fused feature recognition module in a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the present invention.
In the figure:
1. an intelligent statistics module; 2. an intelligent monitoring module; 11. a video acquisition module; 12. a deep learning algorithm analysis module; 121. a detection region setting module; 122. a video analysis module; 123. a data statistics module; 124. a data transmission module; 13. an analysis result display module; 21. a safety helmet identification module; 22. a work clothes identification module; 23. a state recognition module; 231. a constructor facial image acquisition module; 232. a fusion feature recognition module; 2321. a global state feature extraction module; 2322. a local state feature extraction module; 2323. a state characteristic fusion module; 2324. a state characteristic analysis and identification module; 233. a state identification result output module; 24. and an alarm reminding module.
Detailed Description
To further illustrate the embodiments, the invention provides the accompanying drawings, which form part of the disclosure and serve mainly to illustrate the embodiments and, together with the description, to explain their principles. With reference to them, a person skilled in the art will recognize other possible implementations and advantages of the invention. Elements in the figures are not drawn to scale, and like reference numerals generally designate like elements.
According to the embodiment of the invention, an intelligent constructor monitoring system based on computer vision and deep learning is provided.
The invention will now be further described with reference to the accompanying drawings and the detailed description. As shown in figs. 1 to 4, a constructor intelligent monitoring system based on computer vision and deep learning according to an embodiment of the invention comprises an intelligent statistics module 1 and an intelligent monitoring module 2;
the intelligent statistics module 1 is used for detecting constructors passing through the construction site entrances with a trained deep learning algorithm, tracking targets with a tracking algorithm, counting each target when it collides with the detection line, and displaying the results in real time on a client;
the intelligent statistics module 1 comprises a video acquisition module 11, a deep learning algorithm analysis module 12 and an analysis result display module 13;
the video acquisition module 11 is used for acquiring real-time monitoring pictures from the monitoring cameras erected at the entrances and exits of the construction site, and feeding them into a POE switch to obtain the converted initial video material;
the deep learning algorithm analysis module 12 is configured to output, in real time through the trained deep learning algorithm, the video analysis picture and personnel statistics corresponding to the initial video material, and to output the analysis picture and statistical results to the client;
the main feature extraction network adopted by the algorithm of the embodiment is superior to a cross-stage partial network, so that the memory consumption is reduced, the learning capacity of the convolutional neural network is enhanced, the network is further widened, and the algorithm accuracy is ensured. In a training strategy of a trunk feature extraction network, a plurality of images are spliced to simulate objects in a complex environment in order to ensure the recognition accuracy in the complex environment. The algorithm also carries out decision again on the changed image, improves the weak environment of the decision boundary and improves the robustness of the system. The algorithm identifies the object by three steps:
first, the image is downsampled five times using convolution layers with a stride of 2 and 3×3 convolution kernels, so as to extract the backbone features of the image and generate five feature layers of different sizes.
Second, multi-scale receptive-field fusion and maximum pooling are applied to the small-size feature layer, and the processed small-size feature maps are aggregated with the large-size maps through tensor concatenation, so the algorithm remains applicable to detection at different sizes.
Finally, the feature layers are divided into grids of different sizes, so that targets with large size differences can all be detected without loss. Multiple prior boxes of different sizes are generated on the grids; each prior box returns the probability that its object belongs to each preset object category together with a confidence, and the prior box with the maximum confidence is taken as the actual position of the object.
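By way of illustration, the following minimal sketch (assuming PyTorch; the channel widths, normalization and activation are assumptions not given by the patent) shows the first step, five successive stride-2, 3×3 convolutions yielding five feature layers of different sizes:

import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    # Hypothetical backbone sketch: each stage halves the spatial size,
    # mimicking the five downsamplings described above.
    def __init__(self):
        super().__init__()
        chans = [3, 32, 64, 128, 256, 512]  # assumed channel progression
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(chans[i + 1]),
                nn.SiLU(),
            )
            for i in range(5)
        )

    def forward(self, x):
        features = []
        for stage in self.stages:
            x = stage(x)
            features.append(x)      # collect all five feature layers
        return features

feats = TinyBackbone()(torch.randn(1, 3, 640, 640))
print([tuple(f.shape) for f in feats])  # spatial sizes 320, 160, 80, 40, 20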
Specifically, the confidence interval is set manually, and the algorithm outputs a confidence after recognizing an object. In testing, the minimum confidence was 0.3, i.e. the confidence interval was [0.3, 1]; within this interval the recognition accuracy is high, because the types of objects to be detected are few and the natural environment changes little. The optimal confidence interval is strongly influenced by environmental factors, so no single optimal interval applies to every environment; it can only be obtained by repeatedly tuning the interval in the actual environment.
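A minimal sketch of this confidence filtering, assuming a (x1, y1, x2, y2, confidence, class) detection layout that the patent does not prescribe:

def filter_detections(detections, conf_min=0.3, conf_max=1.0):
    # Keep only detections whose confidence lies in the tuned interval.
    return [d for d in detections if conf_min <= d[4] <= conf_max]

dets = [(10, 20, 50, 90, 0.82, "person"), (5, 5, 30, 40, 0.12, "person")]
print(filter_detections(dets))  # only the 0.82-confidence detection remains

In practice conf_min would be re-tuned per site, as noted above.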
Specifically, the deep learning algorithm analysis module 12 includes a detection area setting module 121, a video analysis module 122, a data statistics module 123, and a data transmission module 124;
the detection area setting module 121 is configured to control the actual detection ranges in each picture by adjusting the corresponding detection ranges according to the picture layout of different entrances and exits, so as to set a detection area (the detection area includes all picture ranges, the determination area includes only a blue Huang Tiaoxing area, and if a person appears in a picture but does not pass through a blue yellow stripe area, the person can be captured but does not account for statistical data);
in this embodiment, it is found in the field test process that the detection ranges of the video frames in different areas need to be adjusted. Generally, all pictures acquired by the camera are taken as detection areas by default for recognition, however, in practical application, the whole monitoring picture is not always needed to participate in recognition, for example, a main person on a project site tested in the development process enters and exits a channel picture mainly comprising a gate channel and a right guard duty room. The algorithm can identify and detect personnel in the whole picture under the default condition, if a person walks back and forth in the duty room, the personnel can be judged as 'entering' or 'exiting' by the algorithm, and the misjudgment can cause huge errors to the system statistics result. Therefore, in the process of landing the project on site, the system needs to adjust the detection range of the corresponding background algorithm according to the picture layout of different entrances and exits. The actual detection range in each screen is controlled by setting a detection area in an algorithm code. For any point on the image, the position of each endpoint can be determined by a binary array, if a detection area needs to be set, the positions of the endpoints can be sequentially determined according to the shape of the required area, the detection area at the moment is expressed by an array, and each element of the array is a binary array representing the endpoint of the graph, so that the aim of adjusting the geometric area is fulfilled by the array. All persons entering the channel picture are identified, but only the persons passing through the detection area in the figure enter the statistics, and the persons moving outside the detection area do not influence the statistics.
The video analysis module 122 is configured to analyze the images in the initial video material through the coordinates of the line-collision detection points, thereby analyzing and counting the personnel and vehicles in the detection area;
the algorithm in this embodiment can identify and give the object type and range in the image after the initialization is completed, and the algorithm adopts a rectangular frame to determine the position of the image, and in the plane, only one diagonal coordinate, such as the top right corner vertex coordinate value (x 1 ,y 1 ) Coordinate value of vertex with lower left corner (x 2 ,y 2 ) The position of the rectangle in the image can be determined by four parameters, so the algorithm only needs to return the four parameters and the object category. Four parameters returned by the algorithm are used for determining the collision detection point and drawing a rectangle.After the position parameters and the categories of all the objects in the current image are obtained, if the offset between the geometric center of the object in the new frame image and the geometric center of one object in the previous frame image is within the set offset, the two objects are regarded as the same object and have the same ID. If a new object exists in a new frame, a new ID is assigned to the object. After the above processing, an image in which the object range is represented by a rectangle and the object ID is known is obtained. Let the parameter of the object range be x 1 ,y 1 ,x 2 ,y 2 Wherein x is 1 <x 2 ,y 1 <y 2 The coordinates of the wire-strike detection point are (check_point_x, check_point_y), check_point_x=x 1 ,check_point_y=int[y 1 +(y 2 -y 1 )*0.6]Int means rounding the operation result. The algorithm performs the statistical operation only when the line collision detection point of a certain object is positioned in the judgment area, otherwise, the statistical operation is not performed.
The data statistics module 123 is configured to determine and count target entry and exit based on whether the target frame collides with the line and on the color of the area where it collides;
when the personnel statistics function is realized, a more visual and accurate judgment mode is adopted. Namely, presetting a judging area in an algorithm, wherein a strip-shaped area with preset blue and yellow is taken as the judging area. And when the target goes up to hit the yellow line, the target is marked as entering, and when the target goes down to hit the blue line, the pedestrian entering and exiting number is calculated. When the algorithm runs, the video is processed, firstly, the size is reduced, whether the target appears in the picture is detected, if no target appears in the picture, the algorithm can ignore and clean the picture as invalid picture, when the target exists in the picture, the algorithm can frame the target and output the frame, and finally, whether the target frame collides with a line or not and whether the area where the target frame collides with is judged as blue or yellow as a basis for the entry and the exit of the target.
The data transmission module 124 is configured to output the analysis picture and the statistics result to the client through the RTSP server and the HTTP push service.
After the video material has been identified and analyzed by the deep learning algorithm, the analysis picture must be output to the client. To minimize the delay between the real-time monitoring picture and the analysis picture displayed in the client, this embodiment transmits via an RTSP (Real Time Streaming Protocol) server. First, a video stream interface supporting RTSP is written into the algorithm; the RTSP stream is then obtained from information such as the camera's IP address, port number, device user name and password. The analysis picture can then be transmitted to the client by feeding this stream address into the interface preset in the algorithm.
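A minimal sketch of the stream-pulling side with OpenCV; the URL format and credentials are placeholders, since real cameras differ by vendor:

import cv2

rtsp_url = "rtsp://user:password@192.168.1.64:554/stream1"  # hypothetical address
cap = cv2.VideoCapture(rtsp_url)

while cap.isOpened():
    ok, frame = cap.read()        # one decoded monitoring frame
    if not ok:
        break
    # ... run detection and tracking on `frame`, then push the analysed
    # frame back out through the RTSP server for the client to display.
cap.release()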
As with the real-time analysis picture, the data counted by the deep learning algorithm must also be transmitted to the client in real time; this embodiment uses HTTP (Hyper Text Transfer Protocol), which has the advantages of being simple, flexible and easy to extend. An HTTP program is written into the server side, i.e. the deep learning algorithm; a TCP connection from the client to the server is then created, and a data request packet is composed in the client, so that sending the packet performs an HTTP request. After receiving the request, the server organizes a response according to the request content and reuses the TCP connection to report the response back to the client, where it is parsed, read, and finally displayed in real time on the client's user interface.
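A minimal sketch of such a data path using only the Python standard library; the port and count fields are assumptions:

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

COUNTS = {"person_in": 12, "person_out": 7, "vehicle_in": 3, "vehicle_out": 1}

class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Organize the response and send it back over the same TCP connection.
        body = json.dumps(COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), StatsHandler).serve_forever()

The client then issues an HTTP GET to this port, parses the JSON response, and renders the counts on its user interface.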
The analysis result display module 13 is configured to display the video analysis picture and the personnel and vehicle statistics on the client, and also to let management personnel switch, as required, among the analysis pictures and statistics acquired by the cameras at different entrances and exits.
In order to present the real-time video analysis picture more intuitively to site managers and allow them to call up the analysis pictures of different entrances and exits as required, this embodiment combines the deep learning algorithm with the Unreal Engine 4, adds RTSP and HTTP push-stream services to the algorithm, displays the acquired video analysis picture and the personnel statistics on the client in real time, and adds further functions to the client.
The functions included in the client mainly comprise:
(1) A personnel and vehicle statistics function, comprising the numbers of personnel and vehicles entering and exiting and the total numbers of personnel and vehicles currently within the site;
(2) A viewing-angle switching function: when multiple cameras are erected at different entrances and exits, the client can switch between the cameras and acquire in real time the personnel and vehicle entry and exit data recorded by each.
The intelligent monitoring module 2 is used for identifying, with a preset image identification technology, the dress and the state of constructors passing through the site entrance respectively, and reminding those constructors who do not meet the standard.
The intelligent monitoring module 2 comprises a safety helmet identification module 21, a work clothes identification module 22, a state identification module 23 and an alarm reminding module 24;
the helmet identification module 21 is used for identifying constructors who pass through the site entrance and do not wear the helmet by utilizing an image identification technology;
the work clothes identification module 22 is used for identifying constructors who do not wear work clothes and pass through a work site entrance by utilizing an image identification technology;
the state recognition module 23 is used for recognizing the mental state of constructors passing through the worksite entrance by using an image recognition technology;
specifically, the state recognition module 23 includes a constructor facial image acquisition module 231, a fusion feature recognition module 232, and a state recognition result output module 233;
the constructor face image obtaining module 231 is configured to obtain face images of constructors in real-time monitoring frames of each gateway on a construction site;
the fusion feature recognition module 232 is used for recognizing the facial image of the constructor by using a facial state recognition algorithm based on independent feature fusion, so as to recognize the mental state of the constructor entering the construction site;
the fusion feature recognition module 232 includes a global state feature extraction module 2321, a local state feature extraction module 2322, a state feature fusion module 2323, and a state feature analysis recognition module 2324;
the global state feature extraction module 2321 is configured to extract global state features of the facial image through discrete cosine transform (discrete cosine transform, DCT), and remove correlation of the global state features by using independent component analysis (independent component analysis, ICA) technology, so as to obtain independent global state features;
The discrete cosine transform is a common image-data compression method. For an M×N digital image f(x, y), the two-dimensional discrete cosine transform is defined as:

C(u, v) = a(u) a(v) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f(x, y) cos[(2x+1)uπ / (2M)] cos[(2y+1)vπ / (2N)]

where u = 0, 1, 2, …, M-1; v = 0, 1, 2, …, N-1; and a(u), a(v) are the DCT normalization coefficients.
The discrete cosine transform has the following property: when the frequency-domain factors u and v are large, the DCT coefficient C(u, v) is small; the larger values of C(u, v) are concentrated in the upper-left corner region where u and v are small, which is also where the useful information concentrates. In this embodiment the information of that region is extracted as the global fatigue feature of the image.
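A minimal sketch of this global-feature extraction with OpenCV (the face crop is a stand-in and the retained block size k is an assumed parameter):

import numpy as np
import cv2

def global_dct_feature(gray_face, k=8):
    # Keep the k*k upper-left (low-frequency) DCT coefficients as the feature.
    coeffs = cv2.dct(np.float32(gray_face))   # large C(u, v) cluster at small u, v
    return coeffs[:k, :k].flatten()

face = np.random.rand(64, 64).astype(np.float32)  # stand-in for a face crop
print(global_dct_feature(face).shape)             # (64,) = 8*8 coefficients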
Independent component analysis (ICA) is an effective method for solving the blind source separation problem: through a transformation matrix, ICA can successfully separate independent source signals from mixed signals. In fatigue feature extraction, this property can be used to reduce the dimension of the fatigue feature vectors and to reduce the higher-order correlation among their components.
Besides the second-order statistical correlation that the PCA method can remove, higher-order statistical correlations also account for a large component of facial expression images, so the ICA method can remove those higher-order correlations and yield features with more discriminative power. The basic idea of the ICA algorithm is to represent a series of random variables with a set of basis functions while assuming the components are statistically independent, or as independent as possible.
In this embodiment, ICA separates mutually independent features from the global DCT features of the facial image sequence through a transformation matrix; this not only reduces the dimension of the feature vectors but also reduces the higher-order correlation among their components, giving independent global features with better discriminative power.
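A minimal sketch of this decorrelation step, assuming scikit-learn's FastICA and an arbitrary component count:

import numpy as np
from sklearn.decomposition import FastICA

dct_features = np.random.rand(200, 64)  # stand-in: 200 frames x 64 DCT coefficients

ica = FastICA(n_components=16, random_state=0)
independent_global = ica.fit_transform(dct_features)  # mutually independent features
print(independent_global.shape)  # (200, 16): lower dimension, reduced correlation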
The local state feature extraction module 2322 is configured to extract features of an eye region and a mouth region in an image sequence, and perform gabor wavelet transform and feature fusion on the eye region and the mouth region respectively, so as to obtain dynamic multi-scale features of two local regions as local state features of a facial image;
The characteristic facial features of fatigue appear at different scales: coarse overall features and fine local details. Analysis at a single scale can therefore hardly extract all the important features of a fatigued facial expression, whereas multi-scale decomposition allows the facial visual information to be analyzed in the scales and directions that carry the most fatigue information. For a facial video image sequence, fatigue information must be analyzed with a multi-scale method, because different facial movements during fatigue occur at different scales.
The Gabor wavelet is a powerful tool for multi-scale analysis. Compared with the DCT, the Gabor transform achieves optimal localization in the time and frequency domains simultaneously; its transform coefficients describe the gray-level characteristics near a given image position, and it is insensitive to illumination, position and similar factors, making it suitable for representing local features of the human face. The DCT, by contrast, focuses on the global information of the image and often ignores local information that matters more in face recognition. This embodiment therefore introduces the Gabor wavelet transform to extract local multi-scale features of the image and fuses them with the independent global features, adding the image's local detail to the global information.
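A minimal sketch of this local multi-scale extraction with an OpenCV Gabor filter bank; the kernel parameters, scales and pooling are assumptions:

import numpy as np
import cv2

def gabor_features(region, scales=(7, 11, 15), n_orient=4):
    # Filter the region at several scales and orientations, then pool.
    feats = []
    for ksize in scales:
        for i in range(n_orient):
            theta = i * np.pi / n_orient
            kern = cv2.getGaborKernel((ksize, ksize), sigma=2.0, theta=theta,
                                      lambd=8.0, gamma=0.5, psi=0.0)
            resp = cv2.filter2D(region, cv2.CV_32F, kern)
            feats.append(resp.mean())
    return np.array(feats)

eye = np.random.rand(32, 48).astype(np.float32)    # stand-in eye region
mouth = np.random.rand(32, 64).astype(np.float32)  # stand-in mouth region
local_feature = np.concatenate([gabor_features(eye), gabor_features(mouth)])
print(local_feature.shape)  # 2 regions x 3 scales x 4 orientations = (24,)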
The state feature fusion module 2323 is configured to fuse an independent global state feature with a local state feature, and add local detail information to the global feature to obtain a face state fusion feature;
the state feature analysis and recognition module 2324 is configured to analyze and recognize the obtained facial state fusion feature through a preset classifier to obtain mental state information of a constructor, where the mental state of the constructor includes a clear state (eyes are in normal contention, eyeballs are active, head is correct, eye concentration is achieved, eyebrows are flat), a mild fatigue state (eyeballs are active and drop, eye is left, eyebrows are in sagging trend, forehead is tight, head rotation frequency is increased, and spirit is not vibrated), a moderate fatigue state (eye closure, yawning, nodding and other phenomena occur, eyebrows are severely sagged, facial muscle deformation is serious), and a severe fatigue state (eye closure trend is aggravated, and continuous eye closing phenomenon and attention distraction occur).
The preset classifier is obtained by selecting part of the features through an AdaBoost algorithm, removing redundant features and training; its calculation formula is:

H(X) = sign( Σ_{t=1}^{T} a_t h_t(X) )

wherein T represents the final number of algorithm loops, a_t represents the weight of the selected weak classifier h_t(X), determined by AdaBoost algorithm learning, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected facial image sequence.
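A minimal numpy sketch of evaluating this strong classifier; the weak classifiers and weights are toy stand-ins, whereas in the patent they are learned by AdaBoost over the dynamic Gabor features:

import numpy as np

def strong_classify(x, weak_classifiers, alphas):
    # H(X) = sign( sum_t a_t * h_t(X) ), with each h_t returning +1 or -1.
    score = sum(a * h(x) for h, a in zip(weak_classifiers, alphas))
    return np.sign(score)

h1 = lambda x: 1.0 if x[0] > 0.5 else -1.0   # toy decision stump
h2 = lambda x: 1.0 if x[3] < 0.2 else -1.0   # toy decision stump
print(strong_classify(np.array([0.7, 0.1, 0.9, 0.1]), [h1, h2], [0.8, 0.4]))  # 1.0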
The state recognition result output module 233 is used for outputting mental state information of constructors in real-time monitoring pictures of all entrances and exits of the construction site.
The alarm reminding module 24 is used for reminding constructors who do not wear safety helmets, do not wear work clothes and do not accord with the standard in mental state.
In summary, by means of the above technical scheme, the invention detects constructors passing through the construction site entrances with a trained deep learning algorithm, tracks targets with a tracking algorithm, and counts each target when it collides with the detection line, thereby realizing intelligent statistics of the numbers of personnel and vehicles entering and exiting the construction site and of the totals currently within the site.
Meanwhile, the invention achieves a recognition accuracy of more than 95% under various illumination conditions and, while ensuring the accuracy of the statistical data, keeps the time difference between the analysis picture and the original picture below 1 second; it thus offers both high recognition accuracy and high recognition speed.
Meanwhile, thanks to the algorithm used by the invention and the flexibility of its UI, different functions such as face recognition and work-type recognition can be added according to different requirements, so the system is highly extensible; in addition, the required hardware is inexpensive, saving cost while improving site management efficiency, and thereby yielding greater economic benefit.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (8)

1. The constructor intelligent monitoring system based on computer vision and deep learning is characterized by comprising an intelligent statistics module (1) and an intelligent monitoring module (2);
the intelligent statistics module (1) is used for detecting constructors passing through the construction site entrances with a trained deep learning algorithm, tracking targets with a tracking algorithm, counting each target when it collides with the detection line, and displaying the results in real time on a client;
the intelligent monitoring module (2) is used for identifying, with a preset image identification technology, the dress and the state of constructors passing through the site entrance respectively, and reminding those constructors who do not meet the standard;
the intelligent statistics module (1) comprises a video acquisition module (11), a deep learning algorithm analysis module (12) and an analysis result display module (13);
the video acquisition module (11) is used for acquiring real-time monitoring pictures from the monitoring cameras erected at the entrances and exits of the construction site, and feeding them into a POE switch to obtain the converted initial video material;
the deep learning algorithm analysis module (12) is used for outputting video analysis pictures and personnel statistics data corresponding to the initial video materials in real time through a trained deep learning algorithm, and outputting the analysis pictures and statistics results to a client;
the analysis result display module (13) is used for displaying the video analysis pictures and the personnel and vehicle statistics on the client, and also for letting management personnel switch, as required, among the analysis pictures and statistics obtained by the cameras at different entrances and exits;
the deep learning algorithm analysis module (12) comprises a detection area setting module (121), a video analysis module (122), a data statistics module (123) and a data transmission module (124);
the detection area setting module (121) is used for controlling the actual detection range in each picture by adjusting the corresponding detection range according to the picture layout of different entrances and exits so as to realize the setting of the detection area;
the video analysis module (122) is used for analyzing the images in the initial video material through the coordinates of the line-collision detection points, thereby realizing the analysis and counting of personnel and vehicles in the detection area;
the data statistics module (123) is used for determining and counting target entry and exit based on whether the target frame collides with the line and on the color of the area where it collides;
the data transmission module (124) is used for outputting the analysis picture and the statistical result to the client through the RTSP server and the HTTP push service.
2. The intelligent constructor monitoring system based on computer vision and deep learning as set forth in claim 1, wherein the setting of the detection area comprises the steps of:
the positions of all the endpoints are determined in sequence according to the shape of the required area and expressed as an array; each element of the array is a two-element array representing an endpoint of the polygon; the detection area is adjusted and set by adjusting this array.
3. The intelligent monitoring system for constructors based on computer vision and deep learning according to claim 2, wherein the analyzing of images in the initial video material through the coordinates of the line-collision detection point, realizing the analysis and counting of personnel and vehicles in the detection area, comprises the following steps:
acquiring the position parameters and categories of all objects in the current image;
judging whether the offset between the geometric center of an object in the new frame image and the geometric center of some object in the previous frame image is within a preset offset; if so, judging the two objects to be the same object with the same ID; if not, judging that a new object appears in the new frame image and assigning it a new ID;
using the image with known object IDs in which a rectangle represents each object range, taking the parameters of a certain object range as x1, y1, x2, y2, where x1 < x2 and y1 < y2, the coordinates of the line-collision detection point are (check_point_x, check_point_y), with check_point_x = x1 and check_point_y = int[y1 + (y2 - y1) * 0.6], where int[·] denotes rounding the operation result;
and judging whether the object's line-collision detection point lies within the judgment area; if so, performing the statistical operation on the object, and if not, performing no statistical operation.
4. The intelligent monitoring system for constructors based on computer vision and deep learning according to claim 3, wherein the determining and counting of target entry and exit based on whether the target frame collides with the line and on the color of the collision area comprises the following steps:
defining the entire picture captured by the current camera as the detection area, setting a blue-and-yellow strip-shaped area as the judgment area, recording an entry when a target moves up and hits the yellow line, and recording an exit when a target moves down and hits the blue line;
acquiring the real-time monitoring pictures collected by the monitoring cameras at the entrances and exits of the construction site, and reducing the size of the acquired pictures;
judging whether a target appears in the detection area of the reduced real-time monitoring picture; if not, treating the picture as invalid and ignoring it; if so, framing the target and outputting the frame;
detecting whether the target frame collides with the line and the color of the collision area, and judging and counting target entry and exit according to that color.
5. The intelligent monitoring system for constructors based on computer vision and deep learning according to claim 1, wherein the intelligent monitoring module (2) comprises a safety helmet identification module (21), a work clothes identification module (22), a state identification module (23) and an alarm reminding module (24);
the helmet identification module (21) is used for identifying, by means of image identification technology, constructors passing through the site entrance who are not wearing a safety helmet;
the work clothes identification module (22) is used for identifying constructors who do not wear work clothes and pass through a worksite entrance by utilizing an image identification technology;
the state identification module (23) is used for identifying the mental state of constructors passing through the worksite entrance by utilizing an image identification technology;
the alarm reminding module (24) is used for reminding constructors who do not wear a safety helmet, do not wear working clothes, or whose mental state does not meet the standard.
6. The intelligent monitoring system for constructors based on computer vision and deep learning according to claim 5, wherein the state recognition module (23) comprises an constructor face image acquisition module (231), a fusion feature recognition module (232) and a state recognition result output module (233);
the constructor face image acquisition module (231) is used for acquiring face images of constructors in real-time monitoring pictures of all entrances and exits of a construction site;
the fusion characteristic recognition module (232) is used for recognizing the facial image of the constructor by utilizing a facial state recognition algorithm based on independent characteristic fusion, so as to recognize the mental state of the constructor entering the construction site;
the state identification result output module (233) is used for outputting the mental state information of constructors in real-time monitoring pictures of all entrances and exits of the construction site.
7. The intelligent constructor monitoring system based on computer vision and deep learning of claim 6, wherein the fusion feature recognition module (232) comprises a global state feature extraction module (2321), a local state feature extraction module (2322), a state feature fusion module (2323) and a state feature analysis recognition module (2324);
the global state feature extraction module (2321) is used for extracting global state features of the facial image through discrete cosine transformation, and removing correlation of the global state features by utilizing an independent component analysis technology to obtain independent global state features;
the local state feature extraction module (2322) is used for extracting the features of the eye and mouth regions in the image sequence, and performing Gabor wavelet transformation and feature fusion on the eye and mouth regions respectively to obtain the dynamic multi-scale features of the two local regions as the local state features of the facial image;
the state feature fusion module (2323) is used for fusing the independent global state feature and the local state feature, and adding local detail information into the global feature to obtain a face state fusion feature;
the state feature analysis and recognition module (2324) is used for analyzing and recognizing the obtained facial state fusion features through a preset classifier to obtain mental state information of constructors, wherein the mental states of the constructors comprise an awake state, a mild fatigue state, a moderate fatigue state and a severe fatigue state.
8. The intelligent monitoring system for constructors based on computer vision and deep learning according to claim 7, wherein the preset classifier is obtained by selecting part of the features through an AdaBoost algorithm, removing redundant features and training; its calculation formula is:

H(X) = sign( Σ_{t=1}^{T} a_t h_t(X) )

wherein T represents the final number of algorithm loops, a_t represents the weight of the selected weak classifier h_t(X), determined by AdaBoost algorithm learning, and X = (X_1, X_2, …, X_T) represents the dynamic Gabor features of the selected facial image sequence.
CN202211602196.2A 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning Active CN115841651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211602196.2A CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211602196.2A CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Publications (2)

Publication Number Publication Date
CN115841651A CN115841651A (en) 2023-03-24
CN115841651B true CN115841651B (en) 2023-08-22

Family

ID=85578558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211602196.2A Active CN115841651B (en) 2022-12-13 2022-12-13 Constructor intelligent monitoring system based on computer vision and deep learning

Country Status (1)

Country Link
CN (1) CN115841651B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116167545A (en) * 2023-04-24 2023-05-26 青建集团股份公司 BIM intelligent building site management platform system and method


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2019100806A4 (en) * 2019-07-24 2019-08-29 Dynamic Crowd Measurement Pty Ltd Real-Time Crowd Measurement And Management Systems And Methods Thereof

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894378A (en) * 2010-06-13 2010-11-24 南京航空航天大学 Moving target visual tracking method and system based on double ROI (Region of Interest)
CN202013603U (en) * 2011-03-03 2011-10-19 苏州市慧视通讯科技有限公司 Statistical device for passenger flow information
CN104618685A (en) * 2014-12-29 2015-05-13 国家电网公司 Intelligent image analysis method for power supply business hall video monitoring
CN107545224A (en) * 2016-06-29 2018-01-05 珠海优特电力科技股份有限公司 The method and device of transformer station personnel Activity recognition
CN108052882A (en) * 2017-11-30 2018-05-18 广东云储物联视界科技有限公司 A kind of operating method of intelligent safety defense monitoring system
WO2020034902A1 (en) * 2018-08-11 2020-02-20 昆山美卓智能科技有限公司 Smart desk having status monitoring function, monitoring system server, and monitoring method
CN109447168A (en) * 2018-11-05 2019-03-08 江苏德劭信息科技有限公司 A kind of safety cap wearing detection method detected based on depth characteristic and video object
CN111950399A (en) * 2020-07-28 2020-11-17 福建省漳州纵文信息科技有限公司 Intelligent integrated equipment for face recognition and monitoring of computer room
WO2022022368A1 (en) * 2020-07-28 2022-02-03 宁波环视信息科技有限公司 Deep-learning-based apparatus and method for monitoring behavioral norms in jail

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Pushkar Sathe et al.; "Helmet Detection And Number Plate Recognition Using Deep Learning"; 2022 IEEE Region 10 Symposium; pp. 1-6 *

Also Published As

Publication number Publication date
CN115841651A (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
CN109271554B (en) Intelligent video identification system and application thereof
US10346688B2 (en) Congestion-state-monitoring system
JP6549797B2 (en) Method and system for identifying head of passerby
KR20200071799A (en) object recognition and counting method using deep learning artificial intelligence technology
CN103986910A (en) Method and system for passenger flow statistics based on cameras with intelligent analysis function
KR102122859B1 (en) Method for tracking multi target in traffic image-monitoring-system
CN101142584A (en) Method for facial features detection
US20090010499A1 (en) Advertising impact measuring system and method
US20210134146A1 (en) Tracking and alerting traffic management system using iot for smart city
CN107483894A (en) Judge to realize the high ferro station video monitoring system of passenger transportation management based on scene
CN115841651B (en) Constructor intelligent monitoring system based on computer vision and deep learning
CN108805184B (en) Image recognition method and system for fixed space and vehicle
RU2315352C2 (en) Method and system for automatically finding three-dimensional images
CN103049748A (en) Behavior-monitoring method and behavior-monitoring system
CN115035564A (en) Face recognition method, system and related components based on intelligent patrol car camera
Yao et al. A real-time pedestrian counting system based on rgb-d
CN112258707A (en) Intelligent access control system based on face recognition
CN109215150A (en) Face is called the roll and method of counting and its system
CN111144260A (en) Detection method, device and system of crossing gate
Lagorio et al. Automatic detection of adverse weather conditions in traffic scenes
CN108831158A (en) It disobeys and stops monitoring method, device and electric terminal
CN107993446A (en) A kind of traffic prohibition parking area domain parking offense monitoring device
CN112580633A (en) Public transport passenger flow statistical device and method
CN112004056A (en) Intelligent video analysis method with strong anti-interference capability

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant