CN116863395A - Building site construction monitoring system - Google Patents

Building site construction monitoring system Download PDF

Info

Publication number
CN116863395A
Authority
CN
China
Prior art keywords
module
rpn
training
cnn
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310708599.3A
Other languages
Chinese (zh)
Inventor
陈涛
杨昕
刘祥
孙科
杨帆
刘伟
汪左成
甄玉龙
马玉林
郭希良
陈钇朴
欧阳义斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Radio Metrology and Measurement
Original Assignee
Beijing Institute of Radio Metrology and Measurement
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Radio Metrology and Measurement filed Critical Beijing Institute of Radio Metrology and Measurement
Priority to CN202310708599.3A priority Critical patent/CN116863395A/en
Publication of CN116863395A publication Critical patent/CN116863395A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08Construction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/94Hardware or software architectures specially adapted for image or video understanding
    • G06V10/95Hardware or software architectures specially adapted for image or video understanding structured as a network, e.g. client-server architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a construction monitoring system for a construction site, comprising a video monitoring module, a microwave imaging module, a sensor module, a deep learning module, an edge computing module and an Internet of things module. The video monitoring module monitors the construction site in the daytime; the microwave imaging module generally monitors the construction site at night; the sensor module monitors the concentration of toxic and harmful gases; the deep learning module obtains the condition of personnel on the construction site by analyzing videos and images through a Faster R-CNN fused with an RPN; the edge computing module assists the deep learning module; and the Internet of things module transmits the effective videos and pictures recognized by the deep learning module to a background data center. The system provides real-time, faster data processing and analysis: data processing takes place closer to the data source, relieving the pressure on network communication bandwidth and the data center, improving service response capability, and protecting private data, thereby integrating speed, safety, scalability, versatility and reliability; application programs run more efficiently at a faster speed, and the intelligent characteristics are obvious.

Description

Building site construction monitoring system
Technical Field
The application relates to a construction site monitoring system, in particular to a construction site monitoring system which works by utilizing a video monitoring technology, a microwave imaging technology, a sensor technology, an Internet of things technology, an edge computing technology and a deep learning technology.
Background
With the rapid development of information technologies such as big data and the Internet of things, and driven by the "two new" strategic goals of the national pipe network company, electronic or network systems that detect and monitor a fortified area, display and record field images in real time, and retrieve and display historical images have become a trend. In the past, video surveillance applications focused mainly on government and on special departments and industries such as finance, public security, traffic and electricity; the government and financial sectors occupied 20.9% and 20.6% of the market share, respectively. However, as social informatization progresses, demand for video monitoring in ever more industries and fields has greatly increased; video monitoring technology has begun to extend from individual fields such as banking and traffic to many fields, and traditional security monitoring has developed into management monitoring and production management monitoring.
The intelligent construction site for pipeline construction has become an inevitable trend, covering site networking, video monitoring, working-condition acquisition, data aggregation, transmission and the like. Because the construction environment of long-distance pipelines is harsh, the many hardware devices added for an intelligent construction site cause considerable unnecessary trouble for construction units, and the poor mobile network environment in remote areas means that data cannot be transmitted to the data center, forming data islands. Integrated equipment with edge computing capability, convenient mobility and long endurance is urgently needed.
Disclosure of Invention
In order to solve the prior art problems, the application provides a construction site construction monitoring system, which comprises
The video monitoring module shoots through a camera and is used for monitoring the construction site in the daytime, acquiring the video of the construction process of the construction site,
the microwave imaging module images the construction site by scanning, is used for monitoring the construction site at night, acquires the image of the construction process of the construction site,
the sensor module is used for monitoring the concentration of toxic and harmful gases, and when the concentration exceeds the standard, an alarm is sent to warn on-site operators to evacuate;
the deep learning module, which is respectively in data interaction with the video monitoring module, the microwave imaging module and the sensor module, and is used for obtaining the condition of construction site personnel by analyzing the videos and images through a Faster R-CNN fused with an RPN,
the edge computing module, which is in data interaction with the deep learning module, and is used, when the deep learning module analyzes the videos and images, for assisting it in acquiring effective videos and pictures corresponding to situations in which a person on the construction site is not wearing a safety helmet, is not wearing work clothes, is smoking or has fallen, and for improving the recognition efficiency of the deep learning module;
and the Internet of things module is used for carrying out data interaction with the deep learning module and transmitting the effective video and the pictures identified by the deep learning module to a background data center.
Preferably, the microwave imaging module adopts millimeter wave radar imaging technology and comprises a transmitting and receiving radio frequency assembly for transmitting and receiving frequency modulated continuous waves, the frequency of the transmitted signal increasing linearly with time.
Preferably, the sensor module is further used for providing the data of the on-site smoke concentration for the deep learning module, and judging whether the condition of on-site smoke of the personnel exists in the video or the image by combining the video and the image.
Preferably, the training of Faster R-CNN is divided into 6 steps:
(1) Training the RPN on the trained model;
(2) Collecting proposals by utilizing the RPN trained in step (1);
(3) Training Fast R-CNN for the first time;
(4) Training the RPN a second time, similarly to the first training;
(5) Collecting proposals again by using the RPN trained in step (4);
(6) Training Fast R-CNN a second time, similarly to the first training.
Preferably, the deep learning module is further configured to fuse the RPN into Faster R-CNN, comprising: (1) pre-training the RPN on ImageNet and fine-tuning it on the PASCAL VOC dataset; (2) training a Fast R-CNN alone using the region proposals generated by the trained RPN, this Fast R-CNN also being pre-trained on ImageNet; (3) initializing the RPN with the CNN model part of the Fast R-CNN and then fine-tuning the remaining layers of the RPN, so that Fast R-CNN and the RPN share the feature extractor; (4) fixing the feature extractor and fine-tuning the remaining layers of the Fast R-CNN; after multiple iterations, Fast R-CNN and the RPN are organically fused together to form a unified network.
Preferably, the RPN adopts a CNN model to receive the whole picture and extract a feature map; an N×N sliding window is applied on the feature map, and k prior boxes of different sizes or proportions are set at each sliding-window position, meaning that each position predicts k candidate regions. A low-dimensional feature is mapped at each sliding-window position and then fed into two fully connected layers, namely a classification layer and a regression layer. The classification layer outputs 2k values representing the probability that each candidate region contains an object or is background; the regression layer outputs 4k coordinate values representing the position of each candidate region relative to each prior box. The classification layer and regression layer are shared across all sliding-window positions.
Preferably, the deep learning module is further configured to match the prior boxes with the ground truth through the Fast R-CNN, set the IoU threshold of non-maximum suppression to 0.7, screen out the candidate regions meeting the requirement, and then select the top-N region proposals in descending order of confidence for training the Fast R-CNN.
Preferably, the matching principle is: (1) the prior box with the highest IoU with a certain ground truth; (2) any prior box whose IoU with a ground truth is greater than 0.7. A prior box is matched to a ground truth when either condition is met; a matched prior box is a positive sample belonging to an object, and the ground truth is taken as its regression target. Prior boxes whose IoU with every ground truth is below 0.3 are called negative samples. NMS (non-maximum suppression) is set with an IoU threshold of 0.7 to screen out the candidate regions meeting the requirement, and then the top-N region proposals are selected in descending order of confidence for training the Fast R-CNN.
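As an illustration of the matching rule above, here is a minimal NumPy sketch (illustrative only, not text from the patent; the function names, the corner-coordinate box format and the default thresholds are assumptions):

```python
import numpy as np

def iou(box, boxes):
    # vectorised IoU between one box and many boxes, all as [x1, y1, x2, y2]
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    # labels: 1 = positive sample, 0 = negative sample, -1 = ignored in training
    labels = -np.ones(len(anchors), dtype=int)
    best_iou = np.zeros(len(anchors))
    for gt in gt_boxes:
        overlaps = iou(gt, anchors)
        best_iou = np.maximum(best_iou, overlaps)
        labels[np.argmax(overlaps)] = 1          # rule (1): highest IoU with this ground truth
    labels[best_iou > pos_thresh] = 1            # rule (2): IoU with some ground truth > 0.7
    labels[(best_iou < neg_thresh) & (labels != 1)] = 0  # below 0.3 with every ground truth
    return labels
```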
Preferably, the RPN is trained as follows:
reading a pre-trained model, performing iterative training, extracting feature maps by using Conv Layers, and using the following Loss:

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)

In the above formula, i denotes the anchor index, p_i denotes the positive softmax probability, p_i^* denotes the corresponding GT prediction probability, t denotes the predicted bounding box, and t^* denotes the GT box corresponding to the positive anchor.
Preferably, the Loss is divided into 2 parts:
the cls loss, i.e., the softmax loss calculated by the rpn_cls_loss layer, used for training the network to classify the anchors as positive or negative;
the reg loss, i.e., the smooth L1 loss calculated by the rpn_loss_bbox layer, used for training the bounding-box regression network.
The calculation formula of the smooth L1 loss is as follows:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

For the rectangles formed by the anchors of the RPN, x and y are the upper-left and lower-right corner coordinates, and w and h denote the width and height.
The application discloses the following technical effects:
1. Real-time, faster data processing and analysis: data processing happens closer to the data source than at an external data center or cloud, so latency can be reduced.
2. Less network traffic: as the number of Internet of things devices increases, the rate of data generation rises dramatically, network bandwidth becomes more limited, and the heavily loaded cloud creates a larger data bottleneck. The system relieves the pressure on network communication bandwidth and the data center, improves service response capability, and protects private data, integrating speed, safety, scalability, versatility and reliability.
3. Improved application efficiency: as latency decreases, applications can run more efficiently at a faster speed.
4. Lower cost: edge computing combined with AI ("AI + edge computing") is not limited to computation alone, and its intelligent characteristics are obvious.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the operation of a worksite construction monitoring device;
FIG. 2 is a block diagram of a construction site monitoring apparatus;
FIG. 3 is a flow chart of a deep learning of a worksite construction monitoring device;
FIG. 4 is a diagram of a training of the construction site monitoring equipment RPN network;
FIG. 5 is a collection diagram of the worksite construction monitoring device proposals;
fig. 6 is a diagram of a site construction monitoring device Fast RCNN network training.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
As shown in FIGS. 1-2, the present application provides a worksite construction monitoring system comprising
The video monitoring module shoots through a camera and is used for monitoring the construction site in the daytime, acquiring the video of the construction process of the construction site,
the microwave imaging module images the construction site by scanning, is used for monitoring the construction site at night, acquires the image of the construction process of the construction site,
the sensor module is used for monitoring the concentration of toxic and harmful gases, and when the concentration exceeds the standard, an alarm is sent to warn on-site operators to evacuate;
the deep learning module, which is respectively in data interaction with the video monitoring module, the microwave imaging module and the sensor module, and is used for obtaining the condition of construction site personnel by analyzing the videos and images through a Faster R-CNN fused with an RPN,
the edge computing module, which is in data interaction with the deep learning module, and is used, when the deep learning module analyzes the videos and images, for assisting it in acquiring effective videos and pictures corresponding to situations in which a person on the construction site is not wearing a safety helmet, is not wearing work clothes, is smoking or has fallen, and for improving the recognition efficiency of the deep learning module;
and the Internet of things module is used for carrying out data interaction with the deep learning module and transmitting the effective video and the pictures identified by the deep learning module to a background data center.
Further preferably, the microwave imaging module of the present application adopts millimeter wave radar imaging technology, and includes a transmitting and receiving radio frequency assembly for transmitting and receiving frequency modulated continuous wave, and the frequency of the transmitted signal increases linearly with time.
Further preferably, the sensor module is further used for providing the data of the on-site smoke concentration for the deep learning module and judging whether the condition of on-site smoke extraction of personnel exists in the video or the image by combining the video and the image.
Further preferably, the Faster R-CNN referred to in the application is trained in 6 steps:
(1) Training the RPN on the trained model;
(2) Collecting proposals by utilizing the RPN trained in step (1);
(3) Training Fast R-CNN for the first time;
(4) Training the RPN a second time, similarly to the first training;
(5) Collecting proposals again by using the RPN trained in step (4);
(6) Training Fast R-CNN a second time, similarly to the first training.
Further preferably, the deep learning module mentioned in the present application is further used for fusing the RPN into Faster R-CNN, comprising: (1) pre-training the RPN on ImageNet and fine-tuning it on the PASCAL VOC dataset; (2) training a Fast R-CNN alone using the region proposals generated by the trained RPN, this Fast R-CNN also being pre-trained on ImageNet; (3) initializing the RPN with the CNN model part of the Fast R-CNN and then fine-tuning the remaining layers of the RPN, so that Fast R-CNN and the RPN share the feature extractor; (4) fixing the feature extractor and fine-tuning the remaining layers of the Fast R-CNN; after multiple iterations, Fast R-CNN and the RPN are organically fused together to form a unified network.
Further preferably, the RPN of the application adopts a CNN model to receive the whole picture and extract a feature map; an N×N sliding window is applied on the feature map, and k prior boxes of different sizes or proportions are set at each sliding-window position, meaning that each position predicts k candidate regions. A low-dimensional feature is mapped at each sliding-window position and then fed into two fully connected layers, namely a classification layer and a regression layer. The classification layer outputs 2k values representing the probability that each candidate region contains an object or is background; the regression layer outputs 4k coordinate values representing the position of each candidate region relative to each prior box. The classification layer and regression layer are shared across all sliding-window positions.
Further preferably, the deep learning module is further configured to match the prior boxes with the ground truth through the Fast R-CNN, set the IoU threshold of non-maximum suppression to 0.7, screen out the candidate regions meeting the requirement, and then select the top-N region proposals in descending order of confidence for training the Fast R-CNN.
Further preferably, the matching principle mentioned in the present application is: (1) the prior box with the highest IoU with a certain ground truth; (2) any prior box whose IoU with a ground truth is greater than 0.7. A prior box is matched to a ground truth when either condition is met; a matched prior box is a positive sample belonging to an object, and the ground truth is taken as its regression target. Prior boxes whose IoU with every ground truth is below 0.3 are called negative samples. NMS (non-maximum suppression) is set with an IoU threshold of 0.7 to screen out the candidate regions meeting the requirement, and then the top-N region proposals are selected in descending order of confidence for training the Fast R-CNN.
Further preferably, the RPN mentioned in the present application is trained as follows:
reading a pre-trained model, performing iterative training, extracting feature maps by using Conv Layers, and using the following Loss:

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)

In the above formula, i denotes the anchor index, p_i denotes the positive softmax probability, p_i^* denotes the corresponding GT prediction probability, t denotes the predicted bounding box, and t^* denotes the GT box corresponding to the positive anchor.
Further preferably, the Loss is divided into 2 parts:
the cls loss, i.e., the softmax loss calculated by the rpn_cls_loss layer, used for training the network to classify the anchors as positive or negative;
the reg loss, i.e., the smooth L1 loss calculated by the rpn_loss_bbox layer, used for training the bounding-box regression network.
The calculation formula of the smooth L1 loss is as follows:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

For the rectangles formed by the anchors of the RPN, x and y are the upper-left and lower-right corner coordinates, and w and h denote the width and height.
The system employs video monitoring technology in an electronic or network system that detects and monitors the fortified area, displays and records field images in real time, and retrieves and displays historical images.
Microwave imaging refers to an imaging technique that uses microwaves as the information carrier and essentially belongs to the electromagnetic backscattering problem. It is also known as microwave holographic imaging because it uses both the amplitude and phase information of the field scattered by the imaged object. Its principle is to irradiate the measured object with microwaves and then reconstruct the shape or (complex) dielectric constant distribution of the object from measurements of the scattered field outside it.
Sensor technology converts signals such as sound, light and vibration into electrical signals; it is used in the intelligent construction site to monitor its state. The Internet of things module realizes interconnection, so that in the intelligent construction site all pipeline information can be wirelessly transmitted to the background server.
Edge computing refers to providing near-end services on the side close to the object or data source, using an open platform integrating network, computing, storage and application capabilities. Applications are initiated at the edge side, producing faster network service responses and meeting the industry's basic requirements for real-time service, application intelligence, security and privacy protection. Edge computing sits between physical entities and industrial connections, or at the top of physical entities, while cloud computing can still access the historical data of edge computing.
Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained during such learning helps interpret data such as text, images and sound. Its ultimate goal is to give machines analytical learning abilities like those of a person, enabling them to recognize text, image and sound data. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed those of prior techniques.
The video monitoring module is mainly used for monitoring the construction site in the daytime and comprises a front-end camera, transmission cables and a video monitoring platform. Cameras may be network digital cameras or analog cameras, and collect front-end video image signals. Complete video monitoring consists of five parts: shooting, transmission, control, display and recording. The camera transmits video images to the control host through a network cable or coaxial video cable; the control host distributes the video signals to each monitor and video device, and voice signals to be transmitted can be synchronously recorded into the video recorder. Through the control host, an operator can issue commands to control the pan-tilt head (up, down, left and right), perform focusing and zooming of the lens, and switch among multiple cameras through the video matrix. Special video processing is used for operations such as recording, playback, retrieval and storage of images.
The microwave imaging module is mainly used for monitoring the construction site at night and uses microwaves as the information carrier for imaging. Its principle is to irradiate the measured object with microwaves and then reconstruct the shape or (complex) dielectric constant distribution of the object from measurements of the scattered field outside it. Since the dielectric constant is closely related to the moisture content of biological tissue, microwave imaging is also well suited to imaging biological tissue.
The sensor module adopts a harmful gas laser telemeter which is mainly used for monitoring the concentration of toxic and harmful gases such as methane, hydrogen sulfide, carbon monoxide and the like, and once the concentration of the toxic and harmful gases exceeds the standard, the system can give an alarm to prompt on-site operators to withdraw.
The internet of things module adopts the GPRS technology to wirelessly transmit monitoring data in the construction monitoring process to a background data center.
Edge computing enables the mobile terminal equipment to exercise subjective initiative and make autonomous decisions to a certain extent. It makes intelligent judgments and action decisions, and uploads only a screened subset of information to the background data center, greatly relieving the pressure on network communication. The mobile terminal device can autonomously make partial decisions even when contact with the background data center is temporarily lost. Edge computing not only relieves the pressure on network communication bandwidth and the data center but also improves service response capability and protects private data, thereby integrating speed, safety, scalability, versatility and reliability.
In order to analyze whether personnel wear safety helmets, wear work clothes, smoke or fall, a deep learning method is adopted, implemented with the Faster R-CNN algorithm.
As shown in FIGS. 3-6: Fast R-CNN still requires a selective search method to produce candidate regions, which is very time consuming. To solve this problem, the Faster R-CNN model introduces the RPN (Region Proposal Network) to directly generate candidate regions. Faster R-CNN can be seen as a combination of the RPN and Fast R-CNN models, i.e., Faster R-CNN = RPN + Fast R-CNN.
For the RPN network, a CNN model, commonly referred to as the feature extractor, is first used to receive the entire picture and extract the feature map. A sliding window of N×N (here 3×3) is then used on this feature map, mapping a low-dimensional feature for each sliding-window position. This feature is then fed into two fully connected layers, one for classification prediction and the other for regression. For each window position, k prior boxes of different sizes or proportions (anchors, i.e., default bounding boxes) are typically set, meaning that each position predicts k candidate regions (region proposals). For the classification layer, the output size is 2k, indicating the probability that each candidate region contains an object or is background; the regression layer outputs 4k coordinate values, indicating the position of each candidate region relative to each prior box. The two fully connected layers are shared across all sliding-window positions. Thus, the RPN may be implemented with convolutional layers: first a 3×3 convolution yields a low-dimensional feature, followed by two 1×1 convolutions for classification and regression respectively.
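A minimal PyTorch sketch of that convolutional RPN head may help (illustrative only; the class name, the 512-channel width and k = 9 are assumptions rather than details fixed by the patent):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_channels=512, k=9):
        super().__init__()
        # the N x N (here 3 x 3) sliding window, realised as a convolution
        self.sliding = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        # classification: 2k outputs per position (object vs background per anchor)
        self.cls = nn.Conv2d(512, 2 * k, kernel_size=1)
        # regression: 4k outputs per position (box offsets per anchor)
        self.reg = nn.Conv2d(512, 4 * k, kernel_size=1)

    def forward(self, feature_map):
        x = torch.relu(self.sliding(feature_map))
        # the two 1 x 1 heads are shared across all sliding-window positions
        return self.cls(x), self.reg(x)
```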
It can be seen that the RPN uses binary classification, distinguishing only background from object, but does not predict the class of the object, i.e., it is class-agnostic. Because coordinate values are predicted at the same time, during training the prior boxes are matched with the ground truth according to the following principle: (1) the prior box with the highest IoU with a certain ground truth; (2) any prior box whose IoU with a ground truth is greater than 0.7. A prior box satisfying either condition is matched to a ground truth, making it a positive sample with that ground truth as its regression target. A prior box whose IoU with every ground truth is below 0.3 is considered a negative sample. The RPN can be trained individually, and an individually trained RPN model yields many region proposals. Because of the large number of prior boxes, many of the candidate regions predicted by the RPN overlap, so NMS (non-maximum suppression) is performed first, with the IoU threshold set to 0.7, to reduce the number of candidate regions; then the top-N region proposals are selected in descending order of confidence for training the Fast R-CNN model, as sketched below. The role of the RPN is to replace selective search, but at a faster speed, so Faster R-CNN is accelerated in both training and prediction.
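The NMS-plus-top-N step can be written out as a self-contained sketch (for illustration only; the function name and the top_n default are assumptions):

```python
import numpy as np

def nms_top_n(boxes, scores, iou_thresh=0.7, top_n=300):
    # greedy non-maximum suppression, then keep at most top_n proposals
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]              # descending confidence
    keep = []
    while order.size > 0 and len(keep) < top_n:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(x1[i], x1[rest]); yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest]); yy2 = np.minimum(y2[i], y2[rest])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        overlap = inter / (areas[i] + areas[rest] - inter)
        order = rest[overlap <= iou_thresh]       # drop candidates overlapping too much
    return keep
```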
The Faster R-CNN model adopts a 4-step iterative training strategy: (1) first pre-train the RPN on ImageNet and fine-tune it on the PASCAL VOC dataset; (2) train a Fast R-CNN model alone using the region proposals generated by the trained RPN, this model also being pre-trained on ImageNet; (3) initialize the RPN with the CNN model part (feature extractor) of the Fast R-CNN and then fine-tune the remaining layers of the RPN, so that Fast R-CNN and the RPN share the feature extractor; (4) fix the feature extractor and fine-tune the remaining layers of the Fast R-CNN. Thus, after several iterations, Fast R-CNN and the RPN are organically fused together to form a unified network. In another approach, the approximate joint training strategy combines the two losses of the RPN with the two losses of Fast R-CNN and then trains them jointly. Note that in this process the loss of Fast R-CNN does not back-propagate to the region proposals generated by the RPN, so this is an approximation (it would be non-approximate joint training if this back-propagation were considered). It should be noted that joint training is faster and can reach the same performance.
After the RPN is adopted, the accuracy and speed of the Faster R-CNN model are greatly improved. Replacing heuristic region proposals with the RPN was a major innovation of Faster R-CNN: later two-stage methods basically adopt this framework in their research, and compared with later algorithms, Faster R-CNN still holds its own in accuracy.
FIG. 1 is a schematic diagram of the operation of the worksite construction monitoring apparatus, as shown: the construction monitoring equipment is deployed at the construction site, and monitoring data collected on site can be sent to the background data center by wireless communication. The data collected on site are: the current, voltage, wire-feeding speed, inter-layer welding temperature, and ambient temperature and humidity generated by the full-automatic welding machine; video data generated by the video monitoring module; and video data generated by the microwave imaging module.
The deep learning module adopts the Faster R-CNN algorithm, whose core part is the training of the Faster R-CNN; in practice, the training process is divided into 6 steps:
(1) training the RPN network on the trained model, corresponding to FIG. 4;
(2) collecting proposals by utilizing the RPN network trained in step (1), corresponding to FIG. 5;
(3) training the Fast R-CNN network for the first time, corresponding to FIG. 6;
(4) training the RPN network a second time, similarly to the first training;
(5) collecting proposals again by using the RPN network trained in step (4), corresponding to FIG. 4;
(6) training the Fast R-CNN network a second time, similarly to the first training.
The training process flow diagram is shown in fig. 3.
Training an RPN network
First, the pre-trained model provided by RBG is read, and iterative training is started using VGG. The stage1_rpn_train.pt network structure is shown in FIG. 4.
Similar to the detection network, feature maps are still extracted using Conv Layers. The Loss used by the entire network is as follows:

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)

In the above formula, i denotes the anchor index and p_i denotes the positive softmax probability; p_i^* represents the corresponding ground-truth (GT) prediction probability: if the IoU between the i-th anchor and the GT is greater than 0.7, the anchor is considered positive and p_i^* = 1; conversely, if the IoU is less than 0.3, the anchor is considered negative and p_i^* = 0; anchors with 0.3 < IoU < 0.7 do not participate in training. t denotes the predicted bounding box and t^* denotes the GT box corresponding to a positive anchor. It can be seen that the whole Loss is divided into 2 parts:
the cls loss, namely the softmax loss calculated by the rpn_cls_loss layer, used for training the network to classify the anchors as positive or negative;
the reg loss, namely the smooth L1 loss calculated by the rpn_loss_bbox layer, used for training the bounding-box regression network. Note that in this loss the regression term is multiplied by p_i^*, so that only the positive anchors are regressed.
Since N_cls and N_reg differ too much in practice, the two are balanced with a parameter λ (for example, N_cls = 256, N_reg ≈ 2400, λ = 10), so that the 2 kinds of Loss can be considered uniformly in the total network Loss calculation. The comparatively important term is L_reg; the smooth L1 loss is calculated as follows:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}
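How the λ balancing plays out can be sketched in PyTorch (an illustrative assumption of how the terms combine; the anchor sampling, tensor shapes and helper name are not specified by the patent):

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, bbox_pred, labels, bbox_targets, lam=10.0):
    # labels (LongTensor): 1 positive, 0 negative, -1 ignored (0.3 < IoU < 0.7)
    sampled = labels >= 0
    # cls term, averaged over the sampled mini-batch (N_cls, e.g. 256 anchors)
    cls_loss = F.cross_entropy(cls_scores[sampled], labels[sampled])
    # reg term, active only for positive anchors -- the p_i* multiplier
    pos = labels == 1
    reg_loss = F.smooth_l1_loss(bbox_pred[pos], bbox_targets[pos], reduction="sum")
    reg_loss = reg_loss / max(labels.numel(), 1)  # normalise by N_reg (~2400 locations)
    return cls_loss + lam * reg_loss
```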
collecting proposals through a trained RPN network
In this step, the proposal ROIs are acquired using the previous RPN network, together with the positive softmax probability, as in FIG. 5, and the acquired information is then saved in a Python pickle file. This network is essentially the same as the RPN network at test time, with little distinction.
Training the Fast R-CNN network
The previously saved file is read to obtain the proposals and positive probabilities. The network is input from the data layer. The extracted proposals are first passed into the network as ROIs, as shown in FIG. 6; then bbox_inside_weights and bbox_outside_weights are calculated and, acting as in the RPN, passed into the smooth_L1_loss layer, as shown in FIG. 6. In this way, the final recognition softmax and the final bounding-box regression can be trained.
The subsequent stage-2 training proceeds in much the same way.
The RPN network cannot be discussed without mentioning anchors. The anchors are actually a set of rectangles generated by rpn/generate_anchors. Directly running generate_anchors in the author's demo gives the following output:
[[ -84.  -40.   99.   55.]
 [-176.  -88.  191.  103.]
 [-360. -184.  375.  199.]
 [ -56.  -56.   71.   71.]
 [-120. -120.  135.  135.]
 [-248. -248.  263.  263.]
 [ -36.  -80.   51.   95.]
 [ -80. -168.   95.  183.]
 [-168. -344.  183.  359.]]
Each row's 4 values (x1, y1, x2, y2) represent the coordinates of the upper-left and lower-right corner points of a rectangle. The 9 rectangles have 3 shapes in total, with aspect ratios of approximately width:height ∈ {1:1, 1:2, 2:1}. In effect, the multi-scale method commonly used in detection is introduced through the anchors.
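A NumPy sketch that reproduces an anchor table of this shape (the base size 16 and the ratio/scale sets are the usual py-faster-rcnn defaults, assumed here for illustration):

```python
import numpy as np

def generate_anchors(base=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    # 3 aspect ratios x 3 scales = 9 anchors centred on one base x base cell
    cx = cy = (base - 1) / 2.0
    anchors = []
    for r in ratios:
        # keep the anchor area roughly constant while changing the ratio (h/w = r)
        w = np.round(np.sqrt(base * base / r))
        h = np.round(w * r)
        for s in scales:
            ws, hs = w * s, h * s
            anchors.append([cx - 0.5 * (ws - 1), cy - 0.5 * (hs - 1),
                            cx + 0.5 * (ws - 1), cy + 0.5 * (hs - 1)])
    return np.array(anchors)

print(generate_anchors())  # the first row comes out as [-84., -40., 99., 55.]
```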
The microwave imaging module adopts millimeter wave radar imaging technology. At a working frequency of 76-81 GHz, the corresponding wavelength is about 4 mm. The millimeter wave radar module comprises a transmitting (TX) component, a receiving (RX) radio frequency (RF) component, a clock and other analog components, as well as digital components such as an analog-to-digital converter (ADC), a microcontroller (MCU) and a digital signal processor (DSP). The transmitting component emits a frequency modulated continuous wave (FMCW) signal whose frequency increases linearly with time. The microwave imaging module images the construction site by scanning and then passes the generated images to the edge computing module and the deep learning module for intelligent analysis.
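To make the "frequency increases linearly with time" statement concrete, here is a scaled-down NumPy sketch of one linear chirp (all numbers are illustrative stand-ins far below the real 76-81 GHz band, which cannot be sampled at native rates in software):

```python
import numpy as np

# one FMCW chirp with instantaneous frequency f(t) = f0 + (B / T) * t
f0 = 1e3      # start frequency, Hz (illustrative; a real module starts near 77 GHz)
B = 9e3       # sweep bandwidth, Hz
T = 1e-3      # chirp duration, s
fs = 1e6      # sampling rate, Hz

t = np.arange(0, T, 1 / fs)
slope = B / T                                   # sweep rate, Hz per second
chirp = np.cos(2 * np.pi * (f0 * t + 0.5 * slope * t ** 2))
# ranging then follows from the beat frequency f_b between the transmitted
# chirp and the delayed echo: R = c * f_b / (2 * slope)
```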
The edge computing module has extremely strong computing power. Combined with the deep learning module, it analyzes the video produced by the video monitoring module and the images produced by the microwave imaging module to extract effective videos and pictures, such as a worker not wearing a safety helmet, not wearing work clothes, smoking, or falling, and transmits them wirelessly over long distances to the background data center, thereby avoiding the transmission of large amounts of raw video data and making construction supervision more intelligent.
The application discloses a construction site construction monitoring device whose main functions are as follows: one-key start-up, automatically building a network environment for the site; allowing multiple data acquisition devices to access without limitation; automatically storing data locally and transmitting it to the cloud in real time when network conditions permit; integrating edge-side intelligence and cloud intelligence with an alarm prediction function, analyzing whether a person wears a safety helmet, wears work clothes, smokes or falls; and providing visual monitoring of the intelligent construction site operation state and a construction report generating function.
The device automatically builds a network environment for the site; allows multiple data acquisition devices to access without limitation; automatically stores data locally and transmits it to the cloud in real time when network conditions permit; integrates edge-side intelligence and cloud intelligence, has an alarm prediction function and edge computing capability, and analyzes, in combination with the Faster R-CNN deep learning algorithm, whether personnel wear safety helmets, wear work clothes, smoke or fall; it also provides visual monitoring of the intelligent construction site operation state and a construction report generating function.
A stable WiFi network is provided for field video monitoring, large machinery, mobile terminals and other equipment through a simple and convenient network construction mode, providing a powerful guarantee for the integrity of the fully digital handover of construction facilities.
Multiple data acquisition devices are allowed to access without limitation. The field data sensing function acts like pairs of "tentacles", acquiring important data generated by the full-automatic welding machine such as current, voltage, wire-feeding speed, inter-layer welding temperature, and ambient temperature and humidity. It supports TCP/IP, Modbus and other protocols for data transmission.
Data is automatically stored locally and transmitted to the cloud in real time when network conditions permit. By deploying several sets of video monitoring equipment on site, video data is stored and transmitted uniformly, the overall condition of the site is monitored in real time, the quality behavior of key procedures can be traced and queried, and the quality control level is improved. Edge-side data analysis is supported: the collected video pictures of the construction unit are intelligently analyzed, and when illegal behavior, safety risks or other conditions are found, a real-time alarm can be raised to remind the unit's safety officer.
The device integrates edge-side intelligence and cloud intelligence and has an alarm prediction function. The edge computing and audible-visual alarm functions add a thinking "brain" and an expressive "mouth" to the construction site. Intelligent chips such as a GPU and an NPU (neural processing unit) are deployed in the all-in-one machine; the collected full-automatic welding working condition data, video monitoring data, engineering entity quality data and the like undergo edge-side intelligent analysis, and when quality or safety risks arise, a strong audible and visual alarm is raised through the supporting facilities of the all-in-one machine to notify management staff, promoting quality improvement on the construction site and providing strong support for achieving intrinsic safety.
The visual monitoring of the intelligent construction site operation state and the construction report generating function are also very practical. By analyzing the connection state of all collected intelligent construction site equipment such as welding machines and cameras, and comparing it with data such as PCM and construction daily reports, reports such as unit operation condition reports, video monitoring operation condition reports, working-condition collection equipment operation reports, in-shed monitoring integrity reports and unit operation condition daily reports can be generated, making daily management and control more refined.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A worksite construction monitoring system, characterized by: comprising
The video monitoring module shoots through a camera and is used for monitoring the construction site in the daytime, acquiring the video of the construction process of the construction site,
the microwave imaging module images the construction site by scanning, is used for monitoring the construction site at night, acquires the image of the construction process of the construction site,
the sensor module is used for monitoring the concentration of toxic and harmful gases, and when the concentration exceeds the standard, an alarm is sent to warn on-site operators to evacuate;
the deep learning module, which is respectively in data interaction with the video monitoring module, the microwave imaging module and the sensor module, and is used for obtaining the condition of construction site personnel by analyzing the videos and images through a Faster R-CNN fused with an RPN,
the edge computing module, which is in data interaction with the deep learning module, and is used, when the deep learning module analyzes the videos and images, for assisting it in acquiring effective videos and pictures corresponding to situations in which a person on the construction site is not wearing a safety helmet, is not wearing work clothes, is smoking or has fallen, and for improving the recognition efficiency of the deep learning module;
and the Internet of things module is used for carrying out data interaction with the deep learning module and transmitting the effective video and the pictures identified by the deep learning module to a background data center.
2. The worksite construction monitoring system of claim 1, wherein: the microwave imaging module adopts millimeter wave radar imaging technology and comprises a transmitting and receiving radio frequency component for transmitting and receiving frequency modulation continuous waves, and the frequency of a transmitting signal is linearly increased along with the time variation.
3. The worksite construction monitoring system of claim 1, wherein: the sensor module is also used for providing the data of the on-site smoke concentration for the deep learning module and judging whether the condition of on-site smoke extraction of personnel exists in the video or the image by combining the video and the image.
4. The worksite construction monitoring system of claim 1, wherein: the training of the Faster R-CNN is divided into 6 steps:
(1) Training the RPN on the trained model;
(2) Collecting proposals by utilizing the RPN trained in step (1);
(3) Training Fast R-CNN for the first time;
(4) Training the RPN a second time, similarly to the first training;
(5) Collecting proposals again by using the RPN trained in step (4);
(6) Training Fast R-CNN a second time, similarly to the first training.
5. The worksite construction monitoring system of claim 4, wherein: the deep learning module is further used for fusing the RPN into Faster R-CNN, comprising: (1) pre-training the RPN on ImageNet and fine-tuning it on the PASCAL VOC dataset; (2) training a Fast R-CNN alone using the region proposals generated by the trained RPN, this Fast R-CNN also being pre-trained on ImageNet; (3) initializing the RPN with the CNN model part of the Fast R-CNN and then fine-tuning the remaining layers of the RPN, so that Fast R-CNN and the RPN share the feature extractor; (4) fixing the feature extractor and fine-tuning the remaining layers of the Fast R-CNN; after multiple iterations, Fast R-CNN and the RPN are organically fused together to form a unified network.
6. The worksite construction monitoring system of claim 5, wherein: the RPN adopts a CNN model to receive the whole picture and extract a feature map; an N×N sliding window is applied on the feature map, and k prior boxes of different sizes or proportions are set at each sliding-window position, meaning that each position predicts k candidate regions; a low-dimensional feature is mapped at each sliding-window position and then fed into two fully connected layers, namely a classification layer and a regression layer, the classification layer outputting 2k values representing the probability that each candidate region contains an object or is background, and the regression layer outputting 4k coordinate values representing the position of each candidate region relative to each prior box; the classification layer and regression layer are shared across all sliding-window positions.
7. The worksite construction monitoring system of claim 6, wherein: the deep learning module is further configured to match the prior boxes with the ground truth through the Fast R-CNN, set the IoU threshold of non-maximum suppression to 0.7, screen out the candidate regions meeting the requirement, and then select the top-N region proposals in descending order of confidence for training the Fast R-CNN.
8. The worksite construction monitoring system of claim 7, wherein: the matching principle is: (1) the prior box with the highest IoU with a certain ground truth; (2) any prior box whose IoU with a ground truth is greater than 0.7; a prior box is matched to a ground truth when either condition is met, a matched prior box being a positive sample belonging to an object, with the ground truth taken as its regression target; prior boxes whose IoU with every ground truth is below 0.3 are called negative samples; NMS (non-maximum suppression) is set with an IoU threshold of 0.7, the candidate regions meeting the requirement are screened out, and then the top-N region proposals are selected in descending order of confidence for training the Fast R-CNN.
9. The worksite construction monitoring system of claim 7, wherein:
training the RPN:
reading a pre-trained model, performing iterative training, extracting feature maps by using Conv Layers, and using the following Loss:

L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)

In the above formula, i denotes the anchor index, p_i denotes the positive softmax probability, p_i^* denotes the corresponding GT prediction probability, t denotes the predicted bounding box, and t^* denotes the GT box corresponding to the positive anchor.
10. The worksite construction monitoring system of claim 7, wherein:
the Loss is divided into 2 parts:
the cls loss, i.e., the softmax loss calculated by the rpn_cls_loss layer, used for training the network to classify the anchors as positive or negative;
the reg loss, i.e., the smooth L1 loss calculated by the rpn_loss_bbox layer, used for training the bounding-box regression network;
the calculation formula of the smooth L1 loss is as follows:

\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}

For the rectangles formed by the anchors of the RPN, x and y are the upper-left and lower-right corner coordinates, and w and h denote the width and height.
CN202310708599.3A 2023-06-15 2023-06-15 Building site construction monitoring system Pending CN116863395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310708599.3A CN116863395A (en) 2023-06-15 2023-06-15 Building site construction monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310708599.3A CN116863395A (en) 2023-06-15 2023-06-15 Building site construction monitoring system

Publications (1)

Publication Number Publication Date
CN116863395A 2023-10-10

Family

ID=88231286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310708599.3A Pending CN116863395A (en) 2023-06-15 2023-06-15 Building site construction monitoring system

Country Status (1)

Country Link
CN (1) CN116863395A (en)

Similar Documents

Publication Publication Date Title
Nikouei et al. Smart surveillance as an edge network service: From harr-cascade, svm to a lightweight cnn
CN110321853B (en) Distributed cable external-damage-prevention system based on video intelligent detection
CN113673459B (en) Video-based production and construction site safety inspection method, system and equipment
Lestari et al. Fire hotspots detection system on CCTV videos using you only look once (YOLO) method and tiny YOLO model for high buildings evacuation
CN110516529A (en) It is a kind of that detection method and system are fed based on deep learning image procossing
CN104966304A (en) Kalman filtering and nonparametric background model-based multi-target detection tracking method
Raj et al. IoT-based real-time poultry monitoring and health status identification
KR20190079047A (en) A supporting system and method that assist partial inspections of suspicious objects in cctv video streams by using multi-level object recognition technology to reduce workload of human-eye based inspectors
CN113792578A (en) Method, device and system for detecting abnormity of transformer substation
Li et al. Improved YOLOv4 network using infrared images for personnel detection in coal mines
CN115457446A (en) Abnormal behavior supervision system based on video analysis
CN113887318A (en) Embedded power violation detection method and system based on edge calculation
CN116052082A (en) Power distribution station room anomaly detection method and device based on deep learning algorithm
CN116129490A (en) Monitoring device and monitoring method for complex environment behavior recognition
CN113505704B (en) Personnel safety detection method, system, equipment and storage medium for image recognition
CN115019462A (en) Video processing method, device, storage medium and equipment
CN117197726B (en) Important personnel accurate management and control system and method
CN116863395A (en) Building site construction monitoring system
CN115083229B (en) Intelligent recognition and warning system of flight training equipment based on AI visual recognition
CN114898140A (en) Behavior detection method and device based on PAA algorithm and readable medium
CN114202738A (en) Power grid monitoring method, device and equipment based on edge calculation and artificial intelligence
CN115601674A (en) Power transmission project smoke hidden danger identification method and device and storage medium
CN115311591A (en) Early warning method and device for abnormal behaviors and intelligent camera
CN113920720A (en) Highway tunnel equipment fault processing method and device and electronic equipment
Bi et al. A smart and safe robot system for electric monitoring

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination