CN114612823A - Personnel behavior monitoring method for laboratory safety management - Google Patents

Personnel behavior monitoring method for laboratory safety management

Info

Publication number
CN114612823A
Authority
CN
China
Prior art keywords
laboratory
face
human body
frame
plan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210216865.6A
Other languages
Chinese (zh)
Inventor
张勇
张宇晴
池海楠
蔺暄淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210216865.6A
Publication of CN114612823A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures

Abstract

The invention provides a personnel behavior monitoring method for laboratory safety management. Based on the laboratory monitoring video, the method simultaneously computes the face recognition result and the human body position information in each video frame, and passes the face recognition result to the human body based on the geometric relationship between the face rectangular frame and the human body rectangular frame. It records the center position of the human body's feet in each video frame and thereby extracts the human body's motion track in the video. Several key points are marked in a laboratory plan, the corresponding positions are found in the video sequence and given the same marks, and a two-dimensional mapping relationship between the monitoring picture and the laboratory plan is established so that the motion track of a human body in the video can be mapped onto the plan; when a laboratory has several monitoring cameras, the tracks are fused, and the action route of the human body is analyzed from the track in the plan. The invention can detect the information of personnel entering the laboratory and record their activity inside it.

Description

Personnel behavior monitoring method for laboratory safety management
Technical Field
The invention mainly relates to the fields of deep learning, computer vision and camera calibration, in particular to a personnel behavior monitoring method for laboratory safety management.
Background
The safety management of a laboratory is the basic guarantee of the normal operation of experiments. Assisting laboratory safety management with artificial intelligence technologies such as sensors and cameras is therefore one of the effective means of guaranteeing laboratory safety.
At present there are various mature deep learning models, and people appearing in a video scene can be tracked from video monitoring to obtain their trajectories in the video; face recognition technology has likewise been widely used in many access control systems. However, because of the variability of people's daily clothing and accessories, current human body detection and tracking methods cannot extract features reliable enough to judge a person's identity, i.e. the id information of the human body cannot be obtained. Moreover, most trajectory extraction produces the person's track in the video picture, but that track cannot accurately express the person's route: information is lost when the video projects three-dimensional world coordinates into two-dimensional pixel coordinates, and wide-angle cameras suffer barrel distortion. Finally, if the laboratory area is too large or too many articles are placed in it, single-camera monitoring easily leaves blind spots, and an accurate route can only be obtained by combining the shots of multiple cameras.
The human face is an effective basis for determining a body's id, but face recognition fails when the person faces away from the camera or lowers his head. Furthermore, when a person leaves the field of view and later reappears, it is difficult to determine whether the person who has re-entered the camera's field of view and the person who appeared in it before are the same target. Combining face recognition with human body tracking and re-identification to obtain the tracking result and id of the human body in every frame raises three main difficulties: how to pass the recognized face id to the human body to obtain the body id; how to keep the body id across video frames, using tracking and re-identification, when the orientation of the face does not meet the basic requirements of recognition; and how to correct re-identification failures with the face recognition result when a person moves out of the field of view, reappears, and is mistaken by the algorithm for a different person.
The human body trajectory is an effective way to judge a person's activity within the camera's field of view, and acquiring the person's track in a laboratory plan from the body tracking result reflects the person's actual route more accurately than the raw track in the video picture.
In a laboratory scene, based on a monitoring video, a reasonable face recognition and human body tracking combined method and a camera calibration scheme are designed, the position and id information of a person are obtained, the person entering a laboratory is accurately recognized, an accurate action route of the person entering the laboratory is obtained, and the method is an effective mode for judging whether the person enters a dangerous area or not and judging whether the operation flow of the person entering the laboratory is correct or not. The method has positive significance for reducing the potential safety hazard of the laboratory and ensuring the life safety of the experimenters.
Disclosure of Invention
To solve the problem that the prior art cannot accurately identify, in real time, the id information of a person entering a laboratory, and therefore cannot acquire that person's route inside the laboratory, the invention provides a personnel behavior monitoring method for laboratory safety management. An HFRN (Human and Face Recognition Network) usable for both face recognition and human body tracking is built; the real-time position and id information of laboratory personnel are acquired, and their tracks in a laboratory plan are extracted, so that the action routes of laboratory personnel can be monitored in real time. The specific steps are as follows: 1) Based on the laboratory monitoring video and the face images of the persons admitted to the laboratory, obtain the human body tracking result and the face recognition result in each video frame, and pass the face id to the human body.
Specifically, the invention provides an HFRN (Human and Face Recognition Network) for face detection and human id recognition. The network takes the original video frame image as input; a backbone network built on dilated convolution and a feature pyramid extracts multi-scale features of the video frame; two detection branches are constructed, one detecting the position and number of human bodies and the other the position and id of faces; the face id is then passed to the human body based on the spatial relationship between the detection results.
For the human body detection and tracking branch, a spatiotemporal relation between the current frame and all previous frames is first established with Kalman filtering, which predicts the motion trend of each human body from the previous frames, for tracking and re-identification. The detection results of the current frame are then cascade-matched against the human bodies within the predicted motion range: a match means the target appeared in a previous frame, and no match means the target is new. In this stage people are numbered in the order in which they enter the video's field of view, and when a person reappears after disappearing from view, the original number can still be kept for that body.
For the branch detecting face position and id, the network uses a triplet loss function during training so that the encoding difference between two images of the same person is small and that between images of different people is large. Additional supervision information is added so that the branch not only outputs a rectangular frame expressing the face position but also extracts 5 key points for each face, corresponding to the left eye center, right eye center, nose center, and left and right ends of the mouth corners, giving more robust detection for dense or blurred faces. After outputting the face detection result of the current frame, the network maps the face images of laboratory-admitted personnel and the faces detected in the video frame into a Euclidean space and compares their distances to obtain the similarity between each face in the video frame and the faces on the laboratory admittance list; the id of a listed person is passed to the similar face in the video frame.
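The triplet loss mentioned above can be sketched as follows. This is a minimal illustrative form, not the patent's exact implementation; the margin value 0.2 is an assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on face embeddings: the anchor-positive distance
    (same person) should be smaller than the anchor-negative distance
    (different person) by at least `margin`.  The margin value 0.2 is
    an illustrative assumption, not taken from the patent."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(d_pos - d_neg + margin, 0.0)
```

Minimizing this quantity over many (anchor, positive, negative) triples pushes encodings of the same person together and encodings of different people apart, which is what makes the Euclidean-space comparison against the admittance list meaningful.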
Further, the specific steps of passing the face id to the human body are as follows. The face detection rectangle of each frame is denoted by (x1i, y1i) and (x2i, y2i), where i ∈ {1, …, n}, n is the number of faces detected in the current frame, i indexes the ith face of the current frame, (x1i, y1i) is the upper-left corner of the face frame, and (x2i, y2i) is its lower-right corner. The human body tracking result of each frame is denoted by (X1j, Y1j) and (X2j, Y2j), where j ∈ {1, …, m}, m is the number of human bodies detected in the current frame, j indexes the jth body of the current frame, (X1j, Y1j) is the upper-left corner of the body frame, and (X2j, Y2j) is its lower-right corner. When (x1i, y1i), (x2i, y2i), (X1j, Y1j), (X2j, Y2j) satisfy the positional relation shown in formula 1, the id of face i is passed to body j.
X1j ≤ x1i ≤ x2i ≤ X2j and Y1j ≤ y1i ≤ y2i ≤ Y2j, (1)
The positional relationship satisfying the id passing condition implied in equation 1 is shown in fig. 3.
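As a minimal sketch, the id-passing test of formula 1 can be expressed as a box-containment check. The function name is illustrative, and the sketch assumes the condition is plain containment of the face frame inside the body frame:

```python
def pass_face_id(face_box, body_box):
    """Containment test for passing a face id to a tracked body:
    assumes formula 1 requires the face frame to lie entirely inside
    the body frame.  Boxes are (x1, y1, x2, y2) in pixel coordinates
    with the origin at the top-left."""
    x1, y1, x2, y2 = face_box
    X1, Y1, X2, Y2 = body_box
    return X1 <= x1 and x2 <= X2 and Y1 <= y1 and y2 <= Y2
```

When the check passes for face i and body j, body j inherits face i's id for the rest of the tracking sequence.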
2) Intercept and store the face images of people who enter the laboratory but are not on the laboratory access list.
Specifically, when the similarity between a face in the video and the faces of laboratory-admitted personnel is lower than a certain threshold, two causes can explain the phenomenon. The first is that the face is too small, or is a side face at a large angle, so that the extracted features are insufficient to identify its id; the second is that a person who is not on the laboratory admittance list has entered the laboratory.
To ensure that all intercepted low-similarity face pictures belong to non-admitted people, the invention sets two limits to screen faces whose similarity is below the threshold. The threshold is set as follows: after extracting the features of the faces in the video and of the face images of laboratory-admitted personnel through the backbone network, each image is converted into a 10 × 3 matrix representing the face's features, and the similarity between two faces is computed by cosine similarity. The similarity threshold is set to 90 percent, i.e. the cosine similarity of the two faces' feature matrices is compared against 90 percent.
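The cosine-similarity comparison described above can be sketched as follows. The function names are illustrative; the 90% threshold follows the text:

```python
import numpy as np

def cosine_similarity(feat_a, feat_b):
    """Cosine similarity between two face feature matrices (e.g. the
    10 x 3 matrices mentioned in the text), flattened to vectors."""
    a = np.asarray(feat_a, dtype=float).ravel()
    b = np.asarray(feat_b, dtype=float).ravel()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(feat_a, feat_b, threshold=0.90):
    # Faces below the 90% threshold become candidates for the
    # "not on the admittance list" interception step.
    return cosine_similarity(feat_a, feat_b) >= threshold
```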
Limit one: do not intercept face images with resolution less than 30 × 30. Limit two: do not intercept non-frontal face images. The invention judges whether a face is frontal based on its five key points: left eye center, right eye center, nose center, and the left and right ends of the mouth corners. First, the left eye center, right eye center, right mouth corner and left mouth corner are connected in sequence to form a quadrilateral region; then, with the aspect ratio locked, the region's area is reduced to 1/4, giving a sub-region with the same center as the original region (shown in FIG. 2); finally, whether the coordinate of the nose center lies inside the sub-region is checked: if so, the face is judged frontal, otherwise it is judged a side face or a lowered head. A face whose resolution exceeds 30 × 30 and which is judged frontal has its frame expanded by 1.5 times; the image of that area is cropped, saved to a warning folder, and named with the current timestamp.
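The frontal-face test above can be sketched as follows, a minimal version under the stated geometry: shrinking the eye/mouth quadrilateral to 1/4 of its area about its center (i.e. scaling each side by 1/2) and testing whether the nose center falls inside. Function names are illustrative:

```python
import numpy as np

def _point_in_quad(pt, quad):
    """Ray-casting point-in-polygon test for a convex quadrilateral."""
    x, y = pt
    inside = False
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def is_frontal_face(left_eye, right_eye, mouth_right, mouth_left, nose):
    """Judge frontal vs. side face from the 5 key points: shrink the
    eye/mouth quadrilateral to 1/4 of its area about its center
    (scale each side by 1/2) and test whether the nose center falls
    inside the shrunken sub-region."""
    quad = np.array([left_eye, right_eye, mouth_right, mouth_left], float)
    center = quad.mean(axis=0)
    shrunk = center + 0.5 * (quad - center)  # area becomes 1/4
    return _point_in_quad(nose, shrunk.tolist())
```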
3) And a plurality of key points are defined in the laboratory plane map, the same positions are correspondingly marked in the camera view, and a two-dimensional mapping relation between the video scene and the laboratory plane map is established according to the coordinates of the points to obtain a transformation matrix.
Specifically, the transformation between the laboratory plan and the laboratory video scene coordinates is shown in formula 2:

W·x = a·x′ + b·y′ + c, W·y = d·x′ + e·y′ + f, (2)

where a, b, c, d, e, f, g, h are the parameters of the transformation matrix, (x′, y′) are fixed coordinate points in the laboratory plan, (x, y) are the corresponding laboratory video scene coordinates, and W is the scale factor of the transformation. By the matrix multiplication rule, W can be written as formula 3:
W=gx′+hy′+1, (3)
Based on formulas 2 and 3, x and y are expressed in matrix form as formula 4:

( x )   ( x′  y′  1   0   0   0   −x′·x  −y′·x )
( y ) = ( 0   0   0   x′  y′  1   −x′·y  −y′·y ) · (a, b, c, d, e, f, g, h)ᵀ, (4)
Calibrate more than 4 key points in the laboratory plan, find the corresponding coordinate points in the laboratory video scene, substitute them into formula 4 to obtain formula 5, and compute the transformation matrix parameters by least squares. Denoting the number of key points in the laboratory plan by p, there is:
( x1 )   ( x′1  y′1  1   0    0    0   −x′1·x1  −y′1·x1 )
( y1 )   ( 0    0    0   x′1  y′1  1   −x′1·y1  −y′1·y1 )
(  ⋮ ) = (                     ⋮                        ) · (a, b, c, d, e, f, g, h)ᵀ, (5)
( xp )   ( x′p  y′p  1   0    0    0   −x′p·xp  −y′p·xp )
( yp )   ( 0    0    0   x′p  y′p  1   −x′p·yp  −y′p·yp )
According to the transformation matrix parameters, the unique coordinate point in the laboratory plan corresponding to any ground point in the laboratory video's field of view can be acquired.
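The least-squares solution of formulas 2 to 5 can be sketched as follows. Function names are illustrative:

```python
import numpy as np

def fit_transform(src_pts, dst_pts):
    """Solve the 8 parameters a..h of formulas 2-5 by least squares
    from p >= 4 corresponding key points.  src_pts are (x', y')
    coordinates (e.g. the laboratory plan), dst_pts the matching
    (x, y) coordinates (e.g. the video scene)."""
    rows, rhs = [], []
    for (xp, yp), (x, y) in zip(src_pts, dst_pts):
        # One pair of rows of the formula-5 system per key point.
        rows.append([xp, yp, 1, 0, 0, 0, -xp * x, -yp * x])
        rows.append([0, 0, 0, xp, yp, 1, -xp * y, -yp * y])
        rhs.extend([x, y])
    params, *_ = np.linalg.lstsq(np.asarray(rows, float),
                                 np.asarray(rhs, float), rcond=None)
    return params  # a, b, c, d, e, f, g, h

def apply_transform(params, xp, yp):
    a, b, c, d, e, f, g, h = params
    w = g * xp + h * yp + 1.0  # formula 3
    return (a * xp + b * yp + c) / w, (d * xp + e * yp + f) / w
```

To map video coordinates into the plan, as step 4) requires, the same function can be fitted with the roles of the two point sets swapped.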
4) For a single camera in the laboratory, obtain the center coordinates of the human body's feet from the body tracking result, map them into the laboratory plan through the mapping function, and obtain the track, in the plan, of the person who entered the laboratory.
Specifically, the two-dimensional mapping relation is established between the laboratory plan and the ground in the video monitoring, so the corresponding position of a human body in the plan can be obtained once the coordinates of the center of the body's feet on the ground are found in the monitoring video. The upper-left and lower-right corners of the body tracking frame are (x1i, y1i) and (x2i, y2i) respectively. When the whole human body is inside the monitoring field of view, the invention takes (1/2·x1i + 1/2·x2i, y2i) as the center coordinates of the feet. When the foot area of the body is not inside the monitoring video, the position of the foot center is predicted from the position and width of the tracking frame. Analysis of 1632 monitoring video images shows that when the tracking frame accurately covers the whole body, its height/width ratio lies between 2.63 and 2.95, and the width w of the frame remains accurate even when the person's legs are outside the field of view. In this case the invention predicts the foot center from the mean height/width ratio of the tracking frame and expresses it as (1/2·x1i + 1/2·x2i, y1i + 2.79·(x2i − x1i)).
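The two foot-center formulas above can be sketched as a single helper. The function name is illustrative; the 2.79 mean height/width ratio follows the text:

```python
def foot_center(track_box, fully_visible, mean_aspect=2.79):
    """Estimate the foot-center pixel of a tracked person.
    track_box is (x1, y1, x2, y2).  When the whole body is inside the
    frame, the bottom-center of the box is used; otherwise the foot
    position is predicted from the box width and the mean
    height/width ratio of 2.79 reported in the text."""
    x1, y1, x2, y2 = track_box
    cx = 0.5 * (x1 + x2)
    if fully_visible:
        return cx, y2
    return cx, y1 + mean_aspect * (x2 - x1)
```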
For a single camera, the coordinate point of each person in each frame is determined from the foot-center coordinates and the two-dimensional mapping to the laboratory plan, and the coordinate points of the same person, connected in the time order of the video frames, form the track in the plan.
5) Based on the multiple cameras installed at different positions in the same laboratory, fuse the tracks gathered by the cameras to form a more accurate track of the laboratory personnel in the plan.
Specifically, a single camera leaves blind spots in the laboratory, and the closer to the edge of a single wide-angle camera's image, the more serious the barrel distortion, which reduces the accuracy of camera calibration. Therefore, when several monitoring cameras are present at different positions in a laboratory, the invention applies a track fusion scheme based on the multiple cameras.
Further, suppose r monitoring cameras exist in the same laboratory. For the same human body at the same moment, r in-camera coordinates can be detected, giving r predicted position points of the body in the laboratory plan. In theory the r points should coincide exactly in the plan, but camera distortion, calibration error, tracking error and other problems can scatter them over a local sub-area. The invention first judges the local reachable density among the sample points with a local outlier factor detection algorithm; the local reachable density is the reciprocal of the mean reachable distance among the r predicted points, and the higher a sample point's density, the lower the probability that it is an outlier. Through repeated experiments the local reachable density threshold is set to 30: if the local reachable density from a sample point to the other points is less than 30, the point is an outlier, otherwise it is not. After removing the outliers among the r points, the coordinates (x̄, ȳ) of the human body in the plan at the current time are computed from the remaining r′ coordinate points by formula 6:

x̄ = (1/r′) · Σk=1..r′ xk, ȳ = (1/r′) · Σk=1..r′ yk, (6)
Sorting the coordinate points of the same person obtained by formula 6 in time order yields a track in the laboratory plan that is more robust than that of a single camera.
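The fusion step can be sketched as follows. This is a simplified version under stated assumptions: the local reachable density is reduced here to the reciprocal of the mean distance to the other points (the full local outlier factor algorithm uses k-neighbor reachability distances), and the threshold of 30 follows the text but its scale depends on the plan's coordinate units:

```python
import numpy as np

def fuse_camera_points(points, density_threshold=30.0):
    """Fuse the r plan-coordinate predictions of one person from r
    cameras: drop outliers whose local reachable density (simplified
    to the reciprocal of the mean distance to the other points) falls
    below the threshold, then average the remainder (formula 6)."""
    pts = np.asarray(points, float)
    if len(pts) == 1:
        return tuple(pts[0])
    keep = []
    for i, p in enumerate(pts):
        others = np.delete(pts, i, axis=0)
        mean_dist = np.linalg.norm(others - p, axis=1).mean()
        density = 1.0 / mean_dist if mean_dist > 0 else float("inf")
        if density >= density_threshold:
            keep.append(p)
    if not keep:  # fall back to all points if everything was dropped
        keep = pts
    return tuple(np.mean(keep, axis=0))
```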
6) The path of the person's action within the plan is analyzed based on the trajectory of the laboratory person within the plan and the location of the various areas and items in the laboratory.
Specifically, each functional area and fixed facility in the plan is assigned an adjacent region, which serves as the basis for judging whether laboratory personnel have approached it. From the extracted track in the plan and the range of each sub-region, the time and order in which laboratory personnel pass each area and facility can be judged, and hence whether their workflow is irregular. For example, a biological laboratory is divided into a sterile area and an experimental area, both of which can be delimited in the plan. A person entering the laboratory must first enter the sterile area to change into sterile clothing and cap before entering the experimental area; if a body's track extracted in this scene does not pass through the range of the sterile area, the experimenter's operation flow is judged irregular.
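The ordering rule in the example above can be sketched as follows, assuming for simplicity that the zones are axis-aligned rectangles in plan coordinates (the actual regions may be arbitrary polygons or ovals); function names are illustrative:

```python
def first_entry_index(trajectory, zone):
    """Index of the first trajectory point inside a rectangular zone
    (x_min, y_min, x_max, y_max), or None if the zone is never entered."""
    x0, y0, x1, y1 = zone
    for i, (x, y) in enumerate(trajectory):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return i
    return None

def flow_is_regular(trajectory, sterile_zone, experiment_zone):
    """Sketch of the biological-laboratory rule from the text: the
    person must enter the sterile area before the experimental area."""
    s = first_entry_index(trajectory, sterile_zone)
    e = first_entry_index(trajectory, experiment_zone)
    if e is None:
        return True  # never reached the experimental area
    return s is not None and s <= e
```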
Advantageous effects
The invention provides a human body detection and action route analysis method for laboratory safety management based on the image data transmitted by the monitoring cameras in a laboratory. Compared with traditional methods, it obtains a real-time tracking result carrying the human body id and an action route of more practical value. The invention obtains the accurate id of a human body by combining face recognition with body tracking, and the id information is kept even while the body faces away from the camera or is occluded for a period of time. A camera calibration method gives the correspondence between the video scene and the laboratory plan, and a multi-camera fusion strategy extracts the person's track in the plan robustly and without blind spots, generating the person's route within the laboratory. The invention needs only one calibration per laboratory, given its cameras and plan, to analyze experimenters' routes over a long period, and suits biological, chemical, computer and other laboratories.
Drawings
FIG. 1 Overall flow diagram of the invention
FIG. 2 is an explanatory view for judging whether a human face is frontal
FIG. 3 is an explanatory diagram of passing the face id to the human body
FIG. 4 is a schematic view of camera calibration
FIG. 5 is an effect diagram of multi-target tracking of human body and obtaining of human id
Figure 6 is a diagram of human body action route extraction effect in plan view
Detailed Description
The invention will be further described with reference to the accompanying drawings.
1. Model frame
The overall framework of the human body detection and action route analysis method for laboratory safety management is shown in figure 1. The underlying data required by the model are a frontal face image of each person admitted to the laboratory, one or more monitoring cameras, and a laboratory plan. First, the face recognition and body tracking results are acquired from the monitoring video; then the face id is passed to the body, and the face images of people not on the access list are intercepted; next, several key points are calibrated in both the laboratory plan and the video picture, establishing the mapping relation between them; finally, from the tracking result and the mapping relation, the tracks of the people entering the laboratory are acquired in the plan and their action routes are analyzed.
The following detailed description is made:
(1) The invention converts all frames of the video sequence into RGB images as the input of the proposed HFRN (Human and Face Recognition Network). The backbone network integrates dilated convolution and a spatial pyramid structure; the feature pyramid contains 3 network layers, with strides of 32, 16 and 8 respectively. The feature map output by the stride-32 layer corresponds to a 32 × 32 receptive field in the original image and can be used to detect faces and body regions of large area; similarly, the receptive fields corresponding to strides 16 and 8 give a strong detection effect on small targets far from the camera. The feature maps output by the three layers are combined as the output features of the backbone network.
The output of the backbone network is fed simultaneously into the human body detection branch and the face recognition branch. In the body detection branch, a simple convolution layer of small computational cost extracts the appearance features of the human body, expressed as low-dimensional vectors; after each frame is detected and tracked, the appearance features are extracted and stored once. Each time a frame is tracked, the similarity between the current frame's body appearance features and the previously stored features is computed once, as the basis for assigning body numbers and realizing body re-identification. In this branch, acquiring the body tracking result takes about 0.08 seconds per frame.
In the face recognition branch, the features output by the backbone network are first fed into a GoogLeNet-style Inception model, which encodes each face in the monitoring video as a 128-dimensional feature vector; the collected frontal face images of the laboratory-admitted personnel are fed into the same model and encoded into 128-dimensional vectors in the same way. As long as the admitted personnel do not change, their frontal images are encoded only once and the encodings are stored in a dedicated file. The encodings of faces in the monitoring video are then compared with the frontal-image encodings of the admitted persons under the L2 norm, and the person with the highest similarity, provided it exceeds 90%, is identified as the same person. If the similarity between a face and the frontal faces of all admitted persons is below 90%, the face collected from the video frame is considered to belong to a stranger, or its capture quality is considered poor. Face recognition in this branch takes about 0.06 seconds per frame.
Typically, the frame rate of the video is about 30 frames per second. If face recognition and body tracking ran on every frame, they could not run in real time. In a laboratory scene, violent movement of personnel rarely occurs, so the inter-frame deformation within adjacent intervals is small. The invention therefore skips frames to achieve real-time performance, running body tracking once every 3 frames and face recognition once every 20 frames. Face recognition can run far less often than body tracking because the face id only needs to be passed to the body in key video frames: once passed, the body id propagates automatically to the next frame through body tracking. When a new person enters the laboratory, the periodic face recognition judges the newcomer's id in time and passes it to the body, effectively correcting body tracking failures.
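The frame-skip schedule above can be sketched as a small scheduling helper. The function name is illustrative; the intervals of 3 and 20 frames follow the text:

```python
def frame_tasks(frame_idx, track_every=3, face_every=20):
    """Which jobs to run on a given frame index under the frame-skip
    schedule described in the text: body tracking every 3 frames and
    face recognition every 20 frames."""
    return {
        "track": frame_idx % track_every == 0,
        "face_id": frame_idx % face_every == 0,
    }
```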
(2) When a detected face is a side face, extraction of its facial features may incur a large error, affecting the accuracy of the body id and the behavior track. The invention therefore designs a constraint that judges, from the 5 facial key points, whether a detected face is frontal. As shown in fig. 2, the area indicated by the white dotted frame is the closed quadrilateral formed by connecting, in sequence, the person's left eye center, right eye center, right mouth corner and left mouth corner; the area indicated by the black dashed frame is a sub-region with 1/4 the area and the same center as the white dotted frame. When the coordinate point representing the nose center lies within the black dashed frame, the face is judged frontal; otherwise it is judged a side face. This frontal-face judgment reduces mistaken transfers when passing the face id to the body and improves the accuracy of judging whether non-admitted personnel have entered the laboratory.
(3) When passing the face id to the body, an incorrectly recognized face id would make the body id wrong and accumulate errors. The invention sets four limits and passes the face id to the body only when the face position coordinates meet all four conditions. In fig. 3, the larger black solid frame is the body tracking result; the dashed frame is the range, derived from the body tracking result and the four limiting conditions, within which the face may appear; the smaller black solid frame is the face recognition result. When the entire face recognition result lies inside the dashed frame, the recognized face id is passed to the body.
If the face recognition result of the current frame is unknown, i.e. its id is displayed as "Unknown", the body's predicted id from the previous frame is kept. If a body has just entered the camera's field of view and is seen from behind, it temporarily has no face recognition result, and its id at that moment is its serial number in the order of entry into the laboratory.
(4) The camera calibration method is shown in fig. 4. The left image in fig. 4 is the calibration result in the laboratory plan; to obtain a better calibration result, key points are placed at conspicuous, fixed positions in the laboratory wherever possible, and their two-dimensional coordinates in the plan are recorded. The right image in fig. 4 is the calibration result in the video scene; coordinate points marked with the same symbol in the two images are corresponding points in the plan and the video scene. To solve the 8 parameters of the mapping matrix, at least 4 pairs of key points must be calibrated; to reduce calibration error, more than 4 pairs are taken and the matrix parameters are obtained by least squares. Fig. 4 shows the calibration result with 5 pairs of key points.
(5) The human body tracking result of the invention is shown in fig. 5. Two human bodies are detected; the letters above each tracking frame give the name of the tracked person, which serves as that person's id. For the human body entirely within the field of view, the foot center is marked by the gray five-pointed star; the other human body is only partially within the field of view, and its foot center is predicted from the tracking-frame coordinates as (1/2×x1i + 1/2×x2i, y1i + 2.79×(x2i − x1i)).
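A minimal sketch of the foot-center estimate described above; the function name and the `feet_visible` flag are illustrative, and 2.79 is the mean height-to-width ratio reported in the text.

```python
# Mean height/width ratio of a tracking box that covers a full standing body,
# as reported in the description (range 2.63-2.95, mean 2.79).
ASPECT_MEAN = 2.79

def foot_center(x1, y1, x2, y2, feet_visible=True):
    # (x1, y1): upper-left corner of the tracking box; (x2, y2): lower-right.
    cx = 0.5 * (x1 + x2)
    if feet_visible:
        # Feet inside the frame: the bottom edge of the box touches the ground.
        return (cx, y2)
    # Feet cut off: extrapolate the bottom from the box width and mean aspect.
    return (cx, y1 + ASPECT_MEAN * (x2 - x1))
```

For a box 10 units wide with its top at y1 = 0, the predicted ground contact sits 27.9 units below the top edge when the feet are out of view.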
(6) The motion track of a person in the laboratory plan is obtained from the foot coordinates of the human body and the mapping matrix, shown as the gray scatter points in fig. 6. The plan contains 8 experiment tables numbered 1-8; the elliptical area circumscribing each table represents its adjacent area, and the arrow beside the track indicates the person's direction of movement. From the track, the route of the laboratory person with id "example" can be analyzed: the person passes through the aisle between the two rows of experiment tables, turns into the gap between table 4 and table 6, and finally stays close to table 6.

Claims (7)

1. A personnel behavior monitoring method facing laboratory safety management is characterized by comprising the following steps:
(1) acquiring a human body tracking result and a human face recognition result in a video frame based on a laboratory monitoring video and a human face image of a laboratory admittance person, and transmitting a human face id to a human body;
(2) capturing and storing the face image of any person who enters the laboratory but is not on the laboratory admittance list;
(3) a plurality of key points are defined in the laboratory plane map, the same positions are correspondingly marked in the camera view, and a two-dimensional mapping relation between a video scene and the laboratory plane map is established according to the coordinates of the points to obtain a transformation matrix;
(4) for a single camera in a laboratory, acquiring the central coordinate of the foot of the human body based on the human body tracking result, mapping the coordinate into a laboratory plane map according to a mapping function, and acquiring the track of personnel entering the laboratory in the plane map;
(5) based on a plurality of cameras arranged at different positions of the same laboratory, a plurality of tracks collected by the plurality of cameras are fused to form more accurate tracks of laboratory personnel in a plan;
(6) the path of the person's action within the plan is analyzed based on the trajectory of the laboratory person within the plan and the location of the various areas and items in the laboratory.
2. The personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that:
an HFRN network for human body detection and face id recognition is provided; the network takes the original video frame image as input, constructs a backbone network based on dilated convolution and a feature pyramid to extract multi-scale features of the video frame, constructs two detection branches, one detecting the position and number of human bodies and the other the position and id of faces, and then transmits the face id to the human body based on the spatial relation among the detection results;
for the human body detection and tracking branch, a spatio-temporal relationship between the current frame and all previous frames is first established for human body tracking and re-identification, with Kalman filtering used to predict the motion trend of each human body from all previous frames; the detection results of the current frame are then cascade-matched against the human bodies within the reasonable range of that motion trend: a match means the target appeared in a previous frame, while no match means a new target; persons are numbered in the order in which they enter the video field of view, and when a person reappears after leaving the field of view, the original number is retained;
for the branch detecting face positions and ids, the network adopts a triplet loss function in the training stage, ensuring that the encoding difference between two head images of the same person is small while that between head images of different persons is large; additional supervision information is added so that the branch outputs not only a rectangular frame expressing the face position but also 5 key points for each face image, corresponding to the left eye center, right eye center, nose center, left end of the mouth corner and right end of the mouth corner, giving more robust detection accuracy on dense or blurred faces; after outputting the face detection results of the current frame, the network maps the face images of laboratory-admitted personnel and the faces detected in the video frame into a Euclidean space and compares their spatial distances to obtain the similarity between each face in the video frame and the faces in the laboratory admittance list, transmitting the id information of the listed person to the similar face in the video frame;
the specific steps of transmitting the face id to the human body are as follows: the rectangular frame of each face detection result is denoted by (x1i, y1i) and (x2i, y2i), where i ∈ {1, …, n}, n is the number of faces detected in the current frame, i indexes the ith face, (x1i, y1i) is the upper-left corner of the face frame and (x2i, y2i) the lower-right corner; the human body tracking result of each frame is denoted by (X1j, Y1j) and (X2j, Y2j), where j ∈ {1, …, m}, m is the number of human bodies detected in the current frame, j indexes the jth human body, (X1j, Y1j) is the upper-left corner of the body frame and (X2j, Y2j) the lower-right corner; when the coordinates (x1i, y1i), (x2i, y2i), (X1j, Y1j), (X2j, Y2j) satisfy the position relation of formula 1, the id information of face i is transmitted to human body j;
x1i ≥ X1j, y1i ≥ Y1j, x2i ≤ X2j, y2i ≤ Y2j    (1)
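A sketch of the id-transfer check of claim 2. The containment test below is one plausible reading of formula 1 (the exact four inequalities appear only as an image in the original publication), so treat it as an assumption rather than the patent's exact condition:

```python
def transfer_id(face_box, body_box):
    # Hypothetical reading of formula 1: transfer the face id to the body
    # only when the face frame lies entirely within the body tracking frame.
    x1, y1, x2, y2 = face_box   # (x1i, y1i), (x2i, y2i)
    X1, Y1, X2, Y2 = body_box   # (X1j, Y1j), (X2j, Y2j)
    return x1 >= X1 and y1 >= Y1 and x2 <= X2 and y2 <= Y2
```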
3. the personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that:
when the similarity between a face in the video and the faces of laboratory-admitted personnel is below a certain threshold, two causes can explain the phenomenon: first, the face is too small, or is a side face at a large angle, so the extracted facial features are too limited to identify the face id; second, a person not on the laboratory admittance list has entered the laboratory; the threshold is set as follows: features of the faces in the video and of the admitted personnel's face images are extracted through the backbone network, each image being converted into a 10 × 3 matrix representing the facial features; the similarity between two faces is computed as the cosine similarity of their feature matrices, and the similarity threshold is set to 90%, i.e. it is checked whether the cosine similarity of the two feature matrices is below 90%;
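The threshold test can be sketched as follows. This is a minimal illustration: the feature vectors would come from the backbone network (here flattened to plain lists), and `is_unknown` and `gallery_feats` are hypothetical names not taken from the patent.

```python
import math

def cosine_similarity(f1, f2):
    # f1, f2: flattened face feature vectors (e.g. a 10 x 3 matrix -> 30 values).
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2)

def is_unknown(face_feat, gallery_feats, threshold=0.90):
    # Flag the face when even its best match in the admittance list
    # falls below the similarity threshold (90% in the text).
    best = max(cosine_similarity(face_feat, g) for g in gallery_feats)
    return best < threshold
```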
to ensure that all captured low-similarity face pictures belong to non-admitted persons, two limits are set to screen faces whose similarity is below the threshold: limit one, face images with resolution below 30 × 30 are not captured; limit two, non-frontal face images are not captured; whether a face is frontal is judged from its five key points, the left eye center, right eye center, nose center, left end of the mouth corner and right end of the mouth corner: first, the left eye center, right eye center, right end of the mouth corner and left end of the mouth corner are connected in sequence to form a quadrilateral area; then, keeping the aspect ratio locked, the area is reduced to 1/4 to obtain a sub-area with the same center as the original; finally, if the coordinate of the nose center lies inside the sub-area the face is frontal, otherwise it is a side face or the head is lowered; for a frontal face with resolution above 30 × 30, the face region is expanded by 1.5 times, the image of this area is cropped, saved to a warning folder, and named with the current timestamp.
4. The personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that: the transformation between the laboratory plan and the laboratory video scene coordinates is shown in formula 2,
[W×X, W×Y, W]^T = [[a, b, c], [d, e, f], [g, h, 1]] × [x′, y′, 1]^T    (2)
where a, b, c, d, e, f, g, h are the parameters of the transformation matrix, (X, Y) are fixed coordinate points in the laboratory plan, (x′, y′) are the corresponding laboratory video scene coordinates, and W is the scale factor of the transformation; by the matrix multiplication rule, W can be written as formula 3,
W=gx′+hy′+1, (3)
based on formulas 2 and 3, X and Y are expressed in matrix form as formula 4,
[X, Y]^T = [[x′, y′, 1, 0, 0, 0, −x′X, −y′X], [0, 0, 0, x′, y′, 1, −x′Y, −y′Y]] × [a, b, c, d, e, f, g, h]^T    (4)
more than 4 key points are set in the laboratory plan and the corresponding coordinate points are found in the laboratory video scene; substituting them into formula 4 yields formula 5, and the transformation matrix parameters are computed by least squares; with the number of key points in the laboratory plan denoted p:
[X1, Y1, X2, Y2, …, Xp, Yp]^T = A × [a, b, c, d, e, f, g, h]^T    (5)
where A is the 2p × 8 matrix obtained by stacking, for each key point k = 1, …, p, the two rows [x′k, y′k, 1, 0, 0, 0, −x′k×Xk, −y′k×Xk] and [0, 0, 0, x′k, y′k, 1, −x′k×Yk, −y′k×Yk];
from the transformation matrix parameters, a unique coordinate point in the laboratory plan can be obtained for any ground point in the field of view of the laboratory video.
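The least-squares solve of formulas 4-5 can be sketched in pure Python. This is illustrative only: a production system would use a linear-algebra library, and solving the normal equations as done here assumes reasonably well-conditioned key points. The function names are the author's.

```python
def fit_mapping(video_pts, plan_pts):
    # Build the 2p x 8 design matrix of formula 5 and solve the normal
    # equations (A^T A) x = A^T b by Gaussian elimination.
    A, b = [], []
    for (xp, yp), (X, Y) in zip(video_pts, plan_pts):
        A.append([xp, yp, 1, 0, 0, 0, -xp * X, -yp * X]); b.append(X)
        A.append([0, 0, 0, xp, yp, 1, -xp * Y, -yp * Y]); b.append(Y)
    n = 8
    AtA = [[sum(A[k][i] * A[k][j] for k in range(len(A))) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(len(A))) for i in range(n)]
    # Forward elimination with partial pivoting on the augmented system.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(AtA[r][col]))
        AtA[col], AtA[piv] = AtA[piv], AtA[col]
        Atb[col], Atb[piv] = Atb[piv], Atb[col]
        for r in range(col + 1, n):
            f = AtA[r][col] / AtA[col][col]
            for c in range(col, n):
                AtA[r][c] -= f * AtA[col][c]
            Atb[r] -= f * Atb[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(AtA[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (Atb[r] - s) / AtA[r][r]
    return x  # parameters a, b, c, d, e, f, g, h

def to_plan(params, xp, yp):
    # Apply formula 2: map a video scene point (x', y') into the plan.
    a, b, c, d, e, f, g, h = params
    w = g * xp + h * yp + 1.0
    return ((a * xp + b * yp + c) / w, (d * xp + e * yp + f) / w)
```

With 5 or more calibrated point pairs, `fit_mapping` recovers the 8 parameters and `to_plan` then maps any ground point in the video field of view to the plan.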
5. The personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that:
a two-dimensional mapping relation is established between the laboratory plan and the ground in the video monitoring, so that the corresponding position of a human body in the plan can be obtained simply by finding the coordinates of the center of the human body's feet on the ground in the monitoring video; the upper-left and lower-right corners of the human body tracking frame are (x1i, y1i) and (x2i, y2i) respectively; when the whole human body is within the monitoring field of view, (1/2×x1i + 1/2×x2i, y2i) is taken as the foot center coordinate; when the foot area of the human body is not within the monitoring video, the foot center position is predicted from the position and width of the tracking frame; analysis of 1632 monitoring video images shows that when the tracking frame accurately covers the whole human body, its height-to-width ratio lies between 2.63 and 2.95; since the width of the tracking frame remains accurate even when the person's legs are outside the field of view, the foot center coordinate is in this case predicted from the mean aspect ratio of the tracking frames as (1/2×x1i + 1/2×x2i, y1i + 2.79×(x2i − x1i));
In a single camera, determining coordinate points of a person in each frame according to the two-dimensional mapping relation between the foot center coordinate of each human body in each frame and a laboratory plan, and connecting the coordinate points of the same person according to the time sequence of video frames to form a track in the plan.
6. The personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that:
since a single camera leaves blind spots in the laboratory, and a single wide-angle camera suffers severe barrel distortion near the image edges that reduces calibration accuracy, a multi-camera track fusion scheme is designed for the case where a laboratory has several monitoring cameras in different orientations;
assuming the same laboratory has r monitoring cameras, the same human body can be detected in r camera coordinate systems at the same moment, giving r predicted position points in the laboratory plan; in theory these r points should coincide exactly, but camera distortion, calibration error and tracking error scatter them within a local sub-area of the plan; the local reachable density among the sample points is first judged by a local outlier factor detection algorithm, the local reachable density being the reciprocal of the average reachable distance among the r predicted position points, so that the denser a sample point, the lower the probability that it is an outlier; the local reachable density threshold is set to 30: if the local reachable density of a sample point with respect to the other points is below 30 the point is an outlier, otherwise it is not; after removing outliers from the r points, the coordinates (x̄, ȳ) of the human body in the plan at the current moment are computed from the remaining r′ coordinate points (xk, yk) by formula 6:
(x̄, ȳ) = ((1/r′) × Σ xk, (1/r′) × Σ yk), k = 1, …, r′    (6)
sorting the coordinate points of the same person obtained from formula 6 in time order yields a track in the laboratory plan that is more robust than that from a single camera.
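The fusion step can be sketched as follows. Note the simplification: claim 6's local-reachable-density test is replaced here by a median-distance screen, and the threshold and function names are illustrative assumptions; only the averaging of formula 6 is taken directly from the text.

```python
import math
from statistics import median

def fuse_positions(points, max_dist=0.5):
    # Simplified stand-in for the outlier test of claim 6: discard any
    # camera's prediction farther than max_dist from the coordinate-wise
    # median, then average the survivors (formula 6). max_dist is an
    # illustrative threshold in plan units, not the patent's value.
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    kept = [p for p in points if math.dist(p, (mx, my)) <= max_dist] or list(points)
    n = len(kept)
    return (sum(p[0] for p in kept) / n, sum(p[1] for p in kept) / n)
```

For three tightly clustered predictions and one stray point, the stray point is dropped and the fused position is the mean of the cluster.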
7. The personnel behavior monitoring method facing laboratory safety management according to claim 1, characterized in that:
each functional area and fixed facility in the plan is assigned an adjacent area, used as the basis for judging whether laboratory personnel approach it; based on the extracted track in the plan and the extent of each sub-area, the time and order in which laboratory personnel pass each area and facility are judged, and hence whether any irregular movement occurs.
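The adjacency test can be sketched as a point-in-circumscribed-ellipse check, matching the elliptical adjacent areas around the experiment tables in fig. 6. The assumption that benches are axis-aligned rectangles in plan coordinates, and the function name, are the author's.

```python
def near_bench(pt, bench_rect):
    # Adjacent area of a bench: the ellipse circumscribing its rectangle
    # in the plan (same center, passing through all four corners).
    x1, y1, x2, y2 = bench_rect
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Semi-axes sqrt(2)/2 times the width and height put the corners
    # exactly on the ellipse boundary.
    a = (x2 - x1) / 2 * 2 ** 0.5
    b = (y2 - y1) / 2 * 2 ** 0.5
    x, y = pt
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0
```

Checking each track point against each bench's ellipse, frame by frame, gives the time and order in which a person approaches each table.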
CN202210216865.6A 2022-03-06 2022-03-06 Personnel behavior monitoring method for laboratory safety management Pending CN114612823A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210216865.6A CN114612823A (en) 2022-03-06 2022-03-06 Personnel behavior monitoring method for laboratory safety management


Publications (1)

Publication Number Publication Date
CN114612823A true CN114612823A (en) 2022-06-10

Family

ID=81861665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210216865.6A Pending CN114612823A (en) 2022-03-06 2022-03-06 Personnel behavior monitoring method for laboratory safety management

Country Status (1)

Country Link
CN (1) CN114612823A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307848A (en) * 2019-08-01 2021-02-02 普兰特龙尼斯公司 Detecting deceptive speakers in video conferencing
CN112307848B (en) * 2019-08-01 2024-04-30 惠普发展公司,有限责任合伙企业 Detecting spoofed speakers in video conferencing
CN114783040A (en) * 2022-06-23 2022-07-22 东莞先知大数据有限公司 Detection method and detection device for candid shooting in building
CN114783040B (en) * 2022-06-23 2022-09-20 东莞先知大数据有限公司 Detection method and detection device for candid shooting in building
CN116563939A (en) * 2023-03-20 2023-08-08 南通锡鼎智能科技有限公司 Experimenter nonstandard behavior detection method and device based on depth information
CN116524441A (en) * 2023-07-03 2023-08-01 四川顶圣工程项目管理有限公司 Construction site supervision method for engineering project management
CN116524441B (en) * 2023-07-03 2023-09-01 四川顶圣工程项目管理有限公司 Construction site supervision method for engineering project management
CN117151722A (en) * 2023-10-30 2023-12-01 山东大学 Face recognition password verification method and system based on alliance block chain
CN117151722B (en) * 2023-10-30 2024-02-23 山东大学 Face recognition password verification method and system based on alliance block chain


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination