CN114882596B - Behavior early warning method and device, electronic equipment and storage medium - Google Patents

Behavior early warning method and device, electronic equipment and storage medium

Info

Publication number
CN114882596B
CN114882596B (application CN202210801570.5A)
Authority
CN
China
Prior art keywords
layer
target object
characteristic diagram
wearing
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210801570.5A
Other languages
Chinese (zh)
Other versions
CN114882596A (en)
Inventor
陈彪
熊海飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Original Assignee
Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xinrun Fulian Digital Technology Co Ltd
Priority to CN202210801570.5A
Publication of CN114882596A
Application granted
Publication of CN114882596B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application relates to a behavior early warning method, a behavior early warning device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area; inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object; respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification; and under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, early warning is carried out on the behavior of the target object. The scheme can effectively avoid the influence of factors such as lighting and occlusions, improves the accuracy of human body posture recognition, and can also give an early warning when wearing objects are not worn as required, thereby improving the safety of the target object.

Description

Behavior early warning method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a behavior early warning method and apparatus, an electronic device, and a storage medium.
Background
Laboratories are now ubiquitous in colleges and universities and in enterprises in certain fields, and their potential risks are well known. When an experimenter's operation is not standardized, for example when liquid medicine is poured in a non-standard way, solid medicine is taken in a non-standard way, or the head is too close to the medicine, or when the experimenter does not wear protective articles such as goggles, a mask or gloves as required, safety hazards are created for the experimenter. Early warning of experimenter behavior therefore becomes a problem that cannot be ignored.
Existing behavior early warning approaches for experimenters generally recognize the human body posture from a human body image acquired by a camera, and then judge whether the experimenter's behavior is standard based on that posture. However, the human body posture recognized in this way is easily affected by lighting, occlusions and the like, resulting in low accuracy of the recognized posture. In addition, the prior art only recognizes the human body posture and does not judge what the experimenter is wearing, so the existing behavior early warning approach for experimenters has low accuracy and leaves great potential safety hazards.
Disclosure of Invention
The application provides a behavior early warning method, a behavior early warning device, electronic equipment and a storage medium, and aims to solve the problems that the accuracy of a behavior early warning mode of an existing experimenter is low and great potential safety hazards exist.
In a first aspect, the present application provides a behavior early warning method, including:
acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area;
inputting the optical characteristic diagram and the thermal characteristic diagram into a first network model trained in advance to obtain the posture of the target object, and inputting the optical characteristic diagram into a second network model trained in advance to obtain the wearing condition of the wearing object of the target object;
respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification or not;
and under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, early warning the behavior of the target object.
Optionally, the first network model includes a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer, and a full connection layer;
inputting the optical characteristic diagram and the thermal characteristic diagram into a first network model trained in advance to obtain the posture of the target object, wherein the posture obtaining method comprises the following steps:
inputting the optical characteristic diagram into the first network layer for gesture recognition to obtain a confidence diagram of the gesture key points of the target object, wherein the optical characteristic diagram and the thermal characteristic diagram both carry time stamps;
inputting the fused feature map into the first convolution layer to perform convolution calculation to obtain a first intermediate feature map, wherein the fused feature map is obtained by fusing the thermal feature map and the confidence map corresponding to the optical feature map, which have the same time stamp, according to the time stamp;
inputting the first intermediate characteristic diagram into the first pooling layer for pooling to obtain a second intermediate characteristic diagram;
inputting the second intermediate feature map into the deconvolution layer for deconvolution calculation to obtain a third intermediate feature map;
and inputting the third intermediate characteristic diagram into the full-connection layer for data vectorization, and classifying according to the obtained vector to obtain the posture of the target object.
Optionally, the second network model comprises an encoding layer and a decoding layer;
the inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object includes:
inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields;
and inputting the fourth intermediate feature map into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object.
Optionally, the encoding layer includes a first fusion layer and a plurality of second convolutional layers with different weight coefficients, the sizes of the convolution kernels corresponding to different second convolutional layers are different, and the sizes of the convolution kernels corresponding to the second convolutional layers are inversely proportional to the weight coefficients corresponding to the second convolutional layers;
the inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields includes:
inputting the optical characteristic diagrams into a plurality of second convolution layers respectively to carry out convolution calculation to obtain characteristic diagrams corresponding to a plurality of receptive fields;
and inputting the feature maps corresponding to the multiple receptive fields into the first fusion layer for feature fusion to obtain the fourth intermediate feature map.
Optionally, the decoding layers comprise a second fusion layer, a classifier layer, a plurality of serially connected first sub-decoding layers and a plurality of serially connected second sub-decoding layers, the plurality of second sub-decoding layers being serially connected after the plurality of first sub-decoding layers; wherein each of the first sub-decoding layers comprises a second pooling layer and a third convolutional layer, and each of the second sub-decoding layers comprises an upsampling layer and a fourth convolutional layer;
the inputting the fourth intermediate feature map into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object includes:
sequentially inputting the fourth intermediate characteristic diagram into a plurality of first sub-decoding layers and a plurality of second sub-decoding layers for decoding to obtain a fifth intermediate characteristic diagram corresponding to each first sub-decoding layer and each second sub-decoding layer;
inputting the fifth intermediate feature map into the second fusion layer for feature fusion to obtain a sixth intermediate feature map;
and inputting the sixth intermediate feature map into the classifier layer to obtain the wearing condition of the wearing object of the target object.
Optionally, before the inputting the optical characteristic map and the thermal characteristic map into a first network model trained in advance to obtain the posture of the target object, and inputting the optical characteristic map into a second network model trained in advance to obtain the wearing condition of the wearing object of the target object, the method further includes:
acquiring a training sample diagram, wherein the training sample diagram comprises an optical characteristic sample diagram and a thermodynamic characteristic sample diagram, the optical characteristic sample diagram carries label information for representing different postures, and the thermodynamic characteristic sample diagram carries label information for representing the wearing conditions of different wearing objects;
inputting the optical characteristic sample diagram and the thermodynamic characteristic sample diagram into a first model to be trained for training, inputting the optical characteristic sample diagram into a second model to be trained for training, and training to obtain the first network model and the second network model when the sum of the loss values of the first model to be trained and the second model to be trained is smaller than a preset threshold value.
Optionally, a calculation formula of a sum of loss values of the first model to be trained and the second model to be trained is:
Loss = Loss_p + Loss_t
Loss_p = f1 + α*f2
Loss_t: given by an equation shown as an image in the original publication, combining L_conf and L_loc over the N selection boxes
where Loss_p represents the loss value of the human posture, Loss_t represents the loss value of the wearing condition of the wearing object, f1 represents the loss value of the bone key points, f2 represents the loss value of the bone joints, L_conf represents the loss value of the wearing object, L_loc represents the loss value of the selection box corresponding to the wearing object, N represents the total number of selection boxes corresponding to the wearing object, α and λ are weight coefficients, x takes the value 1 if a prediction selection box is paired with the real selection box and 0 otherwise, c represents the predicted value of the wearing object, c_0 represents the real value of the wearing object, l represents the center position and the length and width of the prediction selection box, g represents the center position and the length and width of the real selection box, s_0 represents the area of the real selection box, and s represents the area of the prediction selection box.
In a second aspect, the present application further provides a behavior early warning device, the device including:
the first acquisition module is used for acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area;
the input module is used for inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object;
the judging module is used for respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification;
and the early warning module is used for early warning the behavior of the target object under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification.
In a third aspect, the present application further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the steps of the behavior early warning method in any embodiment of the first aspect when the processor executes the program stored in the memory.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the behavior alert method as described in any one of the embodiments of the first aspect.
In the embodiment of the application, an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area are obtained; the optical characteristic diagram and the thermal characteristic diagram are input into a pre-trained first network model to obtain the posture of the target object, and the optical characteristic diagram is input into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object; whether the posture of the target object and the wearing condition of the wearing object of the target object match a preset specification is judged respectively; and when the posture of the target object does not match the preset specification, or the wearing condition of the wearing object of the target object does not match the preset specification, an early warning is issued for the behavior of the target object. In this way, the posture of the target object can be recognized based on both the optical characteristic diagram and the thermal characteristic diagram. Compared with the traditional approach of determining the posture of the target object by relying only on the optical characteristic diagram, this effectively avoids the influence of factors such as lighting and occlusions, thereby improving the accuracy of human body posture recognition. In addition, the scheme can also identify the wearing condition of the wearing object of the target object and give an early warning when wearing objects are not worn as required, thereby improving the safety of the target object.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flow chart of a behavior early warning method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a first network model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a confidence map of pose key points of a target object according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of another first network model provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a second network model according to an embodiment of the present application;
fig. 6 is a schematic diagram of data processing in a second network model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a behavior warning device according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flow chart of a behavior early warning method provided in an embodiment of the present application. The behavior early warning method comprises the following steps:
step 101, acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area.
Specifically, the target area may include an experimental area and areas such as the entrance and exit of the experimental area. The target object refers to a person appearing in the target area; the target object may be one person or a plurality of persons, and the present application is not particularly limited in this respect. The optical characteristic diagram refers to an image acquired through an optical camera, and the thermal characteristic diagram refers to an image acquired through an infrared array sensor. It should be noted that, because the distance between and the number of the optical cameras and infrared array sensors affect image acquisition, they should be arranged at multiple angles and, as far as possible, in pairs, so that the distance between each optical camera and its paired infrared array sensor is as small as possible. For example, an optical camera and an infrared array sensor may be provided at the entrance of the experimental area to acquire images near the entrance, and an optical camera and an infrared array sensor may be provided at each test bench in the experimental area to acquire images around each bench. This avoids problems such as missing features or features too small to be identified that would result from unreasonable placement of the optical cameras and infrared array sensors.
Step 102, inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object.
Specifically, the first network model and the second network model are both pre-trained deep learning models, the first network model is used for recognizing the posture of the target object, and the second network model is used for recognizing the wearing condition of the wearing object of the target object. The wearing objects include, but are not limited to, goggles, masks, gloves, protective clothing, etc.
Step 103, respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification.
Specifically, the preset specification refers to a preset specification related to the operation and wearing condition of the experimenter. For example, the preset specification can be used for specifying the posture of the experimenter for pouring liquid medicines, the posture of taking solid medicines, the distance between the head and the medicines, and whether goggles, masks, gloves and the like need to be worn. Therefore, the posture of the target object and the wearing condition of the wearing object of the target object can be matched with the preset specification, if the posture of the target object and the wearing condition of the wearing object of the target object are matched with the preset specification, the operation and wearing condition of the target object are in compliance, and early warning is not needed; if the posture of the target object or the wearing condition of the wearing object of the target object is not matched with the preset specification, the operation or the wearing condition of the target object is not in compliance, and early warning is needed.
Step 104, under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, early warning is carried out on the behavior of the target object.
Specifically, the early warning modes include an audible and visual reminding mode, a voice reminding mode and the like, and the application is not limited specifically. When the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, the behavior of the target object can be early warned, so that the target object is reminded to correct the posture and the wearing condition of the target object in time.
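To make the overall flow of steps 101 to 104 concrete, the following is a minimal Python sketch of the early warning loop. The function names, the dictionary returned by the second model, and the structure of the preset specification are illustrative assumptions and are not taken from the original disclosure.

```python
# Illustrative sketch of steps 102-104; step 101 is assumed to have already produced
# the optical and thermal characteristic diagrams. All names below are assumptions.

PRESET_SPEC = {
    "allowed_postures": {"standard_operation"},          # hypothetical posture labels
    "required_wearing": {"goggles", "mask", "gloves"},   # hypothetical wearing objects
}

def behavior_early_warning(optical_map, thermal_map, first_model, second_model, notify):
    # Step 102: posture from the optical and thermal diagrams, wearing condition from the optical diagram.
    posture = first_model(optical_map, thermal_map)
    wearing = second_model(optical_map)   # e.g. {"goggles": True, "mask": False, "gloves": True}

    # Step 103: compare posture and wearing condition against the preset specification.
    posture_ok = posture in PRESET_SPEC["allowed_postures"]
    wearing_ok = all(wearing.get(item, False) for item in PRESET_SPEC["required_wearing"])

    # Step 104: warn (e.g. audible/visual or voice reminder) if either check fails.
    if not (posture_ok and wearing_ok):
        notify(posture_ok, wearing_ok)
    return posture_ok and wearing_ok
```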
In this embodiment, the posture of the target object can be recognized based on both the optical characteristic diagram and the thermal characteristic diagram. Compared with the traditional approach of determining the posture of the target object by relying only on the optical characteristic diagram, this effectively avoids the influence of factors such as lighting and occlusions, thereby improving the accuracy of human body posture recognition. In addition, the scheme can also identify the wearing condition of the wearing object of the target object and give an early warning when wearing objects are not worn as required, thereby improving the safety of the target object.
Further, the first network model comprises a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer and a full-connection layer;
the step 102 of inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object includes:
inputting the optical characteristic diagram into a first network layer for gesture recognition to obtain a confidence diagram of gesture key points of the target object, wherein the optical characteristic diagram and the thermal characteristic diagram both carry time stamps;
inputting the fused feature map into a first convolution layer for convolution calculation to obtain a first intermediate feature map, wherein the fused feature map is obtained by fusing a thermal feature map and a confidence map corresponding to an optical feature map with the same time stamp according to the time stamp;
inputting the first intermediate characteristic diagram into a first pooling layer for pooling to obtain a second intermediate characteristic diagram;
inputting the second intermediate feature map into a deconvolution layer for deconvolution calculation to obtain a third intermediate feature map;
and inputting the third intermediate characteristic diagram into the full-connection layer for data vectorization, and classifying according to the obtained vector to obtain the posture of the target object.
In one embodiment, the first network model may be A Student Experiment Operation Supervision Algorithm Network (ASEOSAN) based on gesture recognition. The first network model may include a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer, and a full-connection layer, as shown in fig. 2. After the optical characteristic diagram and the thermal characteristic diagram are input into the first network model, the optical characteristic diagram collected by the optical camera may be input into the first network layer to generate a confidence map of the pose key points of the target object, where the confidence map of the pose key points of the target object may be as shown in fig. 3; the thermal characteristic diagram and the confidence map corresponding to the optical characteristic diagram that carry the same time stamp are then fused according to the time stamps carried by the optical characteristic diagram and the thermal characteristic diagram, to obtain a fused feature map. The fused feature map is input into the first convolution layer for convolution calculation, extracting image features of the fused feature map to obtain a first intermediate feature map; the first intermediate feature map is input into the first pooling layer for pooling to reduce the influence of useless information and enhance the feature information, obtaining a second intermediate feature map; the second intermediate feature map is input into the deconvolution layer for deconvolution calculation to strengthen the feature information and avoid information loss, obtaining a third intermediate feature map; and finally, the third intermediate feature map is input into the full-connection layer for data vectorization, and classification is performed according to the obtained linear vector to obtain the posture of the target object.
As an optional implementation manner, the first network layer of the first network model may be an AlphaPose network, the first convolution layer of the first network model may be a three-dimensional (3D) convolution layer comprising a plurality of serially connected 3D sub-convolution layers, the first pooling layer of the first network model may be a MaxPooling layer, and the deconvolution layer of the first network model may be a 3D deconvolution layer comprising a plurality of serially connected 3D sub-deconvolution layers. As shown in fig. 4, in the first network model there are four 3D sub-convolution layers and four 3D sub-deconvolution layers: a confidence map of the pose key points of the target object is generated by the AlphaPose network, image features are extracted through the four 3D sub-convolution layers, the MaxPooling layer and the four 3D sub-deconvolution layers, linear vectors are generated through the full-connection layer, and finally a pose indication is output, for example, "0" represents a correct operation specification and "1" represents an incorrect operation specification.
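The following PyTorch-style sketch shows one possible reading of this structure. The channel counts, kernel sizes, number of posture classes, the use of channel stacking for fusion and of adaptive pooling before the full-connection layer are all assumptions made for illustration; the AlphaPose stage is not reproduced and is assumed to have already produced the keypoint confidence maps.

```python
import torch
import torch.nn as nn

class FirstNetworkModel(nn.Module):
    """Illustrative sketch only: layer sizes and class count are assumptions,
    not values taken from the original disclosure."""

    def __init__(self, in_channels=18, num_classes=2):
        super().__init__()
        # First convolution layer: four serially connected 3D sub-convolution layers.
        convs, c = [], in_channels
        for _ in range(4):
            convs += [nn.Conv3d(c, 32, kernel_size=3, padding=1), nn.ReLU()]
            c = 32
        self.conv3d = nn.Sequential(*convs)
        # First pooling layer (MaxPooling) to suppress uninformative responses.
        self.pool = nn.MaxPool3d(kernel_size=2)
        # Deconvolution layer: four serially connected 3D sub-deconvolution layers.
        deconvs = []
        for _ in range(4):
            deconvs += [nn.ConvTranspose3d(32, 32, kernel_size=3, padding=1), nn.ReLU()]
        self.deconv3d = nn.Sequential(*deconvs)
        # Full-connection layer for data vectorization and classification.
        self.to_vector = nn.AdaptiveAvgPool3d(1)   # assumption: fixed-length vectorization
        self.fc = nn.Linear(32, num_classes)

    def forward(self, fused_map):
        # fused_map: (batch, channels, frames, height, width); assumed to be the keypoint
        # confidence maps of the optical diagram stacked with the same-timestamp thermal diagram.
        x = self.conv3d(fused_map)
        x = self.pool(x)
        x = self.deconv3d(x)
        x = self.fc(torch.flatten(self.to_vector(x), 1))
        return x   # e.g. logits where "0" = correct operation and "1" = incorrect operation
```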
It should be noted that, when the optical camera and the infrared array sensor are used to simultaneously acquire characteristic data (such as an optical characteristic diagram and a thermal characteristic diagram), there is a time error, so that when the optical characteristic diagram and the thermal characteristic diagram are processed, data alignment is required to be performed on the two. In this embodiment, time alignment may be performed based on time information such as a time stamp carried by the optical characteristic diagram and the thermal characteristic diagram, so as to implement data fusion processing of the two. The time stamp carried by the optical characteristic diagram is used for representing the time point when the optical camera collects the optical characteristic diagram, and the time stamp carried by the thermal characteristic diagram is used for representing the time point when the infrared array sensor collects the thermal characteristic diagram.
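As a small illustration of the timestamp-based alignment described here, the frames could be paired as follows; the tolerance value and the (timestamp, data) tuple format are assumptions.

```python
def pair_by_timestamp(optical_frames, thermal_frames, tolerance_s=0.05):
    """Pair each optical-derived confidence map with the thermal characteristic diagram whose
    timestamp is closest, dropping pairs whose time difference exceeds the tolerance."""
    pairs = []
    for t_opt, confidence_map in optical_frames:            # (timestamp, data) tuples
        t_thm, thermal_map = min(thermal_frames, key=lambda frame: abs(frame[0] - t_opt))
        if abs(t_thm - t_opt) <= tolerance_s:
            pairs.append((confidence_map, thermal_map))
    return pairs
```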
In this embodiment, the feature information of the optical feature map and the thermal feature map may be fully fused to determine the posture of the target object based on the fused feature information. Compared with a conventional method of determining the posture of the target object by relying only on the optical feature map, this may effectively avoid the influence of factors such as lighting and occlusions, thereby improving the accuracy of human body posture recognition.
Further, the second network model comprises an encoding layer and a decoding layer;
the step 102 of inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object includes:
inputting the optical characteristic diagram into an encoding layer for encoding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields;
and inputting the fourth intermediate characteristic diagram into a decoding layer for decoding to obtain the wearing condition of the wearing object of the target object.
In an embodiment, the second network model may be a Student Experiment Operation Supervision Algorithm Network (TSEOSAN) based on target detection. The second network model may comprise an encoding layer and a decoding layer, so the optical characteristic diagram can be input into the encoding layer for encoding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields, and the fourth intermediate characteristic diagram is then input into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object. Since features of the optical characteristic diagram are extracted in the encoding stage over a plurality of different receptive fields and then fused, the resulting fourth intermediate characteristic diagram contains both the low-level features and the high-level features of the optical characteristic diagram. The low-level features have higher resolution and contain more position and detail information, but because they pass through fewer convolutions they carry weaker semantics and more noise; the high-level features have stronger semantic information but lower resolution and poorer perception of detail. Fusing the two efficiently, taking the strengths of each and discarding the dross, is the key to improving the segmentation model, and effectively solves the problem that the wearing object of the target object is difficult to extract completely from the image.
In this embodiment, the wearing condition of the wearing object of the target object may be identified through the second network model, and a behavior of wearing the wearing object without being performed according to a requirement is pre-warned, so that the security of the target object is improved.
Further, the coding layer comprises a first fusion layer and a plurality of second convolution layers with different weight coefficients, the sizes of convolution kernels corresponding to different second convolution layers are different, and the sizes of convolution kernels corresponding to the second convolution layers are in inverse proportion to the weight coefficients corresponding to the second convolution layers;
the above steps, inputting the optical characteristic diagram into the coding layer for coding, obtaining a fourth intermediate characteristic diagram fused with a plurality of receptive fields, including:
inputting the optical characteristic diagrams into the plurality of second convolution layers respectively to carry out convolution calculation to obtain characteristic diagrams corresponding to a plurality of receptive fields;
and inputting the feature maps corresponding to the multiple receptive fields into the first fusion layer for feature fusion to obtain a fourth intermediate feature map.
In an embodiment, the model structure of the second network model is as shown in fig. 5. The coding layer may include a first fusion layer and a plurality of second convolution layers with different weight coefficients; the sizes of the convolution kernels corresponding to different second convolution layers are different, and the sizes of the convolution kernels are inversely proportional to the corresponding weight coefficients, that is, the smaller the convolution kernel, the larger the corresponding weight coefficient, and the larger the convolution kernel, the smaller the corresponding weight coefficient. Therefore, the second network model can be used to extract the relevant information of the wearing object of the target object and perform weighted feature channel fusion with the first part at each skip connection stage, so that weight sharing is performed, the feature data of the wearing object of the target object is effectively extracted, and the wearing object of the target object can subsequently be identified effectively.
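The following PyTorch-style sketch illustrates one way such an encoding layer could look; the channel counts are assumptions, the 1×1/3×3/5×5 kernels and the 5:3:2 weighting follow the later description of fig. 6, and the fusion is realised here as a weighted sum, which the original does not specify.

```python
import torch
import torch.nn as nn

class MultiReceptiveFieldEncoder(nn.Module):
    """Sketch: parallel second convolution layers with different kernel sizes, where smaller
    kernels receive larger weight coefficients (assumed 5:3:2, per the later description)."""

    def __init__(self, in_channels=3, out_channels=64):
        super().__init__()
        kernel_sizes = (1, 3, 5)
        weights = (0.5, 0.3, 0.2)   # inversely related to kernel size
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_channels, out_channels, k, padding=k // 2) for k in kernel_sizes]
        )
        self.register_buffer("weights", torch.tensor(weights))

    def forward(self, optical_map):
        # Feature maps for several receptive fields, fused in the "first fusion layer"
        # (realised here as a weighted sum; the original does not specify the fusion rule).
        branch_maps = [branch(optical_map) for branch in self.branches]
        fused = sum(w * fm for w, fm in zip(self.weights, branch_maps))
        return fused   # fourth intermediate characteristic diagram
```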
Further, the decoding layers comprise a second fusion layer, a classifier layer, a plurality of first sub-decoding layers connected in series and a plurality of second sub-decoding layers connected in series, and the plurality of second sub-decoding layers are connected in series after the plurality of first sub-decoding layers; each first sub-decoding layer comprises a second pooling layer and a third convolution layer, and each second sub-decoding layer comprises an up-sampling layer and a fourth convolution layer;
the step of inputting the fourth intermediate feature map into a decoding layer for decoding to obtain the wearing condition of the wearing object of the target object includes:
sequentially inputting the fourth intermediate characteristic diagram into a plurality of first sub-decoding layers and a plurality of second sub-decoding layers for decoding to obtain a fifth intermediate characteristic diagram corresponding to each first sub-decoding layer and each second sub-decoding layer;
inputting the fifth intermediate feature map into the second fusion layer for feature fusion to obtain a sixth intermediate feature map;
and inputting the sixth intermediate characteristic diagram into the classifier layer to obtain the wearing condition of the wearing object of the target object.
Specifically, the number of first sub-decoding layers, the number of second sub-decoding layers, the number of sub-convolution layers in the third convolution layer, and the number of sub-convolution layers in the fourth convolution layer may all be set according to actual needs, and the present application is not particularly limited in this respect. As an alternative embodiment, with continued reference to fig. 5, the decoding layers may include a second fusion layer, a classifier layer, a plurality of serially connected first sub-decoding layers, and a plurality of serially connected second sub-decoding layers, the plurality of second sub-decoding layers being serially connected after the plurality of first sub-decoding layers. In this way, convolution calculations with convolution kernels of different sizes are carried out in the encoding stage to obtain shape information, while in the decoding stage deconvolution is replaced by bilinear interpolation followed by convolution, which effectively enriches the semantic information during up-sampling.
Before the optical characteristic diagram is input into the second network model, it needs to be cropped into an image of uniform scale according to the central region, for example cropped to a size of 256 × 320 × z, which facilitates feature extraction by the second network model.
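A small illustration of this central cropping (the 256 × 320 target size comes from the paragraph above; the helper function itself is an assumption and assumes the image is at least that large):

```python
def center_crop(image, out_h=256, out_w=320):
    """Crop the optical characteristic diagram to a uniform scale around its center
    (channel dimensions beyond height and width are kept unchanged)."""
    h, w = image.shape[0], image.shape[1]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return image[top:top + out_h, left:left + out_w]
```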
As an alternative, referring to fig. 6, in the encoding process convolution kernels of 1×1, 3×3 and 5×5 may be used to perform convolution calculations on the optical characteristic diagram, so that feature maps corresponding to a plurality of different receptive fields are obtained, and these feature maps are then fused according to their weight coefficients, the weight coefficients corresponding to the 3 convolution kernels being distributed in the ratio 5:3:2. The feature map after feature fusion is input sequentially into a plurality of first sub-decoding layers and a plurality of second sub-decoding layers to obtain the respective fifth intermediate feature maps, where each first sub-decoding layer comprises 1 pooling layer and 3 series-connected sub-convolution layers, and each second sub-decoding layer comprises 1 upsampling layer and 3 or 5 series-connected sub-convolution layers. Finally, the fifth intermediate feature maps obtained from each first sub-decoding layer and each second sub-decoding layer are fused, a convolution calculation is performed on the fused result, and classification is carried out to obtain the wearing condition of the wearing object of the target object. In this way, the second network model extracts the information related to the wearing object of the target object, weighted feature channel fusion with the first part is performed at each skip connection stage, and weight sharing is performed with the 1×1, 3×3 and 5×5 weight coefficients distributed in the ratio 5:3:2, so that the feature data of the wearing object of the target object is effectively extracted and the wearing object is effectively identified.
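Read together with fig. 5 and fig. 6, the decoding stage could be sketched as follows in PyTorch style; the number of sub-decoding layers, the channel counts, the resizing used before fusing the fifth intermediate feature maps, and the classifier head are all illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

def conv_block(channels, n_convs):
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)

class Decoder(nn.Module):
    """Sketch: first sub-decoding layers (pooling + convolutions) followed by second
    sub-decoding layers (bilinear upsampling + convolutions); counts are assumptions."""

    def __init__(self, channels=64, num_classes=2):
        super().__init__()
        self.first_subs = nn.ModuleList([
            nn.Sequential(nn.MaxPool2d(2), conv_block(channels, 3)) for _ in range(2)
        ])
        self.second_subs = nn.ModuleList([
            nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                          conv_block(channels, 3)) for _ in range(2)
        ])
        self.fuse_conv = nn.Conv2d(channels, channels, 3, padding=1)   # convolution after fusion
        self.classifier = nn.Conv2d(channels, num_classes, 1)          # classifier layer

    def forward(self, fourth_map):
        fifth_maps, x = [], fourth_map
        for layer in list(self.first_subs) + list(self.second_subs):
            x = layer(x)
            fifth_maps.append(x)
        # Second fusion layer: fuse the fifth intermediate feature maps at a common resolution.
        target = fifth_maps[-1].shape[-2:]
        fused = sum(F.interpolate(m, size=target, mode="bilinear", align_corners=False)
                    for m in fifth_maps)
        return self.classifier(self.fuse_conv(fused))
```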
In the embodiment, the weight distribution and feature fusion ideas are utilized to fuse feature maps with different scales, so that the image segmentation performance is improved, and the wearing object of the target object in the optical feature map is effectively identified.
Further, before the step 102 of inputting the optical characteristic diagram and the thermal characteristic diagram into the pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into the pre-trained second network model to obtain the wearing condition of the wearing object of the target object, the method further includes:
acquiring a training sample diagram, wherein the training sample diagram comprises an optical characteristic sample diagram and a thermodynamic characteristic sample diagram, the optical characteristic sample diagram carries label information for representing different postures, and the thermodynamic characteristic sample diagram carries label information for representing the wearing conditions of different wearing objects;
and inputting the optical characteristic sample diagram and the thermodynamic characteristic sample diagram into a first model to be trained for training, inputting the optical characteristic sample diagram into a second model to be trained for training, and training to obtain a first network model and a second network model when the sum of the loss values of the first model to be trained and the second model to be trained is smaller than a preset threshold value.
Specifically, the optical characteristic sample map and the thermodynamic characteristic sample map each carry label information, and the label information is used for characterizing the posture of the target object in each sample map or the wearing condition of the wearing object of the target object. The preset threshold value can be set according to actual needs, and the application is not particularly limited.
In one embodiment, before the first network model and the second network model are used for predicting the optical characteristic diagram and the thermodynamic characteristic diagram, the first model to be trained and the second model to be trained need to be trained. Specifically, an optical characteristic sample diagram and a thermodynamic characteristic sample diagram are divided into a training set and a testing set according to a preset proportion, then a first model to be trained and a second model to be trained are trained and tested, and when the sum of loss values of the first model to be trained and the second model to be trained is smaller than a preset threshold value, the training is stopped, so that a first network model and a second network model are obtained.
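A minimal sketch of this training procedure is shown below; the optimizer, learning rate, maximum number of epochs, batch format and the way the two loss terms are computed are assumptions, with only the stopping criterion (sum of the two loss values below a preset threshold) taken from the text.

```python
import torch

def train_until_threshold(model_1, model_2, train_loader, loss_p_fn, loss_t_fn,
                          threshold=0.05, max_epochs=100, lr=1e-3):
    """Train both models jointly and stop once Loss = Loss_p + Loss_t falls below the threshold."""
    params = list(model_1.parameters()) + list(model_2.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for epoch in range(max_epochs):
        running = 0.0
        for optical, thermal, posture_labels, wearing_labels in train_loader:
            optimizer.zero_grad()
            loss_p = loss_p_fn(model_1(optical, thermal), posture_labels)   # first model to be trained
            loss_t = loss_t_fn(model_2(optical), wearing_labels)            # second model to be trained
            loss = loss_p + loss_t
            loss.backward()
            optimizer.step()
            running += loss.item()
        if running / max(len(train_loader), 1) < threshold:   # preset threshold reached
            break
    return model_1, model_2
```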
Before the optical characteristic sample diagram and the thermodynamic characteristic sample diagram are input into the first model to be trained and the second model to be trained, the optical characteristic diagram needs to be cropped into images of a uniform scale according to the central region. Meanwhile, One-Hot encoding, that is, one-bit effective encoding, also needs to be performed on the label data in the sample diagrams: an N-bit state register is used to encode N states, each state has its own independent register bit, and only one bit is effective at any time. This is because, in machine learning algorithms, classification features such as a person's gender (male or female) or country (China, the United States, France, and so on) are discrete and unordered rather than continuous, and therefore cannot be fed directly into a machine learning algorithm for processing. One-Hot encoding solves this problem well. Suppose the feature information of the wearing object of the target object comprises 3 features, namely whether goggles are worn, whether a mask is worn, and whether gloves are worn, each represented by a binary vector: goggles worn is 01 and not worn is 10, mask worn is 01 and not worn is 10, gloves worn is 01 and not worn is 10. Suppose further that target object 1 in sample figure 1 wears goggles but no mask and no gloves, target object 2 in sample figure 2 wears a mask and gloves but no goggles, and target object 3 in sample figure 3 wears goggles and a mask but no gloves. The features of these 3 sample figures are then as follows:
Feature            Goggles worn or not    Mask worn or not    Gloves worn or not
Sample fig. 1      01                     10                  10
Sample fig. 2      10                     01                  01
Sample fig. 3      01                     01                  10
Therefore, these 3 sample maps can be represented by One-Hot coding as:
sample fig. 1: [011010];
sample fig. 2: [100101];
sample fig. 3: [010110].
Therefore, data representation and processing can be carried out on different types of features in the sample graph, and the first model to be trained and the second model to be trained can better understand semantic information in the sample graph.
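The three encodings above can be reproduced with a few lines of Python; the helper function below is purely illustrative.

```python
def encode_wearing(goggles_worn, mask_worn, gloves_worn):
    """Two bits per wearing object, as in the example: worn -> "01", not worn -> "10"."""
    bit = lambda worn: "01" if worn else "10"
    return bit(goggles_worn) + bit(mask_worn) + bit(gloves_worn)

print(encode_wearing(True, False, False))   # sample fig. 1 -> 011010
print(encode_wearing(False, True, True))    # sample fig. 2 -> 100101
print(encode_wearing(True, True, False))    # sample fig. 3 -> 010110
```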
Further, the calculation formula of the sum of the loss values of the first model to be trained and the second model to be trained is as follows:
Loss = Loss_p + Loss_t
Loss_p = f1 + α*f2
Loss_t: given by an equation shown as an image in the original publication, combining L_conf and L_loc over the N selection boxes
where Loss_p represents the loss value of the human posture, Loss_t represents the loss value of the wearing condition of the wearing object, f1 represents the loss value of the bone key points, f2 represents the loss value of the bone joints, L_conf represents the loss value of the wearing object, L_loc represents the loss value of the selection box corresponding to the wearing object, N represents the total number of selection boxes corresponding to the wearing object, α and λ are weight coefficients, x takes the value 1 if a prediction selection box is paired with the real selection box and 0 otherwise, c represents the predicted value of the wearing object, c_0 represents the real value of the wearing object, l represents the center position and the length and width of the prediction selection box, g represents the center position and the length and width of the real selection box, s_0 represents the area of the real selection box, and s represents the area of the prediction selection box.
The loss value f1 of the bone key points is:
f1: given by an equation shown as an image in the original publication
where K represents the number of bone key points, i indexes the i-th bone key point, X_i0 represents the abscissa of the true bone key point, X_i represents the abscissa of the predicted bone key point, y_i0 represents the ordinate of the true bone key point, and y_i represents the ordinate of the predicted bone key point.
The loss value f2 of the bone joint is as follows:
f2: given by an equation shown as an image in the original publication
where M represents the number of bone joints, j indexes the j-th bone joint, and the four symbols shown as images in the original denote, respectively, the true joint vector of the j-th bone joint, the modulus of the true joint vector of the j-th bone joint, the predicted joint vector of the j-th bone joint, and the modulus of the predicted joint vector of the j-th bone joint.
The loss value L_conf of the wearing object and the loss value L_loc of the selection box corresponding to the wearing object are given by equations shown as images in the original publication,
where x takes the value 1 if a prediction selection box is paired with the real selection box and 0 otherwise, c represents the predicted value of the wearing object, c_0 represents the real value of the wearing object, l represents the center position and the length and width of the prediction selection box, g represents the center position and the length and width of the real selection box, s_0 represents the area of the real selection box, and s represents the area of the prediction selection box.
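Because the individual terms are only given as images in the original publication, the sketch below uses common stand-in definitions (mean squared keypoint error for f1, a cosine-similarity-based term for f2, and an SSD-style confidence/localization pair for L_conf and L_loc) purely to make the structure Loss = Loss_p + Loss_t concrete; none of these expressions should be read as the patent's actual formulas.

```python
import torch.nn.functional as F

def posture_loss(pred_kpts, true_kpts, pred_joints, true_joints, alpha=0.5):
    """Loss_p = f1 + alpha * f2, with stand-in definitions for f1 and f2."""
    # f1: mean squared distance between predicted and true bone key points (stand-in).
    f1 = ((pred_kpts - true_kpts) ** 2).sum(dim=-1).mean()
    # f2: one minus cosine similarity between predicted and true joint vectors (stand-in).
    f2 = (1.0 - F.cosine_similarity(pred_joints, true_joints, dim=-1)).mean()
    return f1 + alpha * f2

def wearing_loss(pred_cls, true_cls, pred_boxes, true_boxes, matched, lam=1.0):
    """Loss_t combining L_conf and L_loc over the N matched selection boxes (stand-in)."""
    n = matched.sum().clamp(min=1)
    l_conf = F.cross_entropy(pred_cls, true_cls, reduction="sum")
    l_loc = F.smooth_l1_loss(pred_boxes[matched], true_boxes[matched], reduction="sum")
    return (l_conf + lam * l_loc) / n

def total_loss(loss_p, loss_t):
    # Loss = Loss_p + Loss_t, the quantity compared against the preset threshold during training.
    return loss_p + loss_t
```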
Referring to fig. 7, fig. 7 is a schematic structural diagram of a behavior early warning device provided in the embodiment of the present application. As shown in fig. 7, the behavior warning apparatus 700 includes:
a first obtaining module 701, configured to obtain an optical characteristic map and a thermal characteristic map of a target object in a target area;
the input module 702 is configured to input the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain a posture of the target object, and input the optical characteristic diagram into a pre-trained second network model to obtain a wearing condition of a wearing object of the target object;
the judging module 703 is configured to respectively judge whether the posture of the target object and the wearing condition of the wearing object of the target object match a preset specification;
the early warning module 704 is configured to perform early warning on the behavior of the target object when the posture of the target object does not match the preset specification, or when the wearing condition of the wearing object of the target object does not match the preset specification.
Further, the first network model comprises a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer and a full-connection layer; the input module 702 includes:
the first input submodule is used for inputting the optical characteristic diagram to the first network layer for gesture recognition to obtain a confidence diagram of gesture key points of the target object, wherein the optical characteristic diagram and the thermal characteristic diagram both carry time stamps;
the second input submodule is used for inputting the fused feature map into the first convolution layer for convolution calculation to obtain a first intermediate feature map, and the fused feature map is obtained by fusing the thermal feature map and the confidence map corresponding to the optical feature map, which have the same time stamp, according to the time stamp;
the third input submodule is used for inputting the first intermediate characteristic diagram into the first pooling layer for pooling to obtain a second intermediate characteristic diagram;
the fourth input submodule is used for inputting the second intermediate characteristic diagram into the deconvolution layer for deconvolution calculation to obtain a third intermediate characteristic diagram;
and the fifth input submodule is used for inputting the third intermediate characteristic diagram into the full-connection layer for data vectorization, and classifying according to the obtained vector to obtain the posture of the target object.
Further, the second network model comprises an encoding layer and a decoding layer; the input module 702 includes:
the sixth input submodule is used for inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields;
and the seventh input submodule is used for inputting the fourth intermediate characteristic diagram into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object.
Further, the coding layer comprises a first fusion layer and a plurality of second convolution layers with different weight coefficients, the sizes of convolution kernels corresponding to different second convolution layers are different, and the sizes of convolution kernels corresponding to the second convolution layers are in inverse proportion to the weight coefficients corresponding to the second convolution layers; the sixth input submodule includes:
the first input unit is used for inputting the optical characteristic diagrams into the second convolution layers respectively to carry out convolution calculation so as to obtain characteristic diagrams corresponding to a plurality of receptive fields;
and the first fusion unit is used for inputting the feature maps corresponding to the multiple receptive fields into the first fusion layer for feature fusion to obtain a fourth intermediate feature map.
Further, the decoding layers comprise a second fusion layer, a classifier layer, a plurality of first sub-decoding layers connected in series and a plurality of second sub-decoding layers connected in series, and the plurality of second sub-decoding layers are connected in series after the plurality of first sub-decoding layers; each first sub-decoding layer comprises a second pooling layer and a third convolution layer, and each second sub-decoding layer comprises an up-sampling layer and a fourth convolution layer; the seventh input submodule includes:
a second input unit, configured to input the fourth intermediate feature map to the multiple first sub-decoding layers and the multiple second sub-decoding layers in sequence for decoding, so as to obtain a fifth intermediate feature map corresponding to each first sub-decoding layer and each second sub-decoding layer;
the second fusion unit is used for inputting the fifth intermediate feature map into the second fusion layer for feature fusion to obtain a sixth intermediate feature map;
and the third input unit is used for inputting the sixth intermediate feature map into the classifier layer to obtain the wearing condition of the wearing object of the target object.
Further, the apparatus 700 further comprises:
the second acquisition module is used for acquiring a training sample map, the training sample map comprises an optical characteristic sample map and a thermodynamic characteristic sample map, the optical characteristic sample map carries label information for representing different postures, and the thermodynamic characteristic sample map carries label information for representing the wearing conditions of different wearing objects;
and the training module is used for inputting the optical characteristic sample diagram and the thermodynamic characteristic sample diagram into a first model to be trained for training, inputting the optical characteristic sample diagram into a second model to be trained for training, and training to obtain a first network model and a second network model under the condition that the sum of the loss values of the first model to be trained and the second model to be trained is smaller than a preset threshold value.
The calculation formula of the sum of the loss values of the first model to be trained and the second model to be trained is as follows:
Loss = Loss_p + Loss_t
Loss_p = f1 + α*f2
Loss_t: given by an equation shown as an image in the original publication, combining L_conf and L_loc over the N selection boxes
where Loss_p represents the loss value of the human posture, Loss_t represents the loss value of the wearing condition of the wearing object, f1 represents the loss value of the bone key points, f2 represents the loss value of the bone joints, L_conf represents the loss value of the wearing object, L_loc represents the loss value of the selection box corresponding to the wearing object, N represents the total number of selection boxes corresponding to the wearing object, α and λ are weight coefficients, x takes the value 1 if a prediction selection box is paired with the real selection box and 0 otherwise, c represents the predicted value of the wearing object, c_0 represents the real value of the wearing object, l represents the center position and the length and width of the prediction selection box, g represents the center position and the length and width of the real selection box, s_0 represents the area of the real selection box, and s represents the area of the prediction selection box.
It should be noted that the apparatus 700 implements the steps of the behavior early warning method provided by any one of the foregoing method embodiments and can achieve the same technical effects, which are not described in detail herein again.
As shown in fig. 8, an electronic device according to an embodiment of the present application includes a processor 811, a communication interface 812, a memory 813, and a communication bus 814, where the processor 811, the communication interface 812, and the memory 813 communicate with each other through the communication bus 814;
a memory 813 for storing a computer program;
in an embodiment of the present application, the processor 811, when executing the program stored in the memory 813, is configured to implement the behavior early warning method provided in any one of the foregoing method embodiments, including:
acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area;
inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object;
respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification;
and under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, early warning is carried out on the behavior of the target object.
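Putting the four steps together, a hypothetical end-to-end sketch of the flow executed by the processor 811 might look like the following; the sensor callables, model call signatures, specification format and alert channel are all illustrative assumptions.

```python
# Hypothetical end-to-end early-warning flow; every interface here is an assumption.
def behavior_early_warning(get_optical, get_thermal, first_model, second_model,
                           allowed_postures, required_wearables, alert):
    optical_map = get_optical()                        # optical characteristic diagram
    thermal_map = get_thermal()                        # thermal characteristic diagram
    posture = first_model(optical_map, thermal_map)    # posture of the target object
    wearing = second_model(optical_map)                # wearing condition (set of items)
    posture_ok = posture in allowed_postures
    wearing_ok = required_wearables.issubset(wearing)
    if not (posture_ok and wearing_ok):                # mismatch with the preset specification
        alert(f"non-compliant behavior: posture={posture!r}, wearing={sorted(wearing)}")


# Toy usage with stubbed sensors and models.
behavior_early_warning(
    get_optical=lambda: "optical-frame",
    get_thermal=lambda: "thermal-frame",
    first_model=lambda o, t: "crouching",
    second_model=lambda o: {"helmet"},
    allowed_postures={"standing", "walking"},
    required_wearables={"helmet", "safety vest"},
    alert=print,
)
```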
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the behavior early warning method provided in any one of the foregoing method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method of behavioral early warning, the method comprising:
acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area;
inputting the optical characteristic diagram and the thermal characteristic diagram into a first network model trained in advance to obtain the posture of the target object, and inputting the optical characteristic diagram into a second network model trained in advance to obtain the wearing condition of the wearing object of the target object;
respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification;
under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification, early warning is carried out on the behavior of the target object;
the first network model comprises a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer and a full-connection layer;
inputting the optical characteristic diagram and the thermal characteristic diagram into a first network model trained in advance to obtain the posture of the target object, wherein the posture obtaining method comprises the following steps:
inputting the optical characteristic diagram into the first network layer for posture recognition to obtain a confidence diagram of the posture key points of the target object, wherein the optical characteristic diagram and the thermal characteristic diagram both carry time stamps;
inputting the fused feature map into the first convolution layer to perform convolution calculation to obtain a first intermediate feature map, wherein the fused feature map is obtained by fusing the thermal feature map and the confidence map corresponding to the optical feature map, which have the same time stamp, according to the time stamp;
inputting the first intermediate characteristic diagram into the first pooling layer for pooling to obtain a second intermediate characteristic diagram;
inputting the second intermediate feature map into the deconvolution layer for deconvolution calculation to obtain a third intermediate feature map;
inputting the third intermediate characteristic diagram into the full-connection layer for data vectorization, and classifying according to the obtained vector to obtain the posture of the target object;
wherein the second network model comprises an encoding layer and a decoding layer;
the inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object includes:
inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields;
inputting the fourth intermediate characteristic diagram into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object;
the coding layer comprises a first fusion layer and a plurality of second convolution layers with different weight coefficients, the sizes of convolution kernels corresponding to the second convolution layers are different, and the sizes of convolution kernels corresponding to the second convolution layers are in inverse proportion to the weight coefficients corresponding to the second convolution layers;
inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields, wherein the fourth intermediate characteristic diagram comprises:
inputting the optical characteristic diagrams into the plurality of second convolution layers respectively to carry out convolution calculation so as to obtain characteristic diagrams corresponding to a plurality of receptive fields;
and inputting the feature maps corresponding to the multiple receptive fields into the first fusion layer for feature fusion to obtain the fourth intermediate feature map.
2. The method of claim 1, wherein the decoding layer comprises a second fusion layer, a classifier layer, a plurality of serially connected first sub-decoding layers, and a plurality of serially connected second sub-decoding layers, the plurality of second sub-decoding layers being serially connected after the plurality of first sub-decoding layers; wherein each of the first sub-decoding layers comprises a second pooling layer and a third convolutional layer, and each of the second sub-decoding layers comprises an upsampling layer and a fourth convolutional layer;
the inputting the fourth intermediate feature map into the decoding layer for decoding to obtain the wearing condition of the wearing object of the target object includes:
sequentially inputting the fourth intermediate characteristic diagram into a plurality of first sub-decoding layers and a plurality of second sub-decoding layers for decoding to obtain a fifth intermediate characteristic diagram corresponding to each first sub-decoding layer and each second sub-decoding layer;
inputting the fifth intermediate feature map into the second fusion layer for feature fusion to obtain a sixth intermediate feature map;
and inputting the sixth intermediate feature map into the classifier layer to obtain the wearing condition of the wearing object of the target object.
3. The method of claim 1, wherein before inputting the optical signature and the thermal signature into a first pre-trained network model to obtain the pose of the target object and inputting the optical signature into a second pre-trained network model to obtain the dressing of the target object, the method further comprises:
acquiring a training sample diagram, wherein the training sample diagram comprises an optical characteristic sample diagram and a thermodynamic characteristic sample diagram, and the optical characteristic sample diagram carries label information for representing different postures and label information for representing wearing conditions of different wearing objects;
inputting the optical characteristic sample diagram and the thermodynamic characteristic sample diagram into a first model to be trained for training, inputting the optical characteristic sample diagram into a second model to be trained for training, and training to obtain the first network model and the second network model when the sum of the loss values of the first model to be trained and the second model to be trained is smaller than a preset threshold value.
4. The method according to claim 3, wherein the sum of the loss values of the first model to be trained and the second model to be trained is calculated by:
Loss = Loss_p + Loss_t

Loss_p = f1 + α·f2

[The expression for Loss_t is given only as an image in the source and is not reproduced here.]

wherein Loss_p represents the loss value of the human posture, Loss_t represents the loss value of the wearing condition of the wearing object, f1 represents the loss value of the bone key points, f2 represents the loss value of the bone joints, L_conf represents the loss value of the wearing object, L_loc represents the loss value of the selection boxes corresponding to the wearing object, N represents the total number of selection boxes corresponding to the wearing object, α and λ are weight coefficients, x takes the value 1 if a prediction selection box is paired with the real selection box and 0 otherwise, c represents the predicted value of the wearing object, c_0 represents the real value of the wearing object, l represents the center position and the length and width of the prediction selection box, g represents the center position and the length and width of the real selection box, s_0 represents the area of the real selection box, and s represents the area of the prediction selection box; and the loss value f2 of the bone joints is:

[The expression for f2 is given only as an image in the source and is not reproduced here; it is defined in terms of M, the number of bone joints, and, for the jth bone joint, the true joint vector, the modulus of the true joint vector, the predicted joint vector, and the modulus of the predicted joint vector.]
5. A behavioral early warning device, the device comprising:
the first acquisition module is used for acquiring an optical characteristic diagram and a thermal characteristic diagram of a target object in a target area;
the input module is used for inputting the optical characteristic diagram and the thermal characteristic diagram into a pre-trained first network model to obtain the posture of the target object, and inputting the optical characteristic diagram into a pre-trained second network model to obtain the wearing condition of the wearing object of the target object;
the judging module is used for respectively judging whether the posture of the target object and the wearing condition of the wearing object of the target object are matched with a preset specification;
the early warning module is used for early warning the behavior of the target object under the condition that the posture of the target object is not matched with the preset specification or the wearing condition of the wearing object of the target object is not matched with the preset specification;
the first network model comprises a first network layer, a first convolution layer, a first pooling layer, a deconvolution layer and a full-connection layer; the input module includes:
the first input submodule is used for inputting the optical characteristic diagram to the first network layer for posture recognition to obtain a confidence diagram of posture key points of the target object, wherein the optical characteristic diagram and the thermal characteristic diagram both carry time stamps;
the second input submodule is used for inputting the fused feature map into the first convolution layer for convolution calculation to obtain a first intermediate feature map, and the fused feature map is obtained by fusing the thermal feature map with the same timestamp and the confidence map corresponding to the optical feature map according to the timestamp;
the third input submodule is used for inputting the first intermediate characteristic diagram into the first pooling layer for pooling to obtain a second intermediate characteristic diagram;
the fourth input submodule is used for inputting the second intermediate characteristic diagram into the deconvolution layer for deconvolution calculation to obtain a third intermediate characteristic diagram;
a fifth input submodule, configured to input the third intermediate feature map to the full connection layer for data vectorization, and classify according to the obtained vector to obtain the posture of the target object;
wherein the second network model comprises an encoding layer and a decoding layer; the input module further comprises:
the sixth input submodule is used for inputting the optical characteristic diagram into the coding layer for coding to obtain a fourth intermediate characteristic diagram fused with a plurality of receptive fields;
a seventh input sub-module, configured to input the fourth intermediate feature map to the decoding layer for decoding, so as to obtain a wearing condition of the wearing object of the target object;
the coding layer comprises a first fusion layer and a plurality of second convolutional layers with different weight coefficients, the sizes of convolution kernels corresponding to the second convolutional layers are different, and the sizes of the convolution kernels corresponding to the second convolutional layers are in inverse proportion to the weight coefficients corresponding to the second convolutional layers; the sixth input submodule includes:
the first input unit is used for inputting the optical characteristic diagrams into the second convolution layers respectively to carry out convolution calculation so as to obtain characteristic diagrams corresponding to a plurality of receptive fields;
and the first fusion unit is used for inputting the feature maps corresponding to the receptive fields into the first fusion layer for feature fusion to obtain the fourth intermediate feature map.
6. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the behavior early warning method of any one of claims 1-4 when executing the program stored in the memory.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the behavior early warning method as claimed in any one of claims 1 to 4.
CN202210801570.5A 2022-07-08 2022-07-08 Behavior early warning method and device, electronic equipment and storage medium Active CN114882596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210801570.5A CN114882596B (en) 2022-07-08 2022-07-08 Behavior early warning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210801570.5A CN114882596B (en) 2022-07-08 2022-07-08 Behavior early warning method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114882596A CN114882596A (en) 2022-08-09
CN114882596B true CN114882596B (en) 2022-11-15

Family

ID=82683092

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210801570.5A Active CN114882596B (en) 2022-07-08 2022-07-08 Behavior early warning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114882596B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784654B (en) * 2016-08-26 2020-09-25 杭州海康威视数字技术股份有限公司 Image segmentation method and device and full convolution network system
CN109492612A (en) * 2018-11-28 2019-03-19 平安科技(深圳)有限公司 Fall detection method and its falling detection device based on skeleton point
EP4266219A1 (en) * 2020-12-18 2023-10-25 Samsung Electronics Co., Ltd. Image processing device and multi-frame processing method using same

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106296624A (en) * 2015-06-11 2017-01-04 联想(北京)有限公司 A kind of image interfusion method and device
CN107146211A (en) * 2017-06-08 2017-09-08 山东师范大学 Retinal vascular images noise-reduction method based on line spread function and bilateral filtering
CN109241982A (en) * 2018-09-06 2019-01-18 广西师范大学 Object detection method based on depth layer convolutional neural networks
CN111144263A (en) * 2019-12-20 2020-05-12 山东大学 Construction worker high-fall accident early warning method and device
CN111666920A (en) * 2020-06-24 2020-09-15 浙江大华技术股份有限公司 Target object wearing detection method and device, storage medium and electronic device
CN112183471A (en) * 2020-10-28 2021-01-05 西安交通大学 Automatic detection method and system for standard wearing of epidemic prevention mask of field personnel
CN113129316A (en) * 2021-04-15 2021-07-16 重庆邮电大学 Heart MRI image multi-task segmentation method based on multi-mode complementary information exploration
CN113344326A (en) * 2021-04-30 2021-09-03 西安交通大学 System and method for identifying dynamic hidden danger and evaluating risk of multi-information fusion special operation site
CN113365382A (en) * 2021-08-10 2021-09-07 深圳市信润富联数字科技有限公司 Light control method and device, electronic equipment and storage medium
CN113628211A (en) * 2021-10-08 2021-11-09 深圳市信润富联数字科技有限公司 Parameter prediction recommendation method, device and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image classification method based on an improved multi-channel convolutional neural network model; Zhou Yanting; Journal of Jiamusi University (Natural Science Edition); 2019-11-15 (No. 06); 153-157 *

Also Published As

Publication number Publication date
CN114882596A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
CN109784186B (en) Pedestrian re-identification method and device, electronic equipment and computer-readable storage medium
CN109543602B (en) Pedestrian re-identification method based on multi-view image feature decomposition
CN102422324B (en) Age estimation device and method
CN101894276A (en) Training method of human action recognition and recognition method
CN111461174B (en) Multi-mode label recommendation model construction method and device based on multi-level attention mechanism
CN110765833A (en) Crowd density estimation method based on deep learning
CN110046550A (en) Pedestrian's Attribute Recognition system and method based on multilayer feature study
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN110147707B (en) High-precision vehicle identification method and system
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN114220143B (en) Face recognition method for wearing mask
CN111401339B (en) Method and device for identifying age of person in face image and electronic equipment
CN113129261A (en) Image tampering detection method based on double-current convolutional neural network
CN113065576A (en) Feature extraction method and device
CN111738074B (en) Pedestrian attribute identification method, system and device based on weak supervision learning
CN113284088A (en) CSM image segmentation method, device, terminal equipment and storage medium
CN109784140A (en) Driver attributes' recognition methods and Related product
CN115375781A (en) Data processing method and device
CN114283326A (en) Underwater target re-identification method combining local perception and high-order feature reconstruction
CN112036250B (en) Pedestrian re-identification method, system, medium and terminal based on neighborhood cooperative attention
CN114882596B (en) Behavior early warning method and device, electronic equipment and storage medium
CN112699907A (en) Data fusion method, device and equipment
Farouk Principal component pyramids using image blurring for nonlinearity reduction in hand shape recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant