CN112733819B - Multi-mode security monitoring method based on deep learning image processing - Google Patents

Multi-mode security monitoring method based on deep learning image processing Download PDF

Info

Publication number
CN112733819B
Authority
CN
China
Prior art keywords
monitoring
human body
queue
time
monitored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110339216.0A
Other languages
Chinese (zh)
Other versions
CN112733819A (en)
Inventor
古沐松
范文杰
孙珮凌
游磊
苗放
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University
Original Assignee
Chengdu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University filed Critical Chengdu University
Priority to CN202110339216.0A priority Critical patent/CN112733819B/en
Publication of CN112733819A publication Critical patent/CN112733819A/en
Application granted granted Critical
Publication of CN112733819B publication Critical patent/CN112733819B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models
    • G06N3/126Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements

Abstract

The invention provides a multi-mode security monitoring method based on deep learning image processing. The scenes to be monitored are divided according to their importance and periodically re-ranked according to subsequent machine-learning results, so that limited monitoring resources are applied to the places that most need monitoring. At the same time, the content to be monitored is focused: by setting dangerous person IDs and secondary dangerous person IDs, the people most likely to need monitoring are screened out. Combining these two operations, the people who most need monitoring and who appear in the places that most need monitoring receive the heaviest monitoring. In addition, a large amount of external training data is pasted onto images collected from the actual scenes to be monitored, enriching the pre-training set with data that matches those scenes and thereby improving the final recognition accuracy. The areas most likely to exhibit abnormal behavior can also be adjusted and supervised according to the training results. Maximally efficient online video monitoring is thus achieved at lower equipment cost and overhead.

Description

Multi-mode security monitoring method based on deep learning image processing
Technical Field
The invention belongs to the technical field of computer image processing and monitoring, and particularly relates to a multi-mode security monitoring method based on deep learning image processing.
Background
With the rapid development of computer technology, image processing based on machine learning has matured and is applied in many industries, for example face-recognition payment, face-recognition coded locks, and fast intelligent image recognition. However, it is mainly applied at the level of still-image recognition. Because image databases are abundant, image recognition accuracy is quite high when backed by huge image data resources. The world's video databases, by contrast, hold far less data, so recognition performed directly on video is limited by database resources and lags far behind image recognition in maturity and accuracy. Video recognition is therefore generally performed by converting the video into frame-by-frame images and then recognizing the images, so that the video is ultimately recognized. Since one second of video typically contains tens of frames, the conventional overhead of analyzing video this way is enormous. On this basis, the prior art generally samples sparse frames from the complete sequence of video frames and performs image recognition on them, keeping as much image content as possible, but the overhead is still huge. Moreover, this approach is mostly used for offline video recognition; in a multi-scene monitoring setting, recognition must run online in real time, image processing is needed to assist the monitoring scene, and the setting typically involves multiple cameras covering multiple regions that a single person cannot check alone.
In this case, monitoring many cameras online simultaneously carries enormous overhead and very high equipment requirements, so applicability is much lower. At the same time, diversified monitoring requires a monitoring algorithm rich in content, which further increases cost and overhead; and if the overhead added by the algorithm is superimposed on the monitoring of many cameras, the total overhead grows exponentially.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-mode security monitoring method based on deep learning image processing. The scenes to be monitored are divided according to their importance and periodically re-ranked according to subsequent machine-learning results, so that limited monitoring resources are applied to the places that need monitoring, and the content to be monitored is focused: by setting dangerous person IDs and secondary dangerous person IDs, the people most likely to need monitoring are screened out. Combining these two operations, the people who most need monitoring and who appear in the places that need monitoring receive focused monitoring. In addition, a large amount of external training data is pasted onto images collected from the actual scenes to be monitored, enriching the pre-training set with data that matches those scenes and improving the final recognition accuracy. The areas most likely to exhibit abnormal behavior can also be adjusted and supervised according to the training results. Maximally efficient online video monitoring is thus achieved at lower equipment cost and overhead.
The specific implementation content of the invention is as follows:
the invention provides a multi-mode security monitoring method based on deep learning image processing, which comprises the following steps:
step 1: constructing a global map model of the monitored site;
step 2: dividing a plurality of different security monitoring sub-areas under the constructed global map model; dividing different security monitoring sub-areas into different monitoring force levels, installing a corresponding monitoring camera for each security monitoring sub-area, and monitoring for a period of time;
step 3: selecting an image database as a pre-training set for pre-training to obtain a pre-training model; calling all monitoring images historically captured by the monitoring cameras of the security monitoring sub-areas to enrich the pre-training set;
step 4: during actual monitoring, firstly setting a monitoring display queue and ordering the monitoring cameras of all security monitoring sub-areas in the monitoring display queue by alternating high and low monitoring strength levels; then setting a corresponding initial monitoring time according to the set monitoring strength level, where monitoring cameras with the same monitoring strength level have the same initial monitoring time and monitoring cameras with a higher monitoring strength level have a longer initial monitoring time; then extracting the images acquired in real time by the first N monitoring cameras in the order of the monitoring display queue and displaying them on the monitoring screen in the security room;
step 5: for the real-time images displayed on the monitoring screen, performing face recognition and human body recognition with a combined face recognition and human body recognition algorithm, judging whether abnormal behavior occurs with a human behavior recognition algorithm, and tracking humans with a human tracking algorithm; when the initial monitoring time ends, removing from the monitoring screen those real-time images in which no pedestrian appears, and calling the real-time images of the subsequent monitoring cameras from the monitoring display queue for display and monitoring, cycling through the monitoring display queue in this way; for real-time images in which pedestrians do appear before the initial monitoring time ends, increasing their display time on the monitoring screen; when abnormal pedestrian behavior is detected on the monitoring screen, pushing an alarm to the staff in the form of an identification frame, the staff then verifying whether the abnormal behavior exists and handling it; for images verified as containing abnormal behavior, increasing the display time of the corresponding monitoring camera again;
step 6: for a human body verified as having abnormal behavior, extracting the corresponding recognized face information, setting a unique dangerous person ID, and storing the face information under that dangerous person ID; for pedestrians whose face information is stored under a dangerous person ID, applying the following processing principles:
in subsequent monitoring, when a pedestrian whose face information is stored under a dangerous person ID is recognized in an image shown on the monitoring screen, increasing the display and monitoring time of the corresponding monitoring camera on the monitoring screen;
for a pedestrian whose face information is stored under a dangerous person ID, when monitoring behavior with the human behavior recognition algorithm, also recognizing whether interaction occurs between that pedestrian and other pedestrians under the same monitoring camera and storing the face information of the other pedestrians involved in the interaction; setting an interaction threshold B and a secondary dangerous person ID, accumulating points for the interactions of the other pedestrians, and storing under the secondary dangerous person ID the face information of those other pedestrians whose accumulated points exceed the interaction threshold B; when a pedestrian whose face information is stored under a secondary dangerous person ID appears on a monitoring camera, likewise increasing the display and monitoring time of the corresponding monitoring camera; the threshold B is a value customized according to actual conditions;
step 7: after the operations of steps 4-6 have run for a period A, summarizing the number of abnormal behaviors that occurred in each security monitoring sub-area during period A and re-dividing the monitoring strength levels according to these counts; the period A is customized according to actual conditions;
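Step 7 amounts to re-ranking the sub-areas by their abnormal-behavior counts over period A. In this sketch the function name, the equal-size level assignment, and the three-level default are illustrative assumptions, not taken from the patent:

```python
def regrade_subareas(abnormal_counts, num_levels=3):
    """Re-divide monitoring strength levels from per-sub-area abnormal-behavior
    counts accumulated over period A. Level 1 is the highest strength."""
    # Rank sub-areas by how many abnormal behaviors each produced.
    ranked = sorted(abnormal_counts, key=abnormal_counts.get, reverse=True)
    per_level = max(1, -(-len(ranked) // num_levels))  # ceiling division
    levels = {}
    for i, area in enumerate(ranked):
        levels[area] = min(num_levels, i // per_level + 1)
    return levels
```

For example, with counts {"gate": 9, "lobby": 4, "garage": 0} and three levels, the gate sub-area would be assigned level 1 (heaviest monitoring) and the garage level 3.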
step 8: after the operations of steps 4-6 have run for a period C, summarizing the abnormal-behavior images identified by each single monitoring camera, calculating the offset of the abnormal-behavior identification frames from the center of the image, and adjusting the position and attitude of the corresponding monitoring camera according to the calculated offset.
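The offset calculation in step 8 can be sketched as averaging how far the centers of the abnormal-behavior identification frames sit from the image center. The (x1, y1, x2, y2) box format and the pan-direction interpretation in the comments are illustrative assumptions:

```python
def mean_center_offset(boxes, img_w, img_h):
    """Average (dx, dy) of abnormal-behavior identification frames from the
    image center. A positive mean dx means the frames tend to lie right of
    center, suggesting the camera could pan right to center future events."""
    cx, cy = img_w / 2.0, img_h / 2.0
    dxs, dys = [], []
    for (x1, y1, x2, y2) in boxes:
        dxs.append((x1 + x2) / 2.0 - cx)  # horizontal offset of box center
        dys.append((y1 + y2) / 2.0 - cy)  # vertical offset of box center
    n = len(boxes)
    return (sum(dxs) / n, sum(dys) / n)
```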
To better implement the present invention, further, in step 3 the specific operations for enriching the pre-training set are: calling all historical monitoring images of the monitoring cameras of each security monitoring sub-area; during pre-training, cutting the pedestrians out of the pre-training-set images, applying scaling, deformation stretching and color transformation, and pasting them at different angles onto the called monitoring images to obtain enriched training images, which are added to the pre-training set for pre-training.
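The enrichment above can be sketched as pasting transformed pedestrian crops onto called-up background frames. This NumPy version implements only nearest-neighbor scaling and a brightness-style color transformation; the function name and parameters are hypothetical, and a real pipeline would also apply deformation stretching, rotation, and blending:

```python
import numpy as np

def paste_pedestrian(background, crop, x, y, scale=1.0, brightness=1.0):
    """Paste a pedestrian crop onto a historical background frame after
    nearest-neighbor scaling and a simple brightness (color) transform."""
    out = background.copy()
    h, w = crop.shape[:2]
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.arange(nh) * h // nh            # nearest-neighbor row indices
    cols = np.arange(nw) * w // nw            # nearest-neighbor col indices
    scaled = crop[rows][:, cols].astype(np.float32) * brightness
    scaled = np.clip(scaled, 0, 255).astype(background.dtype)
    nh = min(nh, out.shape[0] - y)            # clip to background bounds
    nw = min(nw, out.shape[1] - x)
    out[y:y + nh, x:x + nw] = scaled[:nh, :nw]
    return out
```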
In order to better implement the present invention, further, the specific operations for ordering the monitoring display queue by alternating high and low monitoring strength levels in step 4 are as follows: a first queue and a second queue are set; first, the two monitoring cameras with the highest monitoring strength level are added to the first position of the first queue and the second queue respectively; then the two monitoring cameras with the lowest monitoring strength level are added to the second position of the first queue and the second queue respectively; then the two highest-level cameras among those not yet ordered are added to the third position of the two queues respectively, and the two lowest-level cameras among those not yet ordered to the fourth position of the two queues respectively; and so on until all monitoring cameras have been placed in the first or second queue; finally the head of the second queue is spliced to the tail of the first queue to obtain the monitoring display queue.
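The two-queue interleaving above can be sketched as follows; `cameras` is a list of (camera_id, level) pairs where a smaller number means a higher monitoring strength level (an assumed encoding):

```python
def build_display_queue(cameras):
    """Alternately take the two highest- and two lowest-level unordered
    cameras, filling a first and second queue in parallel, then splice the
    second queue onto the tail of the first."""
    remaining = sorted(cameras, key=lambda c: c[1])  # best level first
    q1, q2 = [], []
    take_high = True
    while remaining:
        if take_high:
            picks, remaining = remaining[:2], remaining[2:]
        else:
            picks, remaining = remaining[-2:], remaining[:-2]
        q1.append(picks[0][0])          # one camera into the first queue
        if len(picks) > 1:
            q2.append(picks[1][0])      # its peer into the second queue
        take_high = not take_high
    return q1 + q2
```

With eight cameras at levels 1-4, the result alternates high- and low-level cameras within each half of the spliced queue, so important areas never crowd into one contiguous time slot.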
In order to better implement the present invention, further, in step 4 a pre-loading time D is set; when the remaining display duration of a monitoring camera on the monitoring screen equals D, the images collected by the next monitoring cameras in the monitoring display queue are pre-loaded, and when that camera's display duration is used up, the pre-loaded images replace it on the display.
In order to better realize the invention, further, the human tracking algorithm adopts a globally optimized multi-target tracking method based on bounding-box regression and a Siamese (twin) neural network. During tracking, when the target human body is occluded or leaves the area covered by the monitoring camera so that target tracking is lost, the last frame before the loss is saved; when extra detection results appear in subsequent monitoring, the Siamese neural network measures the similarity between those extra detection results and the saved last frame before the tracking loss. A target human body judged similar is regarded as the same target human body, and a regression identification frame is established to continue tracking; a target human body judged dissimilar is regarded as a different target human body, and a new identification frame is established for human tracking.
In order to better realize the method, the combination of the MSE loss function, the cross-entropy loss function and the contrastive loss function is further adopted as a joint loss function for jointly training the bounding-box regression and the Siamese neural network in the globally optimized multi-target tracking method.
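A minimal NumPy sketch of such a joint loss, combining MSE box regression, cross-entropy classification, and the standard margin-based contrastive term for the Siamese branch; the weights and margin are assumed hyperparameters, not values from the patent:

```python
import numpy as np

def joint_loss(pred_box, true_box, cls_probs, cls_label,
               emb_a, emb_b, same_identity, margin=1.0,
               w_mse=1.0, w_ce=1.0, w_con=1.0):
    """Weighted sum of the three terms used to jointly train the bounding-box
    regressor and the Siamese identity branch."""
    # MSE over box coordinates.
    mse = float(np.mean((np.asarray(pred_box) - np.asarray(true_box)) ** 2))
    # Cross-entropy on the probability assigned to the true class.
    ce = float(-np.log(cls_probs[cls_label] + 1e-12))
    # Contrastive term: pull matched identities together, push others apart.
    d = float(np.linalg.norm(np.asarray(emb_a) - np.asarray(emb_b)))
    con = d ** 2 if same_identity else max(0.0, margin - d) ** 2
    return w_mse * mse + w_ce * ce + w_con * con
```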
In order to better realize the invention, further, when performing face recognition and human body recognition with the face recognition and human body recognition algorithms, anti-interference class-by-class kmeans clustering is first performed: kmeans clustering is applied to faces and human bodies to obtain the initial anchors, the initial anchors are divided equally so that face recognition and human body recognition each receive half, and two detection layers of a yolo-tiny framework then output the face recognition result and the human body recognition result respectively.
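The anchor initialization can be sketched as plain k-means over (width, height) box dimensions followed by an even split by box area, with the smaller anchors going to the face branch and the larger to the body branch. The deterministic area-spread initialization is an illustrative choice, not the patent's "anti-interference class-by-class" variant:

```python
import numpy as np

def kmeans_anchors(box_wh, k, iters=20):
    """Plain k-means over (width, height) boxes to produce initial anchors,
    returned sorted by area so the face/body split is deterministic."""
    pts = np.asarray(box_wh, dtype=float)
    # Deterministic init: pick k boxes spread evenly across the area ordering.
    order = np.argsort(pts.prod(axis=1))
    centers = pts[order[np.linspace(0, len(pts) - 1, k).astype(int)]].copy()
    for _ in range(iters):
        dists = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]

def split_anchors(anchors):
    """Divide anchors equally: small boxes for faces, large for bodies."""
    half = len(anchors) // 2
    return anchors[:half], anchors[half:]
```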
In order to better realize the invention, further, when the number of face and human body recognition and monitoring passes reaches 1000, a genetic algorithm is added to fine-tune the initial anchors.
In order to better implement the present invention, further, the specific steps of using the human behavior recognition algorithm to perform human behavior recognition are as follows:
step S1: collecting pictures with abnormal behaviors as a positive sample set, and selecting pictures without abnormal behaviors as a negative sample set;
step S2: based on the human body target identified by the human body identification algorithm, extracting human body skeleton characteristics of the human body target from the positive sample set and the negative sample set by using an OpenPose identification technology, and vectorizing the extracted human body skeleton characteristics into human body skeleton characteristic vectors;
step S3: taking the vectorized human skeleton feature vectors as the training data set for human behavior recognition, recognizing abnormal behavior with a ResNet-56 action classification model, and outputting the recognition result.
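Step S2's vectorization might look like the following, assuming the 18-keypoint COCO layout that OpenPose outputs; the neck/hip normalization scheme is an illustrative assumption rather than the patent's specification:

```python
import numpy as np

def skeleton_feature_vector(keypoints):
    """Turn a list of 18 (x, y, confidence) OpenPose keypoints into a
    fixed-length vector: coordinates are translated to the neck joint and
    scaled by the torso length, making the vector position- and
    scale-invariant; per-joint confidences are appended at the end."""
    kp = np.asarray(keypoints, dtype=float)   # shape (18, 3)
    neck, hip = kp[1, :2], kp[8, :2]          # neck and right-hip joints
    torso = np.linalg.norm(neck - hip) or 1.0  # guard against zero length
    xy = (kp[:, :2] - neck) / torso           # translate + scale normalize
    return np.concatenate([xy.ravel(), kp[:, 2]])  # length 18*2 + 18 = 54
```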
To better implement the present invention, further, for face information stored under a secondary dangerous person ID:
if no abnormal behavior is recognized within a period E after storage and the recalculated accumulated interaction points do not exceed the threshold B, the corresponding face information is deleted from the secondary dangerous person ID;
if no abnormal behavior is recognized within the period E after storage but the recalculated accumulated interaction points exceed the threshold B, the corresponding face information is retained under the secondary dangerous person ID and the timing of period E is reset;
if abnormal behavior is recognized within the period E after storage, an alarm is given and the corresponding face information is transferred from the secondary dangerous person ID to a dangerous person ID and stored.
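The lifecycle of the two ID lists can be sketched as a small state machine; class and method names are hypothetical, B is the patent's interaction threshold, and the `review` call stands in for the expiry of period E:

```python
class WatchList:
    """Minimal sketch of the dangerous / secondary dangerous person ID logic."""

    def __init__(self, threshold_b=3):
        self.threshold_b = threshold_b
        self.dangerous = set()   # person IDs with verified abnormal behavior
        self.secondary = {}      # person ID -> accumulated interaction points

    def record_interaction(self, person, points=1):
        """Accumulate interaction points with a dangerous person; returns
        True once the total exceeds B, i.e. the person becomes secondary."""
        self.secondary[person] = self.secondary.get(person, 0) + points
        return self.secondary[person] > self.threshold_b

    def review(self, person, abnormal_seen, recalculated_points):
        """Periodic review at the end of period E: promote, retain, delete."""
        if abnormal_seen:
            self.secondary.pop(person, None)
            self.dangerous.add(person)
            return "promoted"
        if recalculated_points > self.threshold_b:
            return "retained"    # timing of period E restarts
        self.secondary.pop(person, None)
        return "deleted"
```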
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the invention divides the site into zones and assigns them different importance levels, realizing supervision of different degrees; displaying on the monitoring screen in an ordered cycle monitors the most important zones at lower cost while still covering the other, less important zones; on this basis it adds monitoring of the people who most need monitoring; combining the two highlights the monitoring of potential risks while greatly reducing overhead and cost;
(2) the backgrounds of the scenes to be monitored are extracted and pedestrian data from other databases is pasted onto them, greatly enriching the data related to the scenes to be detected; scaling, deformation stretching, color transformation and similar operations on the pedestrian data enrich the training data further, and detection accuracy improves on the basis of the enriched pre-training data;
(3) ordering the monitoring display queue by alternating high and low monitoring strength levels balances important and unimportant areas, avoiding several important areas crowding into the same time slot, which would create detection blind spots for important persons in important areas;
(4) human bodies are tracked with a globally optimized multi-target tracking method based on bounding-box regression and a Siamese neural network; detection results need not be associated in the time domain, online tracking of detection information is possible, and the identity information of critical specific targets is retained in the tracking information, yielding more stable and more accurate tracking and monitoring; meanwhile, adopting the combination of the MSE, cross-entropy and contrastive loss functions as a joint loss for jointly training the bounding-box regression and the Siamese network enables improved end-to-end training;
(5) anti-interference class-by-class kmeans clustering is adopted for optimization: because faces and human bodies are in a one-to-one relation, bisection achieves balanced processing; because bodies and faces are often occluded in practice and therefore not exactly one-to-one in number, a genetic algorithm is added after many training passes for fine adjustment, which can increase training accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic flow chart of the interactive scoring accumulation according to the present invention;
FIG. 3 is a diagram illustrating an exemplary partitioning of the first queue and the second queue according to the present invention;
fig. 4 is an exemplary diagram of a monitoring display queue obtained by splicing the divided first queue and the second queue shown in fig. 3;
FIG. 5 is a schematic illustration of the display ordering of the monitoring display queue;
fig. 6 is a schematic diagram showing the proportion of display time of some of the monitoring cameras after a test in a certain residential community.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments, and therefore should not be considered as a limitation to the scope of protection. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "connected," and "coupled" are to be construed broadly: for example, fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected, indirectly connected through an intermediate medium, or communicating between the interiors of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art in specific cases.
Example 1:
the embodiment provides a multi-mode security monitoring method based on deep learning image processing, as shown in fig. 1 and fig. 2, including the following steps.
Step 1: constructing a global map model; the method specifically comprises the following steps: and constructing a global map model of the monitored site.
Step 2: dividing sub-areas and installing monitoring cameras; the method specifically comprises the following steps: dividing a plurality of different security monitoring sub-areas under the constructed global map model, dividing the different security monitoring sub-areas into different monitoring force levels, installing a corresponding monitoring camera for each security monitoring sub-area, and operating and monitoring for a period of time.
Step 3: preparing a pre-training set; the method specifically comprises the following steps: selecting an image database as a pre-training set for pre-training to obtain a pre-training model, and calling all monitoring images historically captured by the monitoring cameras of the security monitoring sub-areas to enrich the pre-training set.
The specific operations for enriching the pre-training set are: calling all historical monitoring images of the monitoring cameras of each security monitoring sub-area; during pre-training, cutting the pedestrians out of the pre-training-set images, applying scaling, deformation stretching and color transformation, and pasting them at different angles onto the called monitoring images to obtain enriched training images, which are added to the pre-training set for pre-training.
Step 4: setting the detection strength levels and initial monitoring times; the method specifically comprises the following steps: during actual monitoring, firstly setting a monitoring display queue and ordering the monitoring cameras of all security monitoring sub-areas in the monitoring display queue by alternating high and low monitoring strength levels; then setting a corresponding initial monitoring time according to the set monitoring strength level, where monitoring cameras with the same monitoring strength level have the same initial monitoring time and monitoring cameras with a higher monitoring strength level have a longer initial monitoring time; and then extracting the images acquired in real time by the first N monitoring cameras in the order of the monitoring display queue and displaying them on the monitoring screen in the security room.
Step 5: recognizing faces, human bodies and abnormal behavior; the method specifically comprises the following steps: for the real-time images displayed on the monitoring screen, performing face recognition and human body recognition with a combined face recognition and human body recognition algorithm, judging whether abnormal behavior occurs with a human behavior recognition algorithm, and tracking humans with a human tracking algorithm; when the initial monitoring time ends, removing from the monitoring screen those real-time images in which no pedestrian appears, and calling the real-time images of the subsequent monitoring cameras from the monitoring display queue for display and monitoring, cycling through the monitoring display queue in this way; for real-time images in which pedestrians do appear before the initial monitoring time ends, increasing their display time on the monitoring screen; when abnormal pedestrian behavior is detected on the monitoring screen, pushing an alarm to the staff in the form of an identification frame, the staff then verifying whether the abnormal behavior exists and handling it; for images verified as containing abnormal behavior, increasing the display time of the corresponding monitoring camera again.
Step 6: storing the primarily monitored dangerous persons and the secondarily monitored dangerous persons; the method specifically comprises the following steps: for a human body verified as having abnormal behavior, extracting the corresponding recognized face information, setting a unique dangerous person ID, and storing the face information under that dangerous person ID; for pedestrians whose face information is stored under a dangerous person ID, the following processing principles are applied.
In subsequent monitoring, when a pedestrian whose face information is stored under a dangerous person ID is recognized in an image shown on the monitoring screen, the display and monitoring time of the corresponding monitoring camera on the monitoring screen is increased.
As shown in fig. 2, after the video stream is input, image frames are sparsely sampled and face recognition is performed; human behavior recognition and relative position recognition are then performed on that basis. For pedestrians whose face information is stored under a dangerous person ID, when monitoring behavior with the human behavior recognition algorithm, the system also recognizes whether interaction occurs between them and other pedestrians under the same monitoring camera, stores the face information of the other pedestrians involved in the interaction, sets an interaction threshold B and a secondary dangerous person ID, accumulates points for the interactions of the other pedestrians, and stores under the secondary dangerous person ID the face information of those whose accumulated points exceed the interaction threshold B. When a pedestrian whose face information is stored under the secondary dangerous person ID appears on a monitoring camera, the display and monitoring time of that camera is likewise increased. The threshold B is a value customized according to actual conditions.
Step 7: updating the monitoring strength levels and the initial monitoring times; specifically: after the operations of steps 4-6 have run for a period A, the number of abnormal behaviors that occurred in each security monitoring sub-area during period A is summarized, and the monitoring strength levels are re-divided according to these counts; the period A is a quantity self-defined according to actual conditions.
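Step 7's re-grading can be sketched as a rank-and-bin pass over the per-sub-area abnormal-behavior counts gathered during period A. The number of levels and the equal-sized binning are assumptions; the patent leaves the exact division rule open.

```python
def regrade_subareas(abnormal_counts, num_levels=3):
    """Re-divide monitoring strength levels from per-sub-area abnormal
    behavior counts accumulated over period A. Level 1 is the strongest;
    sub-areas are ranked by count and split into equal-sized bins.
    The level count of 3 is an assumption, not fixed by the patent."""
    ranked = sorted(abnormal_counts, key=abnormal_counts.get, reverse=True)
    bin_size = -(-len(ranked) // num_levels)   # ceiling division
    return {area: i // bin_size + 1 for i, area in enumerate(ranked)}
```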
Step 8: adjusting the sub-areas and the monitoring cameras; specifically: after the operations of steps 4-6 have run for a period C, the abnormal-behavior images recognized by each single monitoring camera are summarized, the offset of the abnormal-behavior identification frame from the center of the image is calculated, and the position and attitude of the corresponding monitoring camera are adjusted according to the calculated offset so that such frames are centered.
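The offset computation of Step 8 reduces to comparing the identification frame's center with the image center. A minimal sketch, assuming pixel-coordinate boxes in (x, y, w, h) form and leaving out the aggregation over multiple images:

```python
def center_offset(box, image_size):
    """Offset of an abnormal-behavior identification frame's center from
    the image center. box = (x, y, w, h) in pixels; positive dx/dy means
    the frame sits right of / below center, so the camera should pan/tilt
    in that direction. Averaging over many detections is omitted."""
    x, y, w, h = box
    img_w, img_h = image_size
    dx = (x + w / 2) - img_w / 2
    dy = (y + h / 2) - img_h / 2
    return dx, dy
```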
The working principle is as follows: the invention divides the site into sub-areas and assigns them different importance levels, achieving supervision of different intensities; by displaying cameras on the monitoring screen in a sorted cycle, the most important areas are monitored at low cost while the other, less important areas are still covered. On this basis, monitoring is focused on the persons who most need to be watched. Together, these measures highlight potential risks while greatly reducing expenditure and cost.
By setting the dangerous-person ID and the sub-dangerous-person ID, key persons are monitored. The dangerous-person ID stores persons with verified abnormal behavior who need focused monitoring; the sub-dangerous-person ID stores persons associated with dangerous persons. For example, if a dangerous person strikes someone, the struck person is stored under a sub-dangerous-person ID, since the struck person may later retaliate against the dangerous person. As another example, in a group crime, the instigator or the one who acts may be a single person who later needs several accomplices to cooperate; such cooperation may not be directly recognizable as abnormal behavior, but because these persons are displayed on the monitoring screen, the situation can be watched in advance of any later abnormal behavior, and security staff viewing the screen can use human experience to recognize abnormal behaviors that the machine cannot, and deal with them.
As shown in fig. 6, a one-week test was run using part of the monitoring cameras of a residential community. The selected cameras cover the community front gate, unit one, unit two, unit three, activity plaza one, activity plaza two, activity plaza three, the pavilion and fitness equipment area, and the community back gate. After one week, the data shown in fig. 6 were obtained: the proportions of screen time for these cameras were 36%, 7%, 8%, 9%, 6%, 5%, 8% and 15% respectively. It can be seen that the areas with heavy foot traffic receive the longest monitoring time, while places with relatively little activity receive reduced but still non-zero monitoring.
Embodiment 2:
in this embodiment, on the basis of the foregoing embodiment 1, to better implement the present invention, the specific operation of sorting the cameras in the monitoring display queue by alternating high and low monitoring strength levels in step 4 is as follows: a first queue and a second queue are set; first, the two monitoring cameras with the highest monitoring strength level are added to the first positions of the first and second queues respectively; then the two cameras with the lowest level are added to the second positions of the two queues respectively; then the two highest-level cameras among those not yet sorted are added to the third positions, and the two lowest-level cameras among those not yet sorted to the fourth positions, and so on, until all cameras have been sorted into the first or second queue; finally, the head of the second queue is spliced to the tail of the first queue to obtain the monitoring display queue.
The working principle is as follows: as shown in figs. 3 and 4, assume twelve monitoring cameras numbered 1 to 12 in decreasing monitoring strength. Cameras 1 and 2 become the first positions of the first and second queues respectively, 11 and 12 the second positions, 3 and 4 the third positions, and so on, yielding the first and second queues; splicing the two queues then gives the monitoring display queue shown in fig. 4.
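The two-queue interleaving can be sketched as follows; `cameras` is assumed to be pre-sorted from strongest to weakest monitoring level, and ties within a level are ignored for simplicity:

```python
from collections import deque

def build_display_queue(cameras):
    """Monitoring display queue of Embodiment 2. `cameras` is ordered from
    strongest monitoring level to weakest; at each step the two strongest
    (or two weakest) remaining cameras are appended to two queues in
    alternation, and the queues are concatenated at the end."""
    remaining = deque(cameras)
    q1, q2 = [], []
    take_strong = True
    while remaining:
        if take_strong:
            q1.append(remaining.popleft())          # strongest remaining
            if remaining:
                q2.append(remaining.popleft())      # second strongest
        else:
            last = remaining.pop()                  # weakest remaining
            if remaining:
                q1.append(remaining.pop())          # second weakest
                q2.append(last)
            else:
                q1.append(last)
        take_strong = not take_strong
    return q1 + q2                                  # splice: q2 after q1
```

For twelve cameras numbered 1-12, this reproduces the queue of figs. 3-4: the first queue is [1, 11, 3, 9, 5, 7] and the second [2, 12, 4, 10, 6, 8].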
Other parts of this embodiment are the same as those of embodiment 1, and thus are not described again.
Embodiment 3:
in this embodiment, on the basis of either of the foregoing embodiments 1-2, to better implement the present invention, a preloading time D is set in step 4: when the remaining display time of a camera on the monitoring screen reaches D, the images collected by the next camera in the monitoring display queue are preloaded, and when the display time of the current camera is used up, the preloaded images replace it on screen.
The working principle is as follows: as shown in fig. 5, assume three cameras from the monitoring display queue are displayed at a time. In fig. 5, cameras No. 1, No. 11 and No. 3 of the queue in fig. 4 are displayed first; when the remaining display time of one of them reaches D, the images of camera No. 9 are preloaded, and when the display time of the expiring camera is used up, camera No. 9 is switched in directly. This avoids the black-screen loading phase that a direct switch would otherwise produce.
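The preload-and-swap rotation can be illustrated with timing abstracted away; the slot count and the one-event-per-expiry framing are assumptions made for this sketch:

```python
from collections import deque

def rotate_with_preload(display_queue, slots=3):
    """Sketch of Embodiment 3's rotation: `slots` cameras are on screen at
    once; before a slot's display time expires, the next camera in the
    circular monitoring display queue is preloaded, then swapped in with
    no loading gap. Each step yields (preloaded_camera, on_screen) for
    one slot expiry, covering one full round of expiries."""
    ring = deque(display_queue)
    on_screen = [ring[i] for i in range(slots)]   # initially the queue head
    ring.rotate(-slots)                           # next to show is ring[0]
    events = []
    for slot in range(slots):
        preload = ring[0]                         # loaded D seconds early
        ring.rotate(-1)
        on_screen[slot] = preload                 # seamless switch, no black screen
        events.append((preload, tuple(on_screen)))
    return events
```

Run on the queue of fig. 4, the first expiry preloads camera No. 9 while Nos. 11 and 3 stay on screen, matching the fig. 5 description.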
Other parts of this embodiment are the same as any of embodiments 1-2 described above, and thus are not described again.
Embodiment 4:
this embodiment, on the basis of any one of embodiments 1-3, further specifies that the human body tracking algorithm adopts a globally optimized multi-target tracking method based on bounding-box regression and a twin (Siamese) neural network. During tracking, when a target human body is occluded or leaves the area monitored by the monitoring camera and the track is lost, the last frame before the loss is stored. When surplus detections appear in subsequent monitoring, the twin neural network measures the similarity between each surplus detection and the stored last frame of the lost track: a detection judged similar is regarded as the same target human body, and a regression identification frame is established to continue tracking; a detection judged dissimilar is regarded as a different target human body, and a new identification frame is established for it.
In order to better realize the method, a combination of the MSE loss function, the cross-entropy loss function and the contrastive loss function is further adopted as a joint loss function for jointly training the bounding-box regression and the twin neural network in the globally optimized multi-target tracking method.
The working principle is as follows: tracking human bodies with the globally optimized multi-target tracking method based on bounding-box regression and a twin neural network removes the need to associate detections across the time domain, enables online tracking, and retains the identity information of critical specific targets, giving more stable and more precise tracking and monitoring. Meanwhile, adopting the combination of the MSE, cross-entropy and contrastive loss functions as the joint loss allows the bounding-box regression and the twin neural network to be trained jointly, end to end.
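The joint loss can be sketched with NumPy. The equal weighting of the three terms and the contrastive margin of 1.0 are assumptions; the patent names only the three components (MSE for box regression, cross entropy for classification, contrastive loss for the twin-network embeddings):

```python
import numpy as np

def joint_loss(pred_boxes, gt_boxes, logits, labels,
               emb_a, emb_b, same_target, margin=1.0):
    """Sketch of Embodiment 4's joint loss: MSE (boxes) + cross entropy
    (classification) + contrastive (twin embeddings). Weights are assumed
    equal; `same_target` is 1.0 for pairs that are the same human body."""
    box_loss = np.mean((pred_boxes - gt_boxes) ** 2)           # MSE term
    # numerically stable log-softmax cross entropy
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    cls_loss = -np.mean(log_probs[np.arange(len(labels)), labels])
    # contrastive term: pull same-target embeddings together,
    # push different-target embeddings beyond the margin
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    contrastive = np.mean(same_target * dist ** 2
                          + (1 - same_target) * np.maximum(margin - dist, 0) ** 2)
    return box_loss + cls_loss + contrastive
```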
Other parts of this embodiment are the same as any of embodiments 1 to 3, and thus are not described again.
Embodiment 5:
in this embodiment, on the basis of any one of the above embodiments 1-4, to better implement the present invention, when face recognition and human body recognition are performed with the face recognition and human body recognition algorithms, anti-interference class-by-class k-means clustering is performed first: k-means clustering is run on both the face and human body boxes to obtain initial anchor points, the anchor points are divided into two halves, and the two detection layers of a yolo-tiny framework then output the face recognition result and the human body recognition result respectively.
In order to better realize the invention, further, when the number of face and human body recognitions reaches 1000, a genetic algorithm is added to fine-tune the initial anchor points.
The working principle is as follows: anti-interference class-by-class k-means clustering is adopted because faces and human bodies are nominally in one-to-one correspondence, so halving the anchors balances the two detection layers. In practice, however, occlusion means the counts of bodies and faces do not correspond exactly; after many rounds of training, a genetic algorithm is therefore added for fine adjustment, which increases training precision.
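Anchor generation by k-means and the split across the two detection layers can be sketched as follows. Plain Euclidean k-means on (width, height) pairs with a deterministic initialization is used here for simplicity; the patent's "anti-interference class-by-class" variant is not specified in detail, and IoU-based distance is a common refinement:

```python
def kmeans_anchors(boxes, k=6, iters=50):
    """K-means on (w, h) box sizes to get k initial anchors, then split
    them into two halves for the two yolo-tiny detection layers
    (Embodiment 5). Deterministic init from the first k boxes keeps the
    sketch reproducible; random init is the usual choice."""
    centers = [tuple(b) for b in boxes[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k), key=lambda c: (w - centers[c][0]) ** 2
                                            + (h - centers[c][1]) ** 2)
            clusters[i].append((w, h))
        # recompute each center as its cluster mean; keep old center if empty
        centers = [(sum(w for w, _ in cl) / len(cl),
                    sum(h for _, h in cl) / len(cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    anchors = sorted(centers)                     # small to large
    return anchors[:k // 2], anchors[k // 2:]     # one half per layer
```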
Other parts of this embodiment are the same as any of embodiments 1 to 4, and thus are not described again.
Embodiment 6:
in this embodiment, on the basis of any one of the above embodiments 1-5, to better implement the present invention, the specific steps of human behavior recognition using the human behavior recognition algorithm are as follows:
step S1: collecting pictures containing abnormal behaviors as a positive sample set, and selecting pictures without abnormal behaviors as a negative sample set;
step S2: based on the human body targets identified by the human body recognition algorithm, extracting human skeleton features of the human body targets from the positive and negative sample sets using the OpenPose recognition technique, and vectorizing the extracted skeleton features into human skeleton feature vectors;
step S3: using the vectorized human skeleton feature vectors as the training data set for human behavior recognition, recognizing abnormal behaviors through a ResNet-56 action classification model, and outputting the recognition result.
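Step S2's vectorization can be sketched as flattening normalized keypoint coordinates. OpenPose itself is not run here, and the translation/scale normalization is an assumed (but common) preprocessing choice; the patent only states that the skeleton features are vectorized:

```python
import numpy as np

def skeleton_to_vector(keypoints):
    """Vectorize 2-D skeleton keypoints into a translation- and
    scale-normalized feature vector for the action classifier of Step S3.
    `keypoints` is an (N, 2) array as a pose estimator such as OpenPose
    would produce; the exact normalization is an assumption."""
    pts = np.asarray(keypoints, dtype=float)
    pts -= pts.mean(axis=0)                  # translation invariance
    scale = np.linalg.norm(pts, axis=1).max()
    if scale > 0:
        pts /= scale                         # scale invariance
    return pts.ravel()                       # flatten to a feature vector
```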
Other parts of this embodiment are the same as any of embodiments 1 to 5, and thus are not described again.
Embodiment 7:
this embodiment, on the basis of any one of embodiments 1-6, further specifies the handling of face information stored under a sub-dangerous-person ID:
if no abnormal behavior is recognized within time E after storage and the recounted accumulated total of interactions does not exceed the threshold B, the corresponding face information is deleted from the sub-dangerous-person ID;
if no abnormal behavior is recognized within time E after storage but the recounted accumulated total of interactions exceeds the threshold B, the corresponding face information is retained under the sub-dangerous-person ID and the timer for time E is refreshed;
if abnormal behavior is recognized within time E after storage, an alarm is raised and the corresponding face information is transferred from the sub-dangerous-person ID to a dangerous-person ID for storage.
The working principle is as follows: a pedestrian who shows no abnormal behavior for a long time has demonstrated stable behavior and is deleted from the sub-dangerous-person ID, reserving the monitoring strength for other key monitoring objects.
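The three rules of this embodiment form a small per-entry state machine; a sketch with placeholder values for the self-defined quantities E and B:

```python
class SubDangerEntry:
    """Sketch of the Embodiment 7 rules for a face stored under a
    sub-dangerous-person ID. TIME_E and THRESHOLD_B are the self-defined
    quantities of the patent; the values here are placeholders."""
    TIME_E = 7 * 24 * 3600      # review window E in seconds (assumed: one week)
    THRESHOLD_B = 3             # interaction threshold B (assumed)

    def __init__(self, stored_at):
        self.stored_at = stored_at
        self.interactions = 0   # recounted within each window

    def review(self, now, abnormal_seen):
        """Return 'delete', 'retain' or 'promote' per the three rules."""
        if abnormal_seen:
            return 'promote'            # alarm; move to a dangerous-person ID
        if now - self.stored_at < self.TIME_E:
            return 'retain'             # window E not elapsed yet
        if self.interactions > self.THRESHOLD_B:
            self.stored_at = now        # refresh the timer for E
            self.interactions = 0
            return 'retain'
        return 'delete'                 # stable behavior: free monitoring capacity
```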
Other parts of this embodiment are the same as any of embodiments 1 to 6, and thus are not described again.
The above description covers only preferred embodiments of the present invention and is not intended to limit the present invention in any way; all simple modifications and equivalent variations of the above embodiments made according to the technical spirit of the present invention fall within the protection scope of the present invention.

Claims (9)

1. A multi-mode security monitoring method based on deep learning image processing is characterized by comprising the following steps:
step 1: constructing a global map model of the monitored site;
step 2: dividing a plurality of different security monitoring sub-areas under the constructed global map model; assigning the different security monitoring sub-areas different monitoring strength levels, installing a corresponding monitoring camera for each security monitoring sub-area, and monitoring for a period of time;
step 3: selecting an image database as a pre-training set for pre-training to obtain a pre-trained model; calling up all monitoring images historically recorded by the monitoring cameras of the security monitoring sub-areas to enrich the pre-training set;
step 4: during actual monitoring, firstly setting a monitoring display queue, and sorting the monitoring cameras of all security monitoring sub-areas in the monitoring display queue by alternating high and low monitoring strength levels; then setting a corresponding initial monitoring time according to the monitoring strength level, monitoring cameras of the same level having the same initial monitoring time and cameras of higher levels having longer initial monitoring times; then extracting the images acquired in real time by N monitoring cameras, in the order of the monitoring display queue, for display on the monitoring screen in the security room;
step 5: for the real-time images displayed on the monitoring screen, performing face recognition and human body recognition by combining a face recognition algorithm and a human body recognition algorithm, judging whether abnormal behaviors occur using a human behavior recognition algorithm, and performing human body tracking using a human body tracking algorithm; when the initial monitoring time ends, removing from the monitoring screen the real-time images in which no pedestrian appears, and calling the real-time images of subsequent monitoring cameras from the monitoring display queue for display and monitoring, cycling through the monitoring display queue continuously; for pedestrians appearing on the monitoring screen before the initial monitoring time ends, increasing the display time on the monitoring screen; when abnormal pedestrian behavior is displayed and monitored on the monitoring screen, alarming and pushing it to the staff in the form of an identification frame, the staff verifying whether the abnormal behavior exists and handling it; for images verified to contain abnormal behavior, increasing the display time of the corresponding monitoring camera again;
step 6: for a human body verified to have abnormal behavior, extracting the corresponding recognized face information, setting a unique dangerous-person ID, and storing the face information under that dangerous-person ID; for pedestrians whose face information is stored under a dangerous-person ID, executing the following processing principles:
in subsequent monitoring, when a pedestrian whose face information is stored under a dangerous-person ID is recognized in an image displayed on the monitoring screen, increasing the display and monitoring time of the corresponding monitoring camera on the monitoring screen;
for a pedestrian whose face information is stored under a dangerous-person ID, while monitoring behavior with the human behavior recognition algorithm, also recognizing whether interaction occurs between that pedestrian and other pedestrians under the same monitoring camera, and storing the face information of the other pedestrians involved in the interaction; setting an interaction threshold B and a sub-dangerous-person ID, accumulating the interactions of each other pedestrian, and storing under the sub-dangerous-person ID the face information of any other pedestrian whose accumulated total exceeds the threshold B; when a pedestrian whose face information is stored under a sub-dangerous-person ID appears on a monitoring camera, likewise increasing the display and monitoring time of that camera; the threshold B being a value self-defined according to actual conditions; for the face information stored under a sub-dangerous-person ID:
if no abnormal behavior is recognized within time E after storage and the recounted accumulated total of interactions does not exceed the threshold B, deleting the corresponding face information from the sub-dangerous-person ID;
if no abnormal behavior is recognized within time E after storage but the recounted accumulated total of interactions exceeds the threshold B, retaining the corresponding face information under the sub-dangerous-person ID and refreshing the timer for time E;
if abnormal behavior is recognized within time E after storage, alarming and transferring the corresponding face information from the sub-dangerous-person ID to a dangerous-person ID for storage;
step 7: after the operations of steps 4-6 have run for a period A, summarizing the number of abnormal behaviors that occurred in each security monitoring sub-area during period A, and re-dividing the monitoring strength levels according to these counts; the period A being a quantity self-defined according to actual conditions;
step 8: after the operations of steps 4-6 have run for a period C, summarizing the abnormal-behavior images recognized by each single monitoring camera, calculating the offset of the abnormal-behavior identification frame from the center of the image, and adjusting the position and attitude of the corresponding monitoring camera according to the calculated offset.
2. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 1, wherein in step 3 the specific operations of enriching the pre-training set are: calling up all historical monitoring images from the monitoring cameras of each security monitoring sub-area; during pre-training, cutting the pedestrians out of the images of the pre-training set, applying scaling, deformation stretching and color transformation, and pasting them at different angles onto the called-up monitoring images to obtain enriched training images, which are added to the pre-training set for pre-training.
3. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 1, wherein the specific operation of sorting the cameras in the monitoring display queue by alternating high and low monitoring strength levels in step 4 is as follows: setting a first queue and a second queue; first adding the two monitoring cameras with the highest monitoring strength level to the first positions of the first and second queues respectively; then adding the two cameras with the lowest level to the second positions of the two queues respectively; then adding the two highest-level cameras among those not yet sorted to the third positions, and the two lowest-level cameras among those not yet sorted to the fourth positions, and so on, until all cameras have been sorted into the first or second queue; finally splicing the head of the second queue to the tail of the first queue to obtain the monitoring display queue.
4. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 1, wherein in step 4 a preloading time D is set; when the remaining display time of a camera on the monitoring screen reaches D, the images collected by the next camera in the monitoring display queue are preloaded, and when the display time of the current camera is used up, the preloaded images replace it on screen.
5. The multi-mode security monitoring method based on deep learning image processing as claimed in any one of claims 1 to 4, wherein the human body tracking algorithm adopts a globally optimized multi-target tracking method based on bounding-box regression and a twin neural network; during tracking, when a target human body is occluded or leaves the area monitored by the monitoring camera and the track is lost, the last frame before the loss is stored; when surplus detections appear in subsequent monitoring, the twin neural network measures the similarity between each surplus detection and the stored last frame of the lost track; a detection judged similar is regarded as the same target human body and a regression identification frame is established to continue tracking; a detection judged dissimilar is regarded as a different target human body and a new identification frame is established for human body tracking.
6. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 5, wherein a combination of the MSE loss function, the cross-entropy loss function and the contrastive loss function is adopted as a joint loss function for jointly training the bounding-box regression and the twin neural network in the globally optimized multi-target tracking method.
7. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 1, wherein when face recognition and human body recognition are performed with the face recognition and human body recognition algorithms, anti-interference class-by-class k-means clustering is performed first: k-means clustering is run on both the face and human body boxes to obtain initial anchor points, the anchor points are divided into two halves, and the two detection layers of a yolo-tiny framework then output the face recognition result and the human body recognition result respectively.
8. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 7, wherein when the number of face and human body recognitions reaches 1000, a genetic algorithm is added to fine-tune the initial anchor points.
9. The multi-mode security monitoring method based on deep learning image processing as claimed in claim 1, wherein the specific steps of human behavior recognition using the human behavior recognition algorithm are as follows:
step S1: collecting pictures containing abnormal behaviors as a positive sample set, and selecting pictures without abnormal behaviors as a negative sample set;
step S2: based on the human body targets identified by the human body recognition algorithm, extracting human skeleton features of the human body targets from the positive and negative sample sets using the OpenPose recognition technique, and vectorizing the extracted skeleton features into human skeleton feature vectors;
step S3: using the vectorized human skeleton feature vectors as the training data set for human behavior recognition, recognizing abnormal behaviors through a ResNet-56 action classification model, and outputting the recognition result.
CN202110339216.0A 2021-03-30 2021-03-30 Multi-mode security monitoring method based on deep learning image processing Active CN112733819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110339216.0A CN112733819B (en) 2021-03-30 2021-03-30 Multi-mode security monitoring method based on deep learning image processing


Publications (2)

Publication Number Publication Date
CN112733819A CN112733819A (en) 2021-04-30
CN112733819B true CN112733819B (en) 2021-06-18

Family

ID=75596190


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220383522A1 (en) * 2021-05-26 2022-12-01 Nec Laboratories America, Inc. Semi-automatic data collection and association for multi-camera tracking
CN115512304B (en) * 2022-11-10 2023-03-03 成都大学 Subway station safety monitoring system based on image recognition
CN117114377B (en) * 2023-10-25 2024-02-02 北京和欣运达科技有限公司 Regional division building park state display energy-saving processing method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017044092A1 (en) * 2015-09-09 2017-03-16 The Joan and Irwin Jacobs Technion-Cornell Innovation Institute A system and method for passive remote monitoring of patients' fine motor behavior
CN108960192A (en) * 2018-07-23 2018-12-07 北京旷视科技有限公司 Action identification method and its neural network generation method, device and electronic equipment
CN108960211A (en) * 2018-08-10 2018-12-07 罗普特(厦门)科技集团有限公司 A kind of multiple target human body attitude detection method and system
CN110111226A (en) * 2019-04-08 2019-08-09 山东瀚岳智能科技股份有限公司 A kind of prison prisoner tertiary zone prevention management-control method and system
CN110472612A (en) * 2019-08-22 2019-11-19 海信集团有限公司 Human bodys' response method and electronic equipment
CN111626265A (en) * 2020-06-12 2020-09-04 上海依图网络科技有限公司 Multi-camera downlink identification method and device and computer readable storage medium
CN111753724A (en) * 2020-06-24 2020-10-09 上海依图网络科技有限公司 Abnormal behavior identification method and device
CN112052739A (en) * 2020-08-06 2020-12-08 武汉倍特威视系统有限公司 Examination room personnel close-range contact identification method based on intelligent video monitoring
CN112132041A (en) * 2020-09-24 2020-12-25 天津锋物科技有限公司 Community patrol analysis method and system based on computer vision
CN112437255A (en) * 2020-11-04 2021-03-02 中广核工程有限公司 Intelligent video monitoring system and method for nuclear power plant

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11430260B2 (en) * 2010-06-07 2022-08-30 Affectiva, Inc. Electronic display viewing verification
RU2015129763A (en) * 2012-12-21 2017-01-26 Дека Продактс Лимитед Партнершип SYSTEM, METHOD AND APPARATUS FOR ELECTRONIC CARE OF THE PATIENT
US20180357073A1 (en) * 2017-06-13 2018-12-13 Motorola Solutions, Inc Method, device, and system for electronic digital assistant for natural language detection of a user status change and corresponding modification of a user interface
CN109389794A (en) * 2018-07-05 2019-02-26 北京中广通业信息科技股份有限公司 A kind of Intellectualized Video Monitoring method and system
CN110135319B (en) * 2019-05-09 2022-09-16 广州大学 Abnormal behavior detection method and system
CN112213992A (en) * 2019-07-09 2021-01-12 内蒙古中煤蒙大新能源化工有限公司 Intelligent safety management and control system for field operation of chemical enterprises
CN110428522B (en) * 2019-07-24 2021-04-30 青岛联合创智科技有限公司 Intelligent security system of wisdom new town
CN111914780A (en) * 2020-08-10 2020-11-10 谷利文 Wisdom street management platform


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Detection and monitoring system of the pantograph-catenary in high-speed railway (6C); S.B. Gao et al.; 2017 7th International Conference on Power Electronics Systems and Applications - Smart Mobility, Power Transfer & Security (PESA); 2017-12-14; pp. 1-7 *
Research on the Application of a Deep Learning-Based Virtual Simulation Platform in the IoT Practice System; Gu Musong et al.; Information Systems Engineering; 2019-04-20 (No. 4); pp. 145-146 *
Research on Human Abnormal Behavior Recognition in Intelligent Video Surveillance; Li Zhina; China Masters' Theses Full-text Database, Information Science and Technology; 2015-04-15 (No. 4); pp. I138-747 *
Research on Several Key Issues in Intelligent Video Surveillance Systems; Zou Yibo; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2018-02-15 (No. 2); pp. I136-174 *

Also Published As

Publication number Publication date
CN112733819A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN112733819B (en) Multi-mode security monitoring method based on deep learning image processing
CN108875624B (en) Face detection method based on multi-scale cascade dense connection neural network
KR101925907B1 (en) Apparatus and method for studying pattern of moving objects using adversarial deep generative model
CN108764059B (en) Human behavior recognition method and system based on neural network
CN111079655B (en) Method for recognizing human body behaviors in video based on fusion neural network
CN103839065B (en) Extraction method for dynamic crowd gathering characteristics
CN101916365B (en) Intelligent video recognition method for cheating in examinations
CN103246896A (en) Robust real-time vehicle detection and tracking method
CN112183468A (en) Pedestrian re-identification method based on multi-attention combined multi-level features
CN113434573B (en) Multi-dimensional image retrieval system, method and equipment
CN111401149B (en) Lightweight video behavior identification method based on long-short-term time domain modeling algorithm
KR20200052418A (en) Automated Violence Detecting System based on Deep Learning
CN113537107A (en) Face recognition and tracking method, device and equipment based on deep learning
CN112926522B (en) Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN104077571A (en) Method for detecting abnormal crowd behavior using a one-class serialization model
CN114038056A (en) Method for identifying fare evasion by jumping over or crouching under turnstiles
Luo et al. Real-time detection algorithm of abnormal behavior in crowds based on Gaussian mixture model
CN113191274A (en) Oil field video intelligent safety event detection method and system based on neural network
CN113486754A (en) Event evolution prediction method and system based on video
Patil et al. Analyze the Presence of Violence and a Particular Event of Violence by Weapon Detection using Deep Learning
CN116152722B (en) Video anomaly detection method based on combination of residual attention block and self-selection learning
CN117079256B (en) Fatigue driving detection algorithm based on target detection and key frame rapid positioning
CN116156149B (en) Detection method and device for detecting camera movement
Wang et al. Detecting Rare Actions and Events from Surveillance Big Data with Bag of Dynamic Trajectories
Feizi et al. Application of combined local object based features and cluster fusion for the behaviors recognition and detection of abnormal behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant