CN111556294A - Safety monitoring method, device, server, terminal and readable storage medium


Info

Publication number: CN111556294A
Authority: CN (China)
Prior art keywords: video image, image, information, target, crowd density
Legal status: Granted
Application number: CN202010394039.1A
Other languages: Chinese (zh)
Other versions: CN111556294B (en)
Inventor
唐梦云
罗剑
涂思嘉
唐艳平
孙利
颜小云
冷鹏宇
黄湘琦
陈泳君
刘水生
甘祥
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010394039.1A
Publication of CN111556294A
Application granted
Publication of CN111556294B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B 21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B 21/02: Alarms for ensuring the safety of persons

Abstract

The application provides a safety monitoring method, a safety monitoring device, a server, a terminal and a readable storage medium, and belongs to the field of computer technologies. The method includes: respectively inputting a plurality of first video images of a first target place into at least two first image recognition models; determining a target image recognition model based on at least two pieces of output crowd density information; inputting a plurality of second video images into the target image recognition model; and determining first safety monitoring information according to the crowd density information of at least one second video image. In this application, the video images collected in the initial monitoring stage are processed by the at least two first image recognition models to obtain crowd density information determined by multiple models, and the target image recognition model is selected based on the determined crowd density information. A model better suited to the first target place can therefore be selected for the subsequent monitoring process, the accuracy of the crowd density information is adaptively improved, and safety is improved.

Description

Safety monitoring method, device, server, terminal and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a safety monitoring method, device, server, terminal, and readable storage medium.
Background
Nowadays viruses are increasingly varied and increasingly capable of mutation. Most viruses can spread through the respiratory tract, droplets, saliva and the like; they are highly infectious and pose a great threat to people's health. In public environments, people can protect themselves by wearing masks, keeping a certain spacing distance and taking other measures, thereby effectively suppressing the spread of viruses and safeguarding their own safety and that of others. However, many people still pay little attention to protection, so safety managers are needed to supervise the protection situation of crowds in public environments, such as crowd gathering, in order to handle abnormal situations in time.
At present, when crowd gathering is detected, a monitoring image collected by a monitoring system is mainly recognized through a neural network model: human heads in the monitoring image are identified, the crowd density in the monitoring image is then determined from the recognition result, and gathered heads are displayed in different colors according to the crowd density, so that safety managers can determine the crowd gathering situation and disperse the crowd in time when it gathers heavily.
In the above implementation, the crowd density is determined by detecting human heads. Occlusion is likely to occur, and an occluded head cannot be identified, so the determined crowd density is inaccurate and the accuracy of safety monitoring is low.
Disclosure of Invention
The embodiments of the application provide a safety monitoring method, a safety monitoring device, a server, a terminal and a readable storage medium, which can improve the accuracy of safety monitoring. The technical solutions are as follows:
in one aspect, a safety monitoring method is provided, which includes:
acquiring at least one first video image of a first target place;
respectively inputting the at least one first video image into at least two first image recognition models, and outputting at least two pieces of crowd density information of the at least one first video image, wherein the at least two first image recognition models recognize persons in the video image based on different elements of the persons;
determining a target image recognition model based on the at least two pieces of crowd density information, wherein the target image recognition model is a model whose output crowd density information meets a target condition;
inputting at least one second video image of the first target place into the target image recognition model, and outputting crowd density information of the at least one second video image, wherein the shooting time of the at least one second video image is later than that of the at least one first video image;
and determining first safety monitoring information according to the crowd density information of the at least one second video image, wherein the first safety monitoring information is used for indicating the crowd gathering condition of the first target place.
In one aspect, a safety monitoring method is provided, which includes:
displaying a safety monitoring interface, wherein the safety monitoring interface comprises a crowd gathering display option used for providing a crowd gathering display function for at least one place;
in response to a triggering operation on the crowd gathering display option, acquiring a first image of the at least one place, the first image displaying the crowd density information of the place distinguishably through different annotation manners; and
displaying the first image of the at least one place.
In one aspect, a safety monitoring device is provided, the device comprising:
a first acquisition module, used for acquiring at least one first video image of a first target place;
the model processing module is used for respectively inputting the at least one first video image into at least two first image recognition models and outputting at least two crowd density information of the at least one first video image, and the at least two first image recognition models respectively recognize people in the video image on the basis of different elements of the people;
the model determining module is used for determining a target image recognition model based on the at least two pieces of crowd density information, and the target image recognition model is a model whose output crowd density information meets a target condition;
the model processing module is further used for inputting at least one second video image of the first target place into the target image recognition model and outputting crowd density information of the at least one second video image, wherein the shooting time of the at least one second video image is later than that of the at least one first video image;
and the first information determining module is used for determining first safety monitoring information according to the crowd density information of the at least one second video image, wherein the first safety monitoring information is used for indicating the crowd gathering condition of the first target place.
In a possible implementation manner, the model processing module is configured to input the at least one first video image into a human head detection model, determine human head positions in the at least one first video image through the human head detection model, and determine crowd density information in the at least one first video image according to the human head positions;
or the model processing module is configured to input the at least one first video image into a pedestrian detection model, determine pedestrian positions in the at least one first video image through the pedestrian detection model, and determine crowd density information in the at least one first video image according to the pedestrian positions;
or the model processing module is configured to input the at least one first video image into a crowd density estimation model, determine a crowd density map of the at least one first video image through the crowd density estimation model, and determine crowd density information in the at least one first video image according to the crowd density map.
In a possible implementation manner, the model processing module is configured to output the crowd density information corresponding to at least two head positions if that crowd density information meets a preset condition; and, if it does not meet the preset condition, to determine the crowd density information corresponding to the at least two head positions and a next head position, based on a next head position that meets a preset distance condition with respect to the at least two head positions.
In a possible implementation manner, the model processing module is configured to output the crowd density information corresponding to at least two pedestrian positions if that crowd density information meets a preset condition; and, if it does not meet the preset condition, to determine the crowd density information corresponding to the at least two pedestrian positions and a next pedestrian position, based on a next pedestrian position that meets a preset distance condition with respect to the at least two pedestrian positions.
In one possible implementation, the at least one first video image carries a check tag;
the device also includes:
the first detection module is used for detecting a label carried by the acquired video image;
and the first image determining module is used for determining the acquired video image as the first video image if the acquired video image is detected to carry the check tag, and executing the step of respectively inputting the at least one first video image into the at least two first image recognition models and outputting the at least two pieces of crowd density information of the at least one first video image.
In one possible implementation, the apparatus further includes:
and the second image determining module is used for determining the acquired video image as the second video image if the acquired video image is detected not to carry the check tag, and executing the step of inputting the at least one second video image of the first target place into the target image recognition model and outputting the crowd density information of the at least one second video image.
In one possible implementation, the apparatus further includes:
the analysis module is used for analyzing the received video image;
and the adding module is used for adding check tags to a first number of video images obtained by parsing, to obtain the at least one first video image, and adding no check tags to a second number of video images that follow the first number of video images in sequence, to obtain the at least one second video image.
In one possible implementation, the apparatus further includes:
the second detection module is used for detecting the first safety monitoring information;
the first sending module is used for sending first alarm information if the first safety monitoring information is detected to meet a first alarm condition, and the first alarm information is used for indicating the first target place to have crowd gathering.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a video image of a second target place;
the first recognition module is used for recognizing the video image through a second image recognition model if the second target place is a target type place, and determining people who do not wear masks in the video image;
the marking module is used for marking people who do not wear the mask in the video image;
and the second information determining module is used for determining second safety monitoring information according to the number of people not wearing masks in the video image, the second safety monitoring information being used for indicating the number of people not wearing masks in the second target place.
In a possible implementation manner, the first recognition module is configured to determine a face region in the video image through a face detection model in the second image recognition model, crop the face region from the video image to obtain at least one face image, recognize the at least one face image through a classification model in the second image recognition model, and determine whether the person corresponding to each face image wears a mask.
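As an illustration of this two-stage pipeline, the following is a minimal Python sketch; the `detect` and `predict` interfaces of the face detection model and the classification model are assumptions for illustration, not the patent's API.

```python
def detect_mask_wearing(video_image, face_detector, mask_classifier):
    """Two-stage sketch: detect face regions, crop each region from the
    frame, then classify whether the face wears a mask.
    `face_detector.detect` and `mask_classifier.predict` are assumed
    interfaces returning (x, y, w, h) boxes and a boolean, respectively."""
    results = []
    for (x, y, w, h) in face_detector.detect(video_image):
        face_image = video_image[y:y + h, x:x + w]   # crop the face region
        wearing = bool(mask_classifier.predict(face_image))
        results.append({"box": (x, y, w, h), "mask_worn": wearing})
    # Basis for the second safety monitoring information: unmasked count.
    unmasked_count = sum(1 for r in results if not r["mask_worn"])
    return results, unmasked_count
```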
In one possible implementation, the apparatus further includes:
the third detection module is used for detecting the second safety monitoring information;
and the second sending module is used for sending second alarm information if the second safety monitoring information is detected to meet a second alarm condition, and the second alarm information is used for indicating that people do not wear the mask in the second target place.
In one possible implementation, the apparatus further includes:
the third acquisition module is used for acquiring a video image of a third target place;
and the second identification module is used for identifying the video image of the third target place to obtain queuing information of different place units in the third target place.
In one possible implementation, the apparatus further includes:
the fourth acquisition module is used for acquiring video images of different site units in a third target site;
and the third identification module is used for identifying the video images of the different site units to obtain the loaded proportion information of the different site units in the third target site.
In one aspect, a safety monitoring device is provided, the device comprising:
the system comprises an interface display module, a display module and a display module, wherein the interface display module is used for displaying a safety monitoring interface, the safety monitoring interface comprises a crowd gathering display option, and the crowd gathering display option is used for providing a crowd gathering display function of at least one place;
the first acquisition module is used for, in response to a triggering operation on the crowd gathering display option, acquiring a first image of the at least one place, and displaying, through the first image, the crowd density information of the place distinguishably in different annotation manners;
and the first image display module is used for displaying the first image of the at least one place.
In a possible implementation manner, the first obtaining module is configured to send a first image obtaining request to a server, where the first image obtaining request carries a location identifier of the at least one location, and receive the first image sent by the server.
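For illustration only, a minimal sketch of such a first-image request follows; the endpoint path, the JSON shape and the use of the third-party `requests` library are assumptions, not part of the patent.

```python
import requests  # third-party HTTP client, assumed for illustration

def fetch_first_images(server_url: str, location_ids: list) -> list:
    """Send a first image obtaining request carrying the location
    identifiers of the places, and return the annotated first images
    from the server. The /safety/first-images path and the response
    shape are hypothetical."""
    response = requests.post(
        f"{server_url}/safety/first-images",
        json={"location_ids": location_ids},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["images"]
```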
In one possible implementation, the safety monitoring interface further includes a mask detection option, the mask detection option is used for providing a mask wearing condition display function for personnel in at least one place;
the device also includes:
the second acquisition module is used for, in response to a triggering operation on the mask detection option, acquiring a second image of the at least one place, and displaying distinguishably, through the second image and in different annotation manners, whether people in the place wear masks;
and the second image display module is used for displaying a second image of the at least one place.
In a possible implementation manner, the second obtaining module is configured to send a second image obtaining request to the server, where the second image obtaining request carries the location identifier of the at least one location, and receive the second image sent by the server.
In one possible implementation, the apparatus further includes:
and the first information display module is used for responding to the crowd gathering display instruction of the third target place and displaying queuing information of different place units in the third target place, wherein the queuing information is used for representing the gathering condition of people queued in the place units.
In one possible implementation, the apparatus further includes:
the first sending module is used for sending a queuing information obtaining request to the server, wherein the queuing information obtaining request carries the place identifier of the third target place;
and the first receiving module is used for receiving the queuing information sent by the server.
In one possible implementation, the apparatus further includes:
and the second information display module is used for displaying, in response to a crowd gathering display instruction for the third target place, the loaded proportion information of different place units in the third target place, wherein the loaded proportion information is used for representing the proportion of the number of people already carried in a place unit to the total number of people that can be carried.
In one possible implementation, the apparatus further includes:
a second sending module, configured to send a loaded proportion information obtaining request to the server, where the loaded proportion information obtaining request carries the location identifier of the third target location;
and the second receiving module is used for receiving the loaded proportion information sent by the server.
In one aspect, a server is provided that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to perform the operations performed by the security monitoring method.
In one aspect, a terminal is provided that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to implement the operations performed by the security monitoring method.
In one aspect, a computer-readable storage medium having at least one program code stored therein is provided, the program code being loaded and executed by a processor to implement the operations performed by the security monitoring method.
According to the above solutions, a plurality of acquired first video images of a first target place are respectively input into at least two first image recognition models, a target image recognition model is determined based on at least two pieces of output crowd density information, a plurality of second video images of the first target place are input into the target image recognition model, crowd density information of the plurality of second video images is output, and first safety monitoring information is determined according to the crowd density information of the at least one second video image. In this application, the video images collected in the initial monitoring stage are processed by the at least two first image recognition models to obtain crowd density information determined by multiple models, and the target image recognition model is selected based on the determined crowd density information. A model better suited to the first target place can therefore be selected for the subsequent monitoring process, the accuracy of the crowd density information is adaptively improved, and safety is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a safety monitoring method according to an embodiment of the present application;
fig. 2 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
fig. 3 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
fig. 4 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
fig. 5 is a schematic diagram illustrating a tag addition result of a video image according to an embodiment of the present application;
FIG. 6 is a flow chart of an algorithm for determining crowd density information provided by an embodiment of the present application;
fig. 7 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
FIG. 8 is a flowchart of a mask inspection process provided in an embodiment of the present application;
fig. 9 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
FIG. 10 is a schematic view of a safety monitoring interface provided by an embodiment of the present application;
fig. 11 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of a mask wearing overview interface provided by an embodiment of the present application;
FIG. 13 is a schematic view of a mask fit display interface provided by an embodiment of the present application;
FIG. 14 is a schematic view of a mask wear display interface provided by an embodiment of the present application;
fig. 15 is a flowchart of a safety monitoring method according to an embodiment of the present application;
FIG. 16 is a diagram of a queuing case overview interface provided by an embodiment of the present application;
FIG. 17 is a schematic diagram of a queuing case presentation interface provided by an embodiment of the present application;
fig. 18 is a flowchart of a safety monitoring method provided in an embodiment of the present application;
fig. 19 is a schematic view of an elevator congestion situation display interface provided in an embodiment of the present application;
fig. 20 is an architecture diagram of a safety monitoring system according to an embodiment of the present application;
FIG. 21 is a flow chart of a system implementation provided by an embodiment of the present application;
fig. 22 is a schematic structural diagram of a safety monitoring device according to an embodiment of the present application;
fig. 23 is a schematic structural diagram of a safety monitoring device according to an embodiment of the present application;
fig. 24 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 25 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
Computer Vision (CV) technology: computer vision is a science that studies how to make a machine "see"; it uses a camera and a computer instead of human eyes to identify, track and measure a target, and further performs graphics processing so that the computer produces an image more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, Three-Dimensional (3D) technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and further include common biometric technologies such as face recognition and fingerprint recognition.
Intelligent security: replaces the passive defense of traditional security with intelligent early warning in advance, timely alarming during an event, and efficient tracing afterwards, addressing the passive defense and inefficient retrieval of traditional video monitoring systems.
Loss function: used to evaluate the degree to which the predicted values of a model differ from the true values; generally, the better the loss function, the better the model performance.
Neural network: a mathematical or computational model that mimics the structure and function of a biological neural network and is used to estimate or approximate functions.
ROI: Region Of Interest; in machine vision and image processing, the region to be processed, outlined from the image in the form of a box, circle, ellipse, irregular polygon, or the like.
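As a small illustration of the concept (not part of the patent), a rectangular ROI can be taken from an image array by slicing; the image size and box coordinates below are arbitrary:

```python
import numpy as np

# Dummy 640x480 BGR frame; the box (x=100, y=50, w=200, h=120) is arbitrary.
image = np.zeros((480, 640, 3), dtype=np.uint8)
x, y, w, h = 100, 50, 200, 120
roi = image[y:y + h, x:x + w]   # rectangular region of interest
print(roi.shape)                # (120, 200, 3)
```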
Internet of Things (IoT) technology: a network that, on information carriers such as the Internet and traditional telecommunication networks, enables all ordinary objects capable of performing independent functions to interconnect and intercommunicate.
With the research and progress of artificial intelligence technology, artificial intelligence has been developed and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care and smart customer service. For example, Tencent Instant View is a set of intelligent security video monitoring solutions (also called Instant View for short) that interfaces with an Internet of Things device platform and combines AI visual analysis capabilities to realize intelligent early warning in advance, timely alarming during an event, and efficient tracing afterwards. In many security scenes such as intelligent buildings, intelligent medical care and intelligent communities, the AI algorithms required by different scenes can be flexibly combined according to actual needs such as personal safety, site security and efficient tracing, so as to practically solve the problems faced by different scenes and realize risk early warning and intelligent operation.
The scheme provided by the application relates to technologies such as image recognition of artificial intelligence cloud service, and is specifically explained by the following embodiments:
fig. 1 is a schematic diagram of an implementation environment of a security monitoring method provided in an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a terminal 101 and a server 102.
The terminal 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application. The terminal 101 may receive alarm information sent by the server 102 and prompt safety managers based on the received alarm information. Related safety monitoring software, that is, a client, may be installed and run on the terminal 101. After receiving the alarm information, a safety manager may open the safety monitoring software and view the safety monitoring result by triggering a button corresponding to the alarm information on the visual interface of the terminal 101. Optionally, the safety manager may also open the safety monitoring software at any time and actively check the safety monitoring situation corresponding to a button by triggering that button on the visual interface of the terminal 101.
The terminal 101 may be generally referred to as one of a plurality of terminals, and the embodiment is only illustrated by the terminal 101. Those skilled in the art will appreciate that the number of terminals described above may be greater or fewer. For example, the number of the terminals may be only one, or the number of the terminals may be several tens or several hundreds, or more, and the number of the terminals and the type of the device are not limited in the embodiment of the present application.
The server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The server 102 may obtain video images in different target locations in real time, identify the obtained video images through different image identification models, and determine corresponding safety monitoring information according to an identification result. The server 102 and the terminal 101 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto. The server 102 may also detect the safety monitoring information, generate alarm information when the safety monitoring information meets an alarm condition, and send the alarm information to the terminal 101 to prompt a safety manager. Optionally, the number of the servers may be more or less, and the embodiment of the present application does not limit this. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services.
Fig. 2 is a flowchart of a safety monitoring method provided in an embodiment of the present application. Referring to Fig. 2, the method is applied to a server and includes:
201. Acquire at least one first video image of a first target place.
It should be noted that the first target place may be an office area, a mall, an amusement park, an area to be monitored in a residential area, and the like, which is not limited in this embodiment of the present application. At least one camera may be disposed in the first target place and used to collect video images in the first target place. The collected video images in the first target place may include at least one first video image and at least one second video image, and the first video image may carry a check tag.
The at least one first video image may be a video image obtained by parsing video data collected by the camera, and the video images appearing hereinafter may likewise all be video images obtained by parsing video data.
202. Respectively input the at least one first video image into at least two first image recognition models, and output at least two pieces of crowd density information of the at least one first video image, where the at least two first image recognition models recognize persons in the video image based on different elements of the persons.
It should be noted that the first image recognition model can recognize the person in the video image based on the element of the person. For example, the first image recognition model may recognize a person in a video image based on the head of the person, the entire body of the person, and so on. Optionally, the first image recognition model may also recognize a person based on other elements, which is not limited in this embodiment of the application.
203. Determine a target image recognition model based on the at least two pieces of crowd density information, where the target image recognition model is a model whose output crowd density information meets a target condition.
It should be noted that the target condition may be that the crowd density information is maximum, and optionally, the target condition may also be of another type, which is not limited in this embodiment of the present application.
204. Input at least one second video image of the first target place into the target image recognition model, and output crowd density information of the at least one second video image, where the shooting time of the at least one second video image is later than that of the at least one first video image.
205. Determine first safety monitoring information according to the crowd density information of the at least one second video image, where the first safety monitoring information is used for indicating the crowd gathering situation of the first target place.
According to the solution provided in this embodiment of the application, the video images collected in the initial monitoring stage are processed by the at least two first image recognition models to obtain crowd density information determined by multiple models, and the target image recognition model is selected based on the determined crowd density information. A model better suited to the first target place can therefore be selected for the subsequent monitoring process, the accuracy of the crowd density information is adaptively improved, and safety is improved.
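To make steps 204 and 205 concrete, the following is a minimal Python sketch of the prediction phase; the `predict_density` interface and the alarm threshold are illustrative assumptions, not the patent's API.

```python
def first_safety_info(target_model, second_images, alarm_threshold=0.8):
    """Run only the selected target image recognition model on the later
    frames and summarise the crowd gathering situation of the place.
    `target_model.predict_density` and `alarm_threshold` are assumptions."""
    densities = [target_model.predict_density(img) for img in second_images]
    info = {
        "max_density": max(densities),
        "mean_density": sum(densities) / len(densities),
    }
    # Assumed first alarm condition: any frame exceeds the threshold.
    info["crowd_gathering"] = info["max_density"] > alarm_threshold
    return info
```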
Fig. 3 is a flowchart of a safety monitoring method provided in an embodiment of the present application. Referring to Fig. 3, the method is applied to a terminal and includes:
301. Display a safety monitoring interface, where the safety monitoring interface includes a crowd gathering display option used for providing a crowd gathering display function for at least one place.
302. In response to a triggering operation on the crowd gathering display option, acquire a first image of the at least one place, the first image displaying the crowd density information of the place distinguishably through different annotation manners.
303. Display the first image of the at least one place.
According to the solution provided in this embodiment of the application, a safety monitoring interface is displayed on the visual interface of the terminal, so that safety managers can trigger the crowd gathering display option on the safety monitoring interface to display the first image of at least one place. This helps safety managers see the crowd gathering situation in each place intuitively, handle crowd gathering in time, and guarantee the safety of each place.
Fig. 4 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 4, the method includes:
401. the server acquires video data of a first target place, and decodes the video data to obtain at least one video image.
In a possible implementation manner, the server may obtain video data collected by a camera disposed in the first target location, and decode the obtained video data to obtain at least one video image, that is, at least one video frame.
It should be noted that, when the server decodes the acquired video data, the at least one video image obtained by parsing may be divided into a verification area and a prediction area, and the server automatically initiates processing of the video images in both areas, so that an image recognition model applicable to the video images in the prediction area can subsequently be found based on the video images in the verification area. This process is described below. In a possible implementation manner, the division into verification area and prediction area is implemented by the server selectively adding a check tag to the at least one decoded video image. The check tag indicates that a video image is a first video image, that is, a verification image; a video image to which no tag is added can serve as a second video image, that is, a prediction image. The first video images can be used for selecting the target image recognition model suitable for the current first target place, that is, the crowd density prediction algorithm; the second video images can be used for estimating the crowd density of the current first target place with the target image recognition model, the crowd density information determined by the target image recognition model being used directly as the prediction result without running the other models.
In one possible implementation manner, the process in which the server adds the check tag to the at least one decoded video image includes: the server adds check tags to a first number of video images obtained by parsing to obtain the first video images, then adds no tags to a second number of video images that follow the first number of video images in sequence, and takes the second number of video images as the second video images.
In a possible implementation manner, the adding of check tags may be started in a time-shared manner. For example, the server sets a plurality of start times; after the system time reaches a start time, the server starts adding check tags to the video images obtained by parsing, and stops once check tags have been added to the first number of video images. In this way, the server can automatically add tags in response to possible changes of brightness, crowd density and the like in the scene, providing a data basis for subsequent model selection and thereby adaptively adjusting the recognition accuracy.
In a possible manner, the above adding process may be repeated continuously, that is, tags are added alternately: after the above processing is performed on a first number of video images and a second number of video images, the same processing is performed on the next first number of video images and second number of video images, and so on. For example, Fig. 5 is a schematic diagram of a tag addition result of a video image provided in an embodiment of the present application. Referring to Fig. 5, when the server decodes the video data acquired from the camera 501, check tags are added alternately to the decoded video images to obtain at least one video image 502: the 0th to 5th video images carry the check tag, that is, they are first video images; the 6th to (i-1)th video images carry no check tag, that is, they are second video images; the ith to (i+5)th video images carry the check tag again, that is, they are first video images; and i may be any integer greater than 7.
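As an illustration, a minimal Python sketch of this alternating tagging follows; the six verification frames per period mirror Fig. 5, while the length of the untagged run is an arbitrary assumption.

```python
def add_check_tags(frames, first_number=6, second_number=100):
    """Alternately mark decoded frames as verification images (check tag)
    or prediction images, as in Fig. 5. `first_number` matches the six
    tagged frames of Fig. 5; `second_number` is an illustrative choice."""
    period = first_number + second_number
    for index, frame in enumerate(frames):
        yield {
            "frame": frame,
            "check_tag": (index % period) < first_number,  # first N of each period
        }
```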
It should be noted that the specific values of the first number and the second number may be set according to a crowd density change period of a specific scene, if the crowd density change period is short, the value of the first number may be set to be greater than the value of the second number, and if the crowd density change period is long, the value of the first number may be set to be less than the value of the second number. The number of the first video images and the number of the second video images are set according to the crowd density change period of the specific scene, so that the processing pressure of a server can be reduced on the premise that the determined model is suitable for the current scene, the determining speed of crowd density information is increased, and the safety monitoring efficiency can be improved.
In a possible manner, the server may adjust the value of the first number (that is, the number of video images to which check tags are added) at intervals of a preset duration, so as to adapt to crowd density changes in different time periods. This helps ensure that the determined target image recognition model has higher accuracy, thereby improving safety.
It should be noted that, after receiving the video images, the server may process all of them to implement the selective addition of check tags, and then acquire the first video images and the second video images based on the tagged video images. Optionally, after receiving the video images, the server may instead selectively add check tags to the video images one by one; each time a video image is processed, the server detects whether it is a first video image or a second video image according to whether it carries the check tag, inputs the video image into the corresponding model according to the detection result, and continues processing the next video image through the same steps.
402. And the server detects the label carried by the acquired video image and executes corresponding steps according to the detection result.
It should be noted that the first video image carries the check tag, and the second video image does not carry the check tag, so that the server can detect the tag carried by the acquired video image, determine whether the acquired video image is the first video image or the second video image according to whether the video image carries the check tag, and further execute corresponding steps.
403. And if the acquired video image is detected to carry the check tag, the server determines that the acquired video image is the first video image.
404. The server respectively inputs the at least one first video image into at least two first image recognition models and outputs at least two pieces of crowd density information of the at least one first video image, the at least two first image recognition models recognizing persons in the video image based on different elements of the persons.
It should be noted that, a plurality of first image recognition models may be stored in the server in advance for recognizing the first video image. For example, the first image recognition model may be a human head detection model, a pedestrian detection model, and a crowd density estimation model, and optionally, the first image recognition model may also be of other types, which is not limited in this embodiment of the present application.
The identification of the first video image by the first image identification model may include the following three implementation manners:
in a possible implementation manner, the server may input the at least one first video image to a human head detection model, determine a human head position in the at least one first video image through the human head detection model, and determine crowd density information in the at least one first video image according to the human head position.
When the crowd density information in the first video image is determined according to head positions, the human head detection model identifies each head position in the first video image by processing the first video image. If the crowd density information corresponding to at least two head positions meets a preset condition, the crowd density information corresponding to the at least two head positions is output; if it does not meet the preset condition, the crowd density information corresponding to the at least two head positions and a next head position is determined based on a next head position that meets a preset distance condition with respect to the at least two head positions. The preset condition may be that the crowd density information is greater than a preset threshold, and the preset threshold may be any value; optionally, the preset condition may also be another condition. Neither the value of the preset threshold nor the type of the preset condition is limited in this embodiment of the present application.
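One way to read this procedure is as a greedy grouping of detected head positions. The sketch below uses the head count of a group as its crowd density measure and Euclidean pixel distance for the distance condition; both choices, and the thresholds, are illustrative assumptions rather than the patent's definitions.

```python
from math import dist  # Euclidean distance, Python 3.8+

def head_groups(head_positions, min_count=3, max_gap=80.0):
    """Greedily grow groups of head positions: absorb any remaining head
    within `max_gap` pixels of the current group, and report the group
    once it holds at least `min_count` heads (the assumed preset condition)."""
    remaining = list(head_positions)
    groups = []
    while remaining:
        group = [remaining.pop(0)]          # seed a new group
        absorbed = True
        while absorbed:                     # keep absorbing nearby heads
            absorbed = False
            for p in list(remaining):       # iterate over a copy
                if any(dist(p, q) <= max_gap for q in group):
                    group.append(p)
                    remaining.remove(p)
                    absorbed = True
        if len(group) >= min_count:
            groups.append(group)
    return groups

# Usage sketch with (x, y) pixel coordinates of detected heads.
print(head_groups([(10, 10), (50, 20), (90, 30), (400, 400)]))
```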
It should be noted that the above head-based detection method is suitable for cases where the crowd density is high and partial occlusion of human bodies is serious. In implementation, the human head detection model may use a head or face detection algorithm such as the single-stage object detection algorithm RetinaNet or the high-precision real-time face detector FaceBoxes; optionally, the human head detection model may also use other algorithms, which is not limited in this embodiment of the present application.
In another possible implementation manner, the server may input the at least one first video image into a pedestrian detection model, determine a pedestrian position in the at least one first video image through the pedestrian detection model, and determine crowd density information in the at least one first video image according to the pedestrian position.
When the crowd density information in the first video image is determined according to pedestrian positions, the pedestrian detection model identifies the pedestrian positions in the first video image by processing the first video image. If the crowd density information corresponding to at least two pedestrian positions meets a preset condition, the crowd density information corresponding to the at least two pedestrian positions is output; if it does not meet the preset condition, the crowd density information corresponding to the at least two pedestrian positions and a next pedestrian position is determined based on a next pedestrian position that meets a preset distance condition with respect to the at least two pedestrian positions. The preset condition may be that the crowd density information is greater than a preset threshold, and the preset threshold may be any value; optionally, the preset condition may also be another condition. Neither the value of the preset threshold nor the type of the preset condition is limited in this embodiment of the present application.
It should be noted that the pedestrian-based detection method is suitable for scenes where people are sparse and bodies are not seriously occluded. Because it uses the features of the whole human body, in sparse scenes its detection result is more accurate than that of the head-based detection method; in dense scenes, however, human bodies are seriously occluded, missed detections may occur, and the accuracy of the detection result is lower. In implementation, the pedestrian detection model may use object detection algorithms such as Fast Region-based Convolutional Neural Networks (Fast R-CNN), Faster R-CNN, Mask R-CNN, the real-time object detector YOLO (You Only Look Once), and RetinaNet; optionally, the pedestrian detection model may also use other algorithms, which is not limited in this embodiment of the present application.
In another possible implementation manner, the server may input the at least one first video image into a crowd density estimation model, determine a crowd density map of the at least one first video image through the crowd density estimation model, and determine crowd density information in the at least one first video image according to the crowd density map.
It should be noted that the above detection method based on the crowd density map determines the crowd density information by learning the mapping relationship between local features of the image and the corresponding density map, and spatial information of the image is incorporated in the processing, so the accuracy of the detection result is higher. In implementation, the crowd density estimation model may use algorithms such as the Multi-Column Convolutional Neural Network (MCNN) and DecideNet; optionally, the crowd density estimation model may also use other algorithms, which is not limited in this embodiment of the present application.
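A standard property of such density maps (not specific to this patent) is that each pixel holds an expected person count, so summing the map yields the estimated number of people; a minimal sketch, assuming a NumPy array output:

```python
import numpy as np

def count_from_density_map(density_map: np.ndarray) -> float:
    """Each pixel of a crowd density map is an expected person count,
    so integrating (summing) the map estimates the total head count."""
    return float(density_map.sum())

def density_per_m2(density_map: np.ndarray, area_m2: float) -> float:
    """Illustrative crowd density measure: estimated persons per square
    metre of the monitored area (`area_m2` is an assumed calibration)."""
    return count_from_density_map(density_map) / area_m2
```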
For example, the server may determine the crowd density information through any two of the above three manners, or through all three manners, which is not limited in this embodiment of the application. The video images collected in the initial monitoring stage are processed by the at least two first image recognition models to obtain crowd density information determined by multiple models, and the target image recognition model is selected based on the determined crowd density information, so that a model better suited to the first target place can be selected for the subsequent monitoring process, the accuracy of the crowd density information is adaptively improved, and safety is improved.
It should be noted that the server may process the at least one first video image one by one. Each time a first video image is processed, the server may check whether it is the last first video image: if it is, the server may directly perform the following step 405; if it is not, the server may continue to process the next first video image until the last first video image has been processed, and then perform step 405.
405. The server determines a target image recognition model based on the at least two pieces of crowd density information, wherein the target image recognition model is a model of which the output crowd density information meets a target condition.
It should be noted that, if at least two pieces of crowd density information of only one first video image are determined in step 404, the server may determine, based on the at least two pieces of crowd density information of that first video image, the first image recognition model whose output crowd density information satisfies a target condition as the target image recognition model. The target condition may be that the crowd density information output by the target image recognition model is greater than the crowd density information output by the other first image recognition models; optionally, the target condition may also be another condition, which is not limited in this embodiment of the present application. If at least two pieces of crowd density information are determined for each of several first video images in step 404, the server may collect the at least two pieces of crowd density information of each first video image and, for each first image recognition model, calculate the average of the crowd densities it determined, as its average crowd density information. The server then determines the first image recognition model whose average crowd density information satisfies a target condition as the target image recognition model. The target condition may be that the average crowd density information of the target image recognition model is greater than that of the other first image recognition models; optionally, the target condition may also be another condition, which is not limited in this embodiment of the present application.
Selecting the first image recognition model whose output crowd density information satisfies the target condition as the target image recognition model follows a higher-density assumption strategy: by assuming the denser estimate, missed detections can be reduced, the accuracy of the determined crowd density information is improved, and safety can be improved. In addition, by averaging, for each first image recognition model, the crowd density information it determined over the first video images, the target image recognition model can be determined based on the average crowd density information of each first image recognition model, so that the determined target image recognition model is more accurate and the safety monitoring efficiency is improved.
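For illustration only, the averaging-and-selection logic of step 405 might be sketched as follows in Python; the model objects and their predict() interface are assumptions of this illustration, and the embodiment does not prescribe a concrete API.

```python
# A minimal sketch of the averaging-and-selection logic of step 405.
# The model objects and their predict() interface are assumptions of
# this illustration; the embodiment does not prescribe a concrete API.

def select_target_model(first_images, candidate_models):
    """Pick the model whose average predicted crowd density is highest.

    first_images: decoded first video images of the first target place.
    candidate_models: first image recognition models, each exposing
        predict(image) -> crowd density (e.g. an estimated head count).
    """
    averages = []
    for model in candidate_models:
        densities = [model.predict(image) for image in first_images]
        averages.append(sum(densities) / len(densities))
    # Target condition used here: the largest average density wins,
    # i.e. the higher-density assumption that reduces missed detections.
    best = max(range(len(candidate_models)), key=lambda i: averages[i])
    return candidate_models[best]
```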
406. If the server detects that an acquired video image does not carry the check tag, the server determines that the acquired video image is a second video image, where the shooting time of the at least one second video image is after the shooting time of the at least one first video image.
It should be noted that the at least one first video image and the at least one second video image correspond to two different time periods of one monitoring time period, and the shooting time of the at least one second video image is after the shooting time of the at least one first video image.
407. The server inputs at least one second video image of the first target place into the target image recognition model and outputs crowd density information of the at least one second video image.
The target image recognition model may be any one of a human head detection model, a pedestrian detection model, and a crowd density estimation model, and optionally, the target image recognition model may also be a first image recognition model of another type, which is not limited in this embodiment of the present application. The target image recognition model is more suitable for the scene corresponding to the second video image, so that the crowd density information determined by the target image recognition model is more accurate, and the safety is improved.
It should be noted that steps 401 to 407 may be understood with reference to fig. 6, which is a flowchart of an algorithm for determining crowd density information provided in an embodiment of the present application. The server may parse the video data acquired in step 601 to obtain at least one video image, and determine in step 602 whether the current video image is a first video image. If so, step 603 is executed to predict crowd density information through the multiple first image recognition models, the prediction result of each first image recognition model is counted in step 604, and step 605 then determines whether the current video image is the last first video image: if it is, the target image recognition model is determined directly based on the collected crowd density information; if it is not, the server continues to process the next first video image until the last one has been processed, and then determines the target image recognition model according to the average crowd density information. Thereafter, through steps 608 to 609, when the current video image is not a first video image, the target image recognition model selected based on the first video images is run to determine the crowd density information of the second video image.
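The per-frame dispatch in fig. 6 can be rendered roughly as follows, reusing the illustrative select_target_model() sketch above; the boolean tag carried by each frame in this sketch stands in for the check tag of step 406.

```python
# A rough rendering of the per-frame dispatch in fig. 6, reusing the
# illustrative select_target_model() sketch above; the boolean tag on
# each frame stands in for the check tag of step 406.

def monitor_stream(frames, candidate_models):
    """frames: iterable of (image, is_first) pairs in shooting order."""
    first_images = []
    target_model = None
    for frame, is_first in frames:
        if is_first:
            first_images.append(frame)   # calibration phase: steps 603-604
        else:
            if target_model is None:     # calibration over: pick the model once
                target_model = select_target_model(first_images, candidate_models)
            yield target_model.predict(frame)  # density of a second video image
```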
408. The server determines first safety monitoring information according to the crowd density information of the at least one second video image, wherein the first safety monitoring information is used for indicating the crowd gathering condition of the first target place.
It should be noted that, after the first safety monitoring information is determined, the server may further detect the first safety monitoring information, and if it is detected that the first safety monitoring information meets a first alarm condition, send out first alarm information, where the first alarm information is used to indicate that the first target location is gathered by people. Specifically, the server may generate first warning information indicating that the first target location is gathered by people when the first safety monitoring information meets a first warning condition, send the first warning information to the terminal, and prompt the first warning information through the terminal.
The first alarm condition may be that the crowd density information is greater than a preset threshold, where the preset threshold may be any value, which is not limited in the embodiment of the present application. In some possible implementations, first alarm conditions of different levels may also be set: for example, crowd density information greater than 10 may trigger an "imminent congestion" alarm, greater than 20 a "comparatively crowded" alarm, and greater than 30 a "very crowded" alarm. Optionally, the first alarm condition may also be another type of condition, which is not limited in this embodiment of the application. The first alarm information may further include the position information of the camera that collected the second video image and the collection time information of the second video image, so that safety managers can directly determine the place and time of the crowd gathering from the first alarm information, quickly evacuate the gathered people, reduce the possibility of virus propagation, and improve the safety of the first target place.
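A hypothetical rendering of the tiered first-alarm check, following the example thresholds above (10 / 20 / 30 people); the level names and the returned fields are illustrative, not fixed by the embodiment.

```python
# A hypothetical tiered first-alarm check following the example
# thresholds above (10 / 20 / 30 people); the level names and returned
# fields are illustrative, not fixed by the embodiment.

ALARM_LEVELS = [
    (30, "very crowded"),
    (20, "comparatively crowded"),
    (10, "imminent congestion"),
]

def first_alarm(density, camera_position, capture_time):
    for threshold, level in ALARM_LEVELS:
        if density > threshold:
            # Position and time let safety managers locate the gathering
            # directly from the first alarm information.
            return {"level": level, "density": density,
                    "camera_position": camera_position,
                    "capture_time": capture_time}
    return None  # below every threshold: no first alarm information
```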
Because crowd gathering scenes are complex, covering indoor, outdoor, daytime, night, S-shaped queue, Z-shaped queue and other simple or complex scenes, it is difficult for the same first image recognition model to apply to all of them. Moreover, the crowd density under the same camera may change greatly across time periods (for example, in a canteen queuing scene, crowd density is high during dining hours and low otherwise), or be affected by other environmental factors such as illumination and occlusion, so the first image recognition model suitable for the same camera may differ across time periods. Therefore, the scheme provided by the embodiment of the application processes the video images obtained in the initial monitoring period through at least two first image recognition models to obtain the crowd density information determined by each model, and selects the target image recognition model based on the determined crowd density information, so that a model better suited to the first target place can be selected for subsequent monitoring, adaptively improving the accuracy of the crowd density information and improving safety. In addition, the scheme provided by the embodiment of the application can determine the first safety monitoring information according to the crowd density information of the at least one second video image, and then remind safety managers of the crowd gathering condition in the second video image according to the first safety monitoring information, without requiring the safety managers to judge from the second video picture by themselves, turning the management process from passive to active, improving safety monitoring efficiency, saving human resources, and letting limited human resources play a bigger role.
Fig. 7 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 7, the method includes:
701. The server obtains a video image of the second target place.
In a possible implementation manner, the server may obtain video data collected by a camera disposed in the second target location, and decode the video data to obtain a video image of the second target location.
702. If the second target place is a target type place, the server determines a face region in the video image through a face detection model in the second image recognition model.
The target type place may be a place where masks must be worn; optionally, the target type place may also be another type of place, which is not limited in the embodiment of the present application. The video image may carry the place type information of the second target place, and the server may detect the place type information to determine whether the second target place is a target type place.
In a possible implementation manner, if the second target location is a target type location, the server may input the video image into a face detection model in the second image recognition model, extract a feature map of the video image through a convolution layer of the face detection model, sample the feature map of the video image through a sampling layer of the face detection model to obtain a convolution feature of the feature map of the video image, and determine a face region in the video image according to the convolution feature through a full connection layer of the face detection model.
The face detection model may be FaceBoxes, RetinaNet, a Multi-task Convolutional Neural Network (MTCNN), or the like; optionally, the face detection model may also be another type of neural network model, which is not limited in this embodiment of the present application. When the face detection model is initially trained, it is difficult to acquire a large number of sample images of people wearing masks in a short time, so there are few mask-wearing sample images and many non-mask-wearing ones. Different weights can therefore be set for the different types of sample images and dynamically adjusted based on the sample distribution, making the weight of the mask-wearing sample images larger and that of the non-mask-wearing sample images smaller, so that the initial model can quickly focus on the mask-wearing sample images, improving the model training effect and, in turn, the recognition accuracy of the face detection model.
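One way to realize the weighting described above is sketched below with PyTorch; the inverse-frequency rule and the two-class layout are assumptions of this example, not a prescription of the embodiment.

```python
# One way to realize the weighting described above, sketched with
# PyTorch; the inverse-frequency rule and the two-class layout are
# assumptions of this example, not a prescription of the embodiment.
import torch
import torch.nn as nn

def make_weighted_loss(num_masked, num_unmasked):
    """Give the scarcer mask-wearing samples a proportionally larger
    weight so the initial model focuses on them quickly."""
    total = num_masked + num_unmasked
    # Inverse-frequency weighting: the rarer class gets the larger weight.
    weights = torch.tensor(
        [total / num_unmasked, total / num_masked],  # [no-mask, mask]
        dtype=torch.float32)
    return nn.CrossEntropyLoss(weight=weights)

# The weights can be recomputed between epochs as the sample mix grows,
# which corresponds to the dynamic adjustment mentioned above.
```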
It should be noted that fig. 8 is a technical flowchart of a mask detection process provided in an embodiment of the present application, and referring to fig. 8, a pyramid model 801 may be adopted for the face detection model, so that feature maps of different granularities may be extracted, and further, effective detection and identification of faces of different sizes may be realized through step 802 based on convolution features of different granularities, so as to effectively solve the problem of large scale change of faces in a picture at a monitoring camera angle, improve accuracy of face identification, and further improve security.
703. The server crops the face regions from the video image to obtain at least one face image.
It should be noted that, referring to fig. 8, the server may crop the face regions 803 to 807 from the video image according to the detected face regions. By cropping out the face regions before recognizing whether masks are worn, the accuracy of the recognition result can be improved, which in turn improves the accuracy of safety monitoring.
704. The server identifies the at least one face image through the classification model in the second image identification model, and determines whether a person corresponding to the face image wears the mask.
In a possible implementation manner, the server may input the at least one face image into the classification model in the second image recognition model one by one, extract the feature map of the face image through the convolution layer of the classification model, sample the feature map of the face image through the sampling layer of the classification model to obtain the convolution feature of the feature map of the face image, and determine, through the full connection layer of the classification model, whether the corresponding person in the face image wears the mask according to the convolution feature. Referring to fig. 8, the server may input the intercepted face regions 803 to 807 into a classification model 808 one by one, so as to recognize whether a mask is worn.
It should be noted that the classification model may be a Residual Network (ResNet), a multi-branch convolution (Inception) network, or a depthwise separable convolution (Xception) network; optionally, the classification model may also be another type of neural network model, which is not limited in this embodiment of the present application.
705. The server determines the people not wearing masks in the video image and marks them in the video image.
In a possible implementation manner, the server may determine, according to the flag indicating whether the corresponding person in the face image wears the mask, the person who does not wear the mask in the video image, and then mark the person who does not wear the mask in the video image. For example, the person who does not wear the mask in the video image may be marked with a square frame, as shown in fig. 8, where 809 is a result of marking the person who does not wear the mask in a square frame manner. Optionally, other ways may also be adopted to label the person who does not wear the mask in the video image, which is not limited in the embodiment of the present application.
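An end-to-end sketch of steps 702 to 705 is given below under assumed model interfaces: detect_faces and wears_mask stand in for the face detection model and the classification model, and only the OpenCV drawing call is a real API.

```python
# End-to-end sketch of steps 702 to 705 under assumed model interfaces:
# detect_faces and wears_mask stand in for the face detection model and
# the classification model; only the OpenCV drawing call is a real API.
import cv2

def mark_unmasked(frame, detect_faces, wears_mask):
    """detect_faces(frame) -> list of (x, y, w, h) face boxes;
    wears_mask(face_image) -> bool. Both are hypothetical models."""
    unmasked = 0
    for (x, y, w, h) in detect_faces(frame):
        face = frame[y:y + h, x:x + w]      # step 703: crop the face region
        if not wears_mask(face):            # step 704: classify the crop
            unmasked += 1                   # step 705: red box (BGR order)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    return frame, unmasked                  # the count feeds step 706
```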
706. The server determines second safety monitoring information according to the number of people not wearing masks in the video image, where the second safety monitoring information is used for indicating the number of people not wearing masks in the second target place.
It should be noted that, after the second safety monitoring information is determined, the server may further detect the second safety monitoring information, and if it is detected that the second safety monitoring information meets a second alarm condition, send out second alarm information, where the second alarm information is used to indicate that someone in the second target location does not wear the mask. Specifically, the server may generate second alarm information indicating that people do not wear the mask in the second target location when the second safety monitoring information satisfies a second alarm condition, send the second alarm information to the terminal, and prompt the second alarm information through the terminal.
The second alarm condition may be that a labeled face region exists in the video image; optionally, the second alarm condition may also be another type of condition, which is not limited in this embodiment of the application. The second alarm information may further include the position information of the camera that collected the video image, the time information of the video image, and the face images of the people not wearing masks, so that safety managers can directly determine when and where people were not wearing masks from the second alarm information and deal with them in time, reducing the possibility of virus propagation and improving the safety of the second target place.
According to the scheme provided by the embodiment of the application, by detecting mask wearing in the video image, the second safety monitoring information can be determined from the number of people not wearing masks in the video image, and safety managers can then be reminded of the people not wearing masks in the video image according to the second safety monitoring information, without requiring the safety managers to judge from the video picture by themselves, turning the management process from passive to active, improving safety monitoring efficiency, saving human resources, and letting limited human resources play a bigger role. For the other people in the second target place, mask detection can reduce virus and droplet infection caused by not wearing masks, effectively enhance isolation between individuals, and safeguard people's safety.
Fig. 9 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 9, the method includes:
901. The terminal displays a safety monitoring interface, where the safety monitoring interface includes a crowd gathering display option for providing a crowd gathering display function of at least one place.
In a possible implementation manner, a safety manager may trigger a client on the visual interface of a terminal, and the terminal may, in response to the trigger operation, open the client and display the safety monitoring interface of the client. Fig. 10 is a schematic diagram of the safety monitoring interface provided in an embodiment of the present application, in which two functional entries, a crowd gathering display option and a mask detection option, are provided.
902. The terminal, in response to the trigger operation on the crowd gathering display option, acquires a first image of the at least one place, where the first image uses different marking manners to distinguish the crowd density information of the place.
In one possible implementation manner, the security manager may trigger the crowd gathering display option on the security monitoring interface to trigger the crowd gathering display instruction, and the terminal may obtain the first image of the at least one place in response to the crowd gathering display instruction.
It should be noted that the acquiring may be to send a first image acquisition request carrying a place identifier to the server; the server, in response to the first image acquisition request, acquires at least one video image of the place indicated by the place identifier in real time, determines the crowd density information in the image through the above steps 201 to 204, labels the video image based on the crowd density information to generate the first image, and feeds the first image back to the terminal for display. Of course, the server may also send the crowd density information and the video image to the terminal, and the terminal labels the video image based on the crowd density information to generate the first image.
The display areas corresponding to different crowd density information in the first image may be distinguished using label frames of different colors, or using overlay layers of different colors. For example, when the crowd density information is greater than or equal to a preset threshold, a red color block may be overlaid on the corresponding region of the first image; when it is less than the preset threshold, a green color block may be overlaid on the corresponding region, or a green color block may be overlaid on the corresponding head positions in the first image.
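One possible realization of the color-block labeling just described is sketched below, assuming OpenCV images in BGR order; the threshold, colors, and blending weights follow the example above and are only illustrative.

```python
# One possible realization of the color-block labeling just described,
# assuming OpenCV images in BGR order; the threshold, colors, and
# blending weights follow the example above and are only illustrative.
import cv2

def label_first_image(frame, density, region, threshold):
    """region: (x, y, w, h) display area tied to the density estimate."""
    x, y, w, h = region
    color = (0, 0, 255) if density >= threshold else (0, 255, 0)  # red / green
    overlay = frame.copy()
    cv2.rectangle(overlay, (x, y), (x + w, y + h), color, thickness=-1)
    # Blend so the picture stays visible beneath the color block.
    return cv2.addWeighted(overlay, 0.35, frame, 0.65, 0)
```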
903. The terminal displays a first image of the at least one location.
It should be noted that the above process may be implemented by the security manager performing an operation on the visual interface of the terminal after receiving the first warning information, and may also be implemented by the security manager performing an operation on the visual interface of the terminal according to an actual situation, which is not limited in this embodiment of the application.
According to the scheme provided by the embodiment of the application, the safety monitoring interface is displayed on the visual interface of the terminal, so that safety managers can trigger the crowd gathering display option on the safety monitoring interface to display the first image of the at least one place, which helps safety managers intuitively see the crowd gathering condition in each place, handle it in time, and guarantee the safety of each place.
Fig. 11 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 11, the method includes:
1101. The terminal displays a safety monitoring interface, where the safety monitoring interface includes a mask detection option for providing a display function of the mask wearing condition of at least one place.
It should be noted that step 1101 is the same as step 901, and is not described again here.
1102. The terminal, in response to the trigger operation on the mask detection option, obtains a second image of the at least one place, where the second image uses different marking manners to distinguish whether people in the place are wearing masks.
In one possible implementation, the security manager may trigger the mask detection option on the security monitoring interface to trigger the mask detection instruction, and the terminal may obtain the second image of the at least one location in response to the mask detection instruction.
It should be noted that the acquiring may be to send a second image acquisition request carrying a place identifier to the server; the server, in response to the second image acquisition request, acquires at least one video image of the place indicated by the place identifier in real time, determines the people not wearing masks in the image through the above steps 701 to 705, marks them in the image to generate the second image, and feeds the second image back to the terminal for display. Of course, the server may also send the face images of the people not wearing masks and the video image to the terminal, and the terminal labels the video image based on those face images to generate the second image.
The people not wearing masks in the second image may be distinguished using marking frames. For example, in the second image, people not wearing masks may be marked with red frames and people wearing masks with green frames. Optionally, only red frames may be used to mark the people not wearing masks, with mask wearers left unmarked, which is not limited in the embodiment of the present application.
In the above step 1102, the terminal, in response to the trigger operation on the mask detection option, may directly obtain the second images of different places for visual display; for example, thumbnail display areas of the second images of the at least one place are shown in one display interface, and the second image of a place is displayed after a trigger operation by the user on its thumbnail display area is detected. In another possible implementation, the terminal may respond to the trigger operation on the mask detection option by displaying a mask wearing condition overview interface, which includes alert level information for each floor, where the alert level information may be determined according to the mask wearing condition on each floor. Optionally, the number of people not wearing masks on each floor may also be shown on the mask wearing condition overview interface, so that safety managers can see the mask wearing condition of every floor more intuitively, improving safety monitoring efficiency. For example, referring to the safety monitoring interface shown in fig. 10, a safety manager may trigger a mask detection instruction by triggering the mask detection option 1002 in fig. 10, and the terminal may jump, in response to the mask detection instruction, to the mask wearing condition overview interface shown in fig. 12, which is a schematic diagram of a mask wearing condition overview interface provided in an embodiment of the present application and intuitively shows the mask wearing condition of each floor. The mask wearing condition buttons 1201 to 1206 corresponding to the floors in the overview interface can be triggered by the safety manager to jump to the mask wearing condition display interface of the corresponding floor. For example, in fig. 12, floors 28 and 36 have alarms for people not wearing masks; for the alarm on floor 28, the safety manager may trigger the mask wearing condition button 1201 corresponding to floor 28, and the terminal may jump, in response to the trigger operation, to the mask wearing condition display interface shown in fig. 13, which is a schematic diagram of a mask wearing condition display interface provided in an embodiment of the present application. The safety manager may also trigger the mask wearing condition buttons of other, non-alarming floors to view their mask wearing conditions. For example, the safety manager may trigger the mask wearing condition button 1302 corresponding to floor 30, and the terminal may jump, in response to the trigger operation, to the mask wearing condition display interface shown in fig. 14, which is a schematic diagram of a mask wearing condition display interface provided in an embodiment of the present application.
The alert level information may be set according to the number of people not wearing masks recognized by the server through the second image recognition model: if people not wearing masks are present on a floor, the server may set the alert level information of that floor to the alert level; if not, the server may set it to the normal level. The server can then send the alert level information to the terminal for display on the terminal's visual interface. Through this intuitive display, a user can directly understand the current condition of each place and quickly find abnormal places, improving monitoring accuracy and safety.
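A minimal sketch of the per-floor alert-level rule stated above follows: any person without a mask puts the floor at the alert level, otherwise it is normal; the record layout is an assumption of this illustration.

```python
# A minimal sketch of the per-floor alert-level rule stated above: any
# person without a mask puts the floor at the alert level, otherwise it
# is normal. The record layout is an assumption of this illustration.

def floor_alert_levels(unmasked_counts):
    """unmasked_counts: dict mapping floor -> number of unmasked people."""
    return {
        floor: {"level": "alert" if count > 0 else "normal",
                "unmasked": count}  # count shown on the overview interface
        for floor, count in unmasked_counts.items()
    }
```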
1103. The terminal displays a second image of the at least one location.
In a possible implementation manner, the terminal can display the acquired second image in the mask wearing condition display interface, which may include the mask wearing condition information of the corresponding floor and the floor's monitoring picture. For example, the terminal may display the acquired monitoring picture 1301 of floor 28 in the mask wearing condition display interface shown in fig. 13 to directly show the people not wearing masks on floor 28; in the monitoring picture 1301, the people not wearing masks are marked with frames, which is convenient for safety managers to check and improves safety monitoring efficiency. Similarly, the terminal may directly display the acquired video monitoring picture 1401 of floor 30 in the mask wearing condition display interface shown in fig. 14.
It should be noted that the above process may be implemented by the security manager performing an operation on the visual interface of the terminal after receiving the second warning information, and may also be implemented by the security manager performing an operation on the visual interface of the terminal according to an actual situation, which is not limited in this embodiment of the application.
According to the scheme provided by the embodiment of the application, the safety monitoring interface is displayed on the visual interface of the terminal, so that safety managers can trigger the mask detection option on the safety monitoring interface to display the second image of the at least one place, which helps safety managers intuitively see the mask wearing condition in each place, deal with people not wearing masks in time, and guarantee the safety of each place.
Fig. 15 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 15, the method includes:
1501. The terminal displays a safety monitoring interface, where the safety monitoring interface includes a crowd gathering display option for providing a crowd gathering display function of at least one place.
In a possible implementation manner, a security manager may trigger the client in a visual interface of the terminal, the terminal may respond to the trigger operation, open the client, and display a security monitoring interface of the client, and the security manager may trigger the crowd gathering display option in the security monitoring interface to trigger the crowd gathering display instruction of the third target location. For example, referring to fig. 10, fig. 10 is a schematic view of a safety monitoring interface provided in an embodiment of the present application, in which two functional portals, a crowd gathering display option 1001 and a mask detection option 1002, are provided, and a security manager may trigger the crowd gathering display instruction for a third target location by triggering the crowd gathering display option 1001.
1502. The terminal, in response to the crowd gathering display instruction for the third target place, displays the queuing information of the different place units in the third target place, where the queuing information represents the gathering condition of people queuing in the place units.
In a possible implementation manner, the terminal may send a queuing information acquisition request to the server in response to the crowd gathering display instruction for the third target place, where the request carries the place identifier of the third target place; the server may acquire video images of the third target place, identify them to obtain the queuing information of the different place units in the third target place, and send the queuing information to the terminal for display. In another possible implementation manner, the server may acquire video images of the third target place in real time, identify them in real time, and cache the identified queuing information of the different place units; after the terminal sends a queuing information acquisition request in response to the crowd gathering display instruction for the third target place, the server may, in response to the received request, obtain the cached queuing information and send it to the terminal for display.
In one possible implementation, the third target place is an elevator hall, the different place units in the third target place are elevators, and correspondingly, the queuing information of the different place units is the number of people queuing for the different elevators. Each elevator may stop at the same floors or at different floors, which is not limited in the embodiment of the present application. Taking the case where each elevator only stops at a given floor as an example, the terminal may respond to the crowd gathering display instruction for the third target place by jumping to a queuing condition overview interface, which may include the traffic level of the queuing condition of the elevator corresponding to each floor; the traffic level of the elevator corresponding to each floor is determined by the server according to the number of people queuing for the elevator and the people-count intervals corresponding to the different traffic levels. For example, fewer than 10 people queuing may be set as normal traffic, more than 10 but fewer than 20 as imminent congestion, more than 20 but fewer than 30 as comparatively congested, and more than 30 as very congested, with the traffic level corresponding to each floor displayed on the queuing condition overview interface according to the actual situation of that floor. For example, referring to fig. 16, a schematic diagram of a queuing condition overview interface provided in an embodiment of the present application, the traffic levels of the queuing conditions of the elevators corresponding to the floors are shown: the elevators corresponding to floors 42, 36, 33 and 30, with fewer than 10 people queuing, are shown as normal traffic, and the elevator corresponding to floor 39, with more than 10 but fewer than 20 people queuing, is shown as imminent congestion.
The queuing-number buttons 1601 to 1605 of the elevators corresponding to the floors in the queuing condition overview interface can be triggered by the safety manager to jump to the queuing condition display interface of the elevator corresponding to that floor. For example, the safety manager may trigger the button 1602 corresponding to the elevator of floor 39, and the terminal may jump, in response to the trigger operation, to the queuing condition display interface shown in fig. 17, which is a schematic diagram of a queuing condition display interface provided in an embodiment of the present application. The terminal can display the acquired first image in the queuing condition display interface; for example, the terminal may display the acquired first image 1701 of the elevator corresponding to floor 39 in the queuing condition display interface shown in fig. 17, so that the safety manager can view the current queuing condition of the elevator corresponding to floor 39.
According to the scheme provided by the embodiment of the application, the safety monitoring interface is displayed on the visual interface of the terminal, so that safety managers can trigger the crowd gathering display option on the safety monitoring interface to display the queuing information of each place unit in the third target place, which helps safety managers intuitively see the queuing condition of each place unit, evacuate people in time from place units with longer queues, and guarantee the safety of each place.
Fig. 18 is a flowchart of a safety monitoring method provided in an embodiment of the present application, and referring to fig. 18, the method includes:
1801. The terminal displays a safety monitoring interface, where the safety monitoring interface includes a crowd gathering display option for providing a crowd gathering display function of at least one place.
It should be noted that step 1801 is similar to step 1501, and is not described herein again.
1802. The terminal, in response to the crowd gathering display instruction for the third target place, displays the loaded proportion information of the different place units in the third target place, where the loaded proportion information represents the ratio of the number of people carried in a place unit to the total number of people it can carry.
In a possible implementation manner, the terminal may send a loaded proportion information acquisition request to the server in response to the crowd gathering display instruction for the third target place, where the request carries the place identifier of the third target place; after receiving the request, the server may acquire video images of the different place units, identify them to obtain the loaded proportion information of the different place units in the third target place, and send the loaded proportion information to the terminal for display. In another possible implementation manner, the server may collect video images of the different place units in the third target place in real time, identify them in real time, and cache the identified loaded proportion information; after the terminal sends a loaded proportion information acquisition request in response to the crowd gathering display instruction for the third target place, the server may, in response to the received request, obtain the cached loaded proportion information and send it to the terminal for display.
In one possible implementation, the third target place is an elevator hall and the different place units in the third target place are elevators; accordingly, the loaded proportion information of the different place units is the proportion of the number of people carried in each elevator to the total number of people it can carry. In this implementation, the terminal may respond to the crowd gathering display instruction for the third target place by jumping to an elevator congestion condition display interface, which may include, for each elevator, the proportion of the number of people carried to the total number it can carry. The number of people carried in each elevator can be obtained by the server by recognizing the video image inside the elevator, and the server can obtain the total number of people the elevator can carry, determine the elevator's loaded proportion information accordingly, and send it to the terminal, which can display the received loaded proportion information in the elevator congestion condition display interface.
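The loaded proportion computation described above can be sketched as follows; count_people is a placeholder for the server-side recognition model, and the record fields are assumptions of this illustration.

```python
# Sketch of the loaded-proportion computation described above;
# count_people is a placeholder for the server-side recognition model
# and the record fields are assumptions of this illustration.

def elevator_load(frame, capacity, count_people):
    """count_people(frame) -> number of people recognized in the car."""
    carried = count_people(frame)
    return {
        "carried": carried,
        "capacity": capacity,
        "proportion": carried / capacity,  # shown on the congestion interface
    }
```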
The elevator congestion condition display interface is shown in fig. 19, which is a schematic diagram of the elevator congestion condition display interface provided in an embodiment of the present application. The interface can display the congestion conditions of elevator 1901 to elevator 1906, that is, the proportion of the number of people carried in each elevator to the total number it can carry, and a safety manager can evacuate people from elevators displaying a higher percentage according to the situation shown on the interface, reducing the probability of virus propagation and improving elevator safety.
According to the scheme provided by the embodiment of the application, the safety monitoring interface is displayed on the visual interface of the terminal, so that safety managers can trigger the crowd gathering display option on the safety monitoring interface to display the loaded proportion information of each place unit in the third target place, which helps safety managers intuitively see the occupancy of each place unit, evacuate people in time from heavily loaded place units, and guarantee the safety of each place.
It should be noted that the functions of the interfaces in the embodiments corresponding to fig. 9 to fig. 19 may be implemented by being combined into an application program, so as to provide a security monitoring application program with comprehensive functions for security managers, so that the security managers can monitor the situations in various places more effectively, and the efficiency of security monitoring is improved.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
In some embodiments, the server performs face recognition on the video images of a place to obtain person identity information, and stores the person identity information, the place, and the corresponding video image in association, so that the places a person has visited are visually recorded. Once a person's movement path needs to be traced, the places the person visited can be queried based on the person identity information, and the identity information of other people who were in the same places can be queried in turn. To improve tracing accuracy, the server can also retrieve the corresponding video images to determine the people contacted by the person corresponding to the identity information.
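For illustration, the record-and-trace idea above might look like the following in-memory sketch; a real deployment would use a database, and every name here is hypothetical.

```python
# For illustration, the record-and-trace idea above in an in-memory
# form; a real deployment would use a database, and every name here is
# hypothetical.
from collections import defaultdict

class VisitLog:
    def __init__(self):
        self.visits = defaultdict(list)   # person id -> [(place, image)]
        self.presence = defaultdict(set)  # place -> {person ids}

    def record(self, person_id, place, video_image):
        self.visits[person_id].append((place, video_image))
        self.presence[place].add(person_id)

    def places_of(self, person_id):
        return [place for place, _ in self.visits[person_id]]

    def co_located(self, person_id):
        """Identity information of others seen in the same places."""
        others = set()
        for place in self.places_of(person_id):
            others |= self.presence[place]
        others.discard(person_id)
        return others
```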
All of the above optional technical solutions can be implemented by one safety monitoring system. The design of the safety monitoring system can be roughly divided into three parts, front-end access design, alarm design, and algorithm module design, each of which is detailed below:
Front-end access design: the client running on the terminal supports multiple access modes, such as instant messaging software account access and scanning a two-dimensional code to log in to an H5 (HyperText Markup Language 5) page. Different terminals may implement different functions. For example, through a Personal Computer (PC) client, safety managers can view real-time previews and historical playback of each camera; use intelligent defense deployment (including AI algorithms such as forbidden zone monitoring, crowd gathering, loitering detection, fire detection, fall detection, face recognition, and mask detection, where the PC client dynamically displays the intelligent algorithms and the number of supported channels through the services registered by the Graphics Processing Unit (GPU) algorithm servers); use intelligent analysis (functions such as person trajectory backtracking and video data compression); use the management center (camera device management, role management, and system management); and use the alarm center (processing various intelligent alarm information and the like). Safety managers can also view real-time previews and historical playback of each camera through the customized functions of the mobile phone client, and view alarm information (including heat maps of crowd gathering alarms, mask detection alarms, and the like detected by the algorithms).
Alarm design: the camera devices are accessed to the IoT access platform layer through the IoT access platform server and registered as accessible devices. Similarly, the AI algorithm GPU servers are accessed to the platform and registered as GPU algorithm services. The client running on the terminal obtains the access device and GPU algorithm server information from the IoT access platform server, and binds camera devices to GPU servers by creating AI algorithm tasks. The IoT access platform server then pushes the camera data to the corresponding GPU server; the GPU algorithm server receiving the camera's video data computes and returns the detection result to the IoT access platform server, which generates alarms through different detection strategies and pushes the alarm data for display to the client running on the terminal. Likewise, the IoT access platform server may set the client's phone/WeChat as an alarm push administrator, pushing alarms in the form of phone/SMS/WeChat messages.
Algorithm module design: the algorithm servers can provide various GPU algorithm services, including forbidden zone monitoring, crowd gathering, loitering detection, fire detection, fall detection, face recognition, mask detection, and the like. A GPU algorithm server installs the algorithm installation package files and starts the algorithm services, and the needed or customized algorithm services are dynamically modified through configuration files. The system provides an Agent service: the multiple GPU algorithm servers communicate with the Agent server, and the Agent server handles data transmission and forwarding between the IoT access platform server and the GPU algorithm servers. The data transmission and forwarding process may be as follows (a schematic sketch of this flow is given below): (1) the Agent server receives an algorithm service request from the IoT access platform server; (2) the Agent server queries whether any of the GPU servers it manages can provide the algorithm service corresponding to the request; (3) the Agent server obtains the query result, and if the service can be provided, proceeds to step (4), otherwise to step (5); (4) the Agent server pulls the video data from the IoT access platform server and forwards it to the corresponding GPU server to provide the GPU algorithm service and maintain the service state; (5) the Agent server informs the IoT access platform server that the GPU algorithm service cannot be provided, and the request ends.
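The Agent server's steps (1) to (5) can be pictured with the following schematic sketch; the server objects and their methods are stand-ins rather than a real platform API.

```python
# Schematic sketch of the Agent server's steps (1) to (5) above; the
# server objects and their methods are stand-ins, not a real platform API.

def handle_algorithm_request(agent, request):
    # (1) the request arrives from the IoT access platform server
    # (2) query every managed GPU server for the requested service
    capable = [gpu for gpu in agent.gpu_servers
               if gpu.supports(request.service)]
    if capable:
        # (3)+(4) a provider exists: pull the video data, forward it,
        # and keep the service state alive
        video = agent.platform.pull_video(request.device_id)
        capable[0].forward(request.service, video)
        agent.keep_alive(request)
    else:
        # (5) no provider: notify the platform and end the request
        agent.platform.notify_unavailable(request)
```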
In a specific implementation process, the three-part design above may be realized by the safety monitoring system shown in fig. 20. Referring to fig. 20, an architecture diagram of a safety monitoring system provided in an embodiment of the present application, the safety monitoring system includes: a user display layer 2001, a functional layer 2002, an AI algorithm layer 2003, and an IoT access platform layer 2004.
The user display layer 2001 is responsible for providing the client function for the terminal user: displaying a safety monitoring interface including the crowd gathering display option and the mask detection option, acquiring a first image of at least one place in response to the user's trigger operation on the crowd gathering display option and displaying the first image, and acquiring a second image of the at least one place in response to the user's trigger operation on the mask detection option and displaying the second image. The user display layer 2001 may also provide a management function for the terminal device, and may create intelligent detection tasks in response to a management user's task creation request.
The functional layer 2002 includes a functional application layer, a background engineering layer, and an algorithm analysis layer. The function application layer provides a plurality of functions for the user presentation layer 2001, which may specifically include a plurality of functions such as real-time preview, history playback, intelligent defense, intelligent analysis, device management, and alarm management. The background engineering layer is responsible for registering the devices accessed to the IoT access platform layer 2004 as local services, and binding the registered devices with the AI algorithms in the algorithm analysis layer by creating tasks, so that the AI algorithms can monitor the device data streams in real time, and the algorithm analysis layer comprises a plurality of AI algorithms for calling corresponding algorithm modules in the AI algorithm layer to process the device data streams.
The AI algorithm layer 2003 includes an intelligent algorithm service management module, a restricted area monitoring module, a fire monitoring module, a crowd gathering module, a loitering detection module, a fall detection module, and a mask detection module, and optionally, the AI algorithm layer 2003 may further include other modules, which are not limited in this embodiment of the present application. The intelligent algorithm service management is responsible for accessing the AI algorithm service to the IoT access platform layer 2004, the IoT access platform layer 2004 uses the algorithm service and guides the data stream of the corresponding device to the corresponding algorithm module, such as a restricted area monitoring module, a fire monitoring module, a crowd gathering module, a loitering detection module, a falling detection module, a mask detection module, and the like, and the algorithm module returns the result to the IoT access platform layer 2004 after processing.
The IoT access platform layer 2004 includes a device access layer, a device management layer, a notification channel layer, and multiple intelligent devices. The device access layer is responsible for accessing and managing multiple intelligent devices, such as IoT devices and sensor devices, on the platform; the device management layer is responsible for transmitting the data of the IoT devices/sensors to the upper-layer services, while the upper layer returns the results of processing the device data to the IoT access platform layer 2004; and the notification channel layer is responsible for selectively notifying safety managers of the results via SMS/instant messaging and the like according to the results.
It should be noted that the AI algorithm layer 2003 provides an AI resource pool: the client running on the terminal can modify and expand its own algorithm tasks, and the AI algorithm layer can dynamically modify the AI algorithm services according to the client's modifications. In addition, more intelligent devices, including smoke sensors, depth cameras, light sensors and the like, can be accessed to the IoT access platform layer to assist the AI detection algorithms and improve detection precision in different scenes.
Through the design of the above safety monitoring system, the present application can realize non-contact remote monitoring, analyze massive information in combination with AI technology, achieve accurate risk early warning and response, and build an intelligent monitoring and alarm platform that provides the multiple AI safety monitoring modes of the above embodiments, offering efficient, high-accuracy safety monitoring measures for various places, effectively reducing the probability of virus propagation, and guaranteeing safety. The dense crowd monitoring and alarm process uses machine learning to detect crowd gathering in a target place and judge whether abnormal gathering exists; once abnormal gathering is found, an alarm is pushed to the monitoring center, making it convenient for managers to quickly evacuate the gathered people and reducing the possibility of virus propagation. Monitoring of certain places can also be realized through electronic fence technology, that is, setting a forbidden zone and monitoring whether anyone enters abnormally, such as people not wearing specific clothing entering, or entry at unspecified times; if an abnormal situation is found, an alarm is raised and a safety manager is notified to handle it. Backtracking of specific scenes can also be realized based on video monitoring: by keeping the relevant data of designated places and people, related persons can be quickly traced and found. In a disease-control scene, class-B people (those who accidentally contacted infected people) can be found and corresponding measures taken, reducing the possibility of producing class-2B people (those who accidentally contacted class-B people). Given a known infection source, the number of workers needed for screening can be reduced, along with the possibility of manual contact with the infection source, effectively suppressing virus propagation and ensuring the personal safety of more people.
In an actual implementation process, each part in the safety monitoring system may implement the scheme provided by the present application in a manner as shown in fig. 21, where fig. 21 is an exemplary system implementation flowchart provided by the embodiment of the present application, and referring to fig. 21, the system includes: a client 2101 running on a terminal, a background server 2102, an IoT access platform server 2103, a video gateway server 2104, camera devices 2105 to 2107, a general server 2108, and algorithm servers 2109 to 2111. Through the interaction among the background server 2102, the IoT access platform server 2103, the video gateway server 2104, the general server 2108 and the algorithm servers 2109 to 2111, the security monitoring method described in the above steps 201 to 205 and 401 to 407 can be implemented.
The specific implementation process is as follows: (1) The client 2101 running on a terminal may connect to the background server 2102; for example, the client may connect to the background server 2102 through a PC or a mobile phone. (2) The background server 2102 connects to the IoT access platform server 2103 to obtain camera device information. (3) The camera devices 2105 to 2107 (which may also include some other intelligent devices) access the IoT access platform server 2103 through the video gateway server 2104 and register as services in the IoT access platform server 2103. (4) When the client 2101 running on the terminal initiates an AI service task, the client may initiate a device service request to the IoT access platform server 2103 through the background server 2102, and the IoT access platform server 2103 may notify the video gateway server 2104 to push the video data to the general server 2108 of the algorithm layer. (5) The general server 2108 distributes the pushed video data to the algorithm servers 2109 to 2111, which process it according to the procedures provided by the above embodiments to obtain alarm information; an algorithm server may be a Graphics Processing Unit (GPU) server or the like. (6) The alarm information obtained by analysis may be transmitted back to the IoT access platform server 2103 through the general server 2108. (7) The IoT access platform server 2103 may transmit the alarm information back to the client 2101 running on the terminal through the background server 2102.
Fig. 22 is a schematic structural diagram of a safety monitoring device provided in an embodiment of the present application, and referring to fig. 22, the safety monitoring device includes:
a first acquiring module 2201, configured to acquire at least one first video image of a first target site;
a model processing module 2202, configured to input the at least one first video image into at least two first image recognition models respectively, and output at least two pieces of crowd density information of the at least one first video image, where the at least two first image recognition models recognize people in the video image based on different elements of the people respectively;
a model determining module 2203, configured to determine a target image recognition model based on the at least two pieces of crowd density information, where the target image recognition model is a model in which the output crowd density information satisfies a target condition;
the model processing module 2202 is further configured to input at least one second video image of the first target location into the target image recognition model, and output crowd density information of the at least one second video image, where a shooting time of the at least one second video image is after a shooting time of the at least one first video image;
a first information determining module 2204, configured to determine first safety monitoring information according to the crowd density information of the at least one second video image, where the first safety monitoring information is used to indicate crowd gathering conditions of the first target location.
The device provided by the embodiment of the present application processes the video images acquired at the initial monitoring stage with at least two first image recognition models to obtain the crowd density information determined by each model, and selects the target image recognition model based on the determined crowd density information, so that a model more suitable for the first target place can be selected for the subsequent monitoring process, which adaptively improves the accuracy of the crowd density information and thus improves safety.
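The selection of the target image recognition model can be sketched as follows. Since the target condition is defined by the method embodiments rather than here, the sketch assumes, purely for illustration, that the model reporting the largest mean count over the verification frames is chosen; `StubModel` and its `count_people` method are stand-ins for the real models.

```python
# Illustrative sketch of the model-selection step, under the assumption
# stated above; not the disclosed target condition.
class StubModel:
    def __init__(self, name, bias):
        self.name, self.bias = name, bias

    def count_people(self, image):
        return len(image["people"]) + self.bias  # fake inference result

def select_target_model(first_images, models):
    def mean_count(model):
        return sum(model.count_people(im) for im in first_images) / len(first_images)
    return max(models, key=mean_count)

frames = [{"people": [0] * 12}, {"people": [0] * 15}]
candidates = [StubModel("head", -1), StubModel("pedestrian", -3), StubModel("density", 0)]
print(select_target_model(frames, candidates).name)  # -> "density"
```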
In a possible implementation manner, the model processing module 2202 is configured to input the at least one first video image into a human head detection model, determine a human head position in the at least one first video image through the human head detection model, and determine crowd density information in the at least one first video image according to the human head position;
the model processing module 2202 is configured to input the at least one first video image into a pedestrian detection model, determine a pedestrian position in the at least one first video image through the pedestrian detection model, and determine crowd density information in the at least one first video image according to the pedestrian position;
the model processing module 2202 is configured to input the at least one first video image into a crowd density estimation model, determine a crowd density map of the at least one first video image through the crowd density estimation model, and determine crowd density information in the at least one first video image according to the crowd density map.
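Each of the three recognizers reduces to crowd density information in a different way: detectors yield one bounding box per person, while a density estimation model yields a map whose integral approximates the person count. The fragment below illustrates both reductions; the detections and the density map are synthetic stand-ins.

```python
import numpy as np

# Sketch of how each recognizer reduces to crowd density information.
def count_from_detections(boxes):
    # Head or pedestrian detector: one bounding box per person.
    return len(boxes)

def count_from_density_map(density_map):
    # Density-estimation model: the map integrates to the person count.
    return float(np.sum(density_map))

boxes = [(10, 20, 40, 60), (80, 15, 110, 70)]
density_map = np.zeros((4, 4))
density_map[1, 2], density_map[3, 0] = 2.5, 1.5
print(count_from_detections(boxes))         # 2
print(count_from_density_map(density_map))  # 4.0
```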
In a possible implementation manner, the model processing module 2202 is configured to output the crowd density information corresponding to at least two head positions if that crowd density information meets a preset condition; if it does not meet the preset condition, the module determines the crowd density information corresponding to the at least two head positions and a next head position, based on the next head position meeting a preset distance condition with the at least two head positions.
In a possible implementation manner, the model processing module 2202 is configured to output the crowd density information corresponding to at least two pedestrian positions if that crowd density information meets a preset condition; if it does not meet the preset condition, the module determines the crowd density information corresponding to the at least two pedestrian positions and a next pedestrian position, based on the next pedestrian position meeting a preset distance condition with the at least two pedestrian positions.
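The iterative grouping described in the two implementations above can be sketched as a greedy expansion: starting from two positions, the nearest remaining position that satisfies the distance condition is absorbed until the preset condition holds. The thresholds `max_gap` and `min_size` are assumed placeholders, since the disclosure leaves both conditions as preset values.

```python
import math

# Hedged sketch of the iterative head/pedestrian-position grouping.
def grow_cluster(positions, max_gap=50.0, min_size=5):
    cluster, rest = list(positions[:2]), list(positions[2:])
    while len(cluster) < min_size:            # assumed preset condition
        near = [p for p in rest
                if any(math.dist(p, q) <= max_gap for q in cluster)]
        if not near:
            break                             # no further position qualifies
        nxt = min(near, key=lambda p: min(math.dist(p, q) for q in cluster))
        cluster.append(nxt)
        rest.remove(nxt)
    return cluster

head_positions = [(0, 0), (10, 5), (20, 8), (300, 300), (25, 12)]
print(len(grow_cluster(head_positions)))  # 4 positions grouped together
```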
In one possible implementation, the at least one first video image carries a verification tag;
the device also includes:
the first detection module is used for detecting a label carried by the acquired video image;
and the first image determining module is used for determining the acquired video image as the first video image if the acquired video image is detected to carry the check tag, and executing the step of inputting the at least one first video image into at least two first image recognition models respectively and outputting at least two pieces of crowd density information of the at least one first video image.
In one possible implementation, the apparatus further includes:
and the second image determining module is used for determining the acquired video image as the second video image if the acquired video image is detected not to carry the check tag, and executing the step of inputting at least one second video image of the first target place into the target image recognition model and outputting the crowd density information of the at least one second video image.
In one possible implementation, the apparatus further includes:
the analysis module is used for analyzing the received video image;
and the adding module is used for adding the check tag to a first number of video images obtained by parsing, to obtain the at least one first video image, and not adding the check tag to a second number of video images located after the first number of video images in sequence, to obtain the at least one second video image.
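The parse-and-tag step can be illustrated as follows; the first-batch size of 100 frames is an arbitrary example, not a value fixed by the disclosure.

```python
# Sketch of the parse-and-tag step: the first batch of parsed frames
# carries the check tag (verification phase) and the rest do not.
def tag_frames(frames, first_count=100):
    return [dict(f, check_tag=(i < first_count)) for i, f in enumerate(frames)]

frames = tag_frames([{"id": i} for i in range(150)])
first = [f for f in frames if f["check_tag"]]        # first video images
second = [f for f in frames if not f["check_tag"]]   # second video images
print(len(first), len(second))  # 100 50
```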
In one possible implementation, the apparatus further includes:
the second detection module is used for detecting the first safety monitoring information;
the first sending module is used for sending first alarm information if the first safety monitoring information is detected to meet a first alarm condition, and the first alarm information is used for indicating the first target place to have crowd gathering.
In one possible implementation, the apparatus further includes:
the second acquisition module is used for acquiring a video image of a second target place;
the identification module is used for identifying the video image through a second image recognition model if the second target place is a target type place, and determining the people who are not wearing masks in the video image;
the marking module is used for marking people who do not wear the mask in the video image;
and the second information determining module is used for determining second safety monitoring information according to the number of people not wearing masks in the video image, and the second safety monitoring information is used for indicating the number of people not wearing masks in the second target place.
In a possible implementation manner, the recognition module is configured to determine a face region in the video image through a face detection model in the second image recognition model, intercept the face region in the video image to obtain at least one face image, recognize the at least one face image through a classification model in the second image recognition model, and determine whether a person corresponding to the face image wears the mask.
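The two-stage recognition can be sketched as below: a face detector proposes regions, each region is cropped and classified as wearing or not wearing a mask. The detector and classifier here are stubs standing in for the face detection model and the classification model of the second image recognition model.

```python
import numpy as np

# Two-stage sketch matching the description above; both callables are
# stand-ins, not the disclosed models.
def detect_unmasked(image, face_detector, mask_classifier):
    unmasked = []
    for (x1, y1, x2, y2) in face_detector(image):
        crop = image[y1:y2, x1:x2]             # intercept the face region
        if not mask_classifier(crop):          # True means "wearing a mask"
            unmasked.append((x1, y1, x2, y2))  # region to mark on the frame
    return unmasked

frame = np.zeros((240, 320, 3), dtype=np.uint8)
stub_detector = lambda img: [(10, 10, 60, 60), (100, 20, 150, 80)]
stub_classifier = lambda crop: crop.mean() > 0  # all-zero crops => no mask
print(len(detect_unmasked(frame, stub_detector, stub_classifier)))  # 2
```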
In one possible implementation, the apparatus further includes:
the third detection module is used for detecting the second safety monitoring information;
and the second sending module is used for sending second alarm information if the second safety monitoring information is detected to meet a second alarm condition, and the second alarm information is used for indicating that people do not wear the mask in the second target place.
In one possible implementation, the apparatus further includes:
the third acquisition module is used for acquiring a video image of a third target place;
and the second identification module is used for identifying the video image of the third target place to obtain queuing information of different place units in the third target place.
In one possible implementation, the apparatus further includes:
the fourth acquisition module is used for acquiring video images of different place units in a third target place;
and the third identification module is used for identifying the video images of the different place units to obtain the loaded proportion information of the different place units in the third target place.
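The loaded proportion itself is a simple ratio of the number of people currently carried by a place unit to the number it can carry, as the following sketch shows; the unit names and capacities are illustrative configuration data, and the occupancy counts would come from the recognizer.

```python
# Sketch of the loaded-proportion computation for place units
# (for example, dining tables or service windows).
def loaded_proportion(occupied, capacity):
    return occupied / capacity if capacity else 0.0

units = {"table-1": (3, 4), "table-2": (1, 4), "window-A": (2, 2)}
print({u: f"{loaded_proportion(o, c):.0%}" for u, (o, c) in units.items()})
# {'table-1': '75%', 'table-2': '25%', 'window-A': '100%'}
```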
Fig. 23 is a schematic structural diagram of a safety monitoring device provided in an embodiment of the present application, and referring to fig. 23, the safety monitoring device includes:
an interface display module 2301, configured to display a safety monitoring interface, where the safety monitoring interface includes a crowd gathering presentation option for providing a crowd gathering presentation function for at least one place;
a first obtaining module 2302, configured to obtain a first image of the at least one place in response to a triggering operation on the crowd gathering presentation option, where the first image displays the crowd density information of the place in a differentiated manner through different labeling manners;
a first image display module 2303, configured to display the first image of the at least one place.
The device provided by the embodiment of the present application displays the safety monitoring interface through a visual interface on the terminal, so that safety management personnel can trigger the crowd gathering presentation option on the safety monitoring interface to display the first image of at least one place. This helps safety management personnel see the crowd density information of each place intuitively, handle crowd gathering conditions in time, and ensure the safety of each place.
In a possible implementation manner, the first obtaining module 2302 is configured to send a first image obtaining request to a server, where the first image obtaining request carries the place identifier of the at least one place, and receive the first image sent by the server.
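A client-side request of this kind might look like the following sketch; the endpoint path and field names are assumptions for illustration, not a protocol defined by this application.

```python
import json
import urllib.request

# Hypothetical first-image request from the client to the server.
def fetch_first_image(server_url, place_ids):
    request = urllib.request.Request(
        f"{server_url}/first-image",
        data=json.dumps({"place_ids": place_ids}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()  # bytes of the annotated first image
```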
In one possible implementation, the safety monitoring interface further includes a mask detection option, the mask detection option is used for providing a mask wearing condition display function for personnel in at least one place;
the device also includes:
the second acquisition module is used for responding to the triggering operation of the mask detection option, acquiring a second image of the at least one place, and distinguishing and displaying, through different labeling modes in the second image, whether people in the place are wearing masks;
and the second image display module is used for displaying a second image of the at least one place.
In a possible implementation manner, the second obtaining module is configured to send a second image obtaining request to the server, where the second image obtaining request carries the place identifier of the at least one place, and receive the second image sent by the server.
In one possible implementation, the apparatus further includes:
and the first information display module is used for responding to the crowd gathering display instruction of the third target place and displaying queuing information of different place units in the third target place, wherein the queuing information is used for representing the gathering condition of people queued in the place units.
In one possible implementation, the apparatus further includes:
the first sending module is used for sending a queuing information obtaining request to the server, wherein the queuing information obtaining request carries the place identifier of the third target place;
and the first receiving module is used for receiving the queuing information sent by the server.
In one possible implementation, the apparatus further includes:
and the second information display module is used for responding to the crowd gathering display instruction of the third target place and displaying the loaded proportion information of different place units in the third target place, where the loaded proportion information is used to represent the proportion of the number of people currently carried by a place unit to the total number of people it can carry.
In one possible implementation, the apparatus further includes:
a second sending module, configured to send a loaded proportion information obtaining request to the server, where the loaded proportion information obtaining request carries the place identifier of the third target place;
and the second receiving module is used for receiving the loaded proportion information sent by the server.
It should be noted that the safety monitoring device provided in the above embodiment is illustrated, when monitoring the safety of a target place, only by the division of the above functional modules; in practical applications, these functions may be allocated to different functional modules as needed, that is, the internal structure of the server/terminal is divided into different functional modules to complete all or part of the functions described above. In addition, the safety monitoring device and the safety monitoring method provided by the above embodiments belong to the same concept; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
Fig. 24 is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 2400 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 2400 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, the terminal 2400 includes: one or more processors 2401 and one or more memories 2402.
Processor 2401 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 2401 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 2401 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 2401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed by the display screen. In some embodiments, the processor 2401 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 2402 may include one or more computer-readable storage media, which may be non-transitory. The memory 2402 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 2402 is used to store at least one program code for execution by processor 2401 to implement the security monitoring method provided by the method embodiments herein.
In some embodiments, the terminal 2400 may further optionally include: a peripheral interface 2403 and at least one peripheral. The processor 2401, memory 2402 and peripheral interface 2403 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 2403 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 2404, a display screen 2405, a camera 2406, an audio circuit 2407, a positioning component 2408 and a power supply 2409.
The peripheral interface 2403 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 2401 and the memory 2402. In some embodiments, processor 2401, memory 2402, and peripheral interface 2403 are integrated on the same chip or circuit board; in some other embodiments, any one or both of processor 2401, memory 2402 and peripherals interface 2403 can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 2404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 2404 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 2404 converts an electric signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 2404 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 2404 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 2404 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 2405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 2405 is a touch display screen, the display screen 2405 also has the ability to capture touch signals on or over the surface of the display screen 2405. The touch signal may be input to the processor 2401 as a control signal for processing. At this point, the display 2405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 2405 may be one, providing a front panel of the terminal 2400; in other embodiments, the number of the display screens 2405 can be at least two, and each display screen is disposed on a different surface of the terminal 2400 or is in a foldable design; in still other embodiments, display 2405 may be a flexible display disposed on a curved surface or on a folded surface of terminal 2400. Even further, the display 2405 may be arranged in a non-rectangular irregular figure, i.e., a shaped screen. The Display 2405 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 2406 is used to capture images or video. Optionally, camera assembly 2406 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 2406 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Audio circuitry 2407 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 2401 for processing or inputting the electric signals to the radio frequency circuit 2404 to realize voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different positions of the terminal 2400. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from processor 2401 or radio frequency circuit 2404 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 2407 may also include a headphone jack.
The positioning component 2408 is used to locate the current geographic location of the terminal 2400 to implement navigation or LBS (Location Based Service). The positioning component 2408 may be based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
Power supply 2409 is used to provide power to various components in terminal 2400. The power source 2409 may be alternating current, direct current, disposable batteries, or rechargeable batteries. When power supply 2409 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 2400 also includes one or more sensors 2410. The one or more sensors 2410 include, but are not limited to: acceleration sensor 2411, gyro sensor 2412, pressure sensor 2413, fingerprint sensor 2414, optical sensor 2415, and proximity sensor 2416.
The acceleration sensor 2411 can detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 2400. For example, the acceleration sensor 2411 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 2401 may control the display screen 2405 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 2411. The acceleration sensor 2411 may also be used for acquisition of motion data of a game or a user.
The gyroscope sensor 2412 may detect a body direction and a rotation angle of the terminal 2400, and the gyroscope sensor 2412 may cooperate with the acceleration sensor 2411 to acquire a 3D motion of the user on the terminal 2400. The processor 2401 may implement the following functions according to the data collected by the gyroscope sensor 2412: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 2413 may be disposed on the side frames of terminal 2400 and/or underneath display 2405. When the pressure sensor 2413 is disposed on the side frame of the terminal 2400, a user holding signal of the terminal 2400 may be detected, and the processor 2401 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 2413. When the pressure sensor 2413 is arranged at the lower layer of the display screen 2405, the processor 2401 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 2405. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 2414 is used for collecting the fingerprint of the user, and the processor 2401 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 2414, or the fingerprint sensor 2414 identifies the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 2401 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 2414 may be disposed on the front, back, or side of the terminal 2400. When a physical key or vendor Logo is provided on the terminal 2400, the fingerprint sensor 2414 may be integrated with the physical key or vendor Logo.
The optical sensor 2415 is used to collect the ambient light intensity. In one embodiment, the processor 2401 may control the display brightness of the display screen 2405 according to the ambient light intensity collected by the optical sensor 2415. Specifically, when the ambient light intensity is high, the display brightness of the display screen 2405 is increased; when the ambient light intensity is low, the display brightness of the display screen 2405 is adjusted down. In another embodiment, the processor 2401 may also dynamically adjust the shooting parameters of the camera head assembly 2406 according to the intensity of the ambient light collected by the optical sensor 2415.
A proximity sensor 2416, also known as a distance sensor, is typically provided on the front panel of the terminal 2400. The proximity sensor 2416 is used to collect the distance between the user and the front surface of the terminal 2400. In one embodiment, when the proximity sensor 2416 detects that the distance between the user and the front face of the terminal 2400 gradually decreases, the processor 2401 controls the display 2405 to switch from the bright screen state to the dark screen state; when the proximity sensor 2416 detects that the distance gradually increases, the processor 2401 controls the display 2405 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 24 is not intended to be limiting and that terminal 2400 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
Fig. 25 is a schematic structural diagram of a server provided in this embodiment. The server 2500 may vary considerably with configuration or performance, and may include one or more processors (CPUs) 2501 and one or more memories 2502, where the one or more memories 2502 store at least one program code that is loaded and executed by the one or more processors 2501 to implement the methods provided by the foregoing method embodiments. Of course, the server 2500 may further include a wired or wireless network interface, a keyboard, an input/output interface, and other components to facilitate input and output, and the server 2500 may further include other components for implementing device functions, which are not described here.
In an exemplary embodiment, a computer readable storage medium, such as a memory including program code, executable by a processor, is also provided to perform the security monitoring method of the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be completed by hardware, or by program code instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (26)

1. A safety monitoring method, the method comprising:
acquiring at least one first video image of a first target site;
respectively inputting the at least one first video image into at least two first image recognition models, and outputting at least two pieces of crowd density information of the at least one first video image, wherein the at least two first image recognition models respectively recognize people in the video image on the basis of different elements of the people;
determining a target image recognition model based on the at least two pieces of crowd density information, wherein the target image recognition model is a model of which the output crowd density information meets a target condition;
inputting at least one second video image of the first target site into the target image recognition model, and outputting crowd density information of the at least one second video image, wherein the shooting time of the at least one second video image is after that of the at least one first video image;
and determining first safety monitoring information according to the crowd density information of the at least one second video image, wherein the first safety monitoring information is used for indicating the crowd gathering condition of the first target place.
2. The method according to claim 1, wherein the respectively inputting the at least one first video image into at least two first image recognition models and the outputting of at least two pieces of crowd density information of the at least one first video image comprise at least any two of the following:
inputting the at least one first video image into a human head detection model, determining the human head position in the at least one first video image through the human head detection model, and determining crowd density information in the at least one first video image according to the human head position;
inputting the at least one first video image into a pedestrian detection model, determining the position of a pedestrian in the at least one first video image through the pedestrian detection model, and determining crowd density information in the at least one first video image according to the position of the pedestrian;
inputting the at least one first video image into a crowd density estimation model, determining a crowd density map of the at least one first video image through the crowd density estimation model, and determining crowd density information in the at least one first video image according to the crowd density map.
3. The method of claim 2, wherein determining crowd density information in the at least one first video image based on the head position comprises:
if the crowd density information corresponding to at least two head positions meets a preset condition, outputting the crowd density information corresponding to the at least two head positions;
and if the crowd density information corresponding to the at least two head positions does not meet the preset condition, determining the crowd density information corresponding to the at least two head positions and the next head position based on the next head position meeting the preset distance condition with the at least two head positions.
4. The method of claim 2, wherein determining crowd density information in the at least one first video image based on the pedestrian location comprises:
if the crowd density information corresponding to at least two pedestrian positions meets a preset condition, outputting the crowd density information corresponding to the at least two pedestrian positions;
and if the crowd density information corresponding to the at least two pedestrian positions does not meet the preset condition, determining the crowd density information corresponding to the at least two pedestrian positions and the next pedestrian position based on the next pedestrian position meeting the preset distance condition with the at least two pedestrian positions.
5. The method of claim 1, wherein the at least one first video image carries a verification tag;
before the at least one first video image is respectively input into the at least two first image recognition models and the at least two crowd density information of the at least one first video image is output, the method further comprises:
detecting a label carried by the obtained video image;
and if the acquired video image is detected to carry the check tag, determining that the acquired video image is the first video image, executing the step of inputting the at least one first video image into at least two first image identification models respectively and outputting at least two crowd density information of the at least one first video image.
6. The method according to claim 5, wherein after detecting the tag carried by the acquired video image, the method further comprises:
and if the acquired video image is detected not to carry the check tag, determining that the acquired video image is the second video image, inputting at least one second video image of the first target place into the target image identification model, and outputting the crowd density information of the at least one second video image.
7. The method of claim 1, wherein prior to said obtaining at least one first video image of a first target site, the method further comprises:
analyzing the received video image;
adding a check tag to a first number of video images obtained by analysis to obtain at least one first video image, and not adding a check tag to a second number of video images located behind the first number of video images in sequence to obtain at least one second video image.
8. The method of claim 1, wherein after determining the first safety monitoring information based on the crowd density information of the at least one second video image, the method further comprises:
detecting the first safety monitoring information;
and if the first safety monitoring information is detected to meet a first alarm condition, sending first alarm information, wherein the first alarm information is used for indicating the first target place to have crowd gathering.
9. The method of claim 1, further comprising:
acquiring a video image of a second target place;
if the second target place is a target type place, identifying the video image through a second image identification model, determining a person who does not wear the mask in the video image, and marking the person who does not wear the mask in the video image;
and determining second safety monitoring information according to the number of people not wearing masks in the video image, wherein the second safety monitoring information is used for indicating the number of people not wearing masks in the second target place.
10. The method of claim 9, wherein the identifying the video image by the second image recognition model comprises:
determining a face region in the video image through a face detection model in the second image recognition model;
intercepting a face area in the video image to obtain at least one face image;
and identifying the at least one face image through a classification model in the second image identification model, and determining whether a person corresponding to the face image wears a mask.
11. The method of claim 9, wherein after determining second safety monitoring information based on the number of people not wearing the mask from the video image, the method further comprises:
detecting the second safety monitoring information;
and if the second safety monitoring information is detected to meet a second alarm condition, sending second alarm information, wherein the second alarm information is used for indicating that people do not wear the mask in the second target place.
12. The method of claim 1, further comprising:
acquiring a video image of a third target place;
and identifying the video image of the third target place to obtain queuing information of different place units in the third target place.
13. The method of claim 1, further comprising:
acquiring video images of different place units in a third target place;
and identifying the video images of the different place units to obtain the loaded proportion information of the different place units in the third target place.
14. A safety monitoring method, the method comprising:
displaying a safety monitoring interface, wherein the safety monitoring interface comprises a crowd gathering display option which is used for providing crowd gathering display function of at least one place;
responding to the triggering operation of the crowd gathering display option, acquiring a first image of the at least one place, wherein the first image displays the crowd density information of the place in a differentiated manner through different marking modes;
displaying a first image of the at least one place.
15. The method of claim 14, wherein the acquiring of a first image of the at least one place comprises:
sending a first image acquisition request to a server, wherein the first image acquisition request carries the place identifier of the at least one place;
and receiving the first image sent by the server.
16. The method of claim 14, wherein the safety monitoring interface further comprises a mask detection option, the mask detection option being used for providing a function of displaying the mask wearing condition of people in at least one place;
after the displaying the safety monitoring interface, the method further comprises:
responding to the trigger operation of the mask detection option, acquiring a second image of the at least one place, and displaying whether a person in the place wears a mask or not in a distinguishing way in different marking modes through the second image;
displaying a second image of the at least one place.
17. The method of claim 16, wherein the acquiring of a second image of the at least one place comprises:
sending a second image acquisition request to a server, wherein the second image acquisition request carries the place identifier of the at least one place;
and receiving the second image sent by the server.
18. The method of claim 14, further comprising:
and responding to a crowd gathering display instruction of a third target place, and displaying queuing information of different place units in the third target place, wherein the queuing information is used for representing gathering conditions of people queued in the place units.
19. The method of claim 18, wherein prior to said displaying queuing information for a different venue unit in the third target venue, the method further comprises:
sending a queuing information acquisition request to a server, wherein the queuing information acquisition request carries the place identifier of the third target place;
and receiving the queuing information sent by the server.
20. The method of claim 14, further comprising:
responding to a crowd gathering display instruction of a third target place, and displaying the loaded proportion information of different place units in the third target place, wherein the loaded proportion information is used for representing the proportion of the number of people currently carried by a place unit to the total number of people it can carry.
21. The method of claim 20, wherein before the displaying of the loaded proportion information of different place units in the third target place, the method further comprises:
sending a loaded proportion information acquisition request to a server, wherein the loaded proportion information acquisition request carries the place identifier of the third target place;
and receiving the loaded proportion information sent by the server.
22. A safety monitoring device, the device comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring at least one first video image of a first target place;
the model processing module is used for respectively inputting the at least one first video image into at least two first image recognition models and outputting at least two crowd density information of the at least one first video image, and the at least two first image recognition models respectively recognize people in the video image on the basis of different elements of the people;
the model determining module is used for determining a target image recognition model based on the at least two pieces of crowd density information, wherein the target image recognition model is a model of which the output crowd density information meets a target condition;
the model processing module is further configured to input at least one second video image of the first target location into the target image recognition model, and output crowd density information of the at least one second video image, where a shooting time of the at least one second video image is after a shooting time of the at least one first video image;
and the information determining module is used for determining first safety monitoring information according to the crowd density information of the at least one second video image, wherein the first safety monitoring information is used for indicating the crowd gathering condition of the first target place.
23. A safety monitoring device, the device comprising:
the system comprises an interface display module, a display module and a display module, wherein the interface display module is used for displaying a safety monitoring interface, the safety monitoring interface comprises a crowd gathering display option, and the crowd gathering display option is used for providing a crowd gathering display function of at least one place;
the acquisition module is used for responding to triggering operation of the crowd gathering display option, acquiring a first image of the at least one place, and displaying the crowd density information of the place in a distinguishing way through different marking ways by the first image;
and the image display module is used for displaying the first image of the at least one place.
24. A server, comprising one or more processors and one or more memories having stored therein at least one program code, the program code being loaded and executed by the one or more processors to perform the operations performed by the safety monitoring method of any one of claims 1 to 13.
25. A terminal, wherein the terminal comprises one or more processors and one or more memories having stored therein at least one program code, the program code being loaded and executed by the one or more processors to perform the operations performed by the safety monitoring method of any one of claims 14 to 21.
26. A computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to perform the operations performed by the safety monitoring method of any one of claims 1 to 21.
CN202010394039.1A 2020-05-11 2020-05-11 Safety monitoring method, device, server, terminal and readable storage medium Active CN111556294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010394039.1A CN111556294B (en) 2020-05-11 2020-05-11 Safety monitoring method, device, server, terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN111556294A true CN111556294A (en) 2020-08-18
CN111556294B CN111556294B (en) 2022-03-08

Family

ID=72008079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010394039.1A Active CN111556294B (en) 2020-05-11 2020-05-11 Safety monitoring method, device, server, terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN111556294B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090006418A1 (en) * 2002-12-10 2009-01-01 O'malley Matt Content creation, distribution, interaction, and monitoring system
CN110807361A (en) * 2019-09-19 2020-02-18 腾讯科技(深圳)有限公司 Human body recognition method and device, computer equipment and storage medium
CN110826496A (en) * 2019-11-07 2020-02-21 腾讯科技(深圳)有限公司 Crowd density estimation method, device, equipment and storage medium

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950491A (en) * 2020-08-19 2020-11-17 成都飞英思特科技有限公司 Personnel density monitoring method and device and computer readable storage medium
CN111950491B (en) * 2020-08-19 2024-04-02 成都飞英思特科技有限公司 Personnel density monitoring method and device and computer readable storage medium
CN112199987A (en) * 2020-08-26 2021-01-08 北京贝思科技术有限公司 Multi-algorithm combined configuration strategy method in single area, image processing device and electronic equipment
CN112183207A (en) * 2020-08-27 2021-01-05 浙江新再灵科技股份有限公司 Detection method and detection system for mask worn by passenger in vertical ladder
CN112153352A (en) * 2020-10-20 2020-12-29 上海理工大学 Unmanned aerial vehicle epidemic situation monitoring auxiliary method and device based on deep learning
CN112269769A (en) * 2020-11-18 2021-01-26 远景智能国际私人投资有限公司 Data compression method and device, computer equipment and storage medium
CN112269769B (en) * 2020-11-18 2023-12-05 远景智能国际私人投资有限公司 Data compression method, device, computer equipment and storage medium
CN112437279A (en) * 2020-11-23 2021-03-02 方战领 Video analysis method and device
CN112882395A (en) * 2021-01-13 2021-06-01 卓源信息科技股份有限公司 Intelligent building integrated operation and maintenance management system based on 5G network
CN113034865A (en) * 2021-01-21 2021-06-25 深圳壹智云科技有限公司 Social alarm system based on laser ranging
CN112861687B (en) * 2021-02-01 2023-08-08 郑州大学 Mask wearing detection method, device, equipment and medium for access control system
CN112861687A (en) * 2021-02-01 2021-05-28 郑州大学 Mask wearing detection method, device, equipment and medium for access control system
CN113012335A (en) * 2021-03-22 2021-06-22 上海工程技术大学 Subway platform guide queuing system based on YOLOv3 face detection
CN113239743A (en) * 2021-04-23 2021-08-10 普联国际有限公司 Crowd density detection method, device, equipment and storage medium
CN113283312B (en) * 2021-05-08 2023-10-17 江苏商贸职业学院 Improved LeNet-5 embedded face recognition method and system
CN113283312A (en) * 2021-05-08 2021-08-20 江苏商贸职业学院 Improved DeNet-5 embedded face recognition method and system
CN113993134B (en) * 2021-12-27 2022-03-22 广州优刻谷科技有限公司 IoT (Internet of things) equipment secure access method and system based on RFID (radio frequency identification) signals
CN113993134A (en) * 2021-12-27 2022-01-28 广州优刻谷科技有限公司 IoT (Internet of things) equipment secure access method and system based on RFID (radio frequency identification) signals
CN114513681A (en) * 2022-01-25 2022-05-17 武汉工程大学 Video processing system, method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111556294B (en) 2022-03-08

Similar Documents

Publication Publication Date Title
CN111556294B (en) Safety monitoring method, device, server, terminal and readable storage medium
CN110209952B (en) Information recommendation method, device, equipment and storage medium
CN108960209B (en) Identity recognition method, identity recognition device and computer readable storage medium
CN111338910B (en) Log data processing method, log data display method, log data processing device, log data display device, log data processing equipment and log data storage medium
CN108833262B (en) Session processing method, device, terminal and storage medium
CN111508609A (en) Health condition risk prediction method and device, computer equipment and storage medium
CN112420217B (en) Message pushing method, device, equipment and storage medium
CN111614634A (en) Flow detection method, device, equipment and storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN110874905A (en) Monitoring method and device
CN110839128A (en) Photographing behavior detection method and device and storage medium
CN113160427A (en) Virtual scene creating method, device, equipment and storage medium
CN108712555A (en) A kind of method and device sending warning message
CN110909264A (en) Information processing method, device, equipment and storage medium
CN110768843B (en) Network problem analysis method, device, terminal and storage medium
CN112423011B (en) Message reply method, device, equipment and storage medium
CN110969072B (en) Model optimization method, device and image analysis system
CN111753813A (en) Image processing method, device, equipment and storage medium
CN111353513B (en) Target crowd screening method, device, terminal and storage medium
CN111147738A (en) Police vehicle-mounted panoramic and coma system, device, electronic equipment and medium
CN113837709A (en) Online processing method and device of business process, terminal and readable storage medium
KR20150045465A (en) Method for provisioning a person with information associated with an event
CN112231666A (en) Illegal account processing method, device, terminal, server and storage medium
CN112529547A (en) Information notification method, device, equipment and storage medium
CN113205069A (en) False license plate detection method and device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40027858
Country of ref document: HK
SE01 Entry into force of request for substantive examination
GR01 Patent grant