CN113392779A - Crowd monitoring method, apparatus, device and medium based on a generative adversarial network - Google Patents


Info

Publication number
CN113392779A
CN113392779A (application CN202110674839.3A)
Authority
CN
China
Prior art keywords
crowd, image, training, model, graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110674839.3A
Other languages
Chinese (zh)
Inventor
向蓓蓓 (Xiang Beibei)
杨洋 (Yang Yang)
茅爱华 (Mao Aihua)
郑华美 (Zheng Huamei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202110674839.3A
Publication of CN113392779A
Legal status: Pending


Classifications

    • G06F 18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06N 3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Computing arrangements based on biological models; neural networks; learning methods


Abstract

The disclosure provides a crowd monitoring method based on a generative adversarial network (GAN), applicable to the field of artificial intelligence. The method comprises the following steps: training a generative adversarial network, and, after training is finished, estimating the number of people in a target monitoring area using a second crowd density estimation map produced by the generator model in the network from a second crowd image of the target monitoring area. During training, a first crowd image is used as the input of the generator model, and the generator model is trained to output a first crowd density estimation map; meanwhile, the first crowd density ground-truth map and the first crowd density estimation map are used as the input of the discriminator model, so that the discriminator model judges the similarity between them, and the training process is repeated until the judged similarity meets a preset threshold condition. The present disclosure also provides a crowd monitoring apparatus, device, storage medium and program product based on a generative adversarial network.

Description

Crowd monitoring method, apparatus, device and medium based on a generative adversarial network
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and more particularly, to a crowd monitoring method, apparatus, electronic device, computer-readable storage medium, and program product based on a generative adversarial network (GAN).
Background
When a crowd gathers beyond a certain limit, accidents may occur, and gatherings especially need to be avoided during an epidemic. Monitoring crowd conditions is therefore highly necessary.
In the related art, target detection methods can be used to count people. For example, human features (e.g., faces) in an image or video captured by a monitoring camera can be detected and located, and the number of detected targets counted to obtain the final people-counting result. However, this approach has two drawbacks. First, its accuracy is low, with frequent false and missed detections, in scenes with complex backgrounds, illumination changes, severe mutual occlusion within the crowd, or large variations in crowd scale. Second, because human features are detected during monitoring, there is a risk that personal biometric information could be stolen.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a crowd monitoring method, apparatus, device, medium, and program product based on a generative adversarial network, which are applicable to a wider range of environments and are more secure.
According to a first aspect of the present disclosure, a crowd monitoring method based on a generative adversarial network is provided. The method comprises the following steps: training a generative adversarial network, the network including a generator model and a discriminator model; and after training is finished, taking a second crowd image of a target monitoring area as the input of the generator model, and estimating the number of people in the target monitoring area based on a second crowd density estimation map output by the generator model. During training, the following operations are repeatedly executed on the generative adversarial network using at least one first crowd image until the similarity judged by the discriminator model meets a preset threshold condition: obtaining a first crowd density ground-truth map based on calibration and transformation of the crowd in the first crowd image; taking the first crowd image as the input of the generator model and training the generator model to output a first crowd density estimation map; and taking the first crowd density ground-truth map and the first crowd density estimation map as the input of the discriminator model, so that the discriminator model judges the similarity between the two maps.
According to an embodiment of the present disclosure, training the generator model to output the first crowd density estimation map further includes: taking an image formed by concatenating the first crowd image and the first crowd density ground-truth map as the input of the generator model.
According to an embodiment of the present disclosure, obtaining the first crowd density ground-truth map based on calibration and transformation of the crowd in the first crowd image includes: calibrating the first crowd image with each head as a center to obtain a calibrated image; and performing Gaussian smoothing on the calibrated image to obtain the first crowd density ground-truth map.
According to an embodiment of the present disclosure, training the generative adversarial network further includes obtaining the at least one first crowd image, specifically including: capturing image frames from a crowd monitoring video; cropping at least one image block of a preset size from each image frame; and obtaining at least one first crowd image based on the at least one image block.
According to an embodiment of the present disclosure, the method further comprises: capturing the second crowd image from the crowd monitoring video based on an operation that delimits the target monitoring area in the crowd monitoring video.
According to an embodiment of the present disclosure, capturing the second crowd image from the crowd monitoring video further comprises: automatically capturing the second crowd image from the crowd monitoring video at fixed frame intervals.
According to an embodiment of the present disclosure, the method further comprises: raising an alarm in real time when the estimated number of people in the target monitoring area is greater than or equal to a preset alarm threshold.
In a second aspect of the disclosed embodiments, a crowd monitoring apparatus based on a generative adversarial network is provided. The apparatus comprises a training module and a people-counting module. The training module is used for training a generative adversarial network, which comprises a generator model and a discriminator model. During training, the following operations are repeatedly executed on the generative adversarial network using at least one first crowd image until the similarity judged by the discriminator model meets a preset threshold condition: obtaining a first crowd density ground-truth map based on calibration and transformation of the crowd in the first crowd image; taking the first crowd image as the input of the generator model and training the generator model to output a first crowd density estimation map; and taking the first crowd density ground-truth map and the first crowd density estimation map as the input of the discriminator model, so that the discriminator model judges the similarity between the two maps. The people-counting module is used for taking a second crowd image of a target monitoring area as the input of the generator model after training is finished, and estimating the number of people in the target monitoring area based on a second crowd density estimation map output by the generator model.
According to an embodiment of the present disclosure, the apparatus further comprises an alarm module. The alarm module is used for raising an alarm in real time when the estimated number of people in the target monitoring area is greater than or equal to a preset alarm threshold.
In a third aspect of the disclosed embodiments, an electronic device is provided. The electronic device includes one or more processors, and a memory. The memory is used to store one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the above-described method.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
A fifth aspect of the disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario of the crowd monitoring method, apparatus, device, medium and program product based on a generative adversarial network according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a training method for a generative adversarial network, in accordance with an embodiment of the disclosure;
FIG. 3 schematically illustrates a model architecture diagram of a generative adversarial network, according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a structural diagram of the generator model in a generative adversarial network, in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram of a crowd monitoring method based on a generative adversarial network, in accordance with an embodiment of the present disclosure;
FIG. 6 is a block diagram schematically illustrating a crowd monitoring apparatus based on a generative adversarial network according to an embodiment of the present disclosure;
FIG. 7 is a block diagram schematically illustrating a crowd monitoring apparatus based on a generative adversarial network according to another embodiment of the present disclosure; and
fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement a crowd monitoring method based on a generative adversarial network according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
In this document, it is to be understood that any number of elements in the specification and drawings is to be considered exemplary rather than limiting, and that any nomenclature (e.g., first, second) is used for distinction only, and not in any limiting sense.
The disclosed embodiments provide a crowd monitoring method, apparatus, device, medium, and program based on a generative adversarial network. The method comprises the following steps: training a generative adversarial network, wherein the network comprises a generator model and a discriminator model; and after training is finished, taking the second crowd image of the target monitoring area as the input of the generator model, and estimating the number of people in the target monitoring area based on the second crowd density estimation map output by the generator model. During training, the following operations are repeatedly executed on the generative adversarial network using at least one first crowd image until the similarity judged by the discriminator model meets a preset threshold condition: obtaining a first crowd density ground-truth map based on calibration and transformation of the crowd in the first crowd image; taking the first crowd image as the input of the generator model and training the generator model to output a first crowd density estimation map; and taking the first crowd density ground-truth map and the first crowd density estimation map as the input of the discriminator model, so that the discriminator model judges the similarity between the two maps.
The generator model in a generative adversarial network attempts to "fool" the discriminator model by learning the true distribution of the input data, making the generated data conform to the true distribution as closely as possible. The discriminator model judges whether the data produced by the generator model is real or fake and feeds the judgement back, thereby improving its own discrimination capability. Meanwhile, the generator improves its own modelling ability according to the feedback and generates data closer to the real distribution. Therefore, after the generative adversarial network has been trained to meet the requirements, the generator model is used to estimate the number of people in the target monitoring area based on the second crowd image of that area.
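The adversarial interplay described above can be written compactly. As a hedged sketch, assuming the scheme follows the standard conditional-GAN formulation (with crowd image x, ground-truth density map y, and generated density map G(x), none of which notation appears in the patent itself), the training objective is

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]
```

where the discriminator D is trained to maximize V while the generator G is trained to minimize it; in the scheme described here, training stops once the judged similarity meets the preset threshold condition.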
It should be noted that the crowd monitoring method and apparatus based on a generative adversarial network in the embodiments of the present disclosure may be used in the financial field, for example, to monitor the crowd distribution at a bank branch, and may also be used in any field other than finance, for example, to monitor crowd gathering in public places; the present disclosure does not limit the application field.
In addition, in the technical scheme of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all conform to relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
Fig. 1 schematically illustrates an application scenario of the crowd monitoring method, apparatus, device, medium and program product based on a generative adversarial network according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a camera 101, a terminal device 102, a server 103, and a network 104. The network 104 is used to provide communication links between the camera 101, the terminal device 102 and the server 103. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The camera 101 may capture and monitor the crowd gathering condition in a certain area (e.g. a bank branch). For example, the camera 101 may be fixed in the hall (covering as much of the hall as possible) and monitor continuously during the bank's business hours. The camera 101 may transmit the acquired images to the terminal device 102 and the server 103 in real time.
The terminal device 102 may display the crowd images or video from the camera 101. The user may also determine a target monitoring area through an operation (e.g., a box-selection operation) on the terminal device 102 and transmit information about the target monitoring area to the server 103.
The server 103 may perform the crowd monitoring method based on a generative adversarial network according to the embodiments of the present disclosure, wherein the generative adversarial network may be deployed on the server 103. Specifically, the server 103 may receive the crowd images sent by the camera 101 and/or the information about the target monitoring area sent by the terminal device 102, estimate the number of people in the target monitoring area during business hours using the generator in the generative adversarial network, and send an alarm signal to the terminal device 102 if the number exceeds a threshold. Meanwhile, during non-business hours, the server 103 can train the generative adversarial network using the crowd video images collected during business hours. In this way, the adversarial learning of the network improves the generator model's ability to produce density maps.
It should be noted that the crowd monitoring method based on a generative adversarial network provided by the embodiments of the present disclosure may generally be executed by the server 103. Accordingly, the crowd monitoring apparatus, device, medium, and program based on a generative adversarial network provided by the embodiments of the present disclosure may generally be disposed in the server 103. The method may also be executed by a server or server cluster that is different from the server 103 and can communicate with the camera 101, the terminal device 102, and/or the server 103; accordingly, the apparatus, device, medium, and program may also be disposed in such a server or server cluster.
It should be understood that fig. 1 is only an example of an application scenario in which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
The crowd monitoring method based on a generative adversarial network of the disclosed embodiments will be described in detail through figs. 2 to 5 based on the scenario described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a training method for a generative adversarial network according to an embodiment of the present disclosure.
As shown in fig. 2, the training method for the generative adversarial network may include operations S210 to S260.
First, in operation S210, at least one first crowd image is acquired. In some embodiments, image frames may be captured from the crowd monitoring video, at least one image block of a preset size may then be cropped from each frame, and at least one first crowd image obtained from the image blocks. For example, in one application, a camera fixed at a predetermined position acquires monitoring images; for training the generative adversarial network, 9 image blocks of size 256 × 256 are randomly cropped, zero-mean Gaussian white noise with variance 0.01 is added to the cropped images, and the images are randomly flipped to obtain the first crowd images for training.
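The patch-preparation step of operation S210 can be sketched in a few lines. This is an illustrative reconstruction, not the patented implementation; the function name and the use of horizontal flips are assumptions (the text says only that images are randomly flipped):

```python
import numpy as np

def make_training_patches(frame, n_patches=9, size=256, noise_var=0.01, rng=None):
    """Randomly crop patches from a monitoring frame, add zero-mean Gaussian
    white noise with the stated variance, and randomly flip them, mirroring
    the preprocessing described above (names are illustrative)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = frame.shape[:2]
    patches = []
    for _ in range(n_patches):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        patch = frame[top:top + size, left:left + size].astype(np.float64)
        patch = patch + rng.normal(0.0, np.sqrt(noise_var), patch.shape)
        if rng.random() < 0.5:
            patch = patch[:, ::-1]  # assumed horizontal flip
        patches.append(patch)
    return np.stack(patches)

patches = make_training_patches(np.zeros((480, 640)), n_patches=9, size=256)
```

Cropping several patches per frame both augments the training set and keeps the generator input size fixed regardless of the camera resolution.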
Then, in operation S220, a first crowd density ground-truth map is obtained based on calibration and transformation of the crowd in the first crowd image. For example, the first crowd image is calibrated with each head as a center to obtain a calibrated image, and the calibrated image is then subjected to Gaussian smoothing to obtain the first crowd density ground-truth map.
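A common way to realize this head-centered calibration plus Gaussian smoothing is to place a unit impulse at each annotated head and smooth it with a normalized Gaussian, so the map integrates to the head count; the sketch below follows that standard construction (the sigma value and the per-head normalization are assumptions, not taken from the patent):

```python
import numpy as np

def density_ground_truth(shape, head_points, sigma=4.0):
    """Crowd-density ground-truth map: one normalized Gaussian blob per
    annotated head centre, so the map integrates to the number of people."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for py, px in head_points:
        kernel = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # each head contributes total mass 1
    return density

gt = density_ground_truth((64, 64), [(20, 20), (40, 45), (10, 50)])
# gt.sum() recovers the annotated head count (3 here)
```

Normalizing each kernel by its own sum keeps the total mass exact even when a head sits near the image border and the Gaussian is truncated.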
Operations S230 to S250 are then repeatedly performed until the similarity determined by the discriminator model satisfies the preset threshold condition.
Fig. 3 schematically illustrates a model architecture diagram of a generative adversarial network according to an embodiment of the present disclosure. The training process is explained next in conjunction with the model architecture of fig. 3.
In operation S230, the first crowd image is used as an input of the generator model, and the generator model is trained to output a first crowd density estimation map.
In operation S240, the first crowd density ground-truth map and the first crowd density estimation map are used as inputs of the discriminator model, so that the discriminator model judges the similarity between the two maps.
Next, in operation S250, it is determined whether the similarity judged by the discriminator model satisfies a preset threshold condition, for example, whether it reaches a preset threshold (e.g., 0.9). If not, operations S230 to S250 are repeated to continue model training. If so, operation S260 is performed.
In operation S260, if the similarity judged by the discriminator model satisfies the preset threshold condition, it may be determined that the generative adversarial network meets the usage requirements, and the trained network may therefore be output to monitor the crowd gathering condition of the target monitoring area. When monitoring the crowd gathering condition, the number of people in the target area is estimated by the generator model from the collected real-time crowd images, as described in detail with reference to fig. 5 below.
In some embodiments, the first crowd image and the first crowd density ground-truth map may also be used together as the input of the generator model; for example, the image obtained by concatenating the first crowd image and the first crowd density ground-truth map is fed to the generator model, so that the ground-truth density map acts as a conditional constraint on the training process, avoiding the problem that an unconstrained generative adversarial network is unsupervised and too free during training. This additional condition variable (the first crowd density ground-truth map) guides the generator model during training and data generation. The discriminator model also takes the first crowd density ground-truth map as one of its inputs, which helps it judge, to a certain extent, whether the generated data is real or fake, thereby improving its discrimination capability. Crowd density estimation based on such a condition-constrained adversarial network is widely applicable.
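The condition-constrained input described above can be pictured as a simple channel-wise concatenation. A minimal sketch, assuming an HWC tensor layout (the patent does not specify the layout):

```python
import numpy as np

# First crowd image (H, W, 3) and its density ground-truth map (H, W, 1)
image = np.zeros((256, 256, 3))
truth = np.zeros((256, 256, 1))

# Concatenate along the channel axis to form the generator's conditioned input
gen_input = np.concatenate([image, truth], axis=-1)
```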
Fig. 4 schematically shows a structural diagram of the generator model in a generative adversarial network according to an embodiment of the present disclosure.
As shown in fig. 4, the generator model of the disclosed embodiments employs a network architecture based on an encoder-decoder model.
Specifically, the encoder may use multi-column multi-scale blocks (MSIB), which apply convolution kernels of different scales in a multi-column structure and can effectively extract multi-scale feature information from an image. Each block takes the output of the preceding network layer as input, performs multi-column multi-scale convolutions on it in parallel, and concatenates the multi-column convolution results as its output.
For example, four-column multi-scale Inception-style blocks (MSIB-4) may be used in the encoder; an MSIB-4 block contains four convolution kernels of different scales: 1 × 1, 3 × 3, 5 × 5, and 7 × 7. In one embodiment, since the 5 × 5 and 7 × 7 kernels incur more computation, two stacked 3 × 3 kernels may be used in place of the 5 × 5 kernel and three stacked 3 × 3 kernels in place of the 7 × 7 kernel, respectively.
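The kernel substitution works because stacked 3 × 3 convolutions reproduce the receptive field of a single larger kernel with fewer weights (2 × 9 = 18 versus 25 for 5 × 5, and 3 × 9 = 27 versus 49 for 7 × 7, per input-output channel pair). A small check of the receptive-field arithmetic:

```python
def stacked_3x3_receptive_field(n_layers):
    """Receptive field of n stacked 3x3 convolutions with stride 1:
    each layer adds one pixel on every side, giving rf = 2*n + 1."""
    rf = 1
    for _ in range(n_layers):
        rf += 2
    return rf

two = stacked_3x3_receptive_field(2)    # matches a single 5x5 kernel
three = stacked_3x3_receptive_field(3)  # matches a single 7x7 kernel
```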
Table 1 schematically shows the network structure parameters of the generator model in one embodiment.
TABLE 1 network architecture parameters of the generator model
(The parameters of Table 1 are provided as an image in the original publication and are not reproduced here.)
Further, max pooling may be employed to downsample the extracted feature information. Pooling can effectively prevent model overfitting, but it also discards a large amount of spatial feature information. Therefore, to avoid excessive loss of spatial feature information, only two max-pooling operations are used in the generator model, each placed after a four-column multi-scale block that extracts multi-scale information from the image.
The decoder may consist of six deconvolution (transposed convolution) layers. It takes as input the intermediate feature map, whose size is 1/4 of the original input image, upsamples it without losing important feature information, and outputs a crowd density estimation map of the same size as the original input image; the estimate of the number of people in the image is obtained from the generated crowd density map.
Some low-level image feature information is lost in the encoder's downsampling. To better preserve it, in the generator model the output of the fourth four-column multi-scale block is combined with the output of the second deconvolution layer and fed into the third deconvolution layer, and the output of the second four-column multi-scale block is combined with the output of the fourth deconvolution layer and fed into the fifth deconvolution layer.
According to an embodiment of the present disclosure, the discriminator model may employ the PatchGAN patch-based discrimination algorithm. This algorithm divides the image produced by the generator model into multiple N × N image patches, judges each patch as real or fake, and takes the average of the judgements over all patches as the final result. The algorithm focuses on the high-frequency content of the image by judging the structure of local image patches, remedying the inability of L1 or L2 loss functions to describe high-frequency loss; it extracts and characterizes local image features and helps generate high-quality, high-resolution images.
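The PatchGAN decision rule itself is just block-splitting plus averaging. A sketch of that rule, assuming image dimensions that divide evenly by the patch size (the convolutional scoring network itself is omitted):

```python
import numpy as np

def split_into_patches(image, n=16):
    """Split a 2-D image into non-overlapping n x n blocks (dimensions are
    assumed divisible by n for this sketch)."""
    h, w = image.shape
    return image.reshape(h // n, n, w // n, n).swapaxes(1, 2).reshape(-1, n, n)

def patchgan_decision(patch_scores):
    """Final PatchGAN judgement: mean of the per-patch real/fake scores."""
    return float(np.mean(patch_scores))

blocks = split_into_patches(np.ones((64, 64)), n=16)   # 16 blocks of 16x16
score = patchgan_decision([0.9, 0.8, 0.7, 0.6])        # averages to 0.75
```

In practice the per-patch scores come from a fully convolutional discriminator whose output map already has one score per receptive-field patch, so no explicit splitting is needed; the explicit form above is only for clarity.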
The discriminator model may include five convolutional layers. Except for the first and last layers, each layer is followed by a batch normalization operation and a LeakyReLU activation, which together implement the feature mapping of the image. The last convolutional layer is a fully connected mapping that judges the extracted feature information as real or fake and outputs the discrimination result.
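The two per-layer operations named above can be sketched in isolation. This simplified version omits the learned scale/shift parameters of batch normalization; the 0.2 slope for LeakyReLU is an assumed (though common) choice, not stated in the patent:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize activations to zero mean, unit variance (learned affine omitted)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def leaky_relu(x, slope=0.2):
    """Unlike ReLU, keeps a small negative slope instead of zeroing negatives."""
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = leaky_relu(batch_norm(x))
```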
Table 2 schematically shows the network structure parameters of the discriminator model in one embodiment.
TABLE 2 network architecture parameters of the discriminator model
Figure BDA0003118732820000111
Fig. 5 schematically illustrates a flow chart of a method for crowd monitoring based on generation of an antagonistic network according to an embodiment of the present disclosure.
As shown in fig. 5, the method for crowd monitoring based on generation of an antagonistic network according to the embodiment of the present disclosure may include operations S510 to S530.
First, in operation S510, a second crowd image is captured from the crowd monitoring video based on an operation of delimiting a target monitoring area in the crowd monitoring video. For example, the second crowd image may be captured automatically from the crowd monitoring video at fixed frame intervals.
Then, in operation S520, the second crowd image of the target monitoring area is used as the input of the generator model, and the number of people in the target monitoring area is estimated based on the second crowd density estimation map output by the generator model.
Then, in operation S530, when the number of people estimated for the target monitoring area is greater than or equal to a preset alarm number, an alarm is raised in real time. In this way, crowd density estimation, gathering monitoring, and alarming are achieved, and on-site control capability is enhanced. Crowd gathering is monitored automatically from the camera feed through crowd density estimation, which reduces the cost of manual monitoring.
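Operations S520 and S530 can be sketched together: a crowd density map integrates (sums) to an estimated head count, which is compared against the preset alarm number. The threshold value of 10 is an assumed example, not specified by the disclosure:

```python
import numpy as np

ALARM_COUNT = 10  # preset alarm number; assumed value, configured per site

def estimate_count(density_map):
    """The integral (sum) of a crowd density map approximates the head count."""
    return float(np.sum(density_map))

def should_alarm(density_map, threshold=ALARM_COUNT):
    return estimate_count(density_map) >= threshold

# A toy density map whose mass sums to 12 "people".
density = np.full((4, 4), 0.75)
print(estimate_count(density), should_alarm(density))  # 12.0 True
```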
Compared with related-art schemes that count people by detecting features such as faces, the method of the disclosed embodiments trains the model based on head positions. When estimating crowd size it therefore adapts to a wide range of environments, and it performs well in scenes with complex backgrounds and illumination changes, severe crowd occlusion, and large variation in crowd scale. Moreover, since no facial features need to be detected, the risk of lawbreakers stealing personal facial feature information is avoided.
Based on the above crowd monitoring method based on a generative adversarial network, the present disclosure further provides a crowd monitoring apparatus based on a generative adversarial network. The apparatus of various embodiments of the present disclosure will be described in detail below in conjunction with fig. 6 and 7.
Fig. 6 schematically shows a block diagram of a crowd monitoring apparatus 600 based on generation of an antagonistic network according to an embodiment of the present disclosure.
As shown in fig. 6, the apparatus 600 may include a training module 610 and a people number estimation module 620 according to an embodiment of the present disclosure. According to other embodiments of the present disclosure, the apparatus 600 may further include an alarm module 630. The apparatus 600 may be used to implement the crowd monitoring method based on generation of an antagonistic network described with reference to fig. 2-5.
The training module 610 is used to train a generative adversarial network that includes a generator model and a discriminator model. During training, the following operations are repeatedly performed on the network using at least one first crowd image until the similarity discriminated by the discriminator model meets a preset threshold condition: obtaining a first crowd density true value map based on calibration and transformation of the crowd in the first crowd image; training the generator model to output a first crowd density estimation map by taking the first crowd image as the input of the generator model; and using the first crowd density true value map and the first crowd density estimation map as inputs of the discriminator model during training, so that the discriminator model discriminates the similarity between them.
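The "calibration and transformation" that produces the crowd density true value map is commonly implemented by placing a normalized Gaussian at each annotated head position, so the map integrates to the head count. A minimal sketch, assuming a fixed-bandwidth kernel and head positions away from the image border (the disclosure's transformation details and kernel parameters may differ):

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()  # normalized so each head contributes exactly 1 to the map

def density_ground_truth(shape, heads, size=7, sigma=1.5):
    """Place a normalized Gaussian at each annotated head position (y, x)."""
    gt = np.zeros(shape)
    k, r = gaussian_kernel(size, sigma), size // 2
    for y, x in heads:
        # Border clipping omitted for brevity: heads assumed >= r pixels from edges.
        gt[y - r:y + r + 1, x - r:x + r + 1] += k
    return gt

gt = density_ground_truth((32, 32), [(10, 10), (20, 24)])
print(round(gt.sum(), 6))  # 2.0 -- the map integrates to the number of heads
```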
The people number estimation module 620 is configured to estimate the number of people in the target monitoring area based on the second crowd density estimation graph output by the generator model, with the second crowd image of the target monitoring area as an input of the generator model after the training is finished.
The warning module 630 is configured to give a warning in real time when the number of people in the target monitoring area is estimated to be greater than or equal to a preset warning number.
Fig. 7 schematically shows a block diagram of a crowd monitoring apparatus 700 based on generation of an antagonistic network according to another embodiment of the present disclosure.
As shown in fig. 7, the crowd monitoring apparatus 700 based on generation of an antagonistic network may include the following seven modules: a video image frame processing module 710, a crowd density estimation module 720, a region delineation module 730, a synchronized playback control module 740, a density estimation map visualization module 750, a crowd gathering automatic warning module 760, and a model adaptive training module 770. The device 700 is described below by way of example for monitoring the crowd gathering at an off-line bank site.
The video image frame processing module 710 is used to read the locally stored video, extract video frames at fixed frame intervals, and store them in an image frame queue for density estimation.
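Fixed-interval sampling into a queue can be sketched as follows; the interval of 25 (one frame per second for a 25 fps video) is an assumed value, and the integer "frames" stand in for decoded image arrays:

```python
from collections import deque

FRAME_INTERVAL = 25  # assumed: one frame per second for a 25 fps video

def sample_frames(frame_stream, interval=FRAME_INTERVAL):
    """Keep every `interval`-th frame from a decoded video stream in a queue."""
    queue = deque()
    for idx, frame in enumerate(frame_stream):
        if idx % interval == 0:
            queue.append(frame)
    return queue

# With 100 frames and interval 25, frames 0, 25, 50 and 75 are queued.
queue = sample_frames(range(100))
print(list(queue))  # [0, 25, 50, 75]
```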
The crowd density estimation module 720 is used for crowd density detection: using the generative adversarial network trained according to the embodiments of the present disclosure, it performs density estimation on the input image and provides a crowd density estimation map and a crowd counting result.
The region delineation module 730 is used to delimit any region of the image and perform density estimation on the delimited region. The camera may be fixed in the lobby of a bank outlet (covering as much of the lobby as possible); by delimiting a target monitoring area that contains the crowd, invalid areas are excluded, the influence of the invalid background region is reduced, and the time the crowd density estimation module 720 spends processing each image frame is saved.
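Restricting estimation to the delimited region amounts to cropping each frame to the region's bounds before density estimation. A sketch with an assumed rectangular region (coordinate values are illustrative):

```python
import numpy as np

# Rectangular target monitoring area as (top, left, bottom, right); values illustrative.
REGION = (2, 3, 6, 9)

def crop_region(frame, region=REGION):
    """Restrict density estimation to the delimited target monitoring area."""
    top, left, bottom, right = region
    return frame[top:bottom, left:right]

frame = np.zeros((10, 12))
roi = crop_region(frame)
print(roi.shape)  # (4, 6)
```

Cropping before inference also shrinks the tensor fed to the network, which is where the processing-time saving comes from.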
The synchronized playback control module 740 is used for real-time monitoring and synchronized playback of the imported video and the visualized crowd density estimation map, and provides pause, playback, and reset functions as well as a video playback buffer.
The density estimation map visualization module 750 is used for performing visualization processing on the crowd density estimation map provided by the crowd density estimation module 720, visually displaying crowd distribution conditions, providing support for practical application, and providing important reference information for people flow management, security and protection control and the like.
The crowd gathering automatic alarm module 760 allows the user to manually set the crowd gathering alarm count. Since the area of an offline bank outlet is fixed, the upper limit of the total crowd that can gather may be estimated through on-site investigation. A target monitoring area is delimited in the region delineation module 730; when the number of people estimated for the target monitoring area reaches the set upper limit, an alarm is raised in real time to notify security personnel and the person in charge of the offline bank outlet to control the on-site crowd. During epidemic prevention and control or other emergencies, relevant personnel can thus be notified promptly.
The model adaptive training module 770 supports adaptive learning and training of the generative adversarial network. For example, during non-working hours, monitoring video image frames captured during working hours are automatically intercepted; since the camera position is fixed, adaptive learning improves the quality of the crowd density estimation maps produced by the generative adversarial network.
According to an embodiment of the present disclosure, any plurality of the training module 610, the people number estimation module 620, the video image frame processing module 710, the crowd density estimation module 720, the region delineation module 730, the synchronized playback control module 740, the density estimation map visualization module 750, the crowd gathering automatic alarm module 760, and the model adaptive training module 770 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the training module 610, the people number estimation module 620, the video image frame processing module 710, the crowd density estimation module 720, the region delineation module 730, the synchronized playback control module 740, the density estimation map visualization module 750, the crowd gathering automatic alarm module 760, and the model adaptive training module 770 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system in a package, or an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware by any other reasonable manner of integrating or packaging a circuit, or implemented in any one of the three implementation manners of software, hardware, and firmware, or in a suitable combination of any of them.
Alternatively, at least one of the training module 610, the people number estimation module 620, the video image frame processing module 710, the crowd density estimation module 720, the region delineation module 730, the synchronized playback control module 740, the density estimation map visualization module 750, the crowd gathering automatic alert module 760, and the model adaptation training module 770 may be implemented at least in part as computer program modules that, when executed, may perform corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device 800 suitable for implementing a crowd monitoring method based on generating an antagonistic network according to an embodiment of the disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the crowd monitoring method based on generation of the confrontation network provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 801. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via communication section 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for carrying out the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the C language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not expressly recited in the present disclosure. In particular, such combinations and/or sub-combinations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A crowd monitoring method based on a generative adversarial network, comprising:
training a generative adversarial network, the generative adversarial network comprising a generator model and a discriminator model, wherein the following operations are repeatedly performed on the generative adversarial network using at least one first crowd image during training until a similarity discriminated by the discriminator model meets a preset threshold condition:
obtaining a first crowd density true value map based on calibration and transformation of the crowd in the first crowd image;
training the generator model to output a first crowd density estimation map by taking the first crowd image as an input of the generator model; and
using the first crowd density true value map and the first crowd density estimation map as inputs of the discriminator model during training, so that the discriminator model discriminates the similarity between the first crowd density true value map and the first crowd density estimation map; and
after the training is finished, taking a second crowd image of a target monitoring area as the input of the generator model, and estimating the number of people in the target monitoring area based on a second crowd density estimation map output by the generator model.
2. The method of claim 1, wherein training the generator model to output the first crowd density estimation map by taking the first crowd image as the input of the generator model further comprises:
taking an image formed by splicing the first crowd image with the first crowd density true value map as the input of the generator model.
3. The method of claim 1, wherein the obtaining the first crowd density true value map based on calibration and transformation of the crowd in the first crowd image comprises:
calibrating the first crowd image with each human head as a center to obtain a calibrated image; and
performing Gaussian smoothing on the calibrated image to obtain the first crowd density true value map.
4. The method of claim 1, wherein training the generative adversarial network further comprises acquiring the at least one first crowd image by:
intercepting image frames from a crowd monitoring video;
cutting at least one image block of a preset size from the image frames; and
obtaining the at least one first crowd image based on the at least one image block.
5. The method of claim 1, wherein the method further comprises:
intercepting the second crowd image from the crowd monitoring video based on an operation of delimiting the target monitoring area in the crowd monitoring video.
6. The method of claim 5, wherein said capturing said second crowd image from said crowd monitoring video further comprises:
automatically intercepting the second crowd image from the crowd monitoring video at fixed frame number intervals.
7. The method of claim 1 or 6, wherein the method further comprises:
giving an alarm in real time when the number of people estimated for the target monitoring area is greater than or equal to a preset alarm number.
8. A crowd monitoring apparatus based on a generative adversarial network, comprising:
a training module for training a generative adversarial network comprising a generator model and a discriminator model, wherein the following operations are repeatedly performed on the generative adversarial network using at least one first crowd image during training until a similarity discriminated by the discriminator model meets a preset threshold condition:
obtaining a first crowd density true value map based on calibration and transformation of the crowd in the first crowd image;
training the generator model to output a first crowd density estimation map by taking the first crowd image as an input of the generator model; and
using the first crowd density true value map and the first crowd density estimation map as inputs of the discriminator model during training, so that the discriminator model discriminates the similarity between the first crowd density true value map and the first crowd density estimation map; and
a people number estimation module for taking a second crowd image of a target monitoring area as the input of the generator model after the training is finished, and estimating the number of people in the target monitoring area based on a second crowd density estimation map output by the generator model.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
CN202110674839.3A 2021-06-17 2021-06-17 Crowd monitoring method, device, equipment and medium based on generation of confrontation network Pending CN113392779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110674839.3A CN113392779A (en) 2021-06-17 2021-06-17 Crowd monitoring method, device, equipment and medium based on generation of confrontation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110674839.3A CN113392779A (en) 2021-06-17 2021-06-17 Crowd monitoring method, device, equipment and medium based on generation of confrontation network

Publications (1)

Publication Number Publication Date
CN113392779A true CN113392779A (en) 2021-09-14

Family

ID=77621743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110674839.3A Pending CN113392779A (en) 2021-06-17 2021-06-17 Crowd monitoring method, device, equipment and medium based on generation of confrontation network

Country Status (1)

Country Link
CN (1) CN113392779A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661758A (en) * 2022-11-15 2023-01-31 江西创成微电子有限公司 Public place crowd density monitoring method and system based on artificial intelligence
CN115983142A (en) * 2023-03-21 2023-04-18 之江实验室 Regional population evolution model construction method based on depth generation countermeasure network
CN115983142B (en) * 2023-03-21 2023-08-29 之江实验室 Regional population evolution model construction method based on depth generation countermeasure network

Similar Documents

Publication Publication Date Title
Harrou et al. An integrated vision-based approach for efficient human fall detection in a home environment
EP3674852B1 (en) Method and apparatus with gaze estimation
US7822275B2 (en) Method for detecting water regions in video
WO2021018106A1 (en) Pedestrian detection method, apparatus, computer-readable storage medium and chip
US11798297B2 (en) Control device, system and method for determining the perceptual load of a visual and dynamic driving scene
CN113392779A (en) Crowd monitoring method, device, equipment and medium based on generation of confrontation network
JP2018072938A (en) Number-of-targets estimation device, number-of-targets estimation method, and program
CN111833369A (en) Alum image processing method, system, medium and electronic device
CN110969793A (en) Method, system and storage medium for preventing ship intrusion at periphery of roundabout electronic purse net
US10402696B2 (en) Scene obstruction detection using high pass filters
CN114898261A (en) Sleep quality assessment method and system based on fusion of video and physiological data
CN110059666A (en) A kind of attention detection method and device
CN113705332A (en) Method and device for detecting shielding of camera of vehicle-mounted terminal, vehicle-mounted terminal and vehicle
CN115861915A (en) Fire fighting access monitoring method, fire fighting access monitoring device and storage medium
CN114511898A (en) Pain recognition method and device, storage medium and electronic equipment
CN112507869B (en) Underwater target behavior observation and water environment monitoring method based on machine vision
US20160267382A1 (en) Methods and devices for analysis of digital color images and methods of applying color image analysis
JP7372391B2 (en) Concepts for detecting anomalies in input data
CN115937991A (en) Human body tumbling identification method and device, computer equipment and storage medium
WO2021140590A1 (en) Human detection device, human detection method, and recording medium
CN113313919A (en) Alarm method and device using multilayer feedforward network model and electronic equipment
CN113256556A (en) Image selection method and device
CN110991375A (en) Group behavior analysis method and device
CN111191593A (en) Image target detection method and device, storage medium and sewage pipeline detection device
WO2020011947A1 (en) Detecting subject motion in medical imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination