CN110659546A

CN110659546A - Illegal booth detection method and device

Info

Publication number: CN110659546A
Application number: CN201810713429.3A
Authority: CN
Inventors: 鲁超; 戴虎
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2020-01-07
Anticipated expiration: 2038-06-29
Also published as: CN110659546B

Abstract

The application provides a method and a device for detecting illegal booths. The method comprises: acquiring a video frame of a place to be detected; performing target detection on the video frame with a trained detection model to obtain a first booth, which is a booth with a fixed shape; performing image segmentation on the video frame with a trained segmentation model and applying connected domain processing to the segmentation result to obtain a second booth, which is a booth with a variable shape; acquiring a pre-configured no-booth area of the place to be detected; and regarding any booth, among the first booth and the second booth, whose overlap area with the no-booth area reaches a set threshold as an illegal booth. The method makes full use of the complementary strengths and weaknesses of the detection model and the segmentation model to detect illegal booths efficiently. It reduces labor costs, helps maintain urban traffic order, provides real-time reference information to urban supervision departments, makes management more convenient, improves booth management efficiency, and alleviates the problem of unregulated street stalls.

Description

Illegal booth detection method and device

Technical Field

The application relates to the field of video image monitoring, in particular to a method and a device for detecting illegal booths.

Background

Illegal street vending refers to operators occupying public places such as urban roads, bridges and city squares to sell goods or services for profit. As cities continue to develop, conflicts of interest between residents and operators who trade outside their shops keep escalating: many such operators occupy public road space, seriously damaging the appearance and commercial order of cities. The drawbacks of this kind of trading are increasingly evident, and banning and rectifying it has become an urgent task for city management departments.

City management departments can monitor illegal vending events through the urban public security dynamic video monitoring system. This system is intended to combat and prevent crime: video monitoring points are installed at complex public security locations, key sites, main streets, road sections with frequent incidents, important intersections, checkpoints and similar places, and the monitoring images are transmitted in real time to public security organs at all levels and other relevant departments, which can then directly observe and keep track of the monitored areas by browsing and recording the images.

Most existing video monitoring systems rely on traditional manual interpretation: staff must watch the video images day and night and judge by eye whether any sudden abnormal situation occurs. This mode of monitoring involves a heavy workload and inevitably fatigues the observer, so abnormal events are missed or misjudged. As monitored areas keep growing, the amount of video data to be processed far exceeds what human interpretation can handle, making it extremely difficult to extract useful information from massive video data, so the requirements of real-time video monitoring and alarming cannot be met.

Disclosure of Invention

In view of this, the present application provides a method and a device for detecting illegal booths, which are used to detect illegal booths efficiently and alleviate the problem of illegal street vending.

Specifically, the method is realized through the following technical scheme:

In a first aspect of the present application, a method for detecting an illegal booth is provided, which includes:

acquiring a video frame corresponding to a place to be detected;

performing target detection on the video frame by using a trained detection model to obtain a first booth, wherein the first booth is a booth with a fixed shape;

performing image segmentation on the video frame by using a trained segmentation model, and performing connected domain processing on the segmentation result to obtain a second booth, wherein the second booth is a booth with a variable shape;

acquiring a pre-configured no-booth area of the place to be detected; and

regarding a booth, among the first booth and the second booth, whose overlap area with the no-booth area reaches a set threshold as an illegal booth.

In a second aspect of the present application, a device for detecting illegal booths is provided, which has the function of implementing the method provided in the first aspect. The functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions.

In one implementation, the apparatus may include:

a video frame acquisition module, configured to acquire a video frame corresponding to a place to be detected;

a target detection module, configured to perform target detection on the video frame by using a trained detection model to obtain a first booth, wherein the first booth is a booth with a fixed shape;

a target segmentation module, configured to perform image segmentation on the video frame by using a trained segmentation model and perform connected domain processing on the segmentation result to obtain a second booth, wherein the second booth is a booth with a variable shape;

a configuration acquisition module, configured to acquire a pre-configured no-booth area of the place to be detected; and

a detection and judgment module, configured to regard a booth, among the first booth and the second booth, whose overlap area with the no-booth area reaches a set threshold as an illegal booth.

In another implementation, the apparatus may include a processor, a memory, and a bus, where the processor and the memory are connected to each other through the bus; the memory stores machine-readable instructions, and the processor executes the method provided by the first aspect of the present application by calling the machine-readable instructions.

In a third aspect of the present application, a machine-readable storage medium is provided, storing machine-readable instructions which, when invoked and executed by a processor, cause the processor to carry out the method provided by the first aspect of the present application.

According to the above technical solution, a deep-learning-based target detection algorithm is implemented and a booth target detection model is trained accordingly; it mainly detects booths with high consistency and achieves a high detection rate for most illegal booths. In addition, based on deep learning, a semantic segmentation network model is used for target segmentation and a booth target segmentation model is trained accordingly; it can segment and locate booth groups with variable shapes. By making full use of the respective advantages of target detection and target segmentation, the application detects the various booths in a scene efficiently. In summary, the application detects illegal booths efficiently: it reduces labor costs, helps maintain urban traffic order, provides real-time reference information to urban supervision departments, makes management more convenient, improves booth management efficiency, and alleviates the problem of unregulated street stalls.

Drawings

Fig. 1 is a schematic diagram of a shape-fixed booth according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a shape-variable booth according to an embodiment of the present application;

Fig. 3 is a schematic diagram illustrating the definition of the camera pitch angle according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for detecting illegal booths according to an embodiment of the present application;

Fig. 5 is a flowchart of an implementation of step 405 shown in Fig. 4 according to an embodiment of the present application;

Fig. 6 is a flowchart of an implementation of generating an alarm according to an embodiment of the present application;

Fig. 7 is a block diagram of the modules of a device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.

With the rapid development of computer technology, intelligent technologies are increasingly applied in the field of digital security. Intelligent video analysis can be added to video monitoring so that the video sources of interest are analyzed in real time. This effectively avoids missing information, allows various illegal vending behaviors (such as trading outside shops and itinerant vending) to be discovered as soon as they occur, and alerts the operators on duty.

Based on this idea, the embodiments of the application provide a target detection and target segmentation algorithm based on deep learning. A detection model is obtained by training on a large number of samples labeled according to the calibration rule for target detection, and a segmentation model is obtained by training on samples labeled according to the calibration rule for target segmentation. Making full use of the strengths and weaknesses of the two models, the detection model locates and detects booths with high consistency, while the segmentation model accurately locates and segments booths with variable shapes; together they detect and locate all booths in the monitoring field of view. Whether a booth is illegal is then judged from the analyzed booths and the preset no-booth area. Finally, the booth information of a certain number of consecutive frames is analyzed together, and an alarm is raised for booths confirmed across multiple consecutive frames.

The embodiment of the application can comprise a training process and an application process of the deep learning network. The training process is first described below.

The embodiments of the application train two deep learning networks: a detection model for detecting shape-fixed booths and a segmentation model for segmenting shape-variable booths. Shape-fixed booths are mostly single booths with high consistency that occupy a small area, as illustrated in Fig. 1; shape-variable booths occupy a larger area, as illustrated in Fig. 2.

The training process of the detection model is described first:

First, collect samples. Booth picture samples are obtained under one or more of the following conditions: different time periods, different weather (such as sunny, rainy and cloudy days), different illumination intensities (such as daytime and night), different monitoring camera installations, and different scenes (that is, places).

Here, the monitoring camera installation covers the camera pitch angle and the camera imaging quality. As one embodiment, to improve the detection effect, the embodiments of the present application may impose the following requirements on them:

1) The camera pitch angle is limited to 15-90 degrees. As shown in Fig. 3, the camera pitch angle is the angle between the road surface and the line connecting the camera and the detected object (in the embodiments of the present application, a booth). The pitch angle determines the monitoring range of the camera: the smaller the pitch angle, the larger the monitoring range, but the fewer pixels the detection target occupies in the image;

2) In the captured image, a single booth with high consistency should be 80-900 pixels wide, and the region of a shape-variable booth group should have an area larger than 200 pixels.
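As a minimal sketch of the installation requirements above, the pitch angle can be computed from the camera's mounting height and the horizontal distance to the booth as atan(height / distance), and the pixel-size guidance checked directly. The helper names and the height/distance inputs are hypothetical; only the 15-90 degree, 80-900 pixel and 200 pixel figures come from the text.

```python
import math
from typing import Optional

# Limits quoted in the text; the helpers themselves are hypothetical.
PITCH_MIN_DEG, PITCH_MAX_DEG = 15.0, 90.0
BOOTH_WIDTH_RANGE_PX = (80, 900)  # single shape-fixed booth
MIN_GROUP_AREA_PX = 200           # shape-variable booth group


def pitch_angle_deg(camera_height_m: float, ground_distance_m: float) -> float:
    """Angle between the camera-to-booth line and the road surface.

    camera_height_m: camera height above the road (assumed known).
    ground_distance_m: horizontal distance from the point below the camera
    to the booth (0 means the camera looks straight down).
    """
    angle = math.degrees(math.atan2(camera_height_m, ground_distance_m))
    return min(angle, 90.0)  # guard against rounding just above 90 degrees


def installation_ok(camera_height_m: float, ground_distance_m: float) -> bool:
    angle = pitch_angle_deg(camera_height_m, ground_distance_m)
    return PITCH_MIN_DEG <= angle <= PITCH_MAX_DEG


def imaging_ok(booth_width_px: Optional[float] = None,
               group_area_px: Optional[float] = None) -> bool:
    """Check whichever size measurements are supplied."""
    lo, hi = BOOTH_WIDTH_RANGE_PX
    if booth_width_px is not None and not lo <= booth_width_px <= hi:
        return False
    if group_area_px is not None and group_area_px <= MIN_GROUP_AREA_PX:
        return False
    return True
```

For example, a camera 10 m high looking at a booth 10 m away has a 45 degree pitch angle, while one 5 m high looking 30 m away is below 15 degrees and falls outside the recommended range.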

It should be noted that even when the camera pitch angle and/or the camera imaging quality do not meet the above conditions, the scheme can still detect and segment booths to a certain extent.

Second, calibrate the samples. Shape-fixed booths with high consistency in the booth picture samples (such as carts, vehicle booths, tables and chairs) are labeled with circumscribed rectangles, i.e. rectangular boxes enclosing the booths, as shown by the white box in Fig. 1.

Third, train on the samples. The calibrated booth picture samples are used to train a pre-built first deep learning network for a certain number of iterations to obtain the trained detection model.

In practice, the detection model can be obtained in a Caffe (Convolutional Architecture for Fast Feature Embedding) environment by training the deep learning network for more than 1,000,000 iterations until it converges.

The training process of the segmentation model is described as follows:

First, collect samples. Booth picture samples are obtained under one or more of the following conditions: different time periods, different weather, different illumination intensities, different monitoring camera installations, and different scenes.

For the monitoring camera installation, refer to the description in the detection model part; details are not repeated here.

Second, calibrate the samples. The calibration rule for target segmentation differs from that for target detection: target segmentation uses pixel-by-pixel calibration, labeling two classes in each booth picture sample, namely the booth region and the background, so that the calibration yields the contours of shape-variable booths.

Third, train on the samples. The calibrated booth picture samples are used to train a pre-built second deep learning network to obtain the trained segmentation model.

In practice, the segmentation model can be obtained in a Caffe environment by training a deep learning semantic segmentation network for more than 1,000,000 iterations until it converges.
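Pixel-by-pixel calibration ultimately produces a two-class label mask (booth region vs. background). A minimal sketch, assuming an annotator has drawn each booth contour as a polygon, rasterizes the polygons with an even-odd ray-casting test at each pixel centre. The function names and the polygon input format are hypothetical and are not part of the described annotation tooling.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates
Polygon = Sequence[Point]    # annotated booth contour, closed implicitly


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Even-odd rule: count crossings of a ray from (x, y) towards +x."""
    inside = False
    n = len(poly)
    for k in range(n):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contours_to_mask(width: int, height: int,
                     contours: Sequence[Polygon]) -> List[List[int]]:
    """Two-class label mask: 1 = booth region, 0 = background."""
    mask = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # test the pixel centre
            if any(point_in_polygon(cx, cy, c) for c in contours):
                mask[row][col] = 1
    return mask
```

A square contour with corners (1, 1) and (4, 4) in a 6 × 6 image marks the 3 × 3 block of pixels whose centres lie inside it.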

Based on the trained detection model and segmentation model, the illegal booth detection method provided by the present application is described below with reference to the flow shown in Fig. 4. The method can be applied to a monitoring camera or to a back-end server connected to the monitoring camera. Referring to Fig. 4, the method may include the following steps:

Step 401: acquire a video frame corresponding to the place to be detected.

Here, the video frame of the place to be detected may be captured by a monitoring camera. In one example, the monitoring camera at the place to be detected is installed so that the angle between the road surface and the line connecting the camera and the booth is 15 to 90 degrees.

In this embodiment, the method shown in Fig. 4 may be performed on every frame of the original video stream. Alternatively, since a stream may contain as many as 20 frames per second and adjacent frames differ only slightly, the original video stream may first be sampled and the steps of Fig. 4 performed only on the sampled frames, which reduces the processing load.

In practice, each video frame is converted into RGB format before the steps of Fig. 4 are performed on it.
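The sampling option can be sketched as keeping every `step`-th frame of a stream with a known frame rate. This is a minimal sketch under that assumption; `sample_frames`, `source_fps` and `target_fps` are hypothetical names, and the text does not specify a sampling rate.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    frames: Iterable[Frame], source_fps: float, target_fps: float
) -> Iterator[Tuple[int, Frame]]:
    """Yield (frame_index, frame) for roughly target_fps frames per second.

    Keeps every `step`-th frame, where step = round(source_fps / target_fps);
    adjacent frames change little, so skipped frames carry little new information.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = max(1, round(source_fps / target_fps))
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame
```

For a 20 fps stream sampled down to 5 fps, frames 0, 4, 8, 12 and 16 of each second are processed.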

Step 402: perform target detection on the acquired video frame by using the trained detection model to obtain a first booth, where the first booth is a booth with a fixed shape.

Owing to the strengths of target detection, first booths with high consistency and their position coordinates can be readily detected in the video frame. It should be noted that in the embodiments of the present application, "first booth" does not denote one particular booth but the one or more booths detected by the detection model.

Step 403: perform image segmentation on the same video frame by using the trained segmentation model, and apply connected component labeling (CCL) to the segmentation result to obtain a second booth, where the second booth is a booth with a variable shape.

Owing to the strengths of target segmentation, the contours of second booths with lower consistency, which the detection model may miss, can be readily segmented from the video frame. Morphological dilation and erosion are then applied to the contours of the second booths to remove noise regions, and CCL is used to extract the circumscribed rectangle enclosing each second booth contour together with its position coordinates, which serve as the position coordinates of the second booth.

Similarly, "second booth" does not denote one particular booth but the one or more booths obtained by the segmentation model. Note that the first booths and the second booths may include the same booth; therefore, they may be deduplicated before step 405 is performed.
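The post-processing in step 403 can be sketched in plain Python on a binary mask (1 = booth pixel). This sketch assumes a 3×3 morphological opening (erosion followed by dilation) for noise removal and 4-connected labeling by breadth-first search. The kernel size, the connectivity, and all function names are assumptions; a production system would more likely use an image-processing library.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]           # 1 = booth pixel, 0 = background
Box = Tuple[int, int, int, int]  # (x_left, y_top, x_right, y_bottom), half-open


def _morph(mask: Mask, erode: bool) -> Mask:
    """3x3 binary erosion (erode=True) or dilation (erode=False).

    Pixels outside the image are treated as background.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in range(y - 1, y + 2)
                for xx in range(x - 1, x + 2)
            ]
            out[y][x] = int(all(window)) if erode else int(any(window))
    return out


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes specks smaller than the kernel."""
    return _morph(_morph(mask, erode=True), erode=False)


def label_components(mask: Mask, min_area: int = 1) -> List[Box]:
    """4-connected component labeling via BFS; returns one box per component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1, area = x, y, x, y, 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:
                boxes.append((x0, y0, x1 + 1, y1 + 1))
    return boxes


def second_booths(seg_mask: Mask, min_area: int = 1) -> List[Box]:
    """Denoise the segmentation mask, then box each connected booth region."""
    return label_components(opening(seg_mask), min_area)
```

Boxes are half-open, so a 4 × 4 booth region starting at (2, 2) comes back as `(2, 2, 6, 6)`, while an isolated noise pixel is removed by the opening.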
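The text does not say how duplicates between first and second booths are identified. One common choice, shown here only as an assumption, is intersection-over-union (IoU): a segmentation box whose IoU with an already kept box reaches a threshold is dropped. The 0.5 threshold, the preference for detection-model boxes, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_left, y_top, x_right, y_bottom)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_booths(first: Sequence[Box], second: Sequence[Box],
                 iou_threshold: float = 0.5) -> List[Box]:
    """Keep all detector boxes; add a segmentation box only if it matches none kept so far."""
    merged = list(first)
    for box in second:
        if all(iou(box, kept) < iou_threshold for kept in merged):
            merged.append(box)
    return merged
```

Keeping the detector box when both models find the same booth is a design choice of this sketch: the detection model is described as the more reliable one for high-consistency booths.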

Step 404: acquire the pre-configured no-booth area of the place to be detected.

The no-booth area is a region in which booths are not allowed to appear. It is configured manually on the scene displayed in the video frame and may be represented by the coordinates of one or more rectangular boxes.

Step 405: regard any booth, among the first booths and the second booths, whose overlap area with the no-booth area reaches the set threshold as an illegal booth.

In an alternative embodiment, the process described in step 405 may be implemented by the method shown in FIG. 5:

Step 501: input the position coordinates of the first booths and second booths identified in the current video frame;

Step 502: set the variable i to 0;

Step 503: judge whether i is smaller than the total number of booths identified in the current video frame; if yes, go to step 504; if not, go to step 507;

Step 504: select an unprocessed booth from the first booths and second booths, and judge whether its position coordinates overlap those of the preset no-booth area with an overlap area reaching the set threshold; if yes, go to step 505; if not, go to step 506;

For example, suppose a booth has its upper-left corner at (50, 50) and its lower-right corner at (60, 60), and the preset no-booth area has corners at (0, 100) and (100, 0), i.e. it spans 0 to 100 on both axes. The booth lies entirely inside the no-booth area, so the overlap area is 10 × 10 = 100 pixels.

Step 505: save the position coordinates of the booth and go to step 506;

Step 506: increase i by 1 and return to step 503;

Step 507: confirm the saved booth coordinates as the position coordinates of the illegal booths.
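Steps 501 to 507 amount to filtering the booth boxes by their overlap area with the no-booth area. A minimal sketch, assuming axis-aligned boxes in image coordinates `(x_left, y_top, x_right, y_bottom)` and assuming a booth is illegal if its overlap with any one configured rectangle reaches the threshold. The function names are hypothetical, and Python iteration replaces the explicit counter i.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_left, y_top, x_right, y_bottom)


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def find_illegal_booths(
    booths: Sequence[Box], no_booth_areas: Sequence[Box], threshold: float
) -> List[Box]:
    """Steps 501-507: keep booths whose overlap with a no-booth area reaches threshold."""
    saved: List[Box] = []
    for booth in booths:  # steps 502, 503 and 506: visit each booth once
        # Step 504: compare against every configured no-booth rectangle.
        if any(overlap_area(booth, area) >= threshold for area in no_booth_areas):
            saved.append(booth)  # step 505
    return saved  # step 507: saved boxes are the illegal booths
```

With the booth from the example written as `(50, 50, 60, 60)` and the no-booth area spanning 0 to 100 on both axes as `(0, 0, 100, 100)`, `overlap_area` returns 100, so the booth is reported for any threshold up to 100 pixels.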

Through steps 401 to 405, with step 405 carried out as steps 501 to 507, the illegal booth information of a single video frame can be determined.

As one embodiment, rather than raising an alarm directly from the illegal booth identification result of a single video frame, the identification results of multiple video frames can be used to prevent false detections. A specific implementation is as follows:

1) Establish an illegal record for each place monitored by a monitoring camera; the record is empty when first created.

2) After the illegal booths in the current video frame are determined, acquire their position coordinates, and determine the place to be detected that the current video frame monitors.

3) For the position coordinates of each illegal booth identified in the current video frame, judge whether the illegal record of the place to be detected contains those coordinates.

If not, add the position coordinates of the illegal booth to the illegal record, set the number of illegal frames in which these coordinates have been detected to 1, and set the number of allowed transient disappearing frames for these coordinates to an initial value, which may be any integer greater than 0. Counting the illegal frames detected at the same position coordinates prevents false detections; the allowed transient disappearing frames improve the detection rate, so that an illegal booth is not wrongly considered to have left the place to be detected merely because it is missed in some later frame.

4) If the illegal record of the place to be detected contains the position coordinates of the illegal booth, add 1 to the number of illegal frames recorded for those coordinates; the number of allowed transient disappearing frames recorded for them may either be reset to the initial value or be kept unchanged. Then judge whether the incremented number of illegal frames equals a set threshold: if so, generate an alarm for the place to be detected; if it is smaller or larger than the threshold, generate no alarm. This avoids repeated alarms for the same place.

5) For every other position coordinate in the illegal record of the place to be detected that does not match any illegal booth identified in the current frame, subtract 1 from its number of allowed transient disappearing frames; if the result is 0, delete that position coordinate from the illegal record of the place to be detected.
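The illegal-record bookkeeping in items 1) to 5) can be sketched as a small per-place class. As the text describes, positions are matched by exact coordinate equality; a real deployment might need a matching tolerance, which this sketch does not cover. The alarm threshold, the initial value, and the class and field names are hypothetical; `grace_frames` stands for the number of allowed transient disappearing frames, and `reset_grace_on_hit` selects between the two options the text allows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int, int]  # booth box (x_left, y_top, x_right, y_bottom)


@dataclass
class RecordEntry:
    illegal_frames: int  # frames in which this position was found illegal
    grace_frames: int    # allowed transient disappearing frames left


@dataclass
class IllegalRecord:
    """Illegal record for one monitored place (empty when created)."""
    alarm_threshold: int = 3      # hypothetical value
    initial_grace: int = 5        # hypothetical value, must be > 0
    reset_grace_on_hit: bool = True
    entries: Dict[Coord, RecordEntry] = field(default_factory=dict)

    def update(self, illegal_coords: List[Coord]) -> List[Coord]:
        """Process one frame's illegal booths; return positions to alarm on."""
        alarms: List[Coord] = []
        current = list(dict.fromkeys(illegal_coords))  # dedupe, keep order
        for coord in current:
            entry = self.entries.get(coord)
            if entry is None:
                # Item 3): new position, start counting at 1.
                self.entries[coord] = RecordEntry(1, self.initial_grace)
                continue
            # Item 4): known position, count another illegal frame.
            entry.illegal_frames += 1
            if self.reset_grace_on_hit:
                entry.grace_frames = self.initial_grace
            if entry.illegal_frames == self.alarm_threshold:
                alarms.append(coord)  # equality check avoids repeated alarms
        # Item 5): positions not seen in this frame use up one grace frame.
        seen = set(current)
        for coord in list(self.entries):
            if coord not in seen:
                entry = self.entries[coord]
                entry.grace_frames -= 1
                if entry.grace_frames == 0:
                    del self.entries[coord]
        return alarms
```

Because the alarm fires only when the count equals the threshold, a booth that stays in place triggers exactly one alarm, and a booth missed for fewer than `initial_grace` consecutive frames keeps its count.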

In an alternative embodiment, the process of generating an alert may be implemented by the method shown in FIG. 6:

Step 601: input the position coordinates of the illegal booths identified in the current video frame and the updated number of illegal frames detected for each of these coordinates;

Step 602: set the variable i to 0;

Step 603: judge whether i is smaller than the total number of illegal booths identified in the current frame; if yes, go to step 604; if not, go to step 606.

Step 604: select an unprocessed position coordinate from the position coordinates of the illegal booths, and update its state;

Specifically, the state of a position coordinate may be updated as follows: if the number of illegal frames detected for it is smaller than the set threshold, set its attribute to "suspected illegal target"; if the number equals the set threshold, set its attribute to "new illegal target"; if the number is greater than the set threshold, set its attribute to "reported illegal target";

Step 605: increase i by 1 and return to step 603;

Step 606: raise an alarm for the position coordinates of the illegal booths in the current video frame whose attribute is "new illegal target".
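The state update of steps 601 to 606 reduces to comparing each position's illegal-frame count with the threshold. A minimal sketch; the enum, the function names and the dictionary input are hypothetical, and the three labels mirror the attributes named in step 604.

```python
from enum import Enum
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int, int]  # booth box (x_left, y_top, x_right, y_bottom)


class TargetState(Enum):
    SUSPECTED = "suspected illegal target"
    NEW = "new illegal target"
    REPORTED = "reported illegal target"


def classify(illegal_frames: int, threshold: int) -> TargetState:
    """Step 604: map an illegal-frame count to a target state."""
    if illegal_frames < threshold:
        return TargetState.SUSPECTED
    if illegal_frames == threshold:
        return TargetState.NEW
    return TargetState.REPORTED


def alarms_for_frame(frame_counts: Dict[Coord, int], threshold: int) -> List[Coord]:
    """Steps 601-606: update every position's state, alarm only on NEW ones."""
    states = {coord: classify(n, threshold) for coord, n in frame_counts.items()}
    return [coord for coord, state in states.items() if state is TargetState.NEW]
```

Only positions in the "new illegal target" state raise an alarm, which is what prevents the repeated alarms described above.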

The description of the method provided in the present application is thus completed.

As can be seen from the above description, the application implements a deep-learning-based target detection algorithm and trains a booth target detection model accordingly, which mainly detects booths with high consistency and achieves a high detection rate for most illegal booths;

furthermore, based on deep learning, the application uses a semantic segmentation network model for target segmentation and trains a booth target segmentation model accordingly, which can segment and locate booth groups with variable shapes; by making full use of the advantages of both target detection and target segmentation, booths of all kinds in a scene are detected efficiently;

furthermore, the application provides logic for judging whether a booth is illegal from multi-frame statistics; compared with judging from single-frame detection results alone, this improves the detection rate, eliminates false triggering and prevents repeated alarms;

in summary, the application detects illegal booths efficiently: it reduces labor costs, helps maintain urban traffic order, provides real-time reference information to urban supervision departments, makes management more convenient, improves booth management efficiency, and alleviates the problem of unregulated street stalls.

The methods provided herein are described above. The apparatus provided in the present application is described below.

Referring to fig. 7, fig. 7 is a functional block diagram of an illegal booth detection apparatus provided in the present application. As shown in fig. 7, the apparatus includes:

a video frame acquiring module 701, configured to acquire a video frame corresponding to a location to be detected;

a target detection module 702, configured to perform target detection on the video frame by using a trained detection model to obtain a first booth, where the first booth is a booth with a fixed shape;

the target segmentation module 703 is configured to perform image segmentation on the video frame by using a trained segmentation model, and perform connected domain processing on a segmentation result to obtain a second booth, where the second booth is a booth with a changeable form;

a configuration obtaining module 704, configured to obtain a pre-configured no-booth area of the place to be detected;

a detection and judgment module 705, configured to regard a booth, among the first booth and the second booth, whose overlap area with the no-booth area reaches a set threshold as an illegal booth.

In one embodiment, the apparatus may further include:

the statistical module is used for acquiring the position coordinates of the illegal booths; judging whether the illegal records of the to-be-detected place comprise the position coordinates or not according to the acquired position coordinates of each illegal booth; if not, adding the position coordinate into the illegal record, setting the number of illegal frames detected by the position coordinate as 1, and setting the number of allowed transient disappearing frames corresponding to the position coordinate as an initial value.

In one embodiment, the statistical module is further configured to add 1 to the number of illegal frames detected by the position coordinate recorded in the illegal record if the illegal record of the to-be-detected place includes the position coordinate; and judging whether the number of illegal frames added with 1 is equal to a set threshold value, and if so, generating an alarm aiming at the scene to be detected.

In one embodiment, the statistical module is further configured to, for other position coordinates in the illegal record that are inconsistent with the position coordinates of the illegal booth, subtract 1 from the number of allowed transient vanishing frames corresponding to the other position coordinates; and judging whether the number of the allowed transient vanishing frames after subtracting 1 is 0, if so, deleting the other position coordinates from the illegal record.

In one embodiment, the target detection module 702 can train the detection model by: the method comprises the steps of obtaining booth picture samples under one or more conditions of different time periods, different weather, different illumination intensities, different monitoring camera erection and different scenes, wherein booths with fixed shapes are marked on the booth picture samples through external rectangles; and training the booth picture samples by using a pre-built first deep learning network for a certain amount to obtain a trained detection model.

In one embodiment, the object segmentation module 703 may train the segmentation model by: the method comprises the steps that booth picture samples under one or more conditions of different time periods, different weather, different illumination intensities, different monitoring camera erection and different scenes are obtained, and outlines of booths with changeable shapes are marked on the booth picture samples in a pixel calibration mode; and training the booth picture samples by using a pre-built second deep learning network to obtain a trained segmentation model after a certain amount of training is carried out.

In one embodiment, the video frame acquiring module 701 is configured to acquire the video frame of the place to be detected through a monitoring camera, where the monitoring camera is installed at the place to be detected such that the line between the monitoring camera and the booth forms an angle of 15 to 90 degrees with the road surface.
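A quick way to check a planned installation against this condition, assuming a flat road and a booth at road level, is to compute the angle from the camera's mounting height H and its horizontal distance D to the booth, θ = atan(H / D). The Python sketch below does this; the function names are hypothetical. Because D is non-negative, the 90-degree upper bound holds automatically, so only the 15-degree lower bound needs testing, and the same bound gives the farthest booth distance a given mounting height can cover, D_max = H / tan 15°.

```python
import math

MIN_ANGLE_DEG = 15.0  # lower bound on the camera-to-booth angle stated in the text


def elevation_angle_deg(camera_height_m: float, horizontal_dist_m: float) -> float:
    """Angle in degrees between the camera-to-booth line and a flat road surface."""
    return math.degrees(math.atan2(camera_height_m, horizontal_dist_m))


def placement_ok(camera_height_m: float, horizontal_dist_m: float) -> bool:
    """True if the angle lies in [15, 90] degrees (90 is implied by a non-negative distance)."""
    if camera_height_m <= 0 or horizontal_dist_m < 0:
        return False
    return elevation_angle_deg(camera_height_m, horizontal_dist_m) >= MIN_ANGLE_DEG


def max_horizontal_dist_m(camera_height_m: float) -> float:
    """Farthest horizontal booth distance that still gives an angle of at least 15 degrees."""
    return camera_height_m / math.tan(math.radians(MIN_ANGLE_DEG))
```

For example, a camera mounted 6 m above the road satisfies the condition for booths up to about 22.4 m away horizontally.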

It should be noted that the division into units in the embodiments of the present application is schematic and is merely a division by logical function; other divisions are possible in actual implementations. The functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically alone, or two or more of them may be integrated into one unit. An integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

For the implementation of the functions and actions of each unit in the above device, refer to the implementation of the corresponding steps in the above method; details are not repeated here.

The description of the apparatus shown in fig. 7 is thus completed.

The application also provides an illegal booth detection device comprising a processor, a memory, and a bus, the processor and the memory being connected to each other via the bus; the memory stores machine-readable instructions, which the processor invokes to implement the method shown in fig. 4.

Additionally, a machine-readable storage medium is provided that stores machine-readable instructions which, when invoked and executed by a processor, cause the processor to implement the method illustrated in fig. 4.

The above description is merely exemplary of the present application and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of protection of the present application.

Claims

1. A method for detecting an illegal booth, comprising:

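acquiring a video frame corresponding to a place to be detected;

performing target detection on the video frame using a trained detection model to obtain a first booth, the first booth being a booth with a fixed shape;

performing image segmentation on the video frame using a trained segmentation model, and performing connected domain processing on the segmentation result to obtain a second booth, the second booth being a booth with a variable shape;

acquiring a pre-configured unallowable booth area of the place to be detected; and

regarding, as an illegal booth, any booth among the first booth and the second booth whose overlapping area with the unallowable booth area reaches a set threshold.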
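Claim 1 leaves open how the overlap is measured. The Python sketch below assumes that each booth is an axis-aligned box, that the unallowable booth area is a polygon in image coordinates, and that the set threshold is a ratio of overlap area to booth area; none of these choices is fixed by the text. The overlap is computed by clipping the polygon to the booth box with the Sutherland-Hodgman algorithm (a standard technique swapped in here, not one named by the applicant) and measuring the result with the shoelace formula; all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # booth box: (x_min, y_min, x_max, y_max)


def polygon_area(poly: Sequence[Point]) -> float:
    """Absolute polygon area by the shoelace formula (0 for an empty polygon)."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i - 1]
        x2, y2 = poly[i]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def _clip_edge(points: List[Point], inside: Callable[[Point], bool],
               cross: Callable[[Point, Point], Point]) -> List[Point]:
    """One Sutherland-Hodgman pass against a single rectangle boundary."""
    out: List[Point] = []
    for i in range(len(points)):
        prev, cur = points[i - 1], points[i]
        if inside(cur):
            if not inside(prev):
                out.append(cross(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(cross(prev, cur))
    return out


def clip_to_rect(poly: Sequence[Point], rect: Rect) -> List[Point]:
    """Part of the polygon that lies inside the booth box."""
    x_min, y_min, x_max, y_max = rect

    # Intersection of segment p-q with a vertical or horizontal boundary line.
    def at_x(x: float) -> Callable[[Point, Point], Point]:
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y: float) -> Callable[[Point, Point], Point]:
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    pts = list(poly)
    passes = [
        (lambda p: p[0] >= x_min, at_x(x_min)),
        (lambda p: p[0] <= x_max, at_x(x_max)),
        (lambda p: p[1] >= y_min, at_y(y_min)),
        (lambda p: p[1] <= y_max, at_y(y_max)),
    ]
    for inside, cross in passes:
        pts = _clip_edge(pts, inside, cross)
    return pts


def is_illegal(booth: Rect, unallowable: Sequence[Point], ratio_threshold: float) -> bool:
    """Illegal if overlap area / booth area reaches the (assumed ratio) threshold."""
    x_min, y_min, x_max, y_max = booth
    booth_area = (x_max - x_min) * (y_max - y_min)
    if booth_area <= 0:
        return False
    overlap = polygon_area(clip_to_rect(unallowable, booth))
    return overlap / booth_area >= ratio_threshold
```

For a 10 x 10 unallowable square at the origin and a booth box (5, 5, 15, 15), the overlap is 25 of the booth's 100 square pixels, so the booth is judged illegal at a threshold of 0.2 but not at 0.3. Sutherland-Hodgman suits this case because the clip window (the booth box) is always convex, while the configured area itself may be concave.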

2. The method of claim 1, wherein, after regarding as an illegal booth any booth among the first booth and the second booth whose overlapping area with the unallowable booth area reaches a set threshold, the method further comprises:

acquiring the position coordinates of the illegal booth;

for the acquired position coordinates of each illegal booth, determining whether the illegal record of the place to be detected contains those position coordinates;

if not, adding the position coordinate to the illegal record, setting the number of illegal frames detected at that position coordinate to 1, and setting the number of allowed transient vanishing frames corresponding to that position coordinate to an initial value.

3. The method of claim 2, wherein the method further comprises:

if the illegal record of the place to be detected contains the position coordinate, adding 1 to the number of illegal frames recorded for that position coordinate in the illegal record; and determining whether the incremented number of illegal frames equals a set threshold and, if so, generating an alarm for the place to be detected.

4. The method of claim 2, wherein the method further comprises:

for each other position coordinate in the illegal record that does not match the position coordinates of the illegal booth, subtracting 1 from the number of allowed transient vanishing frames corresponding to that position coordinate; and determining whether the decremented number of allowed transient vanishing frames is 0 and, if so, deleting that position coordinate from the illegal record.

5. The method of claim 1, wherein the detection model is trained by:

obtaining booth picture samples captured under one or more of the following conditions: different time periods, different weather, different illumination intensities, different monitoring-camera installations, and different scenes, and annotating each fixed-shape booth in the booth picture samples with a circumscribed rectangle; and

training a pre-built first deep learning network on the booth picture samples until a certain amount of training has been completed, to obtain the trained detection model.

6. The method of claim 1, wherein the segmentation model is trained by:

obtaining booth picture samples captured under one or more of the following conditions: different time periods, different weather, different illumination intensities, different monitoring-camera installations, and different scenes, and annotating the outline of each variable-shape booth in the booth picture samples at the pixel level; and

training a pre-built second deep learning network on the booth picture samples until a certain amount of training has been completed, to obtain the trained segmentation model.

7. The method according to any one of claims 1 to 6, wherein acquiring a video frame corresponding to a place to be detected comprises:

acquiring the video frame of the place to be detected through a monitoring camera, wherein the monitoring camera is installed at the place to be detected such that:

the line between the monitoring camera and the booth forms an angle of 15 to 90 degrees with the road surface.

8. An illegal booth detection device, comprising:

9. The apparatus of claim 8, wherein the apparatus further comprises:

10. The apparatus of claim 9,

the statistical module is further configured to: if the illegal record of the place to be detected contains the position coordinate, add 1 to the number of illegal frames recorded for that position coordinate in the illegal record; and determine whether the incremented number of illegal frames equals a set threshold and, if so, generate an alarm for the place to be detected.

11. The apparatus of claim 9,

the statistical module is further configured to: for each other position coordinate in the illegal record that does not match the position coordinates of the illegal booth, subtract 1 from the number of allowed transient vanishing frames corresponding to that position coordinate; and determine whether the decremented number of allowed transient vanishing frames is 0 and, if so, delete that position coordinate from the illegal record.

12. The apparatus of claim 8, wherein the target detection module obtains the detection model through training by:

obtaining booth picture samples captured under one or more of the following conditions: different time periods, different weather, different illumination intensities, different monitoring-camera installations, and different scenes, wherein each fixed-shape booth in the booth picture samples is annotated with a circumscribed rectangle; and

training a pre-built first deep learning network on the booth picture samples until a certain amount of training has been completed, to obtain the trained detection model.

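13. The apparatus of claim 8, wherein the object segmentation module obtains the segmentation model through training by:

obtaining booth picture samples captured under one or more of the following conditions: different time periods, different weather, different illumination intensities, different monitoring-camera installations, and different scenes, wherein the outline of each variable-shape booth in the booth picture samples is annotated at the pixel level; and

training a pre-built second deep learning network on the booth picture samples until a certain amount of training has been completed, to obtain the trained segmentation model.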

14. The apparatus according to any one of claims 8 to 13,

the video frame acquisition module is configured to acquire the video frame of the place to be detected through a monitoring camera, wherein the monitoring camera is installed at the place to be detected such that:

the line between the monitoring camera and the booth forms an angle of 15 to 90 degrees with the road surface.

15. An illegal booth detection device, comprising a processor, a memory, and a bus, wherein the processor and the memory are connected to each other via the bus;

the memory stores machine-readable instructions, and the processor executes the method of any one of claims 1 to 7 by invoking the machine-readable instructions.

16. A machine-readable storage medium storing machine-readable instructions which, when invoked and executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.