CN111666945A - Storefront violation identification method and device based on semantic segmentation and storage medium - Google Patents

Storefront violation identification method and device based on semantic segmentation and storage medium Download PDF

Info

Publication number
CN111666945A
CN111666945A (Application CN202010392935.4A)
Authority
CN
China
Prior art keywords
storefront
identification
violation
frame image
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010392935.4A
Other languages
Chinese (zh)
Inventor
郭闯世
邵新庆
刘强
徐�明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Original Assignee
Shenzhen ZNV Technology Co Ltd
Nanjing ZNV Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen ZNV Technology Co Ltd, Nanjing ZNV Software Co Ltd filed Critical Shenzhen ZNV Technology Co Ltd
Priority to CN202010392935.4A priority Critical patent/CN111666945A/en
Publication of CN111666945A publication Critical patent/CN111666945A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A storefront violation identification method and device based on semantic segmentation, and a storage medium, are provided. The storefront violation identification method comprises the following steps: acquiring frame images of one or more storefronts; recognizing the frame images according to a preset semantic-segmentation violation recognition model to obtain the recognition areas in each frame image that involve a violation; filtering the recognition areas in the frame image according to a preset mask, and retaining only the recognition areas of the storefronts of interest; and outputting the identification result for those retained areas. Because the recognition areas are filtered according to the preset mask, interference from non-storefront areas in the frame image is eliminated to the greatest extent possible, and only the storefronts of interest are examined for violations. This effectively avoids the influence of external factors, greatly reduces false detections and missed detections, and improves the accuracy of storefront violation identification.

Description

Storefront violation identification method and device based on semantic segmentation and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a storefront violation identification method and device based on semantic segmentation and a storage medium.
Background
With the introduction and continuing roll-out of smart cities and smart communities, intelligent management of street-front stores has become necessary. Methods for monitoring storefront violations fall mainly into two types. The first uses a fixed camera, for example a surveillance gun camera or dome camera installed on a street wall. Because the area captured by a fixed camera is relatively constant, scene analysis is comparatively simple: the region of concern can be delimited by auxiliary means, and violations caused by goods placed haphazardly outside the shop within that region can be detected. The second uses a mobile camera, such as a vehicle-mounted law-enforcement recorder. Because a mobile camera is in motion, there are many uncertain factors in moving speed, distance to the shop, environmental occlusion, and so on; acquiring a region of interest is then very difficult, and the fixed-camera approach is hard to apply.
When camera footage is processed, targets in images or videos are usually detected by means of deep learning, and a variety of detection methods have improved both the accuracy and the speed of target detection. However, every current target-detection method shares one problem: it simply marks the position of a target with a rectangular box. In practice, when such detection encounters complications, such as goods placed outside a storefront with irregular edges, overlapping items, or different kinds of items packed closely together, the detection precision for illegal storefront operation drops, the practical effect is poor, and targets are easily falsely detected or missed.
Disclosure of Invention
The invention mainly solves the technical problem of how to improve the accuracy of the storefront violation identification. In order to solve the technical problem, the application discloses a storefront violation identification method and device based on semantic segmentation and a storage medium.
According to a first aspect, an embodiment provides a storefront violation identification method based on semantic segmentation, including: acquiring frame images of one or more storefronts; identifying the frame image according to a preset semantic segmentation violation identification model to obtain an identification area belonging to violation in the frame image; filtering each identification area in the frame image according to a preset mask, and reserving the identification area of the interested storefront; and outputting the identification result of the identification area of the storefront of interest.
The acquiring of frame images of one or more storefronts comprises: acquiring photographs taken of one or more storefronts and using each photograph as a frame image; or acquiring video shot of one or more storefronts and using each frame of the video as a frame image. The photographs include storefront pictures taken with a camera; the video includes storefront footage captured by a fixed camera or a mobile camera.
The recognizing of the frame image according to a preset semantic-segmentation violation recognition model to obtain the recognition areas involving a violation comprises: inputting the frame image into the preset semantic-segmentation violation recognition model, so that the model predicts a class for each pixel in the frame image and processes the predictions into the recognition areas involving a violation. The violations in the recognition areas include one or more of cross-store operation, occupying a public channel, blocking a sign, unauthorized facility alteration, illegal signage, and illegal product display.
The process of establishing the semantic segmentation violation recognition model comprises the following steps: acquiring a plurality of training sample images of shop fronts in a violation occurrence state or a non-occurrence state; and training a preset semantic segmentation network by using the training sample image, and learning to obtain the semantic segmentation violation recognition model.
The filtering of the recognition areas in the frame image according to a preset mask, retaining the recognition area of the storefront of interest, comprises: comparing the preset mask with the frame image, filtering out the region of the frame image outside the mask's coverage, and retaining the region the mask covers, to obtain the recognition area of the storefront of interest. The mask is either the manually calibrated distribution position of the storefront of interest, or the distribution position of the storefront of interest obtained by image-segmentation processing.
If the mask is the distribution position of the interesting storefront after the image segmentation processing, the image segmentation processing process comprises the following steps: inputting the frame image into a preset storefront segmentation model, so that the storefront segmentation model respectively predicts each pixel point in the frame image, processes to obtain a region belonging to the distribution position of the interesting storefront in the frame image, and takes the region belonging to the distribution position of the interesting storefront as the mask.
The outputting of the identification result for the recognition area of the storefront of interest comprises: judging whether a violation exists in the recognition area of the storefront of interest; if so, determining an identification result that a violation exists and raising a violation alarm; otherwise, determining an identification result that no violation exists.
According to a second aspect, an embodiment provides a storefront violation identification device, comprising: the image capturing equipment is used for capturing images of one or more storefronts and generating corresponding frame images; the processing device is connected with the image capturing device and used for performing image processing on the frame images of one or more storefronts according to the storefront violation identification method in the first aspect and outputting an identification result of an identification area of the storefront of interest; and the display equipment is connected with the processing equipment and used for receiving the identification result of the identification area of the interested storefront output by the processing module and displaying the alarm of the identification result.
The processing apparatus includes: the acquisition module is used for acquiring frame images of one or more storefronts from the image acquisition equipment; the recognition module is connected with the acquisition module and used for recognizing the frame image according to a preset semantic segmentation violation recognition model to obtain a violation recognition area in the frame image; the filtering module is connected with the identification module and used for filtering each identification area in the frame image according to a preset mask and reserving the identification area of the interested storefront; and the output module is connected with the filtering module and used for outputting the identification result of the identification area of the interested storefront.
According to a third aspect, an embodiment provides a computer-readable storage medium, characterized by a program executable by a processor to implement the storefront violation identification method as described in the first aspect above.
The beneficial effect of this application is:
according to the above embodiments, a storefront violation identification method and device based on semantic segmentation, and a storage medium, are provided, wherein the method comprises: acquiring frame images of one or more storefronts; recognizing the frame images according to a preset semantic-segmentation violation recognition model to obtain the recognition areas involving a violation; filtering the recognition areas according to a preset mask and retaining the recognition areas of the storefronts of interest; and outputting the identification result for those areas. First, because the frame images come from photographs or captured video, pictures and footage can conveniently be processed in real time, so that violations along a street of shops are detected promptly and the supervisory burden on law-enforcement personnel is relieved. Second, because the frame image is recognized with a preset semantic-segmentation violation recognition model, contextual features can be linked to predict each pixel in the image and judge whether it belongs to a violation, which yields violation recognition areas of high accuracy. Third, because the recognition areas in the frame image are filtered with a preset mask, interference from non-storefront areas is eliminated to the greatest extent possible and only the storefronts of interest are examined for violations, which avoids the influence of external factors, greatly reduces false and missed detections, and improves the accuracy of storefront violation identification. Fourth, because the identification result for the storefront of interest is output, law-enforcement personnel learn promptly whether a violation has occurred and, from the alarm information, which specific violation it is; the result is intuitive and clear, no on-site check is needed, and offending storefronts can be alerted through the identification result as a reminder. Fifth, the claimed storefront violation identification device realizes frame-image acquisition, identification-result output, and alarm display by means of the image-capture device, the processing device, and the display device, so that it can promptly feed back the violation state of street-front shops to the user, improving the user experience and enforcement efficiency.
Drawings
FIG. 1 is a flow chart of a storefront violation identification method of the present application;
FIG. 2 is a detailed flow chart of a frame image recognition and filtering process;
FIG. 3 is a schematic structural diagram of a semantic segmentation recognition model;
FIG. 4 is a flow diagram of a method for storefront violation identification in another embodiment;
fig. 5 is a schematic structural diagram of a storefront violation identification device.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be reordered in ways apparent to one of ordinary skill in the art. Thus, the sequences in the specification and drawings describe particular embodiments only and do not imply a required order, unless it is otherwise indicated that a certain sequence must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
The first embodiment,
Referring to fig. 1, the present application discloses a storefront violation identification method based on semantic segmentation, which includes steps S110-S140, which are described below.
In step S110, frame images of one or more storefronts are obtained, that is, one frame image may include a captured image of one or more storefronts.
In the present embodiment, the frame image may come from a variety of shooting sources. In one case, photographs taken of one or more storefronts are acquired, and each photograph of a street-front shop taken with a camera is used as a frame image. In another case, video shot of one or more storefronts is acquired, and each frame of the video is used as a frame image; that is, the successive frames of footage of the street-front shops serve as frame images.
It should be noted that the photographs include storefront pictures taken with a camera, such as street storefront pictures taken by law-enforcement officers with a mobile phone, a professional camera, or a law-enforcement recorder. The captured video includes storefront footage shot by a fixed or mobile camera: footage from a fixed camera installed on a wall or pole tends to have a single, unchanging view, while footage from a mobile camera mounted on a law-enforcement vehicle is not single-view and changes continuously.
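The two acquisition paths above, still photographs and successive video frames, both reduce to a single stream of frame images. A minimal sketch of that normalization follows; `frame_stream` is a hypothetical helper (the patent does not name one), and in practice the frame objects would be decoded images, e.g. arrays obtained with OpenCV.

```python
from typing import Iterable, Iterator, List, Optional

def frame_stream(photos: Optional[List[object]] = None,
                 video_frames: Optional[Iterable[object]] = None) -> Iterator[object]:
    """Yield frame images one by one, whether the source is a set of
    still photographs (each photograph is one frame image) or the
    successive frames of a captured video."""
    if photos:
        for photo in photos:
            yield photo            # one photograph = one frame image
    if video_frames:
        for frame in video_frames:
            yield frame            # one video frame = one frame image
```

Either source can then feed the same downstream recognition steps, e.g. `for frame in frame_stream(photos=shots): ...`.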
And step S120, recognizing the frame image according to a preset semantic-segmentation violation recognition model to obtain the recognition areas involving a violation in the frame image. The semantic-segmentation violation recognition model is trained in advance and can perform semantic analysis, storefront-violation recognition, and violation-area division on the content of the frame image, so that after the frame image passes through the model, the recognition areas belonging to storefront violations are delineated in the processed image.
In a specific embodiment, referring to fig. 2, the step S120 may specifically include steps S121-S122, which are respectively described as follows.
Step S121, inputting the frame image into a preset semantic segmentation violation identification model. It will be appreciated that a trained network model often has an input channel by means of which frame images can be input.
For example, the semantic-segmentation violation recognition model illustrated in fig. 3 is a typical Fully Convolutional Network (FCN), which transfers the representations learned by a classification network to the segmentation task by fine-tuning. In fig. 3, the model is composed of a number of network layers (with channel dimensions such as 96, 256, 384, 4096, and 21) and can be understood as an encoder-decoder architecture: the encoder is typically a pre-trained classification network such as VGG or ResNet, followed by a decoder whose task is to semantically project the discriminable features learned by the encoder (at lower resolution) into pixel space (at higher resolution) to obtain a dense classification.
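The lower-resolution/higher-resolution relationship in the encoder-decoder description can be made concrete with a small resolution calculation. This is an illustrative sketch only, assuming a VGG-style backbone with five stride-2 stages as in the classic FCN-32s; the patent does not specify these numbers.

```python
def encoder_output_size(h: int, w: int, stride2_stages: int = 5):
    """Spatial size of the encoder's final feature map when the encoder
    halves resolution at each of `stride2_stages` pooling/stride-2 stages
    (VGG-like backbones use 5, giving a 1/32-resolution feature map)."""
    for _ in range(stride2_stages):
        h, w = (h + 1) // 2, (w + 1) // 2   # ceiling division, as with padded pooling
    return h, w

def decoder_upsample_factor(stride2_stages: int = 5) -> int:
    """Factor by which the decoder (e.g. the bilinear-initialized
    deconvolution in FCN-32s) must upsample to recover input resolution."""
    return 2 ** stride2_stages
```

For a 512 x 512 frame image the encoder output is 16 x 16, and the decoder must project those discriminable features back up by a factor of 32 to classify every pixel.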
It is understood that, in addition to the FCN structure used in fig. 3, the semantic-segmentation violation recognition model may also use a SegNet, U-Net, DeepLab, E-Net, LinkNet, Mask R-CNN, PSPNet, RefineNet, or G-FRNet network structure; no limitation is imposed here.
It should be noted that the semantic segmentation violation recognition model herein can be learned from training sample images related to storefront violation behaviors by training a preset semantic segmentation network. In one embodiment, the process of building a semantic segmentation violation identification model includes: (1) acquiring a plurality of training sample images of shop fronts in a violation occurrence state or a non-occurrence state; (2) and training a preset semantic segmentation network by using the training sample image, and learning to obtain a semantic segmentation violation identification model.
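Training the segmentation network on the labeled sample images amounts to optimizing a per-pixel classification loss, since every pixel is its own classification problem. A minimal sketch of the usual objective, mean per-pixel cross-entropy, is shown below; the data layout (`probs[c][i][j]` as the predicted probability of class `c` at pixel `(i, j)`) is an assumption for illustration, not the patent's specification.

```python
import math
from typing import List

def pixelwise_cross_entropy(probs: List[List[List[float]]],
                            labels: List[List[int]]) -> float:
    """Mean per-pixel cross-entropy: the training loss treats each pixel
    as a separate classification over the violation/background classes.
    probs[c][i][j] is the predicted probability of class c at pixel (i, j);
    labels[i][j] is the ground-truth class id of pixel (i, j)."""
    h, w = len(labels), len(labels[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += -math.log(probs[labels[i][j]][i][j])  # NLL of the true class
    return total / (h * w)
```

Minimizing this loss over the violation/non-violation training sample images is what "learning to obtain the semantic segmentation violation recognition model" comes down to in practice.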
And step S122, predicting each pixel in the frame image with the semantic-segmentation violation recognition model, and processing the predictions to obtain the recognition areas in the frame image that involve a violation.
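The per-pixel prediction of step S122 can be sketched as an arg-max over the model's class-score map, keeping the pixels whose best class is a violation class. This is illustrative post-processing under assumed conventions (class 0 is compliant background, classes 1 and above are violation types); the patent does not fix a class layout.

```python
from typing import List

BACKGROUND = 0  # assumption: class 0 = compliant/background, classes >= 1 = violations

def violation_mask(scores: List[List[List[float]]]) -> List[List[int]]:
    """Per-pixel prediction: take the arg-max class at every pixel of the
    class-score map produced by the segmentation model, keeping the
    predicted class id where it is a violation class and 0 elsewhere.
    scores[c][i][j] is the model score for class c at pixel (i, j)."""
    n_classes = len(scores)
    h, w = len(scores[0]), len(scores[0][0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            best = max(range(n_classes), key=lambda c: scores[c][i][j])
            mask[i][j] = best if best != BACKGROUND else 0
    return mask
```

The connected non-zero regions of the resulting map are the recognition areas belonging to a violation.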
It should be noted that the violation in each recognition area includes one or more of cross-store operation, occupying a public channel, blocking a sign, unauthorized facility alteration, illegal signage, and illegal product display; as long as any one of these violations is present, the area can be regarded as a violation recognition area on the image.
It should be noted that in an ordinary classification task only the final result (the class probability of the whole image) matters, whereas semantic segmentation must both discriminate at the pixel level and provide a mechanism for projecting the discriminable features learned at different stages of the encoder into pixel space; different decoder architectures therefore yield different classification capabilities. In fact, fig. 3 reflects the flow of FCN end-to-end dense prediction: features from different encoder stages, which differ in the coarseness of their semantic information, are combined; the low-resolution semantic feature map is upsampled with a deconvolution operation initialized as a bilinear-interpolation filter; and knowledge is transferred from classifier networks such as VGG16 or AlexNet to achieve semantic segmentation.
Step S130, filtering each identification area in the frame image according to a preset mask, and reserving the identification area of the interested storefront.
It should be noted that the mask may be a selected image or graphic that occludes the image to be processed (wholly or partially) in order to control the area or course of image processing. In optical image processing, the mask may be a film, a filter, or the like; in digital image processing, the mask may be a two-dimensional matrix or a multi-valued image.
In one embodiment, referring to fig. 2, the step S130 may specifically include steps S131-S132, which are respectively described as follows.
Step S131, comparing a preset mask with the frame image; specifically, the mask is multiplied with the frame image, or combined with it by bitwise operations, to complete the mask's coverage of the frame image, thereby yielding a covered region and the region outside it.
Step S132, filtering out the area formed by covering the mask outside the frame image, and reserving the covered area of the mask to obtain the identification area of the interested storefront.
For example, when the mask is a two-dimensional matrix, the storefront frame image can be pixel-filtered with an n x n matrix so that the region of interest stands out. In this process, multiplying the mask element-wise with the frame image keeps the image values inside the covered region unchanged and sets the values outside it to 0; the mask can thus shield certain areas of the image so that they take no part in processing or in the computation of processing parameters, and only the unshielded areas are processed or counted. In addition, the mask can be used for structural feature extraction on the frame image: structures in the image similar to the mask can be detected and extracted with a similarity measure or image matching.
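The element-wise multiplication described above can be sketched in a few lines. This is a minimal illustration on plain nested lists; real pipelines would use array operations (e.g. NumPy or OpenCV bitwise ops) on decoded images.

```python
def apply_mask(frame: list, mask: list) -> list:
    """Filter a frame image (or a per-pixel recognition map) with a binary
    mask: multiplying keeps the values inside the storefront-of-interest
    region unchanged and zeroes everything outside it, so the masked-out
    areas take no part in later processing or statistics."""
    return [[pixel * m for pixel, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]
```

Applying this to the violation recognition map of step S120 leaves only the recognition areas of the storefronts of interest, which is exactly the retention described in step S132.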
It should be noted that the storefronts of interest here may be all storefronts in the frame image, or may be part of storefronts, or may even be a certain storefront designated by the user.
In this embodiment, the mask used is the distribution position of the storefront of interest, either manually calibrated or obtained by image-segmentation processing. It can be understood that the distribution position is embodied as a region: circling the distribution position of the storefront of interest yields its region in the frame image and the segmentation shape it occupies. If the mask is manually calibrated, the region and its shape in the frame image can be set according to the user's actual needs; such a mask suits frame images from photographs taken with a camera, or frames of video shot by a fixed camera. If the mask is the distribution position obtained by image segmentation, the segmentation process comprises: inputting the frame image into a preset storefront segmentation model, so that the model predicts a class for each pixel in the frame image, processes the predictions into the region belonging to the distribution position of the storefront of interest, and uses that region as the mask.
The storefront segmentation model may be learned from training sample images of the storefront by training a preset semantic segmentation network.
As those skilled in the art will understand, the network structure of this semantic-segmentation network likewise often uses an FCN, SegNet, U-Net, DeepLab, E-Net, LinkNet, Mask R-CNN, PSPNet, RefineNet, or G-FRNet model structure; differences in the training sample images lead to different hyper-parameters in the network, and hence to different recognition functions.
And step S140, outputting the identification result of the identification area of the interested storefront.
In one embodiment, referring to fig. 2, the step S140 may specifically include steps S141-S143, which are described below.
Step S141 is to determine whether there is a violation in the identification area of the storefront of interest, and if yes, the process goes to step S142, otherwise, the process goes to step S143.
The violations include one or more of cross-store operation, occupying a public channel, blocking a sign, unauthorized facility alteration, illegal signage, and illegal product display; a violation is considered present as long as any one of them exists.
And step S142, under the condition that the violation behaviors are determined to exist, determining that the violation identification result exists, and performing violation warning. The violation alarm here may include the identification area corresponding to the identification result and which specific violation behavior.
In step S143, an identification result of no violation is determined. It will be appreciated that when none of the listed violations is present within the recognition area of the storefront of interest, the identification result is that no violation exists, and the business activity of the storefront of interest is by default compliant.
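The S141-S143 decision can be sketched as counting violation-class pixels in the mask-filtered prediction and naming each violation that appears. The class-id table and the `min_pixels` threshold are illustrative assumptions, not part of the patent; the violation names follow the list above.

```python
# Hypothetical class-id-to-name table for the violations listed above.
VIOLATION_NAMES = {1: "cross-store operation", 2: "occupying a public channel",
                   3: "blocking a sign", 4: "unauthorized facility alteration",
                   5: "illegal signage", 6: "illegal product display"}

def violation_report(filtered_mask, min_pixels: int = 1):
    """Decide whether the retained storefront region contains a violation:
    count pixels of each violation class in the mask-filtered prediction;
    any class reaching `min_pixels` triggers an alarm naming that violation."""
    counts = {}
    for row in filtered_mask:
        for cls in row:
            if cls != 0:                       # 0 = compliant/background
                counts[cls] = counts.get(cls, 0) + 1
    found = sorted(c for c, n in counts.items() if n >= min_pixels)
    if found:
        return {"violation": True,
                "alarms": [VIOLATION_NAMES.get(c, f"class {c}") for c in found]}
    return {"violation": False, "alarms": []}
```

The returned alarm list is what lets law-enforcement personnel see not only that a violation occurred but which specific violation it was.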
In the present embodiment, both the semantic-segmentation violation recognition model and the storefront segmentation model are obtained by training an existing semantic-segmentation network on training sample images. That is, the basis of both models is a semantic-segmentation network performing dense prediction: every pixel is classified. Unlike ordinary object detection, which recognizes "what" is in an image and "where" it is, semantic segmentation can accurately predict what a given pixel represents from the context information around it, i.e. the image's semantic information.
Semantic segmentation of an image assigns a semantic category to each pixel of the input, yielding a dense pixel-wise classification. Semantic segmentation and scene analysis have been part of the computer-vision community since around 2007, but, much as in other areas of computer vision, a major breakthrough came when fully convolutional networks were first used for end-to-end segmentation of natural images. The semantic-segmentation network can adopt a common FCN, SegNet, U-Net, DeepLab, E-Net, LinkNet, Mask R-CNN, PSPNet, RefineNet, or G-FRNet model structure; these structures belong to the prior art and can be learned from network resources, research papers, and textbooks, so they are not detailed here.
The second embodiment,
Referring to fig. 4, on the basis of the storefront violation identification method based on semantic segmentation disclosed in the first embodiment of the present application, an improved storefront violation identification method is further disclosed, which includes steps S210-S300, which are described below respectively.
In step S210, frame images of one or more storefronts are obtained, that is, one frame image may include a captured image of one or more storefronts.
There may be multiple sources of capture of the frame image, such as from a photograph taken for one or more store fronts, taking the photograph as the frame image; such as shot video from one or more store fronts, each frame in the shot video is treated as a frame image.
And step S220, recognizing the frame image according to a preset semantic segmentation violation recognition model to obtain a violation recognition area in the frame image.
The semantic segmentation violation recognition model is trained in advance and can perform semantic analysis, storefront violation recognition and violation area division on the content of the frame image, so that after the frame image passes through the semantic segmentation violation recognition model, the identification areas belonging to storefront violations can be delineated in the processed image.
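One way to turn the model's per-pixel predictions into discrete identification areas is to group violation-labelled pixels into connected regions. The sketch below does this with a simple 4-connected breadth-first search over a toy label map (the class-id set is a hypothetical assumption for illustration, not the application's actual label scheme):

```python
from collections import deque
import numpy as np

def violation_regions(label_map, violation_classes={2, 3, 4, 5, 6, 7}):
    """Group violation-labelled pixels into 4-connected identification
    regions; returns a list of pixel-coordinate sets, one per region."""
    h, w = label_map.shape
    seen = np.zeros((h, w), dtype=bool)
    regions = []
    for i in range(h):
        for j in range(w):
            if label_map[i, j] in violation_classes and not seen[i, j]:
                region, queue = set(), deque([(i, j)])
                seen[i, j] = True
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    region.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny, nx]
                                and label_map[ny, nx] in violation_classes):
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions

lm = np.array([[2, 2, 0],
               [0, 0, 0],
               [0, 0, 3]])
print(len(violation_regions(lm)))  # 2 separate identification areas
```

A production system would more likely use a library routine for connected components, but the grouping idea is the same.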
In step S230, it is determined whether mask filtering is required, if so, the process proceeds to step S240, otherwise, the process proceeds to step S280.
If no identification area belonging to a violation is identified in the frame image, mask filtering is unnecessary, and the process may proceed directly to step S280. Adding this judgment step allows the corresponding flow to be entered as needed, reducing the workload of unnecessary steps.
In step S240, it is determined whether the shooting source of the frame image is fixed, if so, the process proceeds to step S250, otherwise, the process proceeds to step S260.
It should be noted that if the frame image comes from a photo taken by a camera or a video taken by a fixed camera, the shooting source is considered fixed; if the frame image comes from a video shot by a moving camera, the shooting source is considered not fixed.
In step S250, the manually calibrated region of the distribution positions of the storefronts of interest is used as the mask.
In step S260, the region of the distribution positions of the storefronts of interest obtained by image segmentation processing is used as the mask.
It can be understood that the distribution positions are typically embodied as regions: by outlining the distribution positions of the storefronts of interest, the regions those storefronts occupy in the frame image and their segmentation shapes can be obtained. The mask may be a selected image or graphic that blocks the image to be processed (wholly or partially) to control the area or process of image processing. Typically, in optical image processing the mask may be a film, a filter, or the like; in digital image processing the mask may be a two-dimensional matrix array or a multi-valued image.
The image segmentation process includes: inputting the frame image into a preset storefront segmentation model, so that the storefront segmentation model predicts each pixel point in the frame image, obtains through processing the region belonging to the distribution positions of the storefronts of interest in the frame image, and uses that region as the mask.
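Both mask sources described above reduce to the same thing: a binary two-dimensional matrix, as noted earlier. A minimal sketch of the two branches (the rectangle coordinates and the class id are illustrative assumptions; in the disclosed method the second branch would come from the trained storefront segmentation model):

```python
import numpy as np

def manual_mask(shape, top, left, bottom, right):
    """Mask from a manually calibrated storefront rectangle (fixed shooting
    source): pixels inside the calibrated region are 1, all others 0."""
    mask = np.zeros(shape, dtype=np.uint8)
    mask[top:bottom, left:right] = 1
    return mask

def segmentation_mask(storefront_label_map, storefront_class=1):
    """Mask from a storefront segmentation model's per-pixel label map
    (moving shooting source): keep the pixels predicted as storefront."""
    return (storefront_label_map == storefront_class).astype(np.uint8)

m = manual_mask((4, 4), 1, 1, 3, 3)
print(m.sum())  # 4 pixels inside the 2x2 calibrated region
```

Either way, the downstream filtering step only sees a 0/1 matrix and does not care which branch produced it.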
In step S270, each identification area in the frame image is filtered according to a preset mask, and the identification areas of the storefronts of interest are retained. Specifically, the preset mask is compared with the frame image, the region of the frame image outside the mask coverage is filtered out, and the region covered by the mask is retained to obtain the identification areas of the storefronts of interest.
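With a binary mask in hand, the filtering of step S270 can be sketched as an element-wise product that zeroes out everything outside the mask coverage (a minimal numpy illustration with toy data):

```python
import numpy as np

def filter_by_mask(violation_map, mask):
    """Keep only violation pixels that fall inside the storefront-of-interest
    mask; pixels outside the mask coverage are filtered out (set to 0)."""
    return violation_map * mask  # element-wise product

violation_map = np.array([[1, 1, 0],
                          [0, 1, 1],
                          [0, 0, 1]], dtype=np.uint8)
mask = np.array([[0, 1, 1],
                 [0, 1, 1],
                 [0, 0, 0]], dtype=np.uint8)
kept = filter_by_mask(violation_map, mask)
print(kept)
# [[0 1 0]
#  [0 1 1]
#  [0 0 0]]
```

The violation pixel at the top-left and the one at the bottom-right fall outside the storefront mask and are discarded, which is exactly the interference-suppression effect described here.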
Further, the process proceeds to step S280 after step S270.
In step S280, it is determined whether there is an illegal action in the identified area, if so, the process proceeds to step S290, otherwise, the process proceeds to step S300.
It should be noted that the violation behaviors include one or more of cross-store operation, occupation of public channels, blocking of identification, facility construction, illegal identification and illegal product display, and a violation is considered to exist as long as any one of them is present.
In step S290, if it is determined that a violation behavior exists, the identification result is determined to contain a violation, and a violation alarm is issued. The violation alarm here may include the identification area corresponding to the identification result and the specific violation behavior involved.
In step S300, the identification result is determined to contain no violation. It can be understood that when none of the listed violation behaviors exists within the identification area of the storefront of interest, it is determined that the identification result contains no violation, and the business activity of the storefront of interest is by default compliant.
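The decision of steps S280-S300 can be sketched as a lookup from the retained label map to the listed violation categories (the class-id mapping below is a hypothetical assumption for illustration, not the application's actual label scheme):

```python
# Hypothetical class-id mapping; the actual categories come from the
# trained model's label set.
VIOLATIONS = {2: "cross-store operation", 3: "public channel occupation",
              4: "blocking identification", 5: "facility construction",
              6: "illegal identification", 7: "illegal product display"}

def identify(kept_label_map):
    """Return (has_violation, names) for the retained identification area:
    any pixel labelled with a violation class triggers an alarm."""
    present = sorted({int(v) for row in kept_label_map for v in row}
                     & VIOLATIONS.keys())
    names = [VIOLATIONS[c] for c in present]
    return bool(names), names

has_violation, names = identify([[0, 0, 2], [0, 3, 0]])
print(has_violation, names)
# True ['cross-store operation', 'public channel occupation']
```

Per step S290, the alarm would report both the identification area and the specific behavior names found; per step S300, an empty set means the storefront is treated as compliant.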
Those skilled in the art will appreciate that practicing the disclosed storefront violation identification method can achieve the following technical advantages. (1) Because the frame images of storefronts are obtained from captured photos or videos, the photos or videos can conveniently be processed in real time, so that whether violation behaviors exist at storefronts along the street can be known in time, relieving the supervision pressure on law enforcement personnel. (2) The frame image is identified according to a preset semantic segmentation violation identification model, so that contextual features can be effectively used to predict each pixel in the image and judge whether it belongs to a violation, thereby obtaining identification areas of violations with higher accuracy. (3) Each identification area in the frame image is filtered according to a preset mask, so that interference from non-storefront areas in the frame image can be eliminated to the greatest extent, and only the violation status of the identification areas of the storefronts of interest is considered, thereby effectively avoiding the influence of external factors, greatly reducing false detections and missed detections, and helping to improve the accuracy of storefront violation identification. (4) Because the identification results of the identification areas of the storefronts of interest are output, law enforcement personnel can not only learn in time whether violation behaviors occur at the storefronts of interest, but also learn from the alarm information which specific violation occurred; the identification results are intuitive and clear, personnel no longer need to take part in the tedious work of reviewing large amounts of surveillance video, and storefronts with violation behaviors can be alerted through the identification results to enhance the prompting effect.
Embodiment III
Referring to fig. 5, this embodiment discloses a storefront violation identification apparatus, which includes an image capturing device 31, a processing device 32 and a display device 33, each described below.
The image capturing device 31 has a camera, and can capture images of one or more storefronts and generate corresponding frame images. Specifically, the image capturing device 31 may be a camera or a video camera, such as a law enforcement officer using a mobile phone, a professional camera, a law enforcement camera, or other types of image capturing devices to take pictures of the shop front along the street, thereby generating a frame image; for another example, the frame image is generated by capturing a video of the shop front along the street by means of a fixed camera mounted on a wall surface, a pillar, or a mobile camera mounted on a law enforcement car.
The processing device 32 is connected to the image capturing device 31, and the processing device 32 may acquire frame images of one or more storefronts from the image capturing device 31, identify the frame images according to a preset semantic segmentation violation identification model to obtain the identification areas belonging to violations in the frame images, filter each identification area in the frame images according to a preset mask to retain the identification areas of the storefronts of interest, and finally output the identification results of the identification areas of the storefronts of interest. In particular, the processing device 32 may be a processor component or a combination of a processor and a memory, and is capable of processing the frame images according to the method disclosed in the first embodiment or the second embodiment.
The display device 33 is connected with the processing device 32; the display device 33 has the function of displaying images or specific information, and can receive the identification result of the identification area of the storefront of interest output by the processing device 32 and display the identification result in an alarm manner. Specifically, the display device 33 may display the acquired frame image, the identification areas belonging to violations in the frame image, the identification areas of the storefronts of interest, and the identification results of those identification areas, so that the user can know the operating state of the apparatus in time through the displayed content. In addition, the display device 33 may be a display, an electronic screen, or the like, and is not particularly limited herein.
It should be noted that, when the display device 33 displays the recognition result of the recognition area of the storefront of interest in an alarm manner, the recognition result may be represented by characters or marks, and the specific representation form is not limited.
In one embodiment, referring to FIG. 5, the processing device 32 includes an acquisition module 321, a recognition module 322, a filtering module 323, and an output module 324, each described below.
The acquisition module 321 has a communication function and can acquire frame images of one or more storefronts from the image pickup device 31. The acquired frame image can have various shooting sources, for example, a shot picture for one or more shop fronts is acquired, the shot picture is taken as the frame image, namely, each picture shot by a camera for the shops along the street is taken as the frame image; in another case, a shot video of one or more shop fronts is acquired, and each frame of the shot video is used as a frame image, that is, a shot video of continuous frames obtained by shooting the shops along the street with a camera is used as a frame image.
The recognition module 322 is connected to the acquisition module 321, and mainly recognizes the frame image according to a preset semantic segmentation violation recognition model to obtain the identification areas belonging to violations in the frame image. The semantic segmentation violation identification model is trained in advance and can perform semantic analysis, storefront violation identification and violation area division on the content of the frame image, so that after the frame image passes through the model, the identification areas belonging to storefront violations can be delineated in the processed image. For the specific functions of the recognition module 322, reference may be made to step S120 in the first embodiment, which is not described herein again.
The filtering module 323 is connected to the recognition module 322, and mainly filters each recognition area in the frame image according to a preset mask, and retains the recognition area of the interested storefront. The mask may be a selected image or graphic, and the image to be processed is blocked (wholly or partially) to control the area or process of image processing. Typically in optical image processing, the mask may be a film, filter, or the like; in digital image processing, the mask may be a two-dimensional matrix array or a multivalued image. For specific functions of the filtering module 323, reference may be made to step S130 in the first embodiment, which is not described herein again.
The output module 324 is connected to the filter module 323, and mainly outputs the identification result of the identification area of the storefront of interest. For specific functions of the output module 324, reference may be made to step S140 in the first embodiment, which is not described herein again.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described with reference to specific examples, which are intended only to aid understanding of the invention and not to limit it. For persons skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may also be made according to the idea of the invention.

Claims (10)

1. A storefront violation identification method based on semantic segmentation is characterized by comprising the following steps:
acquiring frame images of one or more storefronts;
identifying the frame image according to a preset semantic segmentation violation identification model to obtain an identification area belonging to violation in the frame image;
filtering each identification area in the frame image according to a preset mask, and reserving the identification area of the interested storefront;
and outputting the identification result of the identification area of the storefront of interest.
2. The storefront violation identification method of claim 1, wherein said obtaining a frame image of one or more storefronts comprises:
acquiring shot photos of one or more shop fronts, and taking the shot photos as the frame images; or acquiring shot videos aiming at one or more shop fronts, and taking each frame of picture in the shot videos as the frame image;
the shot picture comprises a storefront picture shot by a camera; the shooting video comprises shooting the acquired storefront photos by a fixed camera or a movable camera.
3. The storefront violation identification method according to claim 2, wherein the identifying the frame image according to a preset semantic segmentation violation identification model to obtain an identification area belonging to a violation in the frame image comprises:
inputting the frame image into a preset semantic segmentation violation identification model, so that the semantic segmentation violation identification model respectively predicts each pixel point in the frame image, and processes to obtain an identification region belonging to violation in the frame image;
the illegal behaviors of each identification area in the frame image comprise one or more of cross-store operation, public channel occupation, shielding identification, facility reconstruction, illegal identification and illegal product display.
4. The storefront violation identification method according to claim 3, wherein building said semantic segmentation violation identification model comprises:
acquiring a plurality of training sample images of shop fronts in a violation occurrence state or a non-occurrence state;
and training a preset semantic segmentation network by using the training sample image, and learning to obtain the semantic segmentation violation recognition model.
5. The storefront violation identification method according to claim 4, wherein the filtering each identification area in the frame image according to a preset mask and retaining the identification area of the storefront of interest comprises:
comparing a preset mask with the frame image, filtering out an area formed by covering the mask outside the frame image, and reserving the covered area of the mask to obtain an identification area of the storefront of interest;
the mask is the distribution position of the interesting storefront calibrated manually or the distribution position of the interesting storefront after image segmentation processing.
6. The storefront violation identification method of claim 5, wherein if said mask is the distribution location of the storefront of interest after the image segmentation process, the image segmentation process comprises:
inputting the frame image into a preset storefront segmentation model, so that the storefront segmentation model respectively predicts each pixel point in the frame image, processes to obtain a region belonging to the distribution position of the interesting storefront in the frame image, and takes the region belonging to the distribution position of the interesting storefront as the mask.
7. The storefront violation identification method of any of claims 4-6, wherein said outputting the identification of the identified region of the storefront of interest comprises:
and judging whether the identification area of the storefront of interest has violation behaviors, if so, determining that the violation identification result exists, and giving violation alarms, otherwise, determining that the violation identification result does not exist.
8. A storefront violation identification device, comprising:
the image capturing equipment is used for capturing images of one or more storefronts and generating corresponding frame images;
the processing device is connected with the image capturing device and used for carrying out image processing on the frame images of one or more storefronts according to the storefront violation identification method in any one of claims 1-7 and outputting an identification result of an identification area of the storefront of interest;
and the display equipment is connected with the processing equipment and used for receiving the identification result of the identification area of the storefront of interest output by the processing equipment and displaying the identification result in an alarm manner.
9. The storefront violation identification apparatus of claim 8, wherein said processing device comprises:
the acquisition module is used for acquiring frame images of one or more storefronts from the image acquisition equipment;
the recognition module is connected with the acquisition module and used for recognizing the frame image according to a preset semantic segmentation violation recognition model to obtain a violation recognition area in the frame image;
the filtering module is connected with the identification module and used for filtering each identification area in the frame image according to a preset mask and reserving the identification area of the interested storefront;
and the output module is connected with the filtering module and used for outputting the identification result of the identification area of the interested storefront.
10. A computer-readable storage medium comprising a program executable by a processor to implement the storefront violation identification method of any of claims 1-7.
CN202010392935.4A 2020-05-11 2020-05-11 Storefront violation identification method and device based on semantic segmentation and storage medium Pending CN111666945A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010392935.4A CN111666945A (en) 2020-05-11 2020-05-11 Storefront violation identification method and device based on semantic segmentation and storage medium


Publications (1)

Publication Number Publication Date
CN111666945A true CN111666945A (en) 2020-09-15

Family

ID=72383576



Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158759A (en) * 2021-02-19 2021-07-23 合肥海赛信息科技有限公司 Video analysis-based store-out operation intelligent detection method
CN116309239A (en) * 2021-12-20 2023-06-23 埃核科技有限公司 Visual detection system based on deep learning detection fault
WO2023184123A1 (en) * 2022-03-28 2023-10-05 京东方科技集团股份有限公司 Detection method and device for violation of rules and regulations
CN117152692A (en) * 2023-10-30 2023-12-01 中国市政工程西南设计研究总院有限公司 Traffic target detection method and system based on video monitoring

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685060A (en) * 2018-11-09 2019-04-26 科大讯飞股份有限公司 Image processing method and device
CN109829929A (en) * 2018-12-30 2019-05-31 中国第一汽车股份有限公司 A kind of level Scene Semantics parted pattern based on depth edge detection
CN109978893A (en) * 2019-03-26 2019-07-05 腾讯科技(深圳)有限公司 Training method, device, equipment and the storage medium of image, semantic segmentation network
CN110718067A (en) * 2019-09-23 2020-01-21 浙江大华技术股份有限公司 Violation behavior warning method and related device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200915