CN111079671B

CN111079671B - Method and device for detecting abnormal articles in scene

Info

Publication number: CN111079671B
Application number: CN201911329567.2A
Authority: CN
Inventors: 黄泽元; 孙楠
Original assignee: Shenzhen Jizhi Digital Technology Co Ltd
Current assignee: Beijing Zhichuang Digital Technology Service Co ltd
Priority date: 2019-12-20
Filing date: 2019-12-20
Publication date: 2020-11-03
Anticipated expiration: 2039-12-20
Also published as: CN111079671A

Abstract

The embodiment of the invention discloses a method and a device for detecting abnormal articles in a scene. The method comprises the following steps: inputting a scene detection map and a scene standard map into the corresponding sub-networks of a twin (Siamese) network to obtain a feature map of the scene detection map and a feature map of the scene standard map, respectively; fusing the two feature maps to obtain a common feature map; computing a plurality of hole (dilated) convolutions on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames; and screening out, through a fully connected network, the sub-feature maps containing abnormal articles, calculating their coordinates, and enclosing the abnormal articles with the target frames in the scene detection map. With the detection method provided by the embodiment of the application, abnormal articles that should not appear in a scene can be detected, which greatly reduces manual labor and improves people's quality of life.

Description

Method and device for detecting abnormal articles in scene

Technical Field

The invention relates to the field of abnormal article detection, in particular to a method and a device for detecting abnormal articles in a scene.

Background

With the continuing urbanization of China, large building complexes such as shopping malls, residential quarters and supermarkets keep emerging, and human activity is increasingly concentrated in them. However, abnormal articles such as sundries and garbage are often placed in fire-fighting passages, intentionally or unintentionally. This obstructs people's movement and may injure children and the elderly; when a fire breaks out, a passage full of sundries can even prevent people from escaping in time.

Traditionally, abnormal articles are detected manually: for example, a cleaner clears sundries from a passage, or a dedicated worker checks whether a fire-fighting passage is blocked.

Therefore, a method for detecting abnormal articles is desired to replace manual inspection.

Disclosure of Invention

In view of this, the present application provides a method and a device for detecting abnormal articles in a scene, which detect abnormal articles that should not appear in the scene, greatly reduce manual labor, and improve people's quality of life.

In a first aspect of the present application, a method for detecting abnormal articles in a scene is provided, the method including:

inputting a scene detection map and a scene standard map into the corresponding sub-networks of a twin network to obtain a feature map of the scene detection map and a feature map of the scene standard map, respectively; wherein the sub-networks of the twin network have the same architecture, and the scene detection map and the scene standard map have the same scene identification;

fusing the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map; wherein the common feature map has all features of the scene detection map and the scene standard map;

computing a plurality of hole convolutions on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames; wherein the hole convolution has a two-layer structure, the first layer being used to calculate the center point of a target frame and the second layer being used to calculate the width and height of the target frame;

and screening out, through a fully connected network, the sub-feature maps containing abnormal articles from the sub-feature maps of the multiple types of articles in the scene, calculating the coordinates of the sub-feature maps containing abnormal articles, and enclosing the abnormal articles with the target frames in the scene detection map.

Optionally, the twin network has two sub-networks, and inputting the scene detection map and the scene standard map into the corresponding sub-networks of the twin network to obtain their respective feature maps includes:

inputting the scene detection map into a first sub-network of the twin network to obtain the feature map of the scene detection map, and inputting the scene standard map into a second sub-network of the twin network to obtain the feature map of the scene standard map.

Optionally, the twin network has three sub-networks, and inputting the scene detection map and two scene standard maps into the corresponding sub-networks of the twin network to obtain their respective feature maps includes:

inputting the scene detection map into a first sub-network of the twin network to obtain the feature map of the scene detection map; inputting a first scene standard map into a second sub-network of the twin network to obtain the feature map of the first scene standard map; and inputting a second scene standard map into a third sub-network of the twin network to obtain the feature map of the second scene standard map.

Optionally, the multiple types of articles in the scene include:

in a mall passage scene, humans, doors, fire extinguishers, warning boards and abnormal articles; wherein an abnormal article is any article other than a human, a door, a fire extinguisher or a warning board, and articles that are adjacent in physical space with no separation between them count as one abnormal article.

Optionally, the method further includes collecting data for the multiple types of articles by:

collecting a scene standard map;

placing a new article in the scene, collecting the corresponding scene detection map, and marking the new article as an abnormal article; wherein the scene is a scene other than the mall passage.

Optionally, each sub-network of the twin network uses a backbone network formed by a residual network and a feature pyramid network.

Optionally, fusing the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map includes:

stacking the feature map of the scene detection map and the feature map of the scene standard map along the channel dimension;

convolving the stacked feature map with convolution kernels to obtain the common feature map; wherein the kernel size is 3x3, the stride is 1, and the number of kernels is half the number of channels of the stacked feature map.

Optionally, computing the plurality of hole convolutions on the common feature map to obtain the sub-feature maps of the multiple types of articles in the scene enclosed by the target frame includes:

computing three pre-target frames by applying three hole convolutions, with expansion rates of 0, 1 and 2 respectively, to the common feature map;

stacking the three pre-target frames and compressing them to obtain the target frame;

and obtaining the sub-feature maps of the multiple types of articles in the scene enclosed by the target frame.

Optionally, the method further includes:

adjusting the first-layer structure of the hole convolution by using the coordinates of the sub-feature map containing the abnormal article as a feedback signal.

In a second aspect of the present application, a device for detecting abnormal articles in a scene is provided, the device comprising:

a twin network feature extraction unit, a twin network feature fusion unit, a hole convolution calculation unit and an abnormal article detection unit;

the twin network feature extraction unit is used for inputting a scene detection map and a scene standard map into the corresponding sub-networks of a twin network to obtain a feature map of the scene detection map and a feature map of the scene standard map, respectively; wherein the sub-networks of the twin network have the same architecture, and the scene detection map and the scene standard map have the same scene identification;

the twin network feature fusion unit is used for fusing the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map; wherein the common feature map has all features of the scene detection map and the scene standard map;

the hole convolution calculation unit is used for computing a plurality of hole convolutions on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames; wherein the hole convolution has a two-layer structure, the first layer being used to calculate the center point of a target frame and the second layer being used to calculate the width and height of the target frame;

the abnormal article detection unit is used for screening out, through a fully connected network, the sub-feature maps containing abnormal articles from the sub-feature maps of the multiple types of articles in the scene, calculating the coordinates of the sub-feature maps containing abnormal articles, and enclosing the abnormal articles with the target frames in the scene detection map.

Compared with the prior art, the technical solution of the application has the following advantages:

In the technical solution provided by the application, a scene detection map and a scene standard map are input into the corresponding sub-networks of a twin network to obtain their respective feature maps. The two feature maps are fused to obtain a common feature map. A plurality of hole convolutions are then computed on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames. Finally, the sub-feature maps containing abnormal articles are screened out through a fully connected network, their coordinates are calculated, and the abnormal articles are enclosed with the target frames in the scene detection map. Because the scene detection map and the scene standard map are input into the corresponding sub-networks of the twin network, their feature maps are obtained through shared computation, and fusing them yields a common feature map that combines the features of both. The target frames obtained through hole convolution are sparse and representative. Screening the sub-feature maps containing abnormal articles through the fully connected network, calculating their coordinates, mapping the coordinates onto the scene detection map and enclosing the abnormal articles with the target frames gives the positions of the abnormal articles in the detection picture.
In this way, abnormal articles that should not appear in the scene are detected, which greatly reduces manual labor and improves people's quality of life.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for detecting an abnormal article in a scene provided by the present application;

Fig. 2 is a flowchart of another method for detecting an abnormal article in a scene provided by the present application;

Fig. 3 is a schematic structural diagram of a device for detecting an abnormal article in a scene provided by the present application.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, fig. 1 is a flowchart of a method for detecting an abnormal object in a scene provided by the present application, where the method may include the following steps 101-104.

Step 101: inputting a scene detection map and a scene standard map into the corresponding sub-networks of a twin network to obtain a feature map of the scene detection map and a feature map of the scene standard map, respectively; wherein the sub-networks of the twin network have the same architecture, and the scene detection map and the scene standard map have the same scene identification.

The scene detection map and the scene standard map have the same scene identification, which confirms that the two pictures show the same scene. The scene standard map is a picture of the scene in its normal state, that is, it contains no abnormal article that should not appear in the scene. For example, if the scene is a mall passage, the scene standard map may contain humans, doors, fire extinguishers and warning boards; anything other than these belongs to the abnormal articles. The scene detection map is the picture of the scene that is to be checked for abnormal articles.

An abnormal article is an article that should not appear in the scene standard map; it is a complex, irregular structure that differs from the background in the scene standard map.

It should be noted that the twin network has a plurality of sub-networks, each with the same architecture. For example, a residual network and a feature pyramid network form the backbone network and are connected in the classical way: the four stages of the residual network produce 4 levels of output, and these 4 levels pass through upsampling, 3x3 convolution, 1x1 convolution and pooling to produce the 5 levels of output of the classical feature pyramid network.
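As a rough illustration of how 4 residual stages can yield 5 pyramid levels, the following sketch computes the spatial size of each level. The 1024-pixel input, the stage strides of 4, 8, 16 and 32, and the stride-2 pooling that produces the fifth level are assumptions borrowed from common residual network and feature pyramid configurations, not values stated in this application; `pyramid_sizes` is a hypothetical helper.

```python
def pyramid_sizes(input_size, stage_strides=(4, 8, 16, 32), channels=256):
    """Return (height, width, channels) for each feature pyramid level.

    Assumption: each residual stage downsamples the square input by the
    given stride, and one extra level is obtained by pooling the coarsest
    level with stride 2.
    """
    sizes = [(input_size // s, input_size // s, channels) for s in stage_strides]
    coarsest_h, coarsest_w, _ = sizes[-1]
    sizes.append((coarsest_h // 2, coarsest_w // 2, channels))
    return sizes


levels = pyramid_sizes(1024)
for level in levels:
    print(level)
```

Under these assumptions the five levels shrink from 256 x 256 to 16 x 16, each with 256 channels.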

In a possible embodiment, the twin network has three sub-networks. A scene detection map is input into the first sub-network of the twin network to obtain the feature map of the scene detection map; a first scene standard map is input into the second sub-network to obtain the feature map of the first scene standard map; and a second scene standard map is input into the third sub-network to obtain the feature map of the second scene standard map.
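The idea of sub-networks with the same architecture can be sketched in a few lines of plain Python. This is a toy stand-in, not the backbone described above: a single 3x3 convolution with ReLU plays the role of a sub-network, and the assumption that all sub-networks also share one set of weights is ours. `TwinNetwork` and `conv3x3_relu` are hypothetical names.

```python
def conv3x3_relu(image, kernel):
    """Valid 3x3 convolution followed by ReLU on a 2-D list of numbers."""
    height, width = len(image), len(image[0])
    out = []
    for i in range(height - 2):
        row = []
        for j in range(width - 2):
            acc = sum(
                (image[i + di][j + dj] * kernel[di][dj]
                 for di in range(3) for dj in range(3)),
                0.0,
            )
            row.append(max(acc, 0.0))
        out.append(row)
    return out


class TwinNetwork:
    """Applies the same sub-network (same weights) to every input picture."""

    def __init__(self, kernel):
        self.kernel = kernel

    def forward(self, *pictures):
        return [conv3x3_relu(p, self.kernel) for p in pictures]


standard = [[0.0] * 5 for _ in range(5)]       # empty scene
detection = [row[:] for row in standard]
detection[2][2] = 1.0                          # one new article in the scene

net = TwinNetwork(kernel=[[1.0] * 3 for _ in range(3)])
feat_detection, feat_standard = net.forward(detection, standard)
```

A three-sub-network variant would simply call `net.forward(detection, standard_1, standard_2)`; the features differ only where the pictures differ.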

Step 102: fusing the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map; wherein the common feature map has all features of the scene detection map and the scene standard map.

The feature map of the scene detection map extracted by a sub-network has the features of the scene detection map, and the feature map of the scene standard map extracted by a sub-network has the features of the scene standard map. For example, if the scene standard map contains a human and a door, its feature map has the features of these two types of articles. If the scene detection map additionally contains a table, its feature map has the features of the table in addition to those of the human and the door. After the two feature maps are fused, the resulting common feature map has the features of the human, the door and the table, and the table can be recognized as an abnormal article relative to the scene standard map.

Step 103: computing a plurality of hole convolutions on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames; wherein the hole convolution has a two-layer structure, the first layer being used to calculate the center point of a target frame and the second layer being used to calculate the width and height of the target frame.

Traditional methods often set target frames with a sliding window. Such manually set target frames carry subjective human experience and are tiled densely over the whole picture, which makes the subsequent computation extremely expensive. The embodiment of the application uses hole convolution instead. Hole convolution enlarges the receptive field without the information loss caused by pooling, so that each convolution output covers a larger range of information; it therefore handles well the cases in which an image requires global information or depends on long-range information. After the hole convolutions are computed, the neural network can learn the position and size of the target frames by itself. Computing several hole convolutions separately yields target frames with different receptive fields, which makes the target frames general and representative; because the target frames are of high quality and few in number, subsequent computation is more accurate and faster.

When predicting a target frame, two branches can be set up. One branch applies hole convolution to the detection picture, with the position of the target-frame center point as the supervisory signal, and outputs a feature map of depth 1 that predicts whether each point is the center of a target frame. The other branch applies hole convolution to the detection picture, with the height and width offsets of the target frame as the supervisory signal, and outputs a feature map of depth 2 that predicts the height and width offsets of the target frame. The two branches thus give the center point and the width and height of each target frame, which determines its position on the picture, so that the sub-feature maps of the multiple types of articles enclosed by the target frames can be shown on the picture. The target frames obtained in this way are sparse and of high quality. The multiple types of articles may be, for example, humans, doors and tables in the picture.
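The two-branch output described above can be turned into target frames as sketched below. The sketch assumes that the depth-1 map holds a center score per point and that the depth-2 map holds a (width, height) pair per point; the 0.5 score threshold and the name `decode_target_frames` are illustrative assumptions, not part of this application.

```python
def decode_target_frames(center_map, size_map, threshold=0.5):
    """Turn a center map (depth 1) and a size map (depth 2) into frames.

    center_map[y][x] is the predicted score that (x, y) is a frame center;
    size_map[y][x] is the predicted (width, height) at that point.
    Returns a list of (x_min, y_min, x_max, y_max) tuples.
    """
    frames = []
    for y, row in enumerate(center_map):
        for x, score in enumerate(row):
            if score >= threshold:
                width, height = size_map[y][x]
                frames.append((x - width / 2, y - height / 2,
                               x + width / 2, y + height / 2))
    return frames


center_map = [[0.0] * 4 for _ in range(4)]
center_map[1][2] = 0.9                       # one confident center at x=2, y=1
size_map = [[(2, 2)] * 4 for _ in range(4)]  # every point predicts a 2x2 frame

frames = decode_target_frames(center_map, size_map)
```

Only points whose center score passes the threshold produce a frame, which is why the resulting set of frames is sparse compared with a sliding window.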

In addition, since the feature size of the original feature map may differ from that of the calculated feature map, the feature map may be transformed once by a 1 × 1 self-deformation convolution.

Step 104: screening out, through a fully connected network, the sub-feature maps containing abnormal articles from the sub-feature maps of the multiple types of articles in the scene, calculating the coordinates of the sub-feature maps containing abnormal articles, and enclosing the abnormal articles with the target frames in the scene detection map.

The sub-feature map containing an abnormal article can be found among the obtained sub-feature maps through a fully connected network. For example, suppose three pictures are obtained in step 103: the target frames in picture 1 enclose a human and a door, the target frames in picture 2 enclose a door and a table, and the target frame in picture 3 encloses a door. The table is an abnormal article, so picture 2 is screened out through the fully connected network. The coordinates, in the scene detection picture, of the sub-feature map containing the table in picture 2 are then calculated, and the abnormal article is enclosed with the target frame in the scene detection picture.
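The three-picture example can be mimicked with a simple rule-based filter. In the application the screening is learned by a fully connected network; the set difference below is only a hand-written stand-in that reproduces the outcome of the example, and `NORMAL_ITEMS` and `screen_abnormal` are hypothetical names.

```python
NORMAL_ITEMS = {"human", "door", "fire extinguisher", "warning board"}


def screen_abnormal(sub_maps):
    """Keep only the pictures whose enclosed items include a non-normal item.

    sub_maps maps a picture name to the list of item names enclosed by its
    target frames. Returns a dict from picture name to its abnormal items.
    """
    flagged = {}
    for name, items in sub_maps.items():
        abnormal = sorted(set(items) - NORMAL_ITEMS)
        if abnormal:
            flagged[name] = abnormal
    return flagged


sub_maps = {
    "picture 1": ["human", "door"],
    "picture 2": ["door", "table"],
    "picture 3": ["door"],
}
flagged = screen_abnormal(sub_maps)
print(flagged)
```

Only picture 2 is kept, because the table is the only item outside the normal categories.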

In the embodiment provided by the application, a scene detection map and a scene standard map are input into the corresponding sub-networks of a twin network to obtain their respective feature maps; the sub-networks of the twin network have the same architecture, and the two maps have the same scene identification. The two feature maps are then fused to obtain a common feature map that has all features of the scene detection map and the scene standard map. Next, a plurality of hole convolutions are computed on the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by target frames; the hole convolution has a two-layer structure, in which the first layer calculates the center point of a target frame and the second layer calculates its width and height. Finally, the sub-feature maps containing abnormal articles are screened out through a fully connected network, their coordinates are calculated, and the abnormal articles are enclosed with the target frames in the scene detection map.
Because the scene detection map and the scene standard map are input into the corresponding sub-networks of the twin network, their feature maps are obtained through shared computation, and fusing them yields a common feature map that combines the features of both. The target frames obtained through hole convolution are sparse and representative. Screening the sub-feature maps containing abnormal articles through the fully connected network, calculating their coordinates, mapping the coordinates onto the scene detection map and enclosing the abnormal articles with the target frames gives the positions of the abnormal articles in the detection picture. In this way, abnormal articles that should not appear in the scene are detected, which greatly reduces manual labor and improves people's quality of life.

In order to make the technical solution provided by the embodiment of the present invention clearer, the following describes the method for detecting abnormal articles provided by the embodiment of the present invention with an embodiment of a mall passageway in combination with fig. 2.

Suppose sundries in a mall passage scene need to be detected. A model must be trained before detection. Because the amount of data is small, various monotonous backgrounds (such as a desktop, a lawn, floors of various colors, a roof or a road) can be used as generalized scenes. Specifically, standard maps of the generalized scenes are collected; a new article is then placed in a generalized scene, for example a football on a lawn, the corresponding scene detection map is collected, and the new article (the football) is marked as an abnormal article. In this way more training data can be collected and the model becomes more robust. Observation shows that humans, doors, fire extinguishers and warning boards are always common categories in a mall passage scene. These categories are marked as non-sundry categories so that they can be distinguished from sundries, and they also need to be labeled and trained; counting the sundries category, there are 5 labels in total. It should be noted that, during labeling, articles that are adjacent in physical space with no separation between them should be labeled as a single sundry. For example, a lamp placed on a table should be labeled as one sundry, not two.
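The labeling rule for adjacent articles can be illustrated with axis-aligned boxes. The sketch assumes that "adjacent with no separation" can be approximated by two 2-D boxes touching or overlapping, which is a simplification of adjacency in physical space; `touches` and `merge_adjacent` are hypothetical helpers.

```python
def touches(a, b):
    """True if boxes a and b (x_min, y_min, x_max, y_max) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_adjacent(boxes):
    """Merge every group of touching boxes into one enclosing sundry box."""
    merged = []
    for box in boxes:
        box = tuple(box)
        changed = True
        while changed:
            changed = False
            for other in merged:
                if touches(box, other):
                    # Absorb the touching box and re-check against the rest.
                    merged.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    changed = True
                    break
        merged.append(box)
    return merged


table = (0, 0, 4, 2)
lamp = (1, 2, 2, 4)          # stands on the table, so the boxes touch
football = (10, 10, 11, 11)  # far away from both

labels = merge_adjacent([table, lamp, football])
```

The lamp and the table collapse into a single sundry label, while the football keeps its own label.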

Step 201: inputting the mall passage scene detection map and the mall passage scene standard map into the corresponding sub-networks of the twin network to obtain 5 feature maps of the scene detection map and 5 feature maps of the scene standard map, respectively.

In an embodiment of the present application, the twin network has two sub-networks. The mall passage scene detection map is input into the first sub-network of the twin network to obtain 5 feature maps of the mall passage scene detection map, and the mall passage scene standard map is input into the second sub-network of the twin network to obtain 5 feature maps of the mall passage scene standard map.

Step 202: fusing the 5 feature maps of the mall passage scene detection map with the 5 feature maps of the mall passage scene standard map to obtain a common feature map.

Specifically, the fusion proceeds as follows. The mall passage scene detection map and the mall passage scene standard map each have 5 feature maps with 256 channels, whose dimensions are (256, 256, 256), (128, 128, 256), (64, 64, 256), (32, 32, 256) and (16, 16, 256). Feature maps of the same size are stacked along the channel dimension, giving stacked feature maps with 512 channels. Each stacked feature map is then convolved with 256 convolution kernels of size 3x3 and stride 1, so that the resulting common feature map has 256 channels.
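The stack-then-convolve fusion can be sketched at toy scale. The sketch uses 2 channels per feature map instead of 256, so the stacked map has 4 channels and 2 kernels are used (half the stacked channel count). Zero padding of 1, which keeps the spatial size, and the hand-picked kernel weights are assumptions; in the application the kernel weights are learned.

```python
def stack_channels(a, b):
    """Stack two channel-first feature maps [C][H][W] along the channels."""
    return a + b


def conv3x3_same(fmap, kernels):
    """3x3 convolution, stride 1, zero padding 1 (padding is an assumption).

    fmap is [C][H][W]; kernels is [K][C][3][3]; the result is [K][H][W].
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = []
    for kern in kernels:
        plane = [[0.0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                acc = 0.0
                for c in range(channels):
                    for ky in range(3):
                        for kx in range(3):
                            yy, xx = y + ky - 1, x + kx - 1
                            if 0 <= yy < height and 0 <= xx < width:
                                acc += fmap[c][yy][xx] * kern[c][ky][kx]
                plane[y][x] = acc
        out.append(plane)
    return out


def center_kernel(weights):
    """Build one kernel whose only non-zero taps are the per-channel centers."""
    return [[[0.0, 0.0, 0.0], [0.0, w, 0.0], [0.0, 0.0, 0.0]] for w in weights]


detection_feat = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]  # 2 channels
standard_feat = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]   # 2 channels

stacked = stack_channels(detection_feat, standard_feat)             # 4 channels
kernels = [center_kernel([1.0, 1.0, -1.0, -1.0]),                   # difference
           center_kernel([0.5, 0.5, 0.5, 0.5])]                     # average
common = conv3x3_same(stacked, kernels)                             # 2 channels
```

Halving the channel count brings the fused map back to the channel width of a single input, which mirrors the 512-to-256 reduction in the text.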

Step 203: applying hole convolutions with expansion rates of 0, 1 and 2 to the common feature map to obtain sub-feature maps of the multiple types of articles in the scene enclosed by the target frames.

In the embodiment of the application, three hole convolutions with expansion rates of 0, 1 and 2 are constructed and computed in parallel. For a feature map of height H and width W, each of the 3 hole-convolution paths generates an H x W x 1 feature map that predicts whether each point is the center of a target frame, and an H x W x 2 feature map that predicts the height and width offsets of each target frame. The feature maps with the same function are stacked along the channel dimension, and the channels are finally merged by a 1x1 convolution.
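The three parallel paths and the final 1x1 merge can be sketched on a single-channel map. The sketch assumes that an expansion rate r means r holes between neighbouring kernel taps, so rate 0 is an ordinary 3x3 convolution; many deep learning libraries instead use a dilation parameter equal to r + 1. The all-ones kernel and the equal 1x1 weights are placeholders for learned values.

```python
def hole_conv3x3(fmap, kernel, rate):
    """3x3 hole convolution with zero padding on a 2-D map.

    Assumption: 'rate' is the number of holes between kernel taps, so the
    taps are spaced rate + 1 points apart.
    """
    step = rate + 1
    height, width = len(fmap), len(fmap[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = y + (ky - 1) * step
                    xx = x + (kx - 1) * step
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += fmap[yy][xx] * kernel[ky][kx]
            out[y][x] = acc
    return out


def receptive_field(rate, kernel_size=3):
    """Side length covered by one hole convolution under the same assumption."""
    return kernel_size + (kernel_size - 1) * rate


def merge_1x1(branches, weights):
    """1x1 convolution over stacked branches: a weighted sum at each point."""
    height, width = len(branches[0]), len(branches[0][0])
    return [[sum(w * b[y][x] for w, b in zip(weights, branches))
             for x in range(width)] for y in range(height)]


fmap = [[0.0] * 7 for _ in range(7)]
fmap[3][3] = 1.0                                  # single active point
ones = [[1.0] * 3 for _ in range(3)]

branches = [hole_conv3x3(fmap, ones, rate) for rate in (0, 1, 2)]
merged = merge_1x1(branches, weights=(1 / 3, 1 / 3, 1 / 3))
fields = [receptive_field(rate) for rate in (0, 1, 2)]
```

With these assumptions the three paths see 3x3, 5x5 and 7x7 neighbourhoods at the same parameter cost, which is what gives the target frames different receptive fields.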

Step 204: screening out, through a fully connected network, the sub-feature map containing an abnormal article from the sub-feature maps of the 5 types of articles in the mall passage scene, calculating the coordinates of that sub-feature map, and enclosing the abnormal article with the target frame in the scene detection map.

Specifically, in the embodiment of the present application, 5 feature maps are obtained in step 202 and a series of target frames are generated in step 203. Each target frame can be mapped to a feature vector in the mall passage scene detection picture. The feature vectors pass through two fully connected layers, which give the width, height and center-point position of the target frame for the corresponding article type; classification then gives the probability of each article type, for example 10 percent for a human, 10 percent for a door, 30 percent for a fire extinguisher, 10 percent for a warning board and 40 percent for a trash can. The trash can is therefore taken to be an abnormal article in the mall passage scene. The position of the target frame enclosing the trash can is calculated, for example (0,0), and the trash can is enclosed by the target frame in the mall passage scene detection picture. This shows that, relative to the mall passage scene standard picture, the detection picture contains an additional abnormal article, the trash can, located at the lower left corner of the mall passage scene detection picture.
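The classification step in this example amounts to picking the most probable of the 5 labels. The sketch below uses the probabilities from the text and assumes that the trash can falls under the fifth label, the abnormal article (sundries) category; `CLASSES` and `classify_frame` are hypothetical names, and the two fully connected layers that would produce the probabilities are not modeled.

```python
CLASSES = ["human", "door", "fire extinguisher", "warning board",
           "abnormal article"]


def classify_frame(probabilities):
    """Return the most probable label and whether it is the abnormal label."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    label = CLASSES[best]
    return label, label == "abnormal article"


# Probabilities from the example: human, door, fire extinguisher,
# warning board, trash can (treated here as the abnormal article label).
label, is_abnormal = classify_frame([0.10, 0.10, 0.30, 0.10, 0.40])
print(label, is_abnormal)
```

A target frame is enclosed in the detection picture only when its most probable label is the abnormal one.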

The coordinates of the sub-feature map containing the abnormal article are used as a feedback signal to adjust the first-layer structure of the hole convolution.

The embodiment of the present invention provides a method for detecting an abnormal article in a scene, and also provides a device for detecting an abnormal article, as shown in fig. 3, including:

a twin network feature extraction unit 310, a twin network feature fusion unit 320, a hole convolution calculation unit 330 and an abnormal article detection unit 340;

the twin network feature extraction unit 310 may be configured to input the scene detection map and the scene standard map into corresponding sub-networks of the twin network, and obtain a feature map of the scene detection map and a feature map of the scene standard map, respectively; wherein the subnetworks in the twin network are of the same architecture; the scene detection graph and the scene standard graph have the same scene identification.

The twin network feature fusion unit 320 may be configured to fuse the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map; wherein the common feature map has all features of the scene detection map and the scene standard map.

The hole convolution calculating unit 330 may be configured to calculate, by using a plurality of hole convolutions and the common feature map, sub-feature maps of multiple types of objects in a scene enclosed by the target frame; the hole convolution has a two-layer structure, and the first layer of hole convolution is used for calculating to obtain the central point of the target frame; and the second layer of hole convolution is used for calculating and obtaining the width and height values of the target frame.

The abnormal object detection unit 340 may be configured to screen out, through a fully connected network, the sub-feature map containing an abnormal object from the sub-feature maps of the multiple classes of objects in the scene, calculate the coordinates of that sub-feature map, and circle the abnormal object with the target frame in the scene detection map.
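Circling the abnormal object requires turning the center point, width, and height of the target frame into the corner coordinates of a rectangle. The following is a minimal sketch of that conversion, assuming pixel coordinates and clipping to the picture bounds; the `frame_corners` helper and the numbers in the example are hypothetical and do not come from the embodiment.

```python
def frame_corners(center_x, center_y, width, height, image_width, image_height):
    """Convert a center/width/height target frame to clipped corner coordinates.

    Returns (x_min, y_min, x_max, y_max), clipped to the picture bounds.
    """
    x_min = max(0.0, center_x - width / 2)
    y_min = max(0.0, center_y - height / 2)
    x_max = min(float(image_width), center_x + width / 2)
    y_max = min(float(image_height), center_y + height / 2)
    return x_min, y_min, x_max, y_max


# A frame near the edge of a 640x480 picture; part of it falls outside.
corners = frame_corners(50, 40, 60, 100, 640, 480)
print(corners)  # (20.0, 0.0, 80.0, 90.0)
```

The resulting corners can be handed to any drawing routine to outline the abnormal object in the scene detection map.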

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, and the units and modules described as separate components may or may not be physically separate. In addition, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

The foregoing is directed to embodiments of the present invention, and it is understood that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention.

Claims

1. A method for detecting an anomalous object in a scene, the method comprising:

calculating with a plurality of hole convolutions and the common feature map, respectively, to obtain sub-feature maps of the objects in the scene enclosed by the target frame; the hole convolution has a two-layer structure, and the first layer of hole convolution is used for calculating to obtain the central point of the target frame; the second layer of hole convolution is used for calculating and obtaining the width and height values of the target frame;

2. The method of claim 1, wherein the twin network has two sub-networks, and the inputting the scene detection map and the scene standard map into the corresponding sub-networks of the twin network respectively obtains the feature map of the scene detection map and the feature map of the scene standard map respectively comprises:

3. The method according to claim 1, wherein the twin network has three sub-networks, and the inputting the scene detection map and the two scene standard maps into the corresponding sub-networks of the twin network respectively to obtain the feature map of the scene detection map and the feature maps of the two scene standard maps respectively comprises:

4. The method of claim 1, wherein the plurality of classes of items in the scene comprise:

in a market passage scene, the multiple types of articles comprise people, doors, fire extinguishers, warning boards and abnormal articles; wherein the abnormal articles are articles except people, doors, fire extinguishers and warning boards which are adjacent in a physical space and are not separated.

5. The method according to claim 4, wherein the method for collecting the plurality of categories of objects further comprises:

collecting a scene standard graph;

6. The method of claim 1, wherein the sub-networks in the twin network comprise a backbone network consisting of a residual network and a feature pyramid network.

7. The method of claim 1, wherein the fusing the feature map of the scene detection map and the feature map of the scene standard map to obtain a common feature map comprises:

performing a calculation on the stacked feature map with a convolution kernel to obtain the common feature map; the size of the convolution kernel is 3x3, the stride is 1, and the number of convolution kernels is half of the number of channels.

8. The method according to claim 1, wherein the calculating the plurality of hole convolutions and the common feature map to obtain the sub-feature maps of the plurality of classes of objects in the scene enclosed by the target frame comprises:

stacking the three pre-target frames, and compressing to obtain a target frame;

9. The method according to any one of claims 1-8, further comprising:

10. An apparatus for detecting anomalous objects in a scene, said apparatus comprising:

the hole convolution calculating unit is used for calculating with the common feature map by adopting a plurality of hole convolutions to obtain sub-feature maps of various objects in a scene enclosed by the target frame; the hole convolution has a two-layer structure, and the first layer of hole convolution is used for calculating to obtain the central point of the target frame; the second layer of hole convolution is used for calculating and obtaining the width and height values of the target frame;