WO2022113453A1

WO2022113453A1 - Abnormality detection device and abnormality detection method

Info

Publication number: WO2022113453A1
Application number: PCT/JP2021/031589
Authority: WO
Inventors: 全孔; 智明吉永
Original assignee: 株式会社日立製作所
Priority date: 2020-11-30
Filing date: 2021-08-27
Publication date: 2022-06-02
Also published as: JP2022086627A

Abstract

This abnormality detection system comprises: a graph data generation unit for detecting a plurality of elements at a prescribed monitoring target place and generating graph data indicating attributes of the respective elements and a relationship among the elements, on the basis of a video or an image acquired by photographing the monitoring target place; a spatial-temporal feature amount calculation unit for calculating the spatial-temporal feature amount of the graph data generated by the graph data generation unit; and an abnormality detection unit that detects an abnormality at the monitoring target place on the basis of the spatial-temporal feature amount calculated by the spatial-temporal feature amount calculation unit.

Description

Anomaly detection device, anomaly detection method

The present invention relates to an apparatus and a method for monitoring a monitored location and detecting an abnormality.

Conventionally, a person who behaves suspiciously is discovered based on images and images taken by a surveillance camera installed in a surveillance target place such as a public facility where an unspecified number of people enter and exit, and that person is regarded as a suspicious person. By handling it, a security system is used to prevent damage caused by threats such as crime and terrorist acts. In such a security system, it is necessary to detect a suspicious person from a large number of people reflected in the images and images taken by the surveillance camera, and in the past, the guards actually saw the images and images to make a suspicious decision. Was discovering the person. Judgment of such a suspicious person is difficult and highly depends on the professional skills of the guards. In addition, since it is necessary to constantly watch images and images during monitoring, the burden on the guards is heavy.

In recent years, in developed countries such as Japan, the shortage of security guards has become serious due to the decrease in the working population, and it has become difficult to secure security guards with specialized skills as described above. Further, in the above-mentioned security system, it is necessary to have a security guard who monitors the image permanently stationed, and there is also a problem that the cost for that is high. Therefore, in order to solve these problems, there is a demand for a technique for automatically detecting a suspicious person from the images and images of a surveillance camera.

For example, the technique of Patent Document 1 is known as a technique for automatically finding a suspicious person from an image of a surveillance camera. In Patent Document 1, in an abnormal behavior detecting device for detecting an abnormal behavior of a behavioral body reflected in a time-series image, the pattern or characteristic of the abnormal behavior is not defined in advance, and the behavior different from the normal defined behavior is taken. A technique for extracting a behavioral body as an abnormal body candidate and determining whether or not the abnormal body candidate is an abnormal body is disclosed.

Japanese Patent No. 6692086

In the technique of Patent Document 1, whether or not the behavior is abnormal is determined by paying attention only to the behavior performed by the behavioral body at a certain point in time. However, in the actual action that a suspicious person can take, the action must be interpreted in consideration of the relationship with other elements shown in the same image and the relationship between the actions performed at each time. , It is difficult to correctly judge whether or not the behavior is abnormal. Therefore, the technique of Patent Document 1 may not be able to reliably determine a suspicious person as an abnormal body.

Therefore, an object of the present invention is to provide a technique for accurately detecting suspicious behavior or abnormal behavior from images or images of various people or objects and detecting the abnormality.

The abnormality detection system according to the present invention detects a plurality of elements in the monitored place based on an image or an image obtained by photographing a predetermined monitored place, and the attributes of each element and the relationship between the elements. The graph data generation unit that generates graph data representing the sex, the spatiotemporal feature amount calculation unit that calculates the spatiotemporal feature amount of the graph data generated by the graph data generation unit, and the spatiotemporal feature amount calculation unit. An abnormality detecting unit for detecting an abnormality in the monitored place based on the calculated spatiotemporal feature amount is provided.

According to the present invention, it is possible to accurately detect suspicious behavior or abnormal behavior from images or images of various people or objects and detect the abnormality.

It is a block diagram which shows the structure of the abnormality detection system which concerns on 1st Embodiment of this invention. It is a block diagram which shows the structure of the graph data generation part. It is a figure which shows the outline of the processing performed by the graph data generation part. It is a figure which shows the data structure example of a graph database. It is a figure which shows the data structure example of a node database. It is a figure which shows the data structure example of an edge database. It is explanatory drawing of the graph data visualization editorial department. It is a block diagram which shows the structure of a node feature amount extraction part. It is a figure which shows the outline of the processing performed by a node feature amount extraction unit. It is a block diagram which shows the structure of the edge feature amount extraction part. It is a block diagram which shows the structure of the spatiotemporal feature amount calculation part. It is a figure which shows an example of the mathematical expression which represents the arithmetic processing in the space-time feature amount calculation part. It is a figure which shows the outline of the processing performed by the spatiotemporal feature amount calculation unit. It is a block diagram which shows the structure of an abnormality detection part. It is a figure which shows the outline of the processing performed by an abnormality detection unit. It is a block diagram which shows the structure of the judgment basis presentation part. It is a figure which shows the outline of the processing performed by the basis confirmation target selection unit and the subgraph extraction processing unit. It is a figure which shows the example of the abnormality detection screen displayed by the judgment basis presentation part.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In order to clarify the explanation, the following description and drawings are omitted or simplified as appropriate. The present invention is not limited to this embodiment, and any application example consistent with the idea of the present invention is included in the technical scope of the present invention. Unless otherwise specified, each component may be plural or singular.

In the following description, for example, various information may be described by the expression of "xxx table", but various information may be expressed by a data structure other than the table. The "xxx table" may be referred to as "xxx information" in order to show that various types of information do not depend on the data structure.

Further, in the following description, a reference code (or a common part in the reference code) is used when the same type of element is not distinguished, and when the same type of element is described separately, the element ID (or the element ID) is used. Element reference code) may be used.

In the following description, a process may be described with a "program" or its process as the subject, but the program is executed by a processor (for example, a CPU (Central Processing Unit)) to perform a defined process. The subject of the process may be a processor because it is performed while appropriately using a storage resource (for example, a memory) and / or a communication interface device (for example, a communication port). The processor operates as a functional unit that realizes a predetermined function by operating according to a program. A device and system including a processor is a device and system including these functional parts.

[First Embodiment]
Hereinafter, the first embodiment of the present invention will be described.

FIG. 1 is a block diagram showing a configuration of an abnormality detection system according to the first embodiment of the present invention. The abnormality detection system 1 of the present embodiment is a system that detects a threat or a sign thereof generated in the monitored place as an abnormality based on an image or an image obtained by photographing a predetermined monitored place with a surveillance camera. .. The video or image used in the abnormality detection system 1 is a video or a moving image taken by a surveillance camera at a predetermined frame rate, and each is composed of a combination of a plurality of images acquired in time series. .. In the following, the images and images handled by the abnormality detection system 1 will be collectively referred to as “images” and described.

As shown in FIG. 1, the abnormality detection system 1 includes a camera moving image input unit 10, a graph data generation unit 20, a graph database 30, a graph data visualization editing unit 60, a node feature amount extraction unit 70, and an edge feature amount extraction unit 80. , Node feature amount storage unit 90, edge feature amount storage unit 100, spatiotemporal feature amount calculation unit 110, node feature amount acquisition unit 120, abnormality detection unit 130, threat sign degree storage unit 140, judgment basis presentation unit 150, and elements. It is configured to include a contribution storage unit 160. In the abnormality detection system 1, the camera moving image input unit 10, the graph data generation unit 20, the graph data visualization editing unit 60, the node feature amount extraction unit 70, the edge feature amount extraction unit 80, the spatiotemporal feature amount calculation unit 110, and the node feature. Each functional block of the quantity acquisition unit 120, the abnormality detection unit 130, and the judgment basis presentation unit 150 is realized by, for example, executing a predetermined program by a computer, and is realized by, for example, a graph database 30, a node feature amount storage unit 90, and an edge feature amount storage. The unit 100, the threat sign storage unit 140, and the element contribution storage unit 160 are realized by using a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). In addition, a part or all of these functional blocks may be realized by using GPU (Graphics Processing Unit) or FPGA (Field Programmable Gate Array).

The camera moving image input unit 10 acquires video (moving image) data taken by a surveillance camera (not shown) and inputs it to the graph data generation unit 20.

The graph data generation unit 20 extracts one or more elements to be monitored from various subjects reflected in the image based on the image data input from the camera moving image input unit 10, and the attributes and elements for each element. Generate graph data showing the relationship between them. Here, the element to be monitored extracted by the graph data generation unit 20 is a movement or movement at a monitoring target place where the surveillance camera is installed among various people and objects reflected in the image captured by the surveillance camera. A person or object that is stationary. However, it is preferable to exclude objects that are permanently installed in the monitored area and buildings in which the monitored area exists from the elements to be monitored.

The graph data generation unit 20 sets a plurality of time ranges for the video by dividing the time-series video data into predetermined time intervals Δt, and generates graph data for each time range. Then, each generated graph data is recorded in the graph database 30 and output to the graph data visualization editing unit 60. The details of the graph data generation unit 20 will be described later with reference to FIGS. 2 and 3.

The graph database 30 stores the graph data generated by the graph data generation unit 20. The graph database 30 has a node database 40 and an edge database 50. The node database 40 stores node data representing the attributes of each element in the graph data, and the edge database 50 stores edge data representing the relationships between the elements in the graph data. The details of the graph database 30, the node database 40, and the edge database 50 will be described later with reference to FIGS. 4, 5, and 6.

The graph data visualization editing unit 60 visualizes the graph data generated by the graph data generation unit 20 and presents it to the user, and accepts the user to edit the graph data. The edited graph data is stored in the graph database 30. The details of the graph data visualization editing unit 60 will be described later with reference to FIG. 7.

The node feature amount extraction unit 70 extracts the node feature amount of each graph data based on the node data stored in the node database 40. The node feature amount extracted by the node feature amount extraction unit 70 is a numerical value of the features possessed by the attributes of each element in each graph data, and is extracted for each node constituting each graph data. The node feature amount extraction unit 70 stores the extracted node feature amount information in the node feature amount storage unit 90, and stores the weight used for calculating the node feature amount in the element contribution storage unit 160. The details of the node feature amount extraction unit 70 will be described later with reference to FIGS. 8 and 9.

The edge feature amount extraction unit 80 extracts the edge feature amount of each graph data based on the edge data stored in the edge database 50. The edge feature amount extracted by the edge feature amount extraction unit 80 is a numerical value of the features having a relationship between the elements in each graph data, and is extracted for each edge constituting each graph data. The edge feature amount extraction unit 80 stores the extracted edge feature amount information in the edge feature amount storage unit 100, and stores the weight used for calculating the edge feature amount in the element contribution storage unit 160. The details of the edge feature amount extraction unit 80 will be described later with reference to FIG.

The spatiotemporal feature amount calculation unit 110 calculates the spatiotemporal feature amount of the graph data based on the node feature amount and the edge feature amount of each graph accumulated in the node feature amount storage unit 90 and the edge feature amount storage unit 100, respectively. do. The spatiotemporal feature amount calculated by the spatiotemporal feature amount calculation unit 110 is the temporal and spatial feature amount of each graph data generated by the graph data generation unit 20 for each predetermined time interval Δt with respect to the time-series video data. It is a numerical value of various features, and is calculated for each node that composes each graph data. The spatiotemporal feature amount calculation unit 110 sets the feature amounts of other nodes adjacent to each node in the spatial direction and the temporal direction with respect to the node feature amount accumulated for each node, and the adjacent node. A folding operation is performed in which weights are applied to the feature amounts of the edges set in between. By repeating such a convolution operation a plurality of times, it is possible to calculate the spatiotemporal feature amount that reflects the potential relationship between the feature amount of each node and the adjacent node. The spatiotemporal feature amount calculation unit 110 updates the node feature amount accumulated in the node feature amount storage unit 90, reflecting the calculated spatiotemporal feature amount. The details of the spatiotemporal feature amount calculation unit 110 will be described later with reference to FIGS. 11, 12, and 13.

The node feature amount acquisition unit 120 acquires the node feature amount stored in the node feature amount storage unit 90 reflecting the spatiotemporal feature amount calculated by the spatiotemporal feature amount calculation unit 110, and causes the abnormality detection unit 130 to acquire the node feature amount. input.

The abnormality detection unit 130 calculates the threat sign degree of each element reflected in the image captured by the surveillance camera based on the node feature amount input from the node feature amount acquisition unit 120. The threat sign degree is a value indicating the degree to which the behavior or characteristic of a person or object corresponding to each element is considered to correspond to a threat such as a crime or terrorist act or a sign thereof. Then, if there is a person or a suspicious object that behaves suspiciously, it is detected based on the calculation result of the threat sign degree of each element. Here, the node feature amount input from the node feature amount acquisition unit 120 reflects the spatiotemporal feature amount calculated by the spatiotemporal feature amount calculation unit 110 as described above. That is, the anomaly detection unit 130 detects an abnormality in the monitoring location where the surveillance camera is installed by calculating the threat sign degree of each element based on the spatiotemporal feature amount calculated by the spatiotemporal feature amount calculation unit 110. It is something to do. The abnormality detection unit 130 stores the calculated threat sign degree and abnormality detection result of each element in the threat sign degree storage unit 140. The details of the abnormality detection unit 130 will be described later with reference to FIGS. 14 and 15.

The determination basis presentation unit 150 stores each graph data stored in the graph database 30, the threat predictive degree for each element of each graph data stored in the threat predictive degree storage unit 140, and the element contribution degree storage unit 160. An abnormality detection screen showing the processing result of the abnormality detection system 1 is presented to the user based on the weighting coefficient at the time of calculating the node feature amount and the edge feature amount. The abnormality detection screen includes information on a person or object detected as a suspicious person or a suspicious object by the abnormality detection unit 130, as well as information indicating the grounds for the abnormality detection unit 130 to make the determination. By looking at the abnormality detection screen presented by the determination basis presentation unit 150, the user detects which person or object is a suspicious person or suspicious object among various people or objects reflected in the image for what reason. You can check if it was done. The details of the determination basis presentation unit 150 will be described later with reference to FIGS. 16, 17, and 18.

Next, the details of each of the above functional blocks will be described below.

FIG. 2 is a block diagram showing the configuration of the graph data generation unit 20. As shown in FIG. 2A, the graph data generation unit 20 includes an entity detection processing unit 21, an in-video co-reference analysis unit 22, and a relationship detection processing unit 23.

The entity detection processing unit 21 performs entity detection processing on the video data input from the camera moving image input unit 10. The entity detection process performed by the entity detection processing unit 21 is a process of detecting a person or an object corresponding to a monitored element from a video and estimating the attribute of each element. As shown in FIG. 2B, the entity detection processing unit 21 includes a person / object detection processing unit 210, a person / object tracking processing unit 211, and a person / object attribute estimation unit 212.

The person / object detection processing unit 210 uses a predetermined algorithm or tool (for example, OpenCV, Faster R-CNN, etc.) for each time range in which the time-series video data is divided by a predetermined time interval Δt, and in the video. Detects people and objects reflected in the image as elements to be monitored. Then, a unique ID is assigned to each detected element as a node ID, a frame surrounding the area in the image of each element is set, and frame information regarding the position and size of the frame is acquired.

The person / object tracking processing unit 211 uses a predetermined object tracking algorithm or tool (for example, Deepsort) based on the frame information of each element acquired by the person / object detection processing unit 210 to display time-series video data. Track each element. Then, the tracking information indicating the result of the tracking process of each element is acquired and associated with the node ID of each element.

The person / object attribute estimation unit 212 estimates the attributes of each element based on the tracking information of each element acquired by the person / object tracking processing unit 211. Here, for example, the entropy of each frame extracted by sampling video data at a predetermined sampling rate (eg: 1 fps) is calculated. The entropy of each frame is calculated by H = plog (1-p), for example, assuming that the reliability of the detection result of each frame is p (p ∈ {0,1}). Then, the attribute estimation of each element is performed using the image information of the person or the object in the frame having the highest calculated entropy value. Attribute estimation is performed, for example, using a pre-learned attribute estimation model, such as the appearance or behavioral characteristics of a person or object, such as gender, age, clothing, maskedness, size, color, stay. Time etc. are estimated. Once the attributes of each element can be estimated, the attribute information is linked to the node ID of each element.

In the entity detection processing unit 21, various persons and objects reflected in the image are detected as elements to be monitored by the processing of each block described above, and the characteristics of each person and each object are acquired as attributes for each element. At the same time, a unique node ID is assigned to each element. Then, tracking information and attribute information of each element are set in association with the node ID. These pieces of information are stored in the node database 40 as node data representing the characteristics of each element.

The in-video co-reference analysis unit 22 performs in-video co-reference analysis on the node data acquired by the entity detection processing unit 21. The in-video co-reference analysis performed by the in-video co-reference analysis unit 22 is a process of correcting the node ID given to each element in the node data by mutually referencing the images of each frame in the video. be. In the entity detection process performed by the entity detection processing unit 21, different node IDs may be erroneously assigned to the same person or object, and the frequency of occurrence varies depending on the performance of the algorithm. The in-video co-reference analysis unit 22 corrects such an error in the node ID by performing the in-video co-reference analysis. As shown in FIG. 2C, the in-video co-reference analysis unit 22 includes a maximum entropy frame sampling processing unit 220, a tracking matching processing unit 221 and a node ID updating unit 222.

The maximum entropy frame sampling processing unit 220 samples the frame having the highest entropy value in the video data, and reads the node data of each element detected in that frame from the node database 40. Then, based on the read node data, the template image of each element is acquired by extracting the image area corresponding to each element in the image of the frame.

The tracking matching processing unit 221 is based on the template image acquired by the maximum entropy frame sampling processing unit 220 and the tracking information included in the node data of each element read from the node database 40, between the frames. Perform template matching. Here, the range in which each element exists in the image of each frame is estimated from the tracking information, and template matching using the template image is performed within the estimated image range.

The node ID update unit 222 updates the node ID assigned to each element based on the result of the template matching of each element performed by the tracking matching processing unit 221. Here, by assigning a common node ID to the elements matched as the same person or object among a plurality of frames by template matching, the node data of each element stored in the node database 40 can be obtained. Align. Then, by dividing the matched node data into fixed time interval Δt, dividing the attribute information and tracking information, and linking them to the node ID of each element, each time range set at the time interval Δt interval is used. Generate node data for each element in the graph data. The node data generated in this way is stored in the node database 40 together with the graph ID uniquely set for each graph data.

The relationship detection processing unit 23 performs the relationship detection processing on the video data input from the camera moving image input unit 10 based on the node data whose node ID has been updated by the in-video co-reference analysis unit 22. The relationship detection process performed by the relationship detection processing unit 23 is a process of detecting mutual relationships with respect to a person or an object detected as a monitored element by the entity detection processing unit 21. As shown in FIG. 2D, the relationship detection processing unit 23 includes a person / object relationship detection processing unit 230 and a person behavior detection processing unit 231.

The person / object relationship detection processing unit 230 detects the relationship between the person and the object reflected in the image based on the node data of each element read from the node database 40. Here, for example, using a person / object relationship detection model that has been learned in advance, actions such as "carrying", "opening", and "leaving" that a person performs on an object such as luggage are detected as the relationship between the two. do.

The person behavior detection processing unit 231 detects the interaction behavior between people reflected in the video based on the node data of each element read from the node database 40. Here, for example, using a person interaction behavior detection model that has been learned in advance, actions such as "conversation" and "delivery" performed by a plurality of people together are detected as interaction actions between each person.

The relationship detection processing unit 23 detects the action performed by one person on another person or object with respect to the person or object detected as the monitored element by the entity detection processing unit 21 by the processing of each block described above. And the behavior is acquired as a mutual relationship. This information is stored in the edge database 50 as edge data representing the relationship between each element.

FIG. 3 is a diagram showing an outline of the processing performed by the graph data generation unit 20. As shown in FIG. 3, the graph data generation unit 20 is the object 3 carried by the person 2 and the person 2 from the image taken by the camera moving image input unit 10 by the entity detection process performed by the entity detection processing unit 21. Are detected and these are tracked in the video. Further, the relationship detection process performed by the relationship detection processing unit 23 detects the relationship between the person 2 and the object 3. Then, based on these processing results, graph data composed of a plurality of nodes and edges is generated for each fixed time interval Δt. In this graph data, for example, the person 2 is represented by the node P1 and the object 3 is represented by the node O1, and attribute information indicating the characteristics thereof is set for each of these nodes. Further, an edge of "carrying" indicating the relationship between the person 2 and the object 3 is set between the node P1 and the node O1. The information of the graph data thus generated is stored in the graph database 30.

FIG. 4 is a diagram showing an example of the data structure of the graph database 30. As shown in FIG. 4, the graph database 30 is represented by, for example, a data table including columns 301 to 304. Column 301 stores a series of reference numbers set for each row of the data table. A graph ID unique to each graph data is stored in the column 302. The start time and end time of the time range corresponding to each graph data are stored in

columns

303 and 304, respectively. The start time and end time are calculated from the shooting start time and shooting end time recorded in the video used to generate each graph data, and the difference is equal to the above-mentioned time interval Δt. The graph database 30 is configured by storing this information row by row for each graph data.

FIG. 5 is a diagram showing an example of a data structure of the node database 40. The node database 40 is composed of a node attribute table 41 shown in FIG. 5A, a tracking information table 42 shown in FIG. 5B, and a frame information table 43 shown in FIG. 5C.

As shown in FIG. 5A, the node attribute table 41 is represented by, for example, a data table including columns 411 to 414. Column 411 stores a series of reference numbers set for each row of the data table. The graph ID of the graph data to which each node belongs is stored in the column 412. The value of this graph ID is associated with the value of the graph ID stored in the column 302 in the data table of FIG. 4, whereby each node is associated with the graph data. Column 413 stores a node ID unique to each node. Column 414 stores the attribute information acquired for the element represented by each node. The node attribute table 41 is configured by storing this information row by row for each node.

As shown in FIG. 5B, the tracking information table 42 is represented by, for example, a data table including columns 421 to 424. Column 421 stores a series of reference numbers set for each row of the data table. In column 422, the node ID of the node targeted by each tracking information is stored. The value of this node ID is associated with the value of the node ID stored in column 413 in the data table of FIG. 5A, whereby each tracking information is associated with the node. Column 423 stores a track ID unique to each tracking information. Column 424 stores a list of frame IDs of each frame in which the element represented by the node is reflected in the video. The tracking information table 42 is configured by storing this information row by row for each tracking information.

As shown in FIG. 5C, the frame information table 43 is represented by, for example, a data table including columns 431 to 434. Column 431 stores a series of reference numbers set for each row of the data table. The track ID of the tracking information to which each frame information belongs is stored in the column 432. The value of the track ID is associated with the value of the track ID stored in the column 423 in the data table of FIG. 5B, whereby each frame information and the tracking information are associated with each other. A frame ID unique to each frame information is stored in the column 433. The column 434 stores information indicating the position of each element in the frame represented by the frame information and the type of each element (person, object, etc.). The frame information table 43 is configured by storing this information row by row for each frame information.

FIG. 6 is a diagram showing an example of the data structure of the edge database 50. As shown in FIG. 6, the edge database 50 is represented by, for example, a data table containing columns 501-506. Column 501 stores a series of reference numbers set for each row of the data table. The graph ID of the graph data to which each edge belongs is stored in the column 502. The value of this graph ID is associated with the value of the graph ID stored in the column 302 in the data table of FIG. 4, whereby each edge is associated with the graph data. The

nodes

503 and 504 store the node IDs of the nodes located at the start point and the end point of each edge, respectively. The values of these node IDs are associated with the values of the node IDs stored in column 413 in the data table of FIG. 5A, whereby each edge represents the relationship between which nodes. Is specified. Column 505 stores an edge ID unique to each edge. Column 506 stores, as edge information representing the relationship between the elements represented by the edge, the content of the action performed by the person corresponding to the start point node on another person or object corresponding to the end point node. The edge database 50 is configured by storing this information row by row for each edge.

FIG. 7 is an explanatory diagram of the graph data visualization editing unit 60. The graph data visualization editing unit 60 displays, for example, the graph data editing screen 61 shown in FIG. 7 on a display (not shown) and presents it to the user. On the graph data editing screen 61, the user can arbitrarily edit the graph data by performing a predetermined operation. For example, the graph data 610 generated by the graph data generation unit 20 is visualized and displayed on the graph data editing screen 61. In the graph data 610, the user selects an arbitrary node or edge on the screen to display the

node information boxes

611 and 612 showing the detailed information of the node and the edge information box 613 showing the detailed information of the edge. Can be done. The attribute information of each node is displayed in these information boxes 611 to 613. The user can edit the content of each attribute information shown in the underlined portion by selecting arbitrary attribute information in the information boxes 611 to 613.

Further, on the graph data editing screen 61, a node addition button 614 and an edge addition button 615 are displayed together with the graph data 610. The user can add a node or an edge to the graph data 610 at an arbitrary position by selecting the node addition button 614 or the edge addition button 615 on the screen. Further, by selecting an arbitrary node or edge in the graph data 610 and performing a predetermined operation (for example, dragging or right-clicking the mouse), the node or edge can be moved or deleted.

The graph data visualization editing unit 60 can appropriately edit the contents of the generated graph data by the user's operation as described above. Then, the graph database 30 is updated to reflect the edited graph data.

FIG. 8 is a block diagram showing the configuration of the node feature amount extraction unit 70. As shown in FIG. 7, the node feature amount extraction unit 70 includes a maximum entropy frame sampling processing unit 71, a person / object area image acquisition unit 72, an image feature amount calculation unit 73, an attribute information acquisition unit 74, and an attribute information feature amount calculation. It is configured to include a unit 75, a feature amount coupling processing unit 76, an attribute weight calculation attention mechanism 77, and a node feature amount calculation unit 78.

The maximum entropy frame sampling processing unit 71 reads the node data of each node from the node database 40, and samples the frame having the maximum entropy in the video for each node.

The person / object area image acquisition unit 72 acquires the area image of the person or object corresponding to the element represented by each node from the frame sampled by the maximum entropy frame sampling processing unit 71.

The image feature amount calculation unit 73 calculates the image feature amount for each element represented by each node from the area image of each person or each object acquired by the person / object area image acquisition unit 72. Here, for example, when a DNN (Deep Neural Network) for object classification that has been learned in advance using a large-scale image data set (for example, MSCOCO) is used and a region image of each element is input to this DNN. The image feature amount is calculated by extracting the output from the intermediate layer. If the image feature amount can be calculated for the area image of each element, another method may be used.

The attribute information acquisition unit 74 reads the node information of each node from the node database 40 and acquires the attribute information of each node.

The attribute information feature amount calculation unit 75 calculates the feature amount of the attribute information for each element represented by each node from the attribute information acquired by the attribute information acquisition unit 74. Here, for example, by using a predetermined language processing algorithm (for example, word2Vec) for the text data constituting the attribute information, the attribute items (gender, age, clothes, presence / absence of wearing a mask) of each element represented by the attribute information are used. , Size, color, staying time, etc.) Calculate the feature amount. If the attribute information feature amount can be calculated for the attribute information of each element, another method may be used.

The feature amount combination processing unit 76 performs a combination process of combining the image feature amount calculated by the image feature amount calculation unit 73 and the feature amount of the attribute information calculated by the attribute information feature amount calculation unit 75. Here, for example, the feature amount for the feature of the whole person or object represented by the image feature amount and the feature amount for each attribute item of the person or object represented by the attribute information are set as vector components according to the feature amount of each of these items. Create a feature vector for each element.

The attribute weight calculation attention mechanism 77 acquires the weight for each item of the feature amount for the feature amount combined by the feature amount combination processing unit 76. Here, for example, the weights learned in advance are acquired for each vector component of the feature amount vector. Attribute weight calculation The weight information acquired by the attention mechanism 77 is stored in the element contribution storage unit 160 as an element contribution indicating the contribution of each node feature amount to the threat sign degree calculated by the anomaly detection unit 130. Will be done.

The node feature amount calculation unit 78 performs weighting processing by multiplying the feature amount combined by the feature amount combination processing unit 76 by the weight acquired by the attribute weight calculation attention mechanism 77, and calculates the node feature amount. calculate. That is, the node feature amount is calculated by summing the values obtained by multiplying each vector component of the feature amount vector by the weight set by the attribute weight calculation attention mechanism 77.

In the node feature amount extraction unit 70, the node feature amount representing the attribute feature amount for each element is generated for each graph data generated for each time range set in the time interval Δt interval by the processing of each block described above. Be extracted. The extracted node feature amount information is stored in the node feature amount storage unit 90.

FIG. 9 is a diagram showing an outline of the processing performed by the node feature amount extraction unit 70. As shown in FIG. 9, the node feature amount extraction unit 70 calculates the image feature amount by the image feature amount calculation unit 73 for the frame having the maximum entropy of the person 2 in the video corresponding to each graph data. , The attribute information feature amount calculation unit 75 calculates the feature amount for each attribute item of the attribute information of the node P1 corresponding to the person 2, so that the "whole body feature amount", "mask", "skin color", and "skin color" are calculated. The feature amount of the node P1 is obtained for each item such as "stay time". Then, the feature amount of the node P1 is extracted by performing the weighting operation for each of these items by the node feature amount calculation unit 78 using the weight acquired by the attribute weight calculation attention mechanism 77. By performing the same calculation for each of the other nodes, the feature amount of each node of the graph data can be obtained. The weight acquired by the attribute weight calculation attention mechanism 77 is stored in the element contribution storage unit 160 as the element contribution.

FIG. 10 is a block diagram showing the configuration of the edge feature amount extraction unit 80. As shown in FIG. 10, the edge feature amount extraction unit 80 includes an edge information acquisition unit 81, an edge feature amount calculation unit 82, an edge weight calculation attention mechanism 83, and a weighting calculation unit 84.

The edge information acquisition unit 81 reads and acquires the edge information of each edge from the edge database 50.

The edge feature amount calculation unit 82 calculates the edge feature amount, which is the feature amount of the relationship between the elements represented by each edge, from the edge information acquired by the edge information acquisition unit 81. Here, for example, the edge feature amount is calculated by using a predetermined language processing algorithm (for example, word2Vec) for text data such as "passing" and "conversation" representing the action contents set as edge information. ..

The edge weight calculation attention mechanism 83 acquires the weight for the edge feature amount calculated by the edge feature amount calculation unit 82. Here, for example, the weight learned in advance is acquired for the edge feature amount. The weight information acquired by the edge weight calculation attention mechanism 83 is stored in the element contribution storage unit 160 as an element contribution representing the contribution of the edge feature amount to the threat sign degree calculated by the abnormality detection unit 130.

The weighting calculation unit 84 performs weighting processing by multiplying the edge feature amount calculated by the edge feature amount calculation unit 82 by the weight acquired by the edge weight calculation attention mechanism 83, and the weighted edge feature. Calculate the amount.

In the edge feature amount extraction unit 80, the edge feature amount representing the feature amount of the relationship between the elements for each graph data generated for each time range set in the time interval Δt interval by the processing of each block described above. Is extracted. The extracted edge feature amount information is stored in the edge feature amount storage unit 100.

FIG. 11 is a block diagram showing the configuration of the spatiotemporal feature amount calculation unit 110. As shown in FIG. 11, the spatiotemporal feature amount calculation unit 110 includes a plurality of residual convolution calculation blocks 111 and a node feature amount update unit 112. Each residual convolution calculation block 111 corresponds to a predetermined number of stages, receives the calculation result of the residual convolution calculation block 111 in the previous stage, executes the convolution operation, and inputs the convolution operation to the residual convolution calculation block 111 in the subsequent stage. The node feature amount and the edge feature amount read from the node feature amount storage unit 90 and the edge feature amount storage unit 100 are input to the residual convolution calculation block 111 in the front stage, and the residual convolution calculation block 111 in the final stage is input. The calculation result of is input to the node feature amount update unit 112. As a result, the calculation of spatiotemporal features using GNN (Graph Neural Network) is realized.

The spatiotemporal feature amount calculation unit 110 performs the above-mentioned convolution operation in each of the plurality of residual convolution calculation blocks 111. In order to realize this convolution operation, each residual convolution calculation block 111 includes two space convolution calculation processing units 1110 and one time convolution calculation processing unit 1111.

The space convolution calculation processing unit 1110 calculates the outer product of the feature amount of the adjacent node adjacent to each node in the graph data and the feature amount of the edge set between each node and the adjacent node as the convolution calculation in the spatial direction. Then, a weighting operation using a weight matrix of D × D size is performed on this outer product. Here, the value of the degree D of the weight matrix is defined as the length of the feature amount of each node. This ensures a variety of learning using a learnable weighted linear transformation. Further, since the weight matrix can be designed without being restricted by the number of nodes and edges constituting the graph data, the weighting operation can be performed using the optimum weight matrix.

In the residual convolution calculation block 111, the spatial convolution calculation processing unit 1110 performs a weighting calculation twice for each node constituting the graph data. As a result, the convolution operation in the spatial direction is realized.

The time-convolution calculation processing unit 1111 performs a time-direction convolution calculation on the feature amount of each node for which the spatial-direction convolution calculation is performed by the two space convolution calculation processing units 1110. Here, for the node adjacent to each node in the time direction, that is, the feature amount of the node representing the same person or object as the node in the graph data generated for the video in the adjacent time range, and the adjacent node. The outer product with the set feature amount of the edge is calculated, and the weighting operation similar to that of the space convolution calculation processing unit 1110 is performed on this outer product. This realizes a convolution operation in the time direction.

By adding the spatiotemporal feature amount calculated by the spatial and temporal convolution operations described above and the node feature amount input to the residual convolution operation block 111, the operation result of the residual convolution operation block 111 is obtained. Desired. By performing such an operation, it is possible to perform a convolution operation in which the features of both the adjacent nodes adjacent to each other in the spatial direction and the temporal direction and the edges between the adjacent nodes are added to the features of each node at the same time.

The node feature amount update unit 112 updates the feature amount of each node accumulated in the node feature amount storage unit 90 by using the calculation result output from the residual convolution calculation block 111 in the final stage. As a result, the spatiotemporal features calculated for each node constituting the graph data are reflected in the features of each node.

The spatiotemporal feature amount calculation unit 110 can calculate the spatiotemporal feature amount of each graph data using GNN by the processing of each block described above, and can update the node feature amount by reflecting it on the node feature amount. .. In the GNN learning in the spatiotemporal feature calculation unit 110, it is preferable to learn the residual function with reference to the input of any layer. In this way, even if the layer at the time of learning is deep, a gradient explosion or a gradient explosion occurs. The problem of vanishing gradient can be prevented. Therefore, it is possible to calculate the node features that reflect more accurate spatiotemporal information.

FIG. 12 is a diagram showing an example of a mathematical formula representing arithmetic processing in the space convolution arithmetic processing unit 1110. The space convolution calculation processing unit 1110 performs a space convolution calculation by, for example, calculating each of the matrix calculation formulas as shown in FIG. 12. Then, concatenation or average pooling is performed on the obtained tensor of N × D × P (N is the number of nodes, D is the length of the node features, and P is the number of channels of the matrix operation = the length of the edge features). By repeating this for the number of residual convolution calculation blocks 111 provided according to the number of layers of GNN, the feature amount after spatial convolution is calculated, and the time convolution calculation is further performed to perform the spatiotemporal feature amount. Is calculated and reflected in the node features.

Here, the convolution operation performed by the space convolution calculation processing unit 1110 and the convolution calculation performed by the time convolution calculation processing unit 1111 are expressed by the following mathematical formulas (1) and (2), respectively.

In the formula (1), O represents the concatenation or average pooling, φ represents the non-linear activation function, and l represents the layer number of the GNN corresponding to the space convolution arithmetic processing unit 1110. Further, in the mathematical formula (2), k represents the layer number of the GNN corresponding to the time convolution calculation processing unit 1111.

Further, in FIG. 12 and mathematical formulas (1) and (2), H ^{N × D} represents a matrix of spatial node features, N represents the number of nodes in the graph data, and D represents the length (order) of the node features. Each is represented. M _i ^{L × D} represents a matrix of time node features for the i-th node, and L represents the length of time. EN ^{× N × P} represents a matrix of edge features, and _Eij represents an edge feature (order P) connecting the i-th node and the j-th node. Here, if there is no edge connecting the i-th node and the j-th node, _Eij = 0.

Further, in FIG. 12 and mathematical formulas (1) and (2), Fi ^{1 × D} represents a matrix of time node features for the _i -th node. _Fij represents the existence or nonexistence of the jth node in the jth graph data. Here, if the j-th node does not exist in the j-th graph data, F _ij = 0, and if it exists, F _ij = 1.

Further, in FIG. 12 and the equations (1) and (2), Q1 ^{× L} represents a convolutional kernel for weighting the relationship between the nodes in the time direction, and _WS ^l is D regarding the node features in the spatial direction. It ^represents a weighting matrix of × D size, and _WTk represents a weighting matrix of D × D size regarding the node features in the time direction.

FIG. 13 is a diagram showing an outline of the processing performed by the spatiotemporal feature amount calculation unit 110. In FIG. 13, the dotted line represents the space convolution calculation by the space convolution calculation processing unit 1110, and the broken line represents the time convolution calculation by the time convolution calculation processing unit 1111. As shown in FIG. 13, for example, node 3 in the t-th graph data has a space corresponding to the feature amounts of

adjacent nodes

1 and 4 and the feature amount of the edge set between these adjacent nodes. Features are added by spatial convolution. Further, the time feature amount corresponding to the feature amount of the node 3 in the immediately preceding t-1st graph data and the feature amount of the node 3 in the immediately following t + 1st graph data is added by the time convolution operation. As a result, the spatiotemporal feature amount of the t-th graph data for the node 3 is calculated and reflected in the feature amount of the node 3.

FIG. 14 is a block diagram showing the configuration of the abnormality detection unit 130. As shown in FIG. 14, the abnormality detection unit 130 includes a feature quantity distribution clustering unit 131, a center point distance calculation unit 132, and an abnormality determination unit 133.

The feature amount distribution clustering unit 131 performs clustering processing of the feature amount of each node acquired from the node feature amount storage unit 90 by the node feature amount acquisition unit 120, and obtains the distribution of the node feature amount. Here, for example, the distribution of the node features is obtained by plotting the features of each node on a two-dimensional map.

The center point distance calculation unit 132 calculates the distance from the center point of each node feature amount in the distribution of the node feature amount obtained by the feature amount distribution clustering unit 131. As a result, the features of each node reflecting the spatiotemporal features are compared with each other. The distance from the center point of each node feature amount calculated by the center point distance calculation unit 132 is stored in the threat sign degree storage unit 140 as a threat sign degree indicating the degree of threat of the element corresponding to each node.

The abnormality determination unit 133 determines the threat sign degree of each node based on the distance calculated by the center point distance calculation unit 132. As a result, if there is a node with a threat sign of more than a predetermined value, the element corresponding to that node is determined to be a suspicious person or a suspicious object, an abnormality in the monitored location is detected, and the user is notified. .. Notification to the user is performed using, for example, an alarm device (not shown). At this time, the position of an element determined to be a suspicious person or a suspicious object may be highlighted in the image of the surveillance camera. The abnormality detection result by the abnormality determination unit 133 is stored in the threat sign degree storage unit 140 in association with the threat sign degree.

The anomaly detection unit 130 detects an abnormality in the monitored location based on the spatiotemporal feature amount calculated by the spatiotemporal feature amount calculation unit 110 by the processing of each block described above, and also detects the spatiotemporal feature for each element. The quantities can be compared with each other, and the threat predictiveness for each element can be obtained based on the comparison result.

FIG. 15 is a diagram showing an outline of the processing performed by the abnormality detection unit 130. As shown in FIG. 15, the abnormality detection unit 130 plots the node features for which the spatiotemporal features have been determined for each node of the graph data including the nodes P3, P6, and O2 on a two-dimensional map. Find the distribution of node features. Then, the central point of the distribution of the obtained node feature amount is obtained, and the distance from this center point to each node feature amount is calculated to obtain the threat sign degree of each node. As a result, it is determined that the element corresponding to the node whose threat sign degree is equal to or higher than the predetermined value, for example, the person corresponding to the node P6 whose node feature amount is outside the distribution circle 4 on the distribution map is a suspicious person or a suspicious object. Judgment is made and an abnormality is detected.

FIG. 16 is a block diagram showing the configuration of the determination basis presentation unit 150. As shown in FIG. 16, the judgment basis presentation unit 150 includes a basis confirmation target selection unit 151, a subgraph extraction processing unit 152, a person attribute threat contribution presentation unit 153, an object attribute threat contribution presentation unit 154, and an action history contribution presentation. It is configured to include a unit 155 and a verbalization summary generation unit 156.

The basis confirmation target selection unit 151 acquires the threat sign degree stored in the threat sign degree storage unit 140, and includes the node in which the abnormality is detected by the abnormality detection unit 130 based on the acquired threat sign degree of each node. Select any part of the graph data as the target for confirming the basis for abnormality detection. Here, for example, the part related to the node with the highest threat sign may be automatically selected, or an arbitrary node may be specified according to the user's operation and the part related to that node may be selected. good.

The subgraph extraction processing unit 152 acquires the graph data stored in the graph database 30, and extracts the portion selected by the basis confirmation target selection unit 151 in the acquired graph data as a subgraph indicating the basis confirmation target for abnormality detection. do. For example, the node with the highest threat sign or the node specified by the user, each node connected to the node, and each edge are extracted as a subgraph.

When the node included in the subgraph extracted by the subgraph extraction processing unit 152 represents a person, the person attribute threat contribution presentation unit 153 calculates and visualizes the contribution of the person's attribute to the threat sign degree. Present to the user. For example, the element contribution degree stored in the element contribution degree storage unit 160 for various attribute items (gender, age, clothes, whether or not a mask is worn, staying time, etc.) represented by the attribute information included in the node information of the node. That is, the contribution of each attribute item is calculated based on the weight of each attribute item with respect to the node feature amount. Then, a predetermined number of attribute items are selected from the one with the highest calculated contribution, and the content and contribution of each attribute item are presented in a predetermined layout on the abnormality detection screen.

When the node included in the subgraph extracted by the subgraph extraction processing unit 152 represents an object, the object attribute threat contribution presentation unit 154 calculates and visualizes the contribution of the object attribute to the threat sign degree. Present to the user. For example, for various attribute items (size, color, staying time, etc.) represented by the attribute information included in the node information of the node, the element contribution degree stored in the element contribution degree storage unit 160, that is, the attribute for the node feature amount. Calculate the contribution of each attribute item based on the weight of each item. Then, a predetermined number of attribute items are selected from the one with the highest calculated contribution, and the content and contribution of each attribute item are presented in a predetermined layout on the abnormality detection screen.

The action history contribution presentation unit 155 is an action performed between the person or object and another person or object when the node included in the subgraph extracted by the subgraph extraction processing unit 152 represents a person or an object. Calculates the degree of contribution to the threat sign by, visualizes it, and presents it to the user. For example, for each edge connected to the node, the contribution of each edge is calculated based on the element contribution stored in the element contribution storage unit 160, that is, the weight for the edge feature amount. Then, a predetermined number of edges are selected from the one with the highest calculated contribution, and the action content and contribution represented by each edge are presented in a predetermined layout on the abnormality detection screen.

The verbalization summary generation unit 156 verbalizes the contents presented by the person attribute threat contribution presentation unit 153, the object attribute threat contribution presentation unit 154, and the action history contribution presentation unit 155, respectively, and is the basis for abnormality detection. Generate a text (summary) that expresses concisely. Then, the generated summary is displayed at a predetermined position on the abnormality detection screen.

In the determination basis presentation unit 150, the threat predictive degree and the threat predictive degree calculated for the element such as a person or an object in which the abnormality is detected by the abnormality detection unit 130 by the processing of each block described above An abnormality detection screen including at least information on the characteristics or behaviors of the element having a high degree of contribution to the above can be presented to the user as a screen showing the determination basis of the abnormality detection unit 130.

FIG. 17 is a diagram showing an outline of the processing performed by the evidence confirmation target selection unit 151 and the subgraph extraction processing unit 152. In FIG. 17, (a) shows an example of visualizing the graph data before the subgraph extraction, and (b) shows an example of visualizing the graph data after the subgraph extraction.

When the user designates any node by a predetermined operation (for example, clicking a mouse) in the graph data shown in FIG. 17A, the basis confirmation target selection unit 151 is connected to the designated node and the node. Select each node and each edge as the target for checking the basis for abnormality detection. At this time, the subgraph extraction processing unit 152 extracts the nodes and edges selected by the basis confirmation target selection unit 151 as subgraphs, highlights the extracted subgraphs, and grays out and displays the parts other than the subgraphs of the graph data. By doing so, the subgraph is visualized.

For example, consider the case where the user specifies the node O2 in the graph data of FIG. 17 (a). In this case, the portion including the designated node O2, the nodes P2 and P4 adjacent to the node O2, and the edges set between the nodes O2, P2, and P4 are selected by the grounds confirmation target selection unit 151. It is extracted as a subgraph from the subgraph extraction processing unit 152. Then, as shown in FIG. 17 (b), these extracted nodes and edges are highlighted, and the other parts are grayed out, so that the subgraph is visualized.

FIG. 18 is a diagram showing an example of an abnormality detection screen displayed by the determination basis presentation unit 150. On the anomaly detection screen 180 shown in FIG. 18, for each of the person and the object in which the abnormality is detected, the threat sign degree is shown as a threat level, and the characteristics and the contribution degree of each action to the threat sign degree are shown. .. Specifically, the contribution to each item of "mask", "stay time", and "upper body color" is shown for the person photographed by the camera 2, and "left behind" for the object photographed by the camera 1. The degree of contribution to each item of "stay time" and "delivery" is shown. Further, as a suspicious point regarding these persons or objects, a summary generated by the verbalization summary generation unit 156 is displayed. Further, a video showing a suspicious action taken by a person and the shooting time thereof are displayed as an action timeline.

The abnormality detection screen 180 shown in FIG. 18 is an example, and if the abnormality detection result by the abnormality detection unit 130 and its basis can be presented in an easy-to-understand manner for the user, the abnormality detection screen is displayed with other contents and screen layout. May be good.

According to the first embodiment of the present invention described above, the following effects are exhibited.

(1) The abnormality detection system 1 detects a plurality of elements in a monitored place based on an image or an image obtained by photographing a predetermined monitored place, and determines the attributes of each element and the relationship between the elements. Calculated by the graph data generation unit 20 that generates the graph data to be represented, the spatiotemporal feature amount calculation unit 110 that calculates the spatiotemporal feature amount of the graph data generated by the graph data generation unit 20, and the spatiotemporal feature amount calculation unit 110. An abnormality detecting unit 130 for detecting an abnormality in a monitored place is provided based on the spatiotemporal feature amount. By doing so, it is possible to accurately detect suspicious behavior or abnormal behavior from images or images of various people or objects and detect the abnormality.

(2) The graph data generation unit 20 detects a person or object reflected in a video or image as an element by the entity detection processing unit 21, acquires the characteristics of the detected person or object as an element, and acquires the characteristics of the detected person or object as an attribute for each element. The relationship detection processing unit 23 acquires actions taken by a person with respect to another person or object as relationships between elements. Since this is done, the information necessary for generating the graph data can be reliably acquired from the video or the image, and the graph data can be generated.

(3) The graph data generation unit 20 combines a plurality of nodes representing the attributes of each element and a plurality of edges representing the relationships between the elements to generate graph data as shown in FIG. 3, for example. By doing so, it is possible to generate graph data that clearly expresses the attributes of each element and the relationships between the elements.

(4) The graph data generation unit 20 generates graph data for each of a plurality of time ranges obtained by dividing the video or image acquired in time series into predetermined time intervals Δt. Since this is done, it is possible to generate appropriate graph data in consideration of the balance between the amount of data and the accuracy of abnormality detection for a video or image that changes over time.

(5) The abnormality detection system 1 has a node feature amount extraction unit 70 that extracts a node feature amount representing an attribute feature amount for each element for each graph data generated for each predetermined time range, and a node feature amount extraction unit 70 for each time range. Each of the generated graph data is further provided with an edge feature amount extraction unit 80 for extracting an edge feature amount representing the feature amount of the relationship between the elements. The spatiotemporal feature amount calculation unit 110 calculates the spatiotemporal feature amount based on the node feature amount and the edge feature amount of each graph data extracted by the node feature amount extraction unit 70 and the edge feature amount extraction unit 80, respectively. Since this is done, it is possible to calculate the spatiotemporal feature amount that reflects the potential relationship between the feature amount of each node and the adjacent node.

(6) The abnormality detection system 1 further includes a graph data visualization editing unit 60 that visualizes graph data, presents it to the user, and accepts the user to edit the graph data. By doing so, even if inappropriate graph data is erroneously generated, the user can correct it to the correct graph data.

(7) The spatiotemporal feature amount calculation unit 110 calculates the spatiotemporal feature amount for each element. The anomaly detection unit 130 compares the spatiotemporal feature amounts of each element calculated by the spatiotemporal feature amount calculation unit 110 with each other by the feature amount distribution clustering unit 131 and the center point distance calculation unit 132, and the abnormality determination unit 133 uses the abnormality determination unit 133. Anomalies are detected by calculating the degree of threat for each element based on the comparison result. Since this is done, when there is a person or a suspicious object that behaves suspiciously, it can be reliably determined and the abnormality can be detected.

(8) In the abnormality detection system 1, for an element in which an abnormality is detected by the abnormality detection unit 130 by the determination basis presentation unit 150, the degree of threat calculated for the element and the degree of contribution to the degree of threat are determined. An abnormality detection screen 180 including at least high information on the characteristics or behavior of the element is presented to the user as a screen showing the determination basis of the abnormality detection unit 130. By doing so, it is possible to make the user understand in an easy-to-understand manner which person or object is detected as a suspicious person or suspicious object among various people or objects reflected in the image. ..

(9) The abnormality detection screen 180 further includes information on the degree of contribution to the degree of threat for each feature or action of the element. By doing so, it is possible to make the user understand in an easy-to-understand manner what characteristics and behaviors of the person or object detected as a suspicious person or a suspicious object are emphasized in the judgment of the threat.

(10) The computer constituting the abnormality detection system 1 detects a plurality of elements in the monitored place based on the video or image obtained by photographing the predetermined monitored place (the entity detection processing unit 21 performs the process). Entity detection processing to be executed), processing to generate graph data showing attributes of each element and relationships between elements (processing of graph data generation unit 20), and processing to calculate spatiotemporal feature amount of graph data (time). The processing of the spatial feature amount calculation unit 110) and the processing of detecting an abnormality in the monitored place based on the spatiotemporal feature amount (processing of the abnormality detecting unit 130) are executed. Since this is done, it is possible to accurately detect suspicious behaviors and abnormal behaviors and detect abnormalities from images or images of various people and objects photographed by processing using a computer.

It should be noted that the present invention is not limited to the above embodiment, and can be carried out by using any component within the range not deviating from the gist thereof. The embodiments and modifications described above are merely examples, and the present invention is not limited to these contents as long as the features of the invention are not impaired. Further, although various embodiments and modifications have been described above, the present invention is not limited to these contents. Other aspects considered within the scope of the technical idea of the present invention are also included within the scope of the present invention.

1 ... Abnormality detection system, 10 ... Camera moving image input unit, 20 ... Graph data generation unit, 30 ... Graph database, 40 ... Node database, 50 ... Edge database, 60 ... Graph data visualization editing unit, 70 ... Node feature amount extraction Unit, 80 ... Edge feature amount extraction unit, 90 ... Node feature amount storage unit, 100 ... Edge feature amount storage unit, 110 ... Spatio-temporal feature amount calculation unit, 120 ... Node feature amount acquisition unit, 130 ... Abnormality detection unit, 140 ... Threat sign storage unit, 150 ... Judgment basis presentation unit, 160 ... Element contribution storage unit

Claims

Based on the video or image obtained by shooting a predetermined monitoring target location, a plurality of elements in the monitoring target location are detected, and graph data showing the attributes of each element and the relationship between the elements is generated. Graph data generator and
A spatiotemporal feature amount calculation unit for calculating the spatiotemporal feature amount of the graph data generated by the graph data generation unit, and a spatiotemporal feature amount calculation unit.
An abnormality detection system including an abnormality detection unit that detects an abnormality in the monitoring target location based on the space-time feature amount calculated by the space-time feature amount calculation unit.
In the abnormality detection system according to claim 1,
The graph data generation unit detects the image or the person or object reflected in the image as the element, acquires the detected feature of the person or the object as an attribute for each element, and the person is another person. An abnormality detection system that acquires an action performed on the person or the object as a relationship between the elements.
In the abnormality detection system according to claim 1,
The graph data generation unit is an abnormality detection system that generates graph data by combining a plurality of nodes representing attributes for each element and a plurality of edges representing relationships between the elements.
In the abnormality detection system according to claim 1,
The graph data generation unit is an abnormality detection system that generates graph data for each time range of a plurality of time ranges obtained by dividing the video or the image acquired in time series into predetermined time intervals.
In the abnormality detection system according to claim 4,
For each graph data generated for each time range, a node feature amount extraction unit that extracts a node feature amount representing the feature amount of the attribute for each element, and a node feature amount extraction unit.
For each graph data generated for each time range, an edge feature amount extraction unit for extracting an edge feature amount representing the feature amount of the relationship between the elements is further provided.
The spatiotemporal feature amount calculation unit is based on the node feature amount and the edge feature amount of each graph data extracted by the node feature amount extraction unit and the edge feature amount extraction unit, respectively. Anomaly detection system that calculates.
In the abnormality detection system according to claim 1,
An abnormality detection system further comprising a graph data visualization editing unit that visualizes the graph data, presents the graph data to the user, and accepts the user to edit the graph data.
In the abnormality detection system according to claim 1,
The spatiotemporal feature amount calculation unit calculates the spatiotemporal feature amount for each of the elements.
The anomaly detection unit compares the spatiotemporal feature amount for each element calculated by the spatiotemporal feature amount calculation unit with each other, and calculates the degree of threat for each element based on the comparison result. An abnormality detection system that detects the abnormality.
In the abnormality detection system according to claim 7,
For the element in which the abnormality is detected by the abnormality detection unit, the degree of the threat calculated for the element and information on the characteristics or behavior of the element having a high degree of contribution to the degree of the threat are provided. An abnormality detection system that presents an abnormality detection screen including at least to the user as a screen showing the determination basis of the abnormality detection unit.
In the abnormality detection system according to claim 8,
The anomaly detection screen is an anomaly detection system that further includes information on the degree of contribution for each feature or action of the element.
Of the plurality of elements included in the video or image obtained by shooting a predetermined place, the element having a high degree of threat has a high degree of threat and a high degree of contribution to the degree of threat. An anomaly detection system that presents an anomaly detection screen to the user, including at least information on the characteristics or behavior of the element.
By computer
A process of detecting a plurality of elements in the monitored place based on an image or an image obtained by shooting a predetermined monitored place, and a process of detecting a plurality of elements in the monitored place.
Processing to generate graph data showing the attributes of each element and the relationships between the elements,
The process of calculating the spatiotemporal features of the graph data and
An abnormality detection method for executing a process of detecting an abnormality in the monitored location based on the spatiotemporal feature amount.