WO2020001216A1

WO2020001216A1 - Abnormal event detection

Info

Publication number: WO2020001216A1
Application number: PCT/CN2019/088703
Authority: WO
Inventors: 邓亦梁
Original assignee: 杭州海康威视数字技术股份有限公司
Priority date: 2018-06-26
Filing date: 2019-05-28
Publication date: 2020-01-02
Also published as: CN110648352A; CN110648352B

Abstract

The present application provides a method and apparatus for detecting an abnormal event, an electronic device, and a computer-readable storage medium. The method comprises: using a trained convolutional neural network (CNN) to acquire one or more feature targets from a monitored video stream, wherein the video stream is a video stream obtained by monitoring a specified region and the feature target represents a target person present in the specified region; using a preset foreground model for foreground detection to acquire one or more foreground targets from the video stream; identifying one or more detection targets on the basis of the feature target and the foreground target; tracking each detection target by recording the presence status of the detection target in the video stream, so as to obtain a tracking result; and identifying an abnormal event according to the tracking result.

Description

Detection of abnormal events

Technical field

The present application relates to the field of image processing, and in particular, to a method, an apparatus, an electronic device, and a computer-readable storage medium for detecting an abnormal event.

Background art

To make deposits and withdrawals more convenient, banks generally install ATMs (Automated Teller Machines) that provide depositors with 24-hour self-service. To ensure that depositors can use an ATM in a private and secure space, a protective cabin is usually installed around the ATM. Some public telephones are likewise equipped with a protective cabin.

The protective cabin brings convenience to depositors (or users of public telephones). However, illegal and criminal activities also occur in protective cabins from time to time, mainly trailing robberies and criminals staying in the cabin for a long time to cause damage. In addition, users may leave their belongings behind in the protective cabin when they depart after completing their business.

If monitoring personnel can learn of these three types of abnormal events in the protective cabin in time, the safety of the protective cabin and the user experience can be effectively improved.

Summary of the invention

In view of this, embodiments of the present application provide a method, an apparatus, an electronic device, and a computer-readable storage medium for detecting an abnormal event, so as to accurately detect abnormal events such as trailing of a person, retention of a person, and items left behind.

In a first aspect, an embodiment of the present application provides a method for detecting an abnormal event, including: using a trained convolutional neural network (CNN) to obtain one or more feature targets from a monitored video stream, wherein the video stream is obtained by monitoring a specified area and the feature target represents a target person appearing in the specified area; using a preset foreground model for foreground detection to obtain one or more foreground targets from the video stream; determining one or more detection targets based on the feature target and the foreground target; tracking each of the detection targets by recording the presence of the detection target in the video stream to obtain a tracking result; and determining an abnormal event according to the tracking result.

In a second aspect, an embodiment of the present application provides an abnormal event detection device, including: a first obtaining unit, configured to obtain one or more feature targets from a monitored video stream by using a trained convolutional neural network (CNN), wherein the video stream is obtained by monitoring a specified area and the feature target represents a target person appearing in the specified area; a second obtaining unit, configured to obtain one or more foreground targets from the video stream by using a preset foreground model for foreground detection; a first determining unit, configured to determine one or more detection targets based on the feature target and the foreground target; and a second determining unit, configured to track each of the detection targets by recording the presence of the detection target in the video stream to obtain a tracking result, and to determine an abnormal event based on the tracking result.

According to a third aspect, an embodiment of the present application provides an electronic device, including a processor, and a memory for storing executable instructions of the processor. The processor is configured to execute the method for detecting an abnormal event according to the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the method for detecting an abnormal event according to the first aspect is implemented.

Brief description of the drawings

FIG. 1 is a flowchart of a method for detecting an abnormal event shown in the present application;

FIG. 2 is a block diagram of an embodiment of an abnormal event detection device shown in the present application;

FIG. 3 is a hardware structural diagram of an electronic device shown in the present application.

Detailed description

In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, and to make the foregoing objectives, features, and advantages of the embodiments of the present invention more comprehensible, the prior art solutions and the technical solutions in the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

In some embodiments, a camera at the top of the protective cabin records monitoring video of the inside of the cabin in real time, a foreground model of the protective cabin is established, and foreground targets (persons entering the protective cabin) are extracted based on the foreground model. The moving foreground targets are then tracked to determine when persons enter and leave the protective cabin, so that abnormal events such as trailing and retention can be determined and monitoring personnel can be alerted.

However, obtaining a foreground target by establishing a foreground model is susceptible to environmental influences and may cause misjudgment of abnormal events. In addition, when the foreground target is hard to distinguish from the background, the probability of detecting the foreground target is low. For example, when the light outside the protective cabin changes or the shadow of a person outside the cabin is projected into the cabin, false foreground targets may be generated, causing misjudgment of abnormal events; and when the color of a person's coat is similar to the background color in the protective cabin, it is difficult to extract the foreground target.

Referring to FIG. 1, a flowchart of a method for detecting an abnormal event shown in this application includes the following steps:

Step 101: Use the trained convolutional neural network CNN for feature detection to obtain one or more feature targets of the target person from the monitored video stream. The video stream is a video stream obtained by monitoring a specified area.

The above method may be executed by an electronic device connected to a monitoring device (such as a monitoring camera). In one illustrated embodiment, the electronic device may be a hard disk video recorder. For ease of description, the hard disk video recorder is taken as the executing entity in the following text.

The above-mentioned CNN (Convolutional Neural Network) for feature detection is trained in advance on human features so that it can identify the feature targets appearing in the video frames of the video stream. In practical applications, the CNN can be trained with the head and shoulders of persons, so that it can identify the head-and-shoulder targets appearing in a video frame. Of course, the CNN can also be trained with other human features (for example, limbs and torso), chosen according to which features of a person the monitoring device can capture most easily.

The specified area may be any area where an abnormal event may occur, and the specified area is monitored by a monitoring device to generate a video stream. For the protection cabin scenario, the above-mentioned designated area may be the inside of the protection cabin.

After the hard disk video recorder obtains the video stream from the monitoring device, the trained CNN is used to obtain the person's characteristic target from the video stream to detect the persons appearing in the video stream.

In practical applications, the hard disk video recorder can record, in a feature target table, the coordinates of the upper-left corner of the target frame of each obtained feature target, the width and height of the target frame, and the confidence that the feature target is a human feature. The feature target table includes a mapping relationship among the coordinates of the upper-left corner, the width, the height, and the confidence of the target frame of the feature target.
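As an illustration only, one entry of the feature target table could be sketched as follows. The class name `FeatureTarget`, the helper `record_feature_target`, and the use of a Python list as the table are assumptions made for this example; the application does not prescribe a storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureTarget:
    """One row of the feature target table (hypothetical layout)."""
    x: int             # x coordinate of the upper-left corner of the target frame
    y: int             # y coordinate of the upper-left corner of the target frame
    width: int         # width of the target frame
    height: int        # height of the target frame
    confidence: float  # confidence that the target is a human feature


def record_feature_target(table, x, y, width, height, confidence):
    """Append one detected feature target to the table and return the new row."""
    entry = FeatureTarget(x, y, width, height, confidence)
    table.append(entry)
    return entry
```

A foreground target table could follow the same layout without the confidence field.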

Step 102: Use a preset foreground model for foreground detection to obtain one or more foreground objects from the video stream.

The foreground models used for foreground detection include the Gaussian mixture model and the ViBe (visual background extractor) algorithm. The foreground model is established based on the RGB (Red Green Blue) information of the monitored specified area, and can be used to identify the foreground targets appearing in the video frames of the video stream. Here, a foreground target is defined relative to the established foreground model.

The hard disk video recorder uses the foreground model to obtain foreground targets from the video stream. A foreground target may be a body part of a person, an item, or even a part of the scene in the protective cabin (for example, an ATM illuminated by light may be identified as a foreground target). Therefore, the specific content of a foreground target needs to be further analyzed and determined later.

In practical applications, the hard disk video recorder can record, in a foreground target table, the coordinates of the upper-left corner of the target frame of each obtained foreground target and the width and height of the target frame. The foreground target table includes a mapping relationship among the coordinates of the upper-left corner, the width, and the height of the target frame of the foreground target.

Step 103: Determine one or more detection targets based on the feature target and the foreground target.

The hard disk video recorder may determine the detection targets to be tracked subsequently based on the feature targets and the foreground targets. Specifically, the hard disk video recorder may select, from the obtained foreground targets, each foreground target of interest that is not associated with any feature target, and then determine each feature target and each foreground target of interest as detection targets.

In one illustrated embodiment, the hard disk video recorder may first calculate, for each foreground target, the area of intersection between the target frame of the foreground target and the target frame of each feature target.

Specifically, the position and area of the target frame of each foreground target in the video frame may be determined by the coordinates and width and height of the upper left corner of the target frame of each foreground target recorded in the foreground target table. And, the position and area of the target frame of each feature target in the video frame are determined by the coordinates and width and height of the upper left corner of the target frame of each feature target recorded in the feature target table. Further, for each foreground target, an area of intersection between the target frame of the foreground target and the target frame of each feature target is determined.

The hard disk video recorder can determine whether the area of the intersection reaches a preset area threshold.

If the area of the intersection between the target frame of the foreground target and the target frame of every feature target is less than the above-mentioned area threshold, it is determined that the foreground target is not associated with any feature target, and the foreground target is determined as a foreground target of interest. In other words, the foreground target of interest is not a body part of a person.

If the area of the intersection between the target frame of the foreground target and the target frame of a feature target is not less than the area threshold, it is determined that the foreground target is associated with the feature target. In other words, the foreground target is the body part of the person indicated by the characteristic target.

The hard disk video recorder may determine each feature target, and each foreground target of interest that is not associated with any feature target, as detection targets, and then track these detection targets.
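The association test of step 103 can be sketched as follows. This is a minimal example, assuming that each target frame is given as an `(x, y, width, height)` tuple with `(x, y)` the upper-left corner; the function names and the tuple layout are illustrative and not taken from the application.

```python
def intersection_area(frame_a, frame_b):
    """Area of the overlap between two target frames given as (x, y, width, height)."""
    ax, ay, aw, ah = frame_a
    bx, by, bw, bh = frame_b
    overlap_w = min(ax + aw, bx + bw) - max(ax, bx)
    overlap_h = min(ay + ah, by + bh) - max(ay, by)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0  # the frames do not overlap
    return overlap_w * overlap_h


def select_detection_targets(feature_frames, foreground_frames, area_threshold):
    """Return the feature targets plus the foreground targets of interest.

    A foreground target is of interest when its intersection with every
    feature target is below the area threshold.
    """
    of_interest = [
        fg for fg in foreground_frames
        if all(intersection_area(fg, ft) < area_threshold for ft in feature_frames)
    ]
    return list(feature_frames) + of_interest
```

For example, with one feature target at `(0, 0, 10, 10)`, a foreground target at `(5, 5, 10, 10)` overlaps it by 25 pixels and is treated as part of that person when the threshold is 20, while a foreground target at `(50, 50, 5, 5)` does not overlap and becomes a foreground target of interest.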

Step 104: Track each detection target by recording the existence of the detection target in the video stream, and determine an abnormal event according to the tracking result.

The hard disk video recorder can use multiple object tracking technology (Multiple Object Tracking / Multiple Target Tracking) to track the center point of the target frame of the detection target. The center point of the target frame of the detection target may be determined based on the coordinates and width and height of the upper left corner of the target frame.

The hard disk video recorder can record, in a tracking table, the type of each tracked detection target, the historical coordinates of the center point of the target frame of each detection target, and the video frame identifier of each video frame in which the detection target is located. The historical coordinates are the coordinates of the center point of the target frame of the detection target in each video frame of the video stream in which the detection target exists. For each detection target, the hard disk video recorder continuously records the coordinates of the center point of its target frame in each video frame during the tracking process.

The tracking table includes a mapping relationship among the type of the detection target, the historical coordinates of the center point of the target frame of the detection target, and the video frame identifier of the video frame in which the detection target is located.
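A tracking table of this kind might be sketched as follows. The class `TrackingTable`, the per-target identifier `target_id`, and the type strings `"person"` and `"non-person"` are assumptions introduced for illustration; the application only specifies which fields are mapped to one another.

```python
from collections import defaultdict


def frame_center(x, y, width, height):
    """Center point of a target frame from its upper-left corner, width, and height."""
    return (x + width / 2, y + height / 2)


class TrackingTable:
    """Maps each detection target to its type and its per-frame center history."""

    def __init__(self):
        self.target_types = {}            # target_id -> "person" or "non-person"
        self.history = defaultdict(list)  # target_id -> [(frame_number, (cx, cy)), ...]

    def record(self, target_id, target_type, frame_number, frame):
        """Record that a target was present in a frame; frame is (x, y, width, height)."""
        self.target_types[target_id] = target_type
        self.history[target_id].append((frame_number, frame_center(*frame)))

    def persons_in_frame(self, frame_number):
        """Count the person-type targets recorded for the given frame number."""
        return sum(
            1
            for target_id, entries in self.history.items()
            if self.target_types[target_id] == "person"
            and any(number == frame_number for number, _ in entries)
        )
```

The per-frame person count obtained this way is the quantity that the retention and trailing determinations rely on.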

The types of detection targets mentioned above include persons and non-persons, where non-persons include items and false foregrounds (such as silhouettes). Because the specific content of a foreground target needs to be further analyzed and determined, in one embodiment the hard disk video recorder may indicate that the type of a feature target is a person by recording its human-feature confidence, and may temporarily record the confidence of a foreground target as zero.

The video frame identifier may be the frame number of a video frame. The frame number indicates the position of the video frame in the video stream, and the frame numbers of adjacent video frames differ by one. Therefore, in practical applications, the position in the video stream of each video frame in which a detection target exists can be determined by recording the frame number of that video frame.

The hard disk video recorder determines abnormal events based on the tracking result.

In one illustrated embodiment, the hard disk video recorder may convert the preset duration judgment threshold of each abnormal event into a count threshold on the number of video frames. For example, at 25 frames per second: for a trailing event, the preset duration judgment threshold is 5 minutes, and the converted trailing count threshold is 7500; for a retention event, the preset duration judgment threshold is 10 minutes, and the converted retention count threshold is 15000; for an item leftover event, the preset duration judgment threshold is 10 minutes, and the converted legacy count threshold is 15000.
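The conversion is a simple product of duration and frame rate. A short sketch, assuming the 25 frames per second given in the example (the function and constant names are illustrative):

```python
FRAMES_PER_SECOND = 25  # frame rate used in the example above


def duration_to_count_threshold(minutes, fps=FRAMES_PER_SECOND):
    """Convert a duration judgment threshold in minutes into a frame count threshold."""
    return minutes * 60 * fps


trailing_count_threshold = duration_to_count_threshold(5)    # 5 min * 60 s * 25 fps
retention_count_threshold = duration_to_count_threshold(10)  # 10 min * 60 s * 25 fps
legacy_count_threshold = duration_to_count_threshold(10)
```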

For a retention event, the hard disk video recorder may determine the first target video frames based on the above-mentioned tracking table, according to the type of each detection target in the tracking result and the video frame identifier of the video frame in which it is located. At least one feature target exists in each first target video frame. It is then determined whether the number of first target video frames reaches a preset retention count threshold. If so, it is determined that a retention event exists.

Specifically, while the hard disk video recorder tracks the detection targets, as long as the tracking table contains at least one tracking entry whose detection target type is a person, the video frames in which at least one person exists (that is, the first target video frames) can be counted to obtain a retention count. Every time a new first target video frame is obtained, the hard disk video recorder may increase the retention count by one and determine whether the retention count reaches the retention count threshold.

If the retention count does not reach the retention count threshold, the retention count continues to be updated.

If the retention count reaches the retention count threshold, it is determined that a retention event exists. In this case, the hard disk video recorder can output retention alarm information to the video surveillance personnel.

Of course, if the feature target disappears from the video frame before the retention count reaches the retention count threshold, the retention count may be cleared to zero. In other words, the first target video frames corresponding to the retention count are continuous in time.
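The counting and clearing behaviour described above can be sketched as follows, assuming the number of persons in each successive video frame has already been derived from the tracking table. The function name and the list-of-counts input are illustrative assumptions.

```python
def detect_retention(person_counts, retention_count_threshold):
    """Return the index of the frame at which a retention event is determined, or None.

    person_counts holds the number of feature targets (persons) in each
    successive video frame.
    """
    retention_count = 0
    for frame_index, persons in enumerate(person_counts):
        if persons >= 1:
            # A first target video frame: at least one person is present.
            retention_count += 1
            if retention_count >= retention_count_threshold:
                return frame_index
        else:
            # The person disappeared, so the count is cleared to zero.
            retention_count = 0
    return None
```

Because the count is cleared whenever no person is present, only an unbroken run of first target video frames can reach the threshold.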

For a trailing event, the hard disk video recorder may determine the second target video frame based on the above tracking table, according to the type of each detection target in the tracking result, and the video frame identifier of the video frame in which it is located. There are at least two feature targets in the second target video frame. It is determined whether the number of the second target video frames reaches a preset trailing count threshold. If so, determine that a trailing event exists.

Specifically, while the hard disk video recorder tracks the detection targets, if the tracking table contains at least two detection targets whose type is a person, the video frames in which at least two persons exist (that is, the second target video frames) can be counted to obtain a trailing count. Each time a new second target video frame is obtained, the hard disk video recorder may increase the trailing count by one and determine whether the trailing count reaches the trailing count threshold.

If the above-mentioned trailing count does not reach the above-mentioned trailing count threshold, then the above-mentioned trailing count is continuously updated.

If the trailing count reaches the trailing count threshold, it is determined that a trailing event exists. In this case, the hard disk video recorder can output trailing alarm information to the video surveillance personnel.

Of course, if the feature target disappears from the video frame before the trailing count reaches the threshold of the trailing count, the trailing count can be cleared to zero. In other words, each second target video frame corresponding to the above-mentioned trailing count is temporally continuous.

The retention count and the trailing count can be counted simultaneously without affecting each other.
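To illustrate how the two counts can run side by side, the following sketch updates both from the same per-frame person count. It is an assumption-laden example: the function name, the event labels, and the choice to clear the trailing count when fewer than two persons are present are illustrative readings of the description, not text from the application.

```python
def detect_events(person_counts, retention_count_threshold, trailing_count_threshold):
    """Return (frame_index, event_name) pairs in the order the events are determined."""
    retention_count = 0
    trailing_count = 0
    events = []
    for frame_index, persons in enumerate(person_counts):
        # Each count depends only on the person count of the current frame,
        # so updating one never changes the other.
        retention_count = retention_count + 1 if persons >= 1 else 0
        trailing_count = trailing_count + 1 if persons >= 2 else 0
        if retention_count == retention_count_threshold:
            events.append((frame_index, "retention"))
        if trailing_count == trailing_count_threshold:
            events.append((frame_index, "trailing"))
    return events
```

With person counts `[1, 2, 2, 1, 1]`, a retention threshold of 4, and a trailing threshold of 2, the trailing event is determined at frame index 2 and the retention event at frame index 3, even though the trailing count has been cleared by then.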

For an item leftover event, the hard disk video recorder can determine the third target video frames based on the above tracking table, according to the type of each detection target in the tracking result, the video frame identifier of the video frame in which it is located, and the historical coordinates of the center point of the target frame of each detection target. In a third target video frame, no feature target exists but a foreground target exists, and the coordinates of the center point of the target frame of that foreground target are located within a preset detection area.

It is determined whether the number of the third target video frames reaches a preset legacy count threshold. If so, extract a foreground target from at least one third target video frame.

Specifically, while the hard disk video recorder tracks the detection targets, if no feature target remains in an acquired video frame, and a foreground target whose center point is located within the preset detection area exists in that video frame, the video frame can be determined to be a third target video frame and counted to obtain a legacy count. The preset detection area may be an area where users are likely to leave items in the actual application environment. For example, for an ATM protective cabin, the detection area may be an area close to the ATM. Each time a new third target video frame is acquired, the legacy count may be increased by one, and it is determined whether the legacy count reaches the legacy count threshold. The third target video frames corresponding to the legacy count are continuous in time.

If the above-mentioned legacy count does not reach the above-mentioned legacy count threshold, the above-mentioned legacy count is continuously updated.

If the legacy count reaches the threshold of the legacy count, the foreground target may be extracted from at least one third target video frame.
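The third-target-frame test and the legacy count can be sketched as follows. This example assumes a rectangular detection area and target frames given as `(x, y, width, height)` tuples; the shape of the detection area and all names are assumptions for illustration.

```python
def in_detection_area(center, area):
    """True if the center point lies inside the rectangular area (x, y, width, height)."""
    cx, cy = center
    x, y, width, height = area
    return x <= cx <= x + width and y <= cy <= y + height


def is_third_target_frame(feature_frames, foreground_frames, detection_area):
    """No feature target present, and some foreground target centered in the area."""
    if feature_frames:
        return False
    return any(
        in_detection_area((x + w / 2, y + h / 2), detection_area)
        for (x, y, w, h) in foreground_frames
    )


def frame_reaching_legacy_threshold(frames, detection_area, legacy_count_threshold):
    """Return the index of the frame at which the legacy count reaches the threshold.

    frames is a sequence of (feature_frames, foreground_frames) pairs, one per
    video frame. Returns None if the threshold is never reached.
    """
    legacy_count = 0
    for frame_index, (feature_frames, foreground_frames) in enumerate(frames):
        if is_third_target_frame(feature_frames, foreground_frames, detection_area):
            legacy_count += 1
            if legacy_count >= legacy_count_threshold:
                return frame_index
        else:
            legacy_count = 0  # third target video frames must be continuous in time
    return None
```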

Further, the hard disk video recorder may use the preset CNN classification model to classify the extracted foreground targets, and obtain the confidence that the foreground targets correspond to N different types of foreground targets. Among them, N is an integer greater than 1, and the N different foreground target types include at least items and non-items.

If the confidence that the foreground target corresponds to an item is the greatest, it is determined that an item leftover event exists.

In practical applications, the types of foreground targets can include persons, items, and non-items, where non-items include false foregrounds. In this case, the above-mentioned CNN classification model is trained in advance with human features, items that may appear in the specified area, and background content of the specified area. For example, in the case of a protective cabin, items include bank cards, keys, bags, umbrellas, luggage, and so on. The background content of the specified area includes the ground, the ATM, posters posted in the protective cabin, and the background as it appears when the light outside the protective cabin changes or when shadows are projected into the cabin.

By further classifying the foreground target with the above-mentioned CNN classification model, the content of the foreground target can be identified more accurately, and misjudgment of item leftover events can be avoided.

Specifically, the hard disk video recorder may determine the actual content of the foreground target based on the confidence corresponding to the person, the item, and the non-item.

If the confidence level corresponding to the person is the largest, it means that although there are no feature targets in the current video frame, there are still people, and the above-mentioned legacy count can be cleared to zero;

If the confidence level corresponding to the non-item is the largest, it means that neither a person nor an item exists in the current third target video frame, and the above-mentioned legacy count can be cleared to zero;

If the confidence level corresponding to the item is the largest, it is determined that an item leftover event exists. In this case, the hard disk video recorder can output item leftover alarm information to the video surveillance personnel.
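The three-way decision can be sketched as follows, assuming the CNN classification model has already produced one confidence per type. The dictionary keys, the function name, and the choice to leave the legacy count unchanged when an item is found are assumptions for this example; the application does not state what happens to the count after an alarm.

```python
def decide_item_leftover(confidences, legacy_count):
    """Return (item_leftover_event, updated_legacy_count).

    confidences maps each foreground target type ("person", "item", "non_item")
    to the confidence produced by the classification model.
    """
    best_type = max(confidences, key=confidences.get)
    if best_type == "item":
        # An item leftover event exists; the count is left unchanged here (assumption).
        return True, legacy_count
    # A person or a non-item (for example a false foreground): clear the count.
    return False, 0
```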

To sum up, in the embodiments of the present application, the hard disk video recorder can extract the feature targets of target persons from the video stream through the CNN used for feature detection, and obtain the foreground targets from the video stream through the foreground model used for foreground detection. Detection targets determined from both the feature targets and the foreground targets are far less affected by the environment, so target persons and items can be identified more accurately and the detection rate is improved. By then tracking these detection targets, abnormal events such as trailing of a person, retention of a person, and items left behind can be detected effectively.

Corresponding to the foregoing embodiment of the method for detecting an abnormal event, this application further provides an embodiment of a device for detecting an abnormal event.

Referring to FIG. 2, a block diagram of an embodiment of an abnormal event detection device is shown.

As shown in FIG. 2, the abnormal event detection device 20 includes:

The first obtaining unit 210 is configured to obtain one or more feature targets from a monitored video stream by using a trained convolutional neural network (CNN), wherein the video stream is obtained by monitoring a specified area, and the feature target represents a target person appearing in the specified area.

The second obtaining unit 220 is configured to obtain one or more foreground objects from the video stream by using a preset foreground model for foreground detection.

The first determining unit 230 is configured to determine one or more detection targets based on the feature target and the foreground target.

The second determining unit 240 is configured to track each of the detection targets by recording the presence of the detection target in the video stream to obtain a tracking result, and to determine an abnormal event according to the tracking result.

In this example, the first determining unit 230 is further configured to: select, from the foreground targets, a foreground target of interest that is not associated with any of the feature targets; and determine each of the feature targets and the foreground target of interest as the detection targets.

In this example, the first determining unit 230 is further configured to: calculate, for each foreground target, the area of intersection between the target frame of the foreground target and the target frame of each feature target; and if the area of the intersection between the target frame of the foreground target and the target frame of every feature target is less than a preset area threshold, determine that the foreground target is not associated with any feature target.

In this example, the second determining unit 240 is further configured to record the type of the detection target, the coordinates of the center point of the target frame of the detection target in each video frame in which the detection target is located, and the video frame identifier of each video frame of the video stream in which the detection target exists.

The second determining unit 240 is further configured to: determine a first target video frame in the video stream according to the tracking result, wherein at least one of the feature targets exists in the first target video frame; and when the number of first target video frames associated with the at least one feature target reaches a preset retention count threshold, determine that a retention event exists. The second determining unit 240 is further configured to: determine a second target video frame in the video stream according to the tracking result, wherein at least two of the feature targets exist in the second target video frame; and when the number of consecutive second target video frames associated with the at least two feature targets reaches a preset trailing count threshold, determine that a trailing event exists.

The second determining unit 240 is further configured to: determine a third target video frame in the video stream according to the tracking result, wherein the foreground target exists in the third target video frame but no feature target exists, and the coordinates of the center point of the target frame of the foreground target in the third target video frame are located within a preset detection area; when the number of consecutive third target video frames associated with the foreground target reaches a preset legacy count threshold, extract the foreground target from at least one of the third target video frames; classify the extracted foreground target using a preset CNN classification model to obtain the confidence that the foreground target corresponds to each of N different foreground target types, where N is an integer greater than 1 and the N different foreground target types include at least items and non-items; and if the confidence that the foreground target corresponds to an item is the greatest, determine that an item leftover event exists.

The embodiment of the apparatus for detecting an abnormal event of the present application can be applied to an electronic device. The device embodiments can be implemented by software, or by hardware or a combination of software and hardware. Taking software implementation as an example, as a device in a logical sense, it is formed by reading the corresponding computer program instructions in the non-volatile memory into the memory through the processor of the electronic device where it is located.

In terms of hardware, FIG. 3 is a hardware structure diagram of the electronic device where the abnormal event detection device of this application is located. In addition to the processor, memory, network interface, and non-volatile memory shown in FIG. 3, the electronic device in which the device is located may generally include other hardware according to the actual function of the abnormal event detection device, and details are not described herein again. The memory and non-volatile memory of the electronic device also store machine-executable instructions corresponding to the first obtaining unit 210, the second obtaining unit 220, the first determining unit 230, and the second determining unit 240 described above, respectively.

For details about how the functions and roles of the units in the above device are implemented, refer to the implementation of the corresponding steps in the foregoing method; details are not described herein again.

As for the device embodiment, since it basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for the relevant parts. The device embodiments described above are only schematic. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or they may be distributed across multiple network elements. Some or all of these modules can be selected according to actual needs to achieve the purpose of the solution of this application. Those of ordinary skill in the art can understand and implement it without creative effort.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the method for detecting an abnormal event according to the foregoing method embodiment is implemented. In one embodiment, the computer-readable storage medium includes a non-transitory computer-readable storage medium.

The above are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principle of this application shall fall within the scope of protection of this application.

Claims

A method for detecting an abnormal event, comprising:

Using a trained convolutional neural network (CNN) to obtain one or more feature targets from a video stream, wherein the video stream is obtained by monitoring a designated area, and each feature target represents a target person appearing in the designated area;

Obtaining one or more foreground targets from the video stream by using a preset foreground model for foreground detection;

Determining one or more detection targets based on the feature targets and the foreground targets;

Tracking each detection target by recording the existence of the detection target in the video stream, to obtain a tracking result; and

Determining an abnormal event according to the tracking result.
The method according to claim 1, wherein determining the detection targets based on the feature targets and the foreground targets comprises:

Selecting, from the foreground targets, a foreground target of interest that is not associated with any of the feature targets; and

Determining each of the feature targets and the foreground target of interest as a detection target.
The method according to claim 2, wherein selecting, from the foreground targets, the foreground target of interest that is not associated with any of the feature targets comprises:

For each of the foreground targets,

Calculating the area of the intersection between the target frame of the foreground target and the target frame of each of the feature targets; and

If the area of the intersection between the target frame of the foreground target and the target frame of each of the feature targets is less than a preset area threshold, determining that the foreground target is not associated with any of the feature targets.
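
The association test in the two preceding claims can be illustrated with a minimal sketch. It assumes that each target frame is an axis-aligned rectangle given by its top-left and bottom-right corners; the names `Box`, `intersection_area`, `select_foreground_of_interest`, and `detection_targets`, as well as the threshold value in the usage note, are hypothetical and do not appear in the application.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned target frame: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two target frames (0.0 if they do not overlap)."""
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return width * height if width > 0 and height > 0 else 0.0


def select_foreground_of_interest(
    foreground: List[Box], features: List[Box], area_threshold: float
) -> List[Box]:
    """Keep the foreground targets whose overlap with every feature target
    is below the preset area threshold (i.e. not associated with any person)."""
    return [
        fg
        for fg in foreground
        if all(intersection_area(fg, ft) < area_threshold for ft in features)
    ]


def detection_targets(
    foreground: List[Box], features: List[Box], area_threshold: float
) -> List[Box]:
    """Detection targets = all feature targets + foreground targets of interest."""
    return list(features) + select_foreground_of_interest(
        foreground, features, area_threshold
    )
```

For example, with one feature target `Box(0, 0, 10, 10)`, foreground targets `Box(5, 5, 15, 15)` and `Box(20, 20, 30, 30)`, and an illustrative threshold of 10, only the second foreground target is kept as a foreground target of interest, because the first overlaps the person's target frame by an area of 25.
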
The method according to claim 1, wherein recording the existence of the detection target in the video stream comprises:

Recording the type of the detection target, the coordinates of the center point of the target frame of the detection target in each video frame in which the detection target is present, and the video frame identifier of each video frame in the video stream in which the detection target is present.
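
One possible shape for such a tracking record is sketched below. It assumes that a target frame is given as `(x1, y1, x2, y2)` corner coordinates and that the video frame identifier is an integer; the class name `TrackRecord`, its fields, and the type labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrackRecord:
    """Existence record of one detection target in the video stream."""
    # Type of the detection target, e.g. "feature" or "foreground"
    # (hypothetical labels, not taken from the application).
    target_type: str
    # One entry per video frame in which the target is present:
    # (video frame identifier, center point of the target frame).
    observations: List[Tuple[int, Tuple[float, float]]] = field(default_factory=list)

    def record(self, frame_id: int, box: Tuple[float, float, float, float]) -> None:
        """Store the frame identifier and the center of the target frame."""
        x1, y1, x2, y2 = box
        center = ((x1 + x2) / 2, (y1 + y2) / 2)
        self.observations.append((frame_id, center))

    def frame_ids(self) -> List[int]:
        """Identifiers of all video frames in which the target was present."""
        return [frame_id for frame_id, _ in self.observations]
```

A record is created once per detection target and `record` is called for every video frame in which that target is found, so the tracking result is simply the collection of these records.
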
The method according to claim 4, wherein determining an abnormal event according to the tracking result comprises:

Determining a first target video frame in the video stream according to the tracking result, wherein at least one of the feature targets exists in the first target video frame; and

When the number of consecutive first target video frames associated with the at least one feature target reaches a preset retention count threshold, determining that a retention event exists.
The method according to claim 4, wherein determining an abnormal event according to the tracking result comprises:

Determining a second target video frame in the video stream according to the tracking result, wherein at least two of the feature targets exist in the second target video frame; and

When the number of consecutive second target video frames associated with the at least two feature targets reaches a preset trailing count threshold, determining that a trailing event exists.
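
The retention and trailing conditions both amount to counting consecutive video frames that contain at least a given number of feature targets (one for retention, two for trailing). The sketch below is a simplification under stated assumptions: it works on a per-frame count of feature targets and assumes that tracking has already associated the same persons across frames. The function name `first_event_frame` and the threshold values in the usage note are hypothetical.

```python
from typing import Iterable, Optional


def first_event_frame(
    person_counts: Iterable[int], min_persons: int, count_threshold: int
) -> Optional[int]:
    """Return the index of the frame at which the number of consecutive frames
    containing at least `min_persons` feature targets first reaches
    `count_threshold`, or None if that never happens."""
    run = 0  # length of the current run of qualifying frames
    for index, count in enumerate(person_counts):
        # A frame that does not qualify breaks the run.
        run = run + 1 if count >= min_persons else 0
        if run >= count_threshold:
            return index
    return None


def retention_event(person_counts: Iterable[int], retention_threshold: int) -> bool:
    """Retention: at least one feature target in enough consecutive frames."""
    return first_event_frame(person_counts, 1, retention_threshold) is not None


def trailing_event(person_counts: Iterable[int], trailing_threshold: int) -> bool:
    """Trailing: at least two feature targets in enough consecutive frames."""
    return first_event_frame(person_counts, 2, trailing_threshold) is not None
```

For the per-frame counts `[0, 1, 1, 2, 2, 2, 1, 0]`, a retention threshold of 4 is reached at frame index 4 and a trailing threshold of 3 at frame index 5, whereas a trailing threshold of 4 is never reached.
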
The method according to claim 4, wherein determining an abnormal event according to the tracking result comprises:

Determining a third target video frame in the video stream according to the tracking result, wherein the foreground target exists in the third target video frame but none of the feature targets exists in it, and the coordinates of the center point of the target frame of the foreground target in the third target video frame are within a preset detection area;

Extracting the foreground target from at least one third target video frame when the number of consecutive third target video frames associated with the foreground target reaches a preset legacy count threshold;

Classifying the extracted foreground target by using a preset CNN classification model to obtain the confidences that the foreground target corresponds to each of N different types of foreground targets, wherein N is an integer greater than 1, and the N different types of foreground targets include items and non-items; and

If the type for which the foreground target has the greatest confidence is the item type, determining that an item legacy event exists.
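
The item legacy (left-behind item) check above can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: each frame is reduced to a flag saying whether any feature target is present plus the center point of the foreground target (or `None`), the detection area is an axis-aligned rectangle, and `classify` is a hypothetical stand-in for the preset CNN classification model that returns a confidence per type label. All names and labels are invented for the example.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Area = Tuple[float, float, float, float]  # (x1, y1, x2, y2) of the detection area


def in_area(point: Point, area: Area) -> bool:
    """True if the point lies inside the rectangular detection area."""
    x, y = point
    x1, y1, x2, y2 = area
    return x1 <= x <= x2 and y1 <= y <= y2


def is_third_target_frame(
    person_present: bool, fg_center: Optional[Point], area: Area
) -> bool:
    """Foreground target present, no feature target, center inside the area."""
    return (not person_present) and fg_center is not None and in_area(fg_center, area)


def item_legacy_event(
    frames: Sequence[Tuple[bool, Optional[Point]]],
    area: Area,
    legacy_threshold: int,
    classify: Callable[[int], Dict[str, float]],
) -> bool:
    """Return True if a foreground target stays alone in the detection area for
    `legacy_threshold` consecutive frames and is classified as an item."""
    run = 0
    for index, (person_present, fg_center) in enumerate(frames):
        run = run + 1 if is_third_target_frame(person_present, fg_center, area) else 0
        if run >= legacy_threshold:
            # Hypothetical classifier call: confidences per type for the
            # foreground target extracted from frame `index`.
            confidences = classify(index)
            # Simplification: decide on the first classification result.
            return max(confidences, key=confidences.get) == "item"
    return False
```

In use, `classify` would crop the foreground target from the chosen frame and run the classification model on it; here it is left as a callable so that the counting and decision logic can be read on its own.
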
An apparatus for detecting an abnormal event, comprising:

A first obtaining unit, configured to obtain one or more feature targets from a video stream by using a trained convolutional neural network (CNN), wherein the video stream is obtained by monitoring a designated area, and each feature target represents a target person appearing in the designated area;

A second obtaining unit, configured to obtain one or more foreground targets from the video stream by using a preset foreground model for foreground detection;

A first determining unit, configured to determine one or more detection targets based on the feature targets and the foreground targets; and

A second determining unit, configured to track each detection target by recording the presence of the detection target in the video stream to obtain a tracking result, and to determine an abnormal event according to the tracking result.
The apparatus according to claim 8, wherein the first determining unit is further configured to:

Select, from the foreground targets, a foreground target of interest that is not associated with any of the feature targets; and

Determine each of the feature targets and the foreground target of interest as a detection target.
The apparatus according to claim 9, wherein the first determining unit is further configured to:

For each of the foreground targets,

Calculate the area of the intersection between the target frame of the foreground target and the target frame of each of the feature targets; and

If the area of the intersection between the target frame of the foreground target and the target frame of each of the feature targets is less than a preset area threshold, determine that the foreground target is not associated with any of the feature targets.
The apparatus according to claim 8, wherein the second determining unit is further configured to:

Record the type of the detection target, the coordinates of the center point of the target frame of the detection target in each video frame in which the detection target is present, and the video frame identifier of each video frame in the video stream in which the detection target is present.
The apparatus according to claim 11, wherein the second determining unit is further configured to:

Determine a first target video frame in the video stream according to the tracking result, wherein at least one of the feature targets exists in the first target video frame; and

When the number of consecutive first target video frames associated with the at least one feature target reaches a preset retention count threshold, determine that a retention event exists.
The apparatus according to claim 11, wherein the second determining unit is further configured to:

Determine a second target video frame in the video stream according to the tracking result, wherein at least two of the feature targets exist in the second target video frame; and

When the number of consecutive second target video frames associated with the at least two feature targets reaches a preset trailing count threshold, determine that a trailing event exists.
The apparatus according to claim 11, wherein the second determining unit is further configured to:

Determine a third target video frame in the video stream according to the tracking result, wherein the foreground target exists in the third target video frame but none of the feature targets exists in it, and the coordinates of the center point of the target frame of the foreground target in the third target video frame are within a preset detection area;

Extract the foreground target from at least one third target video frame when the number of consecutive third target video frames associated with the foreground target reaches a preset legacy count threshold;

Classify the extracted foreground target by using a preset CNN classification model to obtain the confidences that the foreground target corresponds to each of N different types of foreground targets, wherein N is an integer greater than 1, and the N different types of foreground targets include items and non-items; and

If the type for which the foreground target has the greatest confidence is the item type, determine that an item legacy event exists.
An electronic device, comprising:

A processor; and

A memory for storing instructions executable by the processor;

Wherein the processor is configured to execute the method for detecting an abnormal event according to any one of claims 1-7.
A computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the method for detecting an abnormal event according to any one of claims 1-7.