CN113869137A - Event detection method and device, terminal equipment and storage medium

Event detection method and device, terminal equipment and storage medium

Info

Publication number
CN113869137A
Authority
CN
China
Prior art keywords
event
video frame
image area
specified
identified
Prior art date
Legal status
Pending
Application number
CN202111038795.1A
Other languages
Chinese (zh)
Inventor
聂宗强
胡金晖
袁明冬
张力元
阮威健
史周安
Current Assignee
Smart City Research Institute Of China Electronics Technology Group Corp
Original Assignee
Smart City Research Institute Of China Electronics Technology Group Corp
Priority date
Filing date
Publication date
Application filed by Smart City Research Institute Of China Electronics Technology Group Corp filed Critical Smart City Research Institute Of China Electronics Technology Group Corp
Priority to CN202111038795.1A
Publication of CN113869137A
Legal status: Pending (current)

Links

Images

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques

Abstract

The application relates to the technical field of artificial intelligence, and provides an event detection method, an event detection device, terminal equipment and a storage medium. The event detection method comprises the following steps: acquiring a video frame to be identified; detecting an image area where a specified event contained in a video frame to be identified is located; inputting the image area into a trained event classification model for processing to obtain a classification result of the specified event; and correcting the classification result of the specified event according to the object detection result of the video frame to be identified. By the method, accurate event classification results can be obtained.

Description

Event detection method and device, terminal equipment and storage medium
Technical Field
The present application belongs to the technical field of artificial intelligence, and in particular, to an event detection method, apparatus, terminal device, and storage medium.
Background
The development of information technology and artificial intelligence makes digitization and intelligence the mainstream trend of the future. For example, in city management activities, in order to determine whether a city management event exists in a city management area, a camera is generally installed in advance so that video information can be obtained through the camera, and the video information is processed by an event detection algorithm to obtain an event classification result, so that the city can be managed according to the event classification result, thereby achieving the purpose of smart city management. However, current event detection algorithms generally suffer from low precision, so the obtained event classification results contain errors, which hinders the development of city management activities.
Therefore, it is desirable to provide a method for processing video information to obtain an accurate event classification result.
Disclosure of Invention
The embodiment of the application provides an event detection method, an event detection device, terminal equipment and a storage medium, and aims to solve the problem of how to obtain an accurate event classification result.
In a first aspect, an embodiment of the present application provides an event detection method, including:
acquiring a video frame to be identified;
detecting an image area where a specified event contained in the video frame to be identified is located;
inputting the image area into a trained event classification model for processing to obtain a classification result of the specified event;
and correcting the classification result of the specified event according to the object detection result of the video frame to be identified.
Compared with the prior art, in the event detection method provided by the embodiment of the application, a video frame to be identified is obtained and it is detected whether the video frame contains an image area where a specified event is located; if such an image area is detected, the image area is input into a trained event classification model for processing to obtain a classification result of the specified event. Further, to avoid a classification error of the specified event, the object condition in the image area where the specified event is located is determined from the object detection result of the video frame to be identified, and the classification result of the specified event is corrected according to that object condition, so that a correct event classification result corresponding to the image area where the specified event is located is obtained.
In a second aspect, an embodiment of the present application provides an event detection apparatus, including:
the acquisition module is used for acquiring a video frame to be identified;
the detection module is used for detecting an image area where a specified event contained in the video frame to be identified is located;
the classification module is used for inputting the image area into a trained event classification model for processing to obtain a classification result of the specified event;
and the correction module is used for correcting the classification result of the specified event according to the object detection result of the video frame to be identified.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the event detection method when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the event detection method.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to execute the event detection method described in any implementation of the first aspect.
It is understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect, which is not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic implementation flow diagram of an event detection method according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a video frame to be identified according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating a specific implementation of step S14 of the event detection method according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an event detection device according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details.
As used in this specification and the appended claims, the term "if" may be interpreted in context to mean "when", "upon", "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In order to explain the technical solution described in the present application, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of an event detection method according to an embodiment of the present application. In this embodiment, the event detection method may be applied to terminal devices such as a mobile phone, a tablet computer, a desktop computer, a notebook, a palmtop computer, a cloud server, an ultra-mobile personal computer (UMPC), or a netbook. The embodiment of the present application places no limit on the specific type of the terminal device.
As shown in fig. 1, the event detection method provided in the embodiment of the present application includes the following steps:
s11: and acquiring a video frame to be identified.
As an example of the application, the video frame to be recognized is an image captured of a target area, so that the situation in the target area can be understood through the video frame to be recognized, providing a data basis for detecting and classifying the information in the video frame. The target area is a physical area to be observed as required, such as a street monitored by a surveillance camera.
It can be understood that the situation in the target area can be known through the video frame to be identified. For example, fig. 2 shows a video frame to be recognized, which is acquired to know the pedestrian and vehicle conditions on the street.
In application, the number of the acquired video frames to be identified can be one or more.
As a possible implementation manner of this embodiment, a specific implementation manner of acquiring a video frame to be identified may include:
and acquiring a target video.
And sampling the target video according to the specified sampling rate to obtain a video frame to be identified, wherein the specified sampling rate is determined according to the frame rate of the target video.
In this embodiment, the target video is a video acquired by an image pickup apparatus. For example, surveillance video captured by a camera placed at the street.
The specified sampling rate describes the number of video frames to be identified that are extracted from the target video per second. It will be appreciated that the specified sampling rate may be preset according to actual requirements. For example, when the frame rate of the target video is 25 fps or 30 fps, the specified sampling rate may be set to 5 frames per second; that is, within each second, 5 consecutive video frames are extracted from the target video, and each extracted video frame is taken as a video frame to be identified.
It can be understood that, to facilitate rapid sampling of the target video, the correspondence between video frame rates and specified sampling rates is determined in advance. Thus, when the frame rate of a target video is known, the specified sampling rate can be determined from it, and the target video can be sampled at that rate to obtain at least one video frame to be identified. For example, when the frame rate of a target video is determined to be 25 fps, the corresponding specified sampling rate is 5 frames per second, so the target video can be sampled at that rate to obtain the video frames to be identified.
In some embodiments, at least one video frame to be identified is determined from the video frames of the target video by uniform sampling, in which a number of consecutive video frames are extracted from the target video at the set specified sampling rate.
For example, a sampling rate of 5 frames per second is specified, that is, every 1 second, consecutive 5 frames of video frames are extracted from each video frame of the target video, that is, 5 video frames to be identified are extracted.
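As an illustration of the sampling procedure above, the following is a minimal Python sketch, assuming OpenCV (cv2) for video decoding; the frame-rate-to-sampling-rate table and the function name are hypothetical, since the embodiment only states that such a correspondence is preset.

```python
# Hedged sketch of uniform sampling at a specified sampling rate.
# Assumptions: OpenCV is available; SAMPLING_RATE_BY_FPS is an illustrative
# stand-in for the preset frame-rate to sampling-rate correspondence.
import cv2

SAMPLING_RATE_BY_FPS = {25: 5, 30: 5}  # frames to extract per second (example values)

def sample_frames(video_path: str):
    """Extract the first `rate` consecutive frames of each one-second window."""
    cap = cv2.VideoCapture(video_path)
    fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 25  # fall back if FPS is unknown
    rate = SAMPLING_RATE_BY_FPS.get(fps, 5)            # specified sampling rate
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % fps < rate:  # keep `rate` consecutive frames per second of video
            frames.append(frame)
        index += 1
    cap.release()
    return frames               # each element is a video frame to be identified
```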
S12: and detecting an image area where a specified event contained in the video frame to be identified is located.
As an example of the present application, the specified event is an event set according to actual needs. For example, the designated event may be a city management event, such as a pedestrian and a pedestrian-pulled item corresponding to the black box in FIG. 2.
The image area is an area describing a specified event in the video frame to be identified. For example, the black box in fig. 2 corresponds to the image area.
In this embodiment, since the video frame to be recognized can be used to know the situation of one target area, in order to determine whether the specified event exists, the video frame to be recognized may be detected, so as to determine whether the video frame to be recognized includes an image area where the specified event exists.
In one embodiment, if there are multiple video frames to be identified, the image areas where the specified events are located are detected in each video frame to be identified in the chronological order in which the video frames were obtained.
In one embodiment of the present application, detecting an image area where a specified event contained in a video frame to be identified is located includes:
and inputting the video frame to be recognized into the trained event detection model for processing to obtain the image area of the specified event in the video frame to be recognized.
As an example of the present application, in order to efficiently perform event detection, after a video frame to be recognized is acquired, the video frame to be recognized is input into a trained event detection model for processing, so as to determine an image area where a specified event is located in the video frame to be recognized.
In this embodiment, a video frame to be recognized is input into a trained event detection model for processing, the position of an image area of a specified event in the video frame to be recognized is determined, and then the image area of the specified event in the video frame to be recognized is obtained based on the position of the image area of the specified event in the video frame to be recognized.
It can be understood that the event detection model may be trained in advance by the terminal device, or a file corresponding to the event detection model may be transplanted to the terminal device after being trained in advance by another device. That is, the execution agent that trains the event detection model may be the same as or different from the execution agent that performs event detection using the event detection model. For example, when the initial event detection model is trained by other devices, after the initial event detection model is trained by other devices, the model parameters of the initial event detection model are fixed to obtain a file corresponding to the event detection model, and then the file is transplanted to the terminal device.
In one embodiment, sample video frames marked with sample events are obtained, and the event detection model is obtained by training with the marked sample video frames as a training set.
In one embodiment, a target video is obtained and sampled at the specified sampling rate to obtain an initial video frame. The initial video frame is input into the trained event detection model for processing to obtain an event detection result, where the event detection result describes whether the initial video frame includes an image area where a specified event is located; if the event detection result indicates that the initial video frame contains the image area where the specified event is located, the initial video frame is taken as the video frame to be identified.
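A minimal sketch of the detection step is given below; the patent does not name a detector architecture or interface, so `detector.predict` and the score threshold are assumptions used purely for illustration.

```python
# Hedged sketch: run a trained event detection model on one frame and return
# the image areas (bounding boxes) where specified events are located.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

def detect_event_regions(detector, frame, score_threshold: float = 0.5) -> List[Box]:
    # `detector.predict` is a hypothetical interface returning [(box, score), ...]
    detections = detector.predict(frame)
    return [box for box, score in detections if score >= score_threshold]
```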
S13: and inputting the image area into the trained event classification model for processing to obtain a classification result of the specified event.
As one example of the present application, the classification result of a specified event is used to describe the classification of the specified event.
In this embodiment, in order to reduce the amount of computation and quickly classify the video frames to be recognized, after detecting that the video frames to be recognized include the image area where the designated event is located, the image area where the designated event is located is input into the trained event classification model for processing, so as to obtain the classification result of the designated event.
It can be understood that, in order to classify the image region where the specified event is located, when it is detected that the video frame to be recognized contains the image region where the specified event is located, the image region of the specified event is obtained by segmentation from the video frame to be recognized, and then the image region obtained by segmentation is input into a trained event classification model to be processed, so as to obtain a classification result of the specified event.
It can be understood that the event classification model may be trained in advance by the terminal device, or a file corresponding to the event classification model may be transplanted to the terminal device after being trained in advance by another device. That is, the execution subject who trains the event classification model may be the same as or different from the execution subject who performs event classification using the event classification model. For example, when the initial event classification model is trained by other devices, after the initial event classification model is trained by other devices, the model parameters of the initial event classification model are fixed to obtain a file corresponding to the event classification model, and then the file is transplanted to the terminal device.
In an embodiment, sample image areas labeled with sample classifications are obtained, and the event classification model is obtained by training with the labeled sample image areas as a training set.
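The classification step can be sketched as follows; segmenting the image area out of the frame before classification is what keeps the computation small, as described above. The `classifier.predict` interface and the assumption that the frame is a NumPy-style array are illustrative, not the patent's API.

```python
# Hedged sketch: segment the image area of the specified event from the frame
# and pass only that region to the trained event classification model.
def classify_event(classifier, frame, box):
    x1, y1, x2, y2 = box
    region = frame[y1:y2, x1:x2]       # crop the image area from the full frame
    return classifier.predict(region)  # e.g. a label such as "city_management_event"
```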
S14: and correcting the classification result of the specified event according to the object detection result of the video frame to be identified.
As an example of the present application, the object detection result is used to describe the objects present in the video frame to be identified. For example, as shown in fig. 2, object detection is performed on the video frame to be recognized to determine the pedestrians, vehicles, or pedestrian-pulled items present in the video frame.
In this embodiment, when the image area is input into the trained event classification model for processing, the resulting classification of the specified event may be wrong because the trained event classification model is not accurate enough. Therefore, the video frame to be recognized is further detected to obtain its object detection result, so that whether the object contained in the image area corresponding to the specified event has been misclassified can be judged from the object condition described in the object detection result, and the classification result of the specified event can then be corrected to obtain the correct classification result for the image area of the specified event.
For example, as shown in fig. 2, the image area corresponding to the black box in fig. 2 is the image area corresponding to the specified event. When this image area is input into the trained event classification model for processing, it may be classified as a city management event because the goods carried on the vehicle in the image area are mistaken for randomly piled garbage; that is, misidentification occurs. Therefore, to correct this situation, object detection is performed on the video frame to be recognized to obtain the object condition in the image area corresponding to the black box, namely that the image area contains a person and vehicle-carried goods rather than a city management event.
In the event detection method provided by the embodiment of the application, a video frame to be identified is obtained, whether the video frame to be identified contains an image area where a specified event is located is detected, and if the image area where the specified event is located is detected, the image area is input into a trained event classification model for processing, so that a classification result of the specified event is obtained. Further, in order to avoid a classification error of the specified event, the object condition in the image area where the specified event is located is known through the object detection result according to the object detection result of the video frame to be identified, so that the classification result of the specified event is corrected according to the known object condition in the image area where the specified event is located, and a correct event classification result corresponding to the image area where the specified event is located is obtained.
In an embodiment of the present application, a specific implementation manner of inputting an image region into a trained event classification model for processing to obtain a classification result of a specified event may include:
if a plurality of image areas of the specified event are located in the video frame to be identified, respectively determining a position difference value corresponding to each image area, wherein the position difference value refers to a difference value between the position of the image area in the video frame to be identified and the position of the image area corresponding to the image area in the video frame of the previous frame, and the video frame of the previous frame refers to another video frame containing the image area of the specified event in the target video.
And determining the image area of which the corresponding position difference value is smaller than the first set threshold value in the plurality of image areas as the target image area.
And inputting the target image area into the event classification model for processing to obtain a classification result of the specified event.
In this embodiment, a single target area may contain a plurality of specified events, so the video frame to be recognized that describes the target area may include image areas corresponding to each of those specified events. Moreover, because consecutively acquired video frames describe the target area at different time points, the specified events in the target area may change between frames. To determine these changes, the position difference corresponding to each image area is determined, that is, the difference between the position of the image area of the specified event in the current video frame to be identified and the position of the corresponding image area in the previous video frame. Further, the position difference corresponding to each image area is compared with a first set threshold; when the position difference is smaller than the first set threshold, the change of the specified event in the target area is not obvious and the event may need to be monitored, so the image area whose position difference is smaller than the first set threshold is determined as the target image area and is input into the event classification model for processing to obtain the classification result of the specified event.
It is understood that when the position difference is greater than or equal to the first set threshold, the change indicating the occurrence of the specified event in the target area is significant and may not be an event to be monitored, and therefore, the image area having the position difference greater than or equal to the first set threshold is not the target image area.
For example, the first video frame to be identified includes image areas corresponding to the designated event a, the designated event B, and the designated event C, and meanwhile, the previous video frame corresponding to the first video frame to be identified also includes image areas corresponding to the designated event a, the designated event B, and the designated event C. Therefore, in order to determine the change conditions of the specified event a, the specified event B, and the specified event C in the first video frame to be recognized, the position difference values corresponding to the image areas respectively corresponding to the specified event a, the specified event B, and the specified event C in the first video frame to be recognized are respectively determined, that is, the position of the image area of the specified event a included in the first video frame to be recognized and the position of the image area corresponding to the specified event a in the previous video frame are subjected to difference operation, so as to obtain the position difference value corresponding to the image area of the specified event a, and by analogy, the position difference value corresponding to the image area of the specified event B and the position difference value corresponding to the image area of the specified event C are respectively obtained.
It can be understood that, in determining the position difference value corresponding to each image region, the position difference value corresponding to the image region is obtained by performing difference operation on the coordinates of the position of the image region in the video frame to be identified and the coordinates of the position of the image region corresponding to the image region in the previous frame of video frame.
In one embodiment, to calculate the position difference corresponding to each image area, the video frame to be recognized and its previous video frame are placed in the same coordinate system, so that the position difference for an image area is calculated from the coordinates of the image area in the two frames. The position difference thus reflects how the specified event in the target area changed between the time the previous frame was acquired and the time the current frame was acquired, and determines whether the image area of the specified event needs to be processed. When the position difference is greater than or equal to the first set threshold, the change of the specified event in the target area is obvious and the event may not need to be monitored, so such image areas do not need to be classified, which effectively improves the efficiency of classifying the image areas of specified events contained in the video frame to be identified.
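One plausible implementation of this position-difference filter is sketched below. It assumes the image areas of the current frame have already been matched to those of the previous frame (for example by a tracker), and reads the difference operation on the coordinates as the Euclidean distance between box centers; both points are assumptions, since the embodiment does not fix them.

```python
# Hedged sketch of the position-difference filter described above.
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in a shared coordinate system

def position_difference(box: Box, prev_box: Box) -> float:
    """Distance between the centers of an image area in the current frame
    and the corresponding image area in the previous frame."""
    dx = (box[0] + box[2]) / 2 - (prev_box[0] + prev_box[2]) / 2
    dy = (box[1] + box[3]) / 2 - (prev_box[1] + prev_box[3]) / 2
    return (dx * dx + dy * dy) ** 0.5

def select_target_regions(matched: List[Tuple[Box, Box]],
                          first_threshold: float) -> List[Box]:
    """matched: (current_box, previous_box) pairs. Keep only image areas whose
    displacement is below the first set threshold."""
    return [box for box, prev in matched
            if position_difference(box, prev) < first_threshold]
```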
In an embodiment, after the video frame to be identified is acquired and an image area where a specified event included in the video frame to be identified is located is detected, the method further includes:
and if the video frame to be identified is not the first frame video frame, determining the time difference between the acquisition time of the current video frame to be identified and the acquisition time of the previous frame video frame.
And determining the position difference value between the position information of the image area of the specified event in the current video frame to be identified and the position information of the image area of the specified event in the previous video frame.
And acquiring a first set threshold according to the time difference.
And determining the image area with the position difference value smaller than the first set threshold value as the target image area.
And inputting the target image area into the event classification model for processing to obtain a classification result of the specified event.
In this embodiment, the specified event in the target area may change during the period from the acquisition time of the previous video frame to the acquisition time of the current video frame to be identified; for example, its position in the target area may move. Different first set thresholds are therefore preset for different time differences, so that the first set threshold serves as a reference for judging how much the position of the specified event has changed.
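A toy sketch of this time-difference-dependent threshold is given below; the embodiment only states that different time differences map to different preset first set thresholds, so the breakpoints and pixel values here are invented for illustration.

```python
# Hedged sketch: preset first set thresholds keyed by the time difference
# between the previous frame and the current frame (all values illustrative).
def first_threshold_for(time_diff_seconds: float) -> float:
    if time_diff_seconds <= 0.2:
        return 10.0  # short gap: little motion expected
    if time_diff_seconds <= 1.0:
        return 30.0
    return 60.0      # long gap: tolerate larger displacement
```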
In one embodiment of the present application, the object detection result of the video frame to be recognized is obtained by:
and inputting the video frame to be recognized into the trained object detection model for processing to obtain an object detection result, wherein the object detection result is used for indicating whether a target object exists in the image area.
As one example of the present application, a target object is an object that exists in the target area and is recorded in the video frame to be recognized when the image pickup device photographs the target area, for example a pedestrian or a vehicle as shown in fig. 2.
In the embodiment, in order to reduce the amount of calculation and efficiently determine the object included in the video frame to be recognized, the video frame to be recognized is input into the trained object detection model for processing, and an object detection result indicating whether the target object exists in the image area is obtained.
It can be understood that the object detection model may be trained in advance by the terminal device, or a file corresponding to the object detection model may be transplanted to the terminal device after being trained in advance by another device. That is, the execution subject who trains the object detection model may be the same as or different from the execution subject who performs object detection using the object detection model. For example, when the initial object detection model is trained by using other devices, after the initial object detection model is trained by the other devices, the model parameters of the initial object detection model are fixed to obtain a file corresponding to the object detection model, and then the file is transplanted to the terminal device.
In one embodiment, sample video frames are obtained in which the sample events and the objects in the image areas where those sample events are located are labeled, and these labeled sample video frames are used as a training set to train the object detection model.
With reference to fig. 3, in an embodiment of the present application, a specific implementation manner of correcting a classification result of a specified event according to an object detection result of a video frame to be identified includes:
s21: and if the object detection result indicates that the target object exists in the image area, respectively calculating a first area of the image area and a second area occupied by the target object in the image area.
S22: and calculating an overlap degree parameter according to the first area and the second area.
S23: and correcting the classification result of the specified event according to the overlapping degree parameter.
In this embodiment, the overlap degree parameter refers to the ratio between the portion of the second area, occupied by the target object in the image area, that overlaps the first area of the image area, and the first area.
In this embodiment, when the image area is input into the trained event classification model for processing, the image area where the specified event is located may be classified incorrectly. Therefore, to make the classification of the image area more accurate, object detection is performed on the video frame to be identified. When the object detection result indicates that a target object exists in the image area, the image area may not actually be the area where the specified event is located, so the first area of the image area and the second area occupied by the target object in the image area are calculated separately, and the overlap degree parameter is calculated from them. The overlap degree parameter indicates whether the target object is a main component of the image area, from which the relation between the target object and the other parts of the image area can be inferred and the classification result of the specified event corrected.
For example, as shown in fig. 2, the black box in fig. 2 is the image area where the specified event is located. The first area corresponding to the black box is calculated, then the second area occupied by the person within the black box is calculated, and the overlap degree parameter is obtained from the two. For instance, if the first area is 1 square meter and the second area is 0.5 square meter, the calculated overlap degree parameter is 50%, that is, the person is a main component of the image area in fig. 2; the earlier classification of the image area as a city management event is therefore an error, and the classification result is corrected to a normal event accordingly.
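Following the definition above, the overlap degree parameter can be computed as the part of the target object's box lying inside the event's image area, divided by the first area. This is a sketch under the assumption that both regions are axis-aligned bounding boxes.

```python
# Hedged sketch of the overlap degree parameter:
# (second area within the first area) / first area, per the definition above.
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def overlap_parameter(event_box: Box, object_box: Box) -> float:
    ex1, ey1, ex2, ey2 = event_box
    ox1, oy1, ox2, oy2 = object_box
    ix1, iy1 = max(ex1, ox1), max(ey1, oy1)        # intersection rectangle
    ix2, iy2 = min(ex2, ox2), min(ey2, oy2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # overlapping portion
    first_area = (ex2 - ex1) * (ey2 - ey1)         # area of the image area
    return inter / first_area if first_area > 0 else 0.0
```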
In an embodiment of the present application, a specific implementation manner of modifying the classification result of the specified event according to the overlap parameter includes:
and if the overlapping degree parameter is greater than or equal to a second set threshold value, determining that the classification result of the specified event is correct.
And if the overlapping degree parameter is smaller than a second set threshold value, determining that the classification result of the specified event is wrong, and correcting the classification result of the specified event.
In this embodiment, the second setting threshold may be preset according to actual requirements.
It is to be understood that, since the overlap degree parameter reveals whether the target object is a main component of the image area where the specified event is located, the overlap degree parameter is compared with the second set threshold when a target object exists in the image area. If the overlap degree parameter is greater than or equal to the second set threshold, the target object is not the main component of the image area, and the classification result of the specified event is correct. Otherwise, if the overlap degree parameter is smaller than the second set threshold, the classification result of the specified event is determined to be wrong and is corrected.
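The correction rule then reduces to a single comparison against the second set threshold, as sketched below; the threshold value and the "normal event" label used for the corrected result are assumptions for illustration.

```python
# Hedged sketch of the correction step: keep the classification result when the
# overlap degree parameter reaches the second set threshold, otherwise treat it
# as a misclassification and correct it (here, to a hypothetical normal event).
def correct_classification(label: str, overlap: float,
                           second_threshold: float = 0.6) -> str:
    if overlap >= second_threshold:
        return label         # classification result of the specified event is correct
    return "normal_event"    # classification result is wrong; corrected
```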
In one embodiment, after the classification result of the specified event is corrected, the corrected classification result of the specified event is fed back.
In this embodiment, when the classification result of the corrected specified event is fed back, the feedback may be performed by the terminal device, or may be performed by another electronic device connected to the terminal device. For example, after correcting the classification result of the specified event, the terminal device feeds back the corrected classification result of the specified event to the server.
For example, as shown in fig. 2, after the image area corresponding to the black box in fig. 2 is processed, the classification result of the specified event is a city management event. After object detection is performed on fig. 2, the overlap degree parameter corresponding to the black box is determined to be smaller than the second set threshold; that is, the classification result of the specified event in fig. 2 is wrong and the specified event is actually a normal event. After the classification result is corrected, the terminal device feeds the corrected result back to the server of the smart city management system so that city managers can view the message. Conversely, if the overlap degree parameter corresponding to the black box is determined to be greater than or equal to the second set threshold, the classification result of the specified event is correct and is fed back to the server so that city managers can handle the specified event.
When the overlap degree parameter is smaller than the second set threshold, the classification result of the specified event is determined to be incorrect, that is, the event corresponding to the image area is a normal event; in this case the classification result of the specified event is corrected, and no information needs to be pushed. Conversely, if the overlap degree parameter is greater than or equal to the second set threshold, the classification result of the specified event is determined to be correct, and information needs to be pushed so that the relevant managers can promptly view it and process the specified event.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 4 shows a block diagram of an event detection device provided in the embodiment of the present application, which corresponds to the event detection method in the above embodiment; for convenience of description, only the parts relevant to the embodiment of the present application are shown.
Referring to fig. 4, the apparatus 100 includes:
an obtaining module 101, configured to obtain a video frame to be identified;
the detection module 102 is configured to detect an image area where a specified event included in a video frame to be identified is located;
the classification module 103 is configured to input the image region into the trained event classification model for processing, so as to obtain a classification result of the specified event;
and the correcting module 104 is configured to correct the classification result of the specified event according to the object detection result of the video frame to be identified.
In an embodiment, the detection module 102 is further configured to input a video frame to be recognized into a trained event detection model for processing, so as to obtain an image area where a specified event is located in the video frame to be recognized.
In an embodiment, the classification module 103 is further configured to determine a position difference value corresponding to each image area if there are a plurality of image areas where the specified event is located in the video frame to be identified, where the position difference value refers to a difference value between a position of an image area in the video frame to be identified and a position corresponding to the image area in a previous frame of video frame, where the previous frame of video frame refers to another video frame in the target video that includes the image area where the specified event is located; determining an image area of which the corresponding position difference value is smaller than a set threshold value in the plurality of image areas as a target image area; and inputting the target image area into the event classification model for processing to obtain a classification result of the specified event.
In one embodiment, the apparatus 100 further comprises an object detection module.
And the object detection module is used for inputting the video frame to be recognized into the trained object detection model for processing to obtain an object detection result, and the object detection result is used for indicating whether a target object exists in the image area.
In an embodiment, the modification module 104 is further configured to, if the object detection result indicates that the target object exists in the image region, respectively calculate a first area of the image region and a second area occupied by the target object in the image region; calculating an overlap degree parameter according to the first area and the second area; and correcting the classification result of the specified event according to the overlapping degree parameter.
In an embodiment, the modification module 104 is further configured to determine that the classification result of the specified event is correct if the overlap parameter is greater than or equal to the set threshold; and if the overlapping degree parameter is smaller than the set threshold, determining that the classification result of the specified event is wrong, and correcting the classification result of the specified event.
In an embodiment, the obtaining module 101 is further configured to obtain a target video; and sampling the target video according to the specified sampling rate to obtain a video frame to be identified, wherein the specified sampling rate is determined according to the frame rate of the target video.
The event detection apparatus provided in this embodiment is configured to implement any event detection method in the method embodiments, where the functions of each module may refer to corresponding descriptions in the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: at least one processor 50 (only one processor is shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the steps of any of the various event detection method embodiments described above being implemented when the computer program 52 is executed by the processor 50.
The terminal device 5 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that fig. 5 is only an example of the terminal device 5 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine some components, or use different components, such as an input-output device or a network access device.
The processor 50 may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may in some embodiments be an internal storage unit of the terminal device 5, such as a hard disk or a memory of the terminal device 5. The memory 51 may also be an external storage device of the terminal device 5 in other embodiments, such as a plug-in hard disk provided on the terminal device 5, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 51 may also include both an internal storage unit of the terminal device 5 and an external storage device. The memory 51 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of a computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
An embodiment of the present application further provides a terminal device, where the terminal device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments may be implemented.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative. For instance, the division into modules or units is only a division by logical function, and there may be other ways of dividing them in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. An event detection method, comprising:
acquiring a video frame to be identified;
detecting an image area where a specified event contained in the video frame to be identified is located;
inputting the image area into a trained event classification model for processing to obtain a classification result of the specified event;
and correcting the classification result of the specified event according to the object detection result of the video frame to be identified.
2. The event detection method according to claim 1, wherein detecting the image area where the specified event contained in the video frame to be identified is located comprises:
and inputting the video frame to be recognized into a trained event detection model for processing to obtain an image area of the specified event in the video frame to be recognized.
3. The event detection method of claim 1, wherein the inputting the image region into a trained event classification model for processing to obtain the classification result of the specified event comprises:
if the image areas of the specified event in the video frame to be identified are multiple, respectively determining a position difference value corresponding to each image area, wherein the position difference value refers to a difference value between the position of the image area in the video frame to be identified and the position corresponding to the image area in a previous frame of video frame, and the previous frame of video frame refers to another video frame of the target video containing the image area of the specified event;
determining an image area of which the corresponding position difference value is smaller than a first set threshold value in the plurality of image areas as a target image area;
and inputting the target image area into the event classification model for processing to obtain a classification result of the specified event.
4. The event detection method according to claim 1, wherein the object detection result of the video frame to be recognized is obtained by:
and inputting the video frame to be recognized into a trained object detection model for processing to obtain the object detection result, wherein the object detection result is used for indicating whether a target object exists in the image area.
5. The event detection method according to claim 4, wherein the modifying the classification result of the specified event according to the object detection result of the video frame to be recognized comprises:
if the object detection result indicates that the target object exists in the image area, calculating a first area of the image area and a second area occupied by the target object in the image area;
calculating an overlap degree parameter according to the first area and the second area;
and correcting the classification result of the specified event according to the overlap degree parameter.
6. The event detection method according to claim 5, wherein correcting the classification result of the specified event according to the overlap degree parameter comprises:
if the overlap degree parameter is greater than or equal to a second set threshold, determining that the classification result of the specified event is correct;
and if the overlap degree parameter is smaller than the second set threshold, determining that the classification result of the specified event is incorrect, and correcting the classification result of the specified event.
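Sketched below is one possible reading of claims 5 and 6; computing the overlap degree parameter as the ratio of the second area to the first area, and the fallback label returned on correction, are assumptions the claims do not fix:

```python
# Hypothetical correction step: compare the event region's area with the
# area the detected object occupies inside it, then keep or correct the
# label against the second set threshold (claim 6).
def corrected_label(label, event_box, object_box, second_threshold,
                    fallback="no_event"):  # fallback label is an assumption
    ex0, ey0, ex1, ey1 = event_box
    # clip the object box to the event region to get the occupied part
    ix0, iy0 = max(ex0, object_box[0]), max(ey0, object_box[1])
    ix1, iy1 = min(ex1, object_box[2]), min(ey1, object_box[3])
    first_area = max(0, ex1 - ex0) * max(0, ey1 - ey0)
    second_area = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    overlap = second_area / first_area if first_area else 0.0
    return label if overlap >= second_threshold else fallback
```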
7. The event detection method according to any one of claims 1 to 6, wherein the acquiring of the video frame to be identified comprises:
acquiring a target video;
and sampling the target video according to a specified sampling rate to obtain the video frame to be identified, wherein the specified sampling rate is determined according to the frame rate of the target video.
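An illustrative sketch of this sampling step; deriving the stride as roughly one sampled frame per second is an assumption, since the claim only states that the specified sampling rate is determined according to the frame rate:

```python
# Hypothetical frame sampler: read the target video and keep one frame
# per `stride` frames, with the stride derived from the video frame rate.
import cv2

def sample_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unreported
    stride = max(1, round(fps))              # e.g. ~1 sampled frame per second
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)  # a video frame to be identified
        index += 1
    cap.release()
    return frames
```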
8. An event detection device, comprising:
the acquisition module is used for acquiring a video frame to be identified;
the detection module is used for detecting an image area where a specified event contained in the video frame to be identified is located;
the classification module is used for inputting the image area into a trained event classification model for processing to obtain a classification result of the specified event;
and the correction module is used for correcting the classification result of the specified event according to the object detection result of the video frame to be identified.
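The four claimed modules map naturally onto a thin wrapper class; the sketch below mirrors them with hypothetical callables and is not an implementation prescribed by the claim:

```python
# Hypothetical device wrapper: one attribute per claimed module.
class EventDetectionDevice:
    def __init__(self, acquire, detect, classify, correct):
        self.acquire = acquire    # acquisition module
        self.detect = detect      # detection module
        self.classify = classify  # classification module
        self.correct = correct    # correction module

    def process(self, source):
        frame = self.acquire(source)                # video frame to be identified
        for area in self.detect(frame):             # image areas of specified events
            label = self.classify(frame, area)      # classification result
            yield self.correct(label, frame, area)  # corrected result
```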
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the event detection method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the event detection method according to any one of claims 1 to 7.
CN202111038795.1A 2021-09-06 2021-09-06 Event detection method and device, terminal equipment and storage medium Pending CN113869137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111038795.1A CN113869137A (en) 2021-09-06 2021-09-06 Event detection method and device, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111038795.1A CN113869137A (en) 2021-09-06 2021-09-06 Event detection method and device, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113869137A true CN113869137A (en) 2021-12-31

Family

ID=78989616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111038795.1A Pending CN113869137A (en) 2021-09-06 2021-09-06 Event detection method and device, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113869137A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241399A (en) * 2022-02-25 2022-03-25 中电科新型智慧城市研究院有限公司 Event handling method, system, device and storage medium
CN114640608A (en) * 2022-04-01 2022-06-17 上海商汤智能科技有限公司 Test method and device, electronic equipment and computer readable storage medium
CN115457447A (en) * 2022-11-07 2022-12-09 浙江莲荷科技有限公司 Moving object identification method, device and system, electronic equipment and storage medium
CN115457447B (en) * 2022-11-07 2023-03-28 浙江莲荷科技有限公司 Moving object identification method, device and system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109710780B (en) Archiving method and device
CN113869137A (en) Event detection method and device, terminal equipment and storage medium
US8619135B2 (en) Detection of abnormal behaviour in video objects
CN109145771B (en) Face snapshot method and device
CN110335313B (en) Audio acquisition equipment positioning method and device and speaker identification method and system
CN112528716B (en) Event information acquisition method and device
WO2018058530A1 (en) Target detection method and device, and image processing apparatus
CN110647818A (en) Identification method and device for shielding target object
CN111259868A (en) Convolutional neural network-based method, system and medium for detecting vehicles in reverse driving
CN111814776B (en) Image processing method, device, server and storage medium
CN110969045B (en) Behavior detection method and device, electronic equipment and storage medium
CN111582032A (en) Pedestrian detection method and device, terminal equipment and storage medium
CN112330715A (en) Tracking method, tracking device, terminal equipment and readable storage medium
CN113657434A (en) Human face and human body association method and system and computer readable storage medium
CN111666915A (en) Monitoring method, device, equipment and storage medium
CN116582653A (en) Intelligent video monitoring method and system based on multi-camera data fusion
CN113052019A (en) Target tracking method and device, intelligent equipment and computer storage medium
CN115690747B (en) Vehicle blind area detection model test method and device, electronic equipment and storage medium
US10916016B2 (en) Image processing apparatus and method and monitoring system
CN113158773B (en) Training method and training device for living body detection model
CN114973135A (en) Head-shoulder-based sequential video sleep post identification method and system and electronic equipment
CN113918510A (en) Picture archiving method and device, terminal equipment and computer readable storage medium
CN110647858B (en) Video occlusion judgment method and device and computer storage medium
CN113947795A (en) Mask wearing detection method, device, equipment and storage medium
CN113177967A (en) Object tracking method, system and storage medium for video data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination