CN112200067B - Intelligent video event detection method, system, electronic equipment and storage medium

Info

Publication number: CN112200067B
Application number: CN202011072563.3A
Authority: CN
Inventors: 何颂颂; 陶剑文; 但雨芳; 季谋
Original assignee: Ningbo Polytechnic
Current assignee: Ningbo Polytechnic
Priority date: 2020-10-09
Filing date: 2020-10-09
Publication date: 2024-02-02
Anticipated expiration: 2040-10-09
Also published as: CN112200067A

Abstract

The invention provides an intelligent video event detection method, an intelligent video event detection system, electronic equipment and a storage medium. The method comprises: obtaining a video source from a database, labeling the video source, and determining a corresponding preset video template according to the label of the video source; determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and ranking the video clips against the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip; and, when the highest score is greater than or equal to a preset threshold, determining the frames in the target video clip that have the same timestamps as the marked frames in the preset video template as event evidence graphs, and determining the corresponding event type. The invention can automatically detect events in a video and output the detection result together with key image evidence of the event, which improves event detection efficiency and makes event detection more accurate and reliable.

Description

Intelligent video event detection method, system, electronic equipment and storage medium

Technical Field

The invention belongs to the technical field of computers, and particularly relates to an intelligent video event detection method, an intelligent video event detection system, electronic equipment and a storage medium.

Background

As social and economic development advances, intelligent detection of video content is increasingly applied to specific scenes to judge events in those scenes, for example video detection of traffic accidents, video monitoring for anti-theft security, and traffic flow statistics. Intelligent video event detection technology can greatly improve detection efficiency.

At present, the identification and detection of video events mainly rely on manual screening and browsing, or only some video events can be identified. Manual screening cannot identify video events intelligently and rapidly in order to judge the event information of a specific scene, so detection efficiency is low.

Disclosure of Invention

A first objective of the embodiments of the present invention is to provide an intelligent video event detection method, which aims to solve the problem of low video event detection efficiency at present.

The embodiment of the invention is realized in such a way that an intelligent video event detection method comprises the following steps:

obtaining a video source from a database, marking the video source, and determining a corresponding preset video template according to the label of the video source;

determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip;

and when the highest score of the scoring is greater than or equal to a preset threshold value, determining frames with the same time stamp in the target video segment and corresponding to the marked frames in the preset video template as event evidence graphs, and simultaneously determining the corresponding event types.

In one embodiment, obtaining the video source from the database, labeling the video source, and determining the corresponding preset video template according to the label of the video source includes: obtaining a video to be detected from a system database, labeling the video source according to basic information of the video to be detected, and selecting the corresponding preset video template from a preset video template library according to the label of the video source. The video to be detected is the video with the earliest shooting time among all videos to be detected in the database, and the basic information comprises video shooting position information, video shooting time information and equipment information of the corresponding video shooting device.

In one embodiment, the preset video template includes a plurality of labeled videos having the same number of frames as the target video clip, the frames of the preset video template have been converted to grayscale, and the label of the preset video template includes position information.

In one embodiment, selecting the corresponding preset video template from the preset video template library according to the label of the video source includes: traversing the preset video template library according to the position information of the video source, and determining the preset video template whose position information is the same as the position information of the video source as the corresponding preset video template.
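The traversal described above can be sketched as follows. This is a minimal illustration only: the `VideoTemplate` record, the `find_template` helper, the sample event types and the use of exact string equality for comparing position information are all assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VideoTemplate:
    """Hypothetical preset template: a position label plus an event type."""
    location: str
    event_type: str


def find_template(library: Sequence[VideoTemplate], location: str) -> Optional[VideoTemplate]:
    """Traverse the library and return the first template whose position matches."""
    for template in library:
        if template.location == location:  # assumed: exact string match
            return template
    return None  # no template registered for this position


# Illustrative library; the entries are made up for the example.
library = [
    VideoTemplate("Street X, intersection A", "illegal parking"),
    VideoTemplate("Street X, intersection B", "line-crossing violation"),
]
match = find_template(library, "Street X, intersection B")
```

Returning `None` when nothing matches leaves it to the caller to decide how to handle a video source whose position has no registered template, a case the source does not discuss.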

In one embodiment, scoring and ranking the plurality of video clips according to the preset video template based on a preset detection model includes: converting each frame of the video clips to grayscale; calculating, based on an image similarity algorithm model, the similarity between each frame of a video clip and the frame of the preset video template with the same timestamp, to obtain a per-frame similarity; performing a weighted calculation over all frames of the video clip to obtain the similarity between the video clip and the preset video template; and scoring and ranking the video clips according to that similarity, wherein the higher the similarity, the higher the score, and the higher the score, the higher the rank.
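The scoring steps above can be sketched as below. The source does not name the image similarity algorithm or the frame weights, so this sketch substitutes a simple stand-in (one minus the normalised mean absolute gray-level difference) and takes the weights as an input. It also assumes that frames at the same position in a clip and in the template share a timestamp. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
ColorFrame = List[List[Pixel]]
GrayFrame = List[List[float]]


def to_gray(frame: ColorFrame) -> GrayFrame:
    """Convert an RGB frame to gray levels using the BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in frame]


def frame_similarity(a: GrayFrame, b: GrayFrame) -> float:
    """Stand-in similarity: 1 minus the normalised mean absolute difference."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return 1.0 - (sum(diffs) / len(diffs)) / 255.0


def clip_score(clip: Sequence[ColorFrame], template_gray: Sequence[GrayFrame],
               weights: Sequence[float]) -> float:
    """Weighted average of per-frame similarities at matching frame positions."""
    sims = [frame_similarity(to_gray(f), t) for f, t in zip(clip, template_gray)]
    return sum(w * s for w, s in zip(weights, sims)) / sum(weights)


def rank_clips(clips: Sequence[Sequence[ColorFrame]], template_gray: Sequence[GrayFrame],
               weights: Sequence[float]) -> List[Tuple[float, int]]:
    """Return (score, clip_index) pairs sorted from highest to lowest score."""
    scored = [(clip_score(c, template_gray, weights), i) for i, c in enumerate(clips)]
    return sorted(scored, reverse=True)


# Tiny 1x2-pixel frames keep the example readable.
black = [[(0, 0, 0), (0, 0, 0)]]
white = [[(255, 255, 255), (255, 255, 255)]]
template_gray = [to_gray(white), to_gray(black)]   # template frames are stored in grayscale
clips = [[black, black], [white, black]]
ranking = rank_clips(clips, template_gray, weights=[1.0, 1.0])
best_score, best_index = ranking[0]                # the target clip is the top-ranked one
```

Scores here fall in the range 0 to 1, so a threshold quoted as a percentage would be compared as a fraction (for example 0.95).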

Another object of an embodiment of the present invention is to provide an intelligent video event detection system, including:

the video acquisition unit is used for acquiring a video source from the database, labeling the video source and determining a corresponding preset video template according to the label of the video source;

the target video determining unit is used for determining key frames of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frames, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest scoring as a target video clip;

and the detection result determining unit is used for determining, when the highest score is greater than or equal to a preset threshold, the frames in the target video clip that have the same timestamps as the marked frames in the preset video template as event evidence graphs, and for determining the corresponding event type.

It is a further object of an embodiment of the present invention to provide an electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the intelligent video event detection method.

Yet another object of an embodiment of the present invention is to provide a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, causing the processor to perform the steps of the intelligent video event detection method.

According to the intelligent video event detection method provided by the embodiment of the invention, a video source is obtained from a database, the video source is labeled, and a corresponding preset video template is determined according to the label of the video source; determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip; and when the highest score of the scoring is greater than or equal to a preset threshold value, determining frames with the same time stamp in the target video segment and corresponding to the marked frames in the preset video template as event evidence graphs, and simultaneously determining the corresponding event types. The event detection method and the event detection device can intelligently and automatically detect and output the detection result of the events in the video, and meanwhile, the detection result has key image evidence of the events, so that on one hand, the event detection efficiency is improved, and on the other hand, the event detection is more accurate and reliable.

Drawings

Fig. 1 is a flowchart of an implementation of an intelligent video event detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the main modules of an intelligent video event detection system according to an embodiment of the present invention;

Fig. 3 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;

Fig. 4 is a schematic diagram of a computer system suitable for implementing an embodiment of the invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used in embodiments of the present invention to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another.

It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.

In order to further describe the technical means and effects adopted by the present invention for achieving the intended purpose, the following detailed description is given of the specific embodiments, structures, features and effects according to the present invention with reference to the accompanying drawings and preferred embodiments.

Fig. 1 shows the implementation flow of an intelligent video event detection method according to an embodiment of the present invention. For convenience of explanation, only the parts relevant to the embodiment are shown, as detailed below:

An intelligent video event detection method, comprising:

S101: obtaining a video source from a database, labeling the video source, and determining a corresponding preset video template according to the label of the video source;

S102: determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and ranking the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip;

S103: when the highest score is greater than or equal to a preset threshold, determining the frames in the target video clip that have the same timestamps as the marked frames in the preset video template as event evidence graphs, and determining the corresponding event type.

In step S101, a video source is obtained from a database, the video source is labeled, and a corresponding preset video template is determined according to the label of the video source. In this way, a video shot by a video shooting device can be obtained for detection, and the label attached to the obtained video source is used to select the matching preset video template from a preset video template library.

In one embodiment, obtaining the video source from the database, labeling the video source, and determining the corresponding preset video template according to the label of the video source includes: obtaining a video to be detected from a system database, labeling the video source according to basic information of the video to be detected, and selecting the corresponding preset video template from a preset video template library according to the label of the video source. The video to be detected is the video with the earliest shooting time among all videos to be detected in the database, and the basic information comprises video shooting position information, video shooting time information and equipment information of the corresponding video shooting device. The label therefore records the basic information of the video source, such as its origin (equipment information, that is, which device shot it), geographical position and shooting time. The label is used to select the matching preset video template from the preset video template library and to determine the specific geographical position associated with the event type; when an event needs to be traced, it can be traced back to the origin of the video source recorded in the label.

Specifically, for example, suppose a video source was shot by device A at intersection B of street X at 9 a.m. on March 2, 2020. The video source can then be labeled "intersection B of street X, 9 a.m. on March 2, A", and the video template labeled with intersection B of street X is matched in the preset video template library as the corresponding preset video template.
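The selection and labeling of the video to be detected can be sketched as follows. The `PendingVideo` record, the helper names and the label layout (position, shooting time, device, separated by commas) are assumptions made for illustration; the source only states which pieces of basic information the label carries.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class PendingVideo:
    """Hypothetical database record for a video awaiting detection."""
    location: str
    shot_at: datetime
    device: str


def next_video(pending: Iterable[PendingVideo]) -> PendingVideo:
    """Pick the pending video with the earliest shooting time."""
    return min(pending, key=lambda v: v.shot_at)


def make_label(video: PendingVideo) -> str:
    """Build a label from position, shooting time and device information."""
    return f"{video.location}, {video.shot_at:%Y-%m-%d %H:%M}, {video.device}"


# The later recording is listed first to show that order in the database does not matter.
pending = [
    PendingVideo("Street X, intersection B", datetime(2020, 3, 2, 11, 30), "A"),
    PendingVideo("Street X, intersection B", datetime(2020, 3, 2, 9, 0), "A"),
]
label = make_label(next_video(pending))
```

Keeping the position as a separate field, rather than parsing it back out of the label string, makes the later template lookup by position straightforward.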

Therefore, a video source is obtained from a database, the video source is labeled, and a corresponding preset video template is determined according to the label of the video source; determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip; and when the highest score of the scoring is greater than or equal to a preset threshold value, determining frames with the same time stamp in the target video segment and corresponding to the marked frames in the preset video template as event evidence graphs, and simultaneously determining the corresponding event types. The intelligent video event detection method can intelligently and automatically detect the events in the video and output detection results, and meanwhile, the detection results have key image evidence of the events, so that on one hand, the event detection efficiency is improved, and on the other hand, the event detection is more accurate and reliable.

In step S102, a key frame of the video source is determined, the video source is split into a plurality of video clips with the same number of frames according to the key frame, the video clips are scored and ranked according to the preset video template based on a preset detection model, and the video clip with the highest score is determined as the target video clip. This determines whether the video source contains video matching the event type and prepares for subsequent processing.
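The source does not spell out how key frames are found or how the split is made. One possible reading, sketched below under stated assumptions, is that the key frame positions are already known and each key frame starts a clip of fixed length (for instance the frame count of the preset video template), with windows that would run past the end of the video dropped. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_by_key_frames(frames: Sequence[Frame], key_indices: Sequence[int],
                        clip_len: int) -> List[List[Frame]]:
    """Cut one clip of clip_len frames starting at each key frame index."""
    clips = []
    for start in sorted(key_indices):
        end = start + clip_len
        if end <= len(frames):  # assumed: drop windows that run past the video
            clips.append(list(frames[start:end]))
    return clips


frames = list(range(10))  # frame numbers stand in for decoded images
clips = split_by_key_frames(frames, key_indices=[0, 4, 8], clip_len=3)
```

With ten frames and key frames at positions 0, 4 and 8, the window starting at 8 is discarded because it would need frames 8 to 10, so every remaining clip has exactly three frames.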

In step S103, when the highest score is greater than or equal to a preset threshold, the frames in the target video clip that have the same timestamps as the marked frames in the preset video template are determined as event evidence graphs, and the corresponding event type is determined at the same time.

Here, the preset threshold may be set according to the specific scene. For example, it may be set to 95% for monitoring vehicle violation information and to 80% for event detection.

In one embodiment, interface information may be generated from the determined event evidence graph and the corresponding event type and displayed on the client. The interface information may include the evidence graph, the event time and the event type. For example, when the video source shows a vehicle driving on a lane line at intersection B of street X at 9 a.m. on March 2, 2020, the interface information includes the evidence graph of the vehicle on the line, the time of the line crossing, and the line-crossing violation as the event type.
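The threshold check and the extraction of evidence frames can be sketched as follows. The sketch assumes that the marked frames of the template are given as positions within the clip and that the event type is supplied alongside the template; the source states neither detail. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def detect_event(best_score: float, threshold: float, target_clip: Sequence[Frame],
                 marked_positions: Sequence[int],
                 event_type: str) -> Optional[Tuple[str, List[Frame]]]:
    """Return (event_type, evidence_frames), or None when the score is below the threshold."""
    if best_score < threshold:
        return None
    # Evidence frames sit at the same positions as the marked template frames.
    evidence = [target_clip[i] for i in marked_positions]
    return event_type, evidence


clip = ["f0", "f1", "f2", "f3"]  # placeholder frames
result = detect_event(0.96, 0.95, clip, [1, 3], "line-crossing violation")
missed = detect_event(0.90, 0.95, clip, [1, 3], "line-crossing violation")
```

A score exactly equal to the threshold counts as a detection, matching the "greater than or equal to" condition in step S103.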
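One way to package the interface information for a client is a small JSON document, sketched below. The field names, the use of file names to refer to evidence frames and the JSON encoding itself are assumptions; the source only lists the evidence graph, the event time and the event type as contents.

```python
import json
from datetime import datetime
from typing import Sequence


def build_interface_info(evidence_files: Sequence[str], event_time: datetime,
                         event_type: str) -> str:
    """Serialise the detection result for display on a client."""
    payload = {
        "evidence": list(evidence_files),        # hypothetical references to evidence frames
        "event_time": event_time.isoformat(),
        "event_type": event_type,
    }
    return json.dumps(payload, ensure_ascii=False)


info = build_interface_info(
    ["evidence_001.png"], datetime(2020, 3, 2, 9, 0), "line-crossing violation"
)
```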

Fig. 2 is a schematic diagram of main modules of an intelligent video event detection system according to an embodiment of the present invention, and for convenience of explanation, only the portions relevant to the embodiment of the present invention are shown, which is described in detail below:

an intelligent video event detection system 200, comprising:

a video acquisition unit 201, configured to acquire a video source from a database, tag the video source, and determine a corresponding preset video template according to the tag of the video source;

a target video determining unit 202, configured to determine a key frame of the video source, split the video source into a plurality of video segments with the same number of frames according to the key frame, score and rank the plurality of video segments according to the preset video template based on a preset detection model, and determine a video segment with the highest score as a target video segment;

a detection result determining unit 203, configured to determine, when the highest score is greater than or equal to a preset threshold, the frames in the target video segment that have the same timestamps as the marked frames in the preset video template as event evidence graphs, and to determine the corresponding event type.

Thus, the intelligent video event detection system 200 provided in the embodiment of the present invention includes: a video acquisition unit 201, configured to acquire a video source from a database, tag the video source, and determine a corresponding preset video template according to the tag of the video source; a target video determining unit 202, configured to determine a key frame of the video source, split the video source into a plurality of video segments with the same number of frames according to the key frame, score and rank the plurality of video segments according to the preset video template based on a preset detection model, and determine the video segment with the highest score as a target video segment; and a detection result determining unit 203, configured to determine, when the highest score is greater than or equal to a preset threshold, the frames in the target video segment that have the same timestamps as the marked frames in the preset video template as event evidence graphs, and to determine the corresponding event type. The system can automatically detect events in the video and output the detection result, and the detection result carries key image evidence of the events, so that on the one hand event detection efficiency is improved, and on the other hand event detection is more accurate and reliable.

Fig. 3 illustrates an exemplary system architecture 500 in which the detection method or detection apparatus of embodiments of the present invention may be applied.

As shown in fig. 3, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may use the terminal devices 501, 502, 503 to interact with the server 505 via the network 504 to receive or send messages and the like. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software, may be installed on the terminal devices 501, 502, 503.

The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 505 may be a server providing various services, such as a background management server providing support for incoming and outgoing messages sent by the user with the terminal devices 501, 502, 503. The background management server can perform analysis and other processes after receiving the terminal equipment request, and feed back the processing result to the terminal equipment.

It should be noted that the intelligent video event detection method provided by the embodiment of the present invention may be executed by the server 505 or by the terminal devices 501, 502, 503; accordingly, the intelligent video event detection system may be provided in the server 505 or in the terminal devices 501, 502, 503.

It should be understood that the number of terminal devices, networks and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 4, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an electronic device of an embodiment of the present invention. The computer system shown in fig. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 4, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, which may for example be described as: a processor including a video acquisition unit, a target video determining unit and a detection result determining unit. The names of the units do not, in some cases, limit the units themselves; for example, the video acquisition unit may also be described as "a unit that acquires a video source from a database".

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. An intelligent video event detection method, comprising:

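obtaining a video source from a database, labeling the video source, and determining a corresponding preset video template according to the label of the video source;

determining a key frame of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frame, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip;

when the highest score of the scoring is greater than or equal to a preset threshold value, determining frames with the same time stamp in the target video segment and corresponding to the marked frames in the preset video template as event evidence graphs, and determining corresponding event types;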

the step of obtaining the video source from the database, labeling the video source, and determining the corresponding preset video template according to the label of the video source comprises: obtaining a video to be detected from a system database, labeling the video source according to basic information of the video to be detected, and selecting a corresponding preset video template from a preset video template library according to the label of the video source; the video to be detected is the video with the earliest shooting time among all videos to be detected in the database, and the basic information comprises video shooting position information, video shooting time information and equipment information of the corresponding video shooting device;

the preset video template comprises a plurality of labeled videos having the same number of frames as the target video segment, the frames of the preset video template have been converted to grayscale, and the label of the preset video template comprises position information;

the selecting of the corresponding preset video template from the preset video template library according to the label of the video source comprises: traversing the preset video template library according to the position information of the video source, and determining the preset video template whose position information is the same as the position information of the video source as the corresponding preset video template;

the scoring and sorting of the video clips according to the preset video template based on the preset detection model comprises: converting each frame of the video clips to grayscale; calculating, based on an image similarity algorithm model, the similarity between each frame of a video clip and the frame of the preset video template with the same time stamp, to obtain the similarity of each frame; performing a weighted calculation over all frames of the video clip to obtain the similarity between the video clip and the preset video template; and scoring and sorting the video clips according to the similarity, wherein the higher the similarity, the higher the score, and the higher the score, the higher the rank.

2. An intelligent video event detection system, comprising:

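the video acquisition unit is used for acquiring a video source from the database, labeling the video source and determining a corresponding preset video template according to the label of the video source;

the target video determining unit is used for determining key frames of the video source, splitting the video source into a plurality of video clips with the same number of frames according to the key frames, scoring and sorting the video clips according to the preset video template based on a preset detection model, and determining the video clip with the highest score as a target video clip;

the detection result determining unit is used for determining frames with the same time stamp corresponding to the marked frames in the preset video template in the target video segment as event evidence graphs and determining corresponding event types when the highest score of the scores is larger than or equal to a preset threshold value;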

3. An electronic device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the intelligent video event detection method of claim 1.

4. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, causes the processor to perform the steps of the intelligent video event detection method of claim 1.