CN111553305B - System and method for identifying illegal videos - Google Patents

System and method for identifying illegal videos

Info

Publication number
CN111553305B
Authority
CN
China
Prior art keywords
scene
target
violation
data
module
Prior art date
Legal status
Active
Application number
CN202010387707.8A
Other languages
Chinese (zh)
Other versions
CN111553305A (en)
Inventor
徐非凡
王勇
高原
刘希宏
韩红卫
苏兴华
李小成
徐智锋
王洁飞
刘思远
李富平
谭宁军
苏彦平
陈胜伟
Current Assignee
China National Petroleum Corp
CNPC Chuanqing Drilling Engineering Co Ltd
Original Assignee
China National Petroleum Corp
CNPC Chuanqing Drilling Engineering Co Ltd
Priority date
Filing date
Publication date
Application filed by China National Petroleum Corp, CNPC Chuanqing Drilling Engineering Co Ltd filed Critical China National Petroleum Corp
Priority to CN202010387707.8A priority Critical patent/CN111553305B/en
Publication of CN111553305A publication Critical patent/CN111553305A/en
Application granted granted Critical
Publication of CN111553305B publication Critical patent/CN111553305B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/46 (G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING > G06V20/00 Scenes; scene-specific elements > G06V20/40 Scenes; scene-specific elements in video content): Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48 (same hierarchy as above): Matching video sequences
    • H04N7/181 (H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION > H04N7/00 Television systems > H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast): for receiving images from a plurality of remote sources

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a system and a method for identifying violation videos, belonging to the field of industrial and civil security monitoring. The system and method improve personnel efficiency by allowing one monitoring operator to watch several cameras at the same time, reduce the workload of monitoring staff, provide comprehensive coverage of all construction sites, and remedy the narrowness, one-sidedness and instability of existing violation identification methods.

Description

System and method for identifying illegal videos
Technical Field
The invention belongs to the field of security monitoring, and particularly relates to a system and a method for identifying illegal videos.
Background
Applications of video monitoring technology are concentrated mainly in government departments and in specialized sectors such as finance, public security, traffic and power; the government and financial sectors account for 20.9% and 20.6% of the market, respectively. With advancing social informatization, however, demand for video monitoring has grown sharply across more industries and fields: video monitoring technology is extending from individual fields such as banking and traffic into many fields, and traditional security monitoring is developing into management monitoring and production-management monitoring.
The traditional approach relies on on-site managers to supervise operator behavior and to inspect and handle physical hazards, but hazards cannot be discovered in time, the demands placed on supervisors are high, and the actual effect is poor. In the existing monitoring mode, a full-time monitor watches video images to identify and correct employees' violations and physical hazards, yet can generally observe only a few construction sites and about twelve cameras at once, so monitoring coverage is narrow and efficiency is low.
Existing intelligent monitoring systems remain at the level of discriminating static violations; for continuous-action violations and process violations, discrimination relies only on motion-trajectory features, without pattern generality or a complete description of the environment. The same behavior pattern may no longer hold once certain environmental factors change, so recognition reliability cannot be guaranteed.
The invention patent application with application number 201710408664.5 discloses a security monitoring system comprising a monitoring server, an alarm module and a video monitoring module. The monitoring server is connected to a local area network, and the local area network is connected to a control module and to the Internet; the video monitoring module comprises an infrared camera and a dome camera for capturing monitoring images. This architecture can detect abnormal conditions such as fire and intrusion and raise timely alarms. Such early intelligent monitoring systems recognize certain fixed scene patterns, for example flame or intrusion, through manually tuned parameters; they can only identify anomalies in a fixed scene, have weak generality, and suffer from technical defects such as insufficient robustness.
Disclosure of Invention
Embodiments of the invention aim to provide a system and a method for identifying violation videos that overcome the above technical defects.
In order to solve the technical problems, the invention provides a violation video recognition system, which comprises:
the violation template construction module, which performs data labeling and model training on the raw data collected by a network camera to obtain a target detection model;
the intelligent processing module, which consists of an image acquisition sub-module and a target tracking sub-module; the image acquisition sub-module collects the raw data, converts photons into raw image data through a CMOS sensor chip, compresses the images and feeds them to the target tracking sub-module; the target tracking sub-module loads the target detection model to perform feature extraction and feature screening on the collected raw image data to obtain targets, and after a target is obtained, records its position in the image against the target record of the previous frame to obtain the target trajectory and thereby realize target tracking;
the violation judging module, which consists of a tracking data description layer, a scene description layer and a scene recognition layer; the tracking data description layer merges and arranges the output of the intelligent processing module and passes it to the scene description layer; the scene description layer generates three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, from the data provided by the tracking data description layer and passes them to the scene recognition layer for violation discrimination;
the alarm output module, which consists of a hidden-danger judging sub-module, an alarm prompting sub-module and an alarm parameter configuration sub-module; the hidden-danger judging sub-module performs redundancy screening on the violation judgment results of the violation judging module according to the redundancy screening rules; the alarm prompting sub-module visually displays the screened results; and the alarm parameter configuration sub-module is used to set the redundancy screening rules for the judgment results.
Further, the time-relationship data structure consists of the order of the key targets on the time axis and their movement speeds; the spatial-relationship data structure consists of the up-down and left-right relationships between key targets on the two-dimensional image; and the scene-element data structure consists of the types of targets appearing in the scene, the number of targets of each type, and the positions of the targets in the scene.
Preferably, the scene recognition layer consists of four sub-modules: a scene template, a trigger, a scene recorder and a violation template;
the scene template templates the basic constituent information, namely the time, place and persons, contained in a specific procedure or action;
the trigger matches the scene information constructed by the tracking data description layer at each moment against the scene template, and when the current scene is successfully matched with the elements in the scene template, the scene recorder sub-module is triggered to start recording;
the scene recorder records the interaction information among the key elements contained in the scene and matches it against the violation template; when the matching succeeds, the matching result is output, the corresponding violation entry is looked up according to the matching result, and the entry is sent to the alarm output module;
the violation template templates the relationships among the basic constituent information of a specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period, and the relationships among the target set elements within that period.
Further, the scene recorder uses cosine similarity to describe the similarity between the template vector and the state description.
Preferably, the interaction information comprises the time at which the target set appears, the composition of the target set within a given time period, and the relationships among the target set elements within that period.
The invention also provides a method for identifying violation videos, comprising the following steps:
Step one: violation template construction
performing data labeling and model training on the raw data collected by a network camera to obtain a target detection model;
Step two: intelligent processing
collecting the raw data, converting photons into raw image data through a CMOS sensor chip, compressing the images, performing feature extraction and feature screening on the collected raw image data to obtain targets, and, after a target is obtained, recording its position in the image against the target record of the previous frame to obtain the target trajectory and realize target tracking;
Step three: violation identification
merging and arranging the intelligent processing results, generating three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, and then performing violation discrimination;
Step four: alarm output
performing redundancy screening on the violation judgment results according to the redundancy screening rules and visually displaying the screened results.
Further, the violation discrimination in step three comprises the following steps:
templating the basic constituent information, namely the time, place and persons, contained in a specific procedure or action;
templating the relationships among the basic constituent information of the specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period;
matching the scene information constructed by the tracking data description layer at each moment against the scene template, and triggering the scene recorder sub-module to start recording when the current scene is successfully matched with the elements in the scene template;
recording, by the scene recorder, the interaction information among the key elements contained in the scene and matching it against the violation template; when the matching succeeds, outputting the matching result and looking up and outputting the corresponding violation entry according to the matching result.
The beneficial effects of the invention are as follows:
(1) The violation video recognition system improves personnel efficiency, allowing one monitor to watch several cameras at the same time, reducing the workload of professional monitoring staff, and comprehensively covering all construction sites.
(2) Compared with traditional violation identification modeling methods, the invention provides a more comprehensive and fundamental way to describe scene elements and restore scenes, remedying the narrowness, one-sidedness and instability of existing violation identification approaches. It can model and identify most operating procedures and actions and therefore has generality.
In order to make the above-mentioned objects of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
FIG. 1 is a framework diagram of the violation recognition system;
FIG. 2 is a schematic diagram of the violation discrimination method;
FIG. 3 is a flow chart of the violation identification method;
FIG. 4 shows the target element information table;
FIG. 5 is a first schematic diagram of scene 1;
FIG. 6 is a second schematic diagram of scene 1;
FIG. 7 is a schematic diagram of scene 2;
FIG. 8 is a schematic diagram of scene 3.
Detailed Description
Further advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of the present specification, by describing the embodiments of the present invention with specific examples.
In the present invention, the up, down, left, and right in the drawings are regarded as the up, down, left, and right of the offending video identification system described in the present specification.
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, however, the present invention may be embodied in many different forms and is not limited to the examples described herein, which are provided to fully and completely disclose the present invention and fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, like elements/components are referred to by like reference numerals.
Unless otherwise indicated, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. In addition, it will be understood that terms defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
First embodiment:
A first embodiment of the present invention relates to a violation video recognition system which, referring to FIG. 1, comprises:
the violation template construction module, which performs data labeling and model training on the raw data collected by a network camera to obtain a target detection model;
the intelligent processing module, which consists of an image acquisition sub-module and a target tracking sub-module; the image acquisition sub-module collects the raw data, converts photons into raw image data through the CMOS sensor chip, compresses the images and feeds them to the target tracking sub-module; the target tracking sub-module loads the target detection model to perform feature extraction and feature screening on the collected raw image data to obtain targets, and after a target is obtained, records its position in the image against the target record of the previous frame to obtain the target trajectory and thereby realize target tracking;
the violation judging module, which consists of a tracking data description layer, a scene description layer and a scene recognition layer; the tracking data description layer merges and arranges the output of the intelligent processing module and passes it to the scene description layer; the scene description layer generates three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, from the data provided by the tracking data description layer and passes them to the scene recognition layer for violation discrimination;
the alarm output module, which consists of a hidden-danger judging sub-module, an alarm prompting sub-module and an alarm parameter configuration sub-module; the hidden-danger judging sub-module performs redundancy screening on the violation judgment results of the violation judging module according to the redundancy screening rules; the alarm prompting sub-module visually displays the screened results; and the alarm parameter configuration sub-module is used to set the redundancy screening rules for the judgment results.
Specifically, the violation template construction module performs data annotation and model training on the raw data collected by the network camera, using an existing annotation tool such as LabelImg and a model training framework such as MXNet or TensorFlow, to obtain a model for target detection;
the intelligent processing module mainly comprises two sub-modules, image acquisition and target tracking. The image acquisition sub-module is responsible for collecting the raw data; it converts photons into raw image data through the CMOS sensor chip, compresses the images, and feeds them to the target tracking sub-module. The target tracking sub-module loads the target detection model to extract features from the collected raw image data and then screens those features to obtain targets. After a target is acquired, its position in the image is recorded against the target record of the previous frame, and the target trajectory is obtained by updating the previous frame's target position, thereby realizing target tracking.
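For illustration, the frame-to-frame association described above can be realized by matching each current detection to the nearest previous-frame target record, a minimal sketch under assumed conventions: the bounding-box format, the distance threshold and the helper names are not taken from the patent.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed detection format

def centre(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def update_tracks(prev_tracks: Dict[int, Box], detections: List[Box],
                  max_dist: float = 50.0) -> Dict[int, Box]:
    """Associate current detections with previous-frame target records by
    nearest centre distance; unmatched detections start new track ids."""
    new_tracks: Dict[int, Box] = {}
    used = set()
    next_id = max(prev_tracks, default=-1) + 1
    for det in detections:
        cx, cy = centre(det)
        best_id, best_d = None, max_dist
        for tid, box in prev_tracks.items():
            if tid in used:
                continue
            px, py = centre(box)
            d = math.hypot(cx - px, cy - py)
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:          # no previous record close enough: new target
            best_id = next_id
            next_id += 1
        used.add(best_id)
        new_tracks[best_id] = det    # the per-frame positions form the target track
    return new_tracks
```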
The key to violation discrimination is restoring the six elements of the scene: time, place, person, cause, course and result. Whether for a work procedure or a specific body movement, behavior is necessarily built around its environment and is only meaningful within a specific scene.
The alarm output module mainly comprises a hidden-danger judging sub-module, an alarm prompting sub-module and an alarm parameter configuration sub-module. The hidden-danger judging sub-module performs redundancy screening on the judgment results for operating procedures and actions produced by the violation judging module. The alarm prompting sub-module visually displays the screened results. The alarm parameter configuration sub-module is mainly used to set the redundancy screening rules for the judgment results.
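The patent does not spell out a concrete redundancy screening rule; one plausible reading, sketched below purely as an assumption, is to suppress repeated alarms for the same violation entry raised within a configurable time window.

```python
import time
from typing import Dict, Optional

class RedundancyFilter:
    """Suppress repeated alarms for the same violation entry inside a time window;
    the window length stands in for the configurable redundancy screening rule."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.last_alarm: Dict[str, float] = {}

    def should_alarm(self, violation_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self.last_alarm.get(violation_id)
        if last is not None and now - last < self.window:
            return False  # redundant: this entry was alarmed recently
        self.last_alarm[violation_id] = now
        return True
```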
Second embodiment:
The present embodiment relates to a violation video recognition system which, referring to FIG. 1, comprises:
the violation template construction module, which performs data labeling and model training on the raw data collected by a network camera to obtain a target detection model;
the intelligent processing module, which consists of an image acquisition sub-module and a target tracking sub-module; the image acquisition sub-module collects the raw data, converts photons into raw image data through the CMOS sensor chip, compresses the images and feeds them to the target tracking sub-module; the target tracking sub-module loads the target detection model to perform feature extraction and feature screening on the collected raw image data to obtain targets, and after a target is obtained, records its position in the image against the target record of the previous frame to obtain the target trajectory and thereby realize target tracking;
the violation judging module, which consists of a tracking data description layer, a scene description layer and a scene recognition layer; the tracking data description layer merges and arranges the output of the intelligent processing module and passes it to the scene description layer; the scene description layer generates three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, from the data provided by the tracking data description layer and passes them to the scene recognition layer for violation discrimination;
specifically, the trace data description layer classifies output results of the intelligent processing module into two types of information: a target front and rear frame position information list, a target category information list. Each element in the position information list of the front and rear frames of the targets is composed of two pieces of coordinate information, and the space positions of each target in the front and rear frames of images are recorded; each element in the target category information list is composed of a name describing the target category, and category information of each target in the current state is recorded.
The scene description layer generates three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, from the data provided by the tracking data description layer, and passes them to the scene recognition layer for discrimination.
The spatial-relationship data structure consists of the up-down and left-right relationships between key targets on the two-dimensional image.
The time-relationship data structure consists of the order of the key targets on the time axis and their movement speeds.
The scene-element data structure consists of the types of targets appearing in the scene (i.e., every target that may appear in the scene to be detected and can be separated out as an individual), the number of targets of each type, and the target positions.
The three types of data are generated as follows (a minimal code sketch follows this list):
1. A spatial-relationship description list is constructed. The list consists of spatial-relationship elements between pairs of targets; each element consists of the category names of the two targets and the spatial distance between them.
The category names are taken from the entries of the two targets in the target category list, and the spatial distance is obtained by computing the Euclidean distance between the rear-frame coordinates of the two targets in the front-and-rear frame position list.
2. A time-relationship description list is constructed; its elements describe the movement trend of each individual target. The movement trend is obtained by computing the Euclidean distance between each target's coordinates in the front and rear frame images, taken from the front-and-rear frame position list.
3. A scene-element description list is constructed; each element consists of the category and id of a target appearing in the scene, where the id is the target's number within the current list.
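The sketch below builds the three lists from one frame pair using the Euclidean distances described above; the function name and the tuple layouts are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def euclidean(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_descriptions(categories: List[str],
                       prev_pos: List[Point],
                       curr_pos: List[Point]):
    """Build the spatial-relationship, time-relationship and scene-element lists
    for one frame pair (illustrative field layout)."""
    # 1. spatial relationships: pairwise distances between current-frame targets
    spatial = [(categories[i], categories[j], euclidean(curr_pos[i], curr_pos[j]))
               for i, j in combinations(range(len(categories)), 2)]
    # 2. time relationships: per-target displacement between the two frames
    temporal = [(categories[i], euclidean(prev_pos[i], curr_pos[i]))
                for i in range(len(categories))]
    # 3. scene elements: category plus an id numbering the targets of that category
    counter: Dict[str, int] = {}
    elements = []
    for cat in categories:
        counter[cat] = counter.get(cat, 0) + 1
        elements.append((cat, counter[cat]))
    return spatial, temporal, elements
```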
The scene recognition layer consists of four sub-modules: a scene template, a trigger, a scene recorder and a violation template.
The scene template sub-module templates the basic constituent information, namely the time, place and persons, contained in a specific procedure or action to generate a scene template.
The trigger sub-module matches the scene information (time, place, persons and so on) constructed by the description layers at each moment against the scene template. When the current scene is successfully matched with the elements in the scene template, the scene recorder sub-module is triggered to start recording.
The scene recorder sub-module uses cosine similarity to describe the similarity between the template vector and the state description. By querying the scene description layer it obtains the target elements that exist both in the scene description and in the violation template, records the interaction information among the key elements contained in the scene (the elements of the target list in the violation template), namely the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period, and matches this record against the time-sequence dictionary of the violation template. When the matching succeeds, the matching result is output, the corresponding violation entry is looked up according to the result, and the entry is sent to the alarm output module.
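The cosine-similarity comparison can be sketched as follows; the default threshold of 0.8 mirrors the 80% setting used in the embodiment below, and the function names are illustrative.

```python
import math
from typing import Sequence

def cosine_similarity(template: Sequence[float], recorded: Sequence[float]) -> float:
    """Cosine similarity between the violation-template vector and the vector
    dynamically generated by the scene recorder (both of equal length)."""
    dot = sum(t * r for t, r in zip(template, recorded))
    norm = (math.sqrt(sum(t * t for t in template))
            * math.sqrt(sum(r * r for r in recorded)))
    return dot / norm if norm else 0.0

def matches_violation(template: Sequence[float], recorded: Sequence[float],
                      threshold: float = 0.8) -> bool:
    # a violation template is considered matched when the similarity exceeds the threshold
    return cosine_similarity(template, recorded) > threshold
```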
The violation template sub-module templates the relationships among the basic constituent information of a specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period. A violation template is constructed by generating the target list, the inter-target distance list and the single-target time-domain displacement list contained in the violation operation. The violation template accumulates the scene template over the time domain and consists of a higher-level time-sequence dictionary. The index of the time-sequence dictionary is a timestamp, and the entry for each timestamp is made up of three lists (a data-structure sketch follows this list), in which:
each element of the target list consists of a target category name and the corresponding sequence number in the scene;
each element of the inter-target distance list consists of the Euclidean distance on the image between one target in the target list and every other target;
each element of the single-target time-domain displacement list consists of the spatial distance of one target between the front and rear frame images.
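A minimal data-structure sketch of the time-sequence dictionary follows; the class and field names (TemplateFrame, targets, pair_distances, displacements) are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TemplateFrame:
    """The three per-timestamp lists that make up one entry of the
    violation template's time-sequence dictionary."""
    targets: List[Tuple[str, int]]   # (category name, sequence number in the scene)
    pair_distances: List[float]      # Euclidean distance from each target to every other
    displacements: List[float]       # each target's distance between front and rear frames

@dataclass
class ViolationTemplate:
    entry_id: str                                   # id of the associated violation entry
    frames: Dict[int, TemplateFrame] = field(default_factory=dict)  # indexed by timestamp

    def add_frame(self, timestamp: int, frame: TemplateFrame) -> None:
        self.frames[timestamp] = frame
```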
The time-relationship data structure consists of the order of the key targets on the time axis and their movement speeds; the spatial-relationship data structure consists of the up-down and left-right relationships between key targets on the two-dimensional image; and the scene-element data structure consists of the types of targets appearing in the scene, the number of targets of each type and the positions of the targets.
The alarm output module consists of a hidden-danger judging module, an alarm prompting module and an alarm parameter configuration module; the hidden-danger judging module performs redundancy screening on the violation judgment results of the violation judging module according to the redundancy screening rules; the alarm prompting module visually displays the screened results; and the alarm parameter configuration module is mainly used to set the redundancy screening rules for the judgment results.
Third embodiment:
The present embodiment provides a method for identifying violation videos, comprising the following steps:
Step one: violation template construction
performing data labeling and model training on the raw data collected by the network camera to obtain a target detection model;
Step two: intelligent processing
collecting the raw data, converting photons into raw image data through a CMOS sensor chip, compressing the images, performing feature extraction and feature screening on the collected raw image data to obtain targets, and, after a target is obtained, recording its position in the image against the target record of the previous frame to obtain the target trajectory and realize target tracking;
Step three: violation identification
merging and arranging the intelligent processing results, generating three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, and then performing violation discrimination;
Step four: alarm output
performing redundancy screening on the violation judgment results according to the redundancy screening rules and visually displaying the screened results.
The method for identifying violation videos comprises four steps: violation template construction, intelligent processing, violation discrimination and alarm output. Together, these parts complete raw data conversion, information extraction, violation discrimination and alarm output. The detection flow of the violation video recognition system is as follows:
The system first collects raw data through the network camera and performs data labeling and model training to obtain a model for target detection. On this basis, raw images acquired after the model is deployed undergo target detection; the detection results are used for target tracking, and the tracking output is used to discriminate violations of operating procedures and behaviors. After the intelligent processing stage, the hidden-danger judging module evaluates the intelligent processing result against the configured violation entries and hidden-danger descriptions, and finally the behavior data judged to be violations are passed to alarm processing to raise an alarm. In the alarm processing module, the alarm content and historical violation data can be managed.
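The end-to-end flow just described can be summarized in the loop below. This is only a structural sketch: every component name and method signature (camera.read, detector.detect, tracker.update, scene_layer.describe, recognizer.discriminate, alarm.should_alarm, alarm.display, result.violation_id) is a hypothetical stand-in, not an interface defined by the patent.

```python
def run_detection_loop(camera, detector, tracker, scene_layer, recognizer, alarm):
    """Sketch of the flow: acquisition, detection, tracking, scene description,
    violation discrimination, redundancy-screened alarm output."""
    prev_tracks = {}
    while True:
        frame = camera.read()                                   # image acquisition
        detections = detector.detect(frame)                     # target detection model
        prev_tracks = tracker.update(prev_tracks, detections)   # target tracking
        description = scene_layer.describe(prev_tracks)         # scene description layer
        result = recognizer.discriminate(description)           # scene recognition layer
        if result is not None and alarm.should_alarm(result.violation_id):
            alarm.display(result)            # visual display after redundancy screening
```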
Fourth embodiment:
The present embodiment provides a method for identifying violation videos which, referring to FIG. 2 and FIG. 3, comprises the following steps:
Step one: violation template construction
performing data labeling and model training on the raw data collected by the network camera to obtain a target detection model;
Step two: intelligent processing
collecting the raw data, converting photons into raw image data through a CMOS sensor chip, compressing the images, performing feature extraction and feature screening on the collected raw image data to obtain targets, and, after a target is obtained, recording its position in the image against the target record of the previous frame to obtain the target trajectory and realize target tracking;
Step three: violation identification
merging and arranging the intelligent processing results, generating three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, and then performing violation discrimination;
Step four: alarm output
performing redundancy screening on the violation judgment results according to the redundancy screening rules and visually displaying the screened results.
Specifically, the violation discrimination in step three comprises the following steps:
templating the basic constituent information, namely the time, place and persons, contained in a specific procedure or action;
templating the relationships among the basic constituent information of the specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period;
matching the scene information constructed by the tracking data description layer at each moment against the scene template, and triggering the scene recorder sub-module to start recording when the current scene is successfully matched with the elements in the scene template;
recording, by the scene recorder, the interaction information among the key elements contained in the scene and matching it against the violation template; when the matching succeeds, outputting the matching result and looking up and outputting the corresponding violation entry.
The following gives a general description of how the system and method are applied to violation identification for single-wellhead operations in oil drilling:
step one: constructing a violation template
(1) Template composition
Spatial-relationship data structure:
composed of the relationships between the key targets on the two-dimensional image.
The current violation scene contains four targets: a person, an elevator, an elevator pin and a drill rod. The spatial relationship is used only for counter-example verification and not as a violation criterion: the center coordinates of the elevator pin must lie within a certain range of the center coordinates of the drill rod; otherwise the spatial relationship constituting the violation does not hold.
Time-relationship data structure:
composed of the order of the key targets on the time axis and their movement speeds.
Take the lower-left corner as the origin of the image coordinate system. The vertical coordinate of the elevator pin first decreases and then increases; when the data describing the elevator pin in the recorder follow this pattern, the time-relationship template can be matched successfully.
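A simple way to test the "decreases then increases" pattern is sketched below; it assumes the recorder supplies the pin's vertical coordinates as a list, and the tolerance parameter is an assumption rather than a value from the patent.

```python
from typing import List

def is_down_then_up(y_coords: List[float], tolerance: float = 1.0) -> bool:
    """Return True if the vertical coordinate first decreases and then increases,
    as the time-relationship template for the elevator pin requires
    (origin at the lower-left corner of the image)."""
    if len(y_coords) < 3:
        return False
    lowest = min(range(len(y_coords)), key=y_coords.__getitem__)
    went_down = y_coords[0] - y_coords[lowest] > tolerance
    came_up = y_coords[-1] - y_coords[lowest] > tolerance
    return 0 < lowest < len(y_coords) - 1 and went_down and came_up
```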
Scene-element data structure:
the current violation scene contains four targets: a person, an elevator, an elevator pin and a drill rod.
Template recording duration:
the video runs at 25 frames per second and the whole operation for this violation lasts about 1 minute, so the template recording duration is set to 1500 frames.
Violation threshold:
used to judge the similarity between the violation template and the actual scene recording; a violation is judged when the similarity exceeds the threshold, which is set here to 80%.
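The numeric settings of this embodiment (25 frames per second, about one minute of operation, an 80% threshold) could be grouped into a small configuration object such as the sketch below; the class and field names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class TemplateConfig:
    fps: int = 25                      # video frame rate
    job_duration_s: int = 60           # the tripping operation lasts about one minute
    similarity_threshold: float = 0.8  # violation judged when similarity exceeds 80%

    @property
    def recording_frames(self) -> int:
        # 25 frames/s x 60 s = 1500 frames of template recording
        return self.fps * self.job_duration_s
```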
Step two: intelligent processing
The system collects the raw data and performs target detection to obtain the scene information. Data acquisition uses the camera's dedicated SDK, and target detection uses a mainstream method such as SSD, Faster R-CNN or YOLO, chosen according to the complexity of the detection object.
The intelligent processing module then yields the target element information shown in FIG. 4:
Target name | Scene | Target quantity
Human body | Drilling platform | 1
Elevator | Drilling platform | 1
Elevator pin | Drilling platform | 1
Drill rod | Drilling platform | 1
Target name definitions:
Drill rod (drilling tool): a cylindrical pipe used in the drilling industry to deliver fluid underground.
Elevator: a connecting piece that fixes the drilling tool to the hoisting system.
Elevator pin: another state of the elevator, indicating that the elevator has established a temporary stable connection with the drill rod.
Step three: violation identification
Scene 1: the elevator pin is in the air (FIG. 5); the elevator pin has landed (FIG. 6)
Elements: elevator, elevator pin, person, drill rod
After the raw image data are fed into the intelligent processing module, four targets are detected in the scene: an elevator, an elevator pin, a person and a drill rod.
Spatial relationship: the elevator pin moves downward relative to the positions of the person and the elevator, and the elevator is stationary on the rig floor.
Time relationship: from the composition of the targets and the positions between targets in the front and rear frame data, the movement track of the key target is obtained (the elevator pin moves downward). To keep the illustration brief, the state descriptions of the intermediate frames are skipped.
By comparing the current information with the scene template and the violation template information, the trigger module can preliminarily judge that the current operation is a tripping operation scene, which narrows the range of violation entries to be triggered. The scene recorder is then triggered to start recording the current spatial states and temporal states of the elements.
Scene 2: the pin is removed and replaced by the elevator (see FIG. 7)
Elements: elevator, elevator pin, person, drill rod
After the raw image data are fed into the intelligent processing module, four targets are detected in the scene: an elevator, an elevator pin, a person and a drill rod.
Spatial relationship: the positions of the elevator pin and the elevator are interchanged; according to the target position coordinates obtained by target detection, the elevator now occupies the original position of the elevator pin and the elevator pin occupies the original position of the elevator.
Time relationship: from the composition of the targets and the positions between targets in the preceding image frames, the movement pattern of the key target is obtained (the elevator pin target disappears and then reappears at the previous position of the elevator).
Scene 3: the elevator pin is lifted (see FIG. 8)
Elements: elevator, elevator pin, person, drill rod
After the raw image data are fed into the intelligent processing module, four targets are detected in the scene: an elevator, an elevator pin, a person and a drill rod.
Spatial relationship: the elevator pin moves upward relative to both the elevator and the person, and the elevator is stationary.
Time relationship: from the composition of the targets and the positions between targets in the preceding image frames, the movement pattern of the key target is obtained (the elevator pin target disappears and then reappears at the previous position of the elevator).
Combining the element contents, spatial relationships and time relationships of the three scenes into scene description information, the scene recorder continuously compares this information with the corresponding contents of the violation template, using cosine similarity to describe the similarity between the time-sequence dictionary in the violation template and the time-sequence dictionary dynamically generated by the scene description layer (the vector length is 1500 frames). The similarity is output dynamically and compared with the threshold; when it exceeds the judgment threshold of 80%, the scene satisfies the template requirements and the logic judgment is carried out.
In the current scene, a normal working procedure requires at least 2 persons in the target set. Since the whole process is operated by a single person throughout, the operating procedure in this scene is judged to be a violation.
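The final logic judgment described above can be read as the check below; the category name "Human body" follows the table of FIG. 4, while the function itself is an illustrative reading rather than the patent's literal implementation.

```python
from typing import Dict

def judge_single_person_violation(similarity: float,
                                  target_counts: Dict[str, int],
                                  threshold: float = 0.8,
                                  min_persons: int = 2) -> bool:
    """Report a violation when the recorded scene matches the violation template
    (similarity above the threshold) and the target set contains fewer persons
    than the procedure requires."""
    if similarity <= threshold:
        return False                 # scene does not match the violation template
    return target_counts.get("Human body", 0) < min_persons
```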
Step four: alarm output
The content of the violation entry is looked up according to the ID number corresponding to the matched violation template, the violation content is shown on the display, and an alarm is raised.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples of carrying out the invention and that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (6)

1. A violation video recognition system, comprising:
the violation template construction module, which performs data labeling and model training on the raw data collected by a network camera to obtain a target detection model;
the intelligent processing module, which consists of an image acquisition sub-module and a target tracking sub-module; the image acquisition sub-module collects the raw data, converts photons into raw image data through a CMOS sensor chip, compresses the images and feeds them to the target tracking sub-module; the target tracking sub-module loads the target detection model to perform feature extraction and feature screening on the collected raw image data to obtain targets, and after a target is obtained, records its position in the image against the target record of the previous frame to obtain the target trajectory and thereby realize target tracking;
the violation judging module, which consists of a tracking data description layer, a scene description layer and a scene recognition layer; the tracking data description layer merges and arranges the output of the intelligent processing module and passes it to the scene description layer; the scene description layer generates three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, from the data provided by the tracking data description layer and passes them to the scene recognition layer for violation discrimination;
wherein the time-relationship data structure consists of the order of the key targets on the time axis and their movement speeds; the spatial-relationship data structure consists of the up-down and left-right relationships between key targets on the two-dimensional image; and the scene-element data structure consists of the types of targets appearing in the scene, the number of targets of each type and the positions of the targets in the scene;
the alarm output module, which consists of a hidden-danger judging module, an alarm prompting module and an alarm parameter configuration module; the hidden-danger judging module performs redundancy screening on the violation judgment results of the violation judging module according to the redundancy screening rules; the alarm prompting module visually displays the screened results; and the alarm parameter configuration module is used to set the redundancy screening rules for the judgment results.
2. The violation video recognition system of claim 1, wherein the scene recognition layer consists of four sub-modules: a scene template, a trigger, a scene recorder and a violation template;
the scene template templates the basic constituent information, namely the time, place and persons, contained in a specific procedure or action;
the trigger matches the scene information constructed by the tracking data description layer at each moment against the scene template, and when the current scene is successfully matched with the elements in the scene template, the scene recorder sub-module is triggered to start recording;
the scene recorder records the interaction information among the key elements contained in the scene and matches it against the violation template; when the matching succeeds, the matching result is output, the corresponding violation entry is looked up according to the matching result, and the entry is sent to the alarm output module;
the violation template templates the relationships among the basic constituent information of the specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period.
3. The violation video recognition system of claim 2, wherein the scene recorder uses cosine similarity to describe the similarity between the template vector and the state description.
4. The violation video recognition system of claim 2, wherein the interaction information comprises the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period.
5. A method of identifying violation videos using the violation video recognition system of any one of claims 1 to 4, comprising the following steps:
Step one: violation template construction
performing data labeling and model training on the raw data collected by the network camera to obtain a target detection model;
Step two: intelligent processing
collecting the raw data, converting photons into raw image data through a CMOS sensor chip, compressing the images, performing feature extraction and feature screening on the collected raw image data to obtain targets, and, after a target is obtained, recording its position in the image against the target record of the previous frame to obtain the target trajectory and realize target tracking;
Step three: violation identification
merging and arranging the intelligent processing results, generating three types of data, namely a time-relationship data structure, a spatial-relationship data structure and a scene-element data structure, and then performing violation discrimination;
Step four: alarm output
performing redundancy screening on the violation judgment results according to the redundancy screening rules and visually displaying the screened results.
6. The method of claim 5, wherein the violation identification in step three comprises the following steps:
templating the basic constituent information, namely the time, place and persons, contained in a specific procedure or action;
templating the relationships among the basic constituent information of the specific procedure or action, namely the time, place, person, cause, course and result, together with the time at which the target set appears, the composition of the target set within a given time period and the relationships among the target set elements within that period;
matching the scene information constructed by the tracking data description layer at each moment against the scene template, and triggering the scene recorder sub-module to start recording when the current scene is successfully matched with the elements in the scene template;
recording, by the scene recorder, the interaction information among the key elements contained in the scene and matching it against the violation template; when the matching succeeds, outputting the matching result and looking up and outputting the corresponding violation entry.
CN202010387707.8A 2020-05-09 2020-05-09 System and method for identifying illegal videos Active CN111553305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010387707.8A CN111553305B (en) 2020-05-09 2020-05-09 System and method for identifying illegal videos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010387707.8A CN111553305B (en) 2020-05-09 2020-05-09 System and method for identifying illegal videos

Publications (2)

Publication Number Publication Date
CN111553305A CN111553305A (en) 2020-08-18
CN111553305B true CN111553305B (en) 2023-05-05

Family

ID=72004540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010387707.8A Active CN111553305B (en) 2020-05-09 2020-05-09 System and method for identifying illegal videos

Country Status (1)

Country Link
CN (1) CN111553305B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711994A (en) * 2020-12-21 2021-04-27 航天信息股份有限公司 Method and system for detecting illegal operation behaviors based on scene recognition
CN112837531A (en) * 2020-12-25 2021-05-25 朗坤智慧科技股份有限公司 Group-level violation behavior video identification method and device based on 5G network
CN116883952B (en) * 2023-09-07 2023-11-17 吉林同益光电科技有限公司 Electric power construction site violation identification method and system based on artificial intelligence algorithm


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101266710A (en) * 2007-03-14 2008-09-17 中国科学院自动化研究所 An all-weather intelligent video analysis monitoring method based on a rule
CN108052900A (en) * 2017-12-12 2018-05-18 成都睿码科技有限责任公司 A kind of method by monitor video automatic decision dressing specification
CN110119656A (en) * 2018-02-07 2019-08-13 中国石油化工股份有限公司 Intelligent monitor system and the scene monitoring method violating the regulations of operation field personnel violating the regulations
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110163143A (en) * 2019-05-17 2019-08-23 国网河北省电力有限公司沧州供电分公司 Unlawful practice recognition methods, device and terminal device
CN110309735A (en) * 2019-06-14 2019-10-08 平安科技(深圳)有限公司 Exception detecting method, device, server and storage medium
CN110795989A (en) * 2019-08-28 2020-02-14 广东电网有限责任公司 Intelligent safety monitoring system for electric power operation and monitoring method thereof
CN110826514A (en) * 2019-11-13 2020-02-21 国网青海省电力公司海东供电公司 Construction site violation intelligent identification method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Qiandeng et al. "Research on Intelligent Video Surveillance Analysis Technology for Typical Violation Behaviors in Drilling Operations." Industrial Safety and Environmental Protection, 2019, Vol. 45, No. 12 (full text). *

Also Published As

Publication number Publication date
CN111553305A (en) 2020-08-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant