Disclosure of Invention
Accordingly, the present invention is directed to a method and system for multi-modal intelligent analysis for law enforcement supervision, which solves at least one of the above-mentioned problems.
In order to achieve the above purpose, a multi-mode intelligent analysis method based on law enforcement supervision comprises the following steps:
step S1, acquiring a law enforcement site video file from a law enforcement supervision recorder platform, and performing audio and video segmentation processing on the law enforcement site video file to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
step S2, performing time stamp and spatial position extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp and law enforcement site process spatial position information; performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process time stamp and the law enforcement site process space position information to obtain multi-mode space-time synchronization law enforcement behavior association event data;
Step S3, performing law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data to generate a law enforcement site event behavior sequence connection diagram;
Step S4, acquiring a preset law enforcement event behavior knowledge graph, performing behavior normative score calculation on law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph to obtain a law enforcement site behavior intention normative score value, and performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normative score value to generate a law enforcement site event non-normative behavior optimization suggestion report.
Further, step S1 includes the steps of:
Step S11, acquiring a law enforcement site video file from a law enforcement supervision recorder platform;
Step S12, performing audio and video synchronization processing on the law enforcement site video file to obtain a law enforcement site audio and video synchronized video;
Step S13, performing video content extraction processing on the law enforcement site audio and video synchronized video to obtain law enforcement site video content information data; and, based on the law enforcement site video content information data, performing recognition analysis on the start times and end times of different law enforcement events in the law enforcement site audio and video synchronized video to obtain the start time and end time of each law enforcement event in the law enforcement site video;
Step S14, performing audio and video segmentation processing on the law enforcement site audio and video synchronized video according to the start time and end time of each law enforcement event in the law enforcement site video to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
Step S15, performing law enforcement behavior information identification analysis on the law enforcement site audio clip and the law enforcement site image video frame clip to obtain law enforcement site audio behavior information data and law enforcement site image behavior information data.
Further, step S15 includes the steps of:
Step S151, performing law enforcement audio voiceprint feature analysis on the law enforcement site audio clip by using SpeechBrain to obtain law enforcement site audio voiceprint feature data;
Step S152, performing audio voice signal channel separation processing on the different speakers in the law enforcement site audio clip based on the law enforcement site audio voiceprint feature data to obtain audio voice signal sub-channels of the different speakers in the law enforcement site audio;
Step S153, performing law enforcement behavior event trigger identification analysis on the audio voice signal sub-channels of the different speakers in the law enforcement site audio to obtain law enforcement site audio behavior event trigger points;
Step S154, performing law enforcement audio behavior information identification analysis on the law enforcement site audio clip based on the law enforcement site audio behavior event trigger points to obtain the law enforcement site audio behavior information data;
Step S155, performing law enforcement image behavior information identification analysis on the law enforcement site image video frame clip by using the object detection algorithm Yolov and the tracking technology ByteTrack to obtain the law enforcement site image behavior information data.
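By way of non-limiting illustration, the speaker separation of step S152 can be sketched as grouping audio segments by the similarity of their voiceprint feature vectors. The sketch assumes that each segment already has a voiceprint embedding (for example, one produced in step S151); the helper names `cosine` and `assign_speakers`, the sample vectors, and the 0.75 similarity threshold are hypothetical and are not specified by the method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_speakers(segments, threshold=0.75):
    """Group segments into per-speaker sub-channels.

    segments: list of (segment_id, embedding) in time order.
    Returns a dict mapping speaker index -> list of segment ids.
    The threshold is an assumed value, not taken from the method.
    """
    references = []  # first embedding seen for each speaker
    channels = {}
    for segment_id, embedding in segments:
        best_index, best_similarity = None, threshold
        for index, reference in enumerate(references):
            similarity = cosine(embedding, reference)
            if similarity >= best_similarity:
                best_index, best_similarity = index, similarity
        if best_index is None:
            # No known speaker is similar enough: open a new sub-channel.
            references.append(embedding)
            best_index = len(references) - 1
        channels.setdefault(best_index, []).append(segment_id)
    return channels
```

Each resulting sub-channel can then be examined independently for event trigger points, as in step S153.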
Further, step S155 includes the steps of:
performing law enforcement site object target recognition analysis on the law enforcement site image video frame clip by using the object detection algorithm Yolov to obtain law enforcement site image frame object detection targets;
performing object boundary region marking processing on the law enforcement site image frame object detection targets to obtain law enforcement site object boundary region marking targets;
performing motion trajectory tracking and identification analysis on the law enforcement site object boundary region marking targets in the law enforcement site image video frame clip by using the tracking technology ByteTrack to generate law enforcement site object marking target motion trajectories;
performing object target behavior pattern analysis on the law enforcement site object marking target motion trajectories to obtain law enforcement behavior patterns of the different law enforcement site object targets;
and performing law enforcement image behavior information identification analysis on the law enforcement site image video frame clip based on the law enforcement behavior patterns of the different law enforcement site object targets to obtain the law enforcement site image behavior information data.
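By way of non-limiting illustration, the behavior pattern analysis performed on a tracked motion trajectory can be sketched as follows. The sketch assumes that the detector and tracker have already produced, for one target, a list of bounding boxes indexed by frame number; the two-class pattern ("moving" or "stationary"), the function names, and the speed threshold are hypothetical simplifications and are not the behavior patterns defined by the method.

```python
from math import hypot


def box_center(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def mean_speed(track, fps):
    """Mean speed of the box center in pixels per second.

    track: list of (frame_index, box) sorted by frame_index.
    """
    if len(track) < 2:
        return 0.0
    distance = 0.0
    for (_, box_a), (_, box_b) in zip(track, track[1:]):
        (xa, ya), (xb, yb) = box_center(box_a), box_center(box_b)
        distance += hypot(xb - xa, yb - ya)
    elapsed = (track[-1][0] - track[0][0]) / fps
    return distance / elapsed if elapsed > 0 else 0.0


def motion_pattern(track, fps, moving_threshold=20.0):
    """Label a trajectory; the threshold (pixels/second) is an assumed value."""
    return "moving" if mean_speed(track, fps) >= moving_threshold else "stationary"
```

A practical implementation would likely use richer features (direction changes, proximity between targets, pose), but the same trajectory input applies.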
Further, step S2 includes the steps of:
Step S21, performing time stamp extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp;
Step S22, performing three-dimensional spatial coordinate system conversion on each frame of video image in the law enforcement site video file to obtain a three-dimensional spatial coordinate system for each frame of image in the law enforcement site video;
Step S23, performing spatial coordinate extraction processing on the three-dimensional spatial coordinate system of each frame of image in the law enforcement site video to obtain the law enforcement site spatial coordinates of each frame of image in the law enforcement site video;
Step S24, performing spatial position marking processing on the law enforcement site spatial coordinates of each frame of image in the law enforcement site video to obtain the law enforcement site process spatial position information;
Step S25, performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process time stamp and the law enforcement site process spatial position information to obtain the multi-mode space-time synchronous law enforcement behavior associated event data.
Further, step S25 includes the steps of:
Step S251, performing space-time synchronization reference construction on the law enforcement site process time stamp and the law enforcement site process spatial position information to obtain a law enforcement site process space-time data synchronization reference frame;
Step S252, performing audio event time sequence synchronous arrangement processing on the law enforcement site audio behavior information data based on the law enforcement site process space-time data synchronization reference frame to obtain law enforcement site audio event time sequence synchronous sequence data;
Step S253, performing image behavior spatial mapping processing on the law enforcement site image behavior information data based on the law enforcement site process space-time data synchronization reference frame to obtain law enforcement site image behavior spatial position mapping data;
Step S254, performing multi-mode event behavior interaction association analysis on the law enforcement site audio event time sequence synchronous sequence data and the law enforcement site image behavior spatial position mapping data to obtain a multi-mode event behavior interaction association relationship between the law enforcement site audio events and the image behaviors;
Step S255, performing associated event extraction processing on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the multi-mode event behavior interaction association relationship between the law enforcement site audio events and the image behaviors to obtain the multi-mode space-time synchronous law enforcement behavior associated event data.
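By way of non-limiting illustration, the association of audio events with image behaviors on a shared time reference can be sketched as an interval-overlap match. The record types, field names, sample coordinates, and the minimum overlap of 0.5 seconds are hypothetical; the method itself does not specify how the interaction association relationship is computed.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    label: str
    start: float  # seconds on the shared time reference
    end: float


@dataclass
class ImageBehavior:
    label: str
    start: float
    end: float
    position: tuple  # (x, y) site coordinates from the spatial mapping


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def associate(audio_events, image_behaviors, min_overlap=0.5):
    """Pair each audio event with every image behavior it overlaps in time.

    Returns associated event records ordered by audio event start time.
    """
    pairs = []
    for audio in sorted(audio_events, key=lambda e: e.start):
        for image in image_behaviors:
            shared = overlap(audio.start, audio.end, image.start, image.end)
            if shared >= min_overlap:
                pairs.append({
                    "audio": audio.label,
                    "image": image.label,
                    "start": max(audio.start, image.start),
                    "end": min(audio.end, image.end),
                    "position": image.position,
                })
    return pairs
```

Each returned record carries both the common time window and the spatial position, which corresponds to the kind of space-time synchronized associated event produced in step S255.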
Further, step S3 includes the steps of:
Step S31, performing law enforcement personnel and party behavior sequence extraction processing on the multi-mode space-time synchronous law enforcement behavior associated event data to obtain a law enforcement behavior event law enforcement personnel behavior sequence and a law enforcement behavior event party behavior sequence;
Step S32, performing behavior description matching analysis on the behavior description nodes in the law enforcement behavior event law enforcement personnel behavior sequence and the behavior description nodes in the law enforcement behavior event party behavior sequence to obtain a law enforcement event matching relationship between the law enforcement personnel behavior description nodes and the party behavior description nodes;
Step S33, performing law enforcement behavior sequence connection processing between the behavior description nodes in the law enforcement behavior event law enforcement personnel behavior sequence and the behavior description nodes in the law enforcement behavior event party behavior sequence based on the law enforcement event matching relationship between the law enforcement personnel behavior description nodes and the party behavior description nodes to generate the law enforcement site event behavior sequence connection diagram;
Step S34, performing event behavior intention recognition analysis on the law enforcement site event behavior sequence connection diagram to obtain the law enforcement site event behavior intention information data.
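By way of non-limiting illustration, the behavior sequence connection diagram can be sketched as a small directed graph in which nodes are behavior descriptions and edges link consecutive behaviors of the same role and matched behaviors across roles. In this sketch the cross-role match is approximated by time proximity (a party behavior occurring within `max_gap` seconds after a law enforcement personnel behavior); that matching criterion, the function name, and the node tuple layout are assumptions, since the method performs the matching on behavior descriptions without specifying the rule.

```python
def build_connection_graph(officer_seq, party_seq, max_gap=5.0):
    """Build a behavior sequence connection graph.

    Each sequence is a list of (node_id, time_seconds, description),
    sorted by time. Returns (nodes, edges), where edges are
    (source_id, target_id, kind) with kind "next" or "response".
    """
    nodes = {
        node_id: {"time": time, "description": description, "role": role}
        for role, seq in (("officer", officer_seq), ("party", party_seq))
        for node_id, time, description in seq
    }
    edges = []
    # Chain consecutive behaviors within each role.
    for seq in (officer_seq, party_seq):
        for (source, _, _), (target, _, _) in zip(seq, seq[1:]):
            edges.append((source, target, "next"))
    # Link each officer behavior to the first party behavior that
    # follows it within max_gap seconds (assumed matching rule).
    for officer_id, officer_time, _ in officer_seq:
        for party_id, party_time, _ in party_seq:
            if 0 <= party_time - officer_time <= max_gap:
                edges.append((officer_id, party_id, "response"))
                break
    return nodes, edges
```

The resulting node and edge lists can be passed to any graph or visualization library for display or for the intention recognition analysis of step S34.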
Further, step S4 includes the steps of:
Step S41, acquiring a preset law enforcement event behavior knowledge graph;
Step S42, performing behavior normative rule retrieval and extraction processing on the law enforcement event behavior knowledge graph to obtain law enforcement event behavior normative rule standards;
Step S43, performing behavior intention semantic feature analysis on the law enforcement site event behavior intention information data to obtain law enforcement site event behavior intention semantic feature data;
Step S44, performing normative rule adaptation and behavior normative score calculation on the law enforcement site event behavior intention semantic feature data based on the law enforcement event behavior normative rule standards to obtain a law enforcement site behavior intention normative score value;
Step S45, performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normative score value to generate a law enforcement site event non-normative behavior optimization suggestion report.
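By way of non-limiting illustration, the score calculation of step S44 can be sketched as a weighted rule match. The scoring formula is not specified by the method; the sketch assumes that each normative rule names one required semantic feature and carries a weight, and that the score is the weighted share of satisfied rules on a 0 to 100 scale. The feature labels and weights are hypothetical.

```python
def normative_score(observed_features, rules):
    """Weighted share of satisfied rules, scaled to 0-100.

    observed_features: set of semantic feature labels found for an event.
    rules: list of (required_feature, weight) pairs retrieved from the
           knowledge graph (assumed representation).
    """
    total_weight = sum(weight for _, weight in rules)
    if total_weight == 0:
        return 0.0
    met_weight = sum(weight for feature, weight in rules
                     if feature in observed_features)
    return 100.0 * met_weight / total_weight


def missing_rules(observed_features, rules):
    """Required features that were not observed, in rule order."""
    return [feature for feature, _ in rules if feature not in observed_features]
```

For example, with rules weighted 3, 2, and 1 and only the first and third satisfied, the score is 4/6 of 100, and `missing_rules` reports the unsatisfied second rule for use in later alarm handling.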
Further, step S45 includes the steps of:
Step S451, comparing the law enforcement site behavior intention normative score value with a preset normative score threshold; when the law enforcement site behavior intention normative score value is greater than or equal to the preset normative score threshold, judging the corresponding law enforcement site event in the law enforcement site event behavior intention information data to be a normative behavior event, and otherwise judging it to be a non-normative behavior event;
Step S452, performing non-normative behavior intelligent alarm processing on the events judged to be non-normative behavior events according to a preset law enforcement behavior non-normative alarm rule to generate supervision non-normative behavior event alarm information;
Step S453, performing law enforcement behavior problem identification analysis on the supervision non-normative behavior event alarm information to obtain a supervision non-normative law enforcement behavior problem report;
Step S454, performing law enforcement behavior optimization suggestion processing on the supervision non-normative law enforcement behavior problem report to generate the law enforcement site event non-normative behavior optimization suggestion report.
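By way of non-limiting illustration, the threshold comparison, alarm generation, and suggestion steps can be sketched together as follows. The threshold of 80, the record layout, the alarm wording, and the suggestion lookup table are hypothetical and are not the preset alarm rule of the method.

```python
def review_events(scored_events, threshold=80.0, suggestions=None):
    """Flag non-normative events and attach improvement suggestions.

    scored_events: list of (event_id, score, missing_rule_labels).
    suggestions: optional dict mapping a rule label to advice text.
    Events scoring at or above the threshold are treated as normative
    and are left out of the report.
    """
    suggestions = suggestions or {}
    report = []
    for event_id, score, missing in scored_events:
        if score >= threshold:
            continue  # normative behavior event: no alarm
        report.append({
            "event": event_id,
            "score": score,
            "alarm": (f"Event {event_id} scored {score:.1f}, "
                      f"below threshold {threshold:.1f}"),
            "suggestions": [suggestions.get(rule, f"Review rule: {rule}")
                            for rule in missing],
        })
    return report
```

Each report entry pairs the alarm information with the identified problems (the unsatisfied rules) and one suggestion per problem, mirroring the order of steps S451 to S454.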
Furthermore, the invention also provides a multi-mode intelligent analysis system based on law enforcement supervision, which is used for executing the multi-mode intelligent analysis method based on law enforcement supervision described above and comprises:
a law enforcement site audio and video behavior recognition analysis module, configured to acquire a law enforcement site video file from a law enforcement supervision recorder platform, and to perform audio and video segmentation processing on the law enforcement site video file to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
a law enforcement behavior multi-mode space-time association analysis module, configured to perform time stamp and spatial position extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp and law enforcement site process spatial position information, and to perform multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process time stamp and the law enforcement site process spatial position information to obtain multi-mode space-time synchronous law enforcement behavior associated event data;
a law enforcement event behavior intention recognition analysis module, configured to perform law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data to generate a law enforcement site event behavior sequence connection diagram;
a law enforcement event non-normative behavior intelligent alarm module, configured to acquire a preset law enforcement event behavior knowledge graph, to perform behavior normative score calculation on the law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph to obtain a law enforcement site behavior intention normative score value, and to perform non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normative score value to generate a law enforcement site event non-normative behavior optimization suggestion report.
The invention has the beneficial effects that:
1. Compared with the prior art, the multi-mode intelligent analysis method based on law enforcement supervision has the following advantages. Acquiring the law enforcement site video file from the law enforcement supervision recorder platform ensures that law enforcement activities are comprehensively recorded and can be examined, and provides real and comprehensive visual evidence of the law enforcement process. The video file records each detail of the process, so that both the behavior of law enforcement personnel and the reactions of the persons subject to law enforcement are captured in full. Such detailed records support subsequent examination and investigation, enhance the credibility and legitimacy of law enforcement activities, help identify potential problems in the law enforcement process, and help ensure that law enforcement operations conform to the applicable standards. Performing audio and video segmentation processing on the law enforcement site video file decomposes the video data into smaller audio clips and image video frame clips; the segmentation divides the whole video content into independent, meaningful parts, each corresponding to a specific law enforcement event, which makes targeted analysis and examination more convenient. Processing the audio and image data separately makes the specifics of each event clearer: for example, the audio clips can be used to analyze dialogue content and speech emotion, while the video frame clips can be used to observe visual details and behavior. This improves the accuracy and efficiency of the data analysis, so that each law enforcement event can be independently reviewed and evaluated and the law enforcement process can be understood more fully.
Meanwhile, performing law enforcement behavior information identification analysis separately on the law enforcement site audio clip and the law enforcement site image video frame clip extracts specific law enforcement behavior information data. Analysis of the audio clip identifies the language, sound characteristics, and dialogue content in it, which helps in understanding the communication and behavioral motivation in the law enforcement process; analysis of the image video frame clip identifies visual behavior characteristics such as law enforcement actions, the site environment, and personnel positions. Combining the audio and image analysis allows the behavior of law enforcement personnel and the effect of law enforcement to be understood more accurately and supports the transparency and compliance of the law enforcement process; this multi-angle behavior information identification improves the capability to examine the law enforcement process and strengthens the control and optimization of law enforcement quality. Secondly, performing time stamp extraction processing on the law enforcement site video file allows the time nodes of law enforcement events in the video to be accurately recorded and tracked. The time stamps provide an accurate time record of the events in the law enforcement process, so that subsequent event analysis can proceed in chronological order and the order in which events occurred can be restored. They also make it possible to synchronize the law enforcement actions in the video with other data sources (such as audio and sensor data), to compare the temporal consistency of data of different modes, and to ensure accurate alignment among the multi-mode data, which provides a basic data guarantee for the transparency and reliability of subsequent processing.
Marking the spatial coordinate positions of each frame of video image in the law enforcement site video file adds detailed marks and annotations to the spatial coordinates in each frame and provides visual information about the law enforcement site. The spatial position marks make clear where the various objects and events are located in the image and what those objects mean in the scene, so that important positions, such as crime scenes, evidence locations, or the positions of the persons involved, can be identified, and event analysis and investigation become more systematic and efficient. The spatial position marks also support the generation of visual spatial layout diagrams, which facilitates visual display of the spatial structure of the site and the distribution of objects in it. Performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data, based on the law enforcement site process time stamp and the law enforcement site process spatial position information, combines the time stamps with the spatial position information and synchronizes data sources such as audio and images in space and time. This provides a comprehensive view of the law enforcement event: the visual information in the video image and the sound information in the audio are aligned, and an analyst can consider the different data sources together in time and space and thus understand the occurrence and development of an event more accurately.
Through space-time synchronization, data of different modes can be associated with each other and key behaviors and interactions in an event can be identified, which supports in-depth analysis and understanding of complex law enforcement events. Comprehensive use of multi-mode data improves the completeness and accuracy of law enforcement behavior information, helps analyze law enforcement behavior events from different angles, and provides a richer basis for subsequent processing; it also improves the coverage and timeliness of the analysis, so that the behavior conditions at law enforcement sites are reflected promptly and covered comprehensively. Then, performing law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data generates the law enforcement site event behavior sequence connection diagram. The main advantage of this process is that the matching relationships in the behavior sequences are presented visually, so that the dynamic process of the event can be better understood: the connection diagram intuitively displays the time order, the interaction relationships, and the key nodes of the behaviors of all parties in the event, making the whole course of the event clearer. Through the connection diagram, the correlations and causal relationships between behaviors can be identified, and the key event points and important behavior chains in the event can be found. The graphical display not only helps analysts quickly grasp the overall picture of the event but also provides visual data support for subsequent processing.
Event behavior intention recognition analysis is also performed on the law enforcement site event behavior sequence connection diagram to interpret the underlying intention and purpose of each behavior in the diagram. The main advantage of this analysis is that the motivation and intention behind the behaviors can be explored and understood in depth, giving deeper insight into the cause and background of the event. Intention recognition on the behavior sequences can reveal the real purposes of the law enforcement personnel and the parties in the event and allows a judgment as to whether the behaviors conform to the established regulations and legal requirements: for example, whether a law enforcement officer intends to perform a task lawfully or has acted excessively, and whether a party has defensive, evasive, or other intentions. The analysis thus deepens the understanding of the event, helps evaluate compliance and legality in the law enforcement process, and provides data support for improving law enforcement strategies and training.
Finally, acquiring the preset law enforcement event behavior knowledge graph involves integrating and establishing a systematic knowledge base covering various law enforcement events and their behaviors, including information such as historical law enforcement behavior case data, behavior normative rules, and law enforcement procedures. Constructing the knowledge graph enables in-depth analysis of the corresponding law enforcement events and provides law enforcement personnel with an intuitive reference model; the knowledge graph helps in identifying and understanding the norms and trends of various law enforcement behaviors and provides a basis for formulating more scientific law enforcement standards and optimizing law enforcement strategies. Performing behavior normative score calculation on the law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph enables a normative assessment of law enforcement behaviors: by matching the semantic features of the behavior intentions against the normative rule standards in the knowledge graph, the degree to which the behaviors conform to the norms can be assessed objectively and the corresponding normative score values calculated. The scores help to quickly identify behaviors that do not conform to the norms and give law enforcement personnel a clear direction for improvement.
In addition, performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data, based on the law enforcement site behavior intention normative score value, automatically detects and identifies behaviors that do not conform to the norms and generates a specific optimization suggestion report. This process can alert law enforcement personnel in real time to existing problems and provide specific improvement measures and suggestions; the optimization suggestion report helps law enforcement personnel adjust and improve their law enforcement behaviors so that they meet the established normative standards, which can significantly improve the standardization and efficiency of the law enforcement process and reduce errors and the occurrence of non-normative law enforcement behaviors.
2. The multi-mode intelligent analysis system based on law enforcement supervision provided by the invention is composed of the law enforcement site audio and video behavior recognition analysis module, the law enforcement behavior multi-mode space-time association analysis module, the law enforcement event behavior intention recognition analysis module, and the law enforcement event non-normative behavior intelligent alarm module, and can implement any of the multi-mode intelligent analysis methods based on law enforcement supervision described above through the cooperation of the computer programs running on the modules. The internal structures of the system are coordinated with one another, so that repeated work and manpower investment can be greatly reduced, a more accurate and efficient multi-mode intelligent analysis process based on law enforcement supervision can be provided quickly and effectively, and the operation flow of the system is simplified.
Detailed Description
The following is a clear and complete description of the technical solution of the present invention, taken in conjunction with the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To achieve the above objective, referring to fig. 1 to 3, the present invention provides a multi-modal intelligent analysis method for law enforcement supervision, which comprises the following steps:
step S1, acquiring a law enforcement site video file from a law enforcement supervision recorder platform, and performing audio and video segmentation processing on the law enforcement site video file to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
step S2, performing time stamp and spatial position extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp and law enforcement site process spatial position information; performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process time stamp and the law enforcement site process space position information to obtain multi-mode space-time synchronization law enforcement behavior association event data;
Step S3, performing law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data to generate a law enforcement site event behavior sequence connection diagram;
Step S4, acquiring a preset law enforcement event behavior knowledge graph, performing behavior normative score calculation on law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph to obtain a law enforcement site behavior intention normative score value, and performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normative score value to generate a law enforcement site event non-normative behavior optimization suggestion report.
In an embodiment of the present invention, referring to fig. 1, which is a schematic flow chart of the steps of the multi-mode intelligent analysis method based on law enforcement supervision, the method in this example includes the following steps:
step S1, acquiring a law enforcement site video file from a law enforcement supervision recorder platform, and performing audio and video segmentation processing on the law enforcement site video file to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
In the embodiment of the invention, the corresponding original law enforcement video records are acquired from the law enforcement supervision recorder platform, which generally provides functions for storing and managing such records. After user authentication and authority verification, the user logs in to the file management system of the platform, selects the 'video record' column on the system interface, and browses and screens the required original law enforcement video records. Using the retrieval function provided by the platform, a specific video record can be found by date, time, case number, or other keywords, thereby obtaining the law enforcement site video file. Next, the start time and the end time of each law enforcement event in the acquired law enforcement site video file are determined, and audio and video segmentation processing is performed on the law enforcement events according to these start and end times. Specifically, the video file is imported into a video editing tool (such as FFmpeg or Adobe Premiere Pro) and the image video track is cut at the start and end times to generate a plurality of image video segments, each corresponding to one law enforcement event. The audio track is cut over the same time periods so that each video segment has a corresponding audio segment, thereby obtaining the law enforcement site audio clip and the law enforcement site image video frame clip.
Then, recognition analysis of behavior information is performed on the law enforcement site audio clips obtained by segmentation, using a speech recognition technology (such as SpeechBrain) to convert the audio content into text, analyze the keywords and context of the language, and extract the corresponding law enforcement behavior information, so that the law enforcement site audio behavior information data are obtained. Meanwhile, each frame within the law enforcement site image video frame clip is analyzed with an image recognition algorithm (for example, the object detection algorithm YOLOv8 and the tracking technique ByteTrack) to identify the actions of law enforcement personnel and parties, the objects present, and scene changes. The analysis results are collated into structured data that include the behavior description, participants, and key actions of each event and thus provide a detailed law enforcement behavior record, finally obtaining the law enforcement site image behavior information data.
Step S2, performing time stamp and spatial position extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp and law enforcement site process spatial position information; performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process time stamp and the law enforcement site process space position information to obtain multi-mode space-time synchronization law enforcement behavior association event data;
In the embodiment of the invention, a video parsing tool such as OpenCV is used to read the law enforcement site video file frame by frame, and the time information of each frame is acquired by calling the get method of the video stream, so that the time stamp of each frame is recorded accurately and the law enforcement site process time stamp is obtained. Meanwhile, camera calibration is used to convert each frame of the law enforcement site video file into a three-dimensional spatial coordinate system. This requires calibrating the intrinsic and extrinsic parameters of the camera so that the relationship between image coordinates and real-world coordinates can be mapped accurately. A corresponding three-dimensional coordinate system is generated for each frame, and the spatial position of each identified object in the video is extracted from it, so that the law enforcement site process spatial position information is obtained.
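The time stamp and position extraction described above can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes a constant frame rate and assumes that tracked objects stand on a flat ground plane, so that calibration can be summarized by a 3x3 homography `H` (a simplification of full intrinsic and extrinsic calibration). With OpenCV, the per-frame time would typically be read with `cv2.VideoCapture.get(cv2.CAP_PROP_POS_MSEC)`; the sketch computes it from the frame index to stay dependency-free. The names `frame_timestamp`, `pixel_to_ground`, and `H_EXAMPLE` are hypothetical.

```python
def frame_timestamp(frame_index, fps):
    """Time stamp in seconds of a frame, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frame_index / fps


def pixel_to_ground(H, u, v):
    """Map pixel (u, v) to ground-plane coordinates with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    return (x / w, y / w)


# Hypothetical calibration result: 0.5 ground units per pixel, no rotation.
H_EXAMPLE = [[0.5, 0.0, 0.0],
             [0.0, 0.5, 0.0],
             [0.0, 0.0, 1.0]]

# One record of the process time stamp and spatial position information.
record = {
    "timestamp_s": frame_timestamp(50, 25.0),
    "position": pixel_to_ground(H_EXAMPLE, 100, 200),
}
```

A full three-dimensional reconstruction would need depth information or multiple views; the ground-plane assumption is used here only to keep the example short.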
Then, the time of each law enforcement event is placed in correspondence with the spatial position information, and a time data synchronization reference frame is constructed from the law enforcement site process time stamp and the law enforcement site process spatial position information. The time stamp of each audio event in the law enforcement site audio behavior information data is matched with the time stamps in the synchronization reference frame, so that the time of the audio event is consistent with the actual situation on site. Likewise, the image behavior features in the law enforcement site image behavior information data are matched with the spatial position information by means of the synchronization reference frame, so that the spatial information in the image data is consistent with the positions in the actual scene. Data analysis and machine learning tools are then used to analyze the interactive association between the time-synchronized audio events and the spatially located image behaviors, fusing the audio event data with the image behavior data. The fused data are processed with a machine learning algorithm (such as a classifier or a clustering algorithm) to identify associated events between audio events and image behaviors, for example a law enforcement associated event linking a specific audio event (such as an alarm sound) with a corresponding image behavior (such as personnel movement). The multi-mode space-time synchronization law enforcement behavior association event data are finally obtained, which include the law enforcement behavior of the law enforcement personnel and of the parties under the current law enforcement event.
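As an illustration of the association step, the following sketch pairs each audio event with the image behavior closest to it in time. It is a deliberately simple stand-in for the classifier or clustering algorithm mentioned above, and the class names, field names, and the tolerance `max_dt` are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    label: str
    t: float  # time stamp in seconds on the shared reference frame


@dataclass
class ImageBehavior:
    label: str
    t: float          # time stamp in seconds on the shared reference frame
    position: tuple   # spatial position of the behavior in scene coordinates


def associate_events(audio_events, image_behaviors, max_dt=1.0):
    """Pair each audio event with the nearest-in-time image behavior.

    Audio events with no image behavior within max_dt seconds stay unpaired.
    """
    pairs = []
    for a in audio_events:
        candidates = [b for b in image_behaviors if abs(b.t - a.t) <= max_dt]
        if not candidates:
            continue
        best = min(candidates, key=lambda b: abs(b.t - a.t))
        pairs.append({
            "time": a.t,
            "audio": a.label,
            "image": best.label,
            "position": best.position,
        })
    return pairs


audio = [AudioEvent("alarm sound", 12.0), AudioEvent("shout", 40.0)]
images = [ImageBehavior("personnel movement", 12.4, (3.0, 1.5, 0.0)),
          ImageBehavior("standing", 20.0, (2.0, 2.0, 0.0))]
associated = associate_events(audio, images)
```

Here the alarm sound at 12.0 s is linked to the personnel movement at 12.4 s, while the shout at 40.0 s has no image behavior within the tolerance and is left unassociated.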
Step S3, performing law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data to generate a law enforcement site event behavior sequence connection diagram;
In an embodiment of the invention, all behavior records of law enforcement personnel and parties are extracted from the multi-mode space-time synchronous law enforcement behavior associated event data obtained from the previous analysis. These records come from monitoring cameras, sensors, voice recordings, and other data collection devices, and include behaviors of law enforcement personnel such as approaching, interrogation, inspection, and punishment, as well as behaviors of the parties such as parking, explanation, cooperation, and resistance. Using data labels or event identifiers, the behavior records are separated into a law enforcement personnel behavior sequence and a party behavior sequence. The behavior description nodes in the law enforcement personnel behavior sequence are then compared with the behavior description nodes in the party behavior sequence: a matching algorithm calculates a similarity score for each pair of nodes and determines their matching relationship. For example, when the similarity score exceeds a threshold, the two nodes are considered to match, and the result is recorded in a matching relationship table, thereby determining the matching relationship between each pair of law enforcement personnel and party behavior description nodes. The behavior description nodes are then taken as the nodes of a graph and the matching relationships as its edges, and the nodes are connected in sequence to construct the law enforcement site event behavior sequence connection diagram. Next, a behavior intention model is constructed based on known behavior patterns and intention templates. The model can be trained with machine learning techniques (for example, classification algorithms or sequence labeling models) to identify different behavior intentions, for example by training a Support Vector Machine (SVM) classifier to predict the intention type of each behavior sequence. The behavior intention model is applied to each behavior node and behavior sequence in the law enforcement site event behavior sequence connection diagram: the intention probability of each behavior sequence is calculated and its corresponding behavior intention is determined, for example by analyzing whether the behavior sequence corresponds to a specific intention (such as warning, dissuading, punishing, or evading). The law enforcement site event behavior intention information data are finally obtained.
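The node matching and sequence connection described above can be sketched as follows. The text does not fix a particular matching algorithm, so the example assumes a token-level Jaccard similarity between behavior descriptions; the threshold of 0.3, the function names, and the sample descriptions are likewise hypothetical. The intention model (for example an SVM) is outside the scope of this sketch.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two behavior descriptions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_sequence_graph(officer_seq, party_seq, threshold=0.3):
    """Build nodes and edges of a behavior sequence connection graph.

    officer_seq / party_seq: lists of (timestamp_s, description).
    Returns (nodes, edges); each edge is (source_node, target_node, kind).
    """
    officers = sorted(("officer", t, d) for t, d in officer_seq)
    parties = sorted(("party", t, d) for t, d in party_seq)
    edges = []
    # "next" edges connect consecutive behaviors within one role.
    for seq in (officers, parties):
        for prev, nxt in zip(seq, seq[1:]):
            edges.append((prev, nxt, "next"))
    # "match" edges connect officer and party nodes whose similarity
    # score exceeds the threshold (the matching relationship table).
    for o in officers:
        for p in parties:
            if jaccard(o[2], p[2]) > threshold:
                edges.append((o, p, "match"))
    return officers + parties, edges


officer_seq = [(1.0, "request driver license"), (8.0, "issue warning")]
party_seq = [(3.0, "present driver license")]
nodes, edges = build_sequence_graph(officer_seq, party_seq)
match_edges = [e for e in edges if e[2] == "match"]
```

In this example "request driver license" and "present driver license" share two of four distinct tokens (similarity 0.5), so they are linked by a match edge, while "issue warning" has no counterpart.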
Step S4, acquiring a preset law enforcement event behavior knowledge graph, performing behavior normalization score calculation on law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph to obtain law enforcement site behavior intention normalization score values, and performing non-normalized behavior intelligent alarm optimization analysis on corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normalization score values to generate a law enforcement site event non-normalized behavior optimization suggestion report.
In the embodiment of the invention, the behavior knowledge graph of the corresponding law enforcement events is extracted from a preset knowledge base. The graph contains the behavior patterns of the various law enforcement events and the related normative rules. Law enforcement behavior documents, normative clauses, and case data from different sources (for example legal databases, law enforcement records, and expert domain knowledge) are converted into structured knowledge by natural language processing, and the behavior knowledge is integrated through the visual interface of a graph construction platform, so that the law enforcement event behavior knowledge graph is obtained. Next, a rule extraction method is applied to the law enforcement event behavior knowledge graph. A rule engine searches the graph and, by analyzing the attributes of its nodes and edges, extracts the rules related to behavior norms; the result is a rule list containing detailed behavior normative rule standards, which may include the behavior rules, processing procedures, and corresponding normative bases for law enforcement personnel. Statistical analysis of behavior intention semantic features is then performed on the law enforcement site event behavior intention information data by means of natural language processing and a machine learning model: word segmentation, part-of-speech analysis, and syntactic analysis are performed on the descriptive texts of the site events, and semantic analysis tools extract feature data of the behavior intention, such as the semantic features of the purpose, motivation, and situation of the law enforcement intention. The site event intention semantic feature data are then matched against the behavior normative rule standards obtained by the search: a matching algorithm calculates the similarity between each behavior feature vector and the normative rules, using a similarity measure or the Euclidean distance, and the score is derived from this similarity, so that the law enforcement site behavior intention normalization score value is obtained. Then, the corresponding law enforcement site events in the law enforcement site event behavior intention information data are analyzed using the law enforcement site behavior intention normalization score value obtained by the previous quantitative calculation. A threshold is set, and behaviors whose score value is lower than the threshold are marked as non-normalized behaviors. The non-normalized behaviors are further analyzed by an intelligent alarm method: an anomaly detection algorithm identifies potential problem areas, and optimization analysis is performed on these areas. The resulting report includes the identified non-normalized behavior types, the existing normative risks, and improvement suggestions, and presents the analysis results and suggestions with a data visualization tool, helping law enforcement personnel adjust their behavior in law enforcement site events and ensuring standardization of the law enforcement process. The law enforcement site event non-normalized behavior optimization suggestion report is finally generated.
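The scoring and thresholding described above can be illustrated with a short sketch. It assumes that behavior intentions and normative rules have already been encoded as numeric feature vectors, uses cosine similarity as one possible similarity measure, and takes the best-matching rule as the score. The threshold of 0.6, the vectors, and all names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def normative_score(behavior_vec, rule_vecs):
    """Score a behavior by its similarity to the best-matching rule."""
    return max(cosine(behavior_vec, r) for r in rule_vecs)


def flag_non_normative(events, rule_vecs, threshold=0.6):
    """Return (event_id, score) for every event scoring below the threshold."""
    flagged = []
    for event_id, vec in events.items():
        score = normative_score(vec, rule_vecs)
        if score < threshold:
            flagged.append((event_id, round(score, 3)))
    return flagged


# Hypothetical rule and event feature vectors.
RULES = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
EVENTS = {"event_a": [1.0, 0.0, 1.0], "event_b": [1.0, 1.0, 0.0]}
flagged = flag_non_normative(EVENTS, RULES)
```

The flagged list is the natural input for the subsequent alarm and report generation; how the report is rendered is left open here.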
Further, step S1 includes the steps of:
Step S11, acquiring a law enforcement site video file from a law enforcement supervision recorder platform;
Step S12, performing audio and video synchronization processing on the law enforcement site video file to obtain law enforcement site audio and video synchronization video;
Step S13, performing video content extraction processing on the law enforcement site audio-visual synchronous video to obtain law enforcement site video content information data; based on the law enforcement site video content information data, performing recognition analysis on the start time and the end time of different law enforcement events on the law enforcement site audio-visual synchronous video to obtain the start time and the end time of different law enforcement events in the law enforcement site video;
Step S14, performing audio and video segmentation processing on the law enforcement site audio and video synchronous video according to the starting time and the ending time of different law enforcement events in the law enforcement site video to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
and step S15, performing law enforcement behavior information identification analysis on the law enforcement site audio frequency fragment and the law enforcement site image video frame fragment to obtain law enforcement site audio frequency behavior information data and law enforcement site image behavior information data.
As an embodiment of the present invention, referring to fig. 2, a detailed step flow chart of step S1 in fig. 1 is shown, in which step S1 includes the following steps:
Step S11, acquiring a law enforcement site video file from a law enforcement supervision recorder platform;
In the embodiment of the invention, the corresponding original law enforcement video records are acquired from the law enforcement supervision recorder platform, which usually provides functions for storing and managing law enforcement records. After user authentication and authority verification, the user logs in to the file management system of the platform, selects the 'video record' column on the system interface, and browses and screens the required original law enforcement video records. Using the retrieval function provided by the platform, a specific video record is found by date, time, case number, or other keywords. The target video is then selected, the 'download' button is clicked, and the video file is stored on a local storage device. A standard video format such as MP4 or AVI is adopted, and it is ensured that the downloaded file is complete and undamaged, so that the law enforcement site video file is finally acquired.
Step S12, performing audio and video synchronization processing on the law enforcement site video file to obtain law enforcement site audio and video synchronization video;
In the embodiment of the invention, the audio and video in the previously acquired law enforcement site video file are synchronized. The video file downloaded from the law enforcement supervision recorder platform is imported into video processing software that supports audio and video synchronization, such as Adobe Premiere Pro or FFmpeg. The video track and the audio track are separated in the software and checked for time deviation. If a deviation exists, synchronization is achieved by adjusting the start time of the audio track; specifically, the timeline tool of the software is used to align the audio track with the video track so that audio and video are consistent during playback. After synchronization is completed, the adjusted audio and video are merged into one file and stored as a new video file, finally obtaining the law enforcement site audio and video synchronization video.
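When FFmpeg is the chosen tool, adjusting the start time of the audio track can be expressed with the `-itsoffset` input option, which shifts the time stamps of the input that follows it. The sketch below only builds the argument list and does not run FFmpeg; the function name, the file names, and the convention that a positive `audio_lag_s` means the audio is heard later than the picture are assumptions made for the example, and how the deviation is measured is not shown.

```python
def build_sync_command(src, dst, audio_lag_s):
    """Build an FFmpeg argument list that re-aligns audio with video.

    audio_lag_s > 0 means the audio is heard later than the picture, so the
    audio input is shifted earlier by the same amount (negative offset).
    """
    offset = f"{-audio_lag_s:.3f}"
    return [
        "ffmpeg",
        "-i", src,                 # input 0: used for the video track
        "-itsoffset", offset,      # applies to the next input only
        "-i", src,                 # input 1: used for the shifted audio track
        "-map", "0:v:0",           # take video from input 0
        "-map", "1:a:0",           # take audio from input 1
        "-c", "copy",              # remux without re-encoding
        dst,
    ]


cmd = build_sync_command("site_video.mp4", "site_video_synced.mp4", 0.25)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.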
Step S13, performing video content extraction processing on the law enforcement site audio-visual synchronous video to obtain law enforcement site video content information data; based on the law enforcement site video content information data, performing recognition analysis on the start time and the end time of different law enforcement events on the law enforcement site audio-visual synchronous video to obtain the start time and the end time of different law enforcement events in the law enforcement site video;
In the embodiment of the invention, content extraction processing is performed on the law enforcement site audio-visual synchronous video obtained after the previous audio and video synchronization. The video is analyzed frame by frame with video analysis software such as OpenCV or a custom video analysis tool, and the specific content information in the law enforcement site video is extracted, so that the law enforcement site video content information data are obtained. Meanwhile, based on the law enforcement site video content information data, different law enforcement events in the law enforcement site audio-visual synchronous video are recognized and classified to obtain the law enforcement events contained in the video. According to the visual characteristics of the different law enforcement events and the speech content in the audio, the time stamps of each law enforcement event are marked and recorded and its start time and end time are determined, finally obtaining the start time and the end time of the different law enforcement events in the law enforcement site video.
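Once every frame has been assigned an event label, the start and end times of each event follow from grouping consecutive frames with the same label. The sketch below assumes such per-frame labels are already available (how they are produced is not shown), a constant frame rate, and a `background` label for frames that belong to no event; all names are hypothetical.

```python
def event_intervals(frame_labels, fps, background=None):
    """Group consecutive identical frame labels into (label, start_s, end_s).

    end_s is exclusive: it is the time of the first frame after the event.
    """
    intervals = []
    current, start = background, 0
    # A trailing background label closes an event that runs to the last frame.
    for i, label in enumerate(list(frame_labels) + [background]):
        if label != current:
            if current != background:
                intervals.append((current, start / fps, i / fps))
            current, start = label, i
    return intervals


labels = [None, "vehicle stop", "vehicle stop",
          "document check", "document check", "document check", None]
intervals = event_intervals(labels, fps=1.0)
```

In practice a smoothing step is usually needed before grouping, since single mislabeled frames would otherwise split an event in two.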
Step S14, performing audio and video segmentation processing on the law enforcement site audio and video synchronous video according to the starting time and the ending time of different law enforcement events in the law enforcement site video to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
In the embodiment of the invention, audio and video segmentation processing is performed on the corresponding law enforcement events in the law enforcement site audio-visual synchronous video according to the previously identified start time and end time of the different law enforcement events. The video file is imported into a video editing tool (such as FFmpeg or Adobe Premiere Pro) and the video track is cut at the start and end times to generate a plurality of video segments, each corresponding to one law enforcement event. The audio track is cut over the same time periods so that each video segment has a corresponding audio segment. The audio segments and video frame segments generated in this way are each stored as separate files, finally obtaining the law enforcement site audio clip and the law enforcement site image video frame clip.
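With FFmpeg, the cut for one event can be described by two commands: one that keeps only the picture (`-an`) and one that keeps only the sound (`-vn`). The sketch below builds both argument lists without running them; the function name, file naming scheme, and the choice of WAV output are assumptions for illustration.

```python
def build_segment_commands(src, start_s, end_s, stem):
    """Build FFmpeg argument lists that cut one event into video and audio.

    Returns (video_cmd, audio_cmd) for the interval [start_s, end_s].
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    common = ["ffmpeg", "-i", src,
              "-ss", f"{start_s:.3f}", "-to", f"{end_s:.3f}"]
    # -an drops the audio stream; the video stream is copied unchanged.
    video_cmd = common + ["-an", "-c:v", "copy", f"{stem}_video.mp4"]
    # -vn drops the video stream; audio is written as 16-bit PCM WAV.
    audio_cmd = common + ["-vn", "-c:a", "pcm_s16le", f"{stem}_audio.wav"]
    return video_cmd, audio_cmd


video_cmd, audio_cmd = build_segment_commands(
    "site_video_synced.mp4", 12.0, 47.5, "event_001")
```

Copying the video stream avoids re-encoding but can only cut at keyframes, so frame-accurate boundaries would require re-encoding the video instead.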
And step S15, performing law enforcement behavior information identification analysis on the law enforcement site audio frequency fragment and the law enforcement site image video frame fragment to obtain law enforcement site audio frequency behavior information data and law enforcement site image behavior information data.
In the embodiment of the invention, recognition analysis of behavior information is performed on the law enforcement site audio clips obtained by segmentation, using a speech recognition technology (such as SpeechBrain) to convert the audio content into text, analyze the keywords and context of the language, and extract the corresponding law enforcement behavior information, thereby obtaining the law enforcement site audio behavior information data. Meanwhile, each frame within the law enforcement site image video frame clip is analyzed with an image recognition algorithm (for example, the object detection algorithm YOLOv8 and the tracking technique ByteTrack) to identify the actions of law enforcement personnel and parties, the objects present, and scene changes. The analysis results are collated into structured data that include the behavior description, participants, and key actions of each event and thus provide a detailed law enforcement behavior record, finally obtaining the law enforcement site image behavior information data.
Further, step S15 includes the steps of:
step S151, performing law enforcement audio voiceprint feature analysis on the law enforcement site audio clip by utilizing Speechbrain technology to obtain law enforcement site audio voiceprint feature data;
step S152, carrying out audio voice signal channel deconstructing processing on different speakers in the law enforcement site audio clips based on the law enforcement site audio voiceprint feature data to obtain audio voice signal sub-channels of different speakers in the law enforcement site audio;
Step S153, performing law enforcement action event triggering identification analysis on audio voice signal sub-channels of different speakers in the law enforcement site audio to obtain law enforcement action event triggering points in the law enforcement site;
Step S154, performing law enforcement audio behavior information identification analysis on the law enforcement on-site audio frequency fragments based on the trigger points of the law enforcement on-site audio frequency behavior events to obtain the data of the law enforcement on-site audio frequency behavior information;
Step S155, performing law enforcement image behavior information identification analysis on the law enforcement site image video frame segments by using an object detection algorithm YOLOv8 and a tracking technology ByteTrack to obtain law enforcement site image behavior information data.
As an embodiment of the present invention, referring to fig. 3, a detailed step flow chart of step S15 in fig. 2 is shown, in which step S15 includes the following steps:
step S151, performing law enforcement audio voiceprint feature analysis on the law enforcement site audio clip by utilizing Speechbrain technology to obtain law enforcement site audio voiceprint feature data;
In the embodiment of the invention, voiceprint feature analysis is performed on the law enforcement site audio clips obtained by segmentation using SpeechBrain. The audio clips are input into the SpeechBrain framework, which provides a range of speech processing functions, and the voiceprint features in the clips are extracted with a speaker recognition model pre-trained in the framework. The voiceprint features are obtained by vectorizing the spectral characteristics of the speech signal and the speaker-specific patterns of the voice samples. In a specific implementation, the audio signal is first preprocessed, including steps such as denoising and normalization, to ensure the quality of the input data. The speaker recognition model then extracts a feature vector for each speaker, and these vectors represent the voiceprint characteristics of the speakers, so that the law enforcement site audio voiceprint feature data are finally obtained.
Step S152, carrying out audio voice signal channel deconstructing processing on different speakers in the law enforcement site audio clips based on the law enforcement site audio voiceprint feature data to obtain audio voice signal sub-channels of different speakers in the law enforcement site audio;
In the embodiment of the invention, signal channel deconstructing processing is performed on the different speakers in the law enforcement site audio clip based on the law enforcement site audio voiceprint feature data obtained by the previous analysis. The voiceprint feature data are applied to an audio signal separation algorithm, such as a Blind Source Separation (BSS) technique guided by voiceprint features. By performing blind source separation on the audio signal, the mixed audio signal is deconstructed into a plurality of sub-channels, each corresponding to the audio signal of one speaker. Specifically, the algorithm extracts the speech components of each speaker from the audio signal according to the voiceprint features and assigns the components to different audio signal sub-channels, finally obtaining the audio voice signal sub-channels of the different speakers in the law enforcement site audio.
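A full blind source separation is beyond a short example. The sketch below shows a simpler, related idea under a strong assumption: the speakers do not talk over each other, so each short audio segment can be assigned as a whole to the speaker whose reference voiceprint it most resembles. This is segment-level speaker assignment, not BSS, and the two-dimensional embeddings and all names are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_speakers(segment_embeddings, speaker_prints):
    """Assign each segment index to the speaker with the closest voiceprint.

    segment_embeddings: list of embedding vectors, one per audio segment.
    speaker_prints: dict mapping speaker name to reference voiceprint vector.
    Returns a dict mapping speaker name to the list of segment indices.
    """
    channels = {name: [] for name in speaker_prints}
    for idx, emb in enumerate(segment_embeddings):
        best = max(speaker_prints,
                   key=lambda name: cosine(emb, speaker_prints[name]))
        channels[best].append(idx)
    return channels


# Hypothetical reference voiceprints and segment embeddings.
prints = {"officer": [1.0, 0.0], "party": [0.0, 1.0]}
segments = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3]]
channels = assign_speakers(segments, prints)
```

Each list of segment indices plays the role of one speaker sub-channel; overlapping speech would require an actual separation model.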
Step S153, performing law enforcement action event triggering identification analysis on audio voice signal sub-channels of different speakers in the law enforcement site audio to obtain law enforcement action event triggering points in the law enforcement site;
In the embodiment of the invention, recognition analysis of the triggering of law enforcement behavior events is performed on the audio voice signal sub-channels of the different speakers obtained by the previous deconstructing. The audio signal in each sub-channel is processed with an audio event detection algorithm which, based on specific acoustic models, can recognize key events in the audio signal such as verbal commands, calls, and collision sounds. In a specific implementation, an event recognition model is first trained to recognize the sound characteristics of specific behavior events. Each audio voice signal sub-channel is then input into the trained model, which outputs the triggering time point of each event together with its related information. These trigger points are recorded for analyzing the specific behavior events at the law enforcement site, so that the law enforcement site audio behavior event trigger points are finally obtained.
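The idea of a trigger point can be illustrated without a trained model. The sketch below swaps the event recognition model for a plain energy detector: it reports the time at which the root-mean-square level of a sub-channel first rises above a threshold. It cannot tell a command from a collision sound; it only shows how trigger time points are derived from a signal. The frame length, threshold, and names are assumptions.

```python
from math import sqrt


def detect_triggers(samples, sample_rate, frame_len, threshold):
    """Return the times (in seconds) where frame RMS rises above threshold.

    The signal is split into non-overlapping frames of frame_len samples;
    a trigger is reported only on the transition from quiet to loud.
    """
    triggers = []
    above = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = sqrt(sum(x * x for x in frame) / frame_len)
        loud = rms >= threshold
        if loud and not above:
            triggers.append(start / sample_rate)
        above = loud
    return triggers


# Toy sub-channel: silence, a loud burst, silence, another loud burst.
signal = [0.0] * 4 + [1.0] * 4 + [0.0] * 4 + [1.0] * 4
trigger_points = detect_triggers(signal, sample_rate=4, frame_len=4,
                                 threshold=0.5)
```

With the toy signal sampled at 4 samples per second, the two bursts start at 1.0 s and 3.0 s, which are the reported trigger points.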
Step S154, performing law enforcement audio behavior information identification analysis on the law enforcement on-site audio frequency fragments based on the trigger points of the law enforcement on-site audio frequency behavior events to obtain the data of the law enforcement on-site audio frequency behavior information;
In the embodiment of the invention, recognition analysis of behavior information is performed on the corresponding law enforcement site audio clips based on the previously recognized law enforcement site audio behavior event trigger points. The audio clips are segmented according to the trigger points so that each piece of audio is analyzed around one event. Each piece is processed with a behavior information recognition model, which can recognize and classify audio characteristics related to specific law enforcement behaviors, such as sound intensity and tone change. In a specific implementation, features are first extracted from each event piece, including the frequency characteristics and duration characteristics of the sound. The features are input into a classifier, which matches them against predefined law enforcement behavior categories and outputs the specific behavior information data of each event, so that the law enforcement site audio behavior information data are finally obtained.
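The two operations in this step, cutting a window around each trigger point and matching extracted features against predefined categories, can be sketched as below. The classifier is assumed to be a nearest-centroid rule over two features (sound level and duration); the categories, centroid values, window lengths, and names are all hypothetical placeholders, not values from the method.

```python
from math import dist


def event_windows(triggers, pre_s, post_s, clip_len_s):
    """Return a (start_s, end_s) window around each trigger, clipped to the clip."""
    return [(max(0.0, t - pre_s), min(clip_len_s, t + post_s))
            for t in triggers]


def classify(features, centroids):
    """Return the category whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Hypothetical categories described by (sound level, duration in seconds).
CENTROIDS = {
    "verbal warning": (0.8, 2.0),
    "routine inquiry": (0.3, 5.0),
}

windows = event_windows([1.0, 9.5], pre_s=0.5, post_s=2.0, clip_len_s=10.0)
category = classify((0.7, 2.5), CENTROIDS)
```

A trained classifier would replace `classify` in a real system; the feature scaling also matters, since an unscaled duration would dominate the distance here.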
Step S155, performing law enforcement image behavior information identification analysis on the law enforcement site image video frame segments by using an object detection algorithm YOLOv8 and a tracking technology ByteTrack to obtain law enforcement site image behavior information data.
In the embodiment of the invention, recognition analysis of the law enforcement behaviors within the images is performed on the law enforcement site image video frame clips obtained by the previous segmentation, using the object detection algorithm YOLOv8 and the tracking technology ByteTrack. Objects in the video frames are detected with YOLOv8, a target detection algorithm that can efficiently identify various objects in an image and generate the corresponding bounding boxes. In a specific implementation, each frame is input into the YOLOv8 model, which outputs the detected objects and their position information. The objects are then tracked with ByteTrack to ensure continuous tracking of the movement of each object across multiple frames. ByteTrack combines object detection with tracking: by associating the positions and motion trajectories of the objects in consecutive frames, motion behavior information is generated for each object target, including the behavior category, time stamp, position, and other relevant context information of the object, and the law enforcement site image behavior information data are finally obtained.
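The association of detections with existing tracks can be illustrated with a heavily simplified sketch. It keeps only the two-stage idea that ByteTrack is known for (match high-confidence detections first, then try low-confidence detections on the tracks still unmatched) and replaces its motion prediction and optimal assignment with a greedy match on bounding-box overlap. The thresholds and names are assumptions; this is not the ByteTrack implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, min_iou):
    """Greedily pair track ids with box indices by descending IoU."""
    scored = sorted(((iou(tbox, box), tid, j)
                     for tid, tbox in tracks.items()
                     for j, box in enumerate(boxes)), reverse=True)
    matches, used = {}, set()
    for score, tid, j in scored:
        if score < min_iou:
            break
        if tid not in matches and j not in used:
            matches[tid] = j
            used.add(j)
    return matches


def associate_tracks(tracks, detections, high=0.5, min_iou=0.3):
    """Two-stage association; detections is a list of (box, confidence).

    Returns a dict mapping track id to the index of its matched detection.
    """
    high_dets = [(i, d[0]) for i, d in enumerate(detections) if d[1] >= high]
    low_dets = [(i, d[0]) for i, d in enumerate(detections) if d[1] < high]
    first = greedy_match(tracks, [box for _, box in high_dets], min_iou)
    result = {tid: high_dets[j][0] for tid, j in first.items()}
    remaining = {tid: box for tid, box in tracks.items() if tid not in result}
    second = greedy_match(remaining, [box for _, box in low_dets], min_iou)
    result.update({tid: low_dets[j][0] for tid, j in second.items()})
    return result


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
detections = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.2)]
assignment = associate_tracks(tracks, detections)
```

In the example, track 2 is kept alive by a low-confidence detection that a single-threshold tracker would have discarded, which is the motivation for the second stage.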
Further, step S155 includes the steps of:
Performing law enforcement on-site object target recognition analysis on the law enforcement on-site image video frame fragments by using an object detection algorithm YOLOv8 to obtain a law enforcement on-site image frame object detection target;
In the embodiment of the invention, the object detection algorithm YOLOv8 (You Only Look Once, version 8) is used to recognize the object targets in each frame of image in the law enforcement site image video frame segments. YOLOv8 is a deep learning model that, once trained, can identify and classify the various objects in an image. In this process, the algorithm uses a convolutional neural network to extract the features of the image and generates a bounding box to mark each detected object, each bounding box containing the position coordinates and the class label of the object, thereby obtaining the law enforcement site image frame object detection target.
Preferably, object boundary area marking processing is carried out on the object detection target of the law enforcement site image frame to obtain a law enforcement site object boundary area marking target;
In the embodiment of the invention, the previously detected law enforcement site image frame object detection target is further processed, mainly by marking the boundary area of each object. This process converts the bounding box coordinates output by YOLOv8 into area marks in the actual image. In a specific implementation, the bounding box coordinates are first mapped into the pixel coordinate system of the image; the boundary area of each object is then explicitly marked in the image by drawing the bounding box and attaching the label. The marking result is presented as an image overlay so that each object has a clear visual boundary in the image, thereby obtaining the law enforcement site object boundary area marking target.
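By way of illustration only, the mapping of bounding box coordinates into the pixel coordinate system may be sketched as follows. The sketch assumes that the detector reports each box in normalized center format (cx, cy, w, h, each in the range 0 to 1); that format, and the names to_pixel_box and label_region, are assumptions introduced for the example.

```python
def to_pixel_box(cx, cy, w, h, img_w, img_h):
    """Map a normalized center-format box to integer pixel corners (x1, y1, x2, y2)."""
    x1 = round((cx - w / 2) * img_w)
    y1 = round((cy - h / 2) * img_h)
    x2 = round((cx + w / 2) * img_w)
    y2 = round((cy + h / 2) * img_h)

    def clamp(value, upper):
        # Keep the marked region inside the frame.
        return max(0, min(value, upper))

    return (clamp(x1, img_w - 1), clamp(y1, img_h - 1),
            clamp(x2, img_w - 1), clamp(y2, img_h - 1))

def label_region(box, class_name, score):
    """Build the record an overlay renderer would draw: pixel corners plus a text label."""
    return {"box": box, "label": f"{class_name} {score:.2f}"}
```

The returned record can be handed to any drawing routine that renders a rectangle and a caption on the frame.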
Preferably, the tracking technology ByteTrack is used to perform motion track tracking recognition analysis on the law enforcement site object boundary area marking target in the law enforcement site image video frame segments to generate a law enforcement site object marking target motion track;
In the embodiment of the invention, the tracking technology ByteTrack is used to track the motion track of the previously determined law enforcement site object boundary area marking target in the law enforcement site image video frame segments. ByteTrack is an efficient multi-target tracking algorithm capable of handling target tracking in complex scenes. The initial bounding box of each object is taken as the tracking target; by continuously analyzing the video frames, the ByteTrack algorithm calculates the movement of each object from frame to frame and updates the position of each target by combining the appearance features and the motion pattern of the object, ensuring that the motion track of the object is tracked accurately. The motion track is recorded as a series of coordinate points and timestamps, thereby generating the law enforcement site object marking target motion track.
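By way of illustration only, the frame-to-frame association step may be sketched as follows. This is a simplified stand-in rather than ByteTrack itself: it keeps the idea of matching confident detections first and then retrying the leftover tracks with low-score detections, but it uses greedy intersection-over-union matching and omits motion prediction. The names iou and associate and the thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, high=0.5, min_iou=0.3):
    """tracks: {track_id: box}; detections: list of (box, score).
    Returns {track_id: new_box} for every track matched in this frame."""
    matched = {}
    unmatched = dict(tracks)
    # Stage 1 uses confident detections; stage 2 retries leftover tracks with weak ones.
    stages = ([d for d in detections if d[1] >= high],
              [d for d in detections if d[1] < high])
    for stage in stages:
        for box, _score in stage:
            if not unmatched:
                break
            best = max(unmatched, key=lambda tid: iou(unmatched[tid], box))
            if iou(unmatched[best], box) >= min_iou:
                matched[best] = box
                del unmatched[best]
    return matched
```

Appending each matched box together with the frame timestamp to a per-track list yields the motion track as a sequence of coordinate points and timestamps.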
Preferably, object target behavior pattern analysis is carried out on the movement track of the object mark target of the law enforcement site to obtain law enforcement behavior patterns of different object targets of the law enforcement site;
In the embodiment of the invention, a machine learning model or a statistical method is used to analyze object behavior patterns in the previously generated law enforcement site object marking target motion tracks. The motion track data of each object is counted and analyzed, and the movement speed, direction changes, track continuity and the like are examined to extract the movement patterns of the object at the law enforcement site, such as the regularity of its movement path, its dwell time and its interaction with other objects, thereby obtaining the law enforcement behavior patterns of the different law enforcement site object targets. In addition, the law enforcement behavior patterns so obtained are used to classify the law enforcement behavior actions of the corresponding law enforcement site object boundary area marking targets. A classification model is established on the basis of the behavior pattern analysis results to classify the actions of the objects in the video frames; the classification compares the motion track and behavior pattern of each target and assigns the target to a predefined action category, such as patrol, stationary or contact. In a specific implementation, a feature extraction method extracts key action features from the track data and a classification algorithm (such as a support vector machine or a neural network) processes them to output the action category of each object target, thereby obtaining the law enforcement behavior actions of the different law enforcement site object targets.
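By way of illustration only, the extraction of track features and the assignment to action categories may be sketched as follows. The sketch substitutes fixed rules for the support vector machine or neural network mentioned above; the thresholds, the units (seconds and metres) and the names trajectory_features and classify_action are assumptions.

```python
import math

def trajectory_features(points):
    """points: list of (t_seconds, x, y) in strictly increasing time order.
    Returns the mean speed and the total time spent nearly motionless."""
    path = 0.0
    dwell = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        path += step
        if step / (t1 - t0) < 0.1:  # assumed "not moving" speed threshold
            dwell += t1 - t0
    total = points[-1][0] - points[0][0]
    return {"mean_speed": path / total, "dwell_time": dwell}

def classify_action(features, nearest_other_distance):
    """Rule-based stand-in for the classifier; thresholds are illustrative only."""
    if nearest_other_distance < 0.5:
        return "contact"
    if features["mean_speed"] < 0.1:
        return "stationary"
    return "patrol"
```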
Preferably, the law enforcement image behavior information recognition analysis is carried out on the video frame segments of the law enforcement site images based on the law enforcement behavior patterns and the law enforcement behavior actions of different law enforcement site object targets to obtain the law enforcement site image behavior information data.
In the embodiment of the invention, the previously obtained law enforcement behavior patterns and law enforcement behavior actions of the different law enforcement site object targets are combined to perform overall law enforcement image behavior information recognition analysis on the law enforcement site image video frame segments. The behavior classification information of the objects is integrated into the video frame data, and an information fusion technology combines the behavior information of the objects with the image data to generate detailed behavior information data. In a specific implementation, the behavior classification result of each object is aligned with the video frames, and the behavior patterns and interactions of the objects across the whole scene are analyzed. The generated behavior information data includes the behavior types, timestamps, positions and other relevant context information of the objects, thereby obtaining the law enforcement site image behavior information data.
Further, step S2 includes the steps of:
Step S21, performing timestamp extraction processing on the law enforcement site video file to obtain a law enforcement site process timestamp;
In the embodiment of the invention, a video analysis tool such as OpenCV is used to read the law enforcement site video file frame by frame, and the get method of the video stream is called to obtain the time information of each frame, so that the timestamp of each frame is accurately recorded, thereby obtaining the law enforcement site process timestamp.
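By way of illustration only, per-frame timestamps may be derived as follows. With OpenCV, the position of the current frame in milliseconds can be read through the get method of cv2.VideoCapture with the property cv2.CAP_PROP_POS_MSEC; so as not to depend on that library, the sketch below instead computes the timestamps from the frame index and an assumed constant frame rate. The names frame_timestamps and format_timestamp are hypothetical.

```python
def frame_timestamps(frame_count, fps, start_offset_s=0.0):
    """Return the timestamp in seconds of each frame, assuming a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return [start_offset_s + i / fps for i in range(frame_count)]

def format_timestamp(seconds):
    """Render seconds as HH:MM:SS.mmm for recording alongside each frame."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```

For variable frame rate recordings the constant-rate assumption does not hold, and the per-frame time reported by the video stream should be used instead.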
Step S22, performing three-dimensional space coordinate system conversion on each frame of video image in the law enforcement site video file to obtain a three-dimensional space coordinate system of each frame of image in the law enforcement site video;
In the embodiment of the invention, camera calibration technology is used to perform three-dimensional space coordinate system conversion on each frame of video image in the law enforcement site video file. The intrinsic and extrinsic parameters of the camera need to be calibrated so that the relationship between image coordinates and real-world coordinates can be mapped accurately. With the known camera parameters, a perspective transformation is applied to convert the pixel coordinates of the image into coordinates in the three-dimensional space coordinate system; functions in a computer vision library, such as cv2.projectPoints, can be used to relate the three-dimensional coordinates to the image coordinates. A corresponding three-dimensional coordinate system is generated for each frame of image, thereby obtaining the three-dimensional space coordinate system of each frame of image in the law enforcement site video.
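By way of illustration only, the conversion from pixel coordinates to spatial coordinates may be sketched as follows for the special case in which the points of interest lie on a flat ground plane. Under that assumption the calibrated camera parameters reduce to a 3x3 homography H from image pixels to plane coordinates; the planar assumption, the example matrix and the name pixel_to_ground are not taken from the embodiment.

```python
def pixel_to_ground(H, u, v):
    """Map pixel (u, v) to (X, Y, 0.0) on the ground plane.
    H is a 3x3 homography given as a row-major list of lists."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity")
    # Dividing by w applies the perspective part of the transformation.
    return (x / w, y / w, 0.0)
```

Points that are not on the ground plane cannot be recovered from a single view in this way and require additional depth information.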
Step S23, performing law enforcement site spatial coordinate extraction processing on the three-dimensional space coordinate system of each frame of image in the law enforcement site video to obtain the law enforcement site spatial coordinates of each frame of image in the law enforcement site video;
In the embodiment of the invention, the spatial positions corresponding to the objects identified in the video are extracted from the previously converted three-dimensional space coordinate system of each frame of image in the law enforcement site video. The coordinates of the key points related to the law enforcement behavior are extracted in such a way that they accurately reflect the real situation of the law enforcement site, thereby obtaining the law enforcement site spatial coordinates of each frame of image in the law enforcement site video.
Step S24, performing spatial position marking processing on the law enforcement site spatial coordinates of each frame of image in the law enforcement site video to obtain law enforcement site process spatial position information;
In the embodiment of the invention, the previously extracted law enforcement site spatial coordinates of each frame of image in the law enforcement site video are marked with their spatial positions according to defined marking rules. For example, a graphics processing tool converts the spatial coordinates into a visualized point cloud model in which different law enforcement behaviors are color-coded or marked with different shapes, thereby obtaining the law enforcement site process spatial position information.
Step S25, performing multi-mode space-time synchronization association analysis on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the law enforcement site process timestamp and the law enforcement site process spatial position information to obtain multi-mode space-time synchronous law enforcement behavior associated event data.
In the embodiment of the invention, a data fusion tool is used to perform space-time synchronization processing on the previously extracted law enforcement site process timestamp and law enforcement site process spatial position information, aligning the timestamp with the spatial position information so that the time and the spatial position of each law enforcement event correspond synchronously; the timestamp and the spatial position information are integrated to form a space-time data synchronization reference frame. Using this reference frame, the timestamps of the audio events are matched with the timestamps in the reference frame so that the time order of the audio events is consistent with the actual situation on site, and the image behavior features in the law enforcement site image behavior information data are matched with the spatial position information so that the spatial information in the image data is consistent with the positions in the actual scene. Data analysis and machine learning tools are then used together to perform interactive association analysis on the audio event time sequence synchronization data and the image behavior spatial position mapping data: the audio events are fused with the image behavior data, and a multi-modal fusion algorithm, such as a multi-modal fusion neural network, is applied to identify and extract the multi-modal interaction association relationship between the audio events and the image behaviors. Based on this relationship, associated connection extraction processing of law enforcement behavior associated events is performed on the corresponding law enforcement site audio behavior information data and law enforcement site image behavior information data: a behavior association model is established to combine the interaction information in the audio events and the image behavior data, and a machine learning algorithm (such as a classifier or a clustering algorithm) identifies the associated events between the audio events and the image behaviors, such as the law enforcement associated event between a specific audio event (such as an alarm sound) and the corresponding image behavior (such as the movement of personnel). The multi-mode space-time synchronous law enforcement behavior associated event data, which includes the law enforcement behavior of the law enforcement personnel and of the parties under the current law enforcement event, is thereby obtained.
Further, step S25 includes the steps of:
Step S251, performing space-time synchronization reference construction on the law enforcement site process timestamp and the law enforcement site process spatial position information to obtain a law enforcement site process space-time data synchronization reference frame;
In the embodiment of the invention, a data fusion tool is used to perform space-time synchronization processing on the previously extracted law enforcement site process timestamp and law enforcement site process spatial position information, aligning the timestamp with the spatial position information so that the time and the spatial position of each law enforcement event correspond synchronously; the timestamp and the spatial position information are integrated to form the law enforcement site process space-time data synchronization reference frame.
Step S252, audio event time sequence synchronous arrangement processing is carried out on the law enforcement site audio behavior information data based on a law enforcement site process space-time data synchronous reference frame, so as to obtain law enforcement site audio event time sequence synchronous sequence data;
In the embodiment of the invention, the previously constructed law enforcement site process space-time data synchronization reference frame is used to arrange the law enforcement site audio behavior information data in synchronized time order. The audio event behavior data is extracted from the recording equipment and converted into time sequence data by audio processing software (such as Audacity). Based on the reference frame, the timestamps of the audio events are matched with the timestamps in the reference frame so that the time order of the audio events is consistent with the actual situation on site. An audio event detection algorithm is applied to identify and mark key events in the audio, such as shouting, warning sounds or intervention sounds, and the audio events are arranged in time order, thereby obtaining the law enforcement site audio event time sequence synchronous sequence data.
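By way of illustration only, the matching of audio event timestamps to the timestamps of the reference frame may be sketched as follows. The sketch assumes that the reference frame exposes a sorted list of timestamps in seconds and that each audio event is a (time, label) pair; the names nearest_reference_index and synchronize_audio_events are hypothetical.

```python
from bisect import bisect_left

def nearest_reference_index(reference_times, t):
    """Index of the reference timestamp closest to t (reference_times must be sorted)."""
    i = bisect_left(reference_times, t)
    if i == 0:
        return 0
    if i == len(reference_times):
        return len(reference_times) - 1
    before, after = reference_times[i - 1], reference_times[i]
    return i if after - t < t - before else i - 1

def synchronize_audio_events(events, reference_times):
    """events: list of (time_s, label).
    Returns (reference_index, time_s, label) tuples arranged in time order."""
    ordered = sorted(events, key=lambda e: e[0])
    return [(nearest_reference_index(reference_times, t), t, label)
            for t, label in ordered]
```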
Step S253, performing image behavior space mapping processing on the law enforcement site image behavior information data based on the law enforcement site process space-time data synchronization reference frame to obtain law enforcement site image behavior space position mapping data;
In the embodiment of the invention, the previously constructed law enforcement site process space-time data synchronization reference frame is used to match the image behavior features in the law enforcement site image behavior information data with the spatial position information, so that the spatial information in the image data is consistent with the positions in the actual scene. The specific operations include conversion between the image coordinate system and the spatial position coordinate system and spatial mapping of the image features; a spatial mapping algorithm (such as a feature point matching algorithm) aligns the behavior positions in the image with the spatial position data in the reference frame, thereby obtaining the law enforcement site image behavior space position mapping data.
Step S254, performing multi-modal event behavior interaction association analysis on the law enforcement site audio event time sequence synchronous sequence data and the law enforcement site image behavior space position mapping data to obtain multi-modal event behavior interaction association relation between the law enforcement site audio event and the image behavior;
In the embodiment of the invention, data analysis and machine learning tools are used together to perform interactive association analysis on the law enforcement site audio event time sequence synchronous sequence data and the law enforcement site image behavior space position mapping data. The audio events are fused with the image behavior data to establish a multi-modal dataset containing time and spatial information. A multi-modal fusion algorithm, such as a multi-modal fusion neural network, is applied to analyze the relationship between the audio events and the image behaviors and to identify and extract the correlations between them, for example whether a sound and an image behavior occur in the same time period. A data report of the multi-modal event behavior interaction association relationship is generated to display the interaction association between the audio events and the image behavior data, thereby obtaining the multi-modal event behavior interaction association relationship between the law enforcement site audio events and the image behaviors.
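By way of illustration only, the check of whether a sound and an image behavior occur in the same time period may be sketched as follows. The sketch replaces the multi-modal fusion neural network with a plain interval-overlap rule; the record fields (label, start, end, position), the minimum overlap and the names overlap_seconds and associate_events are assumptions.

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length in seconds of the time period shared by two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def associate_events(audio_events, image_behaviors, min_overlap=0.5):
    """Pair each audio event with the image behaviors that co-occur with it
    for at least min_overlap seconds."""
    pairs = []
    for a in audio_events:
        for b in image_behaviors:
            shared = overlap_seconds(a["start"], a["end"], b["start"], b["end"])
            if shared >= min_overlap:
                pairs.append({"audio": a["label"], "image": b["label"],
                              "position": b["position"], "overlap_s": shared})
    return pairs
```

Each returned pair carries the spatial position of the image behavior, so the association retains both the time and the space information of the multi-modal dataset.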
Step S255, performing law enforcement behavior associated event connection extraction processing on the law enforcement site audio behavior information data and the law enforcement site image behavior information data based on the multi-modal event behavior interaction association relationship between the law enforcement site audio events and the image behaviors to obtain the multi-mode space-time synchronous law enforcement behavior associated event data.
In the embodiment of the invention, the previously obtained multi-modal event behavior interaction association relationship between the law enforcement site audio events and the image behaviors is used to perform associated connection extraction processing of law enforcement behavior associated events on the corresponding law enforcement site audio behavior information data and law enforcement site image behavior information data. A behavior association model is established to combine the interaction information in the audio events and the image behavior data, and a machine learning algorithm (such as a classifier or a clustering algorithm) identifies the associated events between the audio events and the image behaviors, such as the law enforcement associated event between a specific audio event (such as an alarm sound) and the corresponding image behavior (such as the movement of personnel). The multi-mode space-time synchronous law enforcement behavior associated event data, which includes the law enforcement behavior of the law enforcement personnel and of the parties under the current law enforcement event, is thereby obtained.
Further, step S3 includes the steps of:
Step S31, performing law enforcement personnel and party behavior sequence extraction processing on the multi-mode space-time synchronous law enforcement behavior associated event data to obtain a law enforcement behavior event law enforcement personnel behavior sequence and a law enforcement behavior event party behavior sequence;
In the embodiment of the invention, all behavior records of the law enforcement personnel and of the parties are extracted from the previously obtained multi-mode space-time synchronous law enforcement behavior associated event data. These records originate from monitoring cameras, sensors, voice recordings and other data collection devices, and include behaviors of the law enforcement personnel such as approaching, questioning, inspecting and penalizing, and behaviors of the parties such as stopping, explaining, cooperating and resisting. Data tags or event identifiers are used to separate the behavior records into law enforcement personnel behaviors and party behaviors; records carrying a specific tag can be sorted into the corresponding sequence by regular expression matching or by a rule engine. For example, law enforcement personnel behaviors may be tagged "OfficerAction" and party behaviors "SuspectAction". Each behavior record is decomposed into specific behavior description nodes, each node containing the timestamp, the behavior type and the relevant context information of the behavior. The behavior description nodes are then ordered by timestamp to generate ordered behavior sequences that accurately reflect the behaviors of the law enforcement personnel and of the parties, thereby obtaining the law enforcement behavior event law enforcement personnel behavior sequence and the law enforcement behavior event party behavior sequence.
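By way of illustration only, the separation of tagged behavior records into two time-ordered sequences may be sketched as follows. The tags "OfficerAction" and "SuspectAction" are those given above; the record fields (tag, timestamp, action_type, context) and the name split_sequences are assumptions.

```python
from collections import defaultdict

def split_sequences(records):
    """records: dicts with 'tag', 'timestamp', 'action_type' and optional 'context'.
    Returns {tag: list of behavior description nodes ordered by timestamp}."""
    sequences = defaultdict(list)
    for r in records:
        if r["tag"] in ("OfficerAction", "SuspectAction"):
            sequences[r["tag"]].append({
                "timestamp": r["timestamp"],
                "action_type": r["action_type"],
                "context": r.get("context", ""),
            })
    # Order each sequence by time so it reflects the course of the event.
    for nodes in sequences.values():
        nodes.sort(key=lambda n: n["timestamp"])
    return dict(sequences)
```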
Step S32, performing behavior description matching analysis on behavior description nodes in a behavior sequence of a law enforcement behavior event law enforcement personnel and behavior description nodes in a behavior sequence of a law enforcement behavior event party to obtain a law enforcement event matching relationship between the behavior description nodes of the law enforcement personnel and the behavior description nodes of the party;
In the embodiment of the invention, the behavior description nodes in the previously obtained law enforcement behavior event law enforcement personnel behavior sequence and law enforcement behavior event party behavior sequence are first standardized: natural language processing technology (such as word embedding or text similarity calculation) converts the different behavior descriptions into a uniform semantic representation, and a vector representation of each behavior description is constructed. A matching algorithm (such as cosine similarity calculation, dynamic time warping or a template-based matching method) then compares the behavior description nodes of the law enforcement personnel with those of the parties. A similarity score is calculated for each pair of behavior description nodes to determine whether they match; for example, a threshold is set, and two nodes are considered successfully matched when their similarity score exceeds the threshold. The results of the matching analysis are recorded in a matching relationship table identifying each matched pair of law enforcement personnel and party behavior description nodes, thereby obtaining the law enforcement event matching relationship between the law enforcement personnel behavior description nodes and the party behavior description nodes.
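By way of illustration only, the cosine similarity variant of the matching analysis may be sketched as follows. The sketch assumes that each behavior description node has already been converted into a numeric vector; the threshold value of 0.8 and the names cosine and match_nodes are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_nodes(officer_vecs, party_vecs, threshold=0.8):
    """Return the matching relationship table as (officer_id, party_id, score) rows
    for every pair whose similarity score exceeds the threshold."""
    table = []
    for oid, ov in officer_vecs.items():
        for pid, pv in party_vecs.items():
            score = cosine(ov, pv)
            if score > threshold:
                table.append((oid, pid, score))
    return table
```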
Step S33, performing law enforcement behavior sequence connection processing between the behavior description nodes in the law enforcement behavior event law enforcement personnel behavior sequence and the behavior description nodes in the law enforcement behavior event party behavior sequence based on the law enforcement event matching relationship between the law enforcement personnel behavior description nodes and the party behavior description nodes to generate a law enforcement site event behavior sequence connection diagram;
In the embodiment of the invention, the previously obtained law enforcement event matching relationship between the law enforcement personnel behavior description nodes and the party behavior description nodes is used to connect each behavior description node in the law enforcement behavior event law enforcement personnel behavior sequence with the matched behavior description node in the law enforcement behavior event party behavior sequence. A graph construction technique from graph theory (such as an adjacency matrix or an adjacency list) connects the matched behavior nodes in time order: each behavior node serves as a node of the graph and each connection between matched behavior descriptions serves as an edge. The behavior sequence connection diagram so constructed displays the development of the event at the law enforcement site and ensures that each pair of matched nodes is correctly connected to form a complete behavior sequence, thereby generating the law enforcement site event behavior sequence connection diagram.
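By way of illustration only, the adjacency list variant of the connection diagram may be sketched as follows. The sketch assumes that every node is identified by an id with a known timestamp and directs each edge from the earlier node of a matched pair to the later one; the names build_connection_graph and event_order are hypothetical.

```python
def build_connection_graph(nodes, matches):
    """nodes: {node_id: timestamp}; matches: iterable of (officer_id, party_id).
    Returns an adjacency list with edges running from the earlier node to the later one."""
    graph = {node_id: [] for node_id in nodes}
    for a, b in matches:
        src, dst = (a, b) if nodes[a] <= nodes[b] else (b, a)
        graph[src].append(dst)
    return graph

def event_order(nodes):
    """Node ids sorted by timestamp, that is, the order in which the event developed."""
    return sorted(nodes, key=nodes.get)
```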
Step S34, performing event behavior intention recognition analysis on the law enforcement site event behavior sequence connection diagram to obtain law enforcement site event behavior intention information data.
In the embodiment of the invention, a behavior intention model is constructed on the basis of known behavior patterns and intention templates. The model can be trained with machine learning technology (such as a classification algorithm or a sequence labeling model) to recognize different behavior intentions, for example by training a support vector machine (SVM) classifier to predict the intention type of each behavior sequence. The behavior intention model is then applied to each behavior node and each behavior sequence in the law enforcement site event behavior sequence connection diagram: the intention probability of each behavior sequence is calculated to determine its corresponding behavior intention, for example whether the behavior sequence matches a specific law enforcement intention (such as warning, dissuasion, punishment, arrest or escape), thereby obtaining the law enforcement site event behavior intention information data.
Further, step S4 includes the steps of:
Step S41, acquiring a preset law enforcement event behavior knowledge graph;
In the embodiment of the invention, the knowledge graph of the corresponding law enforcement event behaviors is extracted from a preset knowledge base; the graph contains the behavior patterns of various law enforcement events and the related normative rules. Natural language processing technology converts law enforcement event documents, normative provisions and case data from different sources into structured knowledge; the data sources may be legal databases, law enforcement records and expert domain knowledge. The behavior knowledge is integrated and assembled through the visual interface of a graph construction platform to form a knowledge graph containing law enforcement event nodes and the relationships between them, thereby obtaining the law enforcement event behavior knowledge graph.
Step S42, performing behavior normalization rule retrieval and extraction processing on the law enforcement event behavior knowledge graph to obtain law enforcement event behavior normalization rule standards;
In the embodiment of the invention, a rule extraction method is applied to the previously acquired law enforcement event behavior knowledge graph. Retrieval criteria for the behavior normative rules are first defined, and a rule engine performs the retrieval, extracting the rules related to behavior norms by analyzing the attributes of the nodes and edges in the graph. During retrieval, the behavior patterns in the graph are compared with the normative provisions and the behavior rules that conform to the norms are extracted. The result is a rule list containing detailed behavior normative standards, which may include the behavior criteria for law enforcement personnel, the handling procedures and the corresponding normative basis, thereby obtaining the law enforcement event behavior normalization rule standards.
Step S43, performing behavior intention semantic feature analysis on the law enforcement site event behavior intention information data to obtain law enforcement site event behavior intention semantic feature data;
In the embodiment of the invention, natural language processing and a machine learning model are used to perform statistical analysis of the behavior intention semantic features on the previously obtained law enforcement site event behavior intention information data. Word segmentation, part-of-speech tagging and syntactic analysis are performed on the descriptive text of the site event, and a semantic analysis tool extracts the feature data of the behavior intention, such as the purpose, motivation and situation of the law enforcement behavior. The machine learning model may use a pre-trained language model (such as BERT or GPT) for feature extraction, and a feature selection algorithm screens out the most representative semantic features, thereby obtaining the law enforcement site event behavior intention semantic feature data.
Step S44, performing normalization rule adaptation and behavior normalization score calculation on law enforcement on-site event behavior intention semantic feature data based on law enforcement event behavior normalization rule standards to obtain law enforcement on-site behavior intention normalization score values;
In the embodiment of the invention, the previously retrieved and extracted law enforcement event behavior normalization rule standards are adapted to the corresponding law enforcement site event behavior intention semantic feature data. A matching algorithm, such as cosine similarity or Euclidean distance, calculates the similarity between the behavior intention feature vector and each normalization rule, and a normalization rule adapter generates a normalization score from the calculated similarity scores. The score reflects the degree to which the on-site behavior intention conforms to the normalization rules, and its calculation weights the matching degree of each rule by the weight of that rule. The calculated behavior normalization score is used to evaluate how normative the on-site behavior intention is, thereby obtaining the law enforcement site behavior intention normalization score value.
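By way of illustration only, the weighted combination of rule weights and matching degrees may be sketched as follows. The sketch assumes that each rule contributes a weight and a similarity in the range 0 to 1 and that the score is reported on a 0 to 100 scale as a weighted average; that exact formula and the name normalization_score are assumptions, since the embodiment does not fix them.

```python
def normalization_score(rule_matches):
    """rule_matches: list of (weight, similarity) pairs, similarity in [0, 1].
    Returns the weighted average similarity scaled to the range 0 to 100."""
    total_weight = sum(w for w, _ in rule_matches)
    if total_weight <= 0:
        raise ValueError("at least one rule must have a positive weight")
    return 100.0 * sum(w * s for w, s in rule_matches) / total_weight
```

For example, a rule with weight 2 matched at 0.9 and a rule with weight 1 matched at 0.6 give a weighted average of 0.8, that is, a score of about 80 points.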
Step S45, performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normalization score value to generate a law enforcement site event non-normative behavior optimization suggestion report.
In the embodiment of the invention, the corresponding law enforcement site events in the law enforcement site event behavior intention information data are analyzed using the law enforcement site behavior intention normative score values obtained through the previous quantitative calculation. A threshold is set, and behaviors whose score values are lower than the threshold are marked as non-normative behaviors. The non-normative behaviors are then further analyzed by an intelligent alarm method, in which an anomaly detection algorithm identifies potential problem areas. The optimization analysis of the non-normative behaviors includes generating a detailed optimization suggestion report whose content comprises the identified non-normative behavior types, the existing normative risks, and the improvement suggestions. The report uses a data visualization tool to present the analysis results and suggestions, helping law enforcement personnel adjust their law enforcement behaviors and ensuring standardization of the law enforcement process, so that the law enforcement site event non-normative behavior optimization suggestion report is finally generated.
Further, step S45 includes the steps of:
Step S451, comparing the law enforcement site behavior intention normative score value against a preset normative score threshold, and judging the corresponding law enforcement site event in the law enforcement site event behavior intention information data as a normative behavior event when the law enforcement site behavior intention normative score value is greater than or equal to the preset normative score threshold;
In the embodiment of the invention, the law enforcement site behavior intention normative score value obtained by the previous quantitative calculation is compared against the preset normative score threshold, which is set according to the minimum requirement for normative behavior (for example, 80 points). If the law enforcement site behavior intention normative score value is greater than or equal to the normative score threshold of 80 points, the corresponding law enforcement site event is judged to be a normative behavior event; if it is smaller than the normative score threshold of 80 points, the corresponding law enforcement site event is judged to be a non-normative behavior event.
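The threshold judgment above can be illustrated with a short Python sketch. The 80-point threshold is the example value given in the embodiment; the function names and the representation of each event as an (event identifier, score) pair are hypothetical.

```python
# Example threshold from the embodiment (80 points); configurable in practice.
NORMATIVE_SCORE_THRESHOLD = 80.0


def is_normative(score, threshold=NORMATIVE_SCORE_THRESHOLD):
    """An event is normative when its score is greater than or equal to the threshold."""
    return score >= threshold


def split_events(scored_events, threshold=NORMATIVE_SCORE_THRESHOLD):
    """Partition (event_id, score) pairs into normative and non-normative id lists."""
    normative, non_normative = [], []
    for event_id, score in scored_events:
        if is_normative(score, threshold):
            normative.append(event_id)
        else:
            non_normative.append(event_id)
    return normative, non_normative


# Usage: a score exactly at the threshold counts as normative.
normative_ids, non_normative_ids = split_events(
    [("E1", 80.0), ("E2", 79.9), ("E3", 92.5)]
)
```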
Step S452, performing non-normative behavior intelligent alarm processing on the events judged to be non-normative behavior events according to a preset law enforcement behavior non-normative alarm rule so as to generate alarm information of the non-normative behavior events of the current supervision;
In the embodiment of the invention, all relevant records of the events judged to be non-normative behavior events are extracted from the previous judgment result. These records contain detailed information on each non-normative behavior event, such as its time, place, and normative score value. The extracted events are processed according to the preset law enforcement behavior non-normative alarm rule, which includes indexes such as behavior severity and number of repeated violations; the data are analyzed and corresponding alarm signals are generated. The alarm signals are then converted into standardized alarm information comprising the detailed information of the non-normative behavior event, the alarm grade, and the suggested handling measures. For example, if the normative score value of an event is 75 points and repeated non-normative behavior is involved, high-grade alarm information is generated. The alarm information of the non-normative behavior events of the current supervision is thereby finally generated, and this information comprises a detailed description and the alarm grade of each non-normative behavior event found in the current supervision.
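One possible form of such an alarm rule is sketched below in Python. The embodiment names behavior severity and the number of repeated violations as rule indexes but does not specify the grading logic, so the three grades, the severity scale, the cut-off values, and the suggested measures shown here are hypothetical. The sketch only reproduces the stated example, in which a 75-point event involving repeated non-normative behavior receives a high-grade alarm.

```python
from dataclasses import dataclass


@dataclass
class NonNormativeEvent:
    event_id: str
    time: str
    place: str
    score: float           # normative score value, below the threshold
    severity: int          # hypothetical scale: 1 (minor) to 3 (serious)
    prior_violations: int  # hypothetical count of earlier non-normative events


# Hypothetical suggested handling measures for each alarm grade.
SUGGESTED_MEASURES = {
    "high": "escalate to a supervisor for immediate review",
    "medium": "schedule a case review",
    "low": "log the event and notify the officer",
}


def alarm_grade(event, threshold=80.0):
    """Grade an event from severity, repetition, and distance below the threshold."""
    if event.prior_violations >= 1 or event.severity >= 3:
        return "high"
    if event.severity == 2 or threshold - event.score > 10:
        return "medium"
    return "low"


def build_alarm(event, threshold=80.0):
    """Convert an event into a standardized alarm information record."""
    grade = alarm_grade(event, threshold)
    return {
        "event_id": event.event_id,
        "time": event.time,
        "place": event.place,
        "score": event.score,
        "alarm_grade": grade,
        "suggested_measure": SUGGESTED_MEASURES[grade],
    }


# Usage: the 75-point repeated-violation example from the embodiment.
alarm = build_alarm(NonNormativeEvent("E1", "2024-01-01T10:00", "Site A", 75.0, 1, 1))
```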
Step S453, performing law enforcement behavior problem identification analysis on the alarm information of the non-normative behavior events of the current supervision to obtain a non-normative law enforcement behavior problem report of the current supervision;
In the embodiment of the invention, the alarm information of the non-normative behavior events of the current supervision, generated by the previous alarm processing, is analyzed in detail. Factors such as the frequency and type of the alarm events and the law enforcement personnel involved are counted to identify potential problems, from which frequently occurring or systematic non-normative behavior problems are identified. The identified problems are organized into a corresponding problem report that includes the types of the non-normative behavior problems, the law enforcement events involved, their frequency of occurrence, and their potential causes, finally yielding the non-normative law enforcement behavior problem report of the current supervision.
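The counting step above can be sketched in Python as follows. The alarm record fields (`behavior_type`, `officer_id`, `event_id`) and the `min_occurrences` cut-off for treating a problem as frequently occurring are assumptions made for illustration. Potential causes are not derived here; in this sketch they would be filled in by a later analysis or by a reviewer.

```python
from collections import Counter, defaultdict


def identify_problems(alarms, min_occurrences=2):
    """Count alarms by behavior type and by officer, keeping the recurring ones."""
    type_counts = Counter(a["behavior_type"] for a in alarms)
    officer_counts = Counter(a["officer_id"] for a in alarms)

    # Group the event ids involved under each behavior type.
    events_by_type = defaultdict(list)
    for a in alarms:
        events_by_type[a["behavior_type"]].append(a["event_id"])

    recurring_problems = [
        {
            "behavior_type": behavior_type,
            "frequency": count,
            "event_ids": events_by_type[behavior_type],
        }
        for behavior_type, count in type_counts.most_common()
        if count >= min_occurrences
    ]
    frequent_officers = {
        officer: count
        for officer, count in officer_counts.items()
        if count >= min_occurrences
    }
    return {
        "recurring_problems": recurring_problems,
        "frequent_officers": frequent_officers,
    }


# Usage with three hypothetical alarm records.
report = identify_problems([
    {"event_id": "E1", "behavior_type": "improper language", "officer_id": "P7"},
    {"event_id": "E2", "behavior_type": "improper language", "officer_id": "P7"},
    {"event_id": "E3", "behavior_type": "missing notice", "officer_id": "P9"},
])
```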
Step S454, performing law enforcement behavior optimization suggestion processing on the non-normative law enforcement behavior problem report of the current supervision so as to generate a law enforcement site event non-normative behavior optimization suggestion report.
In the embodiment of the invention, specific non-normative law enforcement behavior problems are extracted from the non-normative law enforcement behavior problem report of the current supervision obtained by the previous analysis, for example, an improper flow for handling the non-normative behavior of a certain type of party, or insufficient training of law enforcement personnel. The existing law enforcement flow is modeled using a flow optimization tool (such as business process management software or a process modeling tool), and improvement suggestions are made according to the identified problems, including redesigning the flow, adding a training link, or adjusting the allocation of law enforcement resources. A detailed optimization suggestion report is then written whose content comprises the problem description, the optimization suggestions, the expected effects, and the implementation steps. The report is submitted to management as a reference basis for improving law enforcement behavior, so that the law enforcement site event non-normative behavior optimization suggestion report is finally generated.
Furthermore, the invention also provides a multi-mode intelligent analysis system based on law enforcement supervision, which is used for executing the multi-mode intelligent analysis method based on law enforcement supervision described above, and which comprises the following modules:
A law enforcement on-site audio/video behavior recognition analysis module, used for acquiring a law enforcement site video file from a law enforcement supervision recorder platform, and performing audio and video segmentation processing on the law enforcement site video file to obtain a law enforcement site audio clip and a law enforcement site image video frame clip;
A law enforcement behavior multi-mode space-time correlation analysis module, used for performing time stamp and spatial position extraction processing on the law enforcement site video file to obtain a law enforcement site process time stamp and law enforcement site process spatial position information;
A law enforcement event behavior intention recognition analysis module, used for performing law enforcement behavior sequence connection processing on the multi-mode space-time synchronous law enforcement behavior associated event data to generate a law enforcement site event behavior sequence connection diagram;
An intelligent warning module, used for acquiring a preset law enforcement event behavior knowledge graph, performing behavior normalization score calculation on the law enforcement site event behavior intention information data based on the law enforcement event behavior knowledge graph to obtain law enforcement site behavior intention normalization score values, and performing non-normative behavior intelligent alarm optimization analysis on the corresponding law enforcement site events in the law enforcement site event behavior intention information data based on the law enforcement site behavior intention normalization score values to generate a law enforcement site event non-normative behavior optimization suggestion report.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.