CN117953580A - Behavior recognition method and system based on cross-camera multi-target tracking and electronic equipment - Google Patents

Behavior recognition method and system based on cross-camera multi-target tracking and electronic equipment

Info

Publication number: CN117953580A
Application number: CN202410117750.0A
Authority: CN (China)
Prior art keywords: behavior, targets, target, cameras, current frame
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 齐冬莲, 金浩远, 闫云凤, 聂雪松, 李启, 朱志航
Assignee: Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority: CN202410117750.0A
Landscapes: Image Analysis (AREA)
Abstract

The invention discloses a behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking, belonging to the technical field of behavior recognition. The method comprises the following steps: video image acquisition; person region localization; dual target tracking, in which target matching within a single camera is completed using position similarity and the Hungarian algorithm, target matching between cameras is completed using appearance similarity and a greedy algorithm, and person numbers are assigned; preliminary behavior recognition; comprehensive behavior discrimination; and correction of the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, with visual output of the target regions and behavior labels involving the behaviors of interest in the images. The method addresses the high missed detection rate and high false detection rate of recognition based on single-frame images, and can realize efficient and reliable video-surveillance-based personnel behavior safety management in operation scenarios such as electric power construction.

Description

Behavior recognition method and system based on cross-camera multi-target tracking and electronic equipment
Technical Field
The invention relates to the technical field of behavior recognition, and in particular to a behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking.
Background
With the rapid development of society, the personnel safety management situation in operation scenarios such as electric power construction is increasingly severe. Particularly in complex working scenarios, ensuring the safety of operating personnel has become a major challenge. Compared with traditional manual inspection, behavior recognition methods based on computer vision have emerged and can reduce some labor cost, but the prior art is mostly based on single-frame images and shows obvious shortcomings when handling dynamic and changeable complex operation environments. Owing to the lack of deep analysis in the time dimension, these techniques often suffer from high missed detection and high false detection, so their accuracy and reliability are limited, and safety supervisors are still required to manually check and confirm a large number of recognition results.
Therefore, in view of these limitations of the prior art, how to provide a behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking is a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking. Using the multi-camera, multi-view monitoring systems commonly deployed in electric power operation scenarios, the same person captured by different cameras is associated through a cross-camera multi-target tracking algorithm and assigned a unified person number, so that comprehensive discrimination can be performed using the behavior information of the same numbered person within a specific time period. Instead of relying only on static single-frame images, the method combines multi-frame data from the continuous time sequences of multiple cameras, thereby providing more comprehensive context information for behavior recognition, effectively improving recognition accuracy, greatly reducing the possibility of misjudgment and missed detection, and providing highly reliable technical support for safety management in operation scenarios such as electric power construction.
In order to achieve the above object, the present invention provides the following technical solutions:
a behavior recognition method based on cross-camera multi-target tracking comprises the following steps:
S100: acquiring real-time video image data of a plurality of cameras in a scene;
s200: carrying out region positioning on targets in all frames of all cameras through a trained pedestrian detection model to obtain region coordinates of all targets;
S300: for a single camera, performing position similarity calculation on the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, and finishing target matching in the single camera by using a Hungary algorithm, and giving the matched targets the same personnel number to obtain target information of the current frame;
For the inter-camera targets, calculating the appearance similarity between all targets in the current frame of any camera and all targets in the current frame of the rest cameras by using appearance characteristics, and finishing inter-camera target matching by using a greedy algorithm, and correcting the personnel numbers of the targets according to the matching result;
S400: performing behavior recognition on all targets in the current frames of all cameras, and endowing each target with a corresponding primary behavior label;
s500: combining the behavior preliminary labels of the same personnel numbers in the previous adjacent specific frames of the multiple cameras, and determining the behavior category of the personnel according to a preset comprehensive judging strategy to obtain a comprehensive behavior judging result;
s600: and correcting the behavior preliminary labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination results, and visually outputting target areas and behavior labels related to behaviors in the images.
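For illustration, the steps chain together as one per-frame loop. The following minimal Python sketch shows only this glue; each step is passed in as a callable, and all names here (process_frame, the step keys) are hypothetical placeholders rather than functions disclosed by the invention:

```python
# Illustrative glue for S100-S600. Each step is injected as a callable so the
# sketch stays self-contained; concrete sketches of the matching and
# discrimination steps appear in the detailed description below.

def process_frame(cameras, steps, trackers, history):
    all_targets = {}
    for cam_id, camera in enumerate(cameras):
        image = camera.read()                                # S100: real-time frame
        detections = steps["detect"](image)                  # S200: region coordinates
        all_targets[cam_id] = steps["match_single"](         # S300a: position similarity
            trackers[cam_id], detections)                    #        + Hungarian algorithm
    steps["match_cross"](all_targets)                        # S300b: appearance similarity
                                                             #        + greedy, unify numbers
    for targets in all_targets.values():
        steps["recognize"](targets)                          # S400: preliminary labels
    decisions = steps["discriminate"](all_targets, history)  # S500: comprehensive discrimination
    steps["correct_render"](all_targets, decisions)          # S600: corrected labels + output
    history.append(all_targets)                              # retain labels for later windows
```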
Preferably, the setting of the multiple cameras in S100 specifically includes: for a specific application scene, a plurality of cameras are arranged at different positions and different heights, so that real-time video image data of different visual angles of the scene are acquired, and more comprehensive spatial information is provided.
Preferably, the S200 includes:
constructing a pedestrian detection model based on the F2DNet pedestrian detection framework;
acquiring a COCO data set, and pre-training the pedestrian detection model;
acquiring the CrowdHuman pedestrian dataset, the CityPersons pedestrian dataset, the ETHZ pedestrian dataset, the MOT17 multi-target tracking data and the MOT20 multi-target tracking data, and training the pedestrian detection model to obtain a trained pedestrian detection model;
inputting the acquired current frame image of any camera into the trained pedestrian detection model, and outputting the region coordinates of all targets of the current frame, represented as:

$$b_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c\right)$$

where $i$ denotes the $i$-th target of the current frame, $c$ denotes the $c$-th camera, $x_i^c$ and $y_i^c$ are the abscissa and ordinate of the lower-left corner of the current-frame target region, and $w_i^c$ and $h_i^c$ are the width and height of the current-frame target region, respectively.
Preferably, in S300, for a single camera, performing the position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, and completing target matching within the single camera using the Hungarian algorithm, comprises:
S310: for target matching within a single camera, inputting the region coordinates of all targets of the current frame and the previous frame, where the region coordinates of the previous-frame targets are:

$$b_j^v = \left(x_j^v,\ y_j^v,\ w_j^v,\ h_j^v\right)$$

where $j$ denotes the $j$-th target of the previous frame, $v$ denotes the $v$-th camera, $x_j^v$ and $y_j^v$ are the abscissa and ordinate of the lower-left corner of the previous-frame target region, and $w_j^v$ and $h_j^v$ are the width and height of the previous-frame target region, respectively;
S311: performing position prediction on all targets of the previous frame with the configured Kalman filter, obtaining the predicted target region coordinates:

$$\hat{b}_j^v = \left(\hat{x}_j^v,\ \hat{y}_j^v,\ \hat{w}_j^v,\ \hat{h}_j^v\right)$$

where $\hat{x}_j^v$ and $\hat{y}_j^v$ are the predicted abscissa and ordinate of the lower-left corner of the target region, and $\hat{w}_j^v$ and $\hat{h}_j^v$ are the predicted width and height of the target region, respectively;
S312: adopting IoU as the measuring function to calculate the position similarity between all targets of the current frame and all predicted targets of the previous frame:

$$s_{ij} = \mathrm{IoU}\left(b_i^c,\ \hat{b}_j^v\right) = \frac{\left|b_i^c \cap \hat{b}_j^v\right|}{\left|b_i^c \cup \hat{b}_j^v\right|}$$

where, for target matching within a single camera, $c$ and $v$ in the function are equal;
S313: taking the position similarity as the matching basis, adopting the Hungarian algorithm to compute the matching degree between targets, treating matched targets as the same person and giving them the same person number, and outputting the primary target information after target matching within the single camera of the current frame as:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c\right)$$

where $id_i^c$ is the person number.
Preferably, in S300, for target matching between cameras, performing the appearance similarity calculation using appearance features between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras, and completing target matching between cameras using a greedy algorithm, comprises:
S320: inputting the region coordinates and person numbers of all targets of the current frames of the multiple cameras after target matching within each single camera;
S321: extracting the appearance features of the pedestrians in all target regions through a trained pedestrian re-identification model, and adding the appearance features into the primary target information of the current frame to obtain the secondary target information of the current frame;
S322: in camera number order, calculating one by one the cosine similarity between the appearance features of all targets in the first camera and all targets in the other cameras as the appearance similarity;
S323: taking the appearance similarity as the matching basis, adopting a greedy algorithm to complete target matching between different cameras, treating matched targets as the same person, and correcting the person numbers of the targets in the other cameras on the basis of the person number of the first-camera target.
Preferably, S321 includes:
building a pedestrian re-identification model based on the CAL architecture;
obtaining the LTCC re-identification dataset and the PRCC re-identification dataset, and training the pedestrian re-identification model to obtain a trained pedestrian re-identification model;
extracting the target appearance features through the pedestrian re-identification model;
adding the target appearance features into the primary target information of the current frame to obtain the secondary target information of the current frame:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c\right)$$

where $f_i^c$ is the target appearance feature.
Preferably, S400 includes:
performing frame-by-frame behavior recognition with the YOLOX target detection model, using weights pre-trained on the COCO dataset;
obtaining a behavior recognition dataset comprising the categories falling to the ground, smoking and crossing;
training the YOLOX target detection model on the behavior recognition dataset;
merging the preliminary behavior label obtained by the trained YOLOX target detection model into the target information to form the three-level target information:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c,\ a_i^c\right)$$

where $a_i^c$ is the preliminary behavior label, which includes the concrete behavior category labels of falling, smoking and crossing as well as a no-behavior label.
Preferably, S500 includes:
S510: inputting all targets with the same person number in the current frames of the multiple cameras;
S520: counting the preliminary behavior label categories of all input targets, and taking the category with the largest share as the candidate behavior category under that person number in the current frame;
S530: acquiring the preliminary behavior labels of all targets with the same person number in the adjacent preceding frames within a specific time interval of the multiple cameras, and calculating the proportion of preliminary behavior labels identical to the candidate behavior category of the current frame;
S540: if this label proportion is larger than the preset proportion, the behavior category under that person number is the candidate behavior category of the current frame; otherwise, no relevant behavior is performed under the corresponding person number;
S550: determining the person's behavior category according to the preset comprehensive discrimination strategy based on S510-S540 to obtain the comprehensive behavior discrimination result.
Preferably, the parameters of the comprehensive discrimination strategy include:
the default time interval parameter, set to 16 frames;
the preset proportion parameter, set to 90%.
A behavior recognition system based on cross-camera multi-target tracking comprises:
a video image acquisition module: acquiring real-time video image data of multiple cameras in a scene;
a person region localization module: performing region localization on the targets in the current frames of all cameras through a trained pedestrian detection model to obtain the region coordinates of all targets;
a dual target tracking module: for a single camera, performing position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, completing target matching within the single camera using the Hungarian algorithm, and giving matched targets the same person number to obtain the target information of the current frame;
for targets across cameras, calculating the appearance similarity between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras using appearance features, completing target matching between cameras using a greedy algorithm, and correcting the person numbers of the targets according to the matching result;
a preliminary behavior recognition module: performing behavior recognition on all targets in the current frames of all cameras, and giving each target a corresponding preliminary behavior label;
a comprehensive behavior discrimination module: combining the preliminary behavior labels of the same person number within a specific number of preceding adjacent frames of the multiple cameras, and determining the person's behavior category according to a preset comprehensive discrimination strategy to obtain a comprehensive behavior discrimination result;
a recognition result output module: correcting the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, and visually outputting the target regions and behavior labels involving the behaviors of interest in the images.
An electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the behavior recognition method based on cross-camera multi-target tracking when executing the computer program.
Compared with the prior art, the behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking provided by the invention can effectively recognize and track the same person from different viewing angles in a changing environment, ensuring continuous and consistent behavior analysis. They effectively overcome the misjudgment and missed detection problems of traditional single-frame-image behavior recognition methods, remarkably improve behavior recognition accuracy, and thereby reduce the dependence on manpower and the demand for human resources. The implementation of this technology not only raises the level of safety management but also effectively reduces the risk of safety accidents caused by monitoring blind spots or recognition errors. The invention can be applied to operation scenarios such as electric power construction to realize efficient and reliable video-surveillance-based personnel behavior safety management, providing a more reliable and advanced safety management solution for operating personnel in such scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the visualization of person region localization results provided by an embodiment of the present invention;
FIG. 3 is a flowchart of target matching within a single camera provided by an embodiment of the present invention;
FIG. 4 is a flowchart of cross-camera target matching provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the visualization of dual target tracking results provided by an embodiment of the present invention;
FIG. 6 is a flowchart of comprehensive behavior discrimination provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the visual output of recognition results provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking, which acquire real-time video image data of multiple cameras in a scene; use a pedestrian detection algorithm to accurately localize the regions of the person targets in the current video frames of all cameras and output the region coordinates of all targets; perform dual target tracking, completing target matching within a single camera using position similarity and the Hungarian algorithm, completing target matching between cameras using appearance similarity and a greedy algorithm, and assigning person numbers; perform behavior recognition on all person targets in the current frame and give each target a corresponding preliminary behavior label; combine the preliminary behavior labels of the targets with the same number within a specific number of preceding adjacent frames of the multiple cameras, and determine the person's behavior category according to the comprehensive discrimination strategy; and correct the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, visually outputting the target regions and behavior labels involving the behaviors of interest in the images. The method provides a solution to the high missed detection rate and high false detection rate of recognition based on single-frame images.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The embodiment of the invention discloses a behavior recognition method based on cross-camera multi-target tracking which, as shown in FIG. 1, comprises the following steps:
S100, video image acquisition:
acquiring real-time video image data of multiple cameras in a scene;
S200, person region localization:
accurately localizing the person targets in all frames of all cameras with a trained pedestrian detection algorithm, and outputting the region coordinates of all targets;
S300, dual target tracking:
performing position similarity calculation between all targets in the current frame and all targets in the previous frame according to the region coordinates, and completing target matching within a single camera using the Hungarian algorithm; performing appearance similarity calculation between all targets in the current frame and all targets in the video frames of the other cameras according to appearance features, and completing target matching between cameras using a greedy algorithm; targets matched to each other belong to the same person and are given the same person number;
S400, preliminary behavior recognition:
performing behavior recognition on all person targets in the current frame; if a relevant behavior category exists, assigning each target a preliminary behavior label of the corresponding category, and otherwise setting the preliminary behavior label to no behavior;
S500, comprehensive behavior discrimination:
since the method proceeds frame by frame, all targets in the current frame and the previous frames already carry corresponding person numbers and preliminary behavior labels; for a given person in the current frame, the preliminary behavior labels of the targets with the same number within a specific number of preceding adjacent frames of the multiple cameras are combined, and the person's behavior category is determined according to the comprehensive discrimination strategy;
S600, recognition result output:
correcting the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, and visually outputting the target regions and behavior labels involving the behaviors of interest in the images.
In a specific embodiment, the multi-camera setup of S100 is specifically: for a specific application scene, a plurality of cameras are arranged at different positions and different heights, so that real-time video image data of different visual angles of the scene are acquired, and more comprehensive spatial information is provided.
In a specific embodiment, the pedestrian detection of S200 is specifically as follows: the invention adopts the advanced F2DNet pedestrian detection algorithm, which designs feature extraction, focal detection and detection generation strategies, performs accurate localization with high category recall through its detection head, and handles false positives with a lightweight suppression head; it is efficient and accurate, and its small computational cost makes it suitable for deployment in real scenarios. After pre-training on the COCO dataset, the pedestrian detection algorithm is trained on the pedestrian datasets CrowdHuman, CityPersons and ETHZ and the multi-target tracking datasets MOT17 and MOT20. The specific training parameters are shown in Table 1: the training sample input size is 1440×800, training is performed on 4 NVIDIA A100 GPUs with 24 input samples per batch, an SGD optimizer with an initial learning rate of 1e-3 is adopted, and training runs for 120 epochs in total.
Table 1 Training parameter settings of the pedestrian detection algorithm
The target region coordinates of the current frame output by pedestrian detection are:

$$b_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c\right)$$

where $i$ denotes the $i$-th target of the current frame, $c$ denotes the $c$-th camera, $x_i^c$ and $y_i^c$ are the abscissa and ordinate of the lower-left corner of the current-frame target region, and $w_i^c$ and $h_i^c$ are the width and height of the current-frame target region, respectively. A visualization of the person region localization results is shown in FIG. 2, taking scenes with the smoking, falling, crossing and phone-calling behavior categories as examples; rectangular boxes visualize the target region coordinates.
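As an illustrative convenience for the sketches that follow (the field names are assumptions, not notation from the original), the tuple $b_i^c$ and the fields appended to it in later steps can be carried in one small structure:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Target:
    """One detected person target: b_i^c plus the fields added in S300-S400."""
    cam: int                           # camera index c
    x: float                           # abscissa of the lower-left corner
    y: float                           # ordinate of the lower-left corner
    w: float                           # width of the target region
    h: float                           # height of the target region
    pid: Optional[int] = None          # person number id_i^c, assigned in S300
    feat: Optional[np.ndarray] = None  # appearance feature f_i^c, added in S321
    label: Optional[str] = None        # preliminary behavior label a_i^c, from S400
```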
In a specific embodiment, after the dual target tracking of S300, a visualization of the result is shown in FIG. 5, taking a smoking behavior category scene as an example; the person number appears above each rectangular box, and the same person number denotes the same person. Each row of images originates from the same camera, visualizing the preceding frames (-4, -8, -12, -16) and the current frame (frame 0).
In a specific embodiment, target matching within a single camera in S300 is specifically as follows: for target matching within a single camera, the video images of each camera are in fact processed independently; the flow of target matching within a single camera is shown in FIG. 3, and the matching steps are as follows:
S310: Input all target information of the current frame and the previous frame, i.e. the target region coordinate information of each target, where the target region coordinates of the previous frame are:

$$b_j^v = \left(x_j^v,\ y_j^v,\ w_j^v,\ h_j^v\right)$$

where $j$ denotes the $j$-th target of the previous frame, $v$ denotes the $v$-th camera, $x_j^v$ and $y_j^v$ are the abscissa and ordinate of the lower-left corner of the previous-frame target region, and $w_j^v$ and $h_j^v$ are the width and height of the previous-frame target region, respectively.
S311: For all targets of the previous frame, perform position prediction with the configured Kalman filter, obtaining the predicted target region coordinates:

$$\hat{b}_j^v = \left(\hat{x}_j^v,\ \hat{y}_j^v,\ \hat{w}_j^v,\ \hat{h}_j^v\right)$$

where $\hat{x}_j^v$ and $\hat{y}_j^v$ are the predicted abscissa and ordinate of the lower-left corner of the target region, and $\hat{w}_j^v$ and $\hat{h}_j^v$ are the predicted width and height of the target region, respectively.
S312: Calculate the position similarity between all targets of the current frame and all predicted targets of the previous frame, adopting IoU as the measuring function:

$$s_{ij} = \mathrm{IoU}\left(b_i^c,\ \hat{b}_j^v\right) = \frac{\left|b_i^c \cap \hat{b}_j^v\right|}{\left|b_i^c \cup \hat{b}_j^v\right|}$$

For target matching within a single camera, $c$ and $v$ in the function are equal.
S313: With the position similarity as the matching basis, adopt the Hungarian algorithm to complete the matching between targets; matched targets are the same person and are given the same person number. The primary target information after target matching within the single camera of the current frame is output as:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c\right)$$

where $id_i^c$ is the person number. A minimal implementation sketch of these steps is given below.
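The following is a minimal sketch of S310-S313 in Python, assuming boxes given as (x, y, w, h) tuples with (x, y) the lower-left corner, Kalman-predicted previous-frame boxes already computed, and SciPy's linear_sum_assignment as the Hungarian solver; the function names and the IoU acceptance threshold are illustrative assumptions, not values from the original:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes (x, y, w, h) with (x, y) the lower-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_single_camera(curr_boxes, pred_boxes, prev_ids, iou_thresh=0.3):
    """Hungarian matching on IoU between current detections and the
    Kalman-predicted previous-frame boxes; returns a person number per
    current box (None where no acceptable match exists)."""
    if not curr_boxes or not pred_boxes:
        return [None] * len(curr_boxes)
    sim = np.array([[iou(c, p) for p in pred_boxes] for c in curr_boxes])
    rows, cols = linear_sum_assignment(-sim)  # the solver minimizes, so negate
    ids = [None] * len(curr_boxes)
    for r, c in zip(rows, cols):
        if sim[r, c] >= iou_thresh:           # accept sufficiently overlapping pairs
            ids[r] = prev_ids[c]              # matched targets share a person number
    return ids
```

Negating the similarity matrix turns the similarity maximization into the cost minimization that the Hungarian solver expects; detections left with None would start new tracks in a full tracker.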
In a specific embodiment, the cross-camera target matching flow of S300 is shown in FIG. 4, and is specifically as follows:
S320: Input all target information of the current frames of the multiple cameras after target matching within each single camera, i.e. the target region coordinate information and person number of each target;
S321: Extract the appearance features of the pedestrians in all target regions through the trained pedestrian re-identification model, obtaining the secondary target information of the current frame:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c\right)$$

where $f_i^c$ is the target appearance feature;
S322: In camera number order, calculate one by one the cosine similarity between the appearance features of all targets in the first camera and all targets in the other cameras as the appearance similarity;
S323: With the appearance similarity as the matching basis, adopt a greedy algorithm to complete target matching between different cameras; matched targets are the same person, and the person numbers of the targets in the other cameras are corrected according to the person number of the first-camera target, so that the numbers of the same person across the multiple cameras are unified. A minimal sketch of this greedy association is given below.
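A minimal sketch of S322-S323, reusing the illustrative Target structure from earlier (its feat and pid fields) and a hypothetical similarity threshold that the original text does not specify:

```python
import numpy as np

def cosine_sim(f1, f2):
    """Cosine similarity between two appearance feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12))

def greedy_cross_camera_match(ref_targets, other_targets, sim_thresh=0.5):
    """Greedily pair another camera's targets with the first camera's targets
    by descending appearance similarity; each matched target inherits the
    person number of its first-camera counterpart."""
    pairs = [(cosine_sim(r.feat, o.feat), ri, oi)
             for ri, r in enumerate(ref_targets)
             for oi, o in enumerate(other_targets)]
    pairs.sort(key=lambda t: t[0], reverse=True)      # most similar pairs first
    used_ref, used_other = set(), set()
    for sim, ri, oi in pairs:
        if sim < sim_thresh:
            break                                     # remaining pairs are weaker still
        if ri in used_ref or oi in used_other:
            continue                                  # each target matched at most once
        other_targets[oi].pid = ref_targets[ri].pid   # correct the person number
        used_ref.add(ri)
        used_other.add(oi)
```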
In a specific embodiment, the pedestrian re-identification model of S321 is specifically as follows: the advanced CAL pedestrian re-identification model is adopted to extract the target appearance features. It is a model designed for the cloth-changing pedestrian re-identification task and can extract more effective appearance features that are independent of clothing, making it suitable for the application scenarios involved in the invention, such as people wearing similar work clothes. The module is trained on the LTCC and PRCC re-identification datasets; the specific training parameters are shown in Table 2: the training sample input size is 384×192, training is performed on a single NVIDIA A100 GPU with 64 input samples per batch, the learning rate is 3.5e-4, and training runs for 60 epochs with Adam.
Table 2 Training parameter settings of the pedestrian re-identification algorithm
In a specific embodiment, the behavior recognition of S400 is specifically as follows: the YOLOX target detection algorithm is adopted to realize frame-by-frame behavior recognition, trained on the collected behavior recognition dataset starting from COCO pre-trained weights. The behavior recognition dataset contains high-quality annotations of 7 categories of personnel behaviors, including falling to the ground, smoking, crossing and so on; it was collected by 6 cameras at different viewing angles and covers image data samples in 4 scenes ranging from daytime to evening. The specific training parameters are shown in Table 3: the training sample input size is 448×448, training is performed on a single NVIDIA A100 GPU with 32 input samples per batch, the learning rate is selected from {1e-2, 5e-3, 1e-3}, stochastic depth is adopted as a regularization method, and training runs for 80 epochs in total. The preliminary behavior label obtained by behavior recognition is merged into the target information, giving the three-level target information in the form:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c,\ a_i^c\right)$$

where $a_i^c$ is the preliminary behavior label, which includes concrete behavior category labels such as falling, smoking and crossing, as well as a no-behavior label (none).
Table 3 Training parameter settings of the behavior recognition algorithm
In a specific embodiment, the comprehensive discrimination strategy of S500 is specifically as follows: for a person in the current frames of all the multiple cameras, i.e. the targets with the same person number, the following comprehensive discrimination steps are performed; the flow of comprehensive behavior discrimination is shown in FIG. 6:
S510: Input all targets with the same person number in the current frames of the multiple cameras;
S520: Count the preliminary behavior label categories of all input targets, and take the category with the largest share as the candidate behavior category of the person in the current frame;
S530: Acquire the preliminary behavior labels of all targets with the same person number in the adjacent preceding frames within a specific time interval of the multiple cameras, and calculate the proportion of preliminary behavior labels identical to the candidate behavior category of the current frame;
S540: If this label proportion is larger than the specified proportion, the person's behavior category is the candidate behavior category of the current frame; otherwise, the person is not performing the relevant behavior.
Specifically, the method further comprises: S550: determining the person's behavior category according to the preset comprehensive discrimination strategy based on S510-S540 to obtain the comprehensive behavior discrimination result.
Specifically, the parameter settings of the comprehensive discrimination strategy are as follows: the default time interval parameter is set to 16 frames, and the input video frame rate is 16 frames per second, i.e. the recognition results of the adjacent preceding frames within 1 second are comprehensively considered; the default proportion parameter is set to 90%. The time interval and the proportion can be fine-tuned according to the actual scene and the specific task to achieve more accurate results. A minimal sketch of this discrimination strategy is given below.
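A minimal sketch of S510-S540 with the default parameters (16-frame window, 90% proportion); curr_labels and label_history are assumed conventions for one person number, with 'none' marking the no-behavior label, neither being notation from the original:

```python
from collections import Counter

def discriminate(curr_labels, label_history, window=16, ratio=0.9):
    """curr_labels: preliminary labels of all targets sharing one person number
    in the current frames of the multiple cameras. label_history: that person's
    per-frame label lists over the preceding frames. Returns the decided
    behavior category, or 'none' when the window does not support it."""
    if not curr_labels:
        return 'none'
    # S520: category with the largest share among the current-frame labels
    candidate, _ = Counter(curr_labels).most_common(1)[0]
    if candidate == 'none':
        return 'none'
    # S530: share of identical labels over the preceding `window` frames
    recent = [lbl for frame in label_history[-window:] for lbl in frame]
    if not recent:
        return 'none'
    share = sum(1 for lbl in recent if lbl == candidate) / len(recent)
    # S540: accept the candidate only if its share exceeds the preset proportion
    return candidate if share > ratio else 'none'
```

With these defaults and a 16 frame-per-second input, a behavior is confirmed only when at least 90% of the labels in roughly the last second agree with the current candidate, which is what suppresses single-frame false detections.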
In one embodiment, the preliminary label correction of S600 is specifically as follows: the behavior category of each person in the current frame, as determined by the comprehensive discrimination strategy of S500, is used to correct the preliminary behavior labels of all targets of that person in the current frame, serving as the basis for the visual output and as a reference for subsequent comprehensive discrimination. A visualization of the recognition result output is shown in FIG. 7, taking the smoking behavior category as an example to compare and show the effect of the method: the first row shows the preliminary labels before correction, and the second row shows the output after correction.
Referring to FIG. 8, the embodiment of the invention also discloses a behavior recognition system based on cross-camera multi-target tracking, comprising:
a video image acquisition module: acquiring real-time video image data of multiple cameras in a scene;
a person region localization module: performing region localization on the targets in the current frames of all cameras through a trained pedestrian detection model to obtain the region coordinates of all targets;
a dual target tracking module: for a single camera, performing position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, completing target matching within the single camera using the Hungarian algorithm, and giving matched targets the same person number to obtain the target information of the current frame;
for targets across cameras, calculating the appearance similarity between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras using appearance features, completing target matching between cameras using a greedy algorithm, and correcting the person numbers of the targets according to the matching result;
a preliminary behavior recognition module: performing behavior recognition on all targets in the current frames of all cameras, and giving each target a corresponding preliminary behavior label;
a comprehensive behavior discrimination module: combining the preliminary behavior labels of the same person number within a specific number of preceding adjacent frames of the multiple cameras, and determining the person's behavior category according to a preset comprehensive discrimination strategy to obtain a comprehensive behavior discrimination result;
a recognition result output module: correcting the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, and visually outputting the target regions and behavior labels involving the behaviors of interest in the images.
An electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the behavior recognition method based on cross-camera multi-target tracking when executing the computer program.
In a specific embodiment, in order to verify the performance of the invention, experimental verification is performed on a behavior recognition dataset collected and produced for the invention; the pedestrian detection model, pedestrian re-identification model and behavior detection model involved in the invention are trained according to the settings in the relevant steps, and inference verification is performed on a single GPU. The specific experimental settings and results are as follows:
The dataset contains the region and person-number annotations of 7 categories of personnel behaviors; the specific categories, sample counts and other dataset information are shown in Table 4.
Table 4 Sample counts of the dataset of the invention (each of the 7 behavior categories carries both region annotations and person-number annotations)

Behavior category        Number of samples
Falling to the ground    3241
Smoking                  4381
Crossing                 5314
Holding a pole           2143
Climbing at height       2841
Climbing a ladder        3195
Making a phone call      4952
The effectiveness of the method is measured by the missed detection rate, false detection rate and accuracy; the experimental verification results of the method are shown in Table 5. To illustrate the advantages of the method, the table also lists the recognition results of the behavior recognition module (YOLOX) alone, i.e. the results without correction.
Table 5 Experimental verification results of the method
Specifically, in the behavior recognition method, system and electronic equipment based on cross-camera multi-target tracking, pedestrian detection determines the target regions, dual target tracking assigns person numbers to the target persons across the multiple cameras, the behavior recognition algorithm performs a preliminary analysis of the target behaviors, and the target behavior is decided by combining multiple adjacent frames from multiple cameras according to the comprehensive behavior discrimination strategy. As the verification results on the behavior recognition dataset show, by introducing cross-camera multi-target tracking and exploiting the information of multiple viewing angles from multiple cameras together with multi-frame data in continuous time sequences, the missed detection and false detection problems of behavior recognition are effectively alleviated and the recognition accuracy is remarkably improved. At the same time, using the multi-camera, multi-view monitoring systems commonly deployed in operation scenarios such as electric power construction, the same person captured by different cameras is associated through the cross-camera multi-target tracking algorithm and assigned a unified person number, so that comprehensive discrimination is performed using the behavior information of the same numbered person within a specific time period. Instead of relying only on static single-frame images, the method combines multi-frame data from the continuous time sequences of multiple cameras, thereby providing more comprehensive context information for behavior recognition, effectively improving recognition accuracy, greatly reducing the possibility of misjudgment and missed detection, providing highly reliable technical support for safety management in operation scenarios such as electric power construction, and promoting the application of behavior recognition methods in personnel safety management.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments, and identical or similar parts among the embodiments may be referred to each other. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A behavior recognition method based on cross-camera multi-target tracking, characterized by comprising the following steps:
S100: acquiring real-time video image data of multiple cameras in a scene;
S200: performing region localization on the targets in all frames of all cameras through a trained pedestrian detection model to obtain the region coordinates of all targets;
S300: for a single camera, performing position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, completing target matching within the single camera using the Hungarian algorithm, and giving matched targets the same person number to obtain the target information of the current frame;
for targets across cameras, calculating the appearance similarity between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras using appearance features, completing target matching between cameras using a greedy algorithm, and correcting the person numbers of the targets according to the matching result;
S400: performing behavior recognition on all targets in the current frames of all cameras, and giving each target a corresponding preliminary behavior label;
S500: combining the preliminary behavior labels of the same person number within a specific number of preceding adjacent frames of the multiple cameras, and determining the person's behavior category according to a preset comprehensive discrimination strategy to obtain a comprehensive behavior discrimination result;
S600: correcting the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, and visually outputting the target regions and behavior labels involving the behaviors of interest in the images.
2. The behavior recognition method based on cross-camera multi-target tracking according to claim 1, characterized in that S200 comprises:
constructing a pedestrian detection model based on the F2DNet pedestrian detection framework;
acquiring the COCO dataset and pre-training the pedestrian detection model;
acquiring the CrowdHuman pedestrian dataset, the CityPersons pedestrian dataset, the ETHZ pedestrian dataset, the MOT17 multi-target tracking data and the MOT20 multi-target tracking data, and training the pedestrian detection model to obtain a trained pedestrian detection model;
inputting the acquired current frame image of any camera into the trained pedestrian detection model, and outputting the region coordinates of all targets of the current frame, represented as:

$$b_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c\right)$$

where $i$ denotes the $i$-th target of the current frame, $c$ denotes the $c$-th camera, $x_i^c$ and $y_i^c$ are the abscissa and ordinate of the lower-left corner of the current-frame target region, and $w_i^c$ and $h_i^c$ are the width and height of the current-frame target region, respectively.
3. The behavior recognition method based on cross-camera multi-target tracking according to claim 1, characterized in that in S300, for a single camera, performing the position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, and completing target matching within the single camera using the Hungarian algorithm, comprises:
S310: for target matching within a single camera, inputting the region coordinates of all targets of the current frame and the previous frame, where the region coordinates of the previous-frame targets are:

$$b_j^v = \left(x_j^v,\ y_j^v,\ w_j^v,\ h_j^v\right)$$

where $j$ denotes the $j$-th target of the previous frame, $v$ denotes the $v$-th camera, $x_j^v$ and $y_j^v$ are the abscissa and ordinate of the lower-left corner of the previous-frame target region, and $w_j^v$ and $h_j^v$ are the width and height of the previous-frame target region, respectively;
S311: performing position prediction on all targets of the previous frame with the configured Kalman filter, obtaining the predicted target region coordinates:

$$\hat{b}_j^v = \left(\hat{x}_j^v,\ \hat{y}_j^v,\ \hat{w}_j^v,\ \hat{h}_j^v\right)$$

where $\hat{x}_j^v$ and $\hat{y}_j^v$ are the predicted abscissa and ordinate of the lower-left corner of the target region, and $\hat{w}_j^v$ and $\hat{h}_j^v$ are the predicted width and height of the target region, respectively;
S312: adopting IoU as the measuring function to calculate the position similarity between all targets of the current frame and all predicted targets of the previous frame:

$$s_{ij} = \mathrm{IoU}\left(b_i^c,\ \hat{b}_j^v\right) = \frac{\left|b_i^c \cap \hat{b}_j^v\right|}{\left|b_i^c \cup \hat{b}_j^v\right|}$$

where, for target matching within a single camera, $c$ and $v$ in the function are equal;
S313: taking the position similarity as the matching basis, adopting the Hungarian algorithm to compute the matching degree between targets, treating matched targets as the same person and giving them the same person number, and outputting the primary target information after target matching within the single camera of the current frame as:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c\right)$$

where $id_i^c$ is the person number.
4. The behavior recognition method based on cross-camera multi-target tracking according to claim 1, characterized in that in S300, for target matching between cameras, performing the appearance similarity calculation using appearance features between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras, and completing target matching between cameras using a greedy algorithm, comprises:
S320: inputting the region coordinates and person numbers of all targets of the current frames of the multiple cameras after target matching within each single camera;
S321: extracting the appearance features of the pedestrians in all target regions through a trained pedestrian re-identification model, and adding the appearance features into the primary target information of the current frame to obtain the secondary target information of the current frame;
S322: in camera number order, calculating one by one the cosine similarity between the appearance features of all targets in the first camera and all targets in the other cameras as the appearance similarity;
S323: taking the appearance similarity as the matching basis, adopting a greedy algorithm to complete target matching between different cameras, treating matched targets as the same person, and correcting the person numbers of the targets in the other cameras on the basis of the person number of the first-camera target.
5. The behavior recognition method based on cross-camera multi-target tracking according to claim 4, characterized in that S321 comprises:
building a pedestrian re-identification model based on the CAL architecture;
obtaining the LTCC re-identification dataset and the PRCC re-identification dataset, and training the pedestrian re-identification model to obtain a trained pedestrian re-identification model;
extracting the target appearance features through the pedestrian re-identification model;
adding the target appearance features into the primary target information of the current frame to obtain the secondary target information of the current frame:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c\right)$$

where $f_i^c$ is the target appearance feature.
6. The behavior recognition method based on cross-camera multi-target tracking according to claim 1, characterized in that S400 comprises:
performing frame-by-frame behavior recognition with the YOLOX target detection model, using weights pre-trained on the COCO dataset;
obtaining a behavior recognition dataset comprising the categories falling to the ground, smoking and crossing;
training the YOLOX target detection model on the behavior recognition dataset;
merging the preliminary behavior label obtained by the trained YOLOX target detection model into the target information to form the three-level target information:

$$T_i^c = \left(x_i^c,\ y_i^c,\ w_i^c,\ h_i^c,\ id_i^c,\ f_i^c,\ a_i^c\right)$$

where $a_i^c$ is the preliminary behavior label, which includes the concrete behavior category labels of falling, smoking and crossing as well as a no-behavior label.
7. The behavior recognition method based on cross-camera multi-target tracking according to claim 1, characterized in that S500 comprises:
S510: inputting all targets with the same person number in the current frames of the multiple cameras;
S520: counting the preliminary behavior label categories of all input targets, and taking the category with the largest share as the candidate behavior category under that person number in the current frame;
S530: acquiring the preliminary behavior labels of all targets with the same person number in the adjacent preceding frames within a specific time interval of the multiple cameras, and calculating the proportion of preliminary behavior labels identical to the candidate behavior category of the current frame;
S540: if this label proportion is larger than the preset proportion, the behavior category under that person number is the candidate behavior category of the current frame; otherwise, no relevant behavior is performed under the corresponding person number;
S550: determining the person's behavior category according to the preset comprehensive discrimination strategy based on S510-S540 to obtain the comprehensive behavior discrimination result.
8. The behavior recognition method based on cross-camera multi-target tracking according to claim 7, characterized in that the parameters of the comprehensive discrimination strategy include:
the default time interval parameter, set to 16 frames;
the preset proportion parameter, set to 90%.
9. A behavior recognition system based on cross-camera multi-target tracking, using the behavior recognition method based on cross-camera multi-target tracking according to any one of claims 1 to 8, characterized by comprising:
a video image acquisition module: acquiring real-time video image data of multiple cameras in a scene;
a person region localization module: performing region localization on the targets in all frames of all cameras through a trained pedestrian detection model to obtain the region coordinates of all targets;
a dual target tracking module: for a single camera, performing position similarity calculation between the region coordinates of all targets in the current frame and the region coordinates of all targets in the previous frame, completing target matching within the single camera using the Hungarian algorithm, and giving matched targets the same person number to obtain the target information of the current frame;
for targets across cameras, calculating the appearance similarity between all targets in the current frame of any camera and all targets in the current frames of the remaining cameras using appearance features, completing target matching between cameras using a greedy algorithm, and correcting the person numbers of the targets according to the matching result;
a preliminary behavior recognition module: performing behavior recognition on all targets in the current frames of all cameras, and giving each target a corresponding preliminary behavior label;
a comprehensive behavior discrimination module: combining the preliminary behavior labels of the same person number within a specific number of preceding adjacent frames of the multiple cameras, and determining the person's behavior category according to a preset comprehensive discrimination strategy to obtain a comprehensive behavior discrimination result;
a recognition result output module: correcting the preliminary behavior labels of the targets in the current frames of all cameras according to the comprehensive behavior discrimination result, and visually outputting the target regions and behavior labels involving the behaviors of interest in the images.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the behavior recognition method based on cross-camera multi-target tracking according to any one of claims 1 to 8 when executing the computer program.
CN202410117750.0A 2024-01-29 2024-01-29 Behavior recognition method and system based on cross-camera multi-target tracking and electronic equipment Pending CN117953580A (en)

Priority Applications (1)

Application number: CN202410117750.0A; priority date / filing date: 2024-01-29; title: Behavior recognition method and system based on cross-camera multi-target tracking and electronic equipment

Publications (1)

Publication number: CN117953580A; publication date: 2024-04-30

Family ID: 90802599

Country: CN

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170116753A1 (en) * 2014-04-30 2017-04-27 Institute Of Automation Chinese Academy Of Sciences Large-Range-First Cross-Camera Visual Target Re-identification Method
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking
CN111259790A (en) * 2020-01-15 2020-06-09 上海交通大学 Coarse-to-fine behavior rapid detection and classification method and system for medium-short time video
JP2021117635A (en) * 2020-01-24 2021-08-10 Kddi株式会社 Object tracking device and object tracking method
CN114240997A (en) * 2021-11-16 2022-03-25 南京云牛智能科技有限公司 Intelligent building online cross-camera multi-target tracking method
CN114693746A (en) * 2022-03-31 2022-07-01 西安交通大学 Intelligent monitoring system and method based on identity recognition and cross-camera target tracking
CN116363694A (en) * 2023-03-03 2023-06-30 中国电子科技集团公司第二十八研究所 Multi-target tracking method of unmanned system crossing cameras matched with multiple pieces of information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YUHANG HE ET AL.: "Multi-Target Multi-Camera Tracking by Tracklet-to-Target Assignment", IEEE Transactions on Image Processing, vol. 29, 19 March 2020, XP011779656, DOI: 10.1109/TIP.2020.2980070
齐冬莲 et al.: "Application of an improved YOLO object detection method to abnormal-state recognition of cable equipment" (一种改进的YOLO目标检测方法在电缆设备异常状态识别中的应用), Electrical Measurement & Instrumentation (电测与仪表), vol. 57, no. 2, 31 January 2020

Similar Documents

Publication Publication Date Title
CN111191576B (en) Personnel behavior target detection model construction method, intelligent analysis method and system
CN109598794B (en) Construction method of three-dimensional GIS dynamic model
CN106991668B (en) Evaluation method for pictures shot by skynet camera
CN111414807B (en) Tidal water identification and crisis early warning method based on YOLO technology
CN113903081A (en) Visual identification artificial intelligence alarm method and device for images of hydraulic power plant
CN110232379A (en) A kind of vehicle attitude detection method and system
CN111539938B (en) Method, system, medium and electronic terminal for detecting curvature of rolled strip steel strip head
CN109035307B (en) Set area target tracking method and system based on natural light binocular vision
CN111079518A (en) Fall-down abnormal behavior identification method based on scene of law enforcement and case handling area
CN114202646A (en) Infrared image smoking detection method and system based on deep learning
CN111339902A (en) Liquid crystal display number identification method and device of digital display instrument
CN113449675A (en) Coal mine personnel border crossing detection method
CN111008994A (en) Moving target real-time detection and tracking system and method based on MPSoC
CN112270381A (en) People flow detection method based on deep learning
CN111259736B (en) Real-time pedestrian detection method based on deep learning in complex environment
CN114067438A (en) Thermal infrared vision-based parking apron human body action recognition method and system
CN113256731A (en) Target detection method and device based on monocular vision
CN112580542A (en) Steel bar counting method based on target detection
CN115311623A (en) Equipment oil leakage detection method and system based on infrared thermal imaging
CN114332739A (en) Smoke detection method based on moving target detection and deep learning technology
EP3825804A1 (en) Map construction method, apparatus, storage medium and electronic device
CN113505643A (en) Violation target detection method and related device
CN110321808B (en) Method, apparatus and storage medium for detecting carry-over and stolen object
CN110059544B (en) Pedestrian detection method and system based on road scene
CN114821486B (en) Personnel identification method in power operation scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination