CN114359968A - Swimming pool drowning prevention multi-camera target tracking method and device, computer equipment and storage medium

Info

Publication number: CN114359968A
Application number: CN202210022887.9A
Authority: CN
Inventors: 任小枫; 谢欣; 郭羽; 郭东岩; 王振华; 张剑华; 张都思
Original assignee: Hangzhou Juyan Xincheng Technology Co ltd
Current assignee: Hangzhou Juyan Xincheng Technology Co ltd
Priority date: 2022-01-10
Filing date: 2022-01-10
Publication date: 2022-04-15

Abstract

The embodiment of the invention discloses a method and a device for multi-camera target tracking for drowning prevention of a swimming pool, computer equipment and a storage medium. The method comprises the following steps: acquiring images shot by a plurality of cameras above the water surface of the swimming pool to obtain an original image; inputting the original image into a target detection model to perform swimming pool human target detection so as to obtain a swimming pool human target detection frame; performing target tracking on the swimming pool human target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human target; outputting the swimming track of the human body target of the swimming pool; the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of human body targets as sample sets. By implementing the method provided by the embodiment of the invention, the position of the human body target of the swimming pool can be timely and accurately detected, the target tracking is carried out, the behavior of the human body target of the swimming pool is accurately judged according to the tracking result, and the accuracy of drowning prevention supervision is improved.

Description

Swimming pool drowning prevention multi-camera target tracking method and device, computer equipment and storage medium

Technical Field

The invention relates to a target tracking method, in particular to a multi-camera target tracking method, a device, computer equipment and a storage medium for preventing drowning of a swimming pool.

Background

With the development of the sports industry, enthusiasm for participating in sports has been growing, and swimming, one of the most popular sports, is also the sport with the most safety accidents. According to incomplete statistics published by the Ministry of Health, about 57,000 people die from drowning every year in China, of whom teenagers account for 56.04%, making drowning the leading cause of death among teenagers; the drowning death rate in China is the highest worldwide.

At present, most swimming venues rely on traditional manual supervision, in which a lifeguard sits at an elevated position overlooking the whole pool. Several factors make accurate supervision difficult. First, the main background of the target detection problem in a swimming pool is water: when swimmers move, the water surface fluctuates, and sunlight and artificial lighting produce reflections on it. These reflections are difficult to remove by preprocessing, and their positions change continuously with the fluctuation of the water surface. Second, swimmers span a wide range of ages, people at the poolside and in the pool perform a wide variety of actions, and the parts of a swimmer's body below the water surface are hard to observe because of the refraction, turbidity and fluctuation of the water. Third, facilities and sundries such as stands, lifesaving equipment, training equipment and swimmers' personal belongings appear on the pool deck, and interfering objects such as lane lines and training equipment appear in the pool itself. As a result, manual supervision is not accurate enough. Hardware-based supervision, such as sensor wristbands and radio-frequency peripherals, is also used at present, but its accuracy is likewise not high.

Therefore, it is necessary to design a new method to detect the position of the human target of the swimming pool in time and accurately, and perform target tracking, so as to accurately judge the behavior of the human target of the swimming pool from the tracking result, and improve the accuracy of drowning prevention supervision.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a multi-camera target tracking method and device for swimming pool drowning prevention, computer equipment and a storage medium.

In order to achieve the purpose, the invention adopts the following technical scheme: a multi-camera target tracking method for preventing drowning of a swimming pool comprises the following steps:

acquiring images shot by a plurality of cameras above the water surface of the swimming pool to obtain an original image;

inputting the original image into a target detection model to perform swimming pool human target detection so as to obtain a swimming pool human target detection frame;

performing target tracking on the swimming pool human target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human target;

outputting the swimming track of the human body target of the swimming pool;

the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of human body targets as sample sets.

The further technical scheme is as follows: the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of human body targets as a sample set, and comprises the following steps:

constructing an image with a position label of a human body target as a sample set;

dividing the sample set to obtain a training set, a verification set and a test set;

performing enhancement processing on the training set, the verification set and the test set to obtain a processing result;

constructing a Yolov5 network, and adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to the Yolov5 network to obtain an initial network;

training and verifying the initial network by using a training set and a verification set in the processing result, and calculating a loss value in the training process;

and when the loss value is kept unchanged, testing the initial network by using the test set in the processing result so as to take the trained initial network as a target detection model.

The further technical scheme is as follows: constructing a Yolov5 network, and adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to the Yolov5 network to obtain an initial network, comprises:

constructing a Yolov5 network;

adding a DLA-34 network as a backbone network in the Yolov5 network, and extracting features to obtain a first network;

adding a Semantic Self-Attention mechanism to the first network to obtain a second network;

an Anchor-free network is used in the target regression box network of the second network to obtain the initial network.

The further technical scheme is as follows: performing target tracking on the swimming pool human target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human target comprises the following steps:

establishing a tracker according to the swimming pool human body target detection frame;

constructing a motion estimation model, and estimating the motion estimation model to obtain the position of the human body target of the swimming pool of the next frame;

and performing data association on the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target to obtain a swimming track of the swimming pool human body target.

The further technical scheme is as follows: the motion estimation model is a Kalman filtering model.

The further technical scheme is as follows: performing data association on the swimming pool human body target detection frame and the position of the next-frame swimming pool human body target to obtain a swimming track of the swimming pool human body target comprises:

and fusing the motion information and the characteristic information of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target to obtain the swimming track of the swimming pool human body target.

The further technical scheme is as follows: fusing the motion information and the characteristic information of the swimming pool human body target detection frame and the position of the next-frame swimming pool human body target to obtain a swimming track of the swimming pool human body target comprises:

calculating similarity distance scores of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target;

measuring the distance between the appearance characteristics of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target through the cosine distance to obtain the appearance characteristic distance;

weighting the similarity distance scores and the appearance characteristic distances to obtain a similarity matrix;

setting a matched measurement criterion;

and limiting the numerical values which do not meet the requirements in the similarity matrix by using a gate control matrix, and performing cascade matching on the swimming pool human body target detection frame and the track of the swimming pool human body target in a cycle matching process of default iteration times to obtain the swimming pool human body target swimming track.

The invention also provides a multi-camera target tracking device for preventing drowning of a swimming pool, which comprises:

an image acquisition unit, used for acquiring images shot by a plurality of cameras positioned above the water surface of a swimming pool so as to obtain an original image;

the target detection unit is used for inputting the original image into a target detection model to perform swimming pool human target detection so as to obtain a swimming pool human target detection frame;

the target tracking unit is used for performing target tracking on the swimming pool human body target detection frame by adopting a DeepSORT algorithm so as to obtain a swimming track of the swimming pool human body target;

and the output unit is used for outputting the swimming track of the human body target of the swimming pool.

The invention also provides computer equipment which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor realizes the method when executing the computer program.

The invention also provides a storage medium storing a computer program which, when executed by a processor, implements the method described above.

Compared with the prior art, the invention has the following beneficial effects. Images shot by a plurality of cameras positioned above the water surface of the swimming pool are acquired, and the swimming pool human body target is detected by means of a target detection model. The target detection model is obtained by adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to a Yolov5 network and training the result; it extracts more effective information, has high accuracy, and can detect the swimming pool human body target quickly and accurately. Target tracking is then carried out by the DeepSORT algorithm to obtain the swimming track of the swimming pool human body target, so that the movement of the swimming pool human body target detection frame is tracked automatically. In this way the position of the swimming pool human body target is detected in a timely and accurate manner and tracked, the behavior of the swimming pool human body target can be accurately judged from the tracking result, and the accuracy of drowning prevention supervision is improved.

The invention is further described below with reference to the accompanying drawings and specific embodiments.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic view of an application scenario of a multi-camera target tracking method for preventing drowning of a swimming pool according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a multi-camera target tracking method for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 3 is a schematic view of a sub-flow chart of a multi-camera target tracking method for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 4 is a sub-flowchart of a multi-camera target tracking method for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a target detection model generating three scale feature maps according to an embodiment of the present invention;

FIG. 6 is a sub-flowchart of a multi-camera target tracking method for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 7 is a sub-flowchart of a multi-camera target tracking method for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating cascade matching according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of target tracking provided by an embodiment of the present invention;

FIG. 10 is a schematic block diagram of a multi-camera target tracking device for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 11 is a schematic block diagram of a tracking unit of a multi-camera target tracking apparatus for preventing drowning in a swimming pool according to an embodiment of the present invention;

FIG. 12 is a schematic block diagram of a data correlation subunit of a swimming pool drowning prevention multi-camera target tracking device provided by an embodiment of the invention;

FIG. 13 is a schematic block diagram of a computer device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a multi-camera target tracking method for preventing drowning of a swimming pool according to an embodiment of the present invention. Fig. 2 is a schematic flow chart of a multi-camera target tracking method for preventing drowning of a swimming pool according to an embodiment of the present invention. The multi-camera target tracking method for preventing drowning of the swimming pool is applied to a server. The server performs data interaction with a plurality of cameras and a terminal, performs target detection through images acquired by the plurality of cameras, performs target tracking by adopting a DeepSORT algorithm, determines the swimming track of the human body target of the swimming pool, and outputs the swimming track to the terminal.

As shown in Fig. 2, the method includes the following steps S110 to S140.

S110, acquiring images shot by a plurality of cameras above the water surface of the swimming pool to obtain an original image.

In this embodiment, a plurality of cameras are deployed above the surface of the pool, and are used to capture an omnidirectional image of the pool, thereby forming an original image.

S120, inputting the original image into a target detection model to perform swimming pool human target detection so as to obtain a swimming pool human target detection frame.

In this embodiment, the swimming pool human body target detection frame is a boundary frame where the swimming pool human body target is located, and the boundary frame has coordinate information.

In an embodiment, referring to fig. 3, the above object detection model obtained by training the convolutional neural network using a plurality of images with position labels of the human object as a sample set may include steps S121 to S126.

S121, constructing images with position labels of the human body target as a sample set.

In this embodiment, the sample set refers to a plurality of swimming pool images in which the positions of the human targets have been labeled manually.

S122, dividing the sample set to obtain a training set, a verification set and a test set.

In the embodiment, the sample set is divided, and the division can be used for training, verifying and testing the model.
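The division in S122 can be sketched as follows. This is only a minimal illustration: the 70/15/15 ratios, the fixed random seed, the function name `split_samples` and the sample records are assumptions, since the description does not specify how the sample set is divided.

```python
import random

def split_samples(samples, train_ratio=0.7, val_ratio=0.15, seed=0):
    """Shuffle labelled samples and split them into train/verification/test lists."""
    if not 0 < train_ratio < 1 or not 0 <= val_ratio < 1 or train_ratio + val_ratio >= 1:
        raise ValueError("ratios must leave room for a non-empty test share")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical sample records: (image file name, labelled boxes as x_min, y_min, x_max, y_max).
samples = [(f"pool_{i:03d}.jpg", [(10, 20, 50, 80)]) for i in range(100)]
train_set, val_set, test_set = split_samples(samples)
```

Shuffling before splitting keeps each subset representative when the images were collected camera by camera or session by session.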

S123, performing enhancement processing on the training set, the verification set and the test set to obtain a processing result.

In this embodiment, the processing result refers to the result obtained by processing the training set, the verification set and the test set with enhancement operations such as rotation and flipping.

The training set, the verification set and the test set are subjected to enhancement processing, so that the stability of the model can be improved.
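A small sketch of such enhancement operations is given below, assuming images are stored as lists of pixel rows and labels as (x_min, y_min, x_max, y_max) boxes. The helper names are hypothetical; the point is that a geometric transform applied to an image must also be applied to its position labels.

```python
def hflip_image(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box inside an image of the given width."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)

def rotate90_image(image):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

image = [[1, 2, 3],
         [4, 5, 6]]
flipped = hflip_image(image)        # [[3, 2, 1], [6, 5, 4]]
rotated = rotate90_image(image)     # [[4, 1], [5, 2], [6, 3]]
flipped_box = hflip_box((10, 20, 50, 80), image_width=100)  # (50, 20, 90, 80)
```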

S124, constructing a Yolov5 network, and adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to the Yolov5 network to obtain an initial network.

In this embodiment, the initial network refers to a convolutional neural network formed by adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to a Yolov5 network.

In an embodiment, referring to fig. 4, the step S124 may include steps S1241 to S1244.

S1241, constructing a Yolov5 network;

S1242, adding a DLA-34 network as a backbone network in the Yolov5 network, and extracting features to obtain a first network.

In this embodiment, the first network refers to a network structure formed after the Yolov5 network adds the DLA-34 network as a backbone network.

On the basis of the Yolov5 target detection model, the DLA-34 used in CenterNet is added as the backbone for feature extraction. DLA (Deep Layer Aggregation) is an image classification network with multi-level skip connections, which aggregates information from different layers better through deeper fusion. DLA-34 also uses deformable convolution, i.e. DCN (Deformable Convolutional Network), so that DLA-34, as the network structure for multi-target tracking, can fuse feature information iteratively and thereby obtain more effective information.

S1243, adding a Semantic Self-Attention mechanism to the first network to obtain a second network.

In this embodiment, the second network refers to the network formed by adding a Semantic Self-Attention mechanism on the basis of the Yolov5 network with the DLA-34 network added as the backbone network.

In the target detection model, an SSA (Semantic Self-Attention) mechanism is also added. An ordinary self-attention mechanism processes global information; the Semantic Self-Attention mechanism instead treats the target detection frames as manually labeled segmentation ground truth, uses this ground truth to learn segmentation features, and fuses these features with the detection features, so that the learned attention map, i.e. the range of information on which attention is focused, is applied to the detection features. The process is as follows: the ground-truth detection frames are used as a segmentation mask; the mask is learned on the original feature map through convolution; and the learned feature map, taken as the information range focused on by attention, is fused into the original feature map.
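The mask construction and fusion described above can be illustrated with a toy sketch. Two points are assumptions rather than statements of the described method: the mask is used directly in place of an attention map predicted by convolution, and fusion is written as feature × (1 + attention), because the description does not state the fusion operator. The function names are hypothetical.

```python
def boxes_to_mask(boxes, height, width):
    """Build a binary mask that is 1 inside any (x_min, y_min, x_max, y_max) box."""
    mask = [[0] * width for _ in range(height)]
    for x_min, y_min, x_max, y_max in boxes:
        for y in range(max(0, y_min), min(height, y_max)):
            for x in range(max(0, x_min), min(width, x_max)):
                mask[y][x] = 1
    return mask

def fuse(feature_map, attention_map):
    """Assumed fusion: re-weight each feature by (1 + attention)."""
    return [[f * (1 + a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(feature_map, attention_map)]

# One ground-truth box on a 3 x 4 feature map.
mask = boxes_to_mask([(1, 0, 3, 2)], height=3, width=4)
features = [[1.0] * 4 for _ in range(3)]
fused = fuse(features, mask)
```

In a trained network the attention map would be produced by convolutional layers supervised with this mask, so that it is also available at inference time when no ground truth exists.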

S1244, using an Anchor-free network in the target regression box network of the second network to obtain an initial network.

In this embodiment, the Anchor-free algorithm is used in the final target regression frame network: feature maps carrying category information and location information are produced, and the bounding frame where the final target is located is obtained from these feature maps.

Specifically, the prediction of the target frame, i.e. the bounding frame where the target is located, is implemented as follows. As shown in Fig. 5, feature maps at three scales are formed; in the figure, the number below each convolutional layer is the number of channels and the number above is the two-dimensional image size. The input image is 736 × 1280 with 3 channels. As the convolutional layers deepen, the feature maps at different scales have different receptive fields on the input image, that is, each feature map corresponds to input-image grid cells of a different size.
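As a rough illustration of how three scales relate to the 736 × 1280 input, the sketch below computes feature-map sizes for downsampling strides of 8, 16 and 32. These strides are an assumption (they are common defaults in Yolov5-style detectors); the actual sizes are those shown in Fig. 5.

```python
def grid_sizes(height, width, strides=(8, 16, 32)):
    """Return the (rows, cols) of the feature map at each assumed stride."""
    return [(height // s, width // s) for s in strides]

# Input image size taken from the description: 736 x 1280.
sizes = grid_sizes(736, 1280)
for stride, (rows, cols) in zip((8, 16, 32), sizes):
    # Each feature-map cell covers a stride x stride patch of the input image.
    print(f"stride {stride}: {rows} x {cols} cells")
```

Under these assumed strides, the finest map has the smallest cells and suits small or distant swimmers, while the coarsest map has the largest cells and suits targets close to the camera.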

When the target detection model is used for inference, the class information predicted by each grid cell is multiplied by the confidence predicted for each target frame to obtain a class-specific confidence score for each target frame:

$$\Pr(\mathrm{Class}_i \mid \mathrm{Object}) \cdot \Pr(\mathrm{Object}) \cdot \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \Pr(\mathrm{Class}_i) \cdot \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}}$$

The first term on the left of the equation is the class probability predicted by each grid cell, and the second and third terms together are the confidence predicted for each target frame. The product expresses both the probability that the predicted target frame belongs to a certain category and the accuracy of the target frame. After the class-specific confidence score of each target frame is obtained, a threshold is set, target frames with low scores are filtered out, and NMS (non-maximum suppression) is applied to the remaining target frames to obtain the final detection result, i.e. the position of the human target in the pool.
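The scoring, thresholding and NMS steps can be sketched as below. The threshold values (0.5 for the score, 0.45 for IoU), the function names and the candidate boxes are illustrative assumptions; the description does not give concrete values.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def filter_and_nms(candidates, score_threshold=0.5, iou_threshold=0.45):
    """candidates: list of (box, class_probability, box_confidence)."""
    # Class-specific confidence score = class probability x box confidence.
    scored = [(box, p * conf) for box, p, conf in candidates]
    scored = [item for item in scored if item[1] >= score_threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    kept = []
    for box, score in scored:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept

candidates = [
    ((100, 100, 200, 200), 0.9, 0.9),  # strongest detection of one swimmer
    ((105, 105, 205, 205), 0.9, 0.8),  # near-duplicate of the first box
    ((400, 300, 480, 380), 0.8, 0.9),  # a second, separate swimmer
    ((600, 50, 650, 100), 0.5, 0.4),   # low score, filtered by the threshold
]
detections = filter_and_nms(candidates)
```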

S125, training and verifying the initial network by utilizing the training set and the verification set in the processing result, and calculating a loss value in the training process.

In this embodiment, the loss value refers to the difference between the result obtained during training and the actually labeled label.

S126, when the loss value is kept unchanged, testing the initial network by using the test set in the processing result so as to take the trained initial network as a target detection model.

When the loss value remains unchanged, the current network has converged, i.e. the loss value is essentially constant and very small, which indicates that the current network can be used as a candidate target detection model. Generally, the loss value is relatively large when training starts and becomes smaller as training proceeds. If the loss value has not stabilized, the current network cannot be used as a candidate target detection model: its detection results are not accurate, which would degrade the accuracy of subsequent processing. In that case, the parameters of each layer of the network are adjusted, and the samples are input into the network again to continue training.
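One simple way to decide that the loss value "remains unchanged" is to check whether recent loss values vary by less than a tolerance. The window length, the tolerance and the function name below are assumptions made for illustration only.

```python
def loss_has_plateaued(losses, window=5, tolerance=1e-3):
    """Return True when the last `window` loss values vary by at most `tolerance`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= tolerance

# Hypothetical loss histories, one value per epoch.
converged = loss_has_plateaued([1.0, 0.5, 0.2, 0.1001, 0.1, 0.1002, 0.1001, 0.1])
still_training = loss_has_plateaued([1.0, 0.8, 0.6, 0.4, 0.2])
```

A plateau alone does not guarantee a small loss, so in practice such a check would be combined with the verification-set results before the network is tested.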

S130, performing target tracking on the swimming pool human body target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human body target.

In this embodiment, the swimming pool human body target swimming track refers to the moving track of the swimming pool human body target.

Referring to Fig. 8, the apparent features, motion trajectories and spatial position relationships of the targets are fused to achieve cross-camera target matching and tracking. People in the picture are detected and tracked, and the results are then combined with human pose estimation and behavior recognition. The recognition software determines under what conditions a swimmer is considered to be drowning, which triggers an early-warning countdown. After a short countdown, the drowning alarm is sent directly to the pool lifeguard, who can thus be alerted to a potential drowning event earlier than in a pool without this technology.

In an embodiment, referring to fig. 6, the step S130 may include steps S131 to S133.

S131, establishing a tracker according to the swimming pool human body target detection frame.

In the present embodiment, the tracker is the DeepSORT target tracking algorithm. DeepSORT is a multi-target tracking algorithm based on the tracking-by-detection strategy.

S132, constructing a motion estimation model, and estimating the motion estimation model to obtain the position of the human target of the swimming pool in the next frame.

In this embodiment, the position of the next-frame swimming pool human target refers to the swimming pool human target detection frame in the next frame, relative to the swimming pool human target detection frame of the current frame.

Specifically, the motion estimation model is a Kalman filter model. Kalman filtering is adopted as the motion estimation model, and data association is performed between the position of the target predicted for the next frame and the target position detected by the target detection network, i.e. the swimming pool human body target detection frame.
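The predict and update steps of a Kalman filter can be sketched with a deliberately reduced model: a single coordinate with constant velocity. This is an illustration only. The public DeepSORT implementation tracks an eight-dimensional state (box centre, aspect ratio, height and their velocities), and the noise values and function names below are assumptions.

```python
def kalman_predict(state, cov, dt=1.0, process_noise=1e-2):
    """Predict step for state (position, velocity) with a 2x2 covariance."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    # x' = F x with F = [[1, dt], [0, 1]]
    new_state = (pos + dt * vel, vel)
    # P' = F P F^T + Q, with Q = process_noise * I (assumed noise model)
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + process_noise
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + process_noise
    return new_state, ((n00, n01), (n10, n11))

def kalman_update(state, cov, measured_pos, measurement_noise=1.0):
    """Update step for a position-only measurement (H = [1, 0])."""
    pos, vel = state
    (p00, p01), (p10, p11) = cov
    innovation = measured_pos - pos
    s = p00 + measurement_noise        # innovation covariance
    k0, k1 = p00 / s, p10 / s          # Kalman gain
    new_state = (pos + k0 * innovation, vel + k1 * innovation)
    # P = (I - K H) P
    new_cov = (((1 - k0) * p00, (1 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov

# A swimmer at x = 0 moving 1 unit per frame; the detector then reports x = 1.2.
state, cov = (0.0, 1.0), ((1.0, 0.0), (0.0, 1.0))
predicted_state, predicted_cov = kalman_predict(state, cov)
updated_state, updated_cov = kalman_update(predicted_state, predicted_cov, measured_pos=1.2)
```

The predicted state is what gets compared with the detection frames during data association; the update step then pulls the estimate towards the matched detection.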

S133, performing data association on the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target to obtain a swimming track of the swimming pool human body target.

In this embodiment, the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target are subjected to data association, and the motion information of the target and the feature information of the target are generally fused.

Specifically, the motion information and the characteristic information of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target are fused to obtain the swimming track of the swimming pool human body target.

In an embodiment, referring to fig. 7, the step S133 may include steps S1331 to S1335.

S1331, calculating similarity distance scores of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target.

In this embodiment, the similarity distance score is the Mahalanobis distance between the swimming pool human target detection frame and the position of the next-frame swimming pool human target; once calculated, the score is used to compute the similarity matrix.

Specifically, the Mahalanobis distance is used as the distance function describing the degree of motion correlation. Here $d_j$ denotes the $j$-th detection result, i.e. the swimming pool human body target detection frame mentioned above; $y_i$ denotes the $i$-th tracking result, i.e. the position of the next-frame swimming pool human body target obtained by target tracking; and $S_i$ denotes the covariance matrix computed for $d_j$ and $y_i$, whose purpose is to measure the correlation between the detected target $d_j$ and the tracked target $y_i$. The resulting Mahalanobis distance $d^{(1)}(i,j)$ is used to match the detection results with the tracking results through the Hungarian algorithm, and is calculated as follows:

$$d^{(1)}(i,j) = (d_j - y_i)^{T} S_i^{-1} (d_j - y_i)$$
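A minimal sketch of the Mahalanobis distance between a detection and a predicted track position is given below. It is restricted to two-dimensional points with a 2 × 2 covariance so that the matrix inverse can be written out by hand; the function name and the example values are assumptions.

```python
def mahalanobis_sq(detection, predicted, cov):
    """Squared Mahalanobis distance between 2-D points under a 2x2 covariance."""
    dx = detection[0] - predicted[0]
    dy = detection[1] - predicted[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    inv = ((d / det, -b / det), (-c / det, a / det))
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))

# With unit covariance the result equals the squared Euclidean distance.
unit_case = mahalanobis_sq((3.0, 4.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
# A larger predicted uncertainty makes the same offset count for less.
uncertain_case = mahalanobis_sq((3.0, 4.0), (0.0, 0.0), ((9.0, 0.0), (0.0, 16.0)))
```

Because the offset is scaled by the track's uncertainty, a detection far from a confidently predicted position is penalised more than the same offset from an uncertain prediction.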

S1332, measuring the distance between the appearance characteristics of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target through cosine distance to obtain the appearance characteristic distance.

In this embodiment, the appearance feature distance is the distance between the appearance features of the swimming pool human target detection frame and those of the position of the next-frame swimming pool human target.

The cosine distance $d^{(2)}$ is used to measure the distance between appearance features. $r_j$ denotes the appearance feature computed for each detection frame $d_j$; $r_k^{(i)}$ denotes the appearance feature computed for each track $k$, where $R_i$ is the set of all mutually associated $r_k^{(i)}$; and $r_j^{T} r_k^{(i)}$ computes the cosine similarity of the two appearance features. Measuring the appearance features of the tracked target against those of the detected target by cosine distance allows the ID to be predicted more accurately. The calculation formula is as follows:

$$d^{(2)}(i,j) = \min\left\{\, 1 - r_j^{T} r_k^{(i)} \;\middle|\; r_k^{(i)} \in R_i \,\right\}$$
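The appearance distance can be sketched as the smallest cosine distance between a detection's feature vector and the stored feature vectors of a track. The function names and the toy two-dimensional features are assumptions; real appearance features come from a re-identification network and have many more dimensions.

```python
import math

def normalize(vector):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

def min_cosine_distance(detection_feature, track_gallery):
    """Smallest (1 - cosine similarity) between a detection and a track's stored features."""
    r_j = normalize(detection_feature)
    best = float("inf")
    for stored in track_gallery:
        r_k = normalize(stored)
        similarity = sum(a * b for a, b in zip(r_j, r_k))
        best = min(best, 1.0 - similarity)
    return best

# The track has been seen with two different appearances; one matches the detection.
same_swimmer = min_cosine_distance([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
other_swimmer = min_cosine_distance([1.0, 0.0], [[0.0, 2.0]])
```

Taking the minimum over the stored features lets a track match a detection as long as any of its past appearances is similar, which helps when a swimmer's pose changes.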

S1333, weighting the similarity distance scores and the appearance characteristic distances to obtain a similarity matrix.

In this embodiment, the similarity matrix expresses the degree of similarity between the swimming pool human target detection frame and the position of the next-frame swimming pool human target, combining the similarity distance score and the appearance feature distance.

Specifically, the similarity matrix (cost matrix) $c_{i,j}$ can be obtained by weighting the similarity distance score and the appearance feature distance: $c_{i,j} = \lambda d^{(1)}(i,j) + (1-\lambda) d^{(2)}(i,j)$, where $\lambda$ is the weighting coefficient of the similarity distance score.
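The weighting can be written directly as an element-wise combination of the two distance matrices. The function name and the example value of the weighting coefficient are assumptions; the description does not give a value for λ.

```python
def combined_cost(motion_dist, appearance_dist, lam=0.5):
    """Element-wise lam * d1 + (1 - lam) * d2 over two equally sized matrices."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * m + (1.0 - lam) * a for m, a in zip(m_row, a_row)]
            for m_row, a_row in zip(motion_dist, appearance_dist)]

# One track (row) against two detections (columns).
cost = combined_cost([[2.0, 8.0]], [[0.0, 1.0]], lam=0.25)
```

A small λ lets appearance dominate, which is useful when motion is hard to predict; a large λ favours the motion model.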

S1334, setting the matching metric criterion.

In this embodiment, the matching metric criteria refers to the rules for deciding what trajectory belongs to the current matched trajectory of the pool human target.

Specifically, thresholds are set on the similarity distances to serve as the matching metric criterion. Since the association of target motion information and the association of target image feature information are considered together, the matching problem is equivalent to an optimal matching problem on a bipartite graph, which is solved by the Hungarian matching method.
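The optimal bipartite matching can be illustrated on a small cost matrix. Note that the sketch below uses a brute-force search over permutations as a stand-in for the Hungarian method: it returns the same minimum-cost assignment but is only practical for a handful of tracks. The function name and the `max_cost` threshold are assumptions.

```python
from itertools import permutations

def best_assignment(cost, max_cost=float("inf")):
    """Minimum-cost one-to-one matching of rows (tracks) to columns (detections)."""
    n_rows = len(cost)
    n_cols = len(cost[0]) if cost else 0
    if n_rows > n_cols:
        raise ValueError("this sketch assumes no more tracks than detections")
    best_total, best_cols = float("inf"), ()
    for cols in permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_total, best_cols = total, cols
    # Apply the matching criterion: drop pairs whose cost exceeds the threshold.
    return [(r, c) for r, c in enumerate(best_cols) if cost[r][c] <= max_cost]

cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9]]
pairs = best_assignment(cost)
strict_pairs = best_assignment(cost, max_cost=0.15)
```

For realistic numbers of swimmers, a polynomial-time solver for the linear assignment problem would replace the permutation search.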

S1335, limiting the values in the similarity matrix that do not meet the requirements by using a gate matrix, and performing cascade matching on the swimming pool human target detection frames and the trajectories of the swimming pool human targets within a cyclic matching process with a default number of iterations, to obtain the swimming track of the swimming pool human target.

Specifically, referring to fig. 9, two parameters, the gating threshold (gating_threshold) and the maximum feature cosine distance (max_distance), are first converted into a gate matrix, which is used to limit excessively large values in the similarity matrix. Subsequently, in a cyclic matching process with a default of 70 iterations (max_age = 70), the trajectories of the targets are matched with the detection results of the targets. Tracks that have not been lost are matched first, and tracks that have been lost for longer are matched later.
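As a non-limiting illustration, gating followed by cascade matching can be sketched in Python as follows. The names apply_gate, cascade_match, INFEASIBLE and frames_since_update are assumptions for illustration, and a greedy lowest-cost pairing inside each level stands in for the Hungarian matching method to keep the sketch short.

```python
INFEASIBLE = 1e5  # hypothetical sentinel marking a forbidden pairing


def apply_gate(cost, motion_dist, appearance_dist, gating_threshold, max_distance):
    """Return a copy of cost with out-of-range pairings set to INFEASIBLE."""
    gated = [row[:] for row in cost]
    for i, row in enumerate(gated):
        for j in range(len(row)):
            if (motion_dist[i][j] > gating_threshold
                    or appearance_dist[i][j] > max_distance):
                row[j] = INFEASIBLE
    return gated


def cascade_match(gated, frames_since_update, max_age=70):
    """Match tracks to detections, most recently updated tracks first.

    frames_since_update[i] == 0 is assumed to mean that track i was
    matched in the previous frame; larger values mean it has been lost
    for longer.
    """
    n_dets = len(gated[0]) if gated else 0
    free = set(range(n_dets))  # detections not yet claimed by a track
    matches = []
    for age in range(max_age):
        level = [i for i, a in enumerate(frames_since_update) if a == age]
        candidates = sorted(
            (gated[i][j], i, j)
            for i in level for j in free
            if gated[i][j] < INFEASIBLE
        )
        used_tracks = set()
        for _, i, j in candidates:
            if i not in used_tracks and j in free:
                matches.append((i, j))
                used_tracks.add(i)
                free.discard(j)
    return matches, sorted(free)
```

In this sketch a track that was seen in the previous frame claims its detection before a track that has been lost for several frames, even if the older track has a lower cost for the same detection.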

In this embodiment, the trajectory of the human target in the swimming pool is a trajectory composed of the positions of all human targets in the swimming pool estimated by the motion estimation model. The detection result of the target refers to a swimming pool human body target detection frame.

S140, outputting the swimming track of the swimming pool human target.

In the target tracking management module on the terminal, an information list page for target tracking can be viewed. The list includes the name, number, brand, longitude and latitude, state (offline or online), IP address and area of each drowning prevention target tracking device. The drowning prevention target tracking information can be filtered and quickly searched by entering the device name or the device number, or by selecting the device state. In the rightmost operation column of the list, the drowning prevention target tracking information can be edited, viewed in detail, deleted, and so on. Clicking the "device add" button in the upper left corner of the list starts the device addition operation. On the target tracking details page of drowning prevention management, the device information can be viewed, including the device name, device number, device area, device addition time, and the like. Real-time monitoring can also be viewed there, and operations such as playback, snapshot, alarm and calling can be performed on it. On the device addition page of target tracking in drowning prevention management, a device can be added; the fields to be filled in include the device number, device name, device IP, device area, device brand, longitude, latitude, and the like. After the fields are filled in, clicking "save" completes the addition, and clicking "cancel" returns to the previous page.

In the above swimming pool drowning prevention multi-camera target tracking method, images shot by a plurality of cameras above the water surface of the swimming pool are acquired, and swimming pool human targets are detected by the target detection model. The target detection model is formed by adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to a Yolov5 network and is obtained by training; it can obtain more effective information, has high accuracy, and can detect swimming pool human targets quickly and accurately. Target tracking is then performed by the DeepSORT algorithm to obtain the swimming track of the swimming pool human target, so that the motion of the swimming pool human target detection frame is tracked automatically. In this way the position of the swimming pool human target is detected in a timely and accurate manner and tracked, the behavior of the swimming pool human target is accurately judged according to the tracking result, and the accuracy of drowning prevention supervision is improved.

Fig. 10 is a schematic block diagram of a swimming pool drowning prevention multi-camera target tracking apparatus 300 according to an embodiment of the present invention. As shown in fig. 10, corresponding to the above swimming pool drowning prevention multi-camera target tracking method, the present invention also provides a swimming pool drowning prevention multi-camera target tracking apparatus 300. The apparatus 300 includes units for performing the above method and may be configured in a server. Specifically, referring to fig. 10, the apparatus 300 includes an image acquisition unit 301, a target detection unit 302, a target tracking unit 303 and an output unit 304.

An image acquisition unit 301, configured to acquire images captured by a plurality of cameras above the water surface of the swimming pool to obtain an original image; a target detection unit 302, configured to input the original image into a target detection model for swimming pool human target detection, so as to obtain a swimming pool human target detection frame; a target tracking unit 303, configured to perform target tracking on the swimming pool human target detection frame by using the DeepSORT algorithm to obtain the swimming track of the swimming pool human target; and an output unit 304, configured to output the swimming track of the swimming pool human target.
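As a non-limiting illustration, the cooperation of units 301 to 304 can be sketched in Python as follows. The class name TrackingDevice, the field names, the callable signatures and the box layout are hypothetical and serve only to show the order in which the units are invoked; the actual detection and tracking logic is supplied from outside.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x, y, w, h) layout


@dataclass
class TrackingDevice:
    acquire: Callable[[], List[object]]           # image acquisition unit 301
    detect: Callable[[object], List[Box]]         # target detection unit 302
    track: Callable[[List[Box]], Dict[int, Box]]  # target tracking unit 303
    output: Callable[[Dict[int, Box]], None]      # output unit 304

    def step(self) -> List[Dict[int, Box]]:
        """Pass every acquired image through detection, tracking and output."""
        results = []
        for image in self.acquire():
            boxes = self.detect(image)
            tracks = self.track(boxes)
            self.output(tracks)
            results.append(tracks)
        return results
```

Keeping each unit behind a plain callable reflects the statement that the division into units is only a logical division and may be arranged differently in practice.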

In an embodiment, the swimming pool drowning prevention multi-camera target tracking device 300 further comprises a model generation unit;

the model generation unit is used for training the convolutional neural network by using a plurality of images with position labels of human body targets as a sample set so as to obtain a target detection model.

In an embodiment, the model generation unit includes a sample set construction subunit, a division subunit, an enhancement processing subunit, an initial network generation subunit, a training subunit, and a testing subunit.

The sample set construction subunit is used for constructing images with position labels of human targets as a sample set; the division subunit is used for dividing the sample set to obtain a training set, a verification set and a test set; the enhancement processing subunit is used for performing enhancement processing on the training set, the verification set and the test set to obtain a processing result; the initial network generation subunit is used for constructing a Yolov5 network, and adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to the Yolov5 network to obtain an initial network; the training subunit is used for training and verifying the initial network by using the training set and the verification set in the processing result, and calculating a loss value during training; and the testing subunit is used for testing the initial network by using the test set in the processing result when the loss value remains unchanged, so as to take the trained initial network as the target detection model.
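As a non-limiting illustration, the division of the sample set and the check that the loss value remains unchanged can be sketched in Python as follows. The split ratios, the window length and the tolerance are assumptions for illustration; the embodiment does not specify them.

```python
import random


def split_samples(samples, train=0.8, val=0.1, seed=0):
    """Shuffle the samples and divide them into training, verification
    and test sets; the remainder after train and val goes to the test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def loss_has_plateaued(losses, window=5, tol=1e-3):
    """True when the last `window` loss values differ by at most `tol`,
    used here as a stand-in for 'the loss value remains unchanged'."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= tol


train_set, val_set, test_set = split_samples(range(100))
```

Testing on the held-out test set would start once loss_has_plateaued returns True for the recorded training losses.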

In an embodiment, the initial network generation subunit includes a basic network construction module, a first network generation module, a second network generation module, and a network processing module.

The basic network construction module is used for constructing a Yolov5 network; the first network generation module is used for adding a DLA-34 network to the Yolov5 network as a backbone network for feature extraction, to obtain a first network; the second network generation module is used for adding a Semantic Self-Attention mechanism to the first network to obtain a second network; and the network processing module is used for using an Anchor-free network in the target regression box network of the second network to obtain an initial network.

In one embodiment, as shown in fig. 11, the target tracking unit 303 includes a tracker establishment sub-unit 3031, a model construction sub-unit 3032, and a data association sub-unit 3033.

A tracker establishing subunit 3031, configured to establish a tracker according to the swimming pool human target detection frame; a model construction subunit 3032, configured to construct a motion estimation model and use it to estimate the position of the swimming pool human target in the next frame; and a data association subunit 3033, configured to perform data association between the swimming pool human target detection frame and the position of the swimming pool human target in the next frame, so as to obtain the swimming track of the swimming pool human target.

In one embodiment, as shown in fig. 12, the data association subunit 3033 includes a distance score calculation module 30331, a distance calculation module 30332, a matrix generation module 30333, a criterion setting module 30334, and a matching module 30335.

A distance score calculating module 30331, configured to calculate similarity distance scores between the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target; a distance calculating module 30332, configured to measure the distance between the appearance features of the positions of the swimming pool human target detection frame and the next frame of swimming pool human target by cosine distance to obtain an appearance feature distance; a matrix generating module 30333, configured to perform weighting processing on the similarity distance scores and the appearance feature distances to obtain a similarity matrix; a criterion setting module 30334, configured to set a matching metric criterion; a matching module 30335, configured to limit an unsatisfactory value in the similarity matrix by using a gate control matrix, and perform cascade matching on the swimming pool human target detection frame and the trajectory of the swimming pool human target in a cyclic matching process of a default iteration number to obtain a swimming pool human target trajectory.

It should be noted that, as will be clear to those skilled in the art, the specific implementation process of the above-mentioned swimming pool drowning prevention multi-camera target tracking apparatus 300 and each unit can refer to the corresponding description in the foregoing method embodiments, and for the convenience and brevity of description, no further description is provided herein.

The above-described swimming pool drowning prevention multi-camera target tracking apparatus 300 can be implemented in the form of a computer program that can be run on a computer device as shown in fig. 13.

Referring to fig. 13, fig. 13 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.

Referring to fig. 13, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 comprises program instructions that, when executed, cause the processor 502 to perform a swimming pool drowning prevention multi-camera object tracking method.

The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.

The internal memory 504 provides an environment for the execution of the computer program 5032 in the non-volatile storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a pool drowning prevention multi-camera target tracking method.

The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied; a particular computer device 500 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:

acquiring images shot by a plurality of cameras above the water surface of the swimming pool to obtain an original image; inputting the original image into a target detection model to perform swimming pool human target detection so as to obtain a swimming pool human target detection frame; performing target tracking on the swimming pool human target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human target; outputting the swimming track of the swimming pool human target; wherein the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of human targets as a sample set.

In an embodiment, when implementing the step in which the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of human targets as a sample set, the processor 502 specifically implements the following steps:

constructing images with position labels of human targets as a sample set; dividing the sample set to obtain a training set, a verification set and a test set; performing enhancement processing on the training set, the verification set and the test set to obtain a processing result; constructing a Yolov5 network, and adding a DLA-34 network, a Semantic Self-Attention mechanism and an Anchor-free network to the Yolov5 network to obtain an initial network; training and verifying the initial network by using the training set and the verification set in the processing result, and calculating a loss value during training; and when the loss value remains unchanged, testing the initial network by using the test set in the processing result so as to take the trained initial network as the target detection model.

In an embodiment, when the processor 502 implements the above step of constructing the Yolov5 network and adding the DLA-34 network, the Semantic Self-Attention mechanism and the Anchor-free network to the Yolov5 network to obtain the initial network, the following steps are specifically implemented:

constructing a Yolov5 network; adding a DLA-34 network to the Yolov5 network as a backbone network for feature extraction, to obtain a first network; adding a Semantic Self-Attention mechanism to the first network to obtain a second network; and using an Anchor-free network in the target regression box network of the second network to obtain the initial network.

In an embodiment, when the processor 502 implements the step of performing target tracking on the swimming pool human target detection frame by using the DeepSORT algorithm to obtain the swimming pool human target trajectory, the following steps are specifically implemented:

establishing a tracker according to the swimming pool human target detection frame; constructing a motion estimation model, and using the motion estimation model to estimate the position of the swimming pool human target in the next frame; and performing data association between the swimming pool human target detection frame and the position of the swimming pool human target in the next frame to obtain the swimming track of the swimming pool human target.

Wherein the motion estimation model is a Kalman filtering model.
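As a non-limiting illustration, a Kalman filtering model that estimates the position in the next frame can be sketched in Python as follows. The sketch is deliberately reduced to one coordinate with a constant-velocity state (position, velocity); the noise parameters q and r, the time step dt and the function names are assumptions for illustration and are not values from the embodiment.

```python
def predict(state, cov, dt=1.0, q=0.01):
    """Propagate (position, velocity) and its 2x2 covariance by one frame."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    a00 = p00 + dt * p10  # first row of F @ P, with F = [[1, dt], [0, 1]]
    a01 = p01 + dt * p11
    new_cov = [
        [a00 + dt * a01 + q, a01],
        [p10 + dt * p11, p11 + q],
    ]
    return [p + dt * v, v], new_cov


def update(state, cov, z, r=1.0):
    """Correct the predicted state with a measured position z."""
    p, v = state
    (p00, p01), (p10, p11) = cov
    s = p00 + r                  # innovation variance
    k0, k1 = p00 / s, p10 / s    # Kalman gain for position and velocity
    y = z - p                    # innovation
    new_state = [p + k0 * y, v + k1 * y]
    new_cov = [
        [(1 - k0) * p00, (1 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return new_state, new_cov


# Hypothetical track starting at position 0 with velocity 1 per frame.
state, cov = predict([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], q=0.0)
state, cov = update(state, cov, z=2.0)
```

The predicted position is what the data association step compares with the detection frames of the next frame; after a match, the update step pulls the estimate toward the matched detection.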

In an embodiment, the processor 502 performs the step of performing data correlation between the position of the human target detection frame and the position of the human target of the next frame to obtain the swimming trajectory of the human target of the swimming pool, specifically including the following steps:

In an embodiment, the processor 502 specifically implements the following steps when implementing the step of fusing the motion information and the feature information of the positions of the swimming pool human target detection frame and the next frame of the swimming pool human target to obtain the swimming track of the swimming pool human target:

calculating similarity distance scores of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target; measuring the distance between the appearance characteristics of the positions of the swimming pool human body target detection frame and the next frame of swimming pool human body target through the cosine distance to obtain the appearance characteristic distance; weighting the similarity distance scores and the appearance characteristic distances to obtain a similarity matrix; setting a matched measurement criterion; and limiting the numerical values which do not meet the requirements in the similarity matrix by using a gate control matrix, and performing cascade matching on the swimming pool human body target detection frame and the track of the swimming pool human body target in a cycle matching process of default iteration times to obtain the swimming pool human body target swimming track.

It should be understood that in the embodiment of the present application, the processor 502 may be a central processing unit (CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.

Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:

In an embodiment, when the processor executes the computer program to implement the step of training the convolutional neural network by using a plurality of images with position labels of the human body targets as a sample set, the processor implements the following steps:

In an embodiment, when the processor executes the computer program to implement the step of constructing the Yolov5 network and adding the DLA-34 network, the Semantic Self-Attention mechanism and the Anchor-free network to the Yolov5 network to obtain the initial network, the following steps are specifically implemented:

In an embodiment, when the processor executes the computer program to implement the step of performing target tracking on the swimming pool human target detection frame by using a DeepSORT algorithm to obtain a swimming pool human target trajectory, the following steps are specifically implemented:

Wherein the motion estimation model is a Kalman filtering model.

In an embodiment, when the processor executes the computer program to perform the step of performing data correlation on the positions of the swimming pool human target detection frame and the next frame of swimming pool human target to obtain the swimming pool human target trajectory, the following steps are specifically performed:

In an embodiment, when the processor executes the computer program to perform the step of fusing the motion information and the feature information of the positions of the human target detection frame and the next frame of the human target to obtain the swimming trajectory of the human target, the following steps are specifically implemented:

The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other computer-readable storage medium capable of storing program code.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. For example, the division of each unit is only one logic function division, and there may be another division manner in actual implementation. For example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented.

The steps in the method of the embodiment of the invention can be sequentially adjusted, combined and deleted according to actual needs. The units in the device of the embodiment of the invention can be merged, divided and deleted according to actual needs. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part thereof that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A multi-camera target tracking method for preventing drowning of a swimming pool is characterized by comprising the following steps:

outputting the swimming track of the human body target of the swimming pool;

2. The swimming pool drowning prevention multi-camera target tracking method according to claim 1, wherein the target detection model is obtained by training a convolutional neural network by using a plurality of images with position labels of the human targets as a sample set, and comprises:

3. The swimming pool drowning prevention multi-camera target tracking method as claimed in claim 2, wherein the constructing of the Yolov5 network and the adding of the DLA-34 network, the Semantic Self-Attention mechanism and the Anchor-free network to the Yolov5 network to obtain the initial network comprises:

constructing a Yolov5 network;

4. The swimming pool drowning prevention multi-camera target tracking method as recited in claim 1, wherein the performing target tracking on the swimming pool human target detection frame by adopting a DeepSORT algorithm to obtain a swimming track of the swimming pool human target comprises:

5. The swimming pool drowning prevention multi-camera target tracking method according to claim 4, wherein the motion estimation model is a Kalman filtering model.

6. The swimming pool drowning prevention multi-camera target tracking method as claimed in claim 5, wherein the data correlating the positions of the swimming pool human target detection frame and the next frame of swimming pool human target to obtain the swimming track of the swimming pool human target comprises:

7. The swimming pool drowning prevention multi-camera target tracking method as claimed in claim 6, wherein the fusing the motion information and the feature information of the positions of the swimming pool human target detection frame and the next frame swimming pool human target to obtain the swimming track of the swimming pool human target comprises:

setting a matched measurement criterion;

8. A swimming pool drowning prevention multi-camera target tracking device, characterized by comprising:

9. A computer device, characterized in that the computer device comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program implements the method according to any of claims 1 to 7.

10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.