Disclosure of Invention
The invention provides a high-altitude parabolic detection method and device based on background modeling, YOLOv3 and self-optimization, and aims to solve the problem that the models used in existing high-altitude falling object detection methods have limited accuracy.
In a first aspect, the present invention provides a high altitude parabolic detection method based on background modeling, YOLOv3 and self-optimization, wherein the method comprises:
acquiring building video data shot by a camera;
intercepting pictures from the building video data frame by frame to obtain an initial data set;
preprocessing the initial data set to obtain a building picture data set;
dividing the building picture data set into a training set, a verification set and a test set according to a preset proportion;
marking the building picture data in the training set, the verification set and the test set according to object categories formulated in a preset high-altitude parabolic detection strategy to obtain marked data sets in one-to-one correspondence with the building picture data, wherein the marked data sets comprise object marking frame coordinates and object category information;
performing data enhancement processing on the training set and the corresponding labeled data set;
carrying out model training on the training set subjected to data enhancement processing and the corresponding labeled data set by using a YOLOv3 network to obtain a plurality of high-altitude parabolic detection models with different weights;
inputting the verification set and the marked data set of the verification set into the YOLOv3 network, and verifying the model while it is being trained to obtain its current accuracy, so that the model parameters can be adjusted in time and an optimal weight model is obtained;
after the model training is finished, testing the optimal weight model by using the test set and the test set label data set;
acquiring real-time building video data shot by a camera;
searching for moving objects in the real-time building video data by using a background modeling algorithm;
inputting the image of the moving object into an optimal weight model to perform high-altitude parabolic detection to obtain a high-altitude parabolic detection result;
returning the high-altitude parabolic detection result to related personnel at the service end of the social treatment platform in a screenshot form for manual examination;
judging, according to the manual review result, whether the moving object is a falling object;
if the moving object is a falling object, sending out falling object warning information to notify the related personnel to handle it;
if the moving object is not a high-altitude falling object, receiving from the related personnel a modification of the object category information of the moving object, and taking the modified picture sample as a difficult sample;
judging whether the number of the difficult samples exceeds a preset threshold value or not;
and if the number of the difficult samples exceeds a preset threshold value, automatically starting a self-optimization process, combining the difficult samples and a training set into a new training set, and performing model training according to the new training set by using a YOLOv3 network to obtain a high altitude parabolic detection model after training iteration.
With reference to the first aspect, in a first implementation manner of the first aspect, in the step of preprocessing the initial data set to obtain a building picture data set, distorted, deformed and blurred picture data are screened out and corrected to obtain the building picture data set.
With reference to the first aspect, in a second implementation manner of the first aspect, the marking building picture data in the training set, the verification set, and the test set according to an object class formulated in a preset high altitude parabolic detection strategy to obtain a marked data set corresponding to the building picture data one to one includes:
selecting, with a rectangular frame, each object belonging to the object categories in each picture of the training set, the verification set and the test set; storing the position of the rectangular frame in the picture, the position comprising the coordinates of the upper left corner and the lower right corner of the rectangular frame; marking the category of the object; and generating an XML file from the marked information to form a marked data set in one-to-one correspondence with the marked pictures.
With reference to the second implementable manner of the first aspect, in a third implementable manner of the first aspect, the object category includes moving objects that may often cause misjudgment, such as cars, birds, people, balloons, plastic bags, and the like.
With reference to the first aspect, in a fourth implementable manner of the first aspect, the performing data enhancement processing on the training set and the corresponding labeled data set includes:
performing data enhancement processing by adopting a flip transformation, random cropping, color jittering, a translation transformation, a scale transformation, a contrast transformation, noise disturbance, a rotation transformation or a reflection transformation, together with a mixup method, wherein the mixup method is as follows:
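$$\tilde{x} = \lambda x_i + (1 - \lambda)\,x_j$$
$$\tilde{y} = \lambda y_i + (1 - \lambda)\,y_j$$
wherein $(x_i, y_i)$ and $(x_j, y_j)$ are two samples randomly drawn from the training data, $x$ represents the picture matrix, $y$ represents the label information, $(\tilde{x}, \tilde{y})$ is the enhanced data that participates in model training, and $\lambda \in [0, 1]$.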
With reference to the first aspect, in a fifth implementation manner of the first aspect, in the step of performing model training on the training set after data enhancement processing and the corresponding labeled data set by using a YOLOv3 network to obtain high-altitude parabolic detection models with multiple weights, the labeled picture samples and the background picture samples are fed into the YOLOv3 network together: the labeled pictures are positive samples, while the unlabeled background pictures, which contain no object of the object categories, are negative samples. The positive and negative samples are trained together in the YOLOv3 network, and the high-altitude parabolic detection models with multiple weights are obtained through iterative training.
With reference to the first aspect, in a sixth implementation manner of the first aspect, in the step of acquiring real-time building video data shot by the camera, the surveillance video is transmitted to the server as a video stream, and the transmission protocol is the ONVIF protocol.
With reference to the first aspect, in a seventh implementable manner of the first aspect, the background modeling algorithm is an adaptive Gaussian mixture background modeling algorithm.
In a second aspect, the present invention provides a high altitude parabolic detection apparatus based on background modeling, YOLOv3 and self-optimization, the apparatus comprising:
the first acquisition unit is used for acquiring building video data shot by the camera;
the intercepting unit is used for intercepting pictures of the building video data according to frames to obtain an initial data set;
the preprocessing unit is used for preprocessing the initial data set to obtain a building picture data set;
the grouping unit is used for dividing the building picture data set into a training set, a verification set and a test set according to a preset proportion;
the marking unit is used for marking the building picture data in the training set, the verification set and the test set according to object types formulated in a preset high-altitude parabolic detection strategy to obtain marking data sets corresponding to the building picture data one by one, and each marking data set comprises object marking frame coordinates and object type information;
the data enhancement unit is used for carrying out data enhancement processing on the training set and the corresponding marked data set;
the model training unit is used for performing model training on the training set subjected to data enhancement processing and the corresponding marked data set by using a YOLOv3 network to obtain a plurality of high-altitude parabolic detection models with different weights;
the model verification unit is used for inputting the verification set and the mark data set of the verification set into a YOLOv3 network, verifying the model while training the model, and obtaining the current accuracy of the model so as to adjust the parameters of the model in time and obtain an optimal weight model;
the model testing unit is used for testing the optimal weight model by utilizing the test set and the test set marking data set after the model training is finished;
the second acquisition unit is used for acquiring real-time building video data shot by the camera;
the searching unit is used for searching a moving object in the real-time building video data by utilizing a background modeling algorithm;
the detection unit is used for inputting the image of the moving object into an optimal weight model to perform high-altitude parabolic detection to obtain a high-altitude parabolic detection result;
the return unit is used for returning the high-altitude parabolic detection result to related personnel at the service end of the social treatment platform in a screenshot form for manual examination;
the first judging unit is used for judging, according to the manual review result, whether the moving object is a high-altitude falling object;
the notification unit is used for sending out high-altitude falling object warning information to notify the related personnel to handle it when the moving object is a high-altitude falling object;
the modifying unit is used for receiving from the related personnel a modification of the object category information of the moving object when the moving object is not a high-altitude falling object, and taking the modified picture sample as a difficult sample;
the second judging unit is used for judging whether the number of the difficult samples exceeds a preset threshold value or not;
and the merging unit is used for automatically starting a self-optimization process under the condition that the number of the difficult samples exceeds a preset threshold value, merging the difficult samples and the training set into a new training set, and performing model training according to the new training set by using a YOLOv3 network to obtain a high-altitude parabolic detection model after training iteration.
The invention has the following beneficial effects. In the high-altitude parabolic detection method and device based on background modeling, YOLOv3 and self-optimization, video shot and transmitted in real time by building cameras serves as the data source, which reduces manual intervention, allows the scene in which a falling object occurs to be located quickly, and achieves early discovery and early handling of falling-object behavior. Applied to foreign-object detection in the social treatment platform, the invention allows falling-object behavior, once detected, to be handled manually on the platform. The method collects the pictures of detected objects together with the results of manual handling, converts those results into labels for model training, and automatically starts the model training process when the number of collected object pictures reaches a specified threshold. As time goes on, more manual feedback and more online training iterations accumulate, so the model is gradually optimized and its accuracy gradually improves. The invention is aimed mainly at detecting falling objects; being based on background modeling and real-time YOLOv3 detection, it offers high model stability and good detection and recognition performance, can exclude the influence of external factors, and achieves the goal of "precise control".
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, the present invention provides a high-altitude parabolic detection method based on background modeling, YOLOv3 and self-optimization, the method includes:
Step S101: acquiring building video data shot by the camera.
Data acquisition is an important foundation of the invention. The YOLOv3 algorithm requires data to be fed into the network to train the model, and videos or photos of various common objects in everyday scenes, including building video data shot by cameras, can be used as the data set.
Step S102: intercepting pictures from the building video data frame by frame to obtain an initial data set.
Step S103: preprocessing the initial data set to obtain a building picture data set.
Specifically, distorted, deformed and blurred picture data are screened and corrected to obtain the building picture data set.
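The text does not say how blurred pictures are screened. One common heuristic, assumed here rather than taken from the invention, is the variance of a Laplacian filter response: sharp frames have strong edges and a high variance, blurred frames a low one. The sketch works on a grayscale frame held in a NumPy array; the threshold of 100 is a hypothetical value that would need tuning on real building footage, and distortion correction is not covered.

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of a 4-neighbour Laplacian response; low values suggest blur."""
    g = gray.astype(np.float64)
    # Discrete Laplacian on interior pixels: up + down + left + right - 4 * centre.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def is_blurred(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Hypothetical screening rule: flag a frame whose Laplacian variance is below threshold."""
    return laplacian_variance(gray) < threshold
```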
Step S104: dividing the building picture data set into a training set, a verification set and a test set according to a preset proportion.
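The preset proportion is not given in the text. As a minimal sketch, assuming a 7:2:1 split and a fixed random seed for reproducibility, the division could look like this (`split_dataset` is a hypothetical helper):

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str],
                  ratios: Tuple[float, float, float] = (0.7, 0.2, 0.1),
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle, then split into training / verification / test sets by a preset proportion."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split reproducible
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test
```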
Step S105: marking the building picture data in the training set, the verification set and the test set according to object categories formulated in a preset high-altitude parabolic detection strategy to obtain marked data sets in one-to-one correspondence with the building picture data, wherein the marked data sets comprise object marking frame coordinates and object category information.
Specifically, each object belonging to the object categories in each picture of the training set, the verification set and the test set is selected with a rectangular frame, and the position of the rectangular frame in the picture is stored, the position comprising the coordinates of the upper left corner and the lower right corner of the rectangular frame. The category of the object is then marked, and an XML file is generated from the marked information, forming a marked data set in one-to-one correspondence with the marked pictures. The object categories may include moving objects that are often misjudged, such as cars, birds, people, balloons and plastic bags.
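The XML layout is not specified beyond containing the corner coordinates and the category. The sketch below assumes a Pascal VOC-style layout (`annotation`, `object`, `name`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`), a common convention for such files; `make_annotation` and the class names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (class name, xmin, ymin, xmax, ymax): upper-left and lower-right corners in pixels.
Box = Tuple[str, int, int, int, int]


def make_annotation(filename: str, width: int, height: int, boxes: List[Box]) -> str:
    """Build a Pascal VOC-style XML annotation string (the tag layout is an assumption)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name  # the marked object category
        bnd = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")
```

One XML string is produced per picture, which keeps the one-to-one correspondence between pictures and annotation files.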
Step S106: performing data enhancement processing on the training set and the corresponding labeled data set.
Because the amount of data available to the method is far smaller than what deep learning generally requires, the task can be classified as small-sample learning (few-shot learning), whose main remedies are data enhancement and meta-learning. The invention mainly uses data enhancement to address the small-sample problem. Since a large amount of data improves the generalization and precision of a model, expanding the data volume by technical means is an indispensable step when data are limited.
Specifically, the data enhancement processing may be performed using a flip transformation, random cropping, color jittering, a translation transformation, a scale transformation, a contrast transformation, noise disturbance, a rotation transformation or a reflection transformation, together with the following mixup method:
$$\tilde{x} = \lambda x_i + (1 - \lambda)\,x_j$$
$$\tilde{y} = \lambda y_i + (1 - \lambda)\,y_j$$
where $(x_i, y_i)$ and $(x_j, y_j)$ are two samples randomly drawn from the training data, $x$ represents the picture matrix, $y$ represents the label information, $(\tilde{x}, \tilde{y})$ is the enhanced data that participates in model training, and $\lambda \in [0, 1]$. The invention uses the mixup method to add the resulting pictures to the original data set. Expanding the training samples in this way avoids the overfitting caused by too small a data volume. Mixup is a simple, data-agnostic data enhancement method that constructs virtual training samples. It extends the training distribution by incorporating the prior knowledge that linear interpolations of feature vectors should lead to linear interpolations of the associated labels, and it introduces minimal computational overhead.
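A minimal NumPy sketch of two of the enhancements above, under stated assumptions: `mixup` implements the formula with λ drawn from a Beta(α, α) distribution (the sampling scheme of the original mixup work; the text only requires λ ∈ [0, 1]) and treats y as a one-hot class vector; how box labels are combined when mixing detection samples is not specified here. `hflip_with_boxes` shows why geometric transforms must also update the marking frames: mirroring the image reflects and swaps the x-coordinates.

```python
import numpy as np


def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=None):
    """x~ = lam * x_i + (1 - lam) * x_j and y~ = lam * y_i + (1 - lam) * y_j."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))  # a Beta(alpha, alpha) draw always lies in [0, 1]
    x_mix = lam * x_i.astype(np.float32) + (1.0 - lam) * x_j.astype(np.float32)
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix, lam


def hflip_with_boxes(image, boxes):
    """Horizontally flip an H x W x C image; boxes are (xmin, ymin, xmax, ymax) rows in pixels."""
    w = image.shape[1]
    flipped = image[:, ::-1]
    b = np.asarray(boxes, dtype=np.float32).copy()
    # Mirror the x-coordinates: the old xmax becomes the new xmin and vice versa.
    new_xmin = w - b[:, 2]
    new_xmax = w - b[:, 0]
    b[:, 0], b[:, 2] = new_xmin, new_xmax
    return flipped, b
```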
Step S107: performing model training on the training set subjected to data enhancement processing and the corresponding marked data set by using a YOLOv3 network to obtain a plurality of high-altitude parabolic detection models with different weights.
Specifically, the marked picture samples and the background picture samples are fed into the YOLOv3 network together. The marked pictures are positive samples; the background pictures contain no object of the object categories and, being unmarked, are negative samples. The positive and negative samples are trained together in the YOLOv3 network, and high-altitude parabolic detection models with multiple weights are obtained through iterative training.
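One common way to feed such positive and negative samples to a YOLOv3 trainer, assumed here rather than stated in the text, is the Darknet-style text label format: one line `class cx cy w h` per object, normalized to the image size, with a background picture given an empty label file. The sketch converts the corner coordinates stored at annotation time into that format; `to_yolo_labels` and the class-id mapping are hypothetical.

```python
from typing import Dict, List, Tuple


def to_yolo_labels(boxes: List[Tuple[str, float, float, float, float]],
                   img_w: int, img_h: int, class_ids: Dict[str, int]) -> List[str]:
    """Convert (name, xmin, ymin, xmax, ymax) pixel boxes into 'class cx cy w h' lines in [0, 1].

    A background (negative) picture has no boxes, so it yields an empty list, i.e. an
    empty label file: every location in that picture is then trained as background.
    """
    lines = []
    for name, xmin, ymin, xmax, ymax in boxes:
        cx = (xmin + xmax) / 2.0 / img_w  # box centre, normalized by image width
        cy = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w         # box size, normalized by image size
        h = (ymax - ymin) / img_h
        lines.append(f"{class_ids[name]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```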
Step S108: inputting the verification set and the marked data set of the verification set into the YOLOv3 network, and verifying the model while it is being trained to obtain its current accuracy, so that the model parameters can be adjusted in time and an optimal weight model is obtained.
Step S109: after the model training is finished, testing the optimal weight model by using the test set and the marked data set of the test set.
Step S110: acquiring real-time building video data shot by the camera.
Specifically, the surveillance video is transmitted to the server as a video stream, and the transmission protocol is the ONVIF protocol.
Step S111: searching for a moving object in the real-time building video data by using a background modeling algorithm.
Specifically, the background modeling algorithm is an adaptive Gaussian mixture background modeling algorithm.
The working principle of adaptive Gaussian mixture background modeling is as follows. In the detection and extraction of moving targets, the background is important for identifying and tracking the target, and modeling is a key link in background extraction. Under the assumption that the background is stationary, any meaningful moving object is regarded as foreground. Moving object detection problems fall mainly into two categories: fixed cameras and moving cameras. For a moving camera, a well-known solution is the optical flow method, which computes the optical flow field of the image sequence by solving partial differential equations and thereby estimates the motion state of the camera. The optical flow method can also be used with a fixed camera, but its complexity often makes real-time computation difficult. Gaussian mixture background modeling, by contrast, is suited to separating background from foreground in image sequences captured by a fixed camera. With a fixed camera, the background changes slowly, mostly under the influence of illumination, wind and the like; by modeling the background, the foreground can be separated from a given image, and since the foreground is generally a moving object, moving object detection is achieved.
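The method feeds "the image of the moving object" to the detector but does not say how that region is cut out of the foreground mask. A simple option, swapped in here as an assumption, is to group foreground pixels into 4-connected blobs and take each blob's bounding box; the crop `frame[ymin:ymax + 1, xmin:xmax + 1]` would then be passed to the model. `foreground_boxes` and its `min_area` noise filter are hypothetical.

```python
from collections import deque
from typing import List, Tuple

import numpy as np


def foreground_boxes(mask: np.ndarray, min_area: int = 1) -> List[Tuple[int, int, int, int]]:
    """Return inclusive (xmin, ymin, xmax, ymax) boxes of 4-connected blobs in a binary mask.

    Blobs with fewer than min_area pixels (e.g. isolated noise) are discarded.
    """
    fg = mask.astype(bool)
    h, w = fg.shape
    seen = np.zeros_like(fg)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not fg[sy, sx] or seen[sy, sx]:
                continue
            # Breadth-first flood fill of one connected blob.
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```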
The Gaussian mixture model has been widely applied to robust complex scene background modeling, especially in situations with small repetitive motions, such as swaying leaves, bushes, rotating fans, sea surges, rainy and snowy weather, light reflections, and the like. The pixel-based Gaussian mixture model is effective in modeling the multimodal distribution background, can adapt to the change of the background such as light gradual change, and can basically meet the real-time requirement of the algorithm in practical application.
Gaussian mixture background modeling is a background representation method based on the statistics of pixel samples: it represents the background by statistics of a large number of sample values of each pixel over a long period, such as their probability density (for example, the number of modes and the mean and standard deviation of each mode), and then classifies target pixels by their statistical difference from that background. This allows complex dynamic backgrounds to be modeled, at the cost of a large amount of computation.
In the Gaussian mixture background model, the color information of different pixels is assumed to be uncorrelated, and each pixel is processed independently. For each pixel in the video image, the change of its value over the image sequence can be regarded as a random process that continuously generates pixel values; that is, the color of each pixel is described by Gaussian distributions.
In the multimodal Gaussian distribution model, each pixel of the image is modeled as a superposition of several Gaussian distributions with different weights; each Gaussian distribution corresponds to one state that may produce the color presented by the pixel, and the weight and distribution parameters of each Gaussian distribution are updated over time. When processing color images, the R, G and B channels of the image pixels are assumed to be independent and to have the same variance. For the observation data set $\{x_1, x_2, \ldots, x_N\}$ of the random variable $X$, where $x_t = (r_t, g_t, b_t)$ is the sample of the pixel at time $t$, a single sample point $x_t$ obeys the Gaussian mixture probability density function
$$P(x_t) = \sum_{i=1}^{k} w_{i,t}\,\eta(x_t, \mu_{i,t}, \tau_{i,t})$$
$$\eta(x_t, \mu_{i,t}, \tau_{i,t}) = \frac{1}{(2\pi)^{3/2}\,\lvert\tau_{i,t}\rvert^{1/2}} \exp\!\left(-\frac{1}{2}(x_t - \mu_{i,t})^{T}\,\tau_{i,t}^{-1}\,(x_t - \mu_{i,t})\right)$$
$$\tau_{i,t} = \sigma_{i,t}^{2}\, I$$
where $k$ is the total number of distribution modes, $\eta(x_t, \mu_{i,t}, \tau_{i,t})$ is the $i$-th Gaussian distribution at time $t$, $\mu_{i,t}$ is its mean, $\tau_{i,t}$ is its covariance matrix, $\sigma_{i,t}^{2}$ is the variance, $I$ is the three-dimensional identity matrix, and $w_{i,t}$ is the weight of the $i$-th Gaussian distribution at time $t$.
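As a worked check of the mixture density $P(x_t) = \sum_i w_{i,t}\,\eta(x_t, \mu_{i,t}, \tau_{i,t})$, the sketch evaluates it for an RGB pixel with isotropic covariance $\tau_{i,t} = \sigma_{i,t}^2 I$, in which case $(2\pi)^{3/2}\lvert\tau\rvert^{1/2} = (2\pi\sigma^2)^{3/2}$. The function names are illustrative.

```python
import numpy as np


def gaussian_rgb(x, mu, sigma):
    """eta(x_t, mu, tau) for a 3-D RGB sample with isotropic covariance tau = sigma^2 * I."""
    d = np.asarray(x, dtype=np.float64) - np.asarray(mu, dtype=np.float64)
    var = float(sigma) ** 2
    # (2*pi)^(3/2) * |tau|^(1/2) simplifies to (2*pi*sigma^2)^(3/2) when tau = sigma^2 * I.
    norm = (2.0 * np.pi * var) ** 1.5
    return float(np.exp(-0.5 * d.dot(d) / var) / norm)


def mixture_density(x, weights, means, sigmas):
    """P(x_t) = sum over i of w_{i,t} * eta(x_t, mu_{i,t}, tau_{i,t})."""
    return sum(w * gaussian_rgb(x, m, s) for w, m, s in zip(weights, means, sigmas))
```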
The adaptive Gaussian mixture model sets the maximum number of Gaussian distributions per pixel to $k_{max} = 4$. Initially, each pixel has $k = 1$ Gaussian distribution, whose initial mean $\mu_0$ is the value of that pixel in the first frame, with standard deviation $\sigma_0 = 30$ and weight $\omega_0 = 0.2$.
If none of the current $k$ Gaussian distributions matches the target pixel and $k < k_{max}$, then $k = k + 1$: a new Gaussian distribution is added to the background model, with the current pixel value as its mean and a standard deviation and weight of 30 and 0.2 respectively. If instead $k = k_{max}$, a new Gaussian distribution is generated with its mean initialized to the current pixel value and a standard deviation and weight of 30 and 0.01 respectively; it immediately replaces the distribution with the smallest weight among the existing $k_{max}$ distributions.
In the traditional Gaussian mixture background modeling algorithm, the first successful match is taken as the matching result. In fact, a new pixel may successfully match several Gaussian distributions, and the first match is not necessarily the best one. In the adaptive Gaussian mixture model, every Gaussian distribution is tested against the new pixel and the best-matching distribution is selected according to the following formula:
If the best match is the same distribution for 10 consecutive frames and $k > 1$, then $k = k - 1$, and the distribution with the smallest weight is removed directly.
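The following single-pixel sketch puts the adaptive rules above together: at most four Gaussians, initial standard deviation 30 and weight 0.2, a new component (weight 0.2) while $k < k_{max}$, replacement of the lowest-weight component (weight 0.01) when $k = k_{max}$, best-match selection over all components, and pruning after 10 consecutive identical best matches. Several pieces are assumptions rather than the patent's own: the match test $|x - \mu| < 2.5\sigma$, Stauffer-Grimson style running updates with a learning rate of 0.01, the best-match criterion (smallest normalized distance, used here in place of the patent's own formula), grayscale input, resetting the streak after a prune, and treating an unmatched pixel as foreground.

```python
from dataclasses import dataclass, field
from typing import List, Optional

K_MAX = 4            # maximum Gaussians per pixel (from the text)
SIGMA_INIT = 30.0    # initial standard deviation (from the text)
ALPHA = 0.01         # learning rate: an assumption, not given in the text
MATCH_SIGMAS = 2.5   # match test |x - mu| < 2.5 * sigma: an assumption
STABLE_FRAMES = 10   # identical best matches before pruning (from the text)


@dataclass
class Gaussian:
    mean: float
    sigma: float
    weight: float


@dataclass
class AdaptivePixelModel:
    comps: List[Gaussian] = field(default_factory=list)
    last_best: Optional[Gaussian] = None
    streak: int = 0

    @classmethod
    def from_first_frame(cls, value: float) -> "AdaptivePixelModel":
        # k = 1: mean = first-frame pixel value, sigma_0 = 30, omega_0 = 0.2.
        return cls(comps=[Gaussian(value, SIGMA_INIT, 0.2)])

    def _weakest(self) -> int:
        return min(range(len(self.comps)), key=lambda j: self.comps[j].weight)

    def update(self, x: float) -> bool:
        """Feed one grayscale value; return True when no Gaussian matched (foreground)."""
        # Test every Gaussian, not just the first match, and keep the closest one.
        matches = [g for g in self.comps if abs(x - g.mean) < MATCH_SIGMAS * g.sigma]
        best = min(matches, key=lambda g: abs(x - g.mean) / g.sigma) if matches else None
        for g in self.comps:
            g.weight = (1 - ALPHA) * g.weight + (ALPHA if g is best else 0.0)
        if best is not None:
            best.mean = (1 - ALPHA) * best.mean + ALPHA * x
            var = (1 - ALPHA) * best.sigma ** 2 + ALPHA * (x - best.mean) ** 2
            best.sigma = max(var ** 0.5, 1.0)  # floor against a collapsing sigma (assumption)
        elif len(self.comps) < K_MAX:
            self.comps.append(Gaussian(x, SIGMA_INIT, 0.2))  # k = k + 1
        else:
            self.comps[self._weakest()] = Gaussian(x, SIGMA_INIT, 0.01)  # replace weakest
        # Count consecutive frames in which the same Gaussian was the best match.
        if best is not None and best is self.last_best:
            self.streak += 1
        else:
            self.streak = 1 if best is not None else 0
        self.last_best = best
        if self.streak >= STABLE_FRAMES and len(self.comps) > 1:
            del self.comps[self._weakest()]  # k = k - 1: drop the lowest-weight Gaussian
            self.streak, self.last_best = 0, None
        total = sum(g.weight for g in self.comps)
        for g in self.comps:
            g.weight /= total  # keep the weights summing to 1
        return best is None
```

A full background model would keep one such state per pixel; a vectorized NumPy version would be needed for real-time video.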
The working principle of the YOLOv3 network is explained next:
YOLOv3 adjusts the network structure of YOLOv1 and YOLOv2, performs object detection with multi-scale features, and replaces softmax with logistic classifiers for object classification. YOLOv3 has no fully connected layers and no pooling layers, so it can accept input images of different sizes (in practice, multiples of 32). It consists mainly of 75 convolutional layers and adds ResNet-style residual modules to the network, which alleviates the gradient problems of deep networks.
A ResNet residual network is equivalent to adding a shortcut path to the original CNN structure, so that instead of learning features directly, each block learns something to add to the features already learned, yielding better features. A complex feature $H(x)$ that was previously learned layer by layer independently now becomes $H(x) = F(x) + x$, where $x$ is the feature at the start of the shortcut and $F(x)$, the residual, is what is added to $x$. The learning target therefore changes from the complete information to the residual, which greatly reduces the difficulty of learning high-quality features.
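A minimal NumPy sketch of $H(x) = F(x) + x$. In YOLOv3's Darknet-53 backbone a residual unit is a 1x1 convolution followed by a 3x3 convolution; to stay short, both convolutions here are 1x1 (a per-pixel matrix multiply) with LeakyReLU and no batch normalization, so this illustrates the shortcut rather than the exact layer. With all-zero weights, $F(x) = 0$ and the block reduces to the identity, which is part of the intuition for why learning a residual is easier than learning $H(x)$ from scratch.

```python
import numpy as np


def leaky_relu(z: np.ndarray, slope: float = 0.1) -> np.ndarray:
    """LeakyReLU with slope 0.1 for negative inputs."""
    return np.where(z > 0, z, slope * z)


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution on an H x W x C_in map: a per-pixel matrix multiply by w (C_in x C_out)."""
    return x @ w


def residual_block(x: np.ndarray, w_reduce: np.ndarray, w_expand: np.ndarray) -> np.ndarray:
    """H(x) = F(x) + x, with F = two (1x1 conv + LeakyReLU) layers in this simplified sketch."""
    f = leaky_relu(conv1x1(x, w_reduce))  # squeeze channels (e.g. C -> C/2)
    f = leaky_relu(conv1x1(f, w_expand))  # restore channels (C/2 -> C)
    return f + x                          # shortcut: the block only learns the residual F(x)
```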
An image typically contains objects of various sizes, and it is desirable to detect objects of all sizes at once, so the network must be able to "see" objects of different sizes. However, the deeper the network, the smaller its feature maps, which makes small objects harder to detect in the deeper layers. To address this, YOLOv3 detects objects on feature maps of 3 different scales, which captures finer-grained features. Using an FPN (Feature Pyramid Network)-like structure to serve the different scales, it performs target detection on feature maps of different depths: the deeper feature map is upsampled and fused with the feature map of the current layer, combining low-level and high-level features and thereby improving detection accuracy.
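A sketch of the multi-scale bookkeeping: YOLOv3 predicts on feature maps with strides 32, 16 and 8 (13x13, 26x26 and 52x52 grids for a 416x416 input), and it fuses scales by upsampling the deeper map by a factor of 2 and concatenating it with the shallower one along the channel axis. The channel counts in the test are illustrative, and the learned convolutions around the fusion are omitted.

```python
import numpy as np


def upsample2x(fmap: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of an H x W x C feature map."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)


def fuse(deep: np.ndarray, shallow: np.ndarray) -> np.ndarray:
    """Upsample the deeper, coarser map and concatenate it with the shallower map along channels."""
    up = upsample2x(deep)
    if up.shape[:2] != shallow.shape[:2]:
        raise ValueError("spatial sizes must agree after upsampling")
    return np.concatenate([up, shallow], axis=-1)


def grid_sizes(input_size: int = 416, strides=(32, 16, 8)) -> list:
    """Grid size of each detection scale; YOLOv3 predicts at strides 32, 16 and 8."""
    return [input_size // s for s in strides]
```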
The softmax layer is replaced with a structure of 1x1 convolutional layers plus a logistic activation function, which allows multi-label objects to be handled. When predicting the preselected boxes (bbox), YOLOv3 uses logistic regression; each preselected box comprises five elements, bbox(x, y, w, h, c), where the first four give the coordinate position and size of the box and the last is a confidence score.
The confidence $C$ is defined as
$$C = \Pr(\text{object}) \times \mathrm{IOU}(\text{bbox}, \text{object})$$
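Since the confidence depends on IOU, a minimal IoU sketch for boxes in the same corner format as the annotations, (xmin, ymin, xmax, ymax), may help; `iou` and `confidence` are illustrative helpers:

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def confidence(p_object, pred_box, true_box):
    """C = Pr(object) * IOU(bbox, object)."""
    return p_object * iou(pred_box, true_box)
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7.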
Logistic regression scores the region enclosed by each bbox for the presence of an object, and the box with the highest objectness score is selected.
Step S112: inputting the image of the moving object into the optimal weight model to perform high-altitude parabolic detection, so as to obtain a high-altitude parabolic detection result.
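A sketch of how one raw YOLOv3 output is decoded, following the formulation in the YOLOv3 paper: the centre offsets and the objectness pass through the logistic (sigmoid) function, the width and height scale an anchor (prior) box exponentially, and each class gets its own sigmoid instead of a shared softmax, so a box can carry several labels. The function name, the tuple layout of `t` and the anchor used in the test are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def decode(t, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode one raw prediction t = (tx, ty, tw, th, to, class logits...).

    bx = sigmoid(tx) + cx and by = sigmoid(ty) + cy in grid cells (scaled to pixels here),
    bw = pw * exp(tw), bh = ph * exp(th); objectness and class scores are independent sigmoids.
    """
    tx, ty, tw, th, to, *cls = t
    bx = (sigmoid(tx) + cell_x) * stride  # centre x in input-image pixels
    by = (sigmoid(ty) + cell_y) * stride  # centre y in input-image pixels
    bw = anchor_w * math.exp(tw)          # anchor (prior) sizes given in pixels
    bh = anchor_h * math.exp(th)
    objectness = sigmoid(to)
    class_probs = [sigmoid(c) for c in cls]  # multi-label: scores need not sum to 1
    return (bx, by, bw, bh), objectness, class_probs
```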
Specifically, the image of the moving object is checked against the objects recognized by the YOLOv3 object detection model: if the recognition result is a bird, a balloon, a plastic bag or a similar object, the moving object is not judged to be a falling object. The strategy can be configured flexibly for the actual scene, which effectively reduces interference from objects flying in the air.
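The filtering policy can be sketched as a small per-scene configuration. The class names, the score threshold of 0.5 and the `FallPolicy` type below are hypothetical; the point is that the set of classes that explain a moving object away is data rather than code, so it can be changed for each scene.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical default: classes whose detection means "not a falling object".
DEFAULT_IGNORED: FrozenSet[str] = frozenset({"bird", "balloon", "plastic_bag"})


@dataclass
class FallPolicy:
    ignored_classes: FrozenSet[str] = DEFAULT_IGNORED
    min_score: float = 0.5  # assumed confidence threshold

    def is_candidate(self, detections: List[Tuple[str, float]]) -> bool:
        """detections: (class name, score) pairs for one moving-object crop.

        A confident detection of an ignored class explains the moving object away;
        otherwise it stays a falling-object candidate and is sent for manual review.
        """
        for name, score in detections:
            if score >= self.min_score and name in self.ignored_classes:
                return False
        return True
```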
Step S113: returning the high-altitude parabolic detection result in screenshot form to the related personnel at the service end of the social treatment platform for manual review.
Step S114: judging, according to the manual review result, whether the moving object is a falling object.
Step S115: if the moving object is a falling object, sending out falling object warning information to notify the related personnel to handle it.
Step S116: if the moving object is not a high-altitude falling object, receiving from the related personnel a modification of the object category information of the moving object, and taking the modified picture sample as a difficult sample.
Step S117: judging whether the number of difficult samples exceeds a preset threshold.
Step S118: if the number of difficult samples exceeds the preset threshold, automatically starting a self-optimization process: combining the difficult samples and the training set into a new training set, and performing model training on the new training set by using the YOLOv3 network to obtain a high-altitude parabolic detection model after training iteration.
The data of the invention come mainly from three sources: marked picture samples, picture samples that were detected wrongly, and background picture samples that contain no object of the target object categories. The marked picture samples are positive samples and the background pictures are negative samples; together they form a data set X, which is fed into the YOLOv3 network for training. The wrongly detected samples are called difficult samples; their labels are corrected manually, and they are then trained iteratively together with data set X. By adding detection results to the data set, the method enters a self-optimization process: it collects the pictures of detected objects and the results of manual handling, and converts those results into labels for model training. When the number of collected object pictures reaches the specified threshold, the method automatically starts model training. As time goes on, the manual feedback received grows, the number of online training iterations increases, and the model is gradually optimized, with its accuracy gradually improving.
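The self-optimization loop can be sketched as follows. `SelfOptimizer`, the sample record layout and the `retrain` hook (which would launch YOLOv3 training on the new training set) are hypothetical; the trigger follows the text in firing when the number of difficult samples exceeds the preset threshold, and clearing the collected samples after each round is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[str, list]  # (picture path, corrected labels): a hypothetical record layout


@dataclass
class SelfOptimizer:
    training_set: List[Sample]
    threshold: int                              # preset threshold on difficult samples
    retrain: Callable[[List[Sample]], object]   # hypothetical hook that runs YOLOv3 training
    difficult: List[Sample] = field(default_factory=list)

    def add_review(self, picture: str, is_falling_object: bool, corrected_labels: list) -> bool:
        """Record one manual review result; return True if a training round was started."""
        if is_falling_object:
            return False  # confirmed falling object: warning path, no new difficult sample
        self.difficult.append((picture, corrected_labels))
        if len(self.difficult) > self.threshold:  # "exceeds" the preset threshold
            self.training_set = self.training_set + self.difficult  # the new training set
            self.difficult = []
            self.retrain(self.training_set)
            return True
        return False
```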
As shown in fig. 2, the present invention provides a high altitude parabolic detection apparatus based on background modeling, YOLOv3 and self-optimization, the apparatus includes:
A first obtaining unit 201, configured to obtain building video data captured by a camera.
An intercepting unit 202, configured to intercept pictures from the building video data frame by frame to obtain an initial data set.
A preprocessing unit 203, configured to preprocess the initial data set to obtain a building picture data set.
A grouping unit 204, configured to divide the building picture data set into a training set, a verification set and a test set according to a preset proportion.
A marking unit 205, configured to mark the building picture data in the training set, the verification set and the test set according to object categories formulated in a preset high-altitude parabolic detection strategy, so as to obtain a marked data set in one-to-one correspondence with the building picture data, where the marked data set includes object marking frame coordinates and object category information.
A data enhancement unit 206, configured to perform data enhancement processing on the training set and the corresponding labeled data set.
A model training unit 207, configured to perform model training on the training set subjected to the data enhancement processing and the corresponding labeled data set by using the YOLOv3 network, so as to obtain a plurality of high-altitude parabolic detection models with different weights.
A model verification unit 208, configured to input the verification set and the labeled data set of the verification set into the YOLOv3 network and to verify the model while it is being trained, obtaining its current accuracy so that model parameters can be adjusted in time and an optimal weight model obtained.
A model testing unit 209, configured to test the optimal weight model by using the test set and the labeled data set of the test set after the model training is completed.
A second obtaining unit 210, configured to obtain real-time building video data captured by the camera.
A finding unit 211, configured to find a moving object in the real-time building video data by using a background modeling algorithm.
A detection unit 212, configured to input the image of the moving object into the optimal weight model to perform high-altitude parabolic detection, so as to obtain a high-altitude parabolic detection result.
A returning unit 213, configured to return the high-altitude parabolic detection result in screenshot form to the related personnel at the service end of the social treatment platform for manual review.
A first judging unit 214, configured to judge, according to the manual review result, whether the moving object is a high-altitude falling object.
A notification unit 215, configured to send out falling object warning information to notify the related personnel to handle it when the moving object is a falling object.
A modifying unit 216, configured to receive from the related personnel a modification of the object category information of the moving object when the moving object is not a high-altitude falling object, and to take the modified picture sample as a difficult sample.
A second judging unit 217, configured to judge whether the number of difficult samples exceeds a preset threshold.
A merging unit 218, configured to automatically start a self-optimization process when the number of difficult samples exceeds the preset threshold, merge the difficult samples and the training set into a new training set, and perform model training on the new training set by using the YOLOv3 network to obtain a high-altitude parabolic detection model after training iteration.
An embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements some or all of the steps in each embodiment of the high-altitude parabolic detection method based on background modeling, YOLOv3 and self-optimization provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented using software plus any required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts of the embodiments in this specification may be referred to each other. In particular, the embodiments of the high-altitude parabolic detection apparatus based on background modeling, YOLOv3 and self-optimization are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
The above-described embodiments of the present invention do not limit the scope of the present invention.