CN113689459A - Real-time tracking and mapping method based on GMM (Gaussian mixture model) combined with YOLO in a dynamic environment - Google Patents

Real-time tracking and mapping method based on GMM (Gaussian mixture model) combined with YOLO in a dynamic environment Download PDF

Info

Publication number
CN113689459A
CN113689459A
Authority
CN
China
Prior art keywords
dynamic
key frame
yolo
image
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110869065.XA
Other languages
Chinese (zh)
Other versions
CN113689459B (en)
Inventor
刘佳
顾淇尧
闫冬
钱昌宇
卞方舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202110869065.XA priority Critical patent/CN113689459B/en
Publication of CN113689459A publication Critical patent/CN113689459A/en
Application granted granted Critical
Publication of CN113689459B publication Critical patent/CN113689459B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/215 Motion-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time tracking and mapping method based on GMM combined with YOLO in a dynamic environment, which comprises the following steps: (1) extracting feature points from each frame of the dynamic image sequence, and correcting the non-key frames with an affine transformation matrix; (2) training the images of the non-key-frame stage with a Gaussian mixture model (GMM), modeling the background image with the GMM and segmenting the foreground dynamic regions; (3) inputting the non-key-frame images trained in step (2) and the key-frame images from step (1) into a YOLO detector, tracking and predicting the detector's output with a particle filter algorithm, removing the dynamic feature points detected in the current frame, and inserting the key frame for map construction. The invention exploits the globally discontinuous character of key frames: the background image is trained through the GMM, the foreground dynamic region is segmented and supplies a prior to YOLO; the speed and robustness of YOLOv3 are exploited to detect dynamic targets between consecutive frames and to improve the detection accuracy of dynamic regions.

Description

Real-time tracking and mapping method based on GMM (Gaussian mixture model) combined with YOLO in a dynamic environment
Technical Field
The invention relates to the field of image processing, and in particular to a real-time tracking and mapping method based on GMM combined with YOLO in a dynamic environment.
Background
SLAM (simultaneous localization and mapping) addresses the problem of a robot moving in an unknown environment: after the robot observes the environment, its pose and motion trajectory are fed back in time and an environment map is constructed. Early SLAM systems mainly relied on sensors such as single-line lidar and sonar for self-localization; with the rapid development of computer vision, visual SLAM systems based on a camera and an IMU, thanks to their convenience and low cost, have been widely applied in fields such as robotics, AR map construction and autonomous driving.
Traditional dynamic target detection methods include the optical flow method, the inter-frame difference method and background subtraction, and they have the following problems: the optical flow method is strongly affected by changes in scene brightness, and the frame-difference method is strongly affected by noise, so false detections and missed detections occur during target detection, the target drifts during tracking, and the tracking accuracy is degraded.
Most research establishes the VSLAM system in a static environment, yet the real environment is more complex: many scenes such as classrooms, hospitals and shopping areas often contain dynamic targets such as people and cars, and many VSLAM systems lack adaptability to such complex scenes, so the computed map points and pose matrices contain errors. As the tracking-accuracy requirement rises and the number of tracked targets increases, traditional filtering algorithms can no longer provide a good tracking effect.
Disclosure of Invention
The purpose of the invention is as follows: in view of the above problems, the invention aims to provide a real-time tracking and mapping method based on GMM combined with YOLO in a dynamic environment, so as to accurately remove dynamic regions in the dynamic environment and thereby achieve stable tracking and mapping.
The technical scheme is as follows: the real-time tracking and mapping method based on the combination of GMM and YOLO in a dynamic environment of the invention comprises the following steps:
(1) extracting feature points from each frame of the dynamic image sequence, dividing the frames into key frames and non-key frames, computing the affine transformation matrix between two adjacent non-key frames, and correcting the non-key frames with the affine transformation matrix;
(2) training the images of the non-key-frame stage with a Gaussian mixture model (GMM), modeling the background image with the GMM and segmenting the foreground dynamic regions;
(3) inputting the non-key-frame images trained in step (2) and the key-frame images from step (1) into a YOLO detector, tracking and predicting the detector's output with a particle filter algorithm, removing the dynamic feature points detected in the current frame, and inserting the key frame for map construction.
Further, the affine transformation relation in step (1) is:

[x_t, y_t, 1]^T = A · [x_{t-1}, y_{t-1}, 1]^T,  A = [[a11, a12, a13], [a21, a22, a23], [0, 0, 1]]

where the vector on the left of the equation, (x_t, y_t, 1)^T, represents the current frame coordinates, the matrix A on the right represents the affine transformation matrix, and (x_{t-1}, y_{t-1}, 1)^T represents the coordinates of the frame preceding the current frame.
Further, the modeling process of step (2) includes:
(21) Each pixel of the non-key-frame-stage image is matched with the GMM: the corresponding model among the K normal distribution models is found for each pixel, and a pixel belongs to the current normal distribution model if it satisfies the following formula:

|X_t - μ_{i,t-1}| ≤ 2.5 · σ_{i,t-1}

where X_t is the pixel to be matched, i is the index of the corresponding normal distribution model, μ_{i,t-1} is the pixel mean of the i-th normal distribution model at time t-1, and σ_{i,t-1} is the standard deviation of all pixels of the i-th normal distribution model at time t-1;
(22) The weights w_{k,t} of the K normal distribution models are updated as:

w_{k,t} = (1 - α) · w_{k,t-1} + α · M_{k,t}

where α is the learning rate and w_{k,t-1} is the weight of the k-th normal distribution model at time t-1; M_{k,t} is the matching decision of the k-th model at time t: if the formula in step (21) holds, the pixel matches a Gaussian model and M_{k,t} = 1, otherwise M_{k,t} = 0;
(23) If the current pixel does not satisfy the formula in step (21), it does not belong to the background image, and the pixel mean μ_{i,t-1} and standard deviation σ_{i,t-1} are kept unchanged;
if the current pixel satisfies the formula in step (21), indicating that it belongs to the background image, the parameters of the current distribution model are updated as:

ρ = α · η(X_t | μ_k, σ_k)
μ_t = (1 - ρ) · μ_{t-1} + ρ · X_t
σ_t² = (1 - ρ) · σ_{t-1}² + ρ · (X_t - μ_t)²

where ρ is an intermediate parameter and η(X_t | μ_k, σ_k) is the learning-rate change function of the k-th model at time t;
(24) if the current pixel does not match any of the distributions in step (21), the parameters of the distribution with the smallest weight in the Gaussian mixture model are modified, and its mean is set to the current pixel value;
(25) The K distribution models are sorted in descending order of weight, the first B distributions are selected as background pixels and the remaining models as foreground pixels, where B is given by:

B = argmin_b ( Σ_{k=1}^{b} w_k > T )

where T is the proportion of the background and b is the number of selected models.
Further, after the key-frame image is sent to the YOLO detector in step (3), the key frame establishes dynamic candidate regions, each candidate region is accepted, and candidate regions that cannot be identified are discarded.
Further, after the non-key-frame images trained in step (2) are input to the YOLO detector, the foreground dynamic regions provide a prior for the detector, and the YOLO detector estimates the foreground dynamic region of the key frame from the foreground dynamic regions of the non-key frames: when a foreground dynamic region provided by the non-key frames overlaps a dynamic target detected by the YOLO detector, the current dynamic-target candidate region is accepted; if a foreground dynamic region provided by the non-key frames does not overlap any dynamic target detected by the YOLO detector, the current dynamic-target candidate region is discarded; the accepted dynamic-target candidate regions are taken as the key-frame foreground dynamic targets estimated by the YOLO detector.
Further, in step (3), the key-frame foreground dynamic targets estimated by YOLO are tracked with the particle filter algorithm, and the position and size information of the foreground dynamic target in the frame following the current frame is updated.
Further, the detection network used by the YOLO detector is YOLOv3.
Further, before the correction in the step (1), equalization processing is performed on each pixel point of the image in the non-key frame.
Further, the extraction of feature points in step (1) is realized by ORB-SLAM2.
Beneficial effects: compared with the prior art, the invention has the following notable advantages:
1. the invention exploits the globally discontinuous character of key frames, trains the background image through the GMM, segments the foreground dynamic region and provides a prior for YOLO;
2. the speed and robustness of YOLOv3 are exploited to detect dynamic targets between consecutive frames and to improve the detection accuracy of dynamic regions;
3. YOLOv3 combined with the particle filter algorithm tracks dynamic targets over long periods, effectively ensuring stable operation of target tracking.
Drawings
FIG. 1 is a schematic representation of an affine transformation matrix solution;
FIG. 2 is a diagram of GMM dynamic solution;
FIG. 3 is a GMM dynamic target detection flow diagram;
FIG. 4 is a schematic diagram of a bounding box regression;
FIG. 5 is a schematic diagram of YOLO dynamic target detection;
FIG. 6 is a schematic diagram of dynamic-region target detection by YOLO;
fig. 7 is a flow chart of dynamic target tracking based on particle filtering.
Detailed Description
The real-time tracking and mapping method based on GMM combined with YOLO in this embodiment comprises the following steps:
(1) Feature points are extracted from each frame of the dynamic image sequence by ORB-SLAM2, the frames are divided into key frames and non-key frames, and the affine transformation matrix between two adjacent non-key frames is computed, as shown in FIG. 1, where (x1, y1), (x2, y2), (x3, y3) are the corresponding mapping-point coordinates in the current frame, (x1', y1'), (x2', y2'), (x3', y3') are the corresponding mapping-point coordinates in the previous frame, and P1, P2 and P3 are three-dimensional points.
Each pixel of the images in the non-key frames is equalized, and the non-key frames are corrected with the affine transformation matrix.
The affine transformation matrix satisfies the relation:

[x_t, y_t, 1]^T = A · [x_{t-1}, y_{t-1}, 1]^T,  A = [[a11, a12, a13], [a21, a22, a23], [0, 0, 1]]

where the vector on the left of the equation, (x_t, y_t, 1)^T, represents the current frame coordinates, the matrix A on the right represents the affine transformation matrix, and (x_{t-1}, y_{t-1}, 1)^T represents the coordinates of the frame preceding the current frame.
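The affine correction of step (1) can be illustrated with a short OpenCV sketch (an assumption about the implementation, not the patented code): three matched mapping points between adjacent non-key frames give the 2×3 affine matrix, and warping the current frame with its inverse compensates the camera motion. With more than three matches, cv2.estimateAffine2D could be used instead; the equalization processing mentioned above (e.g., cv2.equalizeHist for a grayscale frame) would precede this step.

```python
# Illustrative sketch: estimate the affine transform between two adjacent
# non-key frames from three matched points and correct the current frame.
import numpy as np
import cv2

def correct_non_keyframe(prev_pts, curr_pts, curr_frame):
    """prev_pts, curr_pts: (3, 2) arrays of matched point coordinates."""
    # Solve the 2x3 affine matrix A mapping previous-frame points to the
    # current frame, i.e. [x_t, y_t, 1]^T = A * [x_{t-1}, y_{t-1}, 1]^T.
    A = cv2.getAffineTransform(np.float32(prev_pts), np.float32(curr_pts))
    # Warp the current frame with the inverse of A so that the camera motion
    # between the two non-key frames is compensated.
    h, w = curr_frame.shape[:2]
    corrected = cv2.warpAffine(curr_frame, cv2.invertAffineTransform(A), (w, h))
    return A, corrected
```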
(2) Images in the non-key frame phase are trained using GMM, as shown in FIG. 2. The background image is modeled by the GMM to further segment the foreground dynamic region, the specific process is as follows, and the flow chart is as shown in fig. 3.
(21) Each pixel of the non-key-frame-stage image is matched with the GMM: a matching model among the K normal distribution models is sought for each pixel, and a pixel belongs to the current normal distribution model if it satisfies the following formula:

|X_t - μ_{i,t-1}| ≤ 2.5 · σ_{i,t-1}

where X_t is the pixel to be matched, i is the index of the corresponding normal distribution model, μ_{i,t-1} is the pixel mean of the i-th normal distribution model at time t-1, and σ_{i,t-1} is the standard deviation of all pixels of the i-th normal distribution model at time t-1;
(22) The weights w_{k,t} of the K normal distribution models are updated as:

w_{k,t} = (1 - α) · w_{k,t-1} + α · M_{k,t}

where α is the learning rate and w_{k,t-1} is the weight of the k-th normal distribution model at time t-1; M_{k,t} is the matching decision of the k-th model at time t: if the formula in step (21) holds, the pixel belongs to one of the models of the Gaussian mixture and M_{k,t} = 1; if the formula in step (21) does not hold, the pixel does not belong to the Gaussian mixture model and M_{k,t} = 0;
(23) If the current pixel does not satisfy the formula in step (21), it does not belong to the background image, and the pixel mean μ_{i,t-1} and standard deviation σ_{i,t-1} are kept unchanged;
if the current pixel satisfies the formula in step (21), indicating that it belongs to the background image, the parameters of the current distribution model are updated as:

ρ = α · η(X_t | μ_k, σ_k)
μ_t = (1 - ρ) · μ_{t-1} + ρ · X_t
σ_t² = (1 - ρ) · σ_{t-1}² + ρ · (X_t - μ_t)²

where ρ is an intermediate parameter and η(X_t | μ_k, σ_k) is the learning-rate change function of the k-th model at time t;
(24) if the current pixel does not match any of the distributions in step (21), the parameters of the distribution with the smallest weight in the Gaussian mixture model are modified: its mean is set to the current pixel value, its standard deviation is set larger than the previous standard deviation, and its weight is set smaller than the previous weight;
(25) The K distribution models are sorted in descending order of weight, the first B distributions are selected as background pixels and the remaining models as foreground pixels, where B is given by:

B = argmin_b ( Σ_{k=1}^{b} w_k > T )

where T is the proportion of the background and b is the number of selected models.
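A single-pixel NumPy sketch of steps (21)-(25) is given below (an illustrative Stauffer-Grimson-style update, not the patented code; the default values of α and T and the re-initialization factors in step (24) are assumptions):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Gaussian density used as the learning-rate change function eta(X_t | mu_k, sigma_k)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def update_pixel_gmm(x, mu, sigma, w, alpha=0.005, T=0.7):
    """One update of a K-component per-pixel mixture for a grayscale value x.

    mu, sigma, w: length-K arrays (component means, std deviations, weights).
    Returns updated (mu, sigma, w) and True if x is classified as background.
    """
    mu, sigma, w = mu.astype(float).copy(), sigma.astype(float).copy(), w.astype(float).copy()

    # Step (21): x matches component i when |x - mu_i| <= 2.5 * sigma_i.
    matches = np.abs(x - mu) <= 2.5 * sigma
    M = matches.astype(float)

    # Step (22): w_k <- (1 - alpha) * w_k + alpha * M_k.
    w = (1.0 - alpha) * w + alpha * M

    if matches.any():
        i = int(np.argmax(matches))                 # first matching component
        # Step (23): update the matched component's mean and variance.
        rho = alpha * gaussian_pdf(x, mu[i], sigma[i])
        mu[i] = (1.0 - rho) * mu[i] + rho * x
        sigma[i] = np.sqrt((1.0 - rho) * sigma[i] ** 2 + rho * (x - mu[i]) ** 2)
    else:
        # Step (24): no match -> re-initialize the least-weighted component
        # (mean = current value, larger std-dev, smaller weight).
        i = int(np.argmin(w))
        mu[i], sigma[i], w[i] = float(x), sigma.max() * 1.5, w.min() * 0.5

    w = w / w.sum()

    # Step (25): sort by weight; the first B components whose cumulative
    # weight exceeds the background proportion T form the background model.
    order = np.argsort(-w)
    B = int(np.searchsorted(np.cumsum(w[order]), T)) + 1
    is_background = bool(matches[order[:B]].any())
    return mu, sigma, w, is_background
```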
(3) The non-key-frame images trained in step (2) and the key-frame images from step (1) are input to a YOLO detector, the detector's output is tracked and predicted with a particle filter algorithm, the dynamic feature points detected in the current frame are removed, and the key frame is inserted for map construction.
After the key-frame image is sent to the YOLO detector, the key frame establishes dynamic target candidate regions, each candidate region is accepted, and candidate regions that cannot be identified are discarded.
In step (3), the non-key-frame images trained in step (2) are input to the YOLO detector, the foreground dynamic regions provide a prior for the detector, and the YOLO detector estimates the foreground dynamic region of the key frame from the foreground dynamic regions of the non-key frames: when a foreground dynamic region provided by the non-key frames overlaps a dynamic target detected by the YOLO detector, the current dynamic-target candidate region is accepted; if it does not overlap any dynamic target detected by the YOLO detector, the current dynamic-target candidate region is discarded; the accepted dynamic-target candidate regions are taken as the key-frame foreground dynamic targets estimated by the YOLO detector.
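The acceptance rule above — a YOLO candidate is kept only when it overlaps a GMM foreground dynamic region — can be sketched as follows (boxes as (x1, y1, x2, y2); the text only requires overlap, so the default threshold of 0 is an assumption and can be raised for stricter acceptance):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def filter_dynamic_candidates(yolo_boxes, gmm_foreground_boxes, iou_thresh=0.0):
    """Keep a YOLO candidate only if it overlaps some GMM foreground region."""
    accepted = []
    for det in yolo_boxes:
        # any positive overlap accepts the candidate as a dynamic target
        if any(iou(det, fg) > iou_thresh for fg in gmm_foreground_boxes):
            accepted.append(det)
        # candidates overlapping no foreground region are discarded (treated as static)
    return accepted
```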
As shown in FIG. 5, this embodiment adopts the YOLOv3 target detection algorithm with Darknet-53 as the network backbone. The input picture is divided into a 13 × 13 grid, and each cell is used to detect dynamic targets; each cell contains bounding boxes and recognition probability values, from which it is determined whether the cell contains a dynamic target object and, if so, the position and probability information of that object. Prior boxes of 3 scales and 9 kinds are selected by dimension clustering on the bounding boxes, the bounding-box detection problem is converted into a regression problem, and for each bounding box the 4 coordinate offsets t_x, t_y, t_w, t_h are predicted, as shown in FIG. 4. The target-box result is computed from the offsets with the following formulas:

b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^{t_w}
b_h = p_h · e^{t_h}

where t_x, t_y, t_w, t_h are the offsets of the x-coordinate, y-coordinate, width and height respectively, b_x, b_y, b_w, b_h are the final target-box result, σ(·) is the Sigmoid function, c_x and c_y are the grid-cell coordinates of the current position offset relative to the top-left cell of the feature map (the Sigmoid normalization of the offsets accelerates network convergence), and p_w and p_h are the width and height of the prior box.
The prior boxes are as follows: for the 13 × 13 feature map, three prior boxes of 10 × 13, 16 × 30 and 33 × 23 pixels are used; for the 26 × 26 feature map, three prior boxes of 30 × 61, 62 × 45 and 59 × 119 pixels; and for the 52 × 52 feature map, three prior boxes of 116 × 90, 156 × 198 and 373 × 326 pixels.
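The offset-to-box decoding can be written directly from the formulas above (a minimal sketch; converting grid-cell units to pixels by multiplying by the feature-map stride is a conventional extra step not stated in the text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode one predicted box from its offsets t_x, t_y, t_w, t_h.

    (cx, cy) is the grid cell that predicts the box (offset from the top-left
    cell of the feature map); (pw, ph) is the prior (anchor) box size.
    """
    bx = sigmoid(tx) + cx            # b_x = sigma(t_x) + c_x  (grid-cell units)
    by = sigmoid(ty) + cy            # b_y = sigma(t_y) + c_y
    bw = pw * np.exp(tw)             # b_w = p_w * e^{t_w}
    bh = ph * np.exp(th)             # b_h = p_h * e^{t_h}
    return bx, by, bw, bh
```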
The loss function of YOLOv3 mainly comprises the following three parts.

Target confidence loss:

L_conf = -Σ_ij [ ĉ_ij · ln(σ(c_ij)) + (1 - ĉ_ij) · ln(1 - σ(c_ij)) ]

Target classification loss:

L_cla = -Σ_ij o_ij · Σ_cls [ p̂_ij · ln(σ(p_ij)) + (1 - p̂_ij) · ln(1 - σ(p_ij)) ]

Target localization offset loss:

L_loc = Σ_ij o_ij · [ (x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)² ]

An overall loss function Loss is established from the three loss models:

Loss = L_conf + L_cla + L_loc

where o_ij indicates whether the rectangular box is responsible for predicting a target object (it equals 1 when responsible for a target and 0 otherwise); c is the probability score that the prediction box contains the target object, and ĉ is the actual probability score that the labeled box contains the target object; p is the probability of a given class, and p̂ is the true value of the class to which the labeled box belongs; (x_ij, y_ij) are the center coordinates of the rectangular box predicted by the network, and (x̂_ij, ŷ_ij) are the center coordinates of the labeled rectangular box; (w_ij, h_ij) are the width and height of the rectangle predicted by the network, and (ŵ_ij, ĥ_ij) are the width and height of the labeled rectangular box; the hatted quantities are the true values, and c, p and the box parameters are the fitted values.
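A compact NumPy sketch of the three loss terms follows (a schematic reconstruction, not the patented code: the binary-cross-entropy form, the restriction of the classification and localization terms to positive boxes, and the equal balancing weights are assumptions):

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Element-wise binary cross-entropy."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

def yolov3_loss(o, c_pred, cls_true, cls_pred, box_true, box_pred,
                weights=(1.0, 1.0, 1.0)):
    """o: 0/1 objectness labels per box; c_pred: predicted confidences (after Sigmoid);
    cls_true / cls_pred: class labels and predicted class probabilities of positive boxes;
    box_true / box_pred: (N, 4) arrays of (x, y, w, h) for positive boxes."""
    l_conf = bce(o, c_pred).sum()                 # target confidence loss
    l_cla = bce(cls_true, cls_pred).sum()         # target classification loss
    l_loc = ((box_pred - box_true) ** 2).sum()    # target localization offset loss
    w1, w2, w3 = weights
    return w1 * l_conf + w2 * l_cla + w3 * l_loc
```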
A schematic diagram of dynamic-region target detection with YOLO is shown in FIG. 6, where the dotted boxes represent the foreground dynamic-target candidate regions provided by the GMM and the solid boxes represent all dynamic targets detected by YOLOv3. The overlap (IOU) result is used as probability information to obtain the most likely dynamic targets; regions where the GMM dynamic detection fails are discarded, the other static targets obtained by YOLOv3 are separated out, and tracking is then performed on the solid-box dynamic targets.
The key-frame foreground dynamic targets estimated by YOLO are tracked with a particle filter algorithm, and the position and size information of the foreground dynamic target in the frame following the current frame is updated; the specific process is as follows:
First, the state equation and the observation equation are established:

x_r = f_r(x_{r-1}, v_{r-1})
y_r = h_r(x_r, n_r)

where r is the current time, x is the state quantity, v and n are noise quantities, f is the state transition function and h is the measurement function. Particle filtering exists in order to compute the maximum-confidence x_r, i.e., the maximum a posteriori p(x_r | y_{1:r}) of the Bayesian probability. Given that the probability distribution function at the previous time, p(x_{r-1} | y_{1:r-1}), is known, that the state transition obeys a first-order Markov model (a linear time-sequential relation), and that the measured data depend only on the state value, the target-recognition-based particle filter algorithm follows the steps below; the flow chart is shown in FIG. 7.
(301) Initialization: N particles are constructed, each particle is given the same weight, and the particles are uniformly distributed over the image; each particle satisfies the state equation and the observation equation. Meanwhile, in order to apply the particle filter equations to target tracking, each particle is given attribute information including the target position, target velocity, target-region box length, target-region box width and target weight. The state transition matrix A and the observation matrix C are set as a constant-velocity model:

A = [[I, I], [0, I]],  C = [I 0]

where the matrix I denotes an identity matrix (position is propagated by the velocity over one frame, and the velocity and box size are retained);
(302) Prediction: the state of the current particles is predicted from the state results of the N particles in the previous frame through the state equation:

p(x_r | y_{1:r-1}) = ∫ p(x_r, x_{r-1} | y_{1:r-1}) dx_{r-1}
                   = ∫ p(x_r | x_{r-1}, y_{1:r-1}) · p(x_{r-1} | y_{1:r-1}) dx_{r-1}
                   = ∫ p(x_r | x_{r-1}) · p(x_{r-1} | y_{1:r-1}) dx_{r-1}
(303) Correction stage: the weight of each particle is computed through the observation equation; the weight information is computed from the descriptor-matching result, the IOU value and the pixel-consistency result, and the weight results of all candidate particles are finally normalized. To avoid the situation where a large part of the VSLAM video sequence is classified as dynamic region and local mapping and pose tracking therefore fail, the observation results are limited: if the pixels of an observation region cover more than half of the video frame, the collection of dynamic results between the two key frames is omitted. Likewise, if a large number of small dynamic objects appear, the descriptor-matching time rises sharply, so an upper limit of 10 observations is set; beyond this limit, only the IOU and pixel-consistency results are used as weight information. The expected result of the particle states is computed with a Monte Carlo sampling estimate of the integral:

E(x_r) ≈ (1/N) · Σ_{i=1}^{N} x_r^(i) · p(x_r^(i) | y_{1:r}) / q(x_r^(i) | y_{1:r})

where q(x_r | y_{1:r}) is a simple probability distribution function that is introduced (the importance distribution).
(304) Resampling stage: the predicted particles are screened by their weights, and particles with large weights are retained in large numbers. Meanwhile, to avoid the particle-weight degeneracy problem, particles with low weights are discarded, and particles with large weights are duplicated in proportion to their weights to make up the discarded number; the resampled particles represent the probability distribution of the true state. The weight w_i is computed from three normalized terms: w_iou, the intersection-over-union of the prediction box and the detection box; w_f, the ratio of matches obtained by matching the target-region image cropped from the key frame against the current frame image; and w_app, the appearance-similarity result. The prediction-update process is completed by cycling through steps (302), (303), (304) and back to (302), and steps (301)-(304) operate only on the image sequence between two key frames. When the VSLAM constructs a key frame, whether a new target region needs to be constructed is judged anew; an incremental model is therefore added at key-frame construction to ensure that dynamic increments can be tracked stably during tracking, or to judge through the incremental model whether to cancel the tracking of lost target information.
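A simplified sketch of steps (301)-(304) for tracking one dynamic-target box is given below (illustrative only: the constant-velocity transition, the noise scales, the particle count and the way w_iou, w_f and w_app are combined into a single weight are assumptions; score_fn stands for the descriptor-matching, IOU and appearance measurements described above):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_particles(box, n=200):
    """Step (301): particles carry (x, y, vx, vy, w, h) and start with equal weight."""
    x, y, w, h = box
    states = np.tile([x, y, 0.0, 0.0, w, h], (n, 1))
    weights = np.full(n, 1.0 / n)
    return states, weights

def predict(states, pos_noise=5.0, size_noise=2.0):
    """Step (302): constant-velocity transition x_r = A x_{r-1} + noise."""
    states[:, 0:2] += states[:, 2:4]                       # position += velocity
    states[:, 0:2] += rng.normal(0, pos_noise, states[:, 0:2].shape)
    states[:, 4:6] += rng.normal(0, size_noise, states[:, 4:6].shape)
    return states

def correct(states, weights, score_fn):
    """Step (303): weight each particle by w_iou, w_f and w_app, then normalize.

    score_fn(state) -> (w_iou, w_f, w_app); combining the three terms as a
    simple product is an assumption of this sketch.
    """
    for i, s in enumerate(states):
        w_iou, w_f, w_app = score_fn(s)
        weights[i] = w_iou * w_f * w_app
    weights = weights + 1e-12
    return states, weights / weights.sum()

def resample(states, weights):
    """Step (304): multinomial resampling; high-weight particles are duplicated."""
    idx = rng.choice(len(weights), size=len(weights), p=weights)
    return states[idx].copy(), np.full(len(weights), 1.0 / len(weights))

def estimate(states, weights):
    """Monte Carlo estimate of the tracked box: the weighted mean state."""
    return np.average(states, axis=0, weights=weights)
```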

Claims (8)

1. A real-time tracking and mapping method based on GMM combined with YOLO in a dynamic environment, characterized by comprising the following steps:
(1) extracting feature points from each frame of the dynamic image sequence, dividing the frames into key frames and non-key frames, computing the affine transformation matrix between two adjacent non-key frames, and correcting the non-key frames with the affine transformation matrix;
(2) training the images of the non-key-frame stage with a Gaussian mixture model (GMM), modeling the background image with the GMM and segmenting the foreground dynamic regions;
(3) inputting the non-key-frame images trained in step (2) and the key-frame images from step (1) into a YOLO detector, tracking and predicting the detector's output with a particle filter algorithm, removing the dynamic feature points detected in the current frame, and inserting the key frame for map construction.
2. The real-time tracking and mapping method according to claim 1, wherein the modeling process of step (2) comprises:
(21) each pixel of the non-key-frame-stage image is matched with the GMM: a matching model among the K normal distribution models is sought for each pixel, and a pixel belongs to the current normal distribution model if it satisfies the following formula:

|X_t - μ_{i,t-1}| ≤ 2.5 · σ_{i,t-1}

where X_t is the pixel to be matched, i is the index of the corresponding normal distribution model, μ_{i,t-1} is the pixel mean of the i-th normal distribution model at time t-1, and σ_{i,t-1} is the standard deviation of all pixels of the i-th normal distribution model at time t-1;
(22) the weights w_{k,t} of the K normal distribution models are updated as:

w_{k,t} = (1 - α) · w_{k,t-1} + α · M_{k,t}

where α is the learning rate and w_{k,t-1} is the weight of the k-th normal distribution model at time t-1; M_{k,t} is the matching decision of the k-th model at time t: if the formula in step (21) holds, the pixel matches a Gaussian model and M_{k,t} = 1, otherwise M_{k,t} = 0;
(23) if the current pixel does not satisfy the formula in step (21), it does not belong to the background image, and the pixel mean μ_{i,t-1} and standard deviation σ_{i,t-1} are kept unchanged;
if the current pixel satisfies the formula in step (21), indicating that it belongs to the background image, the parameters of the current distribution model are updated as:

ρ = α · η(X_t | μ_k, σ_k)
μ_t = (1 - ρ) · μ_{t-1} + ρ · X_t
σ_t² = (1 - ρ) · σ_{t-1}² + ρ · (X_t - μ_t)²

where ρ is an intermediate parameter and η(X_t | μ_k, σ_k) is the learning-rate change function of the k-th model at time t;
(24) if the current pixel does not match any of the distributions in step (21), the parameters of the distribution with the smallest weight in the Gaussian mixture model are modified, and its mean is set to the current pixel value;
(25) the K distribution models are sorted in descending order of weight, the first B distributions are selected as background pixels and the remaining models as foreground pixels, where B is given by:

B = argmin_b ( Σ_{k=1}^{b} w_k > T )

where T is the proportion of the background and b is the number of selected models.
3. The real-time tracking and mapping method according to claim 1, wherein after the key-frame image is sent to the YOLO detector in step (3), the key frame establishes dynamic-target candidate regions, each candidate region is accepted, and candidate regions that cannot be identified are discarded.
4. The real-time tracking and mapping method according to claim 3, wherein in step (3) the non-key-frame images trained in step (2) are input to the YOLO detector, the foreground dynamic regions provide a prior for the detector, and the YOLO detector estimates the foreground dynamic region of the key frame from the foreground dynamic regions of the non-key frames; when a foreground dynamic region provided by the non-key frames overlaps a dynamic target detected by the YOLO detector, the current dynamic-target candidate region is accepted; if a foreground dynamic region provided by the non-key frames does not overlap any dynamic target detected by the YOLO detector, the current dynamic-target candidate region is discarded; and the accepted dynamic-target candidate regions are taken as the key-frame foreground dynamic targets estimated by the YOLO detector.
5. The real-time tracking and mapping method according to claim 4, wherein in step (3) the key-frame foreground dynamic targets estimated by YOLO are tracked with a particle filter algorithm, and the position and size information of the foreground dynamic target in the frame following the current frame is updated.
6. The real-time tracking and mapping method according to claim 5, wherein the detection network used by the YOLO detector is YOLOv3.
7. The real-time tracking and mapping method according to claim 1, wherein before the correction in step (1), each pixel point of the image in the non-key frame is equalized.
8. The real-time tracking and mapping method according to claim 1, wherein the extraction of feature points in step (1) is implemented by ORB-SLAM2.
CN202110869065.XA 2021-07-30 2021-07-30 Real-time tracking and mapping method based on GMM and YOLO under dynamic environment Active CN113689459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110869065.XA CN113689459B (en) 2021-07-30 2021-07-30 Real-time tracking and mapping method based on GMM and YOLO under dynamic environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110869065.XA CN113689459B (en) 2021-07-30 2021-07-30 Real-time tracking and mapping method based on GMM and YOLO under dynamic environment

Publications (2)

Publication Number Publication Date
CN113689459A true CN113689459A (en) 2021-11-23
CN113689459B CN113689459B (en) 2023-07-18

Family

ID=78578376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110869065.XA Active CN113689459B (en) 2021-07-30 2021-07-30 Real-time tracking and mapping method based on GMM and YOLO under dynamic environment

Country Status (1)

Country Link
CN (1) CN113689459B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429192A (en) * 2022-04-02 2022-05-03 中国科学技术大学 Image matching method and device and electronic equipment
CN116363494A (en) * 2023-05-31 2023-06-30 睿克环境科技(中国)有限公司 Fish quantity monitoring and migration tracking method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018146558A2 (en) * 2017-02-07 2018-08-16 Mindmaze Holding Sa Systems, methods and apparatuses for stereo vision and tracking
CN110660095A (en) * 2019-09-27 2020-01-07 中国科学院自动化研究所 Visual SLAM (simultaneous localization and mapping) initialization method, system and device in dynamic environment
CN111486855A (en) * 2020-04-28 2020-08-04 武汉科技大学 Indoor two-dimensional semantic grid map construction method with object navigation points
CN112184759A (en) * 2020-09-18 2021-01-05 深圳市国鑫恒运信息安全有限公司 Moving target detection and tracking method and system based on video
CN112699769A (en) * 2020-12-25 2021-04-23 北京竞业达数码科技股份有限公司 Detection method and system for left-over articles in security monitoring

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018146558A2 (en) * 2017-02-07 2018-08-16 Mindmaze Holding Sa Systems, methods and apparatuses for stereo vision and tracking
CN110660095A (en) * 2019-09-27 2020-01-07 中国科学院自动化研究所 Visual SLAM (simultaneous localization and mapping) initialization method, system and device in dynamic environment
CN111486855A (en) * 2020-04-28 2020-08-04 武汉科技大学 Indoor two-dimensional semantic grid map construction method with object navigation points
CN112184759A (en) * 2020-09-18 2021-01-05 深圳市国鑫恒运信息安全有限公司 Moving target detection and tracking method and system based on video
CN112699769A (en) * 2020-12-25 2021-04-23 北京竞业达数码科技股份有限公司 Detection method and system for left-over articles in security monitoring

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHANDAN G et al.: "Real time object detection and tracking using Deep Learning and OpenCV", 2018 International Conference on Inventive Research in Computing Applications (ICIRCA), pages 1305-1308
JIA LIU et al.: "VSLAM method based on object detection in dynamic environments", Frontiers in Neurorobotics, pages 1-16
李寰宇 et al.: "An easily initialized convolutional-neural-network-like visual tracking algorithm" (一种易于初始化的类卷积神经网络视觉跟踪算法), Journal of Electronics & Information Technology (电子与信息学报), vol. 38, no. 1, pages 1-7
门玉森: "Research on imitation learning based on trajectory matching for humanoid robot motion behavior" (基于轨迹匹配的模仿学习在类人机器人运动行为中的研究), China Masters' Theses Full-text Database, Information Science and Technology, vol. 3, pages 140-1055

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429192A (en) * 2022-04-02 2022-05-03 中国科学技术大学 Image matching method and device and electronic equipment
CN114429192B (en) * 2022-04-02 2022-07-15 中国科学技术大学 Image matching method and device and electronic equipment
CN116363494A (en) * 2023-05-31 2023-06-30 睿克环境科技(中国)有限公司 Fish quantity monitoring and migration tracking method and system

Also Published As

Publication number Publication date
CN113689459B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN109387204B (en) Mobile robot synchronous positioning and composition method facing indoor dynamic environment
CN108596974B (en) Dynamic scene robot positioning and mapping system and method
Clark et al. Vidloc: A deep spatio-temporal model for 6-dof video-clip relocalization
CN112132893B (en) Visual SLAM method suitable for indoor dynamic environment
CN106055091B (en) A kind of hand gestures estimation method based on depth information and correcting mode
CN106875425A (en) A kind of multi-target tracking system and implementation method based on deep learning
CN113674328A (en) Multi-target vehicle tracking method
Teulière et al. Using multiple hypothesis in model-based tracking
CN105809716B (en) Foreground extraction method integrating superpixel and three-dimensional self-organizing background subtraction method
CN113689459A (en) GMM (Gaussian mixture model) combined with YOLO (YOLO) based real-time tracking and graph building method in dynamic environment
CN113483747A (en) Improved AMCL (advanced metering library) positioning method based on semantic map with corner information and robot
CN111797688A (en) Visual SLAM method based on optical flow and semantic segmentation
CN105760898A (en) Vision mapping method based on mixed group regression method
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
CN115035260A (en) Indoor mobile robot three-dimensional semantic map construction method
CN114782499A (en) Image static area extraction method and device based on optical flow and view geometric constraint
CN111340881A (en) Direct method visual positioning method based on semantic segmentation in dynamic scene
CN114708293A (en) Robot motion estimation method based on deep learning point-line feature and IMU tight coupling
CN115063447A (en) Target animal motion tracking method based on video sequence and related equipment
CN112991534A (en) Indoor semantic map construction method and system based on multi-granularity object model
CN110553650B (en) Mobile robot repositioning method based on small sample learning
CN110163132A (en) A kind of correlation filtering tracking based on maximum response change rate more new strategy
Wan et al. Automatic moving object segmentation for freely moving cameras
Taj et al. Multi-view multi-object detection and tracking

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant