CN115482257A - Motion estimation method integrating deep learning characteristic optical flow and binocular vision - Google Patents
Motion estimation method integrating deep learning characteristic optical flow and binocular vision
- Publication number
- CN115482257A (application number CN202211149943.1A)
- Authority
- CN
- China
- Prior art keywords
- optical flow
- image
- deep learning
- pixel
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06T5/40 — Image enhancement or restoration using histogram techniques
- G06T7/269 — Analysis of motion using gradient-based methods
- G06T2207/10012 — Image acquisition modality: stereo images
- G06T2207/20081 — Special algorithmic details: training; learning
- G06T2207/20084 — Special algorithmic details: artificial neural networks [ANN]
Abstract
The invention discloses a motion estimation method integrating deep learning feature optical flow and binocular vision, which comprises the following steps: performing controllable adaptive histogram equalization preprocessing on a driving image data set; constructing a deep-learning-based optical flow feature extraction model and training it to recognize moving targets; ranging with a binocular camera to obtain the position of the target object; and obtaining the movement speed of the vehicle body. Compared with traditional optical-flow speed measurement, the method builds on the deep-learning optical flow and the binocular imaging principle and can estimate the motion parameters of carrier displacement and speed from video data. It solves the problem that traditional optical flow estimation is too sensitive to run stably when the vehicle drives at night, i.e., in a weak-light environment, thereby improving reliability. At the same time, the method avoids the accumulated error of traditional inertial sensors, as well as the poor anti-interference capability and low update frequency of GPS-based positioning and speed measurement.
Description
Technical Field
The invention relates to the technical field of unmanned driving, in particular to a motion estimation method integrating deep learning characteristic optical flow and binocular vision.
Background
Optical flow is the relative visual motion between an observer and the observed scene, and carries motion information such as the surfaces and edges of objects in the visual scene. It can be regarded as the projection of three-dimensional motion onto a two-dimensional plane. Because optical flow contains rich motion and three-dimensional structure information of objects, and offers high robustness, high real-time performance, low cost and no error accumulation, it can be applied to fields such as unmanned driving.
The existing mainstream speed-measurement approaches rely mainly on onboard inertial sensors, GPS positioning, or hybrid schemes. Inertial sensors offer high real-time performance but accumulate error over time; GPS positioning and speed measurement are accurate but update slowly, respond poorly in real time, and are susceptible to signal interference. When a vehicle must travel for long periods without a signal, ubiquitous optical-flow information can instead be used for speed measurement.
With the continuous development of computer technology, learning with artificial neural networks (deep learning) can further improve processing precision and speed, and extract rich information from data. In unsupervised algorithms, no ground-truth optical flow images are needed as training samples: the network is trained directly on real scenes, and a brightness-constancy function, a smoothness function and the like usually replace the end-to-end error loss function of a supervised learning model. At the present stage, optical flow estimation for moving objects can be realized rapidly and accurately by artificial neural networks, but optical flow in weak-light environments remains unstable. Most existing visual odometers are monocular visual odometers, visual-inertial odometers and the like, which suffer from low estimation precision, strong dependence on ambient illumination intensity, and poor stability.
Disclosure of Invention
The invention aims to provide a motion estimation method integrating deep-learning feature optical flow and binocular vision, which addresses the weak optical-flow estimation capability and low precision of traditional algorithms in low-light environments, avoids the accumulated error and poor real-time performance of traditional motion estimation methods, and provides a new estimation method for unmanned-driving technology.
In order to achieve the above object, the present application provides a motion estimation method for fusing a deep learning feature optical flow and binocular vision, including:
performing controllable adaptive histogram equalization preprocessing on a driving image data set;
constructing an optical flow feature extraction model based on deep learning, and performing recognition training on a moving target;
ranging through a binocular camera to obtain the position of a target object;
and acquiring the movement speed of the vehicle body.
Further, the driving image data set is subjected to controllable adaptive histogram equalization preprocessing, specifically: the original driving image is scaled to a set resolution, controllable adaptive histogram equalization is applied, and the histogram is clipped at a preset value to limit the amplification intensity, yielding the neighborhood cumulative distribution mapping:

h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )

where cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving-image pixels, and G_i is the number of gray levels.
Further, an optical flow feature extraction model based on deep learning is constructed, and recognition training is carried out on the moving target, specifically:
designing a local smoothness assumption according to the optical flow characteristics to obtain the optical flow equation:

I(x, y, t) = I(x + Δx, y + Δy, t + Δt), i.e., (∂I/∂x)·dx + (∂I/∂y)·dy + (∂I/∂t)·dt = 0

where x is the pixel abscissa, y is the pixel ordinate, and t is time; dx, dy, dt are the differentials of x, y, t; I is the optical flow image information; ∂ is the partial-differential operator; and Δx, Δy, Δt are the change values of x, y and t;
constructing two CNN layers sharing weight to extract the characteristics of the driving image;
performing inner-product calculation on the feature pair of the two driving images: features f1 ∈ R^(H×W×D) and f2 ∈ R^(H×W×D) represent driving images I1 and I2 respectively, and the visual similarity is obtained as the pairwise inner product of the feature vectors, expressed as:

C(f1, f2)(i, j, k, l) = Σ_d f1(i, j, d) · f2(k, l, d)

where C(f1, f2) ∈ R^(H×W×H×W); ij and kl are the positions of the optical-flow point in the first and second frames respectively; d indexes the image channel, with range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels;
constructing a pyramid to perform pooling operation on the four-dimensional vector characteristics;
obtaining the four-dimensional vector features of the high-resolution driving image: because of the data cost between pyramid levels, record the optical-flow corresponding point between two frames as x′ = (u + f1(u), v + f2(v)), where u is the pixel abscissa, v is the pixel ordinate, f1 is the optical flow feature of the first frame image and f2 that of the second frame image; the neighborhood grid is N(x′)_r = { x′ + dx | dx ∈ Z², ‖dx‖₁ ≤ r }, and with m the layer index, the position corresponding to the optical flow on each layer is found through x′ / 2^m; according to this correspondence, the four-dimensional vector feature of the high-resolution driving image is expressed as:

C^m(i, j, p, q) = (1 / 2^(2m)) Σ_(p′) Σ_(q′) C(i, j, 2^m·p + p′, 2^m·q + q′)

where m is the m-th pyramid layer, and p, q are the row and column information in the pixel matrix of the optical-flow point on layer m;
the CNN layer carries out iterative update on the driving image data: given a current optical flow state of f k Each iteration generates a residual optical flow, i.e. an updated value f, relative to the output of the last iteration 1 If delta f, the predicted value of the next optical flow is delta f + f k =f k+1 (ii) a The updating method comprises the following steps:
wherein R is t To reset the gate, Z t To update the gate, σ is a function operation, H t For preserving the information content of the hidden state of the previous stage, H t-1 For the hidden layer, X t For the optical flow input value W r 、W z Is a weight information matrix.
After the four-dimensional vector features of the high-resolution driving image (the motion-enhanced moving-object image) are obtained, the pixel information of the optical flow is tracked at the pyramid's original resolution to obtain the moving target's active area, where the minimum image displacement of the area is v = [v_x, v_y]^T; the matching error to be minimized over each point's neighborhood, ε(v), is:

ε(v) = Σ_(x = p_x − w_x)^(p_x + w_x) Σ_(y = p_y − w_y)^(p_y + w_y) ( A(x, y) − B(x + v_x, y + v_y) )²

where v_x, v_y are the lateral and longitudinal displacements at the pyramid top layer; p_x is the abscissa of the optical-flow point and w_x the abscissa neighborhood range; p_y is the ordinate of the optical-flow point and w_y the ordinate neighborhood range; A(x, y) is the first-frame optical flow feature and B(x, y) the second-frame optical flow feature;
carrying out recognition training on the moving target object in the moving area;
a supervised algorithm is selected for the model, with the loss function set as:

L = Σ_(i=1)^(N) γ^(N−i) · ‖f_gt − f_i‖₁

i.e., the weighted L1 norm between each iteration result and the ground-truth value, where N is the number of iterations and γ = 0.8; f_gt is the ground-truth optical flow feature and f_i the estimated optical flow at iteration i; Δx_gt, Δy_gt are the lateral and longitudinal displacements of the ground-truth flow, and Δx_i, Δy_i those of the estimated flow.
Further, the CNN layers comprise six residual layers in total: two at 1/2 resolution, two at 1/4 resolution, and two at 1/8 resolution, with the number of channels increasing each time the resolution between residual layers halves; when extracting features from two consecutive input frames, the mapping is R^(H×W×3) → R^(H×W×D), where H and W are the image resolution and D is the number of channels.
Further, constructing a pyramid to perform pooling operation on the four-dimensional vector features, specifically:
constructing a three-layer similarity driving-image pyramid, with kernels 1, 2 and 4 from layer one to layer three, and pooling the last two dimensions of the four-dimensional vector feature; the road image pyramid is represented as:

I_L(x, y) = (1/4) [ I_(L−1)(2x, 2y) + I_(L−1)(2x+1, 2y) + I_(L−1)(2x, 2y+1) + I_(L−1)(2x+1, 2y+1) ]

where L is the pyramid layer, I_L is the image at layer L, and x, y are the pixel position of the optical-flow point.
Furthermore, the distance measurement is carried out through a binocular camera to obtain the position of the target, and the method specifically comprises the following steps:
obtaining distortion description according to parameters and distortion coefficients of a left camera and a right camera;
projectively rectifying the left and right images captured at the same time in the same scene, i.e., computing a rectification mapping table and remapping with it;

finding the corresponding points in the left and right images through semi-global matching (SGM) in stereo matching to obtain the visual disparity disparity = u_l − u_r, where u_l and u_r are the column coordinates of the target's corresponding point in the left and right images respectively; generating a disparity map, then filtering it with a Guided Filter;

calculating the depth of the target:

Z = f · b / ( d − (c_xl − c_xr) )

where f is the focal length, b is the baseline length, d is the parallax, and c_xr, c_xl are the column coordinates of the two cameras' principal points (with aligned principal points this reduces to Z = f·b/d).
Constructing a 3D space according to a group of disparity maps to obtain three-dimensional coordinates of pixel points:
[X Y Z W] T =Q*[g h disparity(g,h)1] T
where X, Y, Z and W are the four-dimensional homogeneous information obtained after the matrix transformation; g, h are the position of each pixel and Q is the perspective transformation (reprojection) matrix; dividing by W gives, at each pixel, the (x, y, z) coordinate information in the viewing coordinate system.
Further, the distortion disparity(g, h) is described in terms of the distortion coefficients, where s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 are the radial distortion coefficients, p_1, p_2 are the tangential distortion coefficients, and r is the distortion radius;
furthermore, two images shot in the same scene at the same time are subjected to projection change, specifically:
the two image planes are parallel to the base line, and the same target point is positioned on the same horizontal line in the left image and the right image, namely the coplanar lines are aligned;
For a point X in space, the projection matrix equations through the camera's left and right lenses C_1, C_2 are respectively:

Z_c1 · [u_1, v_1, 1]^T = M_1 · [X, Y, Z, 1]^T,   Z_c2 · [u_2, v_2, 1]^T = M_2 · [X, Y, Z, 1]^T

where (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of x_1 and x_2 in their respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; and m_ij is the element in row i, column j of the projection matrix M;
when a point becomes a straight line element, let c 1 、c 2 The straight lines S of the left camera and the right camera corresponding to the same space respectively have the following linear equations in a space coordinate system:
and substituting the above formula into a projection matrix to obtain a mapping equation of the left camera and the right camera under a projection plane:
further, the vehicle body movement speed is obtained, specifically:
wherein d is k 、d k-1 And the Z-axis value is the three-dimensional coordinate value of the pixel point of the current frame and the pixel point of the previous frame.
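The speed formula above can be sketched as follows; the frame rate, the sign convention (speed positive when the vehicle approaches a static roadside target) and all numeric values are illustrative assumptions, not from the patent:

```python
def body_speed(z_curr: float, z_prev: float, fps: float) -> float:
    """Carrier speed from the depth change of a static roadside target between
    consecutive frames: v = (d_{k-1} - d_k) * fps, positive when the vehicle
    approaches the target. fps = 1/delta_t is the camera frame rate (assumed known)."""
    return (z_prev - z_curr) * fps

# Target 4.00 m away in the previous frame, 3.50 m in the current one, at 30 fps:
v = body_speed(3.50, 4.00, fps=30.0)  # 0.5 m covered in 1/30 s
```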
Compared with the prior art, the technical scheme adopted by the invention has the following advantages: based on the deep-learning optical flow and the binocular imaging principle, the method can estimate the motion parameters of vehicle-body displacement and speed from video data, so motion estimation remains possible when the vehicle is driven at night, further improving reliability. At the same time, the method avoids the poor anti-interference capability and low update frequency of traditional methods that rely on an inertial sensor or on GPS positioning and speed measurement; it offers high portability, high real-time performance, excellent robustness, no accumulated error and low cost, providing a new motion-estimation approach for unmanned-driving technology.
Drawings
FIG. 1 is a hardware schematic involved in the embodiment;
FIG. 2 is a schematic block diagram of a motion estimation method;
FIG. 3 is a flow chart of a motion estimation method;
FIG. 4 is a view of the structure of an optical flow model;
FIG. 5 is a diagram of an optical flow model implementation;
FIG. 6 is a schematic diagram of tangential distortion formation during calibration and correction;
fig. 7 is a schematic diagram of coplanar row alignment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the application, i.e., the embodiments described are only a subset of, and not all embodiments of the application.
Example 1
The method determines road-environment characteristics from optical flow information enhanced by an artificial neural network, identifies stationary objects on both sides of the road, determines the distance between the carrier and the target object, and then obtains the speed from the distance difference and the time between two frames. The camera and the video velocimeter are physically independent of each other, as shown in fig. 1. The schematic block diagram is shown in fig. 2; the motion estimation system implementing the method comprises a video acquisition unit, a frame-by-frame recording unit, a storage unit, a deep-learning intelligent motion enhancement unit, an optical-flow-feature target identification unit and a motion analysis unit, where the motion enhancement unit and the target identification unit are packaged together as a moving-target calibration unit. The images shot by the binocular camera are connected to the storage unit through the video acquisition unit and a bus, realizing the shooting and recording functions; the computer reads the collected image data, identifies roadside objects in turn through the frame-by-frame recording unit, the moving-target calibration unit and the motion analysis unit, constructs three-dimensional coordinate information, judges the motion state of the host vehicle through the relative motion speed derived from those coordinates, and finally returns the information to the storage unit for motion analysis.
As shown in fig. 3, a method for motion estimation by fusing deep learning feature optical flow and binocular vision includes:
S1. Perform controllable adaptive histogram equalization preprocessing on the driving image data set.

Specifically, driving videos at different speeds and road conditions are collected, and a driving image data set X = {X[n]_1, X[n]_2, X[n]_3, …, X[n]_fo} is established, where n is the video sequence number and fo is the number of video frames.
The original driving image is scaled to a set resolution (such as 1088×436) and processed with controllable adaptive histogram equalization (CLAHE); the histogram is then clipped at a preset value to limit the amplification intensity, yielding the neighborhood cumulative distribution mapping:

h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )

where cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving-image pixels, and G_i is the number of gray levels (which may be set to 256 in this embodiment). This improves image contrast and enhances color intensity in low-light environments, facilitating subsequent analysis.
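The clipped-histogram equalization described in this step can be sketched for a single tile as follows; this is a minimal NumPy illustration of the cumulative-distribution mapping, with the clip value and tile chosen for demonstration (a full CLAHE implementation would process tiles over the image and interpolate between them):

```python
import numpy as np

def clahe_like_equalize(tile: np.ndarray, clip_limit: int = 40, levels: int = 256) -> np.ndarray:
    """Equalize one grayscale tile with a clipped histogram (CLAHE-style sketch)."""
    hist, _ = np.histogram(tile, bins=levels, range=(0, levels))
    # Clip the histogram at the preset value and spread the excess evenly,
    # which limits the amplification of contrast (and of noise).
    excess = int(np.maximum(hist - clip_limit, 0).sum())
    hist = np.minimum(hist, clip_limit) + excess // levels
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Neighborhood cumulative-distribution mapping from the description:
    #   h(v) = round((cdf(v) - cdf_min) / (M*N - cdf_min) * (G - 1))
    mapping = np.round((cdf - cdf_min) / (tile.size - cdf_min) * (levels - 1))
    return np.clip(mapping, 0, levels - 1).astype(np.uint8)[tile]

tile = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # low-contrast ramp image
out = clahe_like_equalize(tile)
```

The mapping stretches the low-contrast input range toward the full gray scale while the clip limit bounds how steep the transfer curve can become.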
S2, constructing an optical flow model, and performing recognition training on a moving target object; as shown in fig. 4:
S2.1 Design local smoothness assumptions from the optical flow characteristics. Assumption 1: brightness constancy, the brightness of an image pixel does not change between frames. Assumption 2: small motion, the motion between pixel frames is small, i.e., the relative motion of the image over time is small. Assumption 3: spatial coherence, adjacent points on the same surface in the same scene have similar motion. From these assumptions the optical flow equation is derived:

I_x · u + I_y · v + I_t = 0

where I_x, I_y, I_t are the partial derivatives of the image brightness with respect to x, y and t, and (u, v) is the optical flow vector.
s2.2, two CNN layers sharing weight are constructed to extract the characteristics of the driving image, and the CNN layer architecture is shown in FIG. 5;
Specifically, the CNN layers include two residual layers at 1/2 resolution, two at 1/4 resolution and two at 1/8 resolution, six in total, with the number of channels increasing each time the resolution between residual layers halves; when extracting features from two consecutive input frames, the mapping is R^(H×W×3) → R^(H×W×D), where D is the number of channels and may be set to 256.
S2.3, carrying out inner product calculation on the feature pairs of the two driving images;
Specifically, features f1 ∈ R^(H×W×D) and f2 ∈ R^(H×W×D) represent driving images I1 and I2 respectively, and the visual similarity is obtained as the pairwise inner product of the feature vectors, expressed as:

C(f1, f2)(i, j, k, l) = Σ_d f1(i, j, d) · f2(k, l, d)

where C(f1, f2) ∈ R^(H×W×H×W); ij and kl are the positions of the optical-flow point in the first and second frames respectively; d indexes the image channel, with range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels;
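The all-pairs inner product above is a single tensor contraction; a minimal NumPy sketch (toy sizes, random features standing in for the CNN outputs):

```python
import numpy as np

def correlation_volume(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """All-pairs similarity C[i, j, k, l] = sum_d f1[i, j, d] * f2[k, l, d]."""
    return np.einsum('ijd,kld->ijkl', f1, f2)

H, W, D = 4, 5, 8
rng = np.random.default_rng(0)
f1 = rng.standard_normal((H, W, D))  # stand-in for frame-1 CNN features
f2 = rng.standard_normal((H, W, D))  # stand-in for frame-2 CNN features
C = correlation_volume(f1, f2)       # four-dimensional vector feature
```

Each entry of `C` is the inner product of one feature-vector pair, so the volume has shape (H, W, H, W) as stated in the text.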
According to the optical flow equation and the brightness-constancy assumption in step S2.1, the inter-image motion state is obtained by the least-squares method:

[u, v]^T = (AᵀA)^(−1) Aᵀ b, where row i of A is [I_x(q_i), I_y(q_i)] and b_i = −I_t(q_i)

where I_x(q_i), I_y(q_i), I_t(q_i) are the optical flow characteristics of the pixels q_i in the area around the optical-flow pixel position, i = 1, 2, 3, …
s2.4, constructing a pyramid to perform pooling operation on the four-dimensional vector characteristics;
Specifically, a three-layer correlation pyramid is constructed with kernels 1, 2 and 4 from layer one to layer three, pooling the last two dimensions of the four-dimensional vector obtained in step 2.3 so that motion changes become more evident. The road image pyramid is represented as:

I_L(x, y) = (1/4) [ I_(L−1)(2x, 2y) + I_(L−1)(2x+1, 2y) + I_(L−1)(2x, 2y+1) + I_(L−1)(2x+1, 2y+1) ]

where L is the pyramid layer, I_L is the layer-L image, and x, y are the pixel position of the optical-flow point.
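Pooling only the last two dimensions of the four-dimensional correlation volume can be sketched as below; the kernel sizes 1, 2, 4 follow the text, while the toy volume size is an assumption for illustration:

```python
import numpy as np

def pool_last_two(C: np.ndarray, kernel: int) -> np.ndarray:
    """Average-pool the last two dimensions (k, l) of the 4-D correlation volume,
    leaving the first-frame pixel dimensions (i, j) untouched."""
    H, W, H2, W2 = C.shape
    blocks = C.reshape(H, W, H2 // kernel, kernel, W2 // kernel, kernel)
    return blocks.mean(axis=(3, 5))

rng = np.random.default_rng(1)
C = rng.standard_normal((4, 4, 8, 8))               # toy correlation volume
pyramid = [pool_last_two(C, k) for k in (1, 2, 4)]  # kernels 1, 2, 4 as in the text
shapes = [p.shape for p in pyramid]
```

Because only the second-frame dimensions shrink, a lookup centred on any first-frame pixel still has full spatial resolution for the reference frame at every level.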
S2.5, acquiring four-dimensional vector characteristics of the high-resolution driving image;
Specifically, record the optical-flow corresponding point between two frames as x′ = (u + f1(u), v + f2(v)), where u is the pixel abscissa, v is the pixel ordinate, f1 is the optical flow feature of the first frame image and f2 that of the second frame image; the neighborhood grid is N(x′)_r = { x′ + dx | dx ∈ Z², ‖dx‖₁ ≤ r }, and with m the layer index, the position corresponding to the optical flow on each layer is found through x′ / 2^m; according to this correspondence, the four-dimensional vector feature of the high-resolution driving image is expressed as:

C^m(i, j, p, q) = (1 / 2^(2m)) Σ_(p′) Σ_(q′) C(i, j, 2^m·p + p′, 2^m·q + q′)

where m is the m-th pyramid layer, and p, q are the row and column information in the pixel matrix of the optical-flow point on layer m;
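The per-level lookup around x′ can be sketched as follows; nearest-neighbour indexing stands in for the bilinear sampling a full implementation would use, and all sizes and names are illustrative:

```python
import numpy as np

def lookup(corr_m: np.ndarray, x_prime: np.ndarray, m: int, r: int = 1) -> np.ndarray:
    """Gather correlation values around the corresponding point x' on pyramid level m.

    corr_m has shape (H, W, H/2^m, W/2^m); x_prime holds the per-pixel
    corresponding points, shape (H, W, 2).
    """
    H, W, Hm, Wm = corr_m.shape
    centre = np.rint(x_prime / 2 ** m).astype(int)   # x' / 2^m on level m
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    offsets = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    feats = np.empty((H, W, len(offsets)))
    for n, (dy, dx) in enumerate(offsets):           # neighbourhood grid around x'
        yy = np.clip(centre[..., 0] + dy, 0, Hm - 1)
        xx = np.clip(centre[..., 1] + dx, 0, Wm - 1)
        feats[..., n] = corr_m[ii, jj, yy, xx]
    return feats                                     # (H, W, (2r+1)^2) per-pixel feature

rng = np.random.default_rng(2)
corr1 = rng.standard_normal((4, 4, 2, 2))  # toy level m=1 volume for a 4x4 image
flow_pts = np.zeros((4, 4, 2))             # x' = current flow-displaced points
feats = lookup(corr1, flow_pts, m=1)
```

The gathered neighbourhood features from every level are what the iterative update step consumes.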
s2.6, the CNN layer carries out iterative update on the driving image data;
Specifically, the CNN layer performs optical flow estimation to generate image features and also performs the data-iteration function. An update iteration is a sequence of classical gated recurrent units over the previous data, trainable through shared-weight convolution layers. The default initial value is 0; given the current optical flow state f_k, each iteration produces a residual optical flow Δf relative to the output of the previous iteration, so the next optical flow prediction is f_(k+1) = f_k + Δf. The gates of the gated recurrent unit are updated as follows:

Z_t = σ(W_z · [H_(t−1), X_t])
R_t = σ(W_r · [H_(t−1), X_t])
H̃_t = tanh(W_h · [R_t ⊙ H_(t−1), X_t])
H_t = (1 − Z_t) ⊙ H_(t−1) + Z_t ⊙ H̃_t

where R_t is the reset gate, Z_t is the update gate, σ is the sigmoid function, H_t is the hidden state retaining the information of the previous stage, H_(t−1) is the previous hidden layer, X_t is the optical-flow input value, and W_r, W_z are weight information matrices.
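A dense stand-in for the convolutional GRU update can be sketched as follows; the weight shapes, sizes and random inputs are illustrative assumptions, not the patent's architecture:

```python
import numpy as np

def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))

def gru_update(h_prev, x_t, Wz, Wr, Wh):
    """One gated-recurrent-unit step on the hidden flow state:
    Z_t and R_t gate how much of the previous state is kept vs. rewritten."""
    xh = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ xh)                           # update gate Z_t
    r_t = sigmoid(Wr @ xh)                           # reset gate R_t
    h_tilde = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_tilde      # new hidden state H_t

rng = np.random.default_rng(3)
dh, dx = 8, 6                                        # toy hidden and input sizes
Wz, Wr, Wh = (0.1 * rng.standard_normal((dh, dh + dx)) for _ in range(3))
h = np.zeros(dh)                                     # default initial value 0
for _ in range(4):                                   # iterative flow refinement
    h = gru_update(h, rng.standard_normal(dx), Wz, Wr, Wh)
```

Because H̃_t is tanh-bounded and H_t is a convex combination of H_(t−1) and H̃_t, the hidden state stays bounded across iterations, which keeps the residual-flow updates stable.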
S2.7 After the four-dimensional vector features of the high-resolution driving image (the motion-enhanced moving-object image) are obtained, the pixel information of the optical flow is tracked at the pyramid's original resolution to obtain the moving target's active area, where the minimum image displacement of the area is v = [v_x, v_y]^T; the matching error to be minimized over each point's neighborhood, ε(v), is:

ε(v) = Σ_(x = p_x − w_x)^(p_x + w_x) Σ_(y = p_y − w_y)^(p_y + w_y) ( A(x, y) − B(x + v_x, y + v_y) )²

where v_x, v_y are the lateral and longitudinal displacements at the pyramid top layer, p_x is the abscissa of the optical-flow point, w_x the abscissa neighborhood range, p_y the ordinate of the optical-flow point, w_y the ordinate neighborhood range, A(x, y) the first-frame optical flow feature and B(x, y) the second-frame optical flow feature;
S2.8 Perform recognition training on the moving target objects in the active area (for example, common roadside objects such as street trees, signboards and buildings).
A supervised algorithm is selected for the model, with the loss function set as:

L = Σ_(i=1)^(N) γ^(N−i) · ‖f_gt − f_i‖₁

i.e., the weighted L1 norm between each iteration result and the ground-truth value, where N is the number of iterations and γ = 0.8.
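The iteration-weighted L1 loss above can be sketched directly; the toy flow fields are illustrative:

```python
import numpy as np

def sequence_loss(flow_preds, flow_gt, gamma: float = 0.8) -> float:
    """Gamma-weighted L1 loss over the N iteration outputs, weighting later
    (more refined) iterations more heavily:
        L = sum_{i=1..N} gamma^(N-i) * ||f_gt - f_i||_1
    """
    N = len(flow_preds)
    return float(sum(gamma ** (N - (i + 1)) * np.abs(flow_gt - f).sum()
                     for i, f in enumerate(flow_preds)))

gt = np.ones((2, 3, 3))                  # toy 2-channel ground-truth flow field
preds = [0.0 * gt, 0.5 * gt, 0.9 * gt]   # predictions improving over 3 iterations
loss = sequence_loss(preds, gt)
```

With γ < 1 the final iteration carries weight 1 and earlier iterations are discounted geometrically, so early crude estimates are supervised but do not dominate the gradient.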
S3, ranging through a binocular camera to obtain the position of a target object;
s3.1, obtaining distortion description according to parameters and distortion coefficients of the left camera and the right camera;
Specifically, barrel distortion is caused by the shape of the lens, and tangential distortion arises because the lens and the imaging plane cannot be made perfectly parallel during camera assembly. Distortion vectors are therefore obtained from the intrinsic parameters, extrinsic parameters and distortion coefficients of the left and right cameras, for correction in a subsequent step. The distortion disparity(g, h) is described in terms of the coefficients, where s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 the radial distortion coefficients, p_1, p_2 the tangential distortion coefficients, and r the distortion radius.
S3.2 Projectively rectify the left and right images captured at the same time in the same scene, i.e., obtain a rectification mapping table and remap with it.
Specifically, due to parallax, the images obtained by the left and right lenses of the binocular camera cannot be completely overlapped, so the two views captured simultaneously in the same scene are subjected to a projective transformation: the two image planes are made parallel to the baseline, and the same target point lies on the same horizontal line in both the left and right images, i.e. the coplanar rows are aligned, as shown in fig. 7.
For a point X in space, the projection matrix equations through the left and right camera lenses C_1, C_2 are respectively:
wherein (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous image coordinates of x_1 and x_2 in the respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; m_ij is the element in row i, column j of the projection matrix M.
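The projection through a 3×4 matrix M with elements m_ij can be sketched as follows (an illustrative helper with an assumed row-major M, not the patent's code):

```python
import numpy as np

def project(M, X):
    """Project a homogeneous world point X (4-vector) through a 3x4
    projection matrix M and dehomogenise to pixel coordinates (u, v)."""
    x = M @ X          # homogeneous image point (su, sv, s)
    return x[0] / x[2], x[1] / x[2]
```

For the trivial projection matrix [I | 0], a point at depth 2 simply divides out the depth.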
When the point becomes a line element, let c_1, c_2 be the projections in the left and right cameras corresponding to the same straight line S in space; the line equation in the space coordinate system is:
and substituting the above formula into a projection matrix to obtain a mapping equation of the left camera and the right camera under a projection plane:
S3.3, finding corresponding points in the left and right images through SGM stereo matching to obtain the disparity = u_l − u_r, where u_l, u_r are the column coordinates of the corresponding target point in the left and right images respectively; a disparity map is then generated and filtered with a Guided Filter to reduce noise.
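SGM aggregates matching costs along multiple scan paths; as a minimal stand-in, the winner-takes-all block matching below illustrates how disparity = u_l − u_r is obtained along epipolar rows (illustrative NumPy code under that simplification, not an SGM implementation):

```python
import numpy as np

def block_match_disparity(left, right, max_disp, w=1):
    """Toy winner-takes-all block matching: for each left-image pixel,
    pick the horizontal shift d minimising the sum-of-absolute-differences
    cost of its (2w+1)x(2w+1) patch (SGM adds path-wise smoothness on top)."""
    H, W = left.shape
    disp = np.zeros((H, W), dtype=np.int32)
    for y in range(w, H - w):
        for x in range(w + max_disp, W - w):
            patch = left[y - w:y + w + 1, x - w:x + w + 1]
            costs = [np.abs(patch - right[y - w:y + w + 1,
                                          x - d - w:x - d + w + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

On a synthetic pair where the left view is the right view shifted by two pixels, interior pixels recover disparity 2.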
S3.4, calculating the depth of the target object Z = f·b / (d − (c_xl − c_xr)), where f is the focal length, b is the baseline length, d is the disparity, and c_xr, c_xl are the column coordinates of the two camera principal points;
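A sketch of the depth computation of S3.4, assuming the usual stereo relation Z = f·b/(d − (c_xl − c_xr)), which reduces to Z = f·b/d when the two principal points coincide (function name and defaults are ours):

```python
def depth_from_disparity(d, f, b, c_xl=0.0, c_xr=0.0):
    """Depth along the optical axis: Z = f*b / (d - (c_xl - c_xr)).
    f: focal length in pixels, b: baseline in metres, d: disparity in pixels."""
    return f * b / (d - (c_xl - c_xr))
```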
S3.5, constructing a 3D space from the set of disparity maps to obtain the three-dimensional coordinates of the pixel points:
[X Y Z W]^T = Q · [g h disparity(g, h) 1]^T
wherein X, Y, Z, W are the four-dimensional information obtained after the matrix transformation; g, h are the position information of each pixel; Q is the perspective transformation matrix; and 3DImage(a, c, d) is the (x, y, z) coordinate information in the viewing coordinate system. Finally, a three-dimensional matrix is obtained, recording the X, Y and Z coordinates respectively (the coordinate system is established with the left camera as reference).
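The reprojection of S3.5 can be sketched with a hand-built perspective matrix Q (illustrative; the particular Q below, for focal length f and baseline b with coincident principal points, yields Z = f·b/d):

```python
import numpy as np

def reproject_pixel(Q, g, h, d):
    """Apply the 4x4 perspective transform Q to (g, h, disparity, 1) and
    dehomogenise, yielding (X, Y, Z) in the left-camera frame."""
    X, Y, Z, W = Q @ np.array([g, h, d, 1.0])
    return X / W, Y / W, Z / W
```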
S4, acquiring the movement speed of the vehicle body;
Specifically, the Z-axis coordinate value obtained in step 3.5 is extracted from two adjacent frames; since the camera captures at 25 fps, by the principle of relative motion the vehicle body movement speed is v = |d_k − d_{k−1}| × 25, where d_k, d_{k−1} are the three-dimensional Z-axis coordinate values of the pixel point in the current frame and the previous frame.
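Under the stated 25 fps capture rate, the speed recovery of S4 is just the inter-frame Z difference scaled by the frame rate (an illustrative sketch; taking the absolute value for the speed magnitude is our choice):

```python
def speed_from_depths(d_prev, d_curr, fps=25.0):
    """Relative speed along the optical axis (m/s) from the Z-axis values of
    the same pixel in two consecutive frames captured at `fps` frames/s."""
    return abs(d_curr - d_prev) * fps
```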
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.
Claims (9)
1. A motion estimation method for fusing deep learning characteristic optical flow and binocular vision is characterized by comprising the following steps:
carrying out equalization pretreatment based on a controllable self-adaptive histogram on a driving image data set;
constructing an optical flow feature extraction model based on deep learning, and performing recognition training on a moving target;
measuring distance through a binocular camera to obtain the position of a target object;
and acquiring the movement speed of the vehicle body.
2. The method for estimating motion by fusing deep learning characteristic optical flow and binocular vision according to claim 1, wherein the driving image dataset is subjected to equalization preprocessing based on a controllable adaptive histogram, specifically: scaling the original driving image to a set resolution, performing controllable adaptive histogram equalization processing, and clipping the histogram at a preset value to limit the amplification intensity, to obtain the neighborhood cumulative distribution function mapping:
h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )
wherein cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving image pixels, and G_i is the number of gray levels.
3. The method for motion estimation by fusing deep learning characteristic optical flow and binocular vision according to claim 1, is characterized by constructing a deep learning-based optical flow characteristic extraction model and performing recognition training on a moving object, and specifically comprises the following steps:
designing a local smoothness hypothesis according to the optical flow characteristics to obtain the optical flow equation:
I(x, y, t) = I(x + δx, y + δy, t + δt), whose first-order expansion gives (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt = 0
wherein x is the pixel abscissa, y is the pixel ordinate, and t is time; dx, dy and dt are the derivatives of x, y and t, and I is the optical flow image information; ∂ is the partial differential operator, and δx, δy and δt are the change values of x, y and t;
constructing two CNN layers sharing weight to extract the characteristics of the driving image;
performing inner-product computation on the feature pairs of the two driving images: the features f_1 ∈ R^{H×W×D} and f_2 ∈ R^{H×W×D} represent the features of driving images I_1 and I_2 respectively, and the visual similarity is obtained by the pairwise inner product of the feature vectors, expressed as:
C(f_1, f_2)_{ijkl} = Σ_d f_1(i, j, d) · f_2(k, l, d)
wherein C(f_1, f_2) ∈ R^{H×W×H×W}; ij and kl are the position information of the optical flow point in the first and second frames respectively; d is the feature channel index, with value range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels;
constructing a pyramid to perform pooling operation on the four-dimensional vector characteristics;
acquiring the four-dimensional vector features of the high-resolution driving image: due to the data cost between pyramid levels, record the corresponding point of the optical flow between the two frames as x' = (u + f^1(u), v + f^2(v)), where u is the pixel abscissa, v is the pixel ordinate, f^1 is the optical flow feature of the first frame image, and f^2 is the optical flow feature of the second frame image; the neighborhood grid is N(x') = { x' + dx : ‖dx‖₁ ≤ k }, where m is the number of layers; the position corresponding to the optical flow on each layer is then searched through x'/2^m, where k is any real number; according to this correspondence, the four-dimensional vector features of the high-resolution driving image are expressed as:
wherein m is the m-th layer of the pyramid, and p, q are respectively the row and column information of the optical flow point in the pixel matrix on the m-th layer;
The CNN layer iteratively updates the driving image data: given the current optical flow state f_k, each iteration produces a residual optical flow Δf relative to the output of the previous iteration, so the next optical flow prediction is f_{k+1} = f_k + Δf; the update method is:
Z_t = σ(W_z · [H_{t−1}, X_t])
R_t = σ(W_r · [H_{t−1}, X_t])
H̃_t = tanh(W_h · [R_t ⊙ H_{t−1}, X_t])
H_t = (1 − Z_t) ⊙ H_{t−1} + Z_t ⊙ H̃_t
wherein R_t is the reset gate, Z_t is the update gate, σ is the sigmoid function, H_t is the amount of information retained in the hidden state from the previous stage, H_{t−1} is the hidden layer, X_t is the optical flow input value, W_r, W_z are weight information matrices, and W_h is the candidate hidden-state weight matrix;
after the four-dimensional vector features of the high-resolution driving image are obtained, the pixel information of the optical flow is tracked at the original resolution of the pyramid to obtain the moving-object activity area, where the minimum image displacement in the area is v = [v_x, v_y]^T and the matching error ε(v) to be minimized over each point's neighborhood is:
ε(v) = Σ_{x=p_x−w_x}^{p_x+w_x} Σ_{y=p_y−w_y}^{p_y+w_y} (A(x, y) − B(x + v_x, y + v_y))²
wherein v_x, v_y are respectively the transverse and longitudinal displacements at the pyramid top layer, p_x is the abscissa of the optical flow point, w_x is the abscissa neighborhood range, p_y is the ordinate of the optical flow point, w_y is the ordinate neighborhood range, A(x, y) is the first-frame optical flow feature, and B(x, y) is the second-frame optical flow feature;
carrying out recognition training on the moving target object in the moving area;
A supervised algorithm is selected for the model, and the loss function is set as:
L = Σ_{i=1}^{N} γ^{N−i} ‖f_gt − f_i‖₁
i.e. the L1 norm between each iteration result and the true value, where N is the number of iterations and γ = 0.8; f_gt is the ground-truth optical flow feature, f_i is the optical flow feature estimated at the i-th iteration, Δx_gt, Δy_gt are the ground-truth transverse and longitudinal displacements of the optical flow, and Δx_i, Δy_i are the estimated transverse and longitudinal displacement coordinates of the optical flow.
4. The method for motion estimation by fusing deep learning characteristic optical flow and binocular vision according to claim 3, wherein the CNN layers comprise two residual layers at 1/2 resolution, two at 1/4 resolution and two at 1/8 resolution, the number of channels increasing each time the resolution halves between residual layers; when feature extraction is performed, two consecutive frames are input, giving the mapping R^{H×W×3} → R^{H×W×D}, wherein H and W are the image resolution and D is the number of channels.
5. The method for motion estimation by fusing deep learning feature optical flow and binocular vision according to claim 3, characterized by constructing a pyramid to perform pooling operation on the four-dimensional vector features, specifically:
constructing a three-layer similarity driving-image pyramid, with kernels of 1, 2 and 4 from layer one to layer three respectively, and pooling the last two dimensions of the four-dimensional vector features; the driving-image pyramid is expressed as follows:
wherein L is the pyramid layer, I_L is the image at layer L, and x, y are the pixel position information of the optical flow point.
6. The method for estimating the motion by fusing the optical flow with the deep learning features and the binocular vision according to claim 1, wherein the target position is obtained by ranging through a binocular camera, and specifically comprises the following steps:
obtaining distortion description according to parameters and distortion coefficients of the left camera and the right camera;
applying a projective transformation to the left and right images captured simultaneously in the same scene, namely acquiring a correction mapping table and remapping with it;
finding corresponding points in the left and right images through SGM stereo matching to obtain the disparity = u_l − u_r, where u_l, u_r are the column coordinates of the corresponding target point in the left and right images respectively; a disparity map is then generated and filtered with a Guided Filter;
calculating the depth of the target object Z = f·b / (d − (c_xl − c_xr)), where f is the focal length, b is the baseline length, d is the disparity, and c_xr, c_xl are the column coordinates of the two camera principal points;
constructing a 3D space according to a group of disparity maps to obtain three-dimensional coordinates of pixel points:
[X Y Z W]^T = Q · [g h disparity(g, h) 1]^T
wherein X, Y, Z, W are the four-dimensional information obtained after the matrix transformation; g, h are the position information of each pixel; Q is the perspective transformation matrix; and 3DImage(a, c, d) is the (x, y, z) coordinate information in the viewing coordinate system.
7. The method for motion estimation by fusing deep learning feature optical flow and binocular vision according to claim 6, wherein the distortion disparity (g, h) is described as:
wherein s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 are the radial distortion coefficients, p_1, p_2 are the tangential distortion coefficients, and r is the distortion radius.
8. The method for estimating motion by fusing deep learning characteristic optical flow and binocular vision according to claim 6, wherein the two images captured simultaneously in the same scene are subjected to a projective transformation, specifically:
the two image planes are parallel to the base line, and the same target point is positioned on the same horizontal line in the left image and the right image, namely the coplanar lines are aligned;
for a point X in space, the projection matrix equations through the left and right camera lenses C_1, C_2 are respectively:
wherein (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of x_1 and x_2 in the respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; m_ij is the element in row i, column j of the projection matrix M;
when the point becomes a line element, let c_1, c_2 be the projections in the left and right cameras corresponding to the same straight line S in space; the line equation in the space coordinate system is:
and substituting the above formula into a projection matrix to obtain a mapping equation of the left camera and the right camera under a projection plane:
9. The method for estimating motion by fusing deep learning characteristic optical flow and binocular vision according to claim 1, wherein the movement speed of the vehicle body is acquired, specifically: the speed is obtained from the change of the Z-axis coordinate value between two adjacent frames at the camera frame rate, wherein d_k, d_{k−1} are the three-dimensional Z-axis coordinate values of the pixel point in the current frame and the previous frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211149943.1A CN115482257A (en) | 2022-09-21 | 2022-09-21 | Motion estimation method integrating deep learning characteristic optical flow and binocular vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115482257A true CN115482257A (en) | 2022-12-16 |
Family
ID=84423546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211149943.1A Pending CN115482257A (en) | 2022-09-21 | 2022-09-21 | Motion estimation method integrating deep learning characteristic optical flow and binocular vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115482257A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117705064A (en) * | 2023-12-15 | 2024-03-15 | 河南理工大学 | Vehicle running state judging method based on visual assistance in urban canyon |
CN117705064B (en) * | 2023-12-15 | 2024-09-17 | 河南理工大学 | Vehicle running state judging method based on visual assistance in urban canyon |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||