CN115482257A - Motion estimation method integrating deep learning characteristic optical flow and binocular vision - Google Patents

Motion estimation method integrating deep learning characteristic optical flow and binocular vision

Info

Publication number
CN115482257A
Authority
CN
China
Prior art keywords
optical flow
image
deep learning
pixel
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211149943.1A
Other languages
Chinese (zh)
Inventor
关乐
张天琦
王鑫阳
王珍
张志新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN202211149943.1A
Publication of CN115482257A
Legal status: Pending (current)

Classifications

    • G06T 7/246: Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06T 5/40: Image enhancement or restoration using histogram techniques
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06T 2207/10004: Image acquisition modality; Still image; Photographic image
    • G06T 2207/10012: Stereo images
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a motion estimation method that fuses deep-learning feature optical flow with binocular vision, comprising the following steps: performing contrast-limited adaptive histogram equalization preprocessing on a driving image dataset; constructing a deep-learning-based optical flow feature extraction model and training it to recognize moving targets; ranging with a binocular camera to obtain the position of the target object; and obtaining the motion speed of the vehicle body. Compared with traditional optical-flow speed measurement, the method, based on deep-learning optical flow and the binocular imaging principle, estimates the motion parameters of carrier displacement and speed directly from video data. It addresses the excessive sensitivity and unstable estimates of traditional optical flow estimation when the vehicle is driven at night, i.e. in low-light environments, and thus further improves reliability. It also avoids the accumulated error of traditional inertial sensors as well as the poor anti-interference capability and low update frequency of GPS-based positioning and speed measurement.

Description

Motion estimation method integrating deep learning characteristic optical flow and binocular vision
Technical Field
The invention relates to the technical field of unmanned driving, in particular to a motion estimation method integrating deep learning characteristic optical flow and binocular vision.
Background
Optical flow is the apparent visual motion between an observer and the observed scene, and carries motion information such as the surfaces and edges of objects in that scene. It can be regarded as the projection of three-dimensional motion onto a two-dimensional plane. Because optical flow contains rich motion and three-dimensional structure information, and offers high robustness, good real-time performance, low cost and no error accumulation, it can be applied to fields such as unmanned driving.
The existing mainstream speed-measurement approaches rely mainly on onboard inertial sensors, GPS positioning, or a hybrid of the two. Inertial sensors have excellent real-time performance but accumulate error over time; GPS positioning and speed measurement is accurate but has a low update rate and poor real-time performance, and is easily disturbed by signal interference. When a vehicle must drive for a long time without a signal, speed can instead be measured with the help of ubiquitous optical flow information.
With the continuous development of computer technology, learning with artificial neural networks (deep learning) can further improve processing accuracy and speed and extract richer information from the data. Unsupervised algorithms do not need ground-truth optical flow images as training samples and train the network directly on real scenes; in addition, they typically replace the end-to-end error loss of supervised learning with brightness-constancy and smoothness terms. At present, artificial neural networks can estimate the optical flow of a moving object quickly and accurately, but optical flow in low-light environments is still unstable. Most existing visual odometers are monocular visual odometers, visual-inertial odometers and the like, and all suffer from low estimation accuracy, high requirements on ambient illumination, and poor stability.
Disclosure of Invention
The invention aims to provide a motion estimation method that fuses deep-learning feature optical flow with binocular vision, which solves the weak optical flow estimation capability and low accuracy of traditional algorithms in low-light environments, avoids the accumulated error and poor real-time performance of traditional motion estimation methods, and provides a new estimation approach for unmanned driving.
In order to achieve the above object, the present application provides a motion estimation method fusing deep learning feature optical flow and binocular vision, comprising:
performing contrast-limited adaptive histogram equalization preprocessing on a driving image dataset;
constructing an optical flow feature extraction model based on deep learning and training it to recognize moving targets;
ranging with a binocular camera to obtain the position of the target object; and
obtaining the motion speed of the vehicle body.
Further, the driving image dataset is subjected to contrast-limited adaptive histogram equalization preprocessing, specifically: the original driving image is scaled to a set resolution, contrast-limited adaptive histogram equalization is applied, and the histogram is clipped at a preset value to limit the amplification intensity, giving the neighborhood cumulative distribution function mapping:
h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )
wherein cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving-image pixels, and G_i is the number of gray levels.
Further, an optical flow feature extraction model based on deep learning is constructed and trained to recognize the moving target, specifically:
designing local smoothness assumptions according to the optical flow characteristics to obtain the optical flow equation:
∂I/∂x·dx + ∂I/∂y·dy + ∂I/∂t·dt = 0, i.e. I(x, y, t) = I(x + Δx, y + Δy, t + Δt),
wherein x is the pixel abscissa, y is the pixel ordinate and t is time; dx, dy and dt are the differentials of x, y and t; I is the optical flow image information; ∂ is the partial differential operator; and Δx, Δy, Δt are the changes in x, y and t;
constructing two weight-sharing CNN layers to extract the features of the driving images;
computing the inner products of the feature pairs of the two driving images: the features f_1 ∈ R^(H×W×D) and f_2 ∈ R^(H×W×D) represent the driving images I_1 and I_2 respectively, and the visual similarity is obtained as the pairwise inner product of the feature vectors, expressed as:
C(f_1, f_2)_(ijkl) = Σ_d f_1(i, j, d)·f_2(k, l, d),
wherein C(f_1, f_2) ∈ R^(H×W×H×W); ij and kl are the positions of the optical flow point in the first and second frames respectively; d is the image channel index with range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels;
constructing a pyramid to perform a pooling operation on the four-dimensional vector features;
obtaining the four-dimensional vector features of the high-resolution driving image: because of the data cost between pyramid levels, the optical flow correspondence point between the two frames is recorded as x' = (u + f_1(u), v + f_2(v)), where u is the pixel abscissa, v is the pixel ordinate, f_1 is the optical flow feature of the first frame image and f_2 is the optical flow feature of the second frame image; the neighborhood grid is
N(x')_r = { x' + dx | dx ∈ Z², ||dx||_1 ≤ r },
m being the number of pyramid levels; then, by looking up
N(x'/2^k)_r
the position corresponding to the optical flow is found on each level, wherein k may be any real number (non-integer positions are evaluated by bilinear interpolation); according to this correspondence, the four-dimensional vector feature of the high-resolution driving image is expressed as C^m(f_1, f_2)(i, j, p, q),
wherein m is the m-th level of the pyramid, and p and q are the row and column of the optical flow point in the pixel matrix of the m-th level;
the CNN layer iteratively updates the driving image data: given the current optical flow state f_k, each iteration generates a residual optical flow relative to the output of the previous iteration, i.e. an update value Δf, so the next optical flow prediction is f_{k+1} = f_k + Δf; the update is
Z_t = σ(W_z·[H_{t−1}, X_t])
R_t = σ(W_r·[H_{t−1}, X_t])
H_cand = tanh(W·[R_t ⊙ H_{t−1}, X_t])
H_t = (1 − Z_t) ⊙ H_{t−1} + Z_t ⊙ H_cand
wherein R_t is the reset gate, Z_t is the update gate, σ is the activation function, H_t is the amount of information retained from the hidden state of the previous stage, H_{t−1} is the hidden layer, X_t is the optical flow input value, W_r and W_z are weight information matrices, and W is the candidate-state weight matrix;
after the four-dimensional vector features of the high-resolution driving image (the motion-enhanced moving object image) are obtained, the pixel information of the optical flow is tracked at the original resolution of the pyramid to obtain the moving target's activity area, in which the minimum image displacement is v = [v_x, v_y]^T; the matching error ε(v) over the neighborhood of each point is
ε(v) = Σ_{x = p_x − w_x}^{p_x + w_x} Σ_{y = p_y − w_y}^{p_y + w_y} ( A(x, y) − B(x + v_x, y + v_y) )²,
wherein v_x and v_y are the horizontal and vertical displacements at the top pyramid level; p_x is the abscissa of the optical flow point and w_x the abscissa neighborhood range; p_y is the ordinate of the optical flow point and w_y the ordinate neighborhood range; A(x, y) is the first-frame optical flow feature and B(x, y) is the second-frame optical flow feature;
carrying out recognition training on the moving target object in the activity area;
a supervised algorithm is selected for the model, with the loss function set as
L = Σ_{i=1}^{N} γ^(N−i)·|| f_gt − f_i ||_1,
i.e. the L1 norm between each iteration result and the true value, where N is the number of iterations and γ = 0.8; f_gt is the ground-truth optical flow feature, f_i is the optical flow feature of the i-th iteration; Δx_gt and Δy_gt are the horizontal and vertical displacements of the ground-truth flow, and Δx_i and Δy_i are the horizontal and vertical displacement coordinates of the estimated flow.
Further, the CNN comprises six residual layers in total, two at 1/2 resolution, two at 1/4 resolution and two at 1/8 resolution, with the number of channels increasing each time the resolution between residual layers is halved; when extracting features, two consecutive frames are input, giving a mapping R^(H×W×3) → R^(H×W×D), where H and W are the image resolution and D is the number of channels.
Further, a pyramid is constructed to perform the pooling operation on the four-dimensional vector features, specifically:
three levels of similarity pyramids of the driving image are constructed, with kernels 1, 2 and 4 from level one to level three, and the last two dimensions of the four-dimensional vector features are pooled; the driving image pyramid is expressed as
I^L(x, y) = 1/4·I^(L−1)(2x, 2y) + 1/8·[ I^(L−1)(2x−1, 2y) + I^(L−1)(2x+1, 2y) + I^(L−1)(2x, 2y−1) + I^(L−1)(2x, 2y+1) ] + 1/16·[ I^(L−1)(2x−1, 2y−1) + I^(L−1)(2x+1, 2y−1) + I^(L−1)(2x−1, 2y+1) + I^(L−1)(2x+1, 2y+1) ],
wherein L is the pyramid level, I^L is the image of level L, and x, y are the pixel coordinates of the optical flow point.
Furthermore, ranging is performed with the binocular camera to obtain the position of the target, specifically:
obtaining the distortion description from the parameters and distortion coefficients of the left and right cameras;
applying a projective transformation to the left and right images shot of the same scene at the same time, i.e. computing a rectification mapping table and remapping with it;
finding corresponding points in the left and right images by semi-global matching (SGM) in stereo matching to obtain the disparity, disparity = u_l − u_r, where u_l and u_r are the column coordinates of the corresponding target point in the left and right images, generating a disparity map, which is then filtered with a Guided Filter;
calculating the depth of the target:
depth = f·b / (d − (c_xl − c_xr)),
where f is the focal length, b is the baseline length, d is the disparity, and c_xl, c_xr are the column coordinates of the two cameras' principal points;
constructing a 3D space from the set of disparity maps to obtain the three-dimensional coordinates of the pixel points:
[X Y Z W]^T = Q·[g h disparity(g, h) 1]^T
3DImage(g, h) = (X/W, Y/W, Z/W)
wherein X, Y, Z and W are the four-dimensional homogeneous coordinates obtained after the matrix transformation; g, h are the position of each pixel; Q is the perspective transformation matrix; and 3DImage(g, h) gives the (x, y, z) coordinates in the viewing coordinate system.
Further, the distortion disparity(g, h) is described by the standard radial, tangential and thin-prism lens distortion model,
wherein s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 are the radial distortion coefficients, p_1, p_2 are the tangential distortion coefficients, and r is the distortion radius.
Furthermore, the two images shot of the same scene at the same time are projectively transformed, specifically:
the two image planes are made parallel to the baseline, and the same target point lies on the same horizontal line in the left and right images, i.e. coplanar rows are aligned;
a point P in space is projected through the left and right camera lenses C_1, C_2 by the projection matrix equations
Z_c1·[u_1, v_1, 1]^T = M^(1)·[X, Y, Z, 1]^T
Z_c2·[u_2, v_2, 1]^T = M^(2)·[X, Y, Z, 1]^T
wherein (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of x_1 and x_2 in the respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; and m_ij is the element in row i and column j of the projection matrix M;
when the point is extended to a line element, let c_1 and c_2 be the images in the left and right cameras of the same straight line S in space; the parametric equations of S in the space coordinate system are written and substituted into the projection matrices, giving the mapping equations of the left and right cameras in the projection plane, from which the rectification mapping is obtained.
further, the vehicle body movement speed is obtained, specifically:
Figure BDA0003856610570000076
wherein d is k 、d k-1 And the Z-axis value is the three-dimensional coordinate value of the pixel point of the current frame and the pixel point of the previous frame.
Compared with the prior art, the technical scheme adopted by the invention has the following advantages: the method is based on deep-learning optical flow and the binocular imaging principle and can estimate the motion parameters of vehicle-body displacement and speed from video data, so motion estimation remains possible when the vehicle is driven at night, further improving reliability. At the same time, the method avoids the poor anti-interference capability and low update frequency of traditional approaches that rely on inertial sensors or on GPS positioning and speed measurement; it is highly portable, runs in real time, is robust, accumulates no error and is low-cost, providing a new motion estimation approach for unmanned driving.
Drawings
FIG. 1 is a hardware schematic involved in the embodiment;
FIG. 2 is a schematic block diagram of a motion estimation method;
FIG. 3 is a flow chart of a motion estimation method;
FIG. 4 is a view of the structure of an optical flow model;
FIG. 5 is a diagram of an optical flow model implementation;
FIG. 6 is a schematic diagram of tangential distortion formation during calibration and correction;
fig. 7 is a schematic diagram of coplanar row alignment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the application, i.e., the embodiments described are only a subset of, and not all embodiments of the application.
Example 1
The method determines road-environment features from neural-network-enhanced optical flow information, identifies stationary objects on both sides of the road, determines the distance between the carrier and the target object, and then obtains the speed from the change of that distance between two frames and the elapsed time. The camera and the video velocimeter are physically independent of each other, as shown in fig. 1. The schematic block diagram is shown in fig. 2; the motion estimation system implementing the method includes a video acquisition unit, a frame-by-frame recording unit, a storage unit, a deep-learning intelligent motion enhancement unit, an optical-flow-feature target identification unit and a motion analysis unit, wherein the motion enhancement unit and the target identification unit are packaged together as a moving-target calibration unit. The images shot by the binocular camera are passed through the video acquisition unit and a bus to the storage unit, implementing the shooting and recording functions; the computer reads the collected image data, identifies roadside objects through the frame-by-frame recording unit, the moving-target calibration unit and the motion analysis unit in turn, constructs three-dimensional coordinate information, judges the motion state of the vehicle carrying the device from the relative motion speed derived from those coordinates, and finally returns the information to the storage unit for motion analysis.
As shown in fig. 3, a method for motion estimation by fusing deep learning feature optical flow and binocular vision includes:
S1, performing contrast-limited adaptive histogram equalization preprocessing on the driving image dataset;
Specifically, driving videos under different speeds and road conditions are collected, and a driving image dataset X = {X[n]_1, X[n]_2, X[n]_3, …, X[n]_fo} is established, where n is the video sequence number and fo is the number of video frames.
The original driving image is scaled to a set resolution (for example 1088×436) and contrast-limited adaptive histogram equalization (CLAHE) is applied; the histogram is then clipped at a preset value to limit the amplification intensity, giving the neighborhood cumulative distribution function mapping:
h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )
wherein cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving-image pixels, and G_i is the number of gray levels (256 in this embodiment). This step improves image contrast and enhances color intensity in low-light environments, which facilitates the subsequent analysis.
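As an illustration of this preprocessing step, a minimal Python sketch using OpenCV's CLAHE implementation is given below; the clip limit, tile grid size and the choice of equalizing only the luminance channel are assumptions for the example, not values taken from this description.
```python
import cv2

def preprocess_frame(bgr_frame, size=(1088, 436), clip_limit=2.0, tile_grid=(8, 8)):
    """Scale a driving image and apply contrast-limited adaptive histogram equalization."""
    resized = cv2.resize(bgr_frame, size, interpolation=cv2.INTER_AREA)
    # Equalize only the luminance channel so that colours are not distorted.
    lab = cv2.cvtColor(resized, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)  # equalization with clipped (limited) amplification
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```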
S2, constructing an optical flow model, and performing recognition training on a moving target object; as shown in fig. 4:
s2.1 design local smoothing assumptions from optical flow characteristics: assume one: the brightness is constant, and the brightness of the image pixel is not changed when the frame moves; suppose two: small motion, the motion between pixel frames is small, namely the relative motion of the image along with the change of time is small; suppose three: spatially coherent, adjacent points on the same surface in the same scene have similar motion. The optical flow equation is derived from the above assumptions:
∂I/∂x·dx + ∂I/∂y·dy + ∂I/∂t·dt = 0, i.e. I(x, y, t) = I(x + Δx, y + Δy, t + Δt).
S2.2, two CNN layers sharing weights are constructed to extract the features of the driving images; the CNN layer architecture is shown in FIG. 5.
Specifically, the CNN comprises six residual layers in total, two at 1/2 resolution, two at 1/4 resolution and two at 1/8 resolution, with the number of channels increasing each time the resolution is halved. When extracting features, two consecutive frames are input, giving a mapping R^(H×W×3) → R^(H×W×D), where D is the number of channels and may be set to 256.
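A minimal PyTorch sketch of such a weight-shared encoder is shown below for illustration: six residual blocks at 1/2, 1/4 and 1/8 resolution with the channel count growing at each halving and a 1×1 projection to D = 256 channels. The intermediate channel widths (64, 96, 128) and the 1/8-resolution output are assumptions for the example rather than values fixed by this description.
```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1)
        self.conv2 = nn.Conv2d(c_out, c_out, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.skip = (nn.Conv2d(c_in, c_out, 1, stride=stride)
                     if stride != 1 or c_in != c_out else nn.Identity())

    def forward(self, x):
        y = self.relu(self.conv1(x))
        y = self.conv2(y)
        return self.relu(y + self.skip(x))

class FeatureEncoder(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.layers = nn.Sequential(
            ResidualBlock(3, 64, stride=2),   ResidualBlock(64, 64),     # 1/2 resolution
            ResidualBlock(64, 96, stride=2),  ResidualBlock(96, 96),     # 1/4 resolution
            ResidualBlock(96, 128, stride=2), ResidualBlock(128, 128),   # 1/8 resolution
        )
        self.head = nn.Conv2d(128, dim, 1)  # project to D channels

    def forward(self, img):                  # img: (B, 3, H, W)
        return self.head(self.layers(img))   # features: (B, D, H/8, W/8)

# Weight sharing: the same encoder instance is applied to both frames.
encoder = FeatureEncoder()
frame1, frame2 = torch.rand(1, 3, 384, 512), torch.rand(1, 3, 384, 512)
f1, f2 = encoder(frame1), encoder(frame2)
```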
S2.3, carrying out inner product calculation on the feature pairs of the two driving images;
in particular, characteristic f 1 ∈R H×W×D And characteristic f 2 ∈R H×W×D Respectively representing driving images I 1 And I 2 The visual similarity is obtained by the pairwise inner product of the feature vectors, and is expressed as:
C(f_1, f_2)_(ijkl) = Σ_d f_1(i, j, d)·f_2(k, l, d),
wherein C(f_1, f_2) ∈ R^(H×W×H×W); ij and kl are the positions of the optical flow point in the first and second frames respectively; d is the image channel index with range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels.
According to the optical flow equation and the brightness-constancy assumption of step S2.1, the inter-image motion state is obtained by least squares: the flow [u, v]^T is the least-squares solution of the system
I_x(q_i)·u + I_y(q_i)·v = −I_t(q_i), i = 1, 2, 3, …, n,
wherein I_x(q_i), I_y(q_i), I_t(q_i) are the image derivatives at the pixels q_i in the neighborhood of the optical flow position.
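The pairwise inner product above can be computed in one einsum call; the sketch below (PyTorch, with an assumed 1/sqrt(D) normalization that is not part of this description) illustrates the construction of the four-dimensional correlation volume from the two feature maps.
```python
import torch

def correlation_volume(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """f1, f2: feature maps of shape (B, D, H, W); returns C of shape (B, H, W, H, W)."""
    b, d, h, w = f1.shape
    # C[b, i, j, k, l] = sum_d f1[b, d, i, j] * f2[b, d, k, l]
    corr = torch.einsum('bdij,bdkl->bijkl', f1, f2)
    return corr / d ** 0.5  # assumed normalization by sqrt(D)
```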
s2.4, constructing a pyramid to perform pooling operation on the four-dimensional vector characteristics;
specifically, three layers of relevant pyramids are constructed, 1,2 and 4 kernels are respectively arranged from one layer to three layers, and pooling processing is performed on two dimensions behind the four-dimensional vector obtained in the step 2.3, so that motion change is more obvious. The road image pyramid is represented as follows:
I^L(x, y) = 1/4·I^(L−1)(2x, 2y) + 1/8·[ I^(L−1)(2x−1, 2y) + I^(L−1)(2x+1, 2y) + I^(L−1)(2x, 2y−1) + I^(L−1)(2x, 2y+1) ] + 1/16·[ I^(L−1)(2x−1, 2y−1) + I^(L−1)(2x+1, 2y−1) + I^(L−1)(2x−1, 2y+1) + I^(L−1)(2x+1, 2y+1) ],
wherein L is the pyramid level and I^L is the image of level L.
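For illustration, the pooling of the last two dimensions of the four-dimensional volume with kernels 1, 2 and 4 can be sketched as follows; treating the first two dimensions as a batch axis is an implementation choice assumed for the example.
```python
import torch
import torch.nn.functional as F

def correlation_pyramid(corr: torch.Tensor) -> list:
    """corr: (B, H, W, H, W) correlation volume; returns three pooled levels."""
    b, h1, w1, h2, w2 = corr.shape
    flat = corr.reshape(b * h1 * w1, 1, h2, w2)   # pool only the last two dimensions
    levels = []
    for kernel in (1, 2, 4):
        pooled = flat if kernel == 1 else F.avg_pool2d(flat, kernel, stride=kernel)
        levels.append(pooled.reshape(b, h1, w1, *pooled.shape[-2:]))
    return levels
```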
S2.5, acquiring four-dimensional vector characteristics of the high-resolution driving image;
specifically, note the optical flow corresponding point x' = (u + f) between two frames 1 (u),v+f 2 (v) U is the pixel abscissa, v is the pixel ordinate, f) 1 As an optical flow feature of the first frame image, f 2 For the optical flow feature of the second frame image, the neighborhood grid is
N(x')_r = { x' + dx | dx ∈ Z², ||dx||_1 ≤ r },
m being the number of pyramid levels; then, by looking up
N(x'/2^k)_r
the position corresponding to the optical flow is found on each level, wherein k may be any real number (non-integer positions are evaluated by bilinear interpolation); according to this correspondence, the four-dimensional vector feature of the high-resolution driving image is expressed as C^m(f_1, f_2)(i, j, p, q),
wherein m is the m-th level of the pyramid, and p and q are the row and column of the optical flow point in the pixel matrix of the m-th level.
S2.6, the CNN layer iteratively updates the driving image data;
Specifically, the CNN layer not only performs optical flow estimation to generate image features but also carries out the data iteration. An update iteration is a sequence of classical gated recurrent units over the previous data, trained through the shared-weight convolution layers. The default initial value is 0; given the current optical flow state f_k, each iteration generates a residual optical flow relative to the output of the previous iteration, i.e. an update value Δf, so the next optical flow prediction is f_{k+1} = f_k + Δf. The gates of the gated recurrent unit are:
Z_t = σ(W_z·[H_{t−1}, X_t])
R_t = σ(W_r·[H_{t−1}, X_t])
H_cand = tanh(W·[R_t ⊙ H_{t−1}, X_t])
H_t = (1 − Z_t) ⊙ H_{t−1} + Z_t ⊙ H_cand
wherein R_t is the reset gate, Z_t is the update gate, σ is the activation function, H_t is the amount of information retained from the hidden state of the previous stage, H_{t−1} is the hidden layer, X_t is the optical flow input value, W_r and W_z are weight information matrices, and W is the candidate-state weight matrix.
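A minimal convolutional form of this gated recurrent update is sketched below; the kernel size, hidden width and input width are assumptions for the example. In use, a small convolutional head maps the hidden state H_t to the residual flow Δf that is added to f_k.
```python
import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    def __init__(self, hidden_dim=128, input_dim=192):
        super().__init__()
        self.convz = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convr = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)
        self.convq = nn.Conv2d(hidden_dim + input_dim, hidden_dim, 3, padding=1)

    def forward(self, h, x):
        hx = torch.cat([h, x], dim=1)
        z = torch.sigmoid(self.convz(hx))                         # update gate Z_t
        r = torch.sigmoid(self.convr(hx))                         # reset gate R_t
        q = torch.tanh(self.convq(torch.cat([r * h, x], dim=1)))  # candidate state
        return (1 - z) * h + z * q                                # new hidden state H_t
```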
S2.7, after four-dimensional vector characteristics of a high-resolution driving image (moving object image after motion enhancement) are obtained, pixel information of an optical flow is obtained under the original resolution tracking of a pyramid, and then a motion target activity area is obtained, wherein the minimum displacement of the image in the area is v = [ v ] = [ v = x ,v y ] T The matching error and minimum value ε (v) in each point neighborhood range is:
ε(v) = Σ_{x = p_x − w_x}^{p_x + w_x} Σ_{y = p_y − w_y}^{p_y + w_y} ( A(x, y) − B(x + v_x, y + v_y) )²
wherein v = [v_x, v_y]^T, with v_x and v_y the horizontal and vertical displacements at the top pyramid level; p_x is the abscissa of the optical flow point and w_x the abscissa neighborhood range; p_y is the ordinate of the optical flow point and w_y the ordinate neighborhood range; A(x, y) is the first-frame optical flow feature and B(x, y) is the second-frame optical flow feature;
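This neighborhood matching error is what a pyramidal Lucas-Kanade tracker minimizes, so the tracking step can be illustrated with OpenCV; the window size, pyramid depth and termination criteria below are assumed values.
```python
import cv2
import numpy as np

def track_points(prev_gray, next_gray, points):
    """points: (N, 1, 2) float32 pixel coordinates in the previous frame."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, points, None,
        winSize=(21, 21),   # (2*w_x + 1, 2*w_y + 1) search window
        maxLevel=3,         # number of pyramid levels
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.reshape(-1) == 1
    return points[good], next_pts[good]   # matched point pairs A -> B
```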
s2.8, performing recognition training on the moving target object in the moving area; ( Such as: common roadside facilities such as street trees, signboards, buildings and the like )
And selecting a supervision algorithm for the model, wherein the loss function is set as follows:
L = Σ_{i=1}^{N} γ^(N−i)·|| f_gt − f_i ||_1
i.e. the L1 norm between each iteration result f_i and the ground-truth optical flow f_gt, where N is the number of iterations and γ = 0.8.
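A sketch of this exponentially weighted L1 loss over the N iterative flow estimates is given below; averaging the per-pixel error is an assumed detail.
```python
import torch

def sequence_loss(flow_preds, flow_gt, gamma=0.8):
    """flow_preds: list of N tensors (B, 2, H, W); flow_gt: (B, 2, H, W)."""
    n = len(flow_preds)
    loss = 0.0
    for i, f_i in enumerate(flow_preds):
        weight = gamma ** (n - i - 1)            # later iterations are weighted more
        loss = loss + weight * (flow_gt - f_i).abs().mean()
    return loss
```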
S3, ranging through a binocular camera to obtain the position of a target object;
s3.1, obtaining distortion description according to parameters and distortion coefficients of the left camera and the right camera;
in particular, barrel distortion is caused by the configuration of the lens, and tangential distortion is caused by the inability of the lens and the imaging plane to be perfectly parallel during the camera assembly process. Therefore, distortion vectors are obtained through the internal parameters, the external parameters and the distortion coefficients of the left camera and the right camera; for correction in a subsequent step. The distortion disparity (g, h) is described as:
the standard radial, tangential and thin-prism lens distortion model,
wherein s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 are the radial distortion coefficients, p_1, p_2 are the tangential distortion coefficients, and r is the distortion radius.
S3.2, projecting and changing the left image and the right image which are shot in the same scene at the same time, namely acquiring a correction mapping table, and remapping by using the correction mapping table;
specifically, due to parallax, images obtained by the left lens and the right lens of the binocular camera cannot be completely overlapped, so that two views which are shot in the same scene at the same time are subjected to projection change: the two image planes are parallel to the baseline and the same target point is on the same horizontal line in both the left and right images, i.e., co-planar rows are aligned, as shown in fig. 7.
A point P in space is projected through the left and right camera lenses C_1, C_2 by the projection matrix equations
Z_c1·[u_1, v_1, 1]^T = M^(1)·[X, Y, Z, 1]^T
Z_c2·[u_2, v_2, 1]^T = M^(2)·[X, Y, Z, 1]^T
wherein (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of x_1 and x_2 in the respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; and m_ij is the element in row i and column j of the projection matrix M.
When the point is extended to a line element, let c_1 and c_2 be the images in the left and right cameras of the same straight line S in space; the parametric equations of S in the space coordinate system are written and substituted into the projection matrices, giving the mapping equations of the left and right cameras in the projection plane, from which the rectification (correction) mapping table is obtained.
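Steps S3.1 and S3.2 can be illustrated with OpenCV's stereo rectification routines; the calibration values (K1, D1, K2, D2, R, T) are assumed to come from a prior stereo calibration and are not defined in this description.
```python
import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Remap the left/right views so that corresponding rows are aligned."""
    size = (img_l.shape[1], img_l.shape[0])                      # (width, height)
    R1, R2, P1, P2, Q, _roi1, _roi2 = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, map1x, map1y, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, map2x, map2y, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q        # Q is the perspective transformation matrix used in S3.5
```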
s3.3 finding corresponding points in the left image and the right image through the SGM to obtain the visual difference disparity = u l -u r Wherein u is l 、u r And respectively generating a disparity map for the column coordinates of the target corresponding point in the left image and the right image, and then performing filtering processing by using a Guided Filter to reduce noise.
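For illustration, semi-global matching followed by guided filtering can be sketched as below; the disparity range, block size and filter parameters are assumed values, and cv2.ximgproc requires the opencv-contrib-python package.
```python
import cv2
import numpy as np

def compute_disparity(rect_l_gray, rect_r_gray, num_disp=128, block=5):
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0, numDisparities=num_disp, blockSize=block,
        P1=8 * block * block, P2=32 * block * block,
        uniquenessRatio=10, speckleWindowSize=100, speckleRange=2)
    disp = sgbm.compute(rect_l_gray, rect_r_gray).astype(np.float32) / 16.0  # SGBM output is fixed-point
    disp = cv2.ximgproc.guidedFilter(guide=rect_l_gray, src=disp, radius=9, eps=1e-2)
    return disp   # disparity = u_l - u_r, in pixels
```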
S3.4 calculating the depth of the target object
depth = f·b / (d − (c_xl − c_xr))
where f is the focal length, b is the baseline length, d is the disparity, and c_xl, c_xr are the column coordinates of the two cameras' principal points;
s3.5, constructing a 3D space according to the group of parallax images to obtain three-dimensional coordinates of pixel points:
[X Y Z W]^T = Q·[g h disparity(g, h) 1]^T
3DImage(g, h) = (X/W, Y/W, Z/W)
wherein X, Y, Z and W are the four-dimensional homogeneous coordinates obtained after the matrix transformation; g, h are the position of each pixel; Q is the perspective transformation matrix; and 3DImage(g, h) gives the (x, y, z) coordinates in the viewing coordinate system. A three-dimensional matrix is finally obtained in which the X, Y and Z coordinates are recorded (the coordinate system is established with the left camera as reference).
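The reprojection with the Q matrix corresponds to OpenCV's reprojectImageTo3D; a short sketch follows, with the validity mask being an assumed convenience.
```python
import cv2
import numpy as np

def reproject(disp, Q):
    points_3d = cv2.reprojectImageTo3D(disp, Q)   # (H, W, 3): X, Y, Z per pixel, left-camera frame
    depth_z = points_3d[:, :, 2]                  # Z channel used for the speed estimate in S4
    valid = np.isfinite(depth_z) & (disp > 0)
    return points_3d, depth_z, valid
```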
S4, acquiring the movement speed of the vehicle body;
specifically, the Z-axis coordinate value obtained in step 3.5 is extracted from two adjacent frames, and the vehicle body movement speed can be obtained as the camera shooting mode is 25fps and the relative movement principle
v = |d_k − d_{k−1}| × 25
wherein d_k and d_{k−1} are the Z-axis values of the three-dimensional coordinates of the corresponding pixel point in the current frame and the previous frame, and 1/25 s is the inter-frame interval at 25 fps.
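A one-line sketch of this speed estimate, using the 25 fps frame rate stated above, is given for completeness; depth values are assumed to be in metres.
```python
FPS = 25.0

def body_speed(z_prev, z_curr, fps=FPS):
    """z_prev, z_curr: Z-axis depth of the same tracked static point in consecutive frames."""
    return abs(z_prev - z_curr) * fps   # metres per second when depth is in metres
```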
The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (9)

1. A motion estimation method fusing deep learning feature optical flow and binocular vision, characterized by comprising:
performing contrast-limited adaptive histogram equalization preprocessing on a driving image dataset;
constructing an optical flow feature extraction model based on deep learning and training it to recognize moving targets;
ranging with a binocular camera to obtain the position of the target object; and
obtaining the motion speed of the vehicle body.
2. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 1, wherein the driving image dataset is subjected to contrast-limited adaptive histogram equalization preprocessing, specifically: the original driving image is scaled to a set resolution, contrast-limited adaptive histogram equalization is applied, and the histogram is clipped at a preset value to limit the amplification intensity, giving the neighborhood cumulative distribution function mapping:
h(v) = round( (cdf(v) − cdf_min) / (M×N − cdf_min) × (G_i − 1) )
wherein cdf_min is the minimum value of the cumulative distribution function of the pixel values, M×N is the number of driving-image pixels, and G_i is the number of gray levels.
3. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 1, wherein an optical flow feature extraction model based on deep learning is constructed and trained to recognize the moving target, specifically:
designing local smoothness assumptions according to the optical flow characteristics to obtain the optical flow equation:
∂I/∂x·dx + ∂I/∂y·dy + ∂I/∂t·dt = 0, i.e. I(x, y, t) = I(x + Δx, y + Δy, t + Δt),
wherein x is the pixel abscissa, y is the pixel ordinate and t is time; dx, dy and dt are the differentials of x, y and t; I is the optical flow image information; ∂ is the partial differential operator; and Δx, Δy, Δt are the changes in x, y and t;
constructing two weight-sharing CNN layers to extract the features of the driving images;
computing the inner products of the feature pairs of the two driving images: the features f_1 ∈ R^(H×W×D) and f_2 ∈ R^(H×W×D) represent the driving images I_1 and I_2 respectively, and the visual similarity is obtained as the pairwise inner product of the feature vectors, expressed as:
C(f_1, f_2)_(ijkl) = Σ_d f_1(i, j, d)·f_2(k, l, d),
wherein C(f_1, f_2) ∈ R^(H×W×H×W); ij and kl are the positions of the optical flow point in the first and second frames respectively; d is the image channel index with range [0, D−1]; C is the four-dimensional vector feature; H and W are the image resolution and D is the number of channels;
constructing a pyramid to perform a pooling operation on the four-dimensional vector features;
obtaining the four-dimensional vector features of the high-resolution driving image: because of the data cost between pyramid levels, the optical flow correspondence point between the two frames is recorded as x' = (u + f_1(u), v + f_2(v)), where u is the pixel abscissa, v is the pixel ordinate, f_1 is the optical flow feature of the first frame image and f_2 is the optical flow feature of the second frame image; the neighborhood grid is
N(x')_r = { x' + dx | dx ∈ Z², ||dx||_1 ≤ r },
m being the number of pyramid levels; then, by looking up
N(x'/2^k)_r
the position corresponding to the optical flow is found on each level, wherein k may be any real number (non-integer positions are evaluated by bilinear interpolation); according to this correspondence, the four-dimensional vector feature of the high-resolution driving image is expressed as C^m(f_1, f_2)(i, j, p, q),
wherein m is the m-th level of the pyramid, and p and q are the row and column of the optical flow point in the pixel matrix of the m-th level;
the CNN layer iteratively updates the driving image data: given the current optical flow state f_k, each iteration generates a residual optical flow relative to the output of the previous iteration, i.e. an update value Δf, so the next optical flow prediction is f_{k+1} = f_k + Δf; the update is
Z_t = σ(W_z·[H_{t−1}, X_t])
R_t = σ(W_r·[H_{t−1}, X_t])
H_cand = tanh(W·[R_t ⊙ H_{t−1}, X_t])
H_t = (1 − Z_t) ⊙ H_{t−1} + Z_t ⊙ H_cand
wherein R_t is the reset gate, Z_t is the update gate, σ is the activation function, H_t is the amount of information retained from the hidden state of the previous stage, H_{t−1} is the hidden layer, X_t is the optical flow input value, W_r and W_z are weight information matrices, and W is the candidate-state weight matrix;
after the four-dimensional vector features of the high-resolution driving image are obtained, the pixel information of the optical flow is tracked at the original resolution of the pyramid to obtain the moving target's activity area, in which the minimum image displacement is v = [v_x, v_y]^T; the matching error ε(v) over the neighborhood of each point is
ε(v) = Σ_{x = p_x − w_x}^{p_x + w_x} Σ_{y = p_y − w_y}^{p_y + w_y} ( A(x, y) − B(x + v_x, y + v_y) )²,
wherein v_x and v_y are the horizontal and vertical displacements at the top pyramid level; p_x is the abscissa of the optical flow point and w_x the abscissa neighborhood range; p_y is the ordinate of the optical flow point and w_y the ordinate neighborhood range; A(x, y) is the first-frame optical flow feature and B(x, y) is the second-frame optical flow feature;
carrying out recognition training on the moving target object in the activity area;
a supervised algorithm is selected for the model, with the loss function set as
L = Σ_{i=1}^{N} γ^(N−i)·|| f_gt − f_i ||_1,
i.e. the L1 norm between each iteration result and the true value, where N is the number of iterations and γ = 0.8; f_gt is the ground-truth optical flow feature, f_i is the optical flow feature of the i-th iteration; Δx_gt and Δy_gt are the horizontal and vertical displacements of the ground-truth flow, and Δx_i and Δy_i are the horizontal and vertical displacement coordinates of the estimated flow.
4. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 3, wherein the CNN comprises six residual layers in total, two at 1/2 resolution, two at 1/4 resolution and two at 1/8 resolution, with the number of channels increasing each time the resolution between residual layers is halved; when extracting features, two consecutive frames are input, giving a mapping R^(H×W×3) → R^(H×W×D), wherein H and W are the image resolution and D is the number of channels.
5. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 3, wherein a pyramid is constructed to perform the pooling operation on the four-dimensional vector features, specifically:
three levels of similarity pyramids of the driving image are constructed, with kernels 1, 2 and 4 from level one to level three, and the last two dimensions of the four-dimensional vector features are pooled; the driving image pyramid is expressed as
I^L(x, y) = 1/4·I^(L−1)(2x, 2y) + 1/8·[ I^(L−1)(2x−1, 2y) + I^(L−1)(2x+1, 2y) + I^(L−1)(2x, 2y−1) + I^(L−1)(2x, 2y+1) ] + 1/16·[ I^(L−1)(2x−1, 2y−1) + I^(L−1)(2x+1, 2y−1) + I^(L−1)(2x−1, 2y+1) + I^(L−1)(2x+1, 2y+1) ],
wherein L is the pyramid level, I^L is the image of level L, and x, y are the pixel coordinates of the optical flow point.
6. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 1, wherein ranging is performed with the binocular camera to obtain the target position, specifically:
obtaining the distortion description from the parameters and distortion coefficients of the left and right cameras;
applying a projective transformation to the left and right images shot of the same scene at the same time, i.e. computing a rectification mapping table and remapping with it;
finding corresponding points in the left and right images by semi-global matching (SGM) in stereo matching to obtain the disparity, disparity = u_l − u_r, wherein u_l and u_r are the column coordinates of the corresponding target point in the left and right images, generating a disparity map, which is then filtered with a Guided Filter;
calculating the depth of the target:
depth = f·b / (d − (c_xl − c_xr)),
wherein f is the focal length, b is the baseline length, d is the disparity, and c_xl, c_xr are the column coordinates of the two cameras' principal points;
constructing a 3D space from the set of disparity maps to obtain the three-dimensional coordinates of the pixel points:
[X Y Z W]^T = Q·[g h disparity(g, h) 1]^T
3DImage(g, h) = (X/W, Y/W, Z/W)
wherein X, Y, Z and W are the four-dimensional homogeneous coordinates obtained after the matrix transformation; g, h are the position of each pixel; Q is the perspective transformation matrix; and 3DImage(g, h) gives the (x, y, z) coordinates in the viewing coordinate system.
7. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 6, wherein the distortion disparity(g, h) is described by the standard radial, tangential and thin-prism lens distortion model, wherein s_1, s_2, s_3 are the thin-prism distortion coefficients, k_1, k_2, k_3, k_4, k_5, k_6 are the radial distortion coefficients, p_1, p_2 are the tangential distortion coefficients, and r is the distortion radius.
8. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 6, wherein the two images shot of the same scene at the same time are projectively transformed, specifically:
the two image planes are made parallel to the baseline, and the same target point lies on the same horizontal line in the left and right images, i.e. coplanar rows are aligned;
a point P in space is projected through the left and right camera lenses C_1, C_2 by the projection matrix equations
Z_c1·[u_1, v_1, 1]^T = M^(1)·[X, Y, Z, 1]^T
Z_c2·[u_2, v_2, 1]^T = M^(2)·[X, Y, Z, 1]^T
wherein (u_1, v_1, 1) and (u_2, v_2, 1) are the homogeneous coordinates of x_1 and x_2 in the respective images; (X, Y, Z, 1) is the homogeneous coordinate of the point P in world coordinates; and m_ij is the element in row i and column j of the projection matrix M;
when the point is extended to a line element, let c_1 and c_2 be the images in the left and right cameras of the same straight line S in space; the parametric equations of S in the space coordinate system are written and substituted into the projection matrices, giving the mapping equations of the left and right cameras in the projection plane.
9. The motion estimation method fusing deep learning feature optical flow and binocular vision according to claim 1, wherein the vehicle body motion speed is obtained, specifically:
v = |d_k − d_{k−1}| / Δt,
wherein d_k and d_{k−1} are the Z-axis values of the three-dimensional coordinates of the corresponding pixel point in the current frame and the previous frame, and Δt is the time between the two frames.
CN202211149943.1A 2022-09-21 2022-09-21 Motion estimation method integrating deep learning characteristic optical flow and binocular vision Pending CN115482257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211149943.1A CN115482257A (en) 2022-09-21 2022-09-21 Motion estimation method integrating deep learning characteristic optical flow and binocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211149943.1A CN115482257A (en) 2022-09-21 2022-09-21 Motion estimation method integrating deep learning characteristic optical flow and binocular vision

Publications (1)

Publication Number Publication Date
CN115482257A true CN115482257A (en) 2022-12-16

Family

ID=84423546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211149943.1A Pending CN115482257A (en) 2022-09-21 2022-09-21 Motion estimation method integrating deep learning characteristic optical flow and binocular vision

Country Status (1)

Country Link
CN (1) CN115482257A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117705064A (en) * 2023-12-15 2024-03-15 河南理工大学 Vehicle running state judging method based on visual assistance in urban canyon
CN117705064B (en) * 2023-12-15 2024-09-17 河南理工大学 Vehicle running state judging method based on visual assistance in urban canyon

Similar Documents

Publication Publication Date Title
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN110675418B (en) Target track optimization method based on DS evidence theory
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
Vaudrey et al. Differences between stereo and motion behaviour on synthetic and real-world stereo sequences
CN110688905B (en) Three-dimensional object detection and tracking method based on key frame
CN113506318B (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN111797684B (en) Binocular vision ranging method for moving vehicle
Qian et al. Robust visual-lidar simultaneous localization and mapping system for UAV
CN112465021A (en) Pose track estimation method based on image frame interpolation method
CN110706253B (en) Target tracking method, system and device based on apparent feature and depth feature
CN115482257A (en) Motion estimation method integrating deep learning characteristic optical flow and binocular vision
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint
CN112432653A (en) Monocular vision inertial odometer method based on point-line characteristics
CN109740405B (en) Method for detecting front window difference information of non-aligned similar vehicles
CN113706599B (en) Binocular depth estimation method based on pseudo label fusion
CN114708321B (en) Semantic-based camera pose estimation method and system
CN116129318A (en) Unsupervised monocular three-dimensional target detection method based on video sequence and pre-training instance segmentation
CN116151320A (en) Visual odometer method and device for resisting dynamic target interference
CN115482282A (en) Dynamic SLAM method with multi-target tracking capability in automatic driving scene
CN115496788A (en) Deep completion method using airspace propagation post-processing module
CN111833384B (en) Method and device for rapidly registering visible light and infrared images
CN103236053A (en) MOF (motion of focus) method for detecting moving objects below mobile platform
Huang et al. Single target tracking in high-resolution satellite videos: a comprehensive review
CN115994934B (en) Data time alignment method and device and domain controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination