CN113947686A - Method and system for dynamically adjusting feature point extraction threshold of image

Info

Publication number: CN113947686A
Authority: CN (China)
Prior art keywords: feature, feature point, point extraction, image, tracking
Legal status: Pending
Application number: CN202110901212.7A
Other languages: Chinese (zh)
Inventor: 朱凯赢 (Zhu Kaiying)
Current Assignee: Shanghai Yogo Robot Co Ltd
Original Assignee: Shanghai Yogo Robot Co Ltd
Application filed by Shanghai Yogo Robot Co Ltd
Priority to CN202110901212.7A
Publication of CN113947686A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform


Abstract

The embodiment of the invention provides a method and a system for dynamically adjusting the feature point extraction threshold of an image. Feature extraction and feature tracking are treated as two interdependent processes: the quality of the feature points extracted in the feature extraction stage influences the feature tracking result, and the feature tracking result is fed back to the feature extraction stage to adjust the key parameters required for feature extraction. Because the feature point extraction threshold is dynamically adjusted according to the current environment, a traditional feature extraction method can effectively extract feature points even in extreme illumination environments.

Description

Method and system for dynamically adjusting feature point extraction threshold of image
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a method and a system for dynamically adjusting a feature point extraction threshold of an image.
Background
Direct-method visual SLAM (Simultaneous Localization and Mapping) registers images using a large amount of pixel information; more information generally provides more constraints and helps the solution. However, when the image data contain heavy noise or distortion and the images are not properly preprocessed (e.g., filtering, deblurring, and distortion removal), much of the exploited information may be invalid or even wrong, which undermines the robustness of the system and leads to erroneous camera pose estimates in direct visual SLAM. Moreover, direct-method visual SLAM is designed on the assumption that the photometry of consecutive images is consistent, which places high demands on the data; in real life the environment is often changeable and complex, so a large amount of preprocessing is required, such as adjusting the photometry of the images against a reference image. Besides direct-method visual SLAM, another very popular visual SLAM framework is feature-point-method visual SLAM. It extracts features from an image with a specially designed feature extraction method, matches the features extracted from two frames, and finally uses the obtained feature correspondences together with multi-view geometric constraints to construct an overdetermined equation system and solve for the camera pose. Compared with the direct method, the feature-point method discards a large amount of redundant image information and selects only a small amount of the most representative information for image registration, which reduces the interference of invalid image information to some extent. Meanwhile, the features extracted from an image describe its content to some degree, so the features of every image can be stored and reused to match a frame far away on the time axis at a future moment, which is often used for visual relocalization. Feature-point-method visual SLAM is widely applied because it is less affected by the environment and naturally supports visual relocalization, so it is also necessary to discuss how extreme lighting conditions affect its performance and how it can operate more robustly under such conditions.
Feature-point-method visual SLAM uses hand-designed features to extract pixels that are locally distinctive on an image. Common image feature points include the FAST feature, Harris feature, ORB feature, SIFT feature, and SURF feature. Among them, the FAST feature and the ORB feature compare each pixel with a surrounding circle of pixels; if there are enough consecutive points whose difference is large, the pixel is regarded as an image feature point. The difference is judged by the pixel-value difference between two pixels: when this difference exceeds a set threshold, the two pixels are considered to differ significantly. The Harris, SIFT, and SURF features convolve the image with a pre-designed template to obtain a response image, find local maximum response points on the response image, and finally take the corresponding pixels on the image as the features. When selecting local maximum response points, a minimum response threshold is usually set to prevent extracting wrong feature points in weak-texture regions. After feature extraction, the correspondences between the features extracted from the two frames must be found. ORB, SIFT, and SURF features additionally compute a feature descriptor for each feature point in the extraction stage. The ORB feature computes its descriptor with the BRIEF algorithm: pairs of pixels are repeatedly selected according to a certain rule in a fixed region around the feature point and compared, the comparison results are recorded in sequence as 0 or 1, and the resulting binary sequence serves as the feature's descriptor. The SIFT and SURF features divide a fixed region around the feature point into several sub-regions, compute the gradient direction of each pixel in each sub-region to build a gradient-direction histogram, and then concatenate the histograms of all sub-regions as the feature descriptor. With descriptors, features can be matched directly: the simplest method compares the descriptor of a feature on one image against the descriptor of every feature on the other image with a suitable distance metric and takes the feature with the smallest distance as the match. The Harris feature and the FAST feature do not require additionally computing a descriptor per feature; for such features, the optical flow method can perform feature tracking according to the pixel value of the feature point and the gradient distribution around it, so as to find the corresponding feature on the other image.
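For illustration, a minimal sketch (assuming OpenCV's Python bindings; the parameter names follow OpenCV, not the patent) showing that each of these common detectors exposes the kind of extraction threshold discussed above:

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# FAST: 'threshold' is the minimum pixel-value difference between the center
# pixel and the surrounding circle for a point to count as a corner.
fast = cv2.FastFeatureDetector_create(threshold=20)
fast_kps = fast.detect(img)

# ORB: internally uses FAST, so it exposes the same kind of threshold,
# and computes BRIEF-style binary descriptors.
orb = cv2.ORB_create(nfeatures=500, fastThreshold=20)
orb_kps, orb_desc = orb.detectAndCompute(img, None)

# Harris corners via goodFeaturesToTrack: 'qualityLevel' acts as a relative
# minimum-response threshold on the Harris response map.
harris_pts = cv2.goodFeaturesToTrack(
    img, maxCorners=500, qualityLevel=0.01, minDistance=10,
    useHarrisDetector=True)
```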
In general, some false matches exist among the feature correspondences found. To filter them out, a RANSAC (Random Sample Consensus) algorithm is used to find the group of correspondences that accounts for the majority of all feature correspondences. Specifically, a model is first chosen that describes the transformation between corresponding pixels in the two frames. Several correspondences are sampled from all feature correspondences to solve for the model, and the number of remaining correspondences that support this model is counted. This process is repeated, and finally the model satisfied by the most feature correspondences is selected; all correspondences supporting it are considered correct matches, and the rest, which do not support it, are considered mismatches and discarded. With the feature correspondences between the two images, an overdetermined equation system can be constructed from the constraints provided by multi-view geometry. A camera photographing the same object from different positions can be represented by fig. 1. In the figure, O_l and O_r denote the imaging centers of the two cameras, and point P denotes the photographed object in three-dimensional space. The line connecting the three-dimensional point P and the camera center O_l is the projection ray along which the target point is projected onto the camera during imaging; it maps to a point p_l on the image. Similarly, the line connecting P and the camera center O_r maps to a point p_r on the other image. Denoting by P_l and P_r the coordinates of point P in the two camera coordinate systems, a rotation and translation in space can always be found that converts P_l into P_r, formulated as:
P_r = R · (P_l - T)   (1)

P_l - T = R^{-1} · P_r = R^T · P_r   (2)
where R and T represent the rotation and translation in space, respectively. Since P_l, P_r, and T lie in the same plane, the following constraint holds:

(P_l - T)^T · (T × P_l) = 0   (3)

that is:

(R^T · P_r)^T · (T × P_l) = 0   (4)

The cross product with T can be written as multiplication by the skew-symmetric matrix S, i.e. T × P_l = S · P_l, where:

S = [  0    -T_z    T_y ]
    [ T_z     0    -T_x ]
    [-T_y    T_x     0  ]   (5)

Substituting equation (5) into equation (4) yields:

P_r^T · R · S · P_l = 0   (6)
Regarding the product R · S in this formula as a matrix E, this matrix is called the essential matrix; it can be seen that E contains all the information about the relative motion in three-dimensional space of the cameras corresponding to the two frames. The pinhole imaging model is generally used as the imaging model of the camera:
p_l = K · P_l   (7)

where the camera intrinsic matrix K is:

K = [ f_x   0    c_x ]
    [  0   f_y   c_y ]
    [  0    0     1  ]   (8)

Here f_x and f_y are the focal lengths of the camera in the horizontal and vertical directions, respectively, and (c_x, c_y) is the principal point. Substituting equation (7) into equation (6) yields:

p_r^T · K^{-T} · E · K^{-1} · p_l = 0   (9)
The matrix K^{-T} · E · K^{-1} in equation (9) is called the fundamental matrix and is denoted F. Equation (9) establishes the relationship between the different projections of the same three-dimensional point on the two images and the camera motion, and can therefore be used in the RANSAC algorithm to find correct feature correspondences. Because the fundamental matrix lacks the scale information of the translation, it has only 8 degrees of freedom, so at least 8 pairs of corresponding pixels must be found between the two frames when solving equation (9) for it. Usually, to ensure solution accuracy, more than 8 pairs of corresponding pixels are selected to construct and solve an overdetermined system. After the fundamental matrix is solved, the essential matrix is calculated by the following formula:

E = K^T · F · K   (10)
After the essential matrix is obtained, the relative motion in three-dimensional space of the cameras corresponding to the two frames, i.e., a three-dimensional rotation and a three-dimensional translation, is recovered by singular value decomposition; an unknown scale relationship still exists between the recovered translation and the true translation. Once the camera's motion in three-dimensional space is known, every pixel on the image can be back-projected into three-dimensional space to obtain the three-dimensional coordinates corresponding to that point. Because of possible noise in the images, the feature point correspondences between the two images are not perfectly accurate, which may reduce the accuracy of the estimated relative camera motion. To further reduce the estimation error, after the initial motion estimate of the camera is computed, all matched feature points on the previous frame are projected onto the other frame by three-dimensional projection, and a projection error function is constructed:
e = Σ_i || p_i^r - π_c( R · ( π_c^{-1}(p_i^l) - T ) ) ||²   (11)

where the projection function follows the pinhole model of equation (7):

π_c(P) = (1/Z) · K · P,   P = (X, Y, Z)^T   (12)

Here p_i^l and p_i^r range over all matched feature points on the two images, π_c(·) denotes projecting three-dimensional coordinates in the world coordinate system onto the image coordinate system, and π_c^{-1}(·) denotes projecting two-dimensional coordinates on the image coordinate system into the world coordinate system. The camera's trajectory estimate in three-dimensional space is then fine-tuned with a nonlinear optimization method so that the projection errors of all feature points are minimized. In the direct method, a point in three-dimensional space is projected onto the two consecutive frames, and the camera motion is found by minimizing the difference between the pixel values at the two projected positions; the feature-point method has already found the pixel correspondences between the two frames in the feature matching stage, so the camera motion is found by minimizing the distance between the coordinates of the projected point on the image and the coordinates of its corresponding point on the other image.
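As a concrete illustration of this pipeline (an OpenCV-based sketch, not the patent's own code; function and variable names are assumptions), the fundamental matrix, essential matrix, and relative pose can be recovered from matched points as follows:

```python
import cv2
import numpy as np

# pts_prev, pts_curr: Nx2 float32 arrays of matched feature coordinates;
# K: 3x3 camera intrinsic matrix (all assumed given).
def recover_relative_pose(pts_prev, pts_curr, K):
    # RANSAC on the fundamental matrix rejects mismatches, cf. equation (9).
    F, inlier_mask = cv2.findFundamentalMat(
        pts_prev, pts_curr, cv2.FM_RANSAC, ransacReprojThreshold=1.0)
    E = K.T @ F @ K                      # equation (10)
    inliers = inlier_mask.ravel() == 1
    # SVD-based decomposition of E into rotation R and unit-scale translation T.
    _, R, T, _ = cv2.recoverPose(E, pts_prev[inliers], pts_curr[inliers], K)
    return R, T, inliers
```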
It can be seen that in the camera motion optimization stage the feature-point method is not affected by photometric inconsistency between consecutive images, i.e., it is insensitive to illumination change at this stage. However, this does not mean that illumination change has no effect on feature-point-method SLAM at all. The construction of the projection error in feature-point-method SLAM actually rests on another assumption, different from that of direct-method SLAM: that a sufficient number of feature points are matched between the two frames and that the matching results are accurate. This requires that the hand-designed feature extraction method can extract enough feature points under different illumination conditions and that the feature matching process is robust to illumination changes. However, most image feature points were not designed with extreme lighting conditions in mind, such as dim environments, overexposed images, or constantly changing illumination. In such extreme scenes these feature extraction methods cannot extract enough feature points, and specific optimizations targeting extreme scenes can hardly accommodate the variation across different scenes and may reduce the quality of feature point extraction in non-extreme scenes.
Disclosure of Invention
The embodiment of the invention provides a method and a system for dynamically adjusting the feature point extraction threshold of an image, aiming to solve the problems in the prior art that extreme illumination conditions, such as dim environments, overexposed images, and constantly changing illumination, are not taken into account during image feature point extraction, so that sufficient feature points cannot be extracted, the variation across different scenes can hardly be accommodated, and the quality of feature point extraction in non-extreme scenes is reduced.
In a first aspect, an embodiment of the present invention provides a method for dynamically adjusting a feature point extraction threshold of an image, including:
step S1, setting initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase ratio, and a feature point extraction threshold decrease ratio;

step S2, determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;

step S3, determining the average moving distance of the feature points, the feature tracking success rate, and the feature point tracking number based on the feature point extraction results and the feature point tracking results of the current frame and the previous frame;

if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate satisfies a first preset condition, increasing the feature point extraction threshold by the feature point extraction threshold increase ratio;

and if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate and the feature point tracking number satisfy a second preset condition, decreasing the feature point extraction threshold by the feature point extraction threshold decrease ratio.
Preferably, in step S1, the initial image feature point extraction parameters further include a maximum feature point tracking number, a minimum feature point tracking number, and a feature point tracking success rate threshold.
Preferably, the first preset condition is: the feature tracking success rate is not greater than the feature point tracking success rate threshold; or the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is greater than the maximum feature point tracking number.

The second preset condition is: the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is not greater than the minimum feature point tracking number; or the feature tracking success rate is greater than the feature point tracking success rate threshold, the feature point tracking number is not greater than the maximum feature point tracking number, and the distribution of the feature points is not uniform.
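Expressed as boolean predicates (a hypothetical sketch; variable names are not from the patent, and `p` bundles the thresholds named in step S1), the two preset conditions read:

```python
def first_preset_condition(success_rate, n_tracked, p):
    # Raise the threshold: tracking quality too low, or too many points tracked.
    return (success_rate <= p.success_rate_threshold) or \
           (success_rate > p.success_rate_threshold and n_tracked > p.max_tracked)

def second_preset_condition(success_rate, n_tracked, uniform, p):
    # Lower the threshold: too few points tracked, or points unevenly distributed.
    ok = success_rate > p.success_rate_threshold
    return (ok and n_tracked <= p.min_tracked) or \
           (ok and n_tracked <= p.max_tracked and not uniform)
```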
Preferably, in step S3, judging from the average moving distance of the feature points whether the camera has moved includes:

if the average moving distance of the feature points of the current frame image relative to the previous frame image is less than a preset movement threshold, judging that the corresponding camera has not moved;

and if the average moving distance of the feature points of the current frame image relative to the previous frame image is not less than the preset movement threshold, judging that the corresponding camera has moved.
Preferably, in step S3, if it is determined that the camera is not moving based on the average moving distance of the feature points, the feature extraction threshold is not adjusted at the current time.
Preferably, in step S3, when judging whether the feature point distribution condition in the second preset condition is satisfied, the image is equally divided into a plurality of regions of the same size and the number of successfully tracked feature points in each region is counted; if the number of feature points in any region is less than a preset feature point number threshold, the distribution of the currently effectively tracked feature points is judged to be non-uniform.
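A small sketch of this uniformity check (the embodiment below uses a 2 × 2 grid with a per-region minimum of 5; names are assumptions):

```python
import numpy as np

def distribution_is_uniform(points, img_w, img_h, rows=2, cols=2, min_per_region=5):
    # points: Nx2 array of successfully tracked feature coordinates (x, y).
    counts = np.zeros((rows, cols), dtype=int)
    for x, y in points:
        r = min(int(y * rows / img_h), rows - 1)
        c = min(int(x * cols / img_w), cols - 1)
        counts[r, c] += 1
    return bool((counts >= min_per_region).all())
```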
Preferably, in step S3, the feature tracking success rate is:
s = n_t / n_f

where n_f denotes the number of feature points available before the current feature point tracking, including the feature points successfully tracked at the previous moment and the newly extracted feature points; n_t denotes the number of successfully tracked feature points at the current moment.
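A minimal sketch of this bookkeeping (names assumed, not from the patent):

```python
def tracking_success_rate(n_prev_tracked, n_newly_extracted, n_tracked_now):
    # n_f: points fed into the current tracking step; n_t: points that survived it.
    n_f = n_prev_tracked + n_newly_extracted
    n_t = n_tracked_now
    return n_t / n_f if n_f > 0 else 0.0
```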
In a second aspect, an embodiment of the present invention provides a system for dynamically adjusting a feature point extraction threshold of an image, including:
an initialization setting module, configured to set initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase ratio, and a feature point extraction threshold decrease ratio;

a feature extraction and tracking module, configured to determine the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;

a threshold dynamic adjustment module, configured to determine the average moving distance of the feature points, the feature tracking success rate, and the feature point tracking number based on the feature point extraction results and the feature point tracking results of the current frame and the previous frame;

wherein, if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate satisfies a first preset condition, the feature point extraction threshold is increased by the feature point extraction threshold increase ratio;

and if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate and the feature point tracking number satisfy a second preset condition, the feature point extraction threshold is decreased by the feature point extraction threshold decrease ratio.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for dynamically adjusting the feature point extraction threshold of an image according to the embodiment of the first aspect of the present invention when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for dynamically adjusting the feature point extraction threshold of an image according to an embodiment of the first aspect of the present invention.
According to the method and the system for dynamically adjusting the feature point extraction threshold of an image provided by the embodiment of the invention, feature extraction and feature tracking are treated as two interdependent processes: the quality of the feature points extracted in the feature extraction stage influences the feature tracking result, and the feature tracking result is fed back to the feature extraction stage to adjust the key parameters required for feature extraction. Because the feature point extraction threshold is dynamically adjusted according to the current environment, a traditional feature extraction method can effectively extract feature points even in extreme illumination environments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is an epipolar geometry schematic diagram of a camera photographing the same object from different positions according to an embodiment of the invention;
FIG. 2 is a block diagram of a method for dynamically adjusting feature point extraction thresholds of an image according to an embodiment of the present invention;
FIG. 3 is a flow chart of a dynamic threshold adjustment method according to an embodiment of the invention;
FIG. 4 is a logic diagram for adjusting feature point extraction thresholds according to an embodiment of the present invention;
FIG. 5 is an image in test data and a corresponding image taken with good lighting conditions according to an embodiment of the present invention;
FIG. 6 is a feature point extraction threshold map according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating feature point extraction from frame 110 to frame 115 according to an embodiment of the present invention;
FIG. 8 is a graph of the total number of feature points, the number of valid feature points, and the number of invalid feature points, according to an embodiment of the invention;
FIG. 9 is a diagram illustrating feature point tracking according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating the feature point distribution at a certain moment and the feature extraction threshold from the 800th frame to the 860th frame according to an embodiment of the present invention;
FIG. 11 is a graph illustrating the results of feature tracking according to an embodiment of the present invention;
fig. 12 is a diagram illustrating feature extraction thresholds, the number of all feature points, the number of invalid feature points, and the selected time from the 1300 th frame to the 1600 th frame according to an embodiment of the present invention;
FIG. 13 is a graph of the number of active feature points according to an embodiment of the invention;
fig. 14 is a schematic physical structure diagram according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
In the embodiment of the present application, the term "and/or" is only one kind of association relationship describing an associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The terms "first" and "second" in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a system, product or apparatus that comprises a list of elements or units is not limited to only those elements or units but may alternatively include other elements or units not expressly listed or inherent to such product or apparatus. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The existing commonly used image feature points, such as the Harris feature, FAST feature, ORB feature, SIFT feature, and SURF feature, did not take the influence of extreme lighting conditions on the image into account at design time. Meanwhile, it can be observed that no matter whether features are extracted directly from image pixel values or from the image's response map to a specific template, a threshold must be set manually to distinguish feature points from non-feature points. Current improvements to these feature extraction methods mainly focus on the computational efficiency of feature extraction and the accuracy of feature matching, while neglecting the ability to extract features under extreme lighting conditions and illumination changes.
Therefore, the embodiment of the invention provides a method and a system for dynamically adjusting the feature point extraction threshold of an image, so that the selection standard of the feature points changes in real time along with the scene, and sufficient, effective feature points can be extracted under any illumination condition and during any change of illumination conditions. The invention is described below with reference to various embodiments.
In conventional feature extraction methods, feature extraction is an independent module, and feature extraction and the subsequent feature matching are two separate processes. In this method, feature extraction and feature tracking are treated as two interdependent processes: the quality of the feature points extracted in the feature extraction stage influences the feature tracking result, and the feature tracking result is fed back to the feature extraction stage to adjust the key parameters required for feature extraction. A flowchart of the dynamic threshold adjustment method provided in an embodiment of the present invention is shown in fig. 2. The method mainly comprises 4 steps: parameter initialization, feature extraction, feature tracking, and threshold adjustment.
as shown in fig. 2 and fig. 3, a method for dynamically adjusting a feature point extraction threshold of an image according to an embodiment of the present invention is applicable to feature point extraction of an image, and includes:
step S1, setting initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase ratio, and a feature point extraction threshold decrease ratio;
specifically, the initial state of variables used in the system is set in the parameter initialization stage, and the initial state includes the maximum feature point tracking number, the minimum feature point tracking number, the feature point tracking success rate threshold, the feature point extraction threshold increasing magnification, the feature point extraction threshold decreasing magnification, and the pixel moving minimum distance. The feature point extraction threshold, i.e. the threshold required by the feature point extraction method to distinguish the features from each other, is set to an initial value according to the selected feature extraction method.
The whole dynamic threshold adjustment method is described below using the Harris feature detection method as an example. The Harris feature looks for pixels on the image whose pixel values change strongly in both directions. For a given image and a neighborhood window of fixed size, the sum of squared differences of each pixel before and after a translation of the window is calculated, i.e., the autocorrelation function of the pixel values in the neighborhood window, expressed as:

E(u, v) = Σ_{d∈Ω} w(d) · [ I(p + d + (u, v)) - I(p + d) ]²   (13)
where p denotes the coordinates of a pixel of the image, d ∈ Ω denotes the offset of each pixel in the neighborhood window relative to p, w(d) denotes the weight of the corresponding offset position in the neighborhood window, and I(p) denotes the pixel value at p. The weights in the neighborhood window can come from a mean-function template or a Gaussian-function template; compared with the mean template, the Gaussian template assigns larger weights to the central area than to the surrounding positions, so the position of the extracted feature can be located more accurately. A first-order Taylor expansion is then applied to the translated term:

I(p + d + (u, v)) ≈ I(p + d) + I_x(p + d)·u + I_y(p + d)·v   (14)
where I_x and I_y are the image gradient values in the horizontal and vertical directions, respectively, and u and v are the shift amounts in the horizontal and vertical directions. Substituting equation (14) into equation (13) gives:

E(u, v) ≈ Σ_{d∈Ω} w(d) · [ I_x·u + I_y·v ]² = [u  v] · M · [u  v]^T   (15)

where:

M = Σ_{d∈Ω} w(d) · [ I_x²       I_x·I_y ]
                   [ I_x·I_y    I_y²    ]   (16)
the two eigenvalues of the eigenvalue of the matrix M determine whether the current pixel is a Harris eigenvalue. When both the two characteristic values are larger, the pixel point is an effective characteristic angular point, when one characteristic value is larger and the other characteristic value is smaller, the pixel point is positioned on the edge of a certain object in the image, and when both the two characteristic values are smaller, the pixel point is positioned in a weak texture area on the image. In order to quickly judge whether the current pixel point is a qualified characteristic angular point according to the matrix M, the characteristic response is directly calculated by using the following formula:
R=det(M)-k(trace(M))2 (17)
where det (-) denotes the determinant of the matrix, trace (-) denotes the rank of the matrix, and k is a constant. When the characteristic response R is larger than a certain threshold value, the current pixel point is judged to be a Harris characteristic point. It can be seen that the characteristic response of the Harris characteristic has no clear value range, so when the threshold value of the extraction of the Harris characteristic point is set, the distribution of the pixel values on the current image needs to be determinedA reasonable initial threshold is calculated. Specifically, when the system is initialized, the characteristic response value of each pixel point of an input first frame image is calculated, the length and the height of sub image blocks are set according to the size of the input image, the image is uniformly divided into n × m sub image blocks, the local maximum characteristic response value is selected from each sub image block, the local maximum characteristic response values of the n × m image blocks are sequenced, and one characteristic response value R is selected according to a proportion of α ∈ (0,1)iSuch that of all local maximum eigenvalues, the values of α n m are less than Ri, and the other values of (1- α) n m are greater than RiAnd applying the characteristic response value RiAs an initial feature extraction threshold. The initial value selection method is applicable to methods for extracting features according to the magnitude of the response value of the features, such as Harris features. For the method of extracting features according to local pixel value differences or feature response value differences of Fast features, ORB features, SIFT features and SUFT features, the Harris features can be extracted by the aforementioned method when the first frame image is input, and since the feature extraction methods are all used for searching for corners on the image, the extracted Harris features can be regarded as reliable corners and simultaneously should be detected by other methods. For the SIFT feature and the SUFT feature, corresponding feature response values may be calculated at positions of the Harris feature points, differences between the feature response values of the points and surrounding feature response values may be calculated, and the difference values calculated at each point may be averaged to serve as an initial feature extraction threshold. For the FAST feature and the ORB feature, the difference between the pixel value at the position of these Harris feature points and the surrounding pixel values is directly calculated, and the average of all the calculated differences is taken as the initial feature extraction threshold. The maximum feature point tracking number is used to limit the number of feature points continuously tracked between each frame in the system, because when the distribution density of the feature points on the image exceeds a certain amount, a large number of feature points provide redundant information, the amount of extra information provided for the position of a subsequent solution camera is very small, and meanwhile, tracking a large number of feature points in each frame consumes a lot of time, which cannot meet the requirement of real-time performance of the systemTherefore, the maximum number of feature points for performing feature tracking needs to be set appropriately according to the size of the image and the available computation power. 
Conversely, if the number of feature points used for feature tracking is too small, the number of currently available matches is small, which provides too few constraints for the subsequent camera pose estimation and easily makes the estimate inaccurate, especially when mismatches exist among the limited matching results; therefore the minimum feature point tracking number also needs to be set. In the experiments of the embodiment of the present invention, the maximum and minimum feature point tracking numbers were set to 300 and 30, respectively. The method provided by the embodiment of the present invention dynamically adjusts the feature extraction threshold. To prevent the threshold from changing continuously while the environment does not change, which would happen if feature tracking kept running on two identical frames while the camera is stationary and the threshold kept being adjusted, a minimum pixel moving distance is set: when the feature point tracking result shows that the pixel movement is smaller than this minimum distance, neither the camera nor the scene is considered to have moved, and the feature extraction threshold is not adjusted at that time.
Step S2, determining a feature point extraction result and a feature point tracking result of a current frame and a previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
the extraction of the characteristics is carried out after the initialization of the parameters is finished, and because a reasonable initial value is selected for necessary parameters in the initialization stage, the condition that the characteristic points cannot be extracted and the condition that the characteristic points are extracted too densely can not occur in the characteristic extraction stage, and in addition, the situation that the extracted basic average values of the characteristics are distributed on an image and are densely concentrated in a certain area can not occur, so that the extraction of a sufficient number of characteristic points is ensured, and the uniform distribution of the characteristic points is also ensured. In the feature tracking stage, the feature tracking is carried out by using a pyramid optical flow method with a distortion model. For two frames of images, in the extremely small three-dimensional motion of the camera and the extremely short time, the pixel point on the previous frame of image can be considered to move upwards on the two-dimensional planeMoved a small distance and the pixel values did not change. Under such an assumption, the problem can be quantified as u ═ u for one pixel point on the previous frame imagexuy]After moving on the plane of the two-dimensional image, the point v ═ u on the image of the next frame can be obtainedx+dxuy+dy]Finding out that affine deformation exists between two captured images due to the motion of the camera, and further considering an imitation change model:
A = [ 1 + d_xx     d_xy   ]
    [   d_yx     1 + d_yy ]   (18)

where d_xx, d_xy, d_yx, d_yy are the four parameters of the affine deformation model of the image, so that v = A·u + d with d = [d_x, d_y]^T. Each pixel in a local image block can be considered to move on the image plane with the same two-dimensional motion, so the objective function for tracking a feature point by optical flow can be established as:

ε(d, A) = Σ_{x∈Ω} [ I(x) - J(A·x + d) ]²   (19)
where x denotes pixel coordinates, Ω denotes the pixels in the sub-image block, I is the previous frame, and J is the current frame. To estimate the movement of the feature point on the two-dimensional plane more accurately, an image pyramid is used to generate new images of different sizes from the input image, from large to small; each pyramid layer is obtained by downsampling the layer below it, where downsampling uses linear interpolation to produce a new image with half the length and height. With the image pyramid, feature tracking starts from the top-level image, and the tracking result is passed to the next layer for further tracking; by refining the pixel motion layer by layer, the motion of the pixel on the two-dimensional image plane is finally estimated from coarse to fine. For each pyramid layer, the objective equation (19) is solved by an optimization method, differentiating the objective function with respect to all independent variables:
∂ε/∂Δx = -2 Σ_{x∈Ω} [ I(x) - J(A·x + d) ] · ∂J(A·x + d)/∂Δx   (20)

In the ideal case, when the variables are optimized to the optimal point, this gradient vanishes:

∂ε/∂Δx |_optimum = [0  0  0  0  0  0]   (21)

Expanding equation (20) gives:

∂ε/∂Δx = -2 Σ_{x∈Ω} [ I(x) - J(A·x + d) ] · D*   (22)

where the Jacobian of J with respect to the independent variables is:

D* = [ J_x   J_y   J_x·x   J_x·y   J_y·x   J_y·y ]^T   (23)

Linearization is performed by a first-order Taylor expansion of J(A·x + d) in equation (22):

J(A·x + d) ≈ J(x) + D*^T · Δx   (24)

where:

Δx = [ d_x   d_y   d_xx   d_xy   d_yx   d_yy ]^T   (25)

Substituting equation (24) into equation (22) gives:

∂ε/∂Δx ≈ -2 Σ_{x∈Ω} [ I(x) - J(x) - D*^T·Δx ] · D*   (26)
in the Jacobian matrix D*All matrix elements in (b) are derivatives of image J in the horizontal direction and the vertical direction at a certain pixel point, that is, a horizontal direction gradient value and a vertical direction gradient value of image J at a certain pixel point. Since there is usually a very small variation between image I and image J, the gradient values on image J can be approximately replaced by the gradient values of image I at the corresponding locations. This has the advantage that a smaller number of calculations can be performed in subsequent calculations. The gradient value calculation formula for image I is:
Figure BDA0003199827000000135
Figure BDA0003199827000000136
in the formula (28), x and y represent the abscissa and ordinate of the image, respectively. In order to prevent noise from affecting gradient calculation, a Sobel operator is used for calculating the gradient when the gradient is calculated, and the mathematical form of the Sobel operator is as follows:
Figure BDA0003199827000000141
the value calculated by the Sobel operator needs to be normalized and then can be used as the gradient value of a certain pixel point. To solve for a set of affine transformation parameters and translation parameters such that the value of the objective function is minimal, let equations 5-26 equal zero:
Δx = G^{-1} · b   (30)

where:

G = Σ_{x∈Ω} D* · D*^T   (31)

b = Σ_{x∈Ω} [ I(x) - J(x) ] · D*   (32)
the calculation formula (30) obtains the adjustment value of the independent variable, and adds the adjustment value to the initial value of the independent variable to obtain a new value. And continuously and iteratively calculating the adjustment value of the independent variable, and continuously updating the value of the independent variable until convergence. In equation (31) it can be seen that the coefficient matrix Σ to the left of the equationx∈ΩD*D*TThe values in (1) are only related to the image I, which is fixed and invariant, and thus on the image IThe gradient values are fixed, which results in the coefficient matrix Σ to the left of the equationx∈ΩD*D*TThe calculation is only needed once at the beginning of the iteration, and a large amount of calculation time is saved. Note that since the arguments are continually fine-tuned during the iteration, and j (x) always represents the corresponding value under the current argument, i (x) -j (x) in the matrix to the right of the equation varies with each iteration, requiring recalculation in each iteration. Solving the equation set to obtain the updating quantity of each independent variable, and updating the independent variables:
A ← A + [ d_xx   d_xy ]
        [ d_yx   d_yy ]   (33)

d ← d + [ d_x   d_y ]^T   (34)
and continuously iteratively solving the variable quantity of the independent variable until a certain preset convergence condition is reached, for example, the updated quantity of the translation is small enough, or the iteration number reaches a set upper limit, and the like. Due to the use of image pyramids, the solution of the independent variables will be performed layer by layer starting from the highest layer pyramid. When affine transformation parameters and translation parameters obtained by calculation under a certain pyramid are required to be transferred to the next pyramid for calculation, as affine transformation is not influenced by the scale, and translation is influenced by the scale, the affine transformation matrix A does not need to be changed, and the translation matrix d needs to be multiplied by the scale, namely, the scale is doubled. And after the pyramid at the bottom layer is also calculated, obtaining a group of affine transformation parameters and translation parameters which need to be solved finally. With the set of parameters, a new position of a feature point on the next frame image can be found for a certain feature point on the image, i.e. a matched feature point is found. Note that for each feature point on an image, feature tracking needs to be performed by the above optical flow method alone, and therefore the number of feature points that need to be tracked directly affects the amount of computation. The tracking feature points of the optical flow method always find a local optimal matching point on another frame image for any feature point, even if the tracked feature point is a meaningless pixel point, such as a point on a white wall, and the matching result is usually inaccurate, so in order to eliminate the inaccurate feature matching result, the RANSAC algorithm based on the two-point method is used for carrying out rapid matching result screening.
The main use of the RANSAC algorithm is to estimate the parameters of a predetermined mathematical model from a set of data containing outliers by an iterative method; it can also be used to find the outliers in the given data. It assumes that most of the given data are inliers conforming to a fixed mathematical model, estimates the model parameters by repeatedly sampling the data randomly, and verifies the correctness of the model with the remaining data, so a model consistent with more data is more likely to be correct. Usually, when the RANSAC algorithm is used to filter image feature mismatches, the model used is equation (10), i.e., a fundamental matrix is solved in each iteration. Since the fundamental matrix has 9 unknowns and one degree of freedom is removed by scale, at least 8 pairs of matched feature points are needed in each solving step, which places certain requirements on the number of feature points and the amount of computation. In this method, to increase the computation speed of the RANSAC algorithm and reduce the dependence on the number of feature points, a RANSAC algorithm based on two pairs of matched feature points is used; the embodiment of the invention additionally requires the motion information of a gyroscope. When a gyroscope exists in the system, as soon as the system moves, its rotations Δφ, Δθ, Δψ around the three spatial dimensions between two frames can be obtained, and the corresponding rotation matrices can be denoted R_x(Δφ), R_y(Δθ), R_z(Δψ). The rotation matrix of the corresponding camera in three-dimensional space between the two frames can then be expressed as:
R = R_x(Δφ) · R_y(Δθ) · R_z(Δψ)   (35)
the camera direction corresponding to one frame image can be switched to the camera direction corresponding to the other frame image according to equation (35). So that there is only translation of the corresponding camera relative motion between the two images. The parameters of the camera are generally known, so the essential matrix E can be chosen when choosing the mathematical model required by the RANSAC algorithm. The essence matrix consists of translations and rotations of the camera in three-dimensional space:
E = [T]_× · R   (36)

where:

[T]_× = [  0    -T_z    T_y ]
        [ T_z     0    -T_x ]
        [-T_y    T_x     0  ]   (37)
where T_x, T_y, T_z denote the translation of the camera along the three directions of three-dimensional space. After the camera orientation corresponding to one frame image is rotated to that of the other frame image, the relative motion of the camera in three-dimensional space becomes:
T = ρ · [ sin(β)·cos(α)   -sin(β)·sin(α)   cos(β) ]^T   (38)

R = I_3   (39)

where ρ denotes the modulus of the translation T in three-dimensional space, α and β denote two angles, and I_3 denotes the 3 × 3 identity matrix. Substituting equation (38) into equation (36) reduces the essential matrix to:

E = ρ · [      0             -cos(β)          -sin(α)·sin(β) ]
        [    cos(β)              0            -cos(α)·sin(β) ]
        [ sin(α)·sin(β)    cos(α)·sin(β)            0        ]   (40)
It can be seen that only two unknowns, α and β, remain in the essential matrix, so a feature correspondence between the two images needs to satisfy:

x_1·(y_0·cos(β) + z_0·sin(α)·sin(β)) - y_1·(x_0·cos(β) - z_0·cos(α)·sin(β)) - z_1·(y_0·cos(α)·sin(β) + x_0·sin(α)·sin(β)) = 0   (41)

where (x_0, y_0, z_0) and (x_1, y_1, z_1) are the coordinates of a pair of matched points on the two images.
At least two pairs of matched feature points are needed to solve for α and β, which yields the relative motion and direction of the camera in three-dimensional space. Using this information, the other feature points are projected from one frame onto the other frame by three-dimensional projection, the back-projection error is calculated, and a reasonable threshold distinguishes whether each pair of matched feature points supports the currently estimated camera motion parameters. When using this RANSAC method based on two pairs of matched feature points, two pairs are selected randomly each time. To ensure that the two pairs provide enough information for solving the equation, a distance threshold is set so that they are sufficiently far apart; if the distance between the two currently selected pairs is less than the set distance threshold, they are re-selected. Two pairs of matched points are repeatedly selected, the camera motion parameters are solved, and the number of other matched feature points consistent with the solved parameters is verified. When the upper limit of iterations is reached, the set of camera motion parameters satisfied by the most matched feature points is chosen. Meanwhile, a constraint that the system motion should be continuous is additionally imposed: because image data are usually acquired at a high frame rate in a SLAM system, the motion direction can be considered to change little within a very short time. The currently estimated motion direction of the camera in space is compared with the direction estimated at the previous moment; if the difference is greater than a preset threshold, the current set of camera motion parameters is skipped and the next set is tried. Finally, the feature tracking results that do not support the selected set of camera motion parameters are filtered out.
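A compact sketch of this two-point RANSAC loop (the helper solve_alpha_beta, the scoring tolerance, and all names are illustrative assumptions, not the patent's code):

```python
import numpy as np

def essential_from_angles(alpha, beta):
    # Equation (40) with the unobservable scale rho set to 1.
    sb, cb, sa, ca = np.sin(beta), np.cos(beta), np.sin(alpha), np.cos(alpha)
    return np.array([[0.0, -cb, -sa * sb],
                     [cb, 0.0, -ca * sb],
                     [sa * sb, ca * sb, 0.0]])

def two_point_ransac(p0, p1, iters=100, min_pair_dist=20.0, inlier_tol=1e-3):
    # p0, p1: Nx3 arrays of matched points (gyro-derotated, homogeneous coords).
    best_inliers = np.zeros(len(p0), dtype=bool)
    for _ in range(iters):
        i, j = np.random.choice(len(p0), 2, replace=False)
        if np.linalg.norm(p0[i, :2] - p0[j, :2]) < min_pair_dist:
            continue  # the two pairs are too close to constrain equation (41)
        alpha, beta = solve_alpha_beta(p0[[i, j]], p1[[i, j]])  # assumed solver
        E = essential_from_angles(alpha, beta)
        residual = np.abs(np.einsum('ni,ij,nj->n', p1, E, p0))  # |p1^T E p0|
        inliers = residual < inlier_tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```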
Step S3, determining the average moving distance of the feature points, the feature tracking success rate, and the feature point tracking number based on the feature point extraction results and the feature point tracking results of the current frame and the previous frame;

if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate satisfies the first preset condition, increasing the feature point extraction threshold by the feature point extraction threshold increase ratio;

and if it is judged from the average moving distance of the feature points that the camera has moved, and the feature tracking success rate and the feature point tracking number satisfy the second preset condition, decreasing the feature point extraction threshold by the feature point extraction threshold decrease ratio.
As shown in fig. 4, the first preset condition is: the feature tracking success rate is not greater than the feature point tracking success rate threshold; or the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is greater than the maximum feature point tracking number;
the second preset condition is: the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is less than the minimum feature point tracking number; or the feature tracking success rate is greater than the feature point tracking success rate threshold, the feature point tracking number is not greater than the maximum feature point tracking number, and the distribution of the feature points is uneven.
After the feature tracking stage is completed, the threshold adjustment stage is entered. Its main purpose is to decide, from the results of the current feature tracking, how the feature extraction threshold should be adjusted. First, the distribution over the image of the currently successfully tracked feature points is examined: the image is equally divided into a 2 x 2 grid of four regions, the number of successfully tracked feature points in each region is counted, and if the tracking number in any region is below a preset threshold, the distribution of the currently effectively tracked features is considered uneven. In this example the threshold is set to 5. Next, whether the camera has moved is judged from the average moving distance of the features on the image plane computed in the feature tracking stage; when the average moving distance is less than 2 pixels, the camera is considered not to have moved since the previous moment and the feature extraction threshold is left unchanged at the current moment. If the camera is detected to be moving, the feature extraction threshold is adjusted according to a series of conditions. Specifically, the tracking success rate of the current features is first calculated as:
tracking success rate = nt / nf
in the above formula, nf represents the number of feature points held before the current feature point tracking, including the feature points successfully tracked at the previous moment and the new feature points extracted at the current moment; nt represents the number of successfully tracked feature points at the current moment.
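Before turning to the adjustment rules, the distribution and motion checks just described can be written down directly. The following Python sketch is a minimal reading of the text: the 2 x 2 grid, the per-region minimum of 5 points and the 2-pixel motion gate follow the description above, while the function names are illustrative.

```python
import numpy as np

def distribution_uneven(tracked_pts, w, h, min_per_cell=5):
    # Split the image into a 2x2 grid and flag the distribution as
    # uneven when any cell holds fewer than min_per_cell tracked points.
    counts = np.zeros((2, 2), dtype=int)
    for x, y in tracked_pts:
        counts[int(y >= h / 2), int(x >= w / 2)] += 1
    return bool((counts < min_per_cell).any())

def camera_moved(avg_motion_px, min_move=2.0):
    # Treat the camera as static when features moved less than min_move
    # pixels on average; in that case the threshold is left unchanged.
    return avg_motion_px >= min_move
```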
If the current feature tracking success rate is less than the feature point tracking success rate threshold, many of the current feature points are meaningless, such as points on a white wall. This happens because the current feature extraction threshold is too low, so that the feature response at some positions in weak-texture regions also reaches the set threshold; the feature extraction threshold should therefore be increased. If the current feature tracking success rate is greater than the feature point tracking success rate threshold, the current feature points are all valid, and it is further judged whether the number of currently tracked feature points exceeds the maximum feature point tracking number. If it does, the current scene contains abundant texture and the feature extraction threshold can be set relatively high: overly dense feature points carry a large amount of redundant information, contribute little to the accuracy of camera pose estimation, and reduce computational efficiency, so the feature extraction threshold should also be increased in this case.

If the number of currently tracked feature points is below the maximum feature point tracking number, it is further judged whether that number is too small. Too few tracked feature points mean too few feature matches, which cannot provide enough constraints for camera pose estimation, so the estimate may be insufficiently accurate. Therefore, if the number of currently tracked feature points is below the minimum feature point tracking number, the feature extraction threshold needs to be lowered. On the other hand, if the currently extracted features are all concentrated in one area of the image, the current feature distribution is not uniform enough; the available features should ideally be distributed evenly over the image, which benefits the accuracy of camera pose estimation, so in this case the feature extraction threshold also needs to be lowered.

For all of these rules, when the feature extraction threshold needs to be raised, the current threshold is multiplied by the feature point extraction threshold increase magnification, and when it needs to be lowered, the current threshold is multiplied by the feature point extraction threshold decrease magnification.
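Taken together, these rules form a single update step. The sketch below is one plausible reading of the description; the function name and signature are illustrative, and the default values are those used in the experiment described next.

```python
def update_threshold(thr, rate, n_tracked, uneven,
                     rate_thresh=0.85, n_min=30, n_max=200,
                     up=1.1, down=0.9):
    # One pass of the threshold-adjustment rules; returns the new threshold.
    if rate <= rate_thresh:
        return thr * up    # too many meaningless points: raise the threshold
    if n_tracked > n_max:
        return thr * up    # redundant dense features: raise the threshold
    if n_tracked < n_min:
        return thr * down  # too few constraints for pose estimation: lower it
    if uneven:
        return thr * down  # features concentrated in one area: lower it
    return thr             # otherwise keep the current value
```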
To verify the effectiveness of the dynamic threshold adjustment method provided by the embodiment of the present invention, image data was captured in an indoor scene under poor ambient lighting, at a frame rate of 20 frames per second. In the experiment, Harris features are used for image feature extraction and a pyramid optical flow method is used for feature point tracking; the Harris feature response is compressed to a logarithmic scale when it is computed, to reduce its dynamic range. At the initial moment the feature extraction threshold is set to 600, and it is adjusted continuously while the system runs. The other preset parameters are: the minimum feature point tracking number is 30, the feature point tracking success rate threshold is 0.85, the maximum feature point tracking number is 200, the feature point extraction threshold increase and decrease magnifications are 1.1 and 0.9 respectively, and the minimum pixel movement distance is 1. The scene in the video includes regions with obvious texture, white-wall regions, and regions with uneven illumination; because of the poor lighting, the captured images are dark overall, which is very challenging for traditional visual SLAM algorithms. Part of the image data is shown in fig. 5.
The first row in the figure shows images from the test data, and the second row shows images additionally captured at similar positions under good lighting conditions. As can be seen, some scenes originally contain a certain amount of texture information, but because the environment is dark, images captured in scenes without a light source appear almost completely black, while images captured in scenes with a light source are bright locally and dark everywhere else; such data severely tests the robustness of visual feature extraction and tracking. The experiment tests only the performance of feature extraction and matching: for each input image, the feature points already in the system that are still in a trackable state are tracked onto the new image by the optical flow method to find their positions on the new image, and new feature points are then extracted from the current image according to the feature tracking result. The change of the feature point extraction threshold in the system over the whole data is shown in fig. 6.
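As a concrete illustration of this tracking step, the sketch below uses OpenCV's pyramidal Lucas-Kanade implementation; the window size and pyramid depth are illustrative defaults, not values given in the text.

```python
import cv2
import numpy as np

def track_features(prev_gray, cur_gray, prev_pts):
    # Pyramidal Lucas-Kanade tracking of existing feature points into the
    # new frame; prev_pts is an Nx2 array of point coordinates.
    p0 = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, p0, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    # Return the surviving points in both frames, as Nx2 arrays.
    return p0[ok].reshape(-1, 2), p1[ok].reshape(-1, 2)
```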
As can be seen from fig. 6, at the beginning of the run the feature point extraction threshold does not change because the camera is not moving, and very few feature points are extracted because the threshold at this time is high. When the camera starts to move, the system gradually lowers the feature point extraction threshold; from frame 60 to frame 90 the threshold drops rapidly into a relatively reasonable range, and as it decreases, each new frame gradually yields feature points. As the gray broken line in the figure shows, the number of extractable feature points grows as the threshold falls, reaching a peak between frame 110 and frame 115. As the second broken line shows, with more feature points extracted from each frame, the number of feature points that can be effectively tracked in the system also grows; this is the effective information the feature point method SLAM actually needs, namely more feature correspondences between image frames. Although more feature points are extracted between frame 110 and frame 115, a large portion of them fall on moving targets in the scene and are filtered out in the feature tracking stage, so the number of effectively trackable feature points does not increase. Feature point extraction from frame 110 to frame 115 is shown in fig. 7. The gray dots in the figure are the feature points newly extracted in each frame; a large portion of them fall on a walking person and a shaking notebook computer. Since no additional features are extracted near already existing feature points when a new image is processed, it can be seen that from frame 130 onward the system holds a stable set of trackable feature points, and the number of feature points newly extracted per image drops and stays at a low level. Over the whole sequence, the feature point extraction threshold changes continuously with the environment but always stays at a low level, because the environment remains dark. At certain moments the threshold rises to several tens of times its lowest level: at these moments the camera has moved into an environment with relatively good lighting, where the previously low threshold causes a very large number of feature points to be extracted, and the number of meaningless feature points also grows, lowering the tracking rate; both effects drive the threshold steadily upward during this period to fit the current environment.
Fig. 8 shows, from frame 2400 to frame 2700, the number of feature points held by the system at each moment (the number trackable at the previous moment plus the number newly extracted), the number still trackable after feature tracking at the current moment, and the number filtered out at the current moment. It can be seen that at some moments the number of feature points exceeds the set maximum, and at every moment a certain number of invalid feature points exist. Feature tracking at frames 2500, 2600 and 2700 is shown in fig. 9. Each row is a group of images, the left and right sides of each group being two consecutive frames. The valid feature points present in the system at the previous moment are drawn on the left image, where the gray dots indicate points that can still be correctly tracked on the current image. The result of tracking these feature points on the current image is drawn on the right image; in practical use a green line segment can mark the trace of a correctly tracked feature point and a red line segment a wrong tracking result. The feature point tracking results shown here have not yet been filtered by the RANSAC algorithm; in the final feature tracking results the feature points in the rectangular box are also filtered out because they fall on a moving person. As can be seen, more valid feature points are extracted in regions with relatively good lighting, and raising the feature point extraction threshold effectively prevents too many feature points from being extracted while still guaranteeing a sufficient number of valid feature matches.

At some moments the scene is bright and richly textured in some local areas while other areas are relatively dark; this typically happens while the camera moves from a well-lit area to a poorly-lit one. The change of the feature point extraction threshold from frame 800 to frame 860 of the test data, together with the distribution on the image of the valid feature points in the system, is shown in fig. 10. Between frame 800 and frame 825 the environment contains a well-lit region, so the feature point extraction threshold is relatively high and the feature points are concentrated in the brighter part of the image; because the upper part of the image is darker, no feature points can be extracted there at this high threshold. To make the extracted features spread more evenly over the image, the feature point extraction threshold is lowered continuously, and from frame 830 to frame 845 the distribution of feature points can be seen gradually extending into the darker region. Throughout the switch from the brighter scene to the darker one, the feature points remain evenly distributed over the whole image, and at no point are they either concentrated in a small area or impossible to extract.
On the other hand, the feature extraction threshold does not drop to an excessively low level: as the figure shows, even while the threshold keeps falling, no invalid feature points are extracted in the white-wall area on the left of the image. The variation of the feature extraction threshold thus always ensures that enough feature points can be extracted while their quality is preserved. Some of the feature tracking results are shown in fig. 11.
For visual SLAM algorithms, weak-texture regions usually make it impossible to extract feature points, and since the dynamic threshold adjustment method provided by the embodiment of the present invention keeps lowering the feature extraction threshold whenever the system holds few valid feature points, a new challenge is how to prevent the threshold from collapsing in a featureless scene. Fig. 12 shows, for each moment between frame 1300 and frame 1600, the feature point extraction threshold, the number of all feature points (those effectively tracked at the previous moment plus those newly extracted in the current frame), and the number of invalid feature points. During this period the camera is in a weak-texture environment where most of the visible area is white wall. The threshold, though constantly fluctuating, stays at a very low level, while the number of valid features in the system fluctuates around zero. In such extreme circumstances a slight threshold change can cause a large change in the number of extractable feature points. The figure also shows in detail the feature extraction from frame 1436 to frame 1438: at frame 1437 the feature point extraction threshold is lowered slightly, which causes a large number of invalid feature points to be extracted; these are filtered out in the feature tracking stage, making the tracking rate very small, so the threshold is raised slightly and the invalid points are not extracted again on the next frame. Because the number of valid features in the weak-texture region is so small, the method keeps trying to lower the threshold in search of possible valid feature points, which finally makes the threshold oscillate within a small range and occasionally extract a burst of invalid feature points. To address this problem, the frequency or magnitude of threshold adjustment can be artificially reduced when the feature point extraction threshold is at a very low level.

Finally, the number of valid feature points present in the system at each moment is shown in fig. 13. Even on such data the system still extracts a sufficient number of valid feature points in most regions, with the count essentially staying within the set maximum, while in the weak-texture regions the system effectively identifies and filters out invalid feature points. The efficient operation of the system also shows, indirectly, that under extreme lighting conditions, even when texture in the scene appears on the image in a form the human eye can barely recognize, a computer can effectively locate and exploit those features once an appropriate feature response threshold is found. The latest open-source method ORB-SLAM3 was also tried on this data in the experiment; it failed to extract feature points, so the system was never successfully initialized and run.
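A minimal sketch of this remedy might look as follows, assuming the caller supplies a concrete "very low" level and tracks the number of frames since the threshold last changed; all names and constants here are illustrative.

```python
def damp_adjustment(thr, proposed, low_level, frames_since_change,
                    min_interval=10, max_step=0.05):
    # Outside the very-low regime, apply the proposed threshold directly.
    if thr > low_level:
        return proposed
    # In the very-low regime, rate-limit how often the threshold may move.
    if frames_since_change < min_interval:
        return thr
    # Clamp the relative change so the threshold cannot jump (thr > 0 assumed).
    rel = (proposed - thr) / thr
    rel = max(-max_step, min(max_step, rel))
    return thr * (1.0 + rel)
```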
The embodiment of the present invention also provides a system for dynamically adjusting the feature point extraction threshold of an image, based on the method for dynamically adjusting the feature point extraction threshold of an image described in the above embodiments, the system comprising:
an initialization setting module, a feature extraction and tracking module and a threshold dynamic adjustment module, wherein the initialization setting module is used for setting initial image feature point extraction parameters, and the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase magnification and a feature point extraction threshold decrease magnification;
the feature extraction and tracking module is used for determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
the threshold dynamic adjustment module is used for determining the average moving distance of the feature points, the feature tracking success rate and the feature point tracking number based on the feature point extraction results and feature point tracking results of the current frame and the previous frame;
if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate meets a first preset condition, increasing the feature point extraction threshold according to the feature point extraction threshold increase magnification;
and if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate and the feature point tracking number meet a second preset condition, decreasing the feature point extraction threshold according to the feature point extraction threshold decrease magnification.
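Putting the modules together, the following Python skeleton is one way the system could be organized. It reuses the sketches above (track_features, camera_moved, distribution_uneven, update_threshold); extract_new_features stands in for the Harris extraction step at the current threshold and is hypothetical, as is the class structure itself.

```python
import numpy as np

class ThresholdAdjustingTracker:
    # Skeleton tying together the three modules described above.

    def __init__(self, thr=600.0, up=1.1, down=0.9,
                 n_min=30, n_max=200, rate_thresh=0.85):
        # Initialization setting module: initial threshold and magnifications.
        self.thr, self.up, self.down = thr, up, down
        self.n_min, self.n_max, self.rate_thresh = n_min, n_max, rate_thresh
        self.prev_gray, self.prev_pts = None, None

    def process(self, gray):
        h, w = gray.shape
        tracked = np.empty((0, 2))
        if self.prev_pts is not None and len(self.prev_pts) > 0:
            # Feature extraction and tracking module: track old points forward.
            kept_prev, tracked = track_features(self.prev_gray, gray, self.prev_pts)
            rate = len(tracked) / max(len(self.prev_pts), 1)
            avg_move = (float(np.linalg.norm(tracked - kept_prev, axis=1).mean())
                        if len(tracked) else 0.0)
            # Threshold dynamic adjustment module: only adjust when moving.
            if camera_moved(avg_move):
                uneven = distribution_uneven(tracked, w, h)
                self.thr = update_threshold(self.thr, rate, len(tracked), uneven,
                                            self.rate_thresh, self.n_min,
                                            self.n_max, self.up, self.down)
        new_pts = extract_new_features(gray, self.thr)  # hypothetical extractor
        self.prev_gray = gray
        self.prev_pts = np.vstack([tracked, new_pts])
        return tracked, new_pts
```

A real implementation would also avoid extracting new features near points that are already tracked, as noted in the experiments above.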
Based on the same concept, an embodiment of the present invention further provides a schematic diagram of an entity structure. As shown in fig. 14, the server may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other via the communication bus 840. The processor 810 may call logic instructions in the memory 830 to perform the steps of the method for dynamically adjusting the feature point extraction threshold of an image described in the above embodiments, for example:
step S1, setting initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase magnification and a feature point extraction threshold decrease magnification;
step S2, determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
step S3, determining the average moving distance of the feature points, the feature tracking success rate and the feature point tracking number based on the feature point extraction results and feature point tracking results of the current frame and the previous frame;
if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate meets a first preset condition, increasing the feature point extraction threshold according to the feature point extraction threshold increase magnification;
and if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate and the feature point tracking number meet a second preset condition, decreasing the feature point extraction threshold according to the feature point extraction threshold decrease magnification.
Furthermore, the logic instructions in the memory 830 may be implemented in the form of software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Based on the same concept, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program, the computer program comprising at least one piece of code executable by a master device to control the master device to implement the steps of the method for dynamically adjusting the feature point extraction threshold of an image described in the above embodiments, for example:
step S1, setting initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase magnification and a feature point extraction threshold decrease magnification;
step S2, determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
step S3, determining the average moving distance of the feature points, the feature tracking success rate and the feature point tracking number based on the feature point extraction results and feature point tracking results of the current frame and the previous frame;
if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate meets a first preset condition, increasing the feature point extraction threshold according to the feature point extraction threshold increase magnification;
and if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate and the feature point tracking number meet a second preset condition, decreasing the feature point extraction threshold according to the feature point extraction threshold decrease magnification.
Based on the same technical concept, the embodiment of the present application further provides a computer program, which is used to implement the above method embodiment when the computer program is executed by the main control device.
The program may be stored in whole or in part on a storage medium packaged with the processor, or in part or in whole on a memory not packaged with the processor.
Based on the same technical concept, the embodiment of the present application further provides a processor, and the processor is configured to implement the above method embodiment. The processor may be a chip.
In summary, according to the method and system for dynamically adjusting the feature point extraction threshold of the image provided by the embodiment of the present invention, feature extraction and feature tracking are regarded as two interdependent processes, the quality of the feature points extracted in the feature extraction stage affects the result of the feature tracking, and the result of the feature tracking is fed back to the feature extraction stage to adjust the key parameters required in the feature extraction; the feature point extraction threshold is dynamically adjusted according to the current environment, so that the traditional feature extraction method can effectively extract feature points under the extreme illumination environment.
The embodiments of the present invention can be arbitrarily combined to achieve different technical effects.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the present application are generated, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid state disk), among others.
One of ordinary skill in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by a computer program that can be stored in a computer-readable storage medium and that, when executed, can include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM or RAM, magnetic or optical disks, etc.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for dynamically adjusting a feature point extraction threshold of an image is characterized by comprising the following steps:
step S1, setting initial image feature point extraction parameters, wherein the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase magnification and a feature point extraction threshold decrease magnification;
step S2, determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
step S3, determining the average moving distance of the feature points, the feature tracking success rate and the feature point tracking number based on the feature point extraction results and feature point tracking results of the current frame and the previous frame;
if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate meets a first preset condition, increasing the feature point extraction threshold according to the feature point extraction threshold increase magnification;
and if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate and the feature point tracking number meet a second preset condition, decreasing the feature point extraction threshold according to the feature point extraction threshold decrease magnification.
2. The method for dynamically adjusting the feature point extraction threshold of the image according to claim 1, wherein in step S1, the initial image feature point extraction parameters further include a maximum feature point tracking number, a minimum feature point tracking number, and a feature point tracking success rate threshold.
3. The method for dynamically adjusting the feature point extraction threshold of the image according to claim 2, wherein the first preset condition is: the feature tracking success rate is not greater than the feature point tracking success rate threshold; or the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is greater than the maximum feature point tracking number;
the second preset condition is: the feature tracking success rate is greater than the feature point tracking success rate threshold and the feature point tracking number is less than the minimum feature point tracking number; or the feature tracking success rate is greater than the feature point tracking success rate threshold, the feature point tracking number is not greater than the maximum feature point tracking number, and the distribution of the feature points is uneven.
4. The method for dynamically adjusting the feature point extraction threshold of the image according to claim 1, wherein in step S3, judging whether the camera has moved based on the average moving distance of the feature points specifically comprises:
if the average moving distance of the feature points of the current frame image relative to the previous frame image is smaller than a preset moving threshold, judging that the corresponding camera does not move;
and if the average moving distance of the feature points of the current frame image relative to the previous frame image is not less than a preset moving threshold, judging that the corresponding camera moves.
5. The method for dynamically adjusting the feature point extraction threshold of an image according to claim 1, wherein in step S3, if it is determined that the camera does not move based on the average moving distance of the feature points, the feature extraction threshold is not adjusted at the current time.
6. The method for dynamically adjusting the feature point extraction threshold of the image according to claim 3, wherein in step S3, when judging whether the feature point distribution condition in the second preset condition is met, the image is equally divided into a plurality of regions of the same size, the number of successfully tracked feature points in each region is counted, and if the number of feature points in any region is less than the preset feature point number threshold, it is determined that the distribution of currently effectively tracked feature points is uneven.
7. The method for dynamically adjusting the feature point extraction threshold of the image according to claim 3, wherein in the step S3, the feature tracking success rate is:
tracking success rate = nt / nf
in the above formula, nf represents the number of feature points obtained before the current feature point tracking, including the feature points successfully tracked at the previous moment and the new feature points extracted at the current moment; nt represents the number of successfully tracked feature points at the current moment.
8. A system for dynamically adjusting a feature point extraction threshold of an image, comprising:
an initialization setting module, a feature extraction and tracking module and a threshold dynamic adjustment module, wherein the initialization setting module is used for setting initial image feature point extraction parameters, and the initial image feature point extraction parameters comprise a feature point extraction threshold, a feature point extraction threshold increase magnification and a feature point extraction threshold decrease magnification;
the feature extraction and tracking module is used for determining the feature point extraction results and feature point tracking results of the current frame and the previous frame when image feature point extraction is performed based on the initial image feature point extraction parameters;
the threshold dynamic adjustment module is used for determining the average moving distance of the feature points, the feature tracking success rate and the feature point tracking number based on the feature point extraction results and feature point tracking results of the current frame and the previous frame;
if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate meets a first preset condition, increasing the feature point extraction threshold according to the feature point extraction threshold increase magnification;
and if the camera is judged to have moved based on the average moving distance of the feature points and the feature tracking success rate and the feature point tracking number meet a second preset condition, decreasing the feature point extraction threshold according to the feature point extraction threshold decrease magnification.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for dynamically adjusting the feature point extraction threshold of an image according to any one of claims 1 to 7 when executing the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method for dynamically adjusting feature point extraction thresholds of an image according to any one of claims 1 to 7.
CN202110901212.7A 2021-08-06 2021-08-06 Method and system for dynamically adjusting feature point extraction threshold of image Pending CN113947686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110901212.7A CN113947686A (en) 2021-08-06 2021-08-06 Method and system for dynamically adjusting feature point extraction threshold of image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110901212.7A CN113947686A (en) 2021-08-06 2021-08-06 Method and system for dynamically adjusting feature point extraction threshold of image

Publications (1)

Publication Number Publication Date
CN113947686A true CN113947686A (en) 2022-01-18

Family

ID=79327767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110901212.7A Pending CN113947686A (en) 2021-08-06 2021-08-06 Method and system for dynamically adjusting feature point extraction threshold of image

Country Status (1)

Country Link
CN (1) CN113947686A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740477A (en) * 2023-08-16 2023-09-12 Nanchang Hangkong University Dynamic pixel point distribution identification method, system and equipment based on sparse optical flow
CN116740477B (en) * 2023-08-16 2023-11-07 Nanchang Hangkong University Dynamic pixel point distribution identification method, system and equipment based on sparse optical flow


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination