CN113034595B - Method for visual localization and related device, apparatus, storage medium - Google Patents


Info

Publication number
CN113034595B
CN113034595B (application CN202110297587.7A)
Authority
CN
China
Prior art keywords
image frame
preset
positioning
processing
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110297587.7A
Other languages
Chinese (zh)
Other versions
CN113034595A
Inventor
翟尚进
陈常
章国锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202110297587.7A
Publication of CN113034595A
Application granted
Publication of CN113034595B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis
    • G06T7/70 — Determining position or orientation of objects or cameras
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/50 — Depth or shape recovery
    • G06T7/55 — Depth or shape recovery from multiple images
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/10 — Image acquisition modality
    • G06T2207/10016 — Video; Image sequence
    • G06T2207/10028 — Range image; Depth image; 3D point clouds
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20228 — Disparity calculation for image-based rendering

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a visual positioning method and a related device, apparatus, and storage medium. The visual positioning method includes: acquiring a current image frame captured during visual positioning of a target; judging whether the current image frame satisfies a positioning condition; and, when the positioning condition is satisfied, performing visual positioning processing with the current image frame. Because visual positioning processing is performed on the current image frame only when the positioning condition is satisfied, the frequency of visual positioning processing is dynamically adjusted and the accuracy of visual positioning is improved.

Description

Method for visual localization and related device, apparatus, storage medium
Technical Field
The present application relates to the field of positioning and navigation technologies, and in particular to a visual positioning method and related devices, apparatuses, and storage media.
Background
At present, visual positioning methods play an important role in computer vision, robotics, unmanned aerial vehicles, three-dimensional reconstruction, augmented reality, and other fields. Existing visual positioning methods adopt a fixed processing frequency, which has two drawbacks: if the device is stationary or moving slowly, a large amount of useless computation is performed; and since pre-integration and positioning tracking carry certain measurement errors, high-frequency processing may amplify those errors and reduce the accuracy of visual positioning.
Disclosure of Invention
The application provides at least one visual positioning method, a related device, equipment and a storage medium.
The application provides a visual positioning method, which includes: acquiring a current image frame captured during visual positioning of a target; judging whether the current image frame satisfies a positioning condition; and, when the positioning condition is satisfied, performing visual positioning processing with the current image frame.
Therefore, visual positioning processing is performed on the current image frame only after it is judged to satisfy the positioning condition, so the frequency of visual positioning processing is dynamically adjusted. Unnecessary visual positioning processing is avoided, which reduces the amount of computation and improves processing efficiency. Moreover, because each round of visual positioning processing accumulates error, positioning only when the positioning condition is satisfied reduces error accumulation and improves the accuracy of visual positioning.
The visual positioning processing includes image processing and positioning processing. Performing visual positioning processing with the current image frame when the positioning condition is satisfied includes: performing image processing on the current image frame to obtain a processing result of the current image frame; and performing positioning processing with that result, where at least one of the image processing and positioning processing steps is executed only when its corresponding positioning condition is satisfied.
Therefore, by setting positioning conditions for the image processing and/or the positioning processing in the visual positioning process, and letting each corresponding positioning condition determine whether its processing step needs to be executed, the frequency of visual positioning processing is adjusted dynamically in a finer and more flexible way. Unnecessary processing steps can be skipped more precisely while necessary steps are still executed, further improving the accuracy of visual positioning while reducing the amount of computation.
The image processing is executed when its corresponding positioning condition is satisfied while the positioning processing is executed at a first preset frequency; or the image processing is executed at a second preset frequency while the positioning processing is executed when its corresponding positioning condition is satisfied; or each of the image processing and the positioning processing is executed when its respective positioning condition is satisfied, where the two positioning conditions may be the same or different.
Therefore, by dynamically adjusting the processing frequency of at least one of the image processing and the positioning processing, unnecessary processing steps can be skipped more precisely while necessary steps are still executed, further improving positioning accuracy while reducing the amount of computation.
Judging whether the current image frame satisfies the positioning condition includes at least one of the following first judging steps: judging whether the motion time between the current image frame and a first historical image frame meets a preset time requirement; and judging whether the motion amplitude between the current image frame and the first historical image frame meets a preset amplitude requirement, where the first historical image frame is an image frame previously subjected to visual positioning processing. If the result of a first judging step is yes, the positioning condition is determined to be satisfied.
Therefore, motion time or motion amplitude is used to determine whether the current image frame satisfies the positioning condition, and hence whether to position, so that the frequency of positioning adjusts to the motion speed and/or motion frequency of the target, which is more reasonable.
Judging whether the motion time between the current image frame and the first historical image frame meets the preset time requirement includes at least one of the following second judging steps: judging whether the motion time is greater than a first preset time; and, when the target is determined to be in a preset motion state, judging whether the motion time is greater than a second preset time corresponding to that preset motion state, where the second preset time is smaller than the first preset time, there is at least one preset motion state of the target, and each preset motion state corresponds to its own second preset time. If the result of a second judging step is yes, the preset time requirement is determined to be met. And/or, judging whether the motion amplitude between the current image frame and the first historical image frame meets the preset amplitude requirement includes: judging whether the parallax between the current image frame and the first historical image frame is greater than a first preset parallax.
Therefore, by defining several motion states, the visual positioning method can judge accurately under various conditions of the target. Also, using the parallax between the current image frame and the first historical image frame to characterize the motion amplitude makes the judgment more accurate.
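The two first judging steps above can be sketched as follows. This is an illustrative, hedged reading of the embodiment, not the patent's actual implementation: the function names and all threshold values (first and second preset times, first preset parallax) are assumptions chosen only to make the logic concrete.

```python
# Illustrative check of the positioning condition: it holds if EITHER the
# motion time meets the preset time requirement or the inter-frame parallax
# exceeds the first preset parallax. All thresholds are assumed values.

FIRST_PRESET_TIME = 0.2      # seconds; applies in the general case
SECOND_PRESET_TIME = 0.05    # seconds; per preset (fast) motion state, < first
FIRST_PRESET_PARALLAX = 5.0  # pixels

def time_requirement_met(motion_time, in_preset_motion_state):
    """Second judging steps: compare against the first preset time, or the
    stricter second preset time when the target is in a preset motion state."""
    if motion_time > FIRST_PRESET_TIME:
        return True
    return in_preset_motion_state and motion_time > SECOND_PRESET_TIME

def amplitude_requirement_met(parallax):
    """Preset amplitude requirement: parallax above the first preset parallax."""
    return parallax > FIRST_PRESET_PARALLAX

def positioning_condition_met(motion_time, parallax, in_preset_motion_state):
    """Positioning condition: any first judging step succeeding suffices."""
    return (time_requirement_met(motion_time, in_preset_motion_state)
            or amplitude_requirement_met(parallax))
```

For example, a frame 0.1 s after the last processed frame fails the time check normally, but passes it when the target is in a preset motion state, since the smaller second preset time then applies.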
There may be a plurality of positioning conditions, used in different processing steps to judge whether to position the current image frame, where at least one of the first preset time, the second preset time, and the first preset parallax differs between positioning conditions. And/or, the parallax between the current image frame and the first historical image frame is either: a first average parallax over a plurality of matched two-dimensional point pairs in the current image frame and the first historical image frame; or a second average parallax between a plurality of predicted points in the current image frame and a plurality of two-dimensional points in the first historical image frame, where the predicted points are projections onto the current image frame of the tracked three-dimensional points mapped from a second historical image frame, and the two-dimensional points are those corresponding to the tracked three-dimensional points in the first historical image frame.
Therefore, by setting a plurality of positioning conditions, the current image frame can be judged several times, which reduces the frequency of visual positioning processing of image frames to a certain extent while increasing the accuracy of the visual positioning processing.
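The two parallax measures described above can be sketched in a few lines. This is a hedged illustration, not the patent's implementation: the function names, the first number, and the second preset parallax value are assumptions.

```python
import math

def first_average_parallax(pts_current, pts_history):
    """First average parallax: mean pixel displacement over matched
    two-dimensional point pairs in the current and first historical frames."""
    assert pts_current and len(pts_current) == len(pts_history)
    return sum(math.dist(p, q)
               for p, q in zip(pts_current, pts_history)) / len(pts_current)

def second_average_parallax(predicted_pts, observed_pts, *,
                            first_number=10, second_preset_parallax=8.0):
    """Second average parallax: mean displacement between projections of the
    tracked 3D points onto the current frame and their corresponding 2D points
    in the first historical frame. With too few predicted points (motion too
    large), a fixed value above the first preset parallax is returned so the
    frame still participates in visual positioning processing."""
    if len(predicted_pts) < first_number:
        return second_preset_parallax
    return first_average_parallax(predicted_pts, observed_pts)
```

For instance, two point pairs displaced by 5 and 0 pixels give a first average parallax of 2.5 pixels.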
When the number of predicted points is smaller than a first number, the second average parallax is set to a second preset parallax, which is larger than the first preset parallax.
Therefore, when the number of predicted points is smaller than the first number, the motion amplitude of the target is excessively large, and skipping visual positioning processing could cause subsequent positioning failure. Setting the second average parallax to a value larger than the first preset parallax ensures that the current image frame participates in visual positioning processing, improving the stability of the visual positioning processing.
The preset motion states include two first motion states; the motion speed of each first motion state is greater than a preset speed, and the motion speed of the first first motion state is smaller than that of the second first motion state. The method further includes at least one of the following steps: before judging whether the current image frame satisfies the positioning condition, increasing the first preset parallax if a second number of image frames are awaiting image processing; decreasing the first preset parallax if the target is determined to be in the first first motion state; and increasing the first preset parallax if the target is determined to be in the second first motion state.
Therefore, the first preset parallax is flexibly adjusted, before judging whether the current image frame satisfies the positioning condition, according to the motion state of the target and the computational load on the device (i.e., whether a second number of image frames are awaiting image processing), which improves positioning accuracy and stability.
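The threshold adaptation just described can be sketched as below. The adjustment step, bounds, state labels, and the second number are all illustrative assumptions; the patent itself does not prescribe concrete values.

```python
def adjust_first_preset_parallax(parallax_thresh, *, pending_frames=0,
                                 second_number=3, motion_state=None,
                                 step=1.0):
    """Raise the first preset parallax when a backlog of frames awaits image
    processing (process fewer frames) or when the target is in the faster
    ("first_fast") first motion state; lower it in the slower ("first_slow")
    first motion state (process more frames)."""
    if pending_frames >= second_number:
        parallax_thresh += step   # compute-bound: be stricter
    if motion_state == "first_slow":
        parallax_thresh -= step   # slower abnormal motion: be more permissive
    elif motion_state == "first_fast":
        parallax_thresh += step   # faster abnormal motion: be stricter
    return parallax_thresh
```

The design choice here is that the threshold is recomputed per frame from the current backlog and motion state rather than mutated globally, which keeps the gating logic stateless and easy to test.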
The preset motion states include at least one first motion state; the motion speed of each first motion state is greater than the preset speed, and the motion speeds of different first motion states differ. Determining that the target is in a first motion state includes at least one of the following steps: determining that the parallax between the current image frame and the first historical image frame is greater than a third preset parallax; determining that the number of tracked three-dimensional points mapped from the second historical image frame is less than a third number; and determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is greater than a preset error.
Therefore, the motion state of the target is determined from the parallax between the current image frame and the first historical image frame, or from the number of tracked three-dimensional points and the re-projection error, so the motion state is determined more accurately.
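The three indicators above can be combined as follows; since the embodiment says "at least one of the following steps", any one indicator of fast motion suffices. The threshold values are assumptions for illustration only.

```python
def in_first_motion_state(parallax, num_tracked_points, avg_reproj_error, *,
                          third_preset_parallax=10.0, third_number=30,
                          preset_error=2.0):
    """Detect a first (fast) motion state from any one of: large inter-frame
    parallax, too few tracked 3D points, or a large average re-projection
    error over the second historical image frames."""
    return (parallax > third_preset_parallax
            or num_tracked_points < third_number
            or avg_reproj_error > preset_error)
```

For multiple first motion states, the same check could be run with progressively stricter thresholds per state, as the description suggests.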
The third preset parallax, third number, and preset error corresponding to different first motion states differ: the greater the motion speed of a first motion state, the smaller the corresponding third number and the larger the corresponding third preset parallax and preset error.
Therefore, the greater the motion speed of the first motion state, the smaller the corresponding third number and the larger the third preset parallax and preset error, indicating a stricter requirement on the preset state, so the method can adapt to larger motion amplitudes of the target.
The preset motion states include a second motion state, which occurs after the first motion state. Determining that the target is in the second motion state includes at least one of the following steps: determining that the parallax between the current image frame and the first historical image frame is greater than the third preset parallax and the number of tracked three-dimensional points mapped from the second historical image frame is greater than or equal to a fourth number, where the fourth number is greater than the third number; and determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is greater than the preset error and the number of tracked three-dimensional points mapped from the second historical image frames is greater than or equal to the fourth number.
Therefore, by defining a plurality of motion states, with different information available for visual positioning processing in each, the frequency of visual positioning processing can be adjusted according to the motion state of the target.
Determining that the target is not in a preset motion state includes at least one of the following steps: determining that the inter-frame parallax over a consecutive fourth number of image frames subjected to visual positioning processing is less than or equal to the third preset parallax; determining that the number of tracked three-dimensional points mapped from the second historical image frame is greater than or equal to the third number; and determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is less than or equal to the preset error.
Therefore, determining that the target is not in a preset motion state allows the corresponding positioning-condition judgment to be executed, achieving the aim of adjusting the image processing frequency.
The method further includes: when the positioning condition is not satisfied, or when a preset number of image frames are awaiting image processing, skipping visual positioning processing of the current image frame, acquiring the next image frame, and judging whether the next image frame satisfies the positioning condition, followed by the subsequent steps.
Therefore, by not performing visual positioning processing on the current image frame when the positioning condition is not satisfied or a preset number of image frames are awaiting image processing, positioning accuracy is improved and the device's computing power is conserved.
The application provides a visual positioning device, including: an acquisition module, for acquiring a current image frame captured during visual positioning of a target; a condition judging module, for judging whether the current image frame satisfies a positioning condition; and a positioning module, for performing visual positioning processing with the current image frame when the positioning condition is satisfied.
The application provides an electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the method of visual localization described above.
The present application provides a computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement a method of visual localization as described above.
According to the above scheme, visual positioning processing is performed on the current image frame only after it is judged to satisfy the positioning condition, so the frequency of visual positioning processing is dynamically adjusted. Unnecessary visual positioning processing is avoided, which reduces the amount of computation and improves processing efficiency. Moreover, because each round of visual positioning processing accumulates error, positioning only when the positioning condition is satisfied reduces error accumulation and improves positioning accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a first flow chart of an embodiment of a method for visual positioning according to the present application;
FIG. 2 is a second flow chart of an embodiment of a method for visual localization according to the present application;
FIG. 3 is a schematic view of an embodiment of a visual positioning device according to the present application;
FIG. 4 is a schematic diagram of an embodiment of an electronic device of the present application;
FIG. 5 is a schematic diagram of an embodiment of a computer readable storage medium of the present application.
Detailed Description
The following describes embodiments of the present application in detail with reference to the drawings.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. Further, "a plurality" herein means two or more. The term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
The application is applicable to devices having image processing capabilities. Such a device may have an image or video acquisition function; for example, it may include a component such as a camera for capturing images or video. Alternatively, the device may obtain the required video stream or image from another device through data transmission or data interaction, or access it from the storage resources of another device. The other device has an image or video acquisition function and is communicatively connected to the device, for example via Bluetooth or a wireless network; the communication manner between the two devices is not limited here and may include, but is not limited to, the examples above. In one implementation, the device may be a mobile phone, a tablet computer, an interactive screen, or the like, without limitation.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for visual positioning according to an embodiment of the application. Specifically, the method may include the steps of:
Step S11: a current image frame acquired during visual localization of the target is acquired.
The current image frame acquired during visual positioning of the target may be captured by any device with an imaging component, obtained from another device, or produced by frame selection, brightness adjustment, resolution adjustment, and the like. The target may be the device that acquired the current image frame itself. Here, "another device" refers to a device operated by a different central processing unit. For example, while an unmanned aerial vehicle is flying, the unmanned aerial vehicle is the target of the embodiment of the disclosure; likewise, a sweeping robot in operation is a target of the embodiment of the disclosure. That is, the device is in an environment and is able to locate and/or navigate its own position within that environment.
Step S12: and judging whether the current image frame meets the positioning condition.
There may be one or more positioning conditions. When there are multiple positioning conditions, the current image frame is determined to satisfy the positioning condition only when all of them are met. Of course, in other embodiments with multiple positioning conditions, the current image frame may be determined to satisfy the positioning condition when one or more of them are met, without requiring all of them.
Step S13: and under the condition that the positioning condition is met, performing visual positioning processing by using the current image frame.
Visual positioning processing with the current image frame may proceed by extracting features from the current image frame, matching the extracted features against features of a previous image frame, and then computing the pose of the device or of a preset target object from the feature matching result, thereby positioning the device or the preset target. In some disclosed embodiments, if the current frame is the first image frame, positioning can be performed directly from its image information; if it is not the first frame, positioning can use the positioning information of historical frames together with the image information of related frames, i.e., image tracking positioning.
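Steps S11 to S13 can be sketched as a gated processing loop. This is an illustrative sketch, not the patent's implementation: the frame representation, function names, and thresholds are assumptions, and the actual feature extraction and pose estimation are elided as a comment.

```python
# Hypothetical sketch of steps S11-S13: visual positioning processing runs
# only on frames that satisfy the positioning condition, so the processing
# frequency adapts to the target's motion.

def satisfies_positioning_condition(frame, last_processed, *,
                                    min_interval=0.1, min_parallax=2.0):
    """A frame qualifies if enough time has elapsed since the last processed
    frame OR it has moved far enough (assumed thresholds)."""
    if last_processed is None:
        return True  # the first frame is always processed
    dt = frame["timestamp"] - last_processed["timestamp"]
    return dt > min_interval or frame["parallax"] > min_parallax

def run_visual_positioning(frames):
    """Process only qualifying frames; return the indices that were localized."""
    last_processed = None
    localized = []
    for i, frame in enumerate(frames):
        if satisfies_positioning_condition(frame, last_processed):
            # ... feature extraction, matching, and pose estimation here ...
            last_processed = frame
            localized.append(i)
    return localized
```

With frames arriving every 30 ms, a stationary target is localized only about every 100 ms, while a fast-moving frame (large parallax) is localized immediately.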
According to the above scheme, visual positioning processing is performed on the current image frame only after it is judged to satisfy the positioning condition, so the frequency of visual positioning processing is dynamically adjusted. Unnecessary visual positioning processing is avoided, which reduces the amount of computation and improves processing efficiency. Moreover, because each round of visual positioning processing accumulates error, positioning only when the positioning condition is satisfied reduces error accumulation and improves the accuracy of visual positioning.
In some disclosed embodiments, determining whether the current image frame satisfies the positioning condition includes at least one first determining step of:
The first judging step includes judging whether the motion time between the current image frame and the first historical image frame meets the preset time requirement. The first historical image frame is an image frame previously subjected to visual positioning processing. Optionally, the embodiment of the disclosure uses the frame most recently subjected to visual positioning processing before the current image frame as the first historical image frame. Of course, in other embodiments, any previously processed image frame before the current image frame may be selected as the first historical image frame; the selection of the first historical image frame is not specifically limited here.
Judging whether the motion time between the current image frame and the first historical image frame meets the preset time requirement may include at least one of the following second judging steps. The first second judging step is to judge whether the motion time is greater than the first preset time. The first preset time may be chosen based on the computing power of the device. Optionally, the minimum per-second visual positioning processing frequency H1 of the device is acquired, and the first preset time is set to 1/H1. Of course, the way the preset time is set is not limited to this manner; it may be set according to specific requirements, set randomly, and so on. The first and second preset times may also be determined by jointly considering the preset per-second time budget T1 for visual positioning processing, the minimum per-second processing frequency H1 of the device, the per-second processing frequency H3 for a preset motion state, the image processing time t1 and positioning processing time t2 of each image frame, and other information. For example, the second preset time may be set to 1/H3. Optionally, if the product of the per-frame processing cost (t1 + t2) and the minimum per-second processing frequency H1 is greater than or equal to the preset per-second time budget T1, the first preset time is set to infinity.
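The derivation of the first preset time just described can be written out directly. This sketch reflects one reading of the passage, with the symbol names H1, t1, t2, and T1 taken from the text and the calling convention assumed.

```python
import math

def first_preset_time(h1, t1=None, t2=None, budget=None):
    """First preset time from the device's minimum per-second processing
    frequency H1: normally 1/H1. If the per-frame cost (t1 + t2) at rate H1
    already meets or exceeds the per-second time budget T1, return infinity
    so the time criterion alone never triggers processing."""
    if t1 is not None and t2 is not None and budget is not None:
        if (t1 + t2) * h1 >= budget:
            return math.inf
    return 1.0 / h1
```

For instance, with H1 = 10 Hz the first preset time is 0.1 s, but if each frame costs 0.11 s of combined image and positioning processing against a 1 s budget, the time criterion is effectively disabled.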
The second second judging step is, when the target is determined to be in a preset motion state, to judge whether the motion time is greater than the second preset time corresponding to that preset motion state, where the target has at least one preset motion state and each preset motion state corresponds to its own second preset time. Alternatively, when the target is in a preset motion state, the current image frame may be tested against either of the two second judging steps, while when the target is not in a preset motion state only the first second judging step is applied. In this way, the device's power consumption during judgment is reduced when the target is not in a preset motion state.
The number of preset motion states may be one, two, or more; the present application does not specifically limit it. The preset motion states include at least one first motion state; the motion speed of each first motion state is greater than the preset speed, and the motion speeds of different first motion states differ. For example, the preset motion states may include two first motion states, where, optionally, the motion speed of the first first motion state is smaller than that of the second first motion state. Motion speed here may be the speed of rotation and/or translation. Whether the target is in a first motion state may be determined from the motion state of the previous processed image frame or from that of the current image frame. For example, if analysis of the previous processed frame finds the target in a preset motion state, the target is considered still in that preset motion state when the current image frame is captured, and the motion-state analysis of the current frame in turn affects the determination for the next image frame. Optionally, the second preset times of different preset motion states may differ.
Optionally, determining that the target is in the first motion state may include at least one of the following steps. First, determining that the parallax between the current image frame and the first historical image frame is greater than a third preset parallax. This parallax may be determined as the average parallax between matched pairs of actual two-dimensional points in the current image frame and the first historical image frame. The third preset parallax is larger than the parallax of the target in a normal motion state. The first historical image frame is determined as described above; if the parallax between the two frames is too large, the target is moving or rotating too fast, which constitutes abnormal motion, and that motion state is classified as a preset motion state. Second, determining that the number of tracked three-dimensional points mapped from the second historical image frame is smaller than the third number. Here, the second historical image frames may be several historical image frames before the current image frame. For example, a sliding window of preset size may be set, with all image frames in the window serving as second historical image frames. The size of the sliding window is not limited; if it is set to 10, the window holds 10 second historical image frames, i.e., the 10 frames before the current frame are retained in the window.
The tracked three-dimensional points mapped by the second historical image frames may be obtained by acquiring the two-dimensional points in each second historical image frame and recovering the three-dimensional point corresponding to each two-dimensional point by triangulation. When the number of tracked three-dimensional points is smaller than the third number, the target's translation or rotation speed is too fast to form a normal number of tracked three-dimensional points, and the target is determined to be moving abnormally. The third number is not specifically limited here and may be set according to specific requirements. Third, determining that the average re-projection error of the tracked three-dimensional points in the second historical image frames is greater than a preset error, that is, the average of the re-projection errors, on each second historical image frame in the sliding window, of the tracked three-dimensional points determined for that frame. The re-projection error is the distance between the two-dimensional point actually observed in a second historical image frame and the re-projected two-dimensional point, and the average re-projection error is the average of these re-projection errors. In some disclosed embodiments, weights may be assigned to the re-projection errors of the second historical image frames according to their distance from the current image frame: a second historical image frame closer to the current image frame may be given a greater weight, and one further away a lower weight. Of course, in other embodiments, the same weight may be set for the re-projection error of each second historical image frame.
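The distance-dependent weighting above can be sketched as a geometric decay over the sliding window; the `decay` parameter is an illustrative assumption (setting it to 1.0 recovers the equal-weight variant the text also mentions).

```python
import numpy as np

def weighted_mean_reprojection_error(errors_per_frame, decay=0.8):
    """Average re-projection error over the sliding window, weighting
    second historical image frames nearer the current image frame more
    heavily. `errors_per_frame` is ordered oldest ... newest."""
    errors = np.asarray(errors_per_frame, dtype=float)
    n = len(errors)
    weights = decay ** np.arange(n - 1, -1, -1)  # newest frame gets weight 1
    return float(np.sum(weights * errors) / np.sum(weights))
```

With `decay=1.0` every frame contributes equally, matching the equal-weight embodiment.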
The way the average re-projection error is solved is not specifically limited here. If there are a plurality of first motion states, the third preset disparity, the third number and the preset error of the average re-projection are adjusted accordingly to accommodate more preset motion states.
Determining the motion state of the target through the disparity between the current image frame and the first historical image frame, or through the number of tracked three-dimensional points and the re-projection error, makes the determination of the motion state more accurate.
In the embodiments of the disclosure, the third preset disparity, the third number and the preset error corresponding to different first motion states are different: the greater the motion speed of a first motion state, the smaller the corresponding third number and the greater the corresponding third preset disparity and preset error. The motion speed here includes a translation speed and/or a rotation speed. Each first motion state corresponds to a preset motion, and the corresponding preset motion speed may be the maximum speed or the average speed during the motion. A smaller third number together with a greater third preset disparity and preset error indicates stricter requirements on the preset state, so that more motion amplitudes of the target can be accommodated.
In some disclosed embodiments, the preset motion states include a second motion state, which occurs after a first motion state; that is, the second motion state lies between the first motion state and the non-preset motion state. The judgment of the non-preset motion state refers to the process, described below, of determining that the target is not in a preset motion state. Determining that the target is in the second motion state includes at least one of the following steps. First, determining that the disparity between the current image frame and the first historical image frame is greater than the third preset disparity, and that the number of tracked three-dimensional points mapped by the second historical image frames is greater than or equal to a fourth number, where the fourth number is greater than the third number. Second, determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is greater than the preset error, and that the number of tracked three-dimensional points mapped by the second historical image frames is greater than or equal to the fourth number.
Correspondingly, determining that the target is not in a preset motion state may include at least one of the following steps. First, determining that the disparity between adjacent image frames, over a consecutive fourth number of image frames that have undergone visual positioning processing, is smaller than or equal to the third preset disparity. The fourth number may be the number of all second historical image frames contained in the sliding window plus the current frame, or the number of part of the second historical image frames in the sliding window, where that part is determined counting from the frame nearest the current image frame outward. Of course, in other embodiments, the part of the second historical image frames may also be a plurality of second historical image frames selected from the sliding window in a preset manner, for example according to the disparity between adjacent second historical image frames. Second, determining that the number of tracked three-dimensional points mapped by the second historical image frames is greater than or equal to the third number; the tracked three-dimensional points and the third number are determined as described above and are not repeated here. Third, determining that the average re-projection error of the tracked three-dimensional points in the second historical image frames is less than or equal to the preset error; the way the average re-projection error is solved is not specified here. Determining that the target is not in a preset motion state through the third preset disparity over a consecutive fourth number of image frames makes the obtained result more accurate.
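The first exit criterion above amounts to checking that every adjacent-frame disparity in the consecutive run stays at or below the third preset disparity; a minimal sketch, with illustrative names:

```python
def exited_preset_motion_state(adjacent_disparities, third_preset_disparity):
    """Criterion (1) for leaving the preset motion state: the disparity
    between every pair of adjacent frames, over a consecutive fourth
    number of visually positioned frames, stays at or below the third
    preset disparity. `adjacent_disparities` holds those pairwise values."""
    return all(d <= third_preset_disparity for d in adjacent_disparities)
```

A single adjacent pair exceeding the threshold keeps the target classified as being in the preset motion state.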
The second first judgment step of judging whether the current image frame satisfies the positioning condition includes: judging whether the motion amplitude between the current image frame and the first historical image frame meets a preset amplitude requirement, where the first historical image frame is an image frame that has previously undergone visual positioning processing. Judging whether the motion amplitude meets the preset amplitude requirement includes judging whether the disparity between the current image frame and the first historical image frame is greater than a first preset disparity. By setting various motion states, the visual positioning method can judge accurately under various conditions of the target; and characterizing the motion amplitude by the disparity between the current image frame and the first historical image frame makes the judgment result more accurate.
Optionally, the disparity between the current image frame and the first historical image frame may be a first average disparity between a number of matched two-dimensional point pairs in the current image frame and the first historical image frame, or a second average disparity between a number of predicted points of the current image frame and a number of two-dimensional points of the first historical image frame. The predicted points are the projections, onto the current image frame, of the tracked three-dimensional points mapped by the second historical image frames, and the two-dimensional points are those corresponding to the tracked three-dimensional points on the first historical image frame; a two-dimensional point here is one actually observed by the camera, not a projection point. Of course, in other embodiments, the disparity may also be an average disparity between a number of predicted points in the current image frame and a number of predicted points in the first historical image frame; the way the disparity between the current frame and the first historical image frame is determined is therefore not specifically limited here. By setting a plurality of positioning conditions, the current image frame can be judged multiple times, which reduces the frequency of visual positioning processing to a certain extent and improves the accuracy of the visual positioning processing.
Optionally, when the number of predicted points is smaller than a first number, the second average disparity is set to a second preset disparity, where the second preset disparity is greater than the first preset disparity; alternatively, the second preset disparity is set to a maximum value, so that the current image frame is considered to satisfy the positioning condition no matter how large the first preset disparity is set. When the number of predicted points is smaller than the first number, the motion amplitude of the target is excessively large, and if no visual positioning processing is performed on it, subsequent visual positioning may fail; setting the second average disparity to be greater than the first preset disparity therefore lets the current image frame participate in the visual positioning processing and improves the stability of that processing.
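The fallback above can be sketched as follows: compute the second average disparity from predicted points and observed points, but force it to the (large) second preset disparity when too few predicted points survive. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def second_average_disparity(predicted_points, observed_points,
                             first_number, second_preset_disparity):
    """Second average disparity between predicted points (projections of
    tracked 3-D points into the current frame) and the matching 2-D
    points of the first historical frame. When fewer than `first_number`
    predicted points exist, return the second preset disparity so the
    frame is guaranteed to satisfy the positioning condition."""
    predicted = np.asarray(predicted_points, dtype=float)
    if len(predicted) < first_number:
        return float(second_preset_disparity)
    observed = np.asarray(observed_points, dtype=float)
    return float(np.mean(np.linalg.norm(predicted - observed, axis=1)))
```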
If the result of any of the first judgment steps is yes, it is determined that the positioning condition is satisfied; that is, as long as either the first first judgment step or the second first judgment step described above is satisfied, the current image frame is determined to satisfy the positioning condition.
Determining that the current image frame meets the positioning condition by means of the motion time or the motion amplitude, and then deciding whether to perform visual positioning, makes the frequency of visual positioning depend on the motion speed and/or motion frequency, so that the adjustment of the visual positioning frequency is more reasonable.
In some disclosed embodiments, a plurality of positioning conditions are included, which determine whether to perform different processing steps on the current image frame. At least one of the first preset time, the second preset time and the first preset disparity differs between the positioning conditions; that is, the first preset time in each positioning condition may be the same or different, and likewise for the second preset time and the first preset disparity.
In some disclosed embodiments, the visual positioning processing may include multiple processing steps, for example two: image processing and positioning processing. Step S13 may then specifically include: performing image processing on the current image frame to obtain a processing result of the current image frame, and performing positioning processing using that processing result, where at least one of the image processing and the positioning processing is performed only when its corresponding positioning condition is satisfied.
Optionally, the image processing is performed when the positioning condition corresponding to the image processing is satisfied, while the positioning processing is performed at a first preset frequency; or the image processing is performed at a second preset frequency, while the positioning processing is performed when the positioning condition corresponding to the positioning processing is satisfied; or the image processing is performed when its corresponding positioning condition is satisfied and the positioning processing is performed when its corresponding positioning condition is satisfied, where the two corresponding positioning conditions may be the same or different. That is, one of the image processing and the positioning processing may be performed at a fixed frequency while the other is performed when its corresponding positioning condition is satisfied, or both may be performed when their respective corresponding positioning conditions are satisfied. As stated above, the positioning conditions corresponding to the image processing and to the positioning processing may be the same or different; for example, when both are performed on condition, their respective positioning conditions may differ. Of course, this is merely an example and is not intended to limit the technical solutions proposed by the embodiments of the present disclosure.
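The execution modes above reduce to one gate per processing step: run either on a fixed preset frequency or when the step's positioning condition holds. The mode names and the modulo-based frequency check below are illustrative assumptions.

```python
def should_run(step_mode, frame_index, preset_period, condition_met):
    """Decide whether a processing step (image processing or positioning
    processing) runs for this frame: on a fixed preset frequency (every
    `preset_period`-th frame) or when its positioning condition holds."""
    if step_mode == "frequency":
        return frame_index % preset_period == 0
    return condition_met
```

Either step of the visual positioning processing can be given its own mode, so the two steps need not run at the same frequency.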
Optionally, whether the image processing is performed at the second preset frequency or performed when its corresponding positioning condition is satisfied, the frequency at which the image processing is performed is greater than or equal to the frequency at which the positioning processing is performed.
If the first preset time, the second preset time and the first preset disparity in each positioning condition are the same, the image processing and the positioning processing run at the same frequency; if at least one of them differs, the image processing and the positioning processing run at different frequencies. Optionally, the first preset disparity in the positioning condition corresponding to the positioning processing may be greater than that in the positioning condition corresponding to the image processing; the result of this arrangement is that an image frame that undergoes image processing does not necessarily undergo positioning processing.
The following illustrates, with reference to fig. 1 and fig. 2, the positioning determination in which different processing steps of the visual positioning processing use different positioning conditions. Fig. 2 is a schematic diagram of a second process in an embodiment of the method for visual positioning according to the present application.
In some disclosed embodiments, step S12 may specifically be divided into step S121 and step S122, and step S13 may specifically include step S131 and step S132. Specifically, step S121: judge whether the current image frame satisfies the positioning condition corresponding to the image processing. If the result of step S121 is no, execute step S14: delete the current image frame and proceed directly to the next image frame. If yes, execute step S131: perform image processing on the current image frame to obtain a processing result of the current image frame. After step S131, execute step S122: judge whether the current image frame satisfies the positioning condition corresponding to the positioning processing. If the result of step S122 is no, execute step S15: proceed directly to the next image frame. If the result of step S122 is yes, execute step S132: perform positioning processing using the processing result of the current image frame. The positioning conditions involved may be the same or different.
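The flow S121 → S131 → S122 → S132 can be sketched as a single per-frame function; the callable parameters are illustrative stand-ins for the two condition checks and the two processing steps.

```python
def process_frame(frame, meets_image_condition, meets_positioning_condition,
                  image_process, positioning_process):
    """Per-frame flow: gate image processing and positioning processing on
    their own positioning conditions; a frame failing either gate is
    skipped (returns None) and the next frame is handled instead."""
    if not meets_image_condition(frame):        # S121 fails -> S14
        return None
    result = image_process(frame)               # S131
    if not meets_positioning_condition(frame):  # S122 fails -> S15
        return None
    return positioning_process(result)          # S132

# Toy example: conditions and steps on plain numbers, purely illustrative.
img_cond = lambda f: f > 0
pos_cond = lambda f: f > 1
pose = process_frame(2, img_cond, pos_cond, lambda f: f * 2, lambda r: r + 1)
```

A frame that passes only the image-processing condition still contributes a processing result, but no positioning is run on it.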
Specifically, the image processing may be feature point matching between the current image frame and other image frames. The feature point matching may be matching between the first historical image frame and the two-dimensional points corresponding to tracked three-dimensional points in the current image frame. A specific feature point matching approach includes the following steps: extract the feature points of the current image frame; for feature points whose tracked three-dimensional points can be projected into the current image frame, use the projection points as initial values for tracking in the current frame; for feature points that cannot be so projected, compute a predicted point as the initial value using the motion information between the current image frame and the first historical image frame, where the motion information includes gyroscope and/or accelerometer readings such as angle and acceleration information. The initial value narrows the range of feature matching: it guides each two-dimensional point in the first historical image frame to the approximate location of its corresponding two-dimensional point in the current image frame, improving both matching speed and matching precision. Of course, the feature point matching may also be performed by applying a sparse optical flow method to the current image frame and the first historical image frame; commonly used sparse optical flow implementations can be referred to, so the sparse optical flow method is not described further here.
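Projecting a tracked three-dimensional point into the current frame to obtain the initial value is standard pinhole projection; a minimal sketch, assuming a calibrated camera with known pose (the intrinsics and pose values in the example are hypothetical):

```python
import numpy as np

def project_point(point_3d, rotation, translation, fx, fy, cx, cy):
    """Pinhole projection of a tracked 3-D point into the current image
    frame; the projected pixel serves as the initial value that narrows
    the feature-matching search range."""
    p_cam = rotation @ np.asarray(point_3d, dtype=float) + translation
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return np.array([u, v])
```

A point on the optical axis projects to the principal point; off-axis points land at `f * X/Z` pixels away from it.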
As stated above, the image processing, i.e. feature extraction and matching, may be performed only when the positioning condition corresponding to the image processing is satisfied, and that positioning condition includes judging whether the disparity between the current image frame and the first historical image frame is greater than the first preset disparity. In the positioning condition corresponding to the image processing in the embodiments of the disclosure, this disparity therefore refers to the second average disparity between the predicted points of the current image frame and the two-dimensional points of the first historical image frame.
Moreover, in the embodiments of the disclosure, determining the preset motion state requires the disparity between the current image frame and the first historical image frame, where that disparity refers to the first average disparity between the matched two-dimensional point pairs in the two frames; the step of determining the preset motion state therefore comes after the image processing step. That is, after each execution of the image processing step, the motion state of the target needs to be determined, and the motion state of the target used in the positioning condition corresponding to the image processing is the one determined from the first historical image frame: the motion-state determination result of the current image frame serves as the preset-motion-state determination result in the positioning condition corresponding to the image processing of the next image frame. For example, if the first historical image frame was determined to be in a preset motion state, the target is taken to be in that preset motion state when shooting the current image frame.
In the embodiments of the disclosure, after image processing is performed on the current image frame, and when the positioning condition corresponding to the positioning processing is satisfied, positioning processing is performed using the processing result of the current image frame. By setting positioning conditions for the different processing steps of the visual positioning processing and letting each positioning condition determine whether its step is executed, a finer and more flexible dynamic adjustment of the visual positioning processing frequency is achieved: unnecessary processing steps can be skipped more accurately while the necessary ones are kept, so that the accuracy of visual positioning is further improved while the amount of computation is reduced.
The positioning processing mainly positions the target based on the feature point matching result and the inertial sensor data. Positioning a target from a feature matching result and inertial sensor data can follow general positioning methods, which the embodiments of the present disclosure do not describe in detail.
According to the above scheme, the image processing step and the positioning processing step in the visual positioning process can each dynamically adjust its own processing frequency. Because the image processing step and the positioning processing step each accumulate error, adjusting their processing frequencies separately rather than fixing a single unified frequency isolates their respective errors and further improves the accuracy and stability of the visual positioning.
In some disclosed embodiments, the motion state of the target may specifically be determined after the image processing of the current image frame and before judging whether the current image frame satisfies the positioning condition corresponding to the positioning processing. Since by that point the motion state of the target may differ from the one obtained from the first historical image frame, the motion state used in the positioning condition corresponding to the positioning processing is the one obtained from the current image frame.
In some disclosed embodiments, when the positioning condition is not met, or when a preset number of image frames are waiting for image processing, the current image frame is not used for visual positioning processing; the next image frame is acquired directly, and the judgment of the positioning condition and the subsequent steps proceed on it. A preset number of image frames waiting for image processing indicates that the computing power of the device is insufficient, so the current image frame is no longer processed. Checking whether a preset number of image frames are waiting for image processing may happen before or after the first positioning condition is judged for the current image frame; accordingly, not using the current image frame for visual positioning processing may mean deleting it directly without judging its first positioning condition, or deleting it directly even though it satisfies the first positioning condition. Of course, if the current image frame does not meet the positioning condition, it is likewise not used for visual positioning processing, and the next image frame is acquired directly for judgment of the positioning condition and the subsequent steps. If the current image frame does not meet the positioning condition corresponding to the image processing, no image processing is performed on it, and judgment moves directly to the next image frame. Or, if the current image frame meets the positioning condition corresponding to the image processing but, after the image processing, is found not to meet the positioning condition corresponding to the positioning processing, no positioning processing is performed on it and judgment of the positioning condition moves directly to the next image frame; the judgment for the next image frame here may be the judgment of the positioning condition corresponding to the positioning processing for an image frame that has undergone image processing. Not performing visual positioning processing on the current image frame when the positioning condition is not met, or when a preset number of image frames are waiting for image processing, improves the accuracy of visual positioning and better matches the computing power of the device.
In some disclosed embodiments, the first preset disparity is adjusted when the target is determined to be in a preset motion state while shooting the current image frame and/or when a preset number of image frames are waiting for image processing. Specifically, before judging whether the current image frame meets the positioning condition: if a second number of image frames are waiting for image processing, increase the first preset disparity; if the target is determined to be in the first first motion state, decrease the first preset disparity; if the target is determined to be in the second first motion state, whose motion speed is greater than that of the first first motion state, increase the first preset disparity; and if the target is in the second motion state, decrease the first preset disparity. The increase or decrease here is relative to the first preset disparity of the target in the normal motion state, where the normal motion state includes the non-preset motion states. The first preset disparities in the respective states therefore satisfy: first preset disparity in the first first motion state < first preset disparity in the second motion state < first preset disparity in the normal motion state < first preset disparity in the second first motion state.
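The adjustment rules can be sketched as a small function around the normal-state value; the `step` increments and the state names are illustrative assumptions chosen only to preserve the stated ordering of the thresholds.

```python
def adjusted_first_preset_disparity(base, motion_state, pending_frames,
                                    second_number, step=5.0):
    """Adjust the first preset disparity relative to its normal-state
    value `base`, per the rules above. `step` is an illustrative amount."""
    d = base
    if pending_frames >= second_number:
        d += step          # device lagging: raise the bar, lower the frequency
    if motion_state == "first_first":
        d -= step          # slower abnormal state: position more often
    elif motion_state == "second_first":
        d += step          # faster abnormal state: position less often
    elif motion_state == "second":
        d -= step / 2      # transitional state between first first and normal
    return d
```

With these increments the resulting thresholds respect the ordering first first < second < normal < second first.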
Adjusting the first preset disparity amounts to adjusting the frequency of the visual positioning processing: a larger first preset disparity demands more motion time and/or motion amplitude of the target, so the target is not positioned over short intervals and/or small motions and the positioning frequency decreases, reducing useless computation; conversely, a smaller first preset disparity raises the positioning frequency and improves the positioning accuracy to a certain extent. In the first first motion state, because the first preset disparity is decreased, the second preset time corresponding to that state in the positioning condition is smaller than the first preset time; similarly, the second first motion state increases the first preset disparity, so its corresponding second preset time in the positioning condition is longer than the first preset time. Adjusting the first preset disparity affects the judgment of the second positioning condition of the current image frame and of the first positioning condition of the next image frame. Flexibly adjusting the first preset disparity, before judging whether the current image frame meets the positioning condition, according to the motion state of the current image frame or the computing power of the device, i.e. whether a second number of image frames are waiting for image processing, improves the positioning accuracy and stability.
The execution body of the method of visual positioning may be a visual positioning device; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a virtual reality helmet, augmented reality glasses, an unmanned vehicle, a mobile robot, a sweeping robot, a flying device, User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, equipped with both image sensors and inertial sensors. In some possible implementations, the method of visual positioning may be implemented by a processor invoking computer-readable instructions stored in a memory.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a visual positioning device according to an embodiment of the application. The visual positioning device 30 includes: the acquisition module 31, the condition judgment module 32 and the positioning module 33. An acquisition module 31, configured to acquire a current image frame acquired during a process of visually locating a target; a condition judging module 32 for judging whether the current image frame satisfies the positioning condition; and a positioning module 33, configured to perform visual positioning processing by using the current image frame when the positioning condition is satisfied.
According to the above scheme, visual positioning processing is performed with the current image frame only after judging that it meets the positioning condition, so the frequency of the visual positioning processing is dynamically adjusted: unnecessary visual positioning processing is reduced, i.e. the amount of computation is reduced and the efficiency of the positioning processing is improved; and since each visual positioning processing accumulates error, performing visual positioning only when the positioning condition is met also reduces error accumulation and improves the accuracy of the visual positioning.
In some disclosed embodiments, the visual positioning processing includes image processing and positioning processing; in the case where the positioning condition is satisfied, the positioning module 33 performs visual positioning processing using the current image frame, including: performing image processing on the current image frame to obtain a processing result of the current image frame, and performing positioning processing using that processing result, where at least one of the image processing and the positioning processing is performed when its corresponding positioning condition is satisfied.
According to the scheme, the positioning conditions are set for the image processing and/or positioning processing in the visual positioning process, and whether the corresponding processing steps are needed to be executed or not is determined according to the corresponding positioning conditions, so that the dynamic adjustment of finer and flexible visual positioning processing frequency is realized, further, unnecessary processing steps of positioning can be reduced more accurately, the execution of necessary processing steps is reserved, and the positioning precision is further improved under the condition of reducing the calculated amount.
In some disclosed embodiments, one of the following applies: the image processing is performed when the positioning condition corresponding to the image processing is satisfied, while the positioning processing is performed at a first preset frequency; or the image processing is performed at a second preset frequency, while the positioning processing is performed when the positioning condition corresponding to the positioning processing is satisfied; or the image processing is performed when its corresponding positioning condition is satisfied and the positioning processing is performed when its corresponding positioning condition is satisfied, where the positioning conditions corresponding to the image processing and the positioning processing may be the same or different.
According to this scheme, the processing frequency of at least one of the image processing and the positioning processing is adjusted dynamically, so unnecessary processing steps can be skipped more precisely while the necessary steps are still executed, further improving the accuracy of visual positioning while reducing the amount of computation.
In some disclosed embodiments, the condition determination module 32 determines whether the current image frame satisfies the positioning condition through at least one of the following first determination steps: determining whether the motion time between the current image frame and a first historical image frame meets a preset time requirement; and determining whether the motion amplitude between the current image frame and the first historical image frame meets a preset amplitude requirement, where the first historical image frame is an image frame on which visual positioning processing was previously performed. If the result of the first determination step is affirmative, the positioning condition is determined to be satisfied.
According to this scheme, the motion time or the motion amplitude is used to determine whether the current image frame satisfies the positioning condition, and hence whether visual positioning is performed, so the adjustment of the visual positioning frequency depends on the movement speed and/or movement magnitude of the target, which makes the adjustment more reasonable.
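A minimal sketch of this combined check follows, assuming concrete threshold values; the actual preset time and preset parallax are implementation-dependent and not given in the patent.

```python
def satisfies_positioning_condition(motion_time_s, parallax_px,
                                    preset_time_s=0.5, preset_parallax_px=10.0):
    # The frame qualifies when either the elapsed motion time or the motion
    # amplitude (here characterized by parallax) since the last visually
    # localized frame exceeds its preset threshold.
    return motion_time_s > preset_time_s or parallax_px > preset_parallax_px
```

A slow-moving device thus triggers localization on the time criterion, while a fast-moving one triggers it on the amplitude criterion.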
In some disclosed embodiments, the condition determination module 32 determines whether the motion time between the current image frame and the first historical image frame meets the preset time requirement through at least one of the following second determination steps: determining whether the motion time is longer than a first preset time; and, when the target is determined to be in a preset motion state, determining whether the motion time is longer than a second preset time corresponding to that preset motion state, where there is at least one preset motion state of the target and each preset motion state corresponds to its own second preset time. If the result of the second determination step is affirmative, the preset time requirement is determined to be met. And/or, determining whether the motion amplitude between the current image frame and the first historical image frame meets the preset amplitude requirement includes: determining whether the parallax between the current image frame and the first historical image frame is larger than a first preset parallax.
According to this scheme, by defining multiple motion states, the visual positioning method can make an accurate determination under a variety of target conditions. In addition, using the parallax between the current image frame and the first historical image frame to characterize the motion amplitude makes the determination more accurate.
In some disclosed embodiments, there are a plurality of positioning conditions, used in different processing steps for determining whether to perform visual positioning on the current image frame, where at least one of the first preset time, the second preset time and the first preset parallax differs between positioning conditions. And/or, the parallax between the current image frame and the first historical image frame is: a first average parallax between a plurality of matched two-dimensional point pairs in the current image frame and the first historical image frame, or a second average parallax between a plurality of predicted points of the current image frame and a plurality of two-dimensional points of the first historical image frame, where the predicted points are the projections onto the current image frame of the tracked three-dimensional points mapped from a second historical image frame, and the two-dimensional points are the points on the first historical image frame corresponding to those tracked three-dimensional points.
According to this scheme, setting a plurality of positioning conditions allows the current image frame to be checked several times, which both reduces the positioning processing frequency to a certain extent and increases the positioning accuracy.
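The first average parallax over matched point pairs can be computed as a simple mean displacement. This sketch assumes the points are given as (x, y) pixel coordinates and that the matching between frames has already been established; the function name is illustrative.

```python
import math

def first_average_parallax(current_pts, history_pts):
    # Mean Euclidean displacement over matched 2D point pairs between the
    # current image frame and the first historical image frame.
    assert current_pts and len(current_pts) == len(history_pts)
    total = sum(math.dist(p, q) for p, q in zip(current_pts, history_pts))
    return total / len(current_pts)
```

The second average parallax would be computed the same way, with the historical points replaced by the projections of the tracked three-dimensional points onto the current frame.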
In some disclosed embodiments, when the number of predicted points is less than a first number, the second average parallax is set to a second preset parallax, where the second preset parallax is greater than the first preset parallax.
According to this scheme, when the number of predicted points is smaller than the first number, the movement amplitude of the target is excessively large, and skipping visual positioning processing for the target could cause subsequent visual positioning to fail. Setting the second average parallax to a value greater than the first preset parallax therefore ensures that the current image frame participates in the positioning processing, improving the stability of the positioning processing.
In some disclosed embodiments, the preset motion states include two first motion states, where the motion speed of each first motion state is greater than a preset speed, and the motion speed of the first of the first motion states is less than that of the second of the first motion states. The condition determination module 32 is further configured to perform at least one of the following steps: before determining whether the current image frame satisfies the positioning condition, if a second number of image frames are awaiting image processing, increasing the first preset parallax; if the target is determined to be in the first of the first motion states, reducing the first preset parallax; and if the target is determined to be in the second of the first motion states, increasing the first preset parallax.
According to this scheme, the first preset parallax is flexibly adjusted before the positioning condition is checked, according to the motion state of the target or the computing load of the device (that is, whether a second number of image frames are awaiting image processing), which improves the accuracy and stability of positioning.
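The threshold adjustment described above might look like the following sketch; the step size, backlog limit and state labels are assumptions for illustration, not values from the patent.

```python
def adjust_first_preset_parallax(base_px, pending_frames, motion_state,
                                 backlog=3, step_px=2.0):
    # Raise the threshold (so fewer frames are localized) when frames are
    # queued for image processing or the target is in the faster first motion
    # state; lower it (more frames localized) in the slower first motion state.
    threshold = base_px
    if pending_frames >= backlog:
        threshold += step_px
    if motion_state == "fast":           # slower of the two first motion states
        threshold -= step_px
    elif motion_state == "very_fast":    # faster of the two first motion states
        threshold += step_px
    return threshold
```

Raising the threshold under load trades localization frequency for throughput, which matches the patent's goal of adapting to the device's computing power.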
In some disclosed embodiments, the preset motion states include at least one first motion state, the motion speed of each first motion state is greater than the preset speed, and different first motion states have different motion speeds. The condition determination module 32 determines that the target is in a first motion state through at least one of the following: determining that the parallax between the current image frame and the first historical image frame is greater than a third preset parallax; determining that the number of tracked three-dimensional points mapped from the second historical image frame is less than a third number; and determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is larger than a preset error.
According to this scheme, the motion state of the target is determined from the parallax between the current image frame and the first historical image frame, or from the number of tracked three-dimensional points and the re-projection error, which makes the determination of the motion state more accurate.
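A sketch of the first-motion-state test follows, with illustrative threshold values standing in for the third preset parallax, third number and preset error, none of which are fixed by the patent.

```python
def in_first_motion_state(parallax_px, n_tracked_points, avg_reproj_err,
                          third_parallax=20.0, third_number=30, preset_err=2.0):
    # Any single indicator suffices: large inter-frame parallax, too few
    # tracked 3D points surviving from the second historical frame, or a
    # large average re-projection error in recent frames.
    return (parallax_px > third_parallax
            or n_tracked_points < third_number
            or avg_reproj_err > preset_err)
```

Each indicator is a symptom of fast motion: rapid camera movement increases parallax, breaks feature tracks, and degrades re-projection consistency.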
In some disclosed embodiments, the third preset parallax, the third number and the preset error corresponding to different first motion states are different, where the greater the motion speed of a first motion state, the smaller its corresponding third number and the greater its third preset parallax and preset error.
According to this scheme, the greater the motion speed of a first motion state, the smaller the corresponding third number and the greater the third preset parallax and preset error, so the requirement for entering that state is stricter and a wider range of target motion amplitudes can be accommodated.
In some disclosed embodiments, the preset motion state includes a second motion state that occurs after the first motion state; the condition determination module 32 determines that the target is in the second motion state, including at least one of: determining that the parallax between the current image frame and the first historical image frame is larger than a third preset parallax, and the number of the tracking three-dimensional points mapped by the second historical image frame is larger than or equal to a fourth number, wherein the fourth number is larger than the third number; and determining that the average re-projection error of the tracking three-dimensional points in each second historical image frame is larger than a preset error, and the number of the tracking three-dimensional points mapped by the second historical image frames is larger than or equal to the fourth number.
According to this scheme, multiple motion states are defined, and the information available for positioning differs between motion states, so the frequency of visual positioning processing can be adjusted according to the motion state of the target.
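The second motion state differs from the first in additionally requiring that enough tracked points remain (at least the fourth number); a sketch with assumed thresholds:

```python
def in_second_motion_state(parallax_px, n_tracked_points, avg_reproj_err,
                           third_parallax=20.0, preset_err=2.0,
                           fourth_number=60):
    # A fast-motion indicator still fires, but at least the fourth number of
    # tracked 3D points mapped from the second historical frame survive,
    # i.e. tracking has caught up with the motion.
    enough_points = n_tracked_points >= fourth_number
    return enough_points and (parallax_px > third_parallax
                              or avg_reproj_err > preset_err)
```
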
In some disclosed embodiments, the condition determination module 32 determines that the target is not in a preset motion state through at least one of the following: determining that the parallax between adjacent image frames of a consecutive fourth number of image frames subjected to visual positioning processing is smaller than or equal to the third preset parallax; determining that the number of tracked three-dimensional points mapped from the second historical image frame is greater than or equal to the third number; and determining that the average re-projection error of the tracked three-dimensional points in each second historical image frame is smaller than or equal to the preset error.
According to this scheme, when the target is determined not to be in a preset motion state, the corresponding positioning condition check can still be performed, achieving the goal of adjusting the image processing frequency.
In some disclosed embodiments, the positioning device 30 is further configured to perform the following steps: when the positioning condition is not satisfied, or when a preset number of image frames are awaiting image processing, visual positioning processing is not performed with the current image frame; instead, the acquisition module 31 acquires the next image frame, the condition determination module 32 determines whether the next frame satisfies the positioning condition, and the positioning module 33 performs the subsequent steps.
According to this scheme, when the positioning condition is not satisfied or a preset number of image frames are awaiting image processing, visual positioning processing is not performed on the current image frame, which improves the accuracy of visual positioning and better matches the computing power of the device.
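Putting the skip logic together: a frame is dropped when its condition fails or the device is saturated, and the loop moves on to the next frame. The `condition` and `backlog` callables are illustrative placeholders.

```python
def localization_loop(frames, condition, backlog):
    # Skip a frame (and fetch the next) when the positioning condition fails
    # or a preset number of frames already await image processing; otherwise
    # hand the frame to the visual positioning pipeline.
    localized = []
    for frame in frames:
        if backlog(frame) or not condition(frame):
            continue
        localized.append(frame)   # stand-in for the visual positioning step
    return localized
```
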
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the application. The electronic device 40 includes a memory 41 and a processor 42, the processor 42 being configured to execute program instructions stored in the memory 41 to carry out the steps of any of the visual positioning method embodiments described above. In one specific implementation scenario, the electronic device 40 may include, but is not limited to, devices equipped with both an image sensor and an inertial sensor module, such as a mobile robot, a handheld mobile device, a sweeping robot, an unmanned vehicle, a virtual reality helmet, augmented reality glasses, a flying device, a microcomputer, a desktop computer, or a server. In addition, the electronic device 40 may also include mobile devices such as a notebook computer or a tablet computer, which is not limited here.
In particular, the processor 42 is configured to control itself and the memory 41 to implement the steps in any of the visual positioning method embodiments described above. The processor 42 may also be referred to as a CPU (Central Processing Unit). The processor 42 may be an integrated circuit chip having signal processing capabilities. The processor 42 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 42 may be jointly implemented by integrated circuit chips.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 50 stores program instructions 51 executable by a processor, the program instructions 51 being used to implement the steps in any of the visual positioning method embodiments described above.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The foregoing descriptions of the various embodiments emphasize the differences between them; for aspects that are the same or similar, the embodiments may be referred to one another, and these are not repeated here for brevity.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into modules or units is merely a logical functional division, and other divisions are possible in actual implementation: units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (16)

1. A method of visual localization comprising:
acquiring a current image frame acquired in the process of visually positioning a target;
judging whether the current image frame meets a positioning condition or not;
under the condition that the positioning condition is met, performing visual positioning processing by utilizing the current image frame;
The determining whether the current image frame satisfies the positioning condition includes: determining whether the motion time between the current image frame and a first historical image frame meets a preset time requirement, and if so, determining that the positioning condition is satisfied, wherein the first historical image frame is an image frame on which the visual positioning processing was previously performed; in a case where the target is in a preset motion state, if the motion time is longer than a second preset time corresponding to the preset motion state, it is determined that the motion time meets the preset time requirement, wherein there is at least one preset motion state of the target, each preset motion state corresponds to a second preset time, the preset motion states include at least one first motion state, the motion speed of each first motion state is greater than a preset speed, and the motion speeds of different first motion states are different;
Wherein determining that the target is in the first motion state comprises at least one of: determining that the number of tracked three-dimensional points mapped by the second historical image frame is less than a third number; and determining that the average re-projection error of the tracking three-dimensional point in each second historical image frame is larger than a preset error.
2. The method of claim 1, wherein the visual localization process comprises an image process and a localization process; and under the condition that the positioning condition is met, performing visual positioning processing by using the current image frame, wherein the visual positioning processing comprises the following steps:
performing image processing on the current image frame to obtain a processing result of the current image frame;
positioning processing is carried out by utilizing the processing result of the current image frame;
Wherein at least one step of the image processing and the positioning processing is performed in a case where the corresponding positioning condition is satisfied.
3. The method according to claim 2, wherein the image processing is performed in a case where the positioning condition corresponding to the image processing is satisfied, and the positioning processing is performed at a first preset frequency;
or the image processing is executed according to a second preset frequency, and the positioning processing is executed under the condition that the positioning condition corresponding to the positioning processing is met;
Or the image processing is performed when the positioning condition corresponding to the image processing is satisfied, and the positioning processing is performed when the positioning condition corresponding to the positioning processing is satisfied, wherein the positioning conditions respectively corresponding to the image processing and the positioning processing are the same or different.
4. A method according to any one of claims 1 to 3, wherein said determining whether the current image frame satisfies a positioning condition further comprises:
Judging whether the motion amplitude between the current image frame and the first historical image frame meets the preset amplitude requirement or not;
and if the judgment result is yes, determining that the positioning condition is met.
5. The method of claim 4, wherein the determining whether the motion time between the current image frame and the first historical image frame meets a preset time requirement further comprises the second determining step of:
judging whether the movement time is longer than a first preset time or not;
if the judgment result of the second judgment step is yes, determining that the preset time requirement is met;
And/or, the determining whether the motion amplitude between the current image frame and the first historical image frame meets a preset amplitude requirement includes:
And judging whether the parallax between the current image frame and the first historical image frame is larger than a first preset parallax or not.
6. The method of claim 5, wherein the positioning conditions include a plurality of different processing steps for determining whether to visually position the current image frame, wherein at least one of the first preset time, the second preset time, and the first preset parallax in each of the positioning conditions is different;
and/or, a disparity between the current image frame and the first historical image frame: and a first average parallax between a plurality of matched two-dimensional point pairs in the current image frame and the first historical image frame or a second average parallax between a plurality of predicted points of the current image frame and a plurality of two-dimensional points of the first historical image frame, wherein the plurality of predicted points are projection points, mapped by the second historical image frame, of the tracking three-dimensional points on the current image frame, and the plurality of two-dimensional points are two-dimensional points, corresponding to the tracking three-dimensional points in the first historical image frame.
7. The method of claim 6, wherein the second average disparity is a second preset disparity if the number of predicted points is less than the first number, wherein the second preset disparity is greater than the first preset disparity.
8. The method of any one of claims 5 to 7, wherein the preset motion states comprise two first motion states, wherein the motion speed of each of the first motion states is greater than a preset speed, and the motion speed of the first of the first motion states is less than that of the second of the first motion states; the method further comprises at least one of the following steps:
before judging whether the current image frame meets the positioning condition or not, if a second number of image frames exist to be subjected to image processing, increasing the first preset parallax;
if the target is determined to be in a first motion state, reducing the first preset parallax;
and if the target is determined to be in the second first motion state, increasing the first preset parallax.
9. The method of any one of claims 5 to 7, wherein the determining that the target is in the first state of motion further comprises:
and determining that the parallax between the current image frame and the first historical image frame is larger than a third preset parallax.
10. The method of claim 9, wherein the third preset parallax, the third number and the preset error corresponding to different first motion states are different, wherein the greater the motion speed of a first motion state, the smaller the corresponding third number, and the greater the third preset parallax and preset error.
11. The method of claim 9, wherein the preset motion state comprises a second motion state that occurs after the first motion state; determining that the target is in the second motion state comprises at least one of the following steps:
determining that the parallax between the current image frame and the first historical image frame is larger than a third preset parallax, and the number of the tracking three-dimensional points mapped by the second historical image frame is larger than or equal to a fourth number, wherein the fourth number is larger than the third number;
And determining that the average re-projection error of the tracking three-dimensional points in each second historical image frame is larger than a preset error, and the number of the tracking three-dimensional points mapped by the second historical image frames is larger than or equal to the fourth number.
12. The method according to any one of claims 5 to 7, wherein determining that the target is not in a preset motion state comprises at least one of the following steps:
determining that the parallax between adjacent image frames of a consecutive fourth number of image frames subjected to the visual positioning processing is smaller than or equal to a third preset parallax;
Determining that the number of tracked three-dimensional points mapped by the second historical image frame is greater than or equal to a third number;
and determining that the average re-projection error of the tracking three-dimensional point in each second historical image frame is smaller than or equal to a preset error.
13. A method according to any one of claims 1 to 3, further comprising:
When the positioning condition is not satisfied, or a preset number of image frames are awaiting image processing, visual positioning processing is not performed with the current image frame; instead, the next image frame is acquired, and the determination of whether the positioning condition is satisfied and the subsequent steps are performed for the next image frame.
14. A visual positioning device, comprising:
The acquisition module is used for acquiring a current image frame acquired in the process of visually positioning the target;
The condition judging module is used for judging whether the current image frame meets positioning conditions or not;
the positioning module is used for performing visual positioning processing by utilizing the current image frame under the condition that the positioning condition is met;
The condition determination module is configured to determine whether the current image frame satisfies the positioning condition by: determining whether the motion time between the current image frame and a first historical image frame meets a preset time requirement, and if so, determining that the positioning condition is satisfied, wherein the first historical image frame is an image frame on which the visual positioning processing was previously performed; in a case where the target is in a preset motion state, if the motion time is longer than a second preset time corresponding to the preset motion state, it is determined that the motion time meets the preset time requirement, wherein there is at least one preset motion state of the target, each preset motion state corresponds to a second preset time, the preset motion states include at least one first motion state, the motion speed of each first motion state is greater than a preset speed, and the motion speeds of different first motion states are different;
The condition judging module determines that the target is in the first motion state, and comprises at least one of the following steps: determining that the number of tracked three-dimensional points mapped by the second historical image frame is less than a third number; and determining that the average re-projection error of the tracking three-dimensional point in each second historical image frame is larger than a preset error.
15. An electronic device comprising a memory and a processor for executing program instructions stored in the memory to implement the method of any one of claims 1 to 13.
16. A computer readable storage medium having stored thereon program instructions, which when executed by a processor, implement the method of any of claims 1 to 13.
CN202110297587.7A 2021-03-19 2021-03-19 Method for visual localization and related device, apparatus, storage medium Active CN113034595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110297587.7A CN113034595B (en) 2021-03-19 2021-03-19 Method for visual localization and related device, apparatus, storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110297587.7A CN113034595B (en) 2021-03-19 2021-03-19 Method for visual localization and related device, apparatus, storage medium

Publications (2)

Publication Number Publication Date
CN113034595A CN113034595A (en) 2021-06-25
CN113034595B true CN113034595B (en) 2024-06-07

Family

ID=76471844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110297587.7A Active CN113034595B (en) 2021-03-19 2021-03-19 Method for visual localization and related device, apparatus, storage medium

Country Status (1)

Country Link
CN (1) CN113034595B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101364046B1 (en) * 2012-11-05 2014-02-19 Daegu Gyeongbuk Institute of Science and Technology Method and apparatus for object tracking in video sequences
CN103729860A (en) * 2013-12-31 2014-04-16 Huawei Software Technologies Co., Ltd. Image target tracking method and device
CN108846857A (en) * 2018-06-28 2018-11-20 Graduate School at Shenzhen, Tsinghua University Measurement method of visual odometry and visual odometer
CN110298794A (en) * 2019-05-20 2019-10-01 Shanghai United Imaging Intelligence Co., Ltd. Medical imaging system, method, computer device and readable storage medium
CN110555901A (en) * 2019-09-05 2019-12-10 HiScene (Shanghai) Information Technology Co., Ltd. Method, device, equipment and storage medium for localization and mapping of dynamic and static scenes
CN111780763A (en) * 2020-06-30 2020-10-16 Hangzhou Hikrobot Technology Co., Ltd. Visual positioning method and device based on visual map
CN112509045A (en) * 2020-12-04 2021-03-16 Jiangsu University of Science and Technology Loop-closure detection method and system for robot visual simultaneous localization and mapping

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Prior Guided Dropout for Robust Visual Localization in Dynamic Environments; Zhaoyang Huang et al.; 《CVF》; pp. 2791-2800 *
SenseTime: innovative breakthroughs and applications of visual localization technology for augmented reality; Zhang Guofeng; 《Hangzhou Science and Technology》; pp. 25-28 *
Research progress in vision-based simultaneous localization and mapping; Chen Chang et al.; 《Application Research of Computers》; Vol. 35, No. 3; pp. 641-647 *

Also Published As

Publication number Publication date
CN113034595A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
US11285613B2 (en) Robot vision image feature extraction method and apparatus and robot using the same
EP3378033B1 (en) Systems and methods for correcting erroneous depth information
CN110246147B (en) Visual inertial odometer method, visual inertial odometer device and mobile equipment
US20200264011A1 (en) Drift calibration method and device for inertial measurement unit, and unmanned aerial vehicle
US12062210B2 (en) Data processing method and apparatus
JP2021509515A (en) Distance measurement methods, intelligent control methods and devices, electronic devices and storage media
CN111060948B (en) Positioning method, positioning device, helmet and computer readable storage medium
WO2018151792A1 (en) Camera auto-calibration with gyroscope
CN108780577A (en) Image processing method and equipment
CN105283905A (en) Robust tracking using point and line features
CN113012224B (en) Positioning initialization method and related device, equipment and storage medium
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
CN113029128B (en) Visual navigation method and related device, mobile terminal and storage medium
EP3343242B1 (en) Tracking system, tracking device and tracking method
CN114543797B (en) Pose prediction method and device, equipment and medium
CN111721305B (en) Positioning method and apparatus, autonomous vehicle, electronic device, and storage medium
CN113052897A (en) Positioning initialization method and related device, equipment and storage medium
CN111736190B (en) Unmanned aerial vehicle airborne target detection system and method
CN115690119A (en) Data processing method and device
CN113034595B (en) Method for visual localization and related device, apparatus, storage medium
CN112154480B (en) Positioning method and device for movable platform, movable platform and storage medium
CN108322698A (en) The system and method merged based on multiple-camera and Inertial Measurement Unit
CN111223139B (en) Target positioning method and terminal equipment
CN115115530B (en) Image deblurring method, device, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant