CN116698023A - State distinguishing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116698023A
Authority
CN
China
Prior art keywords
positioning system
information
visual
inertial positioning
visual inertial
Prior art date
Legal status
Pending
Application number
CN202310715131.7A
Other languages
Chinese (zh)
Inventor
翟尚进
陈丹鹏
王楠
Current Assignee
Zhejiang Shangtang Technology Development Co Ltd
Original Assignee
Zhejiang Shangtang Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Shangtang Technology Development Co Ltd filed Critical Zhejiang Shangtang Technology Development Co Ltd
Priority to CN202310715131.7A
Publication of CN116698023A


Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/10 - Navigation by using measurements of speed or acceleration
    • G01C 21/12 - Navigation by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C 21/16 - Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C 21/165 - Inertial navigation combined with non-inertial navigation instruments
    • G01C 21/1656 - Inertial navigation combined with non-inertial navigation instruments with passive imaging devices, e.g. cameras
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a state distinguishing method, apparatus, device and storage medium. The method includes: acquiring image information and pose information from a visual inertial positioning system, where the image information is acquired by a visual sensor in the visual inertial positioning system and the pose information is acquired by an inertial sensor in the visual inertial positioning system; and distinguishing the running state of the visual inertial positioning system based on the image information and the pose information to obtain a target state of the visual inertial positioning system, where the running state includes at least a stationary state to be initialized and a motion state to be initialized.

Description

State distinguishing method, device, equipment and storage medium
Technical Field
The present disclosure relates to, but is not limited to, the field of computer vision, and in particular, to a state distinguishing method, apparatus, device, and storage medium.
Background
Visual inertial tracking and positioning systems, such as visual inertial navigation systems (Visual Inertial Navigation System, VINS), are important underlying technologies in computer vision, robotics, three-dimensional reconstruction, augmented reality and related fields. A fast and stable initialization method for a visual inertial tracking and positioning system therefore has substantial practical value in augmented reality, virtual reality and similar applications. However, existing visual inertial tracking and positioning systems cannot reliably distinguish the stationary state from the uniform motion state, so the corresponding initialization procedures are complex and users need specialized technical guidance to operate the system correctly; as a result, initialization is unstable and fails frequently, which degrades the accuracy of the visual inertial tracking and positioning system.
Disclosure of Invention
In view of this, embodiments of the present disclosure at least provide a method, an apparatus, a device, and a storage medium for distinguishing states.
The technical scheme of the embodiment of the disclosure is realized as follows:
In one aspect, an embodiment of the present disclosure provides a state distinguishing method, including: acquiring image information and pose information from a visual inertial positioning system, where the image information is acquired by a visual sensor in the visual inertial positioning system and the pose information is acquired by an inertial sensor in the visual inertial positioning system; and distinguishing the running state of the visual inertial positioning system based on the image information and the pose information to obtain a target state of the visual inertial positioning system, where the running state includes at least a stationary state to be initialized and a motion state to be initialized.
In another aspect, an embodiment of the present disclosure provides a state distinguishing apparatus, including: an acquisition module configured to acquire image information and pose information from a visual inertial positioning system, where the image information is acquired by a visual sensor in the visual inertial positioning system and the pose information is acquired by an inertial sensor in the visual inertial positioning system; and a distinguishing module configured to distinguish the running state of the visual inertial positioning system based on the image information and the pose information to obtain a target state of the visual inertial positioning system, where the running state includes at least a stationary state to be initialized and a motion state to be initialized.
In yet another aspect, embodiments of the present disclosure provide a computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing some or all of the steps of the above method when the program is executed.
In yet another aspect, the disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above method.
In yet another aspect, the disclosed embodiments provide a computer program comprising computer readable code which, when run in a computer device, causes a processor in the computer device to perform some or all of the steps for carrying out the above method.
In yet another aspect, the disclosed embodiments provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above method.
In the embodiments of the present disclosure, image information and pose information are first acquired from a visual inertial positioning system, where the image information is acquired by a visual sensor in the visual inertial positioning system and the pose information is acquired by an inertial sensor in the visual inertial positioning system. The running state of the visual inertial positioning system is then distinguished based on the image information and the pose information to obtain a target state of the visual inertial positioning system, where the running state includes at least a stationary state to be initialized and a motion state to be initialized. By combining the image information from the visual sensor with the pose information from the inertial sensor, this addresses the problem in the related art that the running state cannot be distinguished accurately using the inertial sensor alone, and allows the stationary state to be initialized and the motion state to be initialized to be distinguished easily and accurately, so that the visual inertial positioning system can be initialized correctly in either state, improving the stability and accuracy of initialization.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 is a schematic implementation flow chart of a state distinguishing method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of an implementation of a state distinguishing method according to an embodiment of the disclosure;
fig. 3 is a schematic implementation flow chart of a state distinguishing method according to an embodiment of the disclosure;
fig. 4 is a schematic implementation flow chart of a state distinguishing method according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a composition structure of a status differentiating apparatus according to an embodiment of the disclosure;
fig. 6 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure are further elaborated below with reference to the drawings and embodiments. The described embodiments should not be construed as limiting the present disclosure, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present disclosure.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments. It should be understood that "some embodiments" may refer to the same subset or different subsets of all possible embodiments and may be combined with one another where no conflict arises. The terms "first/second/third" are used merely to distinguish similar objects and do not imply a particular ordering of the objects; where permitted, the specific order or precedence may be interchanged, so that the embodiments of the disclosure described herein can be implemented in orders other than those illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing the present disclosure only and is not intended to be limiting of the present disclosure.
Embodiments of the present disclosure provide a state distinguishing method that may be performed by a processor of a computer device. The computer device may be a mobile device such as a robot, an unmanned vehicle or an unmanned aerial vehicle, or a device with state distinguishing capability such as a user device, a user terminal, a cordless phone, a personal digital assistant, a handheld device, a computing device, a vehicle-mounted device, an augmented reality device, a virtual reality device or a visual inertial navigation device. Fig. 1 is a schematic implementation flow chart of a state distinguishing method according to an embodiment of the present disclosure; as shown in Fig. 1, the method includes the following steps S101 to S102:
Step S101, acquiring image information and pose information from the visual inertial positioning system.
Here, the visual inertial positioning system may refer to a class of systems that fuse image information (which may also be referred to as visual information) and pose information (which may also be referred to as inertial information) to perform simultaneous localization and environment reconstruction. The image information may be acquired by a visual sensor in the visual inertial positioning system, and the pose information may be acquired by an inertial sensor in the visual inertial positioning system. The visual sensor may be a camera or the like; the inertial sensor may be an inertial measurement unit (Inertial Measurement Unit, IMU) or the like and may include components such as a gyroscope and an accelerometer. The image information may be multi-frame two-dimensional images captured by the camera, or may be the number and positions of key feature points determined from the multi-frame two-dimensional images, together with tracking information such as the direction, displacement and rotation of objects in the two-dimensional images determined from those frames. The pose information may include position information and orientation (attitude) information, or may include information such as the angular velocity acquired by the gyroscope and the acceleration acquired by the accelerometer, which is not limited here.
In the implementation process of step S101, the method may include: acquiring image data by using a visual sensor in a visual inertial positioning system to obtain image information; and acquiring inertial data by using an inertial sensor in the visual inertial positioning system to obtain pose information.
And step S102, distinguishing the running state of the visual inertial positioning system based on the image information and the pose information to obtain the target state of the visual inertial positioning system.
Here, the running state may refer to the mechanical-motion state, relative to a chosen reference frame, of the computer device on which the visual inertial positioning system is deployed, such as uniform motion, stationary or uncertain. The running state includes at least a stationary state to be initialized and a motion state to be initialized, where the stationary state to be initialized refers to a stationary state from which the visual inertial positioning system can subsequently be initialized, and the motion state to be initialized refers to a motion state from which it can subsequently be initialized. The target state refers to the motion state determined based on the image information and the pose information; for example, the target state of the visual inertial positioning system is the stationary state to be initialized.
In the implementation process of step S102, the method may include: performing first motion state identification processing based on the image information to obtain a first identification result; performing second motion state identification processing based on the pose information to obtain a second identification result; and determining the target state of the visual inertial positioning system based on the first identification result and the second identification result. For example: when the first identification result and the second identification result both indicate the stationary state to be initialized, the target state is determined to be the stationary state to be initialized; when both indicate the motion state to be initialized, the target state is determined to be the motion state to be initialized; and so on.
Take the case where the image information is multi-frame images and the pose information is position information and attitude information as an example: multi-frame images and multiple groups of pose information are acquired within a preset duration at the respective preset acquisition frequencies of the visual sensor and the inertial sensor; key point detection is performed on the multi-frame images to obtain a plurality of feature points; the coordinates of the feature points on each image are determined; the parallax corresponding to the multi-frame images is determined based on those coordinates; when the parallax is smaller than a preset threshold, the visual sensor is determined to be in the stationary state to be initialized within the preset duration, and when the parallax is greater than or equal to the preset threshold, the visual sensor is determined to be in the motion state to be initialized within the preset duration. Likewise, the position change amount and attitude change amount of the inertial sensor are determined based on the multiple groups of pose information within the preset duration; when the position change amount and the attitude change amount satisfy preset conditions, the inertial sensor is determined to be in the stationary state to be initialized within the preset duration, and when they do not, the inertial sensor is determined to be in the motion state to be initialized within the preset duration.
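As a non-limiting illustration of the decision logic described above, the following Python sketch combines the feature-point parallax check with the position and attitude change check; the threshold values, array layouts and function names are assumptions introduced here purely for illustration.

```python
import numpy as np

# Hypothetical thresholds; the disclosure leaves the concrete values open.
PARALLAX_THRESH_PX = 2.0        # preset threshold on feature-point parallax
POS_CHANGE_THRESH_M = 0.01      # preset condition on the position change amount
ANGLE_CHANGE_THRESH_RAD = 0.02  # preset condition on the attitude change amount

def classify_state(tracked_pts, positions, attitudes_rad):
    """tracked_pts: (num_frames, num_points, 2) pixel coordinates of the same feature
    points tracked over the preset duration; positions: (num_frames, 3) and
    attitudes_rad: (num_frames, 3) samples derived from the inertial sensor."""
    # Visual cue: mean parallax of the tracked feature points between first and last frame.
    parallax = np.linalg.norm(tracked_pts[-1] - tracked_pts[0], axis=1).mean()
    vision_static = parallax < PARALLAX_THRESH_PX

    # Inertial cue: position and attitude change amounts over the same duration.
    pos_change = np.linalg.norm(positions[-1] - positions[0])
    ang_change = np.linalg.norm(attitudes_rad[-1] - attitudes_rad[0])
    imu_static = (pos_change < POS_CHANGE_THRESH_M) and (ang_change < ANGLE_CHANGE_THRESH_RAD)

    if vision_static and imu_static:
        return "stationary_to_be_initialized"
    if (not vision_static) and (not imu_static):
        return "motion_to_be_initialized"
    return "uncertain"
```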
In some embodiments, when the visual inertial positioning system is determined to be in the stationary state to be initialized or the motion state to be initialized, the visual inertial positioning system may be initialized based on the image information and the pose information. The initialization process of the visual inertial positioning system includes at least one of: aligning the pose of the initial frame corresponding to the image information with the world coordinate system, calculating a scale factor for converting the up-to-scale visual estimate into the real scale, determining the initial velocity of the visual inertial positioning system, the bias of the inertial sensor, the gravity vector and the like, and determining the rotation extrinsic parameters between the inertial sensor and the visual sensor. Through initialization, the transformation between the image coordinate system corresponding to the image information and the camera coordinate system corresponding to the visual sensor, and the alignment between the image coordinate system and the world coordinate system, can be completed, so that real-scale velocity and displacement near the initial frame are obtained and parameters such as the bias of the inertial sensor are estimated; these parameters may be used for subsequent operation of the visual inertial positioning system. In the embodiments of the present disclosure, when the visual inertial positioning system is determined to be in the stationary state to be initialized or the motion state to be initialized, it can be initialized based on the image information and the pose information; that is, an initialization process for the inertial navigation system that does not depend on three-dimensional scene structure recovery is provided, so more complex scenes can be handled and the success rate of initialization is improved for scenes such as low-texture or large-scale outdoor environments.
In the embodiments of the present disclosure, the process of initializing the visual inertial positioning system is not limited here. For example: a rotation constraint and a translation constraint can be constructed first, and a plurality of constraints are built from the rotation amounts at a plurality of moments; a preset matrix equation is formed using a robustness function; the matrix equation is solved by singular value decomposition or nonlinear optimization to obtain the rotation extrinsic parameter, thereby estimating the rotation extrinsic parameter; the singular vector corresponding to the minimum singular value in the singular value decomposition can be used as the rotation extrinsic parameter. In some embodiments, the visual tracking result of the visual sensor may be determined first based on the image information, and the visual inertial positioning system may then be initialized based on the visual tracking result and the pose information, which is not limited here.
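As a non-limiting sketch of the SVD-based estimation of the rotation extrinsic parameter mentioned above (with the robustness weighting omitted), the following Python code stacks one quaternion constraint per pair of relative rotations and takes the singular vector of the smallest singular value; the Hamilton quaternion convention [w, x, y, z] and the helper names are assumptions.

```python
import numpy as np

def quat_left(q):
    """Left-multiplication matrix: q * p = quat_left(q) @ p."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

def quat_right(q):
    """Right-multiplication matrix: p * q = quat_right(q) @ p."""
    w, x, y, z = q
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])

def calibrate_extrinsic_rotation(imu_quats, cam_quats):
    """imu_quats / cam_quats: lists of unit quaternions [w, x, y, z] giving the relative
    rotation between consecutive frames measured by the inertial sensor and by the
    visual sensor respectively. Returns the rotation extrinsic quaternion q_bc."""
    blocks = []
    for q_imu, q_cam in zip(imu_quats, cam_quats):
        # Constraint q_imu * q_bc = q_bc * q_cam  =>  (L(q_imu) - R(q_cam)) q_bc = 0
        blocks.append(quat_left(np.asarray(q_imu)) - quat_right(np.asarray(q_cam)))
    A = np.vstack(blocks)
    _, _, vt = np.linalg.svd(A)
    q_bc = vt[-1]                  # singular vector of the minimum singular value
    return q_bc / np.linalg.norm(q_bc)
```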
In the embodiments of the present disclosure, image information and pose information are first acquired from a visual inertial positioning system, where the image information is acquired by a visual sensor in the visual inertial positioning system and the pose information is acquired by an inertial sensor in the visual inertial positioning system. The running state of the visual inertial positioning system is then distinguished based on the image information and the pose information to obtain a target state of the visual inertial positioning system, where the running state includes at least a stationary state to be initialized and a motion state to be initialized. By combining the image information from the visual sensor with the pose information from the inertial sensor, this addresses the problem in the related art that the running state cannot be distinguished accurately using the inertial sensor alone, and allows the stationary state to be initialized and the motion state to be initialized to be distinguished easily and accurately, so that the visual inertial positioning system can be initialized correctly in either state, improving the stability and accuracy of initialization.
The embodiment of the present disclosure provides a state distinguishing method, as shown in fig. 2, including the following steps S201 to S203:
step S201 corresponds to step S101 described above, and reference may be made to the specific embodiment of step S101 described above when implemented.
Step S202, cross-verifying the image information and the pose information to obtain a first detection result.
Here, cross-verification may refer to determining whether the visual sensor is stationary based on the image information acquired by the visual sensor while simultaneously determining whether the inertial sensor is stationary based on the pose information acquired by the inertial sensor, and then jointly determining a first detection result of the visual inertial positioning system based on the state of the visual sensor and the state of the inertial sensor; the first detection result indicates whether the visual inertial positioning system is in the stationary state to be initialized. For example: the first frame image and the last frame image of a preset time interval are acquired, the parallax between the first frame image and the last frame image is determined, and the visual sensor is determined to be stationary when the parallax satisfies a preset condition; a first pose and a second pose separated by the preset time interval are acquired, the pose change amount between them is determined, and the inertial sensor is determined to be stationary when the pose change amount satisfies a preset condition; and when the visual sensor and the inertial sensor are stationary at the same time, the first detection result is determined to indicate that the visual inertial positioning system is in the stationary state to be initialized, and so on.
And step S203, performing motion detection on the visual inertial positioning system based on the pose information to obtain a second detection result.
Here, the motion detection may refer to determining whether the inertial sensor is in motion based on pose information acquired by the inertial sensor, and then determining a state of the visual inertial positioning system based on the state of the inertial sensor, where the second detection result is used to characterize whether the visual inertial positioning system is in motion to be initialized. For example: acquiring a first pose and a second pose at preset time intervals, determining pose variation between the first pose and the second pose, determining that an inertial sensor moves under the condition that the pose variation does not meet preset conditions, determining that a second detection result represents that the visual inertial positioning system moves to be initialized, and the like. Wherein the purpose of motion detection is to confirm whether the key frames in the sliding window are in some extreme motion that is detrimental to initialization during subsequent sliding window processing.
In the embodiment of the disclosure, by combining the image information and the pose information, different modes are adopted, so that the running states of the visual inertial positioning system can be accurately distinguished, and the accuracy and the efficiency of state distinguishing are improved.
In some embodiments, the step S202 may include the following steps S2021 to S2023:
step S2021, determining a position variance and an angle variance of the inertial sensor in the first period based on the pose information in the first period.
Here, the pose information may include position information and posture information, the position variance may be used to characterize a degree of movement change of the inertial sensor, the angle variance may be used to characterize a degree of rotation change of the inertial sensor, and the like. If the inertial sensor can directly judge whether the inertial sensor is currently stationary, the judging result of the inertial sensor is directly used as the detection result of the inertial sensor. If the inertial sensor cannot directly judge whether the inertial sensor is currently stationary, the position variance and the angle variance can be used for determining whether the inertial sensor is stationary. For example: acquiring multiple groups of pose information of a first period according to a preset acquisition frequency; determining the position and the gesture of each group of gesture information at the first moment; determining a corresponding position variance in the first period based on the positions of all the moments; and determining the corresponding angle variance in the first period based on the postures at all the moments.
Step S2022, determining, based on the image information in the first period, at least two image disparities of the feature points acquired by the vision sensor in the first period and a distribution area of the feature points.
Here, the image parallax may be used to represent the operation variation degree of the vision sensor, and the image parallax may refer to the parallax represented by all the pixels in the image, or may refer to the parallax represented by part of the pixels (feature points) in the image, which is not limited herein. For example: the multi-frame image in the first period may be acquired as image information, an image parallax between each frame of image and the tail frame of image may be determined, and whether the vision sensor is stationary may be determined based on all the image parallaxes. For example: determining pixel values of all pixel points corresponding to each frame of image; and determining a first pixel point difference value between the corresponding pixel points of each image based on the pixel values of all the pixel points corresponding to each frame of image, and determining the first pixel point difference value as image parallax. The feature point detection can be carried out on the first frame image to obtain a plurality of feature points; determining a pixel value corresponding to the feature point in each image; and determining a second pixel point difference value between the characteristic points in each image based on the pixel value corresponding to the characteristic point in each image, and determining the second pixel point difference value as image parallax.
Step S2023, determining that the visual inertial positioning system is at rest to be initialized, if the position variance, the angle variance, and the image parallax meet a first preset condition.
Here, the first preset condition includes at least: the position variance is smaller than the first threshold, the angle variance is smaller than the second threshold, each image parallax is smaller than the third threshold, and, when an image parallax is greater than or equal to the third threshold, the area of the distribution area is greater than the fourth threshold. For example: if the position variance is smaller than the first threshold and the angle variance is smaller than the second threshold, the inertial sensor is determined to be stationary; if every image parallax is smaller than the third threshold, the visual sensor is determined to be stationary. Because there may be a moving object (such as a vehicle or a pedestrian) in the scene captured by the visual sensor, at least one image parallax may be greater than or equal to the third threshold; in that case the distribution of the feature points whose image parallax is greater than or equal to the third threshold is examined, and if those feature points are not concentrated in a small area of the image, the visual sensor is still determined to be stationary. For example, the distribution area corresponding to those feature points is determined, and if the area of the distribution area is greater than the fourth threshold, the feature points are determined not to be concentrated in a small area of the image.
In some embodiments, under the condition that the three preset conditions are simultaneously met, determining that the visual inertial positioning system is stationary to be initialized, namely, representing that the stationary detection of the visual inertial positioning system is successful; and under the condition that the three preset conditions are not met at the same time, determining that the visual inertial positioning system is not stationary to be initialized, namely, representing that the stationary detection of the visual inertial positioning system fails.
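The first preset condition can be illustrated with the following Python sketch, in which the first to fourth thresholds are hypothetical placeholder values and the distribution-area condition is implemented as stated above.

```python
import numpy as np

# Hypothetical first to fourth thresholds.
T1_POS_VAR, T2_ANG_VAR, T3_PARALLAX_PX, T4_AREA_FRAC = 1e-4, 1e-4, 1.5, 0.25

def meets_first_preset_condition(positions, angles, parallaxes, pts, image_shape):
    """positions/angles: (n, 3) samples from the inertial sensor over the first period;
    parallaxes: per-feature-point image parallax; pts: (k, 2) pixel coordinates of the
    feature points; image_shape: (height, width)."""
    cond_pos = positions.var(axis=0).max() < T1_POS_VAR    # position variance < first threshold
    cond_ang = angles.var(axis=0).max() < T2_ANG_VAR       # angle variance < second threshold

    large = parallaxes >= T3_PARALLAX_PX
    if not large.any():
        cond_img = True                                    # every parallax below the third threshold
    else:
        # Distribution area of the large-parallax feature points, as a fraction of the
        # image, compared against the fourth threshold (condition taken as stated above).
        h, w = image_shape
        span_x = pts[large, 0].max() - pts[large, 0].min()
        span_y = pts[large, 1].max() - pts[large, 1].min()
        cond_img = (span_x * span_y) / (w * h) > T4_AREA_FRAC

    return cond_pos and cond_ang and cond_img              # stationary to be initialized if all hold
```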
In the related art, the static detection module simply uses the parallax of feature points between two adjacent frames and therefore cannot accurately distinguish the stationary state from the uniform motion state. In the embodiments of the present disclosure, by determining attributes such as the position variance, the angle variance and the image parallax, and then judging whether the preset condition is satisfied, it can be determined rapidly and accurately, based on the judgment result, whether the visual inertial positioning system is in the stationary state to be initialized.
In some embodiments, the step S203 may include the following steps S2031 to S2033:
step S2031, obtaining an operation duration corresponding to the pose information.
Here, the operation duration represents the duration over which the inertial sensor has been operating successfully. For example, if the pose information of the inertial sensor is continuously received at a preset time interval, the inertial sensor is determined to be operating successfully and the operation duration is recorded; if the pose information of the inertial sensor cannot be continuously received at the preset time interval, the operation of the inertial sensor is determined to have failed.
Step S2032, determining, based on pose information in a second period, a change in orientation, a displacement speed, and a displacement distance of the inertial sensor in the second period.
Here, the pose information may include position information, which may be determined based on the acceleration acquired by the inertial sensor, and attitude information, which may be determined based on the angular velocity acquired by the inertial sensor. The orientation change, the displacement speed and the displacement distance are used to characterize the degree of motion change of the inertial sensor. For example, the starting attitude and the ending attitude within the second period can be determined, and the change amount between them is taken as the orientation change; the starting position and the ending position within the second period can be determined, and the change amount between them is taken as the displacement distance; with the displacement distance determined, the displacement time is determined based on the preset acquisition frequency of the inertial sensor, and the corresponding displacement speed (the average speed within the second period) is determined based on the displacement distance and the displacement time.
Step S2033, determining that the visual inertial positioning system is in the motion to be initialized if the operation duration, the orientation change, the displacement speed, and the displacement distance satisfy a second preset condition.
Here, the second preset condition includes at least one of: the operation duration is greater than the fifth threshold, the orientation change is smaller than the sixth threshold, the displacement speed is smaller than the seventh threshold, and the displacement distance is smaller than the eighth threshold. For example: if the operation duration is less than or equal to the fifth threshold, the motion detection of the visual inertial positioning system is determined to fail; if the orientation change is greater than or equal to the sixth threshold, the motion detection is determined to fail; if the displacement speed in the second period is greater than or equal to the seventh threshold, or the maximum speed at some moment in the second period is greater than a preset speed threshold, the motion detection is determined to fail; if the displacement distance in the second period is greater than or equal to the eighth threshold, the motion detection of the visual inertial positioning system is determined to fail. If none of these failure cases occurs, that is, all four second preset conditions are satisfied, the motion detection of the visual inertial positioning system succeeds and the visual inertial positioning system is in the motion state to be initialized; if any one of the second preset conditions is not satisfied, the visual inertial positioning system is determined to be in an uncertain state.
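The second preset condition can likewise be illustrated with a short Python sketch; the fifth to eighth thresholds are hypothetical placeholder values.

```python
import numpy as np

# Hypothetical fifth to eighth thresholds.
T5_DURATION_S, T6_ORIENT_RAD, T7_SPEED_MS, T8_DIST_M = 0.5, 0.8, 3.0, 2.0

def motion_detection(run_duration_s, attitudes_rad, positions, dt):
    """attitudes_rad / positions: (n, 3) inertial-sensor samples over the second period,
    spaced dt seconds apart; run_duration_s: duration of successful sensor operation."""
    orient_change = np.linalg.norm(attitudes_rad[-1] - attitudes_rad[0])
    displacement = np.linalg.norm(positions[-1] - positions[0])
    avg_speed = displacement / ((len(positions) - 1) * dt)  # average speed in the period

    ok = (run_duration_s > T5_DURATION_S and
          orient_change < T6_ORIENT_RAD and
          avg_speed < T7_SPEED_MS and
          displacement < T8_DIST_M)
    return "motion_to_be_initialized" if ok else "uncertain"
```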
In the embodiment of the disclosure, by determining the multi-dimensional attributes such as the operation time length, the orientation change, the displacement speed, the displacement distance and the like, the judgment can be accurately performed based on the second preset condition, whether the visual inertial positioning system is in the motion to be initialized or not is determined, and the accuracy and the efficiency of state discrimination are improved.
The embodiment of the disclosure provides a state distinguishing method, as shown in fig. 3, which includes the following steps S301 to S304:
steps S301 to S302 correspond to steps S101 to S102, respectively, and reference may be made to the specific embodiments of steps S101 to S102.
Step S303, when the running state of the visual inertial positioning system is the rest to be initialized or the motion to be initialized, aligning the image information and the pose information to obtain an alignment result.
Here, the alignment of the image information and the pose information is used to determine some of the quantities required for the subsequent initialization of the visual inertial positioning system, such as the visual sensor poses, the velocity, the gravity vector, the inertial sensor bias and the landmark (key point) positions, so as to achieve visual inertial alignment. Through visual inertial alignment, the transformation between the coordinate system corresponding to the visual sensor (e.g., the camera coordinate system) and the world coordinate system can be determined, so that the trajectory (pose) in the coordinate system corresponding to the visual sensor is aligned to the world coordinate system, and the scale factor corresponding to the visual sensor and the like can be obtained from the pre-integration values corresponding to the inertial sensor. The visual inertial alignment may include: estimating the rotation extrinsic parameters if they are unknown; estimating the bias of the inertial sensor using a rotation constraint (e.g., the rotation change measured by the visual sensor equals the rotation change measured by the inertial sensor); estimating the gravity direction and an initial scale value using a translation constraint; further refining the gravity vector; and solving the rotation matrix between the world coordinate system and the camera coordinate system so as to align the trajectory of the object in the camera coordinate system to the world coordinate system.
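As a partial, non-limiting illustration of the last alignment step (rotating the up-to-scale visual trajectory into the world coordinate system once the gravity vector and the scale factor have been estimated), the following Python sketch uses the Rodrigues formula; it is not the complete alignment procedure described above.

```python
import numpy as np

def rotation_aligning_gravity(g_est, g_world=np.array([0.0, 0.0, -9.81])):
    """Rotation matrix mapping the gravity vector estimated in the camera frame
    onto the world gravity direction (Rodrigues formula)."""
    a = g_est / np.linalg.norm(g_est)
    b = g_world / np.linalg.norm(g_world)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.linalg.norm(v) < 1e-12:
        if c > 0:
            return np.eye(3)                         # already aligned
        axis = np.cross(a, np.array([1.0, 0.0, 0.0]))
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(a, np.array([0.0, 1.0, 0.0]))
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)  # 180-degree rotation
    k = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])                 # skew-symmetric cross-product matrix
    return np.eye(3) + k + k @ k * (1.0 / (1.0 + c))

def align_trajectory_to_world(positions_cam, g_est, scale):
    """Apply the metric scale factor and the gravity-aligned rotation to (n, 3)
    positions estimated up to scale in the camera coordinate system."""
    r_wc = rotation_aligning_gravity(g_est)
    return (scale * positions_cam) @ r_wc.T
```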
For example: a residual term and optimization variables (such as the pre-integration variables corresponding to the inertial sensor) are determined based on preset constraint conditions; the Jacobian matrix of the residual term with respect to the optimization variables is determined; how the loss function is composed of the residual terms (e.g., with a robust kernel function) is determined; and the optimization is performed using a nonlinear least squares algorithm. The alignment result may refer to the result of optimizing the residual term; for example, the alignment result includes alignment success and alignment failure, and may include optimization confidences for the visual state variables and the inertial state variables, where the visual state variables include at least one of: three-dimensional point inverse depth, position, angle, etc., and the inertial state variables include at least one of: gravity vector, position, angle, velocity, scale, inertial sensor bias, etc.
Step S304, under the condition that the alignment result represents successful alignment, the prior information matched with the alignment result is determined.
Here, the prior information is used to initialize the visual inertial positioning system and may be used to determine the initial values of the state variables. For example: if alignment fails, preset parameters are determined as the prior information of the visual inertial positioning system; if alignment succeeds, the optimization confidence in the alignment result is determined as the prior information of the visual inertial positioning system; the state variables in the visual inertial positioning system are updated during the running of the visual inertial positioning system; and when the updated state variables satisfy convergence, the initialization of the visual inertial positioning system is completed.
In the embodiment of the disclosure, the alignment result is obtained by aligning the image information and the pose information, the prior information matched with the alignment result can be rapidly and accurately determined, and the subsequent initialization of the visual inertial positioning system based on the prior information can be accurately realized.
In some embodiments, the step S303 may include the following steps S3031 to S3034:
step S3031, a sliding window method is adopted to perform triangulation processing on the feature points in the image information, so as to obtain three-dimensional information of the feature points in the image information.
Here, the sliding window may refer to the following: a sliding window of a certain length is maintained, in which a certain number of image frames and corresponding pose information are stored; after a new image frame is acquired, whether the new image frame is a key frame is judged according to the parallax between the new image frame and each image frame in the sliding window; if it is a key frame, the earliest stored image frame in the sliding window is removed, otherwise the most recently stored image frame is removed; visual inertial alignment is then performed based on the image frames in the sliding window and the corresponding pose information to obtain the alignment result. Triangulation (triangularization) determines the three-dimensional information (e.g., three-dimensional spatial point coordinates) of a feature point from at least two visual sensor poses and the coordinates of the same feature point. For example: two adjacent frames of images in the sliding window are acquired, and feature point detection is performed on each of them to obtain two sets of feature points; the two sets of feature points are matched to obtain a plurality of feature point pairs; and the three-dimensional information (i.e., world coordinates) corresponding to each feature point pair is determined based on the coordinates of each feature point pair and the visual sensor poses corresponding to each feature point pair.
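A minimal Python sketch of the linear (DLT) triangulation of one matched feature point pair is given below for illustration, assuming the two 3x4 projection matrices of the corresponding visual sensor poses are available.

```python
import numpy as np

def triangulate_point(proj1, proj2, pt1, pt2):
    """Linear (DLT) triangulation of one matched feature-point pair.
    proj1, proj2: 3x4 projection matrices of the two visual-sensor poses;
    pt1, pt2: the matched pixel coordinates in the two images."""
    a = np.vstack([
        pt1[0] * proj1[2] - proj1[0],
        pt1[1] * proj1[2] - proj1[1],
        pt2[0] * proj2[2] - proj2[0],
        pt2[1] * proj2[2] - proj2[1],
    ])
    _, _, vt = np.linalg.svd(a)     # the null vector of a gives the homogeneous point
    x = vt[-1]
    return x[:3] / x[3]             # homogeneous -> three-dimensional world coordinates
```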
In some embodiments, a corresponding residual term may be determined based on three-dimensional information of a feature point on the image information in the sliding window and pose information corresponding to the inertial sensor, and then the residual term is optimized to achieve visual inertial alignment, obtain an alignment result, and the like, which is not limited herein.
Step S3032, determining a confidence matrix corresponding to the feature point in the image information based on the confidence matrix corresponding to the pose information and the three-dimensional information.
Here, the confidence matrix corresponding to the pose information is used to characterize the attribute features of the pose information, and the confidence matrix corresponding to the feature points in the image information can be used to characterize the attribute features of the feature points under the preset constraint. In the process of achieving visual inertial alignment, the pose information can be preprocessed, for example normalized, and converted into a corresponding confidence matrix to obtain the confidence matrix corresponding to the pose information. Specifically, the inertial sensor processes the output values of the gyroscope and the accelerometer to obtain the confidence matrix corresponding to the pose information; alternatively, relative pose transformation estimation can be performed on the pose information based on the iterative closest point (Iterative Closest Point, ICP) algorithm to obtain a covariance matrix, which is then converted into the corresponding confidence matrix. The confidence matrix corresponding to the feature points in the image information is determined based on the coordinate matrix corresponding to the feature points in the image information and the confidence matrix corresponding to the pose information. For example: a corresponding transformation matrix is determined based on the preset constraint between the visual sensor and the inertial sensor, and the transformation matrix is multiplied with the coordinate matrix corresponding to the feature points and the confidence matrix corresponding to the pose information to obtain the confidence matrix corresponding to the feature points in the image information.
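One common way to realize such a conversion, shown here only as an assumption-laden illustration (treating the confidence matrix as the information, i.e. inverse-covariance, matrix and propagating the pose covariance through a Jacobian relating the pose to the feature-point coordinates), is the following:

```python
import numpy as np

def covariance_to_confidence(cov, eps=1e-9):
    """Interpret the confidence matrix as the information (inverse-covariance) matrix;
    a small diagonal term keeps the inversion numerically stable."""
    return np.linalg.inv(cov + eps * np.eye(cov.shape[0]))

def feature_confidence_from_pose(pose_cov, jacobian):
    """Propagate the pose covariance through the (assumed given) Jacobian relating the
    pose to the feature-point coordinates, then invert to obtain a confidence matrix."""
    feat_cov = jacobian @ pose_cov @ jacobian.T
    return covariance_to_confidence(feat_cov)
```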
Step S3033, determining an optimization function corresponding to the visual inertial positioning system based on the confidence coefficient matrix corresponding to the feature points in the image information.
Here, the optimization function may be composed of a marginalized prior residual, a pre-integration residual and a re-projection residual (which may also be referred to as residual terms) and is used to determine the optimization confidence. The marginalized prior residual may be determined from the constraints of the poses and feature points removed from the sliding window; for example, under the sliding window scheme, the matrices in the calculation can be processed by marginalization and the Schur complement to obtain the marginalized prior residual. The pre-integration residual can be determined based on the pose information between adjacent frames in the sliding window: because the inertial sensor acquires data at a high frequency, the amount of pose information between two image frames is considerable, and if the integration (used to determine attributes such as displacement, velocity and rotation) had to be recomputed at every iteration of the optimization, a great amount of computation would be required; pre-integration is therefore used to reduce the computation. The purpose of pre-integration is to convert the rotational motion in the world coordinate system into the rotational motion, in the coordinate system corresponding to the inertial sensor, of the current frame relative to the previous frame. The re-projection residual refers to the difference between the projection of the real coordinates in the world coordinate system onto the image plane (i.e., the pixels on the image) and the re-projection (i.e., the virtual pixels obtained with the calculated values). For example: the camera coordinate system of the i-th frame is back-projected to the world coordinate system and then re-projected to the camera coordinate system of the j-th frame to obtain a basic predicted value; a true observed value can then be determined by matching the feature points, and the error between the predicted value and the observed value can be determined as the re-projection residual.
In some embodiments, when determining the optimization function, the pose, velocity and inertial sensor bias corresponding to each image frame in the sliding window, together with the depth of each feature point when it is first observed, are taken as the state quantities, and a bundle adjustment (Bundle Adjustment, BA) model is then constructed based on these state quantities to determine the optimization function. For example, the optimization function may be determined using the following formulas:
X = [x_0, x_1, x_2, ..., x_{N-1}, λ_1, λ_2, ..., λ_{m-1}]   (1)
x_i = [p_i, q_i, v_i, b_{acc,i}, b_{gyr,i}], i ∈ [0, N-1]   (2)
min_X { ||r_p - H_p X||² + Σ_{k∈B} ||r_B(ẑ_{b_{k+1}}^{b_k}, X)||²_{P_{b_{k+1}}^{b_k}} + Σ_{(l,j)∈C} ||r_C(ẑ_l^{c_j}, X)||²_{P_l^{c_j}} }   (3)
In formulas (1), (2) and (3), λ_i denotes the depth of the i-th observed feature point, with i a positive integer; x_i denotes the state of the i-th frame in the sliding window, including the position p_i, the attitude q_i, the velocity v_i, the accelerometer bias b_{acc,i} and the gyroscope bias b_{gyr,i}; N denotes the length of the sliding window and m denotes the number of feature points; r_p - H_p X denotes the marginalized prior residual, r_B(ẑ_{b_{k+1}}^{b_k}, X) denotes the pre-integration residual and r_C(ẑ_l^{c_j}, X) denotes the re-projection residual; r_p and H_p are the marginalization results, r_B is the pre-integration error, ẑ_{b_{k+1}}^{b_k} is the pre-integration result between two adjacent frames (with k ranging over the adjacent frame pairs in the sliding window), P_{b_{k+1}}^{b_k} is the corresponding covariance matrix, C is the set formed by combinations of image frames and pose information, r_C is the re-projection error, ẑ_l^{c_j} is the observation of the pose information in the corresponding image frame, and P_l^{c_j} is the corresponding covariance matrix.
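For illustration only, the following Python sketch evaluates the total cost of formula (3) from the three kinds of residual terms; the data layout is an assumption and the marginalized prior residual is assumed to have been formed (and whitened) beforehand.

```python
import numpy as np

def weighted_sq_norm(residual, cov):
    """Squared Mahalanobis norm ||r||^2_P used by each term of formula (3)."""
    return float(residual @ np.linalg.solve(cov, residual))

def total_cost(r_prior, preint_residuals, preint_covs, reproj_residuals, reproj_covs):
    """Sum of the marginalized prior term, the pre-integration terms and the
    re-projection terms. r_prior is r_p - H_p X, assumed already whitened, so it is
    accumulated as a plain squared norm."""
    cost = float(r_prior @ r_prior)
    for r, p in zip(preint_residuals, preint_covs):
        cost += weighted_sq_norm(r, p)    # pre-integration residuals over adjacent frames
    for r, p in zip(reproj_residuals, reproj_covs):
        cost += weighted_sq_norm(r, p)    # re-projection residuals over the set C
    return cost
```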
And step S3034, optimizing the optimization function by using a nonlinear optimization mode to obtain an optimization result.
Here, the optimization results are used to characterize the alignment results. For example: optimizing the optimization function by adopting nonlinear modes such as a steepest descent method, a Newton method, a Gauss-Newton method (GN), a Levenberg-Marquardt method (LM) and the like, if an optimization result exists, indicating that the alignment is successful, and obtaining an optimization result containing an optimization confidence coefficient; and if the optimization result does not exist, indicating that the alignment fails.
In some embodiments, a visual inertial alignment failure is determined if the optimization function has no solution (e.g., the optimization does not converge), or if the re-projection error (or re-projection residual) does not satisfy a preset first error threshold, or if the re-projection error (or re-projection residual) is greater than a preset second error threshold. If the visual inertial alignment is determined to be successful, the feature points (or three-dimensional points in the world coordinate system) whose re-projection error (or re-projection residual) is greater than a preset third error threshold are determined and marked as outliers, the outliers are eliminated, and one further optimization of the visual inertial positioning system is performed; here, an outlier may refer to noise introduced during the initialization process.
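A minimal sketch of the outlier elimination step described above (marking the feature points whose re-projection error exceeds the third error threshold), with a hypothetical threshold value and an externally supplied projection function:

```python
import numpy as np

def reject_outliers(points_w, observations, project_fn, thresh_px=3.0):
    """points_w: (m, 3) triangulated three-dimensional points; observations: (m, 2)
    matched pixel measurements; project_fn: maps a three-dimensional point to pixel
    coordinates in the key frame; thresh_px: hypothetical third error threshold."""
    errors = np.array([np.linalg.norm(project_fn(p) - z)
                       for p, z in zip(points_w, observations)])
    inlier_mask = errors <= thresh_px    # False entries are eliminated as outliers
    return inlier_mask, errors
```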
In the embodiment of the disclosure, the optimization function can be quickly constructed through the image information and the pose information, and then the optimization result is accurately obtained through optimizing the optimization function.
The embodiment of the disclosure provides a state distinguishing method, wherein the optimization result comprises an optimization confidence; as shown in fig. 4, the method includes the following steps S401 to S404:
steps S401 to S403 correspond to steps S301 to S303, respectively, and reference may be made to the specific embodiments of steps S301 to S303.
Step S404, acquiring the acceleration bias and the angular velocity bias determined by the inertial sensor, and the pose information and three-dimensional information in the preset sliding window.
Here, the optimization confidence is used to characterize the optimization result of the optimization function and may be used to determine the prior information so that the visual inertial positioning system can be initialized. The acceleration bias and the angular velocity bias determined by the inertial sensor, together with the pose information and three-dimensional information in the preset sliding window, are used to determine the prior information of the visual inertial positioning system, with the initial confidence of the feature points corresponding to the three-dimensional information being the optimization confidence. For example: if alignment succeeds, the pose information acquired by the inertial sensor within the preset sliding window and the three-dimensional information of the feature points in the image information can be directly determined as the prior information of the visual inertial positioning system, where the initial confidence of the feature points is taken directly from the optimization confidence obtained during optimization; and the acceleration bias and the angular velocity bias determined by the inertial sensor are directly determined as prior information of the visual inertial positioning system and may be assigned a higher confidence.
In the embodiment of the disclosure, in the case of successful alignment, the prior information of the visual inertial positioning system can be accurately determined based on the optimization confidence.
In some embodiments, the method may further include the following step S311 after implementing step S303:
step S311, when the visual inertial positioning system is at the rest to be initialized, performing zero-speed update on the visual inertial positioning system to obtain an updated visual inertial positioning system.
Here, the zero-velocity update means that accumulated errors in the zero-velocity state are eliminated by a zero-velocity update algorithm, which generally estimates the navigation errors with a Kalman filter during zero-velocity intervals. The basic principle of the Kalman filter is to iteratively estimate the errors of the visual inertial positioning system from its calculation results and observation data at successive update moments. When the visual inertial positioning system is in the stationary state to be initialized and visual inertial alignment has been achieved, the degrees of freedom of the visual inertial positioning system are smaller in this state, so higher confidences can be set for state quantities such as the position, orientation and velocity corresponding to the key frames in the sliding window; at the same time, the visual inertial positioning system can be zero-velocity updated to obtain the updated visual inertial positioning system and suppress unreasonable growth of the velocity.
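For illustration, a single zero-velocity measurement update of a Kalman filter can be sketched as follows; the state layout (a three-dimensional velocity stored at indices 3 to 5) and the measurement noise value are assumptions.

```python
import numpy as np

def zero_velocity_update(x, P, vel_idx=slice(3, 6), meas_noise=1e-4):
    """One Kalman measurement update with the pseudo-measurement 'velocity = 0',
    applied while the system is in the stationary state to be initialized.
    x: state vector containing a 3D velocity at vel_idx; P: state covariance."""
    n = x.size
    H = np.zeros((3, n))
    H[:, vel_idx] = np.eye(3)                   # observe only the velocity block
    R = meas_noise * np.eye(3)
    z = np.zeros(3)                             # zero-velocity observation
    y = z - H @ x                               # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(n) - K @ H) @ P
    return x_new, P_new
```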
In the embodiment of the disclosure, the unreasonable increase of the speed is restrained by carrying out zero-speed update on the visual inertial positioning system.
In some embodiments, the method may further include the following step S321 after implementing step S303:
step S321, determining the optimization confidence coefficient as a position state quantity, an angle state quantity and a speed state quantity corresponding to the image information in the preset sliding window when the visual inertial positioning system is in the motion to be initialized.
Here, if the visual inertial positioning system is in the condition of the motion to be initialized, the visual inertial alignment is realized, and the state quantities such as the position, the orientation, the speed and the like corresponding to the key frame in the sliding window can be directly used for optimizing the obtained optimizing confidence coefficient.
In the embodiments of the present disclosure, a priori information of the visual inertial positioning system in the case of motion to be initialized may be accurately determined based on the optimization confidence.
The state distinguishing method provided by the embodiment of the present disclosure is described below by taking an initialization scene in a visual inertial positioning system as an example.
The initialization methods of visual inertial positioning systems in the related art are complex: a user needs fairly specialized technical guidance, and the initialization depends on recovering the three-dimensional structure of the scene, so it easily fails in scenes with insufficient texture, long depth of field and the like. Meanwhile, in the related art, initialization is divided into motion initialization and stationary initialization. For motion initialization, on the one hand, when the computer device is stationary or only rotating, enough parallax cannot be provided and visual initialization cannot be completed; on the other hand, because the inertial-sensor initialization is affected by the quality of the visual-sensor initialization and by the pose data acquired by the inertial sensor, it still fails with a certain probability, and the initialization success rate is lower in scenes such as outdoor environments with sparse texture or large scale. For stationary initialization, the computer device needs to be kept relatively still for a period of time, and during the transition from stationary to motion, on the one hand, the scale is coupled with the accelerometer bias of the inertial sensor, so the correct scale is difficult to obtain; on the other hand, stationary initialization constrains the visual inertial positioning system through a zero-velocity update strategy in the absence of three-dimensional points, but erroneous zero-velocity updates are easily generated during the transition from stationary to motion, which also introduces errors into the scale solution.
The embodiment of the present disclosure provides an initialization method based on a visual inertial positioning system that can handle more complex initialization motion patterns. The initialization does not depend on recovering the three-dimensional scene structure, so more complex scenes can be handled. In the related art, the pose information obtained during local movement has poor accuracy, so it cannot be used directly to initialize the visual inertial positioning system; in the embodiment of the present disclosure, the result can instead be integrated into the visual inertial positioning system through joint visual optimization. Meanwhile, compared with the related art, in which the visual inertial positioning system cannot accurately distinguish uniform motion from the static state, the embodiment of the present disclosure combines the static judgment of the visual inertial positioning system with visual feature tracking (such as the tracking result determined based on the image information), which reduces the influence of this problem and improves the stability of initialization. Because static initialization and motion initialization place different requirements on the visual inertial positioning system, the visual inertial positioning system in the embodiment of the present disclosure is compatible with both static initialization and motion initialization and performs well in both initialization modes.
The initialization method of the visual inertial positioning system in the embodiment of the present disclosure may include: a static detection module, a motion detection module, an alignment module, and an initialization module. The static detection module cross-verifies the output of the inertial sensor (such as pose information) with the output of the visual sensor (such as the visual tracking result determined from the image information) to judge whether the device is in the static state to be initialized. The motion detection module uses the (device) pose information output by the inertial sensor to determine whether the current motion pattern (namely the motion to be initialized) is suitable for initializing the visual inertial positioning system. The alignment module aligns the device pose information obtained by the inertial sensor with the visual feature tracking information obtained by the visual sensor in the static state to be initialized or the motion to be initialized, to obtain an alignment result. The initialization module uses the alignment result output by the alignment module to assign initial values to the state quantities of the visual inertial positioning system and their corresponding initial confidences, thereby completing initialization.
The visual inertial positioning system in the embodiment of the present disclosure may operate based on pattern matching, gait processing, deep learning, or other approaches, which is not limited here; the detection of the motion detection module may come from a recognition result of the inertial sensor, or may be recognized based on pose information during the initialization process; when determining the visual tracking result, feature-descriptor matching may be used, or matching may be performed with image patches based on image texture information, and so on, which is not limited here. A sketch of one such patch-based matching approach is given below.
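As one concrete possibility for the image-patch matching mentioned above, the sketch below tracks feature patches between consecutive grayscale frames with pyramidal Lucas-Kanade optical flow from OpenCV. The use of OpenCV, the function name, and the window and pyramid parameters are assumptions for illustration, not part of the disclosure.

import cv2
import numpy as np

def track_patches(prev_gray, next_gray, prev_pts):
    """Track image patches between two grayscale frames (illustrative sketch).

    prev_pts: float32 array of shape (N, 1, 2) with pixel coordinates of
    feature points detected in the previous frame. Window size and pyramid
    depth below are assumed values.
    """
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)   # patch size and pyramid depth
    good = status.reshape(-1) == 1      # keep only successfully tracked points
    return prev_pts[good], next_pts[good]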
For example: the static detection module and the motion detection module are started simultaneously; if either detection succeeds on the latest image frame and the pose information acquired by the inertial sensor, the alignment module is entered; if neither detection succeeds, the next frame of data is awaited. The alignment module attempts to align the pose information output by the inertial sensor, the image information output by the visual sensor, and the pre-integration information of the inertial sensor; if the alignment fails, the flow returns to the previous step and static detection and motion detection continue; if the alignment succeeds, the initialization module is entered. The initialization module initializes the visual inertial positioning system with the optimization confidence obtained by the alignment module, and, according to the alignment result of the alignment module, applies different confidences to different system state quantities (such as visual state quantities and inertial state quantities) to complete initialization.
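For illustration only, the following sketch shows one way the detection, alignment, and initialization loop described above could be organized. The module objects, their method names, and the frame fields are assumptions made for the example; the disclosure describes the modules only functionally.

def try_initialize(frames, static_detector, motion_detector, aligner, initializer):
    """One possible control flow for the initialization loop described above."""
    for frame in frames:  # each frame carries image info and inertial pose info
        is_static = static_detector.detect(frame.image_info, frame.pose_info)
        is_motion = motion_detector.detect(frame.pose_info)
        if not (is_static or is_motion):
            continue  # neither detection succeeded: wait for the next frame

        result = aligner.align(frame.image_info, frame.pose_info, frame.preintegration)
        if not result.success:
            continue  # alignment failed: return to static/motion detection

        # apply different confidences to visual and inertial state quantities
        initializer.initialize(result.states, result.confidences)
        return True
    return False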
The trajectory obtained from the inertial sensor during local movement has low accuracy, so it cannot be used directly by the visual inertial positioning system; the accuracy of the system is further improved by jointly optimizing with the visual information. For example: a sliding-window approach is adopted, and all feature points in the sliding window are triangulated with the device pose information obtained from the inertial sensor; meanwhile, the confidence matrix of the three-dimensional position of each three-dimensional point is calculated from the confidence matrix of the device pose information output by the inertial sensor (for example, the three-dimensional point position is represented in inverse-depth form, with the anchor frame being the first frame in which the three-dimensional point is observed, and so on). If a three-dimensional point does not have enough parallax, or a three-dimensional point is far from the visual sensor (such as a camera), an assumed inverse-depth value is assigned according to the movement distance of the camera, and a large uncertainty is added to the depth of that three-dimensional point; a coarse three-dimensional structure is then built based on the inertial sensor, and the optimization equation is constructed. Next, the optimization variables are optimized with a preset optimizer to obtain an optimization result, and so on. Finally, initialization can be achieved with the initialization module based on the optimization result; the initialization module initializes the visual inertial positioning system with the alignment result of the alignment module, ensuring that the visual inertial positioning system has reliable prior information.
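For illustration only, the sketch below shows one way an inverse-depth value and its uncertainty could be initialized for a single feature in a sliding window, including the fallback to an assumed inverse depth when parallax is insufficient. The parallax threshold, the uncertainty values, and the data layout are assumptions, not disclosed values.

import numpy as np

def init_inverse_depth(obs, poses, parallax_thresh_deg=1.0, far_sigma=1e3, near_sigma=0.1):
    """Sketch of inverse-depth initialization for one feature in a sliding window.

    obs:   dict frame_id -> unit bearing vector of the feature in that camera frame
    poses: dict frame_id -> (R, t) world-from-camera rotation and camera center,
           taken from the inertial-sensor track.
    Returns (inverse depth in the anchor frame, its standard deviation).
    """
    frame_ids = sorted(obs)
    anchor, last = frame_ids[0], frame_ids[-1]   # anchor = first observing frame
    R0, t0 = poses[anchor]
    R1, t1 = poses[last]

    d0, d1 = R0 @ obs[anchor], R1 @ obs[last]    # ray directions in the world frame
    parallax = np.degrees(np.arccos(np.clip(d0 @ d1, -1.0, 1.0)))

    if parallax < parallax_thresh_deg:
        # insufficient parallax: assume an inverse depth from the camera travel
        travel = np.linalg.norm(t1 - t0) + 1e-6
        return 1.0 / travel, far_sigma           # large depth uncertainty

    # midpoint triangulation of the two rays: minimise |t0 + s*d0 - (t1 + u*d1)|
    A = np.stack([d0, -d1], axis=1)              # 3x2 system for the two ray lengths
    s, u = np.linalg.lstsq(A, t1 - t0, rcond=None)[0]
    depth = max(s, 1e-6)                         # depth along the anchor ray
    return 1.0 / depth, near_sigma

A full implementation would also propagate the pose confidence matrices into the point uncertainty rather than using fixed values as this sketch does.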
In the embodiment of the present disclosure, aiming at the difficulty users face when initializing the visual inertial positioning system, an initialization scheme that fuses the visual sensor and the inertial sensor is provided. It effectively reduces the dependence of initialization on the user's motion pattern and on the three-dimensional information of the scene, is compatible with both motion and static initialization, improves the robustness of the initialization system, and helps the user complete the initialization operation of the visual inertial positioning system more easily. The visual inertial positioning system can be applied to smartphones, virtual reality helmets, augmented reality glasses, and other devices equipped with a visual (or image) sensor and an inertial sensor module.
Based on the foregoing embodiments, the embodiments of the present disclosure provide a state distinguishing apparatus; the modules included in the apparatus may be implemented by a processor in a computer device or, of course, by specific logic circuits; in practice, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
Fig. 5 is a schematic structural diagram of a state differentiating device according to an embodiment of the present disclosure, and as shown in fig. 5, a state differentiating device 500 includes: an acquisition module 510 and a differentiation module 520, wherein:
an acquisition module 510, configured to acquire image information and pose information from the visual inertial positioning system; wherein the image information is acquired by a visual sensor in the visual inertial positioning system, and the pose information is acquired by an inertial sensor in the visual inertial positioning system; a distinguishing module 520, configured to distinguish an operation state of the visual inertial positioning system based on the image information and the pose information, so as to obtain a target state of the visual inertial positioning system; wherein the operating state comprises at least a rest to be initialized and a motion to be initialized.
In some embodiments, the differentiating module is further configured to: cross-verifying the image information and the pose information to obtain a first detection result; wherein the first detection result is used for representing whether the visual inertial positioning system is at the rest to be initialized; performing motion detection on the visual inertial positioning system based on the pose information to obtain a second detection result; wherein the second detection result is used for representing whether the visual inertial positioning system is in the motion to be initialized.
In some embodiments, the differentiating module is further configured to: determining a position variance and an angle variance of the inertial sensor in a first period based on pose information in the first period; determining at least two image parallaxes of feature points acquired by the vision sensor in the first period and a distribution area of the feature points based on the image information in the first period; determining that the visual inertial positioning system is at the rest to be initialized in the case that the position variance, the angle variance and the image parallaxes meet a first preset condition; wherein the first preset condition at least includes: the position variance is less than a first threshold, the angle variance is less than a second threshold, each of the image parallaxes is less than a third threshold, and, where there exist image parallaxes greater than or equal to the third threshold, the area of the distribution area is greater than a fourth threshold.
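For illustration only, the following sketch shows one possible check of the first preset condition from pose variances, image parallaxes, and the feature distribution area. The threshold values are placeholders, and because the disclosure only says the condition "at least includes" the listed comparisons, the way they are combined below is one possible reading rather than the required behaviour.

import numpy as np

def is_static_to_initialize(positions, angles, parallaxes, feature_area,
                            t1=1e-4, t2=1e-4, t3=2.0, t4=5000.0):
    """Sketch of the cross-verification check for the rest-to-be-initialized state.

    positions/angles: pose samples from the inertial sensor over the first period;
    parallaxes: per-feature image parallaxes (pixels) over the same period;
    feature_area: area of the feature distribution region (pixels^2).
    The four thresholds t1..t4 are placeholder values, not disclosed numbers.
    """
    pos_var = np.var(np.asarray(positions), axis=0).max()
    ang_var = np.var(np.asarray(angles), axis=0).max()
    parallaxes = np.asarray(parallaxes)

    if pos_var >= t1 or ang_var >= t2:
        return False                      # inertial track shows motion
    if np.any(parallaxes >= t3):
        # some features moved in the image; accept only if the tracked features
        # cover a wide area (one reading: local dynamic objects, not device motion)
        return feature_area > t4
    return True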
In some embodiments, the differentiating module is further configured to: acquiring the operation time length corresponding to the pose information; wherein the operation duration characterizes the duration of successful operation of the inertial sensor; determining orientation changes, displacement speeds and displacement distances of the inertial sensor in a second period based on pose information in the second period; determining that the visual inertial positioning system is in the motion to be initialized under the condition that the operation time length, the orientation change, the displacement speed and the displacement distance meet a second preset condition; wherein the second preset condition includes at least one of: the operation duration is greater than a fifth threshold, the change in orientation is less than a sixth threshold, the displacement speed is less than a seventh threshold, and the displacement distance is less than an eighth threshold.
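For illustration, a minimal check of the second preset condition might look as follows. The threshold values are placeholders; since the disclosure only requires the condition to include at least one of the four comparisons, checking all four conjunctively here is merely one conservative reading.

def is_motion_to_initialize(run_time_s, orientation_change_rad, speed_mps, distance_m,
                            t5=0.3, t6=0.5, t7=2.0, t8=1.0):
    """Sketch of the motion-to-be-initialized check from inertial pose information.

    run_time_s: duration the inertial sensor has been running successfully;
    the orientation change, displacement speed, and displacement distance are
    computed over the second period. Thresholds t5..t8 are assumed values.
    """
    return (run_time_s > t5 and              # sensor has run long enough
            orientation_change_rad < t6 and  # rotation stays moderate
            speed_mps < t7 and               # translation speed stays moderate
            distance_m < t8)                 # translation distance stays moderate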
In some embodiments, the apparatus further comprises: the alignment module is used for aligning the image information and the pose information to obtain an alignment result when the running state of the visual inertial positioning system is the static state to be initialized or the motion to be initialized; the first determining module is used for determining prior information matched with the alignment result under the condition that the alignment result represents successful alignment; wherein the a priori information is used to initialize the visual inertial positioning system.
In some embodiments, the alignment module is further configured to: performing triangularization processing on the characteristic points in the image information by adopting a sliding window mode to obtain three-dimensional information of the characteristic points in the image information; determining a confidence coefficient matrix corresponding to the feature points in the image information based on the confidence coefficient matrix corresponding to the pose information and the three-dimensional information; determining an optimization function corresponding to the visual inertial positioning system based on a confidence coefficient matrix corresponding to the feature points in the image information; the optimization function consists of a marginalized prior residual error, a pre-integral residual error and a re-projection residual error; optimizing the optimization function by using a nonlinear optimization mode to obtain an optimization result; wherein the optimization result is used to characterize the alignment result.
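For illustration only, the sketch below shows how a total cost composed of a marginalized prior residual, pre-integration residuals, and re-projection residuals might be assembled before being handed to a nonlinear optimizer (for example Gauss-Newton or Levenberg-Marquardt). The residual callables, the state container, and the observation fields are assumptions, and the confidence-matrix weighting of each term is omitted for brevity.

def total_cost(states, prior, preintegrations, observations,
               prior_residual, preint_residual, reproj_residual):
    """Sketch of the alignment cost: prior + pre-integration + re-projection terms.

    states: list of per-keyframe state vectors; preintegrations: one entry per
    consecutive keyframe pair; observations: feature observations with assumed
    fields .frame, .point, .pixel. All interfaces are illustrative assumptions.
    """
    cost = 0.0
    r = prior_residual(states, prior)                        # marginalized prior term
    cost += float(r @ r)
    for k, pre in enumerate(preintegrations):                # consecutive keyframe pairs
        r = preint_residual(states[k], states[k + 1], pre)   # IMU pre-integration term
        cost += float(r @ r)
    for obs in observations:                                 # feature observations
        r = reproj_residual(states[obs.frame], obs.point, obs.pixel)
        cost += float(r @ r)                                 # re-projection term
    return cost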
In some embodiments, the optimization result includes an optimization confidence; the first determining module is further configured to: acquiring the acceleration offset and angular velocity offset acquired by the inertial sensor, and the pose information and three-dimensional information in a preset sliding window; and determining the acceleration offset and the angular velocity offset acquired by the inertial sensor, together with the pose information and the three-dimensional information in the preset sliding window, as prior information of the visual inertial positioning system; wherein the initial confidence of the feature points corresponding to the three-dimensional information is the optimization confidence.
In some embodiments, the apparatus further comprises: and the updating module is used for carrying out zero-speed updating on the visual inertial positioning system under the condition that the visual inertial positioning system is at rest to be initialized, so as to obtain the updated visual inertial positioning system.
In some embodiments, the apparatus further comprises: a second determining module, configured to determine, in the case that the visual inertial positioning system is in the motion to be initialized, the optimization confidence as the confidence of the position state quantity, the angle state quantity and the speed state quantity corresponding to the image information in the preset sliding window.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to perform the methods described in the embodiments of the method, and for technical details not disclosed in the embodiments of the apparatus of the present disclosure, please understand with reference to the description of the embodiments of the method of the present disclosure.
It should be noted that, in the embodiments of the present disclosure, if the above state distinguishing method is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present disclosure, in essence, or the part contributing to the related art, may be embodied in the form of a software product; the software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, the embodiments of the present disclosure are not limited to any specific hardware, software, or firmware, or any combination of the three.
The disclosed embodiments provide a computer device, including a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor implements some or all of the steps of the above method when executing the program.
The disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which when executed by a processor performs some or all of the steps of the above method. The computer readable storage medium may be transitory or non-transitory.
The disclosed embodiments provide a computer program comprising computer readable code which, when run in a computer device, performs some or all of the steps for implementing the methods described above.
Embodiments of the present disclosure provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the disclosed apparatus, storage medium, computer program and computer program product, please refer to the description of the embodiments of the disclosed method.
It should be noted that Fig. 6 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present disclosure; as shown in Fig. 6, the hardware entity of the computer device 600 includes: a processor 601, a communication interface 602, and a memory 603, wherein:
the processor 601 generally controls the overall operation of the computer device 600.
The communication interface 602 may enable a computer device to communicate with other terminals or servers over a network.
The memory 603 is configured to store instructions and applications executable by the processor 601, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or processed by various modules in the processor 601 and the computer device 600, which may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM). Data transfer may be performed between the processor 601, the communication interface 602, and the memory 603 via the bus 604.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present disclosure, the size of the sequence numbers of the steps/processes described above does not mean the order of execution, and the order of execution of the steps/processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure. The foregoing embodiment numbers of the present disclosure are merely for description and do not represent advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
Alternatively, the above-described integrated units of the present disclosure may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the present disclosure may be embodied essentially or in part in a form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The methods disclosed in the several method embodiments provided in the present disclosure may be arbitrarily combined without collision to obtain a new method embodiment.
If the embodiment of the disclosure relates to personal information, the product applying the embodiment of the disclosure clearly informs the personal information processing rule and obtains personal autonomous consent before processing the personal information. If the disclosed embodiments relate to sensitive personal information, the product to which the disclosed embodiments are applied has obtained individual consent before processing the sensitive personal information, and at the same time meets the requirement of "explicit consent".
The foregoing is merely an embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto, and any person skilled in the art can easily think about the changes or substitutions within the technical scope of the present disclosure, and should be covered by the protection scope of the present disclosure.

Claims (12)

1. A method of distinguishing between states, comprising:
acquiring image information and pose information from a visual inertial positioning system; wherein the image information is acquired by a visual sensor in the visual inertial positioning system, and the pose information is acquired by an inertial sensor in the visual inertial positioning system;
distinguishing the running state of the visual inertial positioning system based on the image information and the pose information to obtain a target state of the visual inertial positioning system; wherein the operating state comprises at least a rest to be initialized and a motion to be initialized.
2. The method of claim 1, wherein distinguishing the operational state of the visual inertial positioning system based on the image information and the pose information to obtain the target state of the visual inertial positioning system comprises:
cross-verifying the image information and the pose information to obtain a first detection result; wherein the first detection result is used for representing whether the visual inertial positioning system is at the rest to be initialized;
performing motion detection on the visual inertial positioning system based on the pose information to obtain a second detection result; wherein the second detection result is used for representing whether the visual inertial positioning system is in the motion to be initialized.
3. The method according to claim 2, wherein the cross-verifying the image information and the pose information to obtain a first detection result includes:
determining a position variance and an angle variance of the inertial sensor in a first period based on pose information in the first period;
determining at least two image parallaxes of characteristic points acquired by the vision sensor in the first period and distribution areas of the characteristic points based on the image information in the first period;
determining that the visual inertial positioning system is at rest to be initialized under the condition that the position variance, the angle variance and the image parallax meet a first preset condition;
wherein the first preset condition at least includes: the position variance is less than a first threshold, the angle variance is less than a second threshold, each of the image parallaxes is less than a third threshold, and, where there exist image parallaxes greater than or equal to the third threshold, the area of the distribution area is greater than a fourth threshold.
4. A method according to claim 2 or 3, wherein the performing motion detection on the visual inertial positioning system based on the pose information to obtain a second detection result comprises:
acquiring the operation time length corresponding to the pose information; wherein the operation duration characterizes the duration of successful operation of the inertial sensor;
determining orientation changes, displacement speeds and displacement distances of the inertial sensor in a second period based on pose information in the second period;
determining that the visual inertial positioning system is in the motion to be initialized under the condition that the operation time length, the orientation change, the displacement speed and the displacement distance meet a second preset condition;
wherein the second preset condition includes at least one of: the operation duration is greater than a fifth threshold, the change in orientation is less than a sixth threshold, the displacement speed is less than a seventh threshold, and the displacement distance is less than an eighth threshold.
5. The method according to any one of claims 1 to 4, further comprising:
under the condition that the running state of the visual inertial positioning system is the static state to be initialized or the motion to be initialized, aligning the image information and the pose information to obtain an alignment result;
under the condition that the alignment result represents successful alignment, determining prior information matched with the alignment result; wherein the a priori information is used to initialize the visual inertial positioning system.
6. The method of claim 5, wherein aligning the image information and the pose information to obtain an alignment result comprises:
performing triangularization processing on the characteristic points in the image information by adopting a sliding window mode to obtain three-dimensional information of the characteristic points in the image information;
determining a confidence coefficient matrix corresponding to the feature points in the image information based on the confidence coefficient matrix corresponding to the pose information and the three-dimensional information;
determining an optimization function corresponding to the visual inertial positioning system based on a confidence coefficient matrix corresponding to the feature points in the image information; the optimization function consists of a marginalized prior residual error, a pre-integral residual error and a re-projection residual error;
optimizing the optimization function by using a nonlinear optimization mode to obtain an optimization result; wherein the optimization result is used to characterize the alignment result.
7. The method of claim 6, wherein the optimization result comprises an optimization confidence; the determining a priori information matching the alignment result includes:
acquiring the acceleration offset and angular velocity offset acquired by the inertial sensor, and the pose information and three-dimensional information in a preset sliding window;
determining the acceleration offset and the angular velocity offset acquired by the inertial sensor, together with the pose information and the three-dimensional information in the preset sliding window, as prior information of the visual inertial positioning system; wherein the initial confidence of the feature points corresponding to the three-dimensional information is the optimization confidence.
8. The method of claim 7, wherein the method further comprises:
and under the condition that the visual inertial positioning system is at the static state to be initialized, carrying out zero-speed updating on the visual inertial positioning system to obtain the updated visual inertial positioning system.
9. The method of claim 7, wherein the method further comprises:
and under the condition that the visual inertial positioning system is in the motion to be initialized, determining the optimization confidence as the confidence of the position state quantity, the angle state quantity and the speed state quantity corresponding to the image information in the preset sliding window.
10. A state discrimination apparatus, comprising:
the acquisition module is used for acquiring image information and pose information from the visual inertial positioning system; wherein the image information is acquired by a visual sensor in the visual inertial positioning system, and the pose information is acquired by an inertial sensor in the visual inertial positioning system;
The distinguishing module is used for distinguishing the running state of the visual inertial positioning system based on the image information and the pose information to obtain the target state of the visual inertial positioning system; wherein the operating state comprises at least a rest to be initialized and a motion to be initialized.
11. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 9 when the program is executed.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, realizes the steps in the method according to any one of claims 1 to 9.
CN202310715131.7A 2023-06-15 2023-06-15 State distinguishing method, device, equipment and storage medium Pending CN116698023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310715131.7A CN116698023A (en) 2023-06-15 2023-06-15 State distinguishing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310715131.7A CN116698023A (en) 2023-06-15 2023-06-15 State distinguishing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116698023A true CN116698023A (en) 2023-09-05

Family

ID=87825314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310715131.7A Pending CN116698023A (en) 2023-06-15 2023-06-15 State distinguishing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116698023A (en)

Similar Documents

Publication Publication Date Title
CN109307508B (en) Panoramic inertial navigation SLAM method based on multiple key frames
CN107990899B (en) Positioning method and system based on SLAM
CN109993113B (en) Pose estimation method based on RGB-D and IMU information fusion
CN109029433B (en) Method for calibrating external parameters and time sequence based on vision and inertial navigation fusion SLAM on mobile platform
Panahandeh et al. Vision-aided inertial navigation based on ground plane feature detection
KR101725060B1 (en) Apparatus for recognizing location mobile robot using key point based on gradient and method thereof
US9071829B2 (en) Method and system for fusing data arising from image sensors and from motion or position sensors
CN112304307A (en) Positioning method and device based on multi-sensor fusion and storage medium
CN112219087A (en) Pose prediction method, map construction method, movable platform and storage medium
CN110717927A (en) Indoor robot motion estimation method based on deep learning and visual inertial fusion
US20210183100A1 (en) Data processing method and apparatus
US11062475B2 (en) Location estimating apparatus and method, learning apparatus and method, and computer program products
US10636190B2 (en) Methods and systems for exploiting per-pixel motion conflicts to extract primary and secondary motions in augmented reality systems
CN110533719B (en) Augmented reality positioning method and device based on environment visual feature point identification technology
CN113012224B (en) Positioning initialization method and related device, equipment and storage medium
CN111609868A (en) Visual inertial odometer method based on improved optical flow method
CN113899364B (en) Positioning method and device, equipment and storage medium
CN116205947A (en) Binocular-inertial fusion pose estimation method based on camera motion state, electronic equipment and storage medium
CN115272596A (en) Multi-sensor fusion SLAM method oriented to monotonous texture-free large scene
CN114485640A (en) Monocular vision inertia synchronous positioning and mapping method and system based on point-line characteristics
Xian et al. Fusing stereo camera and low-cost inertial measurement unit for autonomous navigation in a tightly-coupled approach
CN112731503B (en) Pose estimation method and system based on front end tight coupling
CN116380079A (en) Underwater SLAM method for fusing front-view sonar and ORB-SLAM3
CN112284381A (en) Visual inertia real-time initialization alignment method and system
TWI812053B (en) Positioning method, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination