WO2020186943A1 - Mobile device posture determination apparatus and method, and visual odometer - Google Patents


Info

Publication number
WO2020186943A1
WO2020186943A1 · PCT/CN2020/075049 · CN2020075049W
Authority
WO
WIPO (PCT)
Prior art keywords
coding information
historical
posture
mobile device
information
Prior art date
Application number
PCT/CN2020/075049
Other languages
French (fr)
Chinese (zh)
Inventor
查红彬
薛飞
方奕庚
姜立
Original Assignee
京东方科技集团股份有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.) and 北京大学 (Peking University)
Publication of WO2020186943A1 publication Critical patent/WO2020186943A1/en

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a posture determination apparatus of a mobile device, a posture determination method of a mobile device, a visual odometer, and a computer-readable storage medium.
  • the visual odometer can determine the position and posture of the robot by analyzing and processing related image sequences, and then record the entire trajectory of the robot.
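The trajectory-recording idea above can be sketched as follows. This is illustrative only: the 2-D pose representation (x, y, heading) and the `compose` helper are assumptions for the sketch, not part of the disclosure, which does not prescribe a pose parameterization.

```python
import numpy as np

# Illustrative only: a visual odometer chains per-frame relative poses into a
# trajectory. Here 2-D poses (x, y, theta) are composed in sequence.
def compose(pose, delta):
    """Apply relative motion `delta` (dx, dy, dtheta, expressed in the body
    frame) to absolute `pose` (x, y, theta)."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

# Three relative motions estimated from consecutive frame pairs.
trajectory = [(0.0, 0.0, 0.0)]
for delta in [(1.0, 0.0, 0.0), (1.0, 0.0, np.pi / 2), (1.0, 0.0, 0.0)]:
    trajectory.append(compose(trajectory[-1], delta))
print(trajectory[-1])  # final absolute pose, approx (2.0, 1.0, pi/2)
```

Chaining relative estimates like this accumulates drift, which is why the embodiments below additionally exploit historical (key-frame) encoding information.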
  • the visual odometer combines the image information of adjacent frames in the video stream and determines the camera pose of the corresponding frame by local-map optimization based on the geometric features of the images; alternatively, it determines the camera pose based on information provided by an IMU (Inertial Measurement Unit).
  • an apparatus for determining a posture of a mobile device, including one or more processors configured to: determine an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device; obtain current coding information using a first machine learning model according to the image difference feature; and determine the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
  • the current frame is the Mth frame, where M is a positive integer greater than 1. If at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the coding information of the Nth frame is stored as historical coding information, where N is a positive integer less than M.
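The key-frame condition above can be sketched as follows. The threshold values and the `KeyframeBuffer` helper are illustrative assumptions; the disclosure specifies only the "distance or posture change exceeds a threshold" test.

```python
import numpy as np

# Hypothetical key-frame buffer; names and thresholds are illustrative.
DIST_THRESHOLD = 0.5    # movement-distance threshold (units arbitrary)
ANGLE_THRESHOLD = 5.0   # posture-change threshold in degrees

def is_keyframe(distance, angle_change):
    """Frame N is a key frame if the movement distance OR the posture
    change from frame N-1 exceeds its threshold."""
    return distance > DIST_THRESHOLD or angle_change > ANGLE_THRESHOLD

class KeyframeBuffer:
    def __init__(self):
        self.history = []  # stored historical coding information S_1 ... S_I

    def maybe_store(self, coding_info, distance, angle_change):
        if is_keyframe(distance, angle_change):
            self.history.append(coding_info)

buf = KeyframeBuffer()
buf.maybe_store(np.zeros((4, 8, 8)), distance=0.1, angle_change=1.0)  # not stored
buf.maybe_store(np.ones((4, 8, 8)), distance=0.9, angle_change=1.0)   # stored
print(len(buf.history))  # 1
```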
  • according to the correlation between the channel components of the current coding information, the channel components are fused to obtain fused current coding information; according to the correlation between the channel components of the historical coding information, the channel components of the historical coding information are fused to obtain fused historical coding information; the second machine learning model then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information.
  • the first weight of each channel component is determined according to the correlation between the channel components of the current coding information; the channel components are weighted according to the first weights to obtain the fused current coding information.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • according to the correlation between the pieces of historical coding information, they are fused to obtain comprehensive historical coding information; according to the comprehensive historical coding information and the current coding information, the second machine learning model determines the posture of the mobile device.
  • the third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information; a weighted sum of the historical coding information is computed according to the third weights to obtain the comprehensive historical coding information.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information; according to the output coding information, the second machine learning model is used to determine the posture of the mobile device.
  • the image difference feature is acquired through an optical flow network model; at least one of the first machine learning model and the second machine learning model is a ConvLSTM (Convolutional Long Short-Term Memory) model.
  • a method for determining the posture of a mobile device, including: determining an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device; determining current coding information using a first machine learning model according to the image difference feature; and determining the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
  • the current frame is the Mth frame, where M is a positive integer greater than 1. If at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the coding information of the Nth frame is stored as historical coding information, where N is a positive integer less than M.
  • according to the correlation between the channel components of the current coding information, the channel components are fused to obtain fused current coding information; according to the correlation between the channel components of the historical coding information, the channel components of the historical coding information are fused to obtain fused historical coding information; the second machine learning model then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information.
  • the first weight of each channel component is determined according to the correlation between the channel components of the current coding information; the channel components are weighted according to the first weights to obtain the fused current coding information.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • the at least one piece of historical coding information includes multiple pieces of historical coding information; according to the correlation between the pieces of historical coding information, they are fused to obtain comprehensive historical coding information; according to the comprehensive historical coding information and the current coding information, a second machine learning model is used to determine the posture of the mobile device.
  • the third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information; a weighted sum of the historical coding information is computed according to the third weights to obtain the comprehensive historical coding information.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information; according to the output coding information, the second machine learning model is used to determine the posture of the mobile device.
  • the image difference feature is obtained through an optical flow network model; at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
  • a visual odometer, including the posture determination apparatus described in any of the foregoing embodiments, configured to determine the posture of the mobile device according to the video stream captured by the mobile device.
  • the visual odometer further includes an image sensor for acquiring the video stream.
  • a computer-readable storage medium having a computer program stored thereon, and when the program is executed by a processor, the posture determination method as described in any of the foregoing embodiments is implemented.
  • FIG. 1 is a flowchart showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 2a is a schematic diagram showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 2b is a schematic diagram showing the ConvLSTM used in a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 3 is a flowchart showing an embodiment of step 130 in FIG. 1;
  • FIG. 4 is a schematic diagram showing an embodiment of step 1320 in FIG. 3;
  • FIG. 5 is a flowchart showing another embodiment of step 130 in FIG. 1;
  • FIG. 6 is a schematic diagram showing an embodiment of step 1321 in FIG. 5;
  • FIG. 7 is a flowchart showing another embodiment of step 130 in FIG. 1;
  • FIG. 8 is a block diagram showing an apparatus for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 9 is a block diagram showing an apparatus for determining a posture of a mobile device according to another embodiment of the present disclosure;
  • FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
  • Fig. 1 is a flowchart showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the method includes: step 110, determining the image difference feature; step 120, determining the current encoding information; and step 130, determining the posture of the mobile device.
  • step 110 the image difference feature between the current frame and the previous frame in the video stream acquired by the mobile device is determined.
  • the mobile device may be a movable platform such as a robot, an unmanned vehicle, a drone, etc., and images are taken by a camera based on an image sensor such as a CCD or CMOS.
  • the image difference feature can be obtained through a convolutional neural network (CNN).
  • an optical flow network model, such as FlowNet (Learning Optical Flow with Convolutional Networks) or FlowNet 2.0 (FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks), can be used to obtain the image difference features.
  • two adjacent frames of images can be superimposed and input into the optical flow network model, and the feature extraction part of the optical flow network model is used to extract the image difference features.
  • the image difference feature is a high-dimensional feature, and the number of channels (such as 1024) of the high-dimensional feature can be determined according to the resolution of the current frame image.
  • the optical flow network model can perform multiple convolution operations on the superimposed images and, according to the convolution results, extract the offset of each pixel between two adjacent frames as the image difference feature.
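The superimpose-then-convolve step can be sketched as follows. This is schematic only: the real FlowNet feature extractor is a deep trained CNN, whereas here a single random-weight 3×3 convolution stands in to show the data flow and shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Naive valid 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    c_out, c_in, kh, kw = w.shape
    _, h, wdt = x.shape
    out = np.zeros((c_out, h - kh + 1, wdt - kw + 1))
    for o in range(c_out):
        for i in range(c_in):
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    out[o, r, c] += np.sum(x[i, r:r+kh, c:c+kw] * w[o, i])
    return out

frame_prev = rng.random((3, 16, 16))   # previous RGB frame
frame_curr = rng.random((3, 16, 16))   # current RGB frame

# Superimpose the two adjacent frames along the channel dimension.
stacked = np.concatenate([frame_prev, frame_curr], axis=0)  # (6, 16, 16)

# Stand-in for the feature-extraction part of the optical flow network.
weights = rng.standard_normal((8, 6, 3, 3)) * 0.1
features = conv2d(stacked, weights)     # image difference features
print(stacked.shape, features.shape)    # (6, 16, 16) (8, 14, 14)
```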
  • high-dimensional redundant image information can be converted into high-level, abstract semantic features, which solves the problem that related technologies based on geometric features are susceptible to environmental factors (such as occlusion, lighting changes, dynamic objects, etc.), thereby improving the accuracy of posture determination.
  • the first machine learning model is used to determine the current encoding information according to the image difference characteristics.
  • the first machine learning model may be an RNN (Recurrent Neural Network) model, such as a ConvLSTM model.
  • historical coding information (that is, coding information corresponding to key frames) that has an important influence on pose determination can be filtered from the historical output of the RNN model as effective information.
  • the effective information can be fused with the current coding information to jointly determine the current posture of the mobile device. For example, if at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the Nth frame is determined to be a key frame, and the coding information of the Nth frame extracted by the RNN model is stored as historical coding information.
  • a second machine learning model is used to determine the posture of the mobile device according to the current encoding information and at least one historical encoding information.
  • the second machine learning model may be an RNN model, such as a ConvLSTM model. Using the RNN model to decode the encoded information, the posture of the mobile device can be determined.
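A ConvLSTM cell of the kind named here can be sketched minimally as follows. This is an assumption-laden sketch, not the patent's trained model: 1×1 convolutions (a per-pixel linear map over channels) replace the spatial kernels a real ConvLSTM would use, and the gate layout follows the standard LSTM (input i, forget f, output o, candidate g).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal ConvLSTM-style cell with 1x1 convolutions (illustrative)."""
    def __init__(self, c_in, c_hidden):
        # One weight matrix produces all four gates from [x; h].
        self.w = rng.standard_normal((4 * c_hidden, c_in + c_hidden)) * 0.1

    def step(self, x, h, c):
        # 1x1 conv == channel-mixing at every spatial location.
        z = np.einsum('oc,chw->ohw', self.w, np.concatenate([x, h], axis=0))
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

cell = ConvLSTMCell(c_in=8, c_hidden=4)
h = np.zeros((4, 14, 14))
c = np.zeros((4, 14, 14))
for _ in range(3):                     # feed three image-difference features
    x = rng.random((8, 14, 14))
    h, c = cell.step(x, h, c)          # h plays the role of coding info O_t
print(h.shape)  # (4, 14, 14)
```

Because the hidden state is convolutional (channels × H × W) rather than a flat vector, spatial structure of the image difference features is preserved across time steps.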
  • the current posture determined based on the current coding information and the historical coding information is a posture obtained by global optimization over the range from the first frame to the current frame of the video stream (that is, an absolute posture).
  • the absolute posture is more accurate.
  • the ConvLSTM model does not need to rely on information provided by an IMU; it determines the attitude from visual information alone, thereby reducing the cost of attitude determination.
  • Fig. 2a is a schematic diagram showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the current coding information extracted at times 1 to T is x_1 to x_T.
  • the historical coding information stored at each time is S_2 to S_T.
  • the current coding information and historical coding information at each time are used as the input of the first machine learning model (such as ConvLSTM) to obtain output coding information O_1 to O_T at each time.
  • x_t, h_t, and o_t represent the input feature, the state variable, and the output at time t, respectively.
  • step 130 may be implemented by the steps in Figure 2a.
  • so that the machine learning model (such as a neural network) has the required functions, the method also includes, before using the machine learning model, a step of training it with multiple samples, such as sample images and sample data.
  • the trained machine learning model can then be used in the above methods.
  • the required machine learning model can be trained in a supervised manner (using samples and the labels corresponding to the samples).
  • FIG. 3 is a flowchart showing an embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1310, fusing each channel component of the current coded information; step 1320, fusing each channel component of the historical coded information; and step 1330, determining the posture of the mobile device.
  • step 1310 according to the correlation between the channel components of the current encoded information, the channel components of the current encoded information are fused.
  • the first weight of each channel component is determined according to the correlation between each channel component of the current encoded information; each channel component is weighted according to the first weight to obtain the current encoded information after fusion.
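The correlation-to-weight fusion just described can be sketched as follows. The disclosure does not fix the exact gate function, so the choice here (mean inter-channel correlation passed through a softmax) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_channels(o_t):
    """o_t: coding information with J channel components, shape (J, H, W).
    Returns the channel components weighted by correlation-derived weights."""
    j = o_t.shape[0]
    flat = o_t.reshape(j, -1)
    corr = np.corrcoef(flat)                   # (J, J) inter-channel correlations
    scores = corr.mean(axis=1)                 # aggregate correlation per channel
    w = np.exp(scores) / np.exp(scores).sum()  # "gate": softmax -> first weights
    return w[:, None, None] * o_t              # weighted channel components

o_t = rng.random((6, 8, 8))   # output O_t with J = 6 channel components
fused = fuse_channels(o_t)
print(fused.shape)  # (6, 8, 8)
```

The same pattern applies to the second weights of each piece of historical coding information: only the input tensor changes.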
  • the current coding information is the output O_t of the first machine learning model at the current moment.
  • O_t has J channel components: O_t1, O_t2, …, O_tJ.
  • step 1320 according to the correlation between the various channel components of the historical coding information, the various channel components of the historical coding information are fused.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • the set of stored historical coding information is S, which contains I pieces of historical coding information S_1, S_2, …, S_i, …, S_I, where i is a positive integer not greater than I.
  • any S_i has J channel components: S_i1, S_i2, …, S_iJ.
  • weighting S_i1, S_i2, …, S_iJ yields S'_i; these S'_i constitute the fused historical coding information set S'.
  • step 1330 the second machine learning model is used to determine the posture of the mobile device according to the fused current coding information and historical coding information.
  • step 1310 and step 1320 are not limited to the order described and can also be processed in parallel; alternatively, only step 1310 or only step 1320 may be executed.
  • FIG. 4 is a schematic diagram showing an embodiment of step 1320 in FIG. 3.
  • as shown in FIG. 4, any stored historical coding information S_i has a plurality of channel components. According to the correlation coefficients between the channel components, the weight of each channel component is calculated by a gate function; the channel components are then weighted to obtain the fused S'_i.
  • step 130 may be implemented through the steps in FIG. 3.
  • FIG. 5 is a flowchart showing another embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1321, fusing various historical coding information; and step 1330', determining the posture of the mobile device.
  • step 1321 according to the correlation between the historical coding information, the historical coding information is merged to obtain comprehensive historical coding information.
  • the third weight of each historical coding information is determined according to the correlation between each historical coding information; according to the third weight, each historical coding information is weighted and summed to obtain comprehensive historical coding information.
  • the correlation between the historical coding information S_1, S_2, …, S_i, …, S_I is calculated, and the weight corresponding to each of S_1, S_2, …, S_I is determined according to the correlation; a weighted sum of S_1, S_2, …, S_I is then computed to obtain the comprehensive historical coding information.
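The "third weight" fusion above can be sketched as follows. As before, the exact scoring function is not fixed by the disclosure; scoring each S_i by its mean correlation with the others and normalizing with a softmax is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def integrate_history(history):
    """history: (I, C, H, W) stack of historical coding information S_1..S_I.
    Returns the comprehensive historical coding information (C, H, W)."""
    i = history.shape[0]
    flat = history.reshape(i, -1)
    corr = np.corrcoef(flat)                    # (I, I) pairwise correlations
    scores = corr.mean(axis=1)                  # one score per S_i
    w = np.exp(scores) / np.exp(scores).sum()   # third weights (sum to 1)
    return np.tensordot(w, history, axes=1)     # weighted sum over the I pieces

history = rng.random((5, 4, 8, 8))              # S_1 ... S_5
s_comprehensive = integrate_history(history)
print(s_comprehensive.shape)  # (4, 8, 8)
```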
  • it is also possible to first fuse the channel components of the historical coding information according to the embodiment in FIG. 3 to obtain S', and then fuse the pieces of historical coding information in S' according to the embodiment in FIG. 5.
  • the historical coding information can be firstly integrated in space or time.
  • a second machine learning model is used to determine the posture of the mobile device according to the integrated historical coding information and current coding information.
  • FIG. 6 is a schematic diagram showing an embodiment of step 1321 in FIG. 5.
  • the set S of stored historical coding information includes S_1, S_2, …, S_i, …, S_I.
  • the correlation coefficients between S_1, S_2, …, S_I are calculated, and the weight corresponding to each of S_1, S_2, …, S_I is computed using a gate function.
  • weighting S_1, S_2, …, S_I yields S'_1, S'_2, …, S'_I.
  • S'_1, S'_2, …, S'_I are summed to obtain the integrated historical coding information.
  • step 130 may be implemented by the steps in FIG. 7.
  • FIG. 7 is a flowchart showing another embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1322, splicing the current coding information and the historical coding information; and step 1330'', determining the posture of the mobile device.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information. That is, the current coding information and the historical coding information are treated as feature matrices, and each layer (i.e., each channel) of a matrix is used as a part for splicing. For example, the splicing can be performed by a neural network model with two convolutional layers (for example, a 3×3 convolution kernel with a stride of 1).
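The splicing step can be sketched as follows. The random convolution weights stand in for the two trained layers; only the concatenation along the channel dimension and the 3×3, stride-1 shape of the convolutions follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3_same(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); stride 1, zero padding 1."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    c_out = w.shape[0]
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for r in range(h):
            for c in range(wd):
                out[o, r, c] = np.sum(xp[:, r:r+3, c:c+3] * w[o])
    return out

current = rng.random((4, 8, 8))        # current coding information
historical = rng.random((4, 8, 8))     # one piece of historical coding info

# Splice along the channel dimension.
spliced = np.concatenate([current, historical], axis=0)   # (8, 8, 8)

# Two 3x3, stride-1 convolutional layers (random weights, illustrative).
w1 = rng.standard_normal((8, 8, 3, 3)) * 0.1
w2 = rng.standard_normal((8, 8, 3, 3)) * 0.1
out = conv3x3_same(conv3x3_same(spliced, w1), w2)  # output coding information
print(spliced.shape, out.shape)  # (8, 8, 8) (8, 8, 8)
```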
  • the historical coding information and the current coding information may be merged in time and space before splicing.
  • step 1330'' the second machine learning model is used to determine the posture of the mobile device according to the output coding information.
  • the posture determination method provided by the embodiments of the present disclosure was tested on the public autonomous driving dataset KITTI: the average rotation error did not exceed 3 degrees per 100 meters, and the average translation error did not exceed 5%.
  • Fig. 8 is a block diagram showing an apparatus for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the device 8 for determining the posture of the mobile device includes one or more processors 81.
  • the processor 81 is configured to obtain the image difference feature between the current frame and the previous frame in the video stream shot by the mobile device.
  • the image difference feature is obtained through the optical flow network model.
  • the processor 81 is configured to: use the first machine learning model to obtain current coding information according to the image difference characteristics; and use the second machine learning model to determine the posture of the mobile device according to the current coding information and at least one piece of historical coding information.
  • at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
  • the posture determination device further includes a memory 82.
  • the memory 82 is configured to store the encoding information of the Nth frame as historical encoding information when at least one of the movement distance or the posture change corresponding to the mobile device from the Nth frame to the N-1th frame exceeds a threshold.
  • the processor 81 fuses each channel component of the currently encoded information according to the correlation between each channel component of the currently encoded information.
  • the processor 81 fuses the various channel components of the historical coding information according to the correlation between the various channel components of the historical coding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the fused current coding information and historical coding information.
  • the processor 81 determines the first weight of each channel component according to the correlation between each channel component of the currently encoded information.
  • the processor 81 weights each channel component according to the first weight to obtain the current encoded information after fusion.
  • the processor 81 determines the second weight of each channel component according to the correlation between each channel component of each piece of historical coding information.
  • the processor 81 weights each channel component according to the second weight to obtain the fused historical coding information.
  • the processor 81 fuses various historical coding information according to the correlation between various historical coding information to obtain comprehensive historical coding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the comprehensive historical coding information and the current coding information.
  • the processor 81 determines the third weight of each historical encoding information according to the correlation between each historical encoding information.
  • the processor 81 performs a weighted summation on the historical coding information according to the third weight to obtain comprehensive historical coding information.
  • the processor 81 splices current encoding information and historical encoding information according to the channel dimension direction to generate output encoding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the output code information.
  • FIG. 9 is a block diagram showing an apparatus for determining a posture of a mobile device according to another embodiment of the present disclosure.
  • the posture determination device can be expressed in the form of a general-purpose computing device.
  • the computer system includes a memory 910, a processor 920, and a bus 900 connecting different system components.
  • the memory 910 may include, for example, a system memory, a nonvolatile storage medium, and the like.
  • the system memory, for example, stores an operating system, application programs, a boot loader, and other programs.
  • the system memory may include volatile storage media, such as random access memory (RAM) and/or cache memory.
  • RAM random access memory
  • the non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of the display method.
  • Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, etc.
  • the processor 920 can be implemented by a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistors and other discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA field programmable gate array
  • each module such as the judgment module and the determination module can be implemented by a central processing unit (CPU) running instructions for executing corresponding steps in the memory, or can be implemented by a dedicated circuit that executes the corresponding steps.
  • CPU central processing unit
  • the bus 900 can use any bus structure among a variety of bus structures.
  • the bus structure includes, but is not limited to, an industry standard architecture (ISA) bus, a microchannel architecture (MCA) bus, and a peripheral component interconnect (PCI) bus.
  • ISA industry standard architecture
  • MCA microchannel architecture
  • PCI peripheral component interconnect
  • the computer system may also include an input/output interface 930, a network interface 940, a storage interface 950, and so on. These interfaces 930, 940, 950, and the memory 910 and the processor 920 may be connected through a bus 900.
  • the input and output interface 930 can provide a connection interface for input and output devices such as a display, a mouse, and a keyboard.
  • the network interface 940 provides a connection interface for various networked devices.
  • the storage interface 950 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
  • FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
  • the visual odometer 10 includes the posture determination device 11 in any of the above embodiments, which is used to determine the posture of the mobile device according to the video stream shot by the mobile device.
  • the visual odometer 10 further includes an imaging device, such as an image sensor 12, for acquiring a video stream.
  • the imaging device can be communicatively connected to the processor in the attitude determination device 11 wirelessly, for example via Bluetooth or Wi-Fi, or by wire, for example via network cables or other cabling.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the method and system of the present disclosure may be implemented in many ways.
  • the method and system of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A mobile device posture determination apparatus and method and a visual odometer. The apparatus comprises one or more processors (81, 920) configured to: determine an image-difference feature with respect to a current frame and a preceding frame in a video stream acquired by a mobile device; determine, according to the image-difference feature, current encoding information by means of a first machine learning model; and determine, according to the current encoding information and at least one piece of historical encoding information, a posture of the mobile device by means of a second machine learning model.

Description

Mobile Device Posture Determination Apparatus and Method, and Visual Odometer
Cross-Reference to Related Applications
This application is based on, and claims priority to, CN Application No. 201910199169.7, filed on March 15, 2019, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a posture determination apparatus for a mobile device, a posture determination method for a mobile device, a visual odometer, and a computer-readable storage medium.
Background
A visual odometer can determine the position and posture of a robot by analyzing and processing the related image sequences, and can thereby record the entire trajectory traveled by the robot.
In the related art, a visual odometer combines the image information of adjacent frames in a video stream and determines the camera posture of the corresponding frame through local map optimization based on the geometric features of the images; alternatively, the camera posture is determined based on information provided by an IMU (Inertial Measurement Unit).
Summary
According to some embodiments of the present disclosure, there is provided a posture determination apparatus for a mobile device, comprising one or more processors configured to: determine an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; determine current coding information using a first machine learning model according to the image difference feature; and determine the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
In some embodiments, the current frame is the M-th frame, where M is a positive integer greater than 1; in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, where N is a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information to obtain fused historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
In some embodiments, the pieces of historical coding information are fused according to the correlation among the pieces of historical coding information to obtain comprehensive historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information; and the posture of the mobile device is determined using the second machine learning model according to the output coding information.
In some embodiments, the image difference feature is acquired through an optical flow network model; and at least one of the first machine learning model and the second machine learning model is a ConvLSTM (Convolutional Long Short-Term Memory Network) model.
According to other embodiments of the present disclosure, there is provided a posture determination method for a mobile device, comprising: determining an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; determining current coding information using a first machine learning model according to the image difference feature; and determining the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
In some embodiments, the current frame is the M-th frame, where M is a positive integer greater than 1; in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, where N is a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information to obtain fused historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
In some embodiments, the at least one piece of historical coding information includes multiple pieces of historical coding information; the pieces of historical coding information are fused according to the correlation among them to obtain comprehensive historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information; and the posture of the mobile device is determined using the second machine learning model according to the output coding information.
In some embodiments, the image difference feature is acquired through an optical flow network model; and at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
According to still other embodiments of the present disclosure, there is provided a visual odometer, comprising the posture determination apparatus according to any of the foregoing embodiments, configured to determine the posture of a mobile device according to a video stream captured by the mobile device.
In some embodiments, the visual odometer further comprises an image sensor configured to acquire the video stream.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the posture determination method according to any of the foregoing embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings described here are used to provide a further understanding of the present disclosure and constitute a part of this application. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
Fig. 1 is a flowchart showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2a is a schematic diagram showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2b is a schematic diagram showing the ConvLSTM used in a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 3 is a flowchart showing an embodiment of step 130 in Fig. 1;
Fig. 4 is a schematic diagram showing an embodiment of step 1320 in Fig. 3;
Fig. 5 is a flowchart showing another embodiment of step 130 in Fig. 1;
Fig. 6 is a schematic diagram showing an embodiment of step 1321 in Fig. 5;
Fig. 7 is a flowchart showing yet another embodiment of step 130 in Fig. 1;
Fig. 8 is a block diagram showing a posture determination apparatus for a mobile device according to an embodiment of the present disclosure;
Fig. 9 is a block diagram showing a posture determination apparatus for a mobile device according to another embodiment of the present disclosure;
Fig. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
It should be understood that the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. In addition, the same or similar reference numerals denote the same or similar components.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.
Unless otherwise specifically stated, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure. Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the granted specification. In all examples shown and discussed here, any specific value should be interpreted as merely exemplary rather than limiting; therefore, other examples of the exemplary embodiments may have different values. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Fig. 1 is a flowchart showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 1, the method includes: step 110, determining an image difference feature; step 120, determining current coding information; and step 130, determining the posture of the mobile device.
In step 110, an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device is determined.
For example, the mobile device may be a movable platform such as a robot, an autonomous vehicle, or an unmanned aerial vehicle, and may capture images with a camera based on an image sensor such as a CCD or CMOS sensor.
For example, the image difference feature may be acquired through a convolutional neural network (CNN).
For example, the image difference feature may be acquired through an optical flow network model (FlowNet: Learning Optical Flow with Convolutional Networks).
For example, the image difference feature may be acquired through an optical flow network model (FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks).
In some embodiments, two adjacent frames of images may be stacked and input into the optical flow network model, and the feature extraction part of the optical flow network model is used to extract the image difference feature. The image difference feature is a high-dimensional feature, and the number of channels of the high-dimensional feature (e.g., 1024) may be determined according to the resolution of the current frame image. For example, the optical flow network model may perform multiple convolution operations on the stacked images and, according to the convolution results, extract the offset of each pixel between the two adjacent frames as the image difference feature.
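The frame-stacking input described above can be sketched as follows. This is a minimal illustration of forming the 6-channel input; the trained FlowNet feature extractor itself is not reproduced here, and the frame sizes are arbitrary example values:

```python
import numpy as np

def stack_adjacent_frames(prev_frame, cur_frame):
    """Stack two adjacent RGB frames along the channel axis, producing
    the 6-channel input an optical-flow network typically consumes."""
    assert prev_frame.shape == cur_frame.shape
    return np.concatenate([prev_frame, cur_frame], axis=-1)

# Two hypothetical 64x48 RGB frames from the video stream.
prev = np.zeros((48, 64, 3), dtype=np.float32)
cur = np.ones((48, 64, 3), dtype=np.float32)
stacked = stack_adjacent_frames(prev, cur)
print(stacked.shape)  # (48, 64, 6)
```

The feature-extraction convolutions of the optical flow network would then run over this stacked tensor.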
In this way, high-dimensional redundant image information can be converted into high-level, abstract semantic features, which solves the problem that related techniques based on geometric features are susceptible to environmental factors (such as occlusion, illumination changes, and dynamic objects), thereby improving the accuracy of posture determination.
In step 120, the current coding information is determined using a first machine learning model according to the image difference feature. For example, the first machine learning model may be an RNN (Recurrent Neural Network) model, such as a ConvLSTM model.
In some embodiments, historical coding information that has an important influence on posture determination (i.e., the coding information corresponding to key frames) may be selected from the historical outputs of the RNN model as effective information. The effective information may be fused with the current coding information to jointly determine the current posture of the mobile device. For example, in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the N-th frame is determined to be a key frame, and the coding information of the N-th frame extracted by the RNN model is stored as historical coding information.
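The key-frame criterion above can be sketched as a simple gate on motion magnitude. The threshold values and the scalar motion/posture-change inputs below are illustrative assumptions, not values specified by the disclosure:

```python
DIST_THRESHOLD = 0.5   # meters, illustrative value
ANGLE_THRESHOLD = 5.0  # degrees, illustrative value

history = []  # stored historical coding information (effective information)

def maybe_store_keyframe(encoding, motion_distance, posture_change_deg):
    """Store this frame's encoding as historical coding information if
    either the movement distance or the posture change relative to the
    previous frame exceeds its threshold."""
    if motion_distance > DIST_THRESHOLD or posture_change_deg > ANGLE_THRESHOLD:
        history.append(encoding)
        return True
    return False

# Frame with large motion -> stored as a key frame.
stored = maybe_store_keyframe("enc_frame_N", motion_distance=0.8, posture_change_deg=1.0)
# Frame with small motion -> not stored.
skipped = maybe_store_keyframe("enc_frame_N1", motion_distance=0.1, posture_change_deg=2.0)
print(stored, skipped, len(history))  # True False 1
```

In practice the stored encodings would be the ConvLSTM outputs rather than strings, and the motion estimates would come from the predicted poses.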
In step 130, the posture of the mobile device is determined using a second machine learning model according to the current coding information and at least one piece of historical coding information. For example, the second machine learning model may be an RNN model, such as a ConvLSTM model. By decoding the coding information with the RNN model, the posture of the mobile device can be determined.
The current posture determined based on the current coding information and the historical coding information is a posture determined by global optimization over the global range from the first frame to the current frame of the video stream (i.e., an absolute posture). Compared with the locally optimized posture (i.e., a relative posture) determined in the related art only within the local range of the current frame and the previous frame, the absolute posture is more accurate.
In addition, the ConvLSTM model does not need to rely on information provided by an IMU; the posture can be determined from visual information alone, thereby reducing the cost of posture determination.
Fig. 2a is a schematic diagram showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 2a, the current coding information extracted at times 1 to T is x_1 to x_T. The historical coding information stored at each time is S_2 to S_T. The current coding information and the historical coding information at each time are used as the input of the first machine learning model (e.g., ConvLSTM) to obtain the output coding information O_1 to O_T at each time. O_1 to O_T are input into the second machine learning model (e.g., ConvLSTM) to obtain the postures P_1 to P_T of the mobile device at each time.
As shown in Fig. 2b, a principle implementation of ConvLSTM is illustrated. x_t, h_t, and o_t denote the input feature, the state variable, and the output, respectively.
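A single ConvLSTM step in the spirit of Fig. 2b can be sketched as follows. This is a toy NumPy version with random, untrained weights and a naive convolution; a real implementation would use a deep-learning framework with learned parameters:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 2D convolution, 3x3 kernel, stride 1, zero 'same' padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for o in range(w.shape[0]):
        for i in range(c_in):
            for dy in range(3):
                for dx in range(3):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_t, c_t, weights):
    """One ConvLSTM step: x_t is the input feature, (h_t, c_t) the state
    variables; all gates are computed by convolutions over [x_t; h_t]."""
    z = np.concatenate([x_t, h_t], axis=0)     # stack along channels
    i = sigmoid(conv2d_same(z, weights["i"]))  # input gate
    f = sigmoid(conv2d_same(z, weights["f"]))  # forget gate
    g = np.tanh(conv2d_same(z, weights["g"]))  # candidate cell state
    o = sigmoid(conv2d_same(z, weights["o"]))  # output gate
    c_next = f * c_t + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(0)
C_IN, C_HID, H, W = 2, 4, 8, 8
weights = {k: 0.1 * rng.standard_normal((C_HID, C_IN + C_HID, 3, 3))
           for k in ("i", "f", "g", "o")}
x = rng.standard_normal((C_IN, H, W))
h = np.zeros((C_HID, H, W))
c = np.zeros((C_HID, H, W))
o_t, c = convlstm_step(x, h, c, weights)
print(o_t.shape)  # (4, 8, 8)
```

Because the gates are convolutions rather than fully connected layers, the spatial layout of the image difference feature is preserved through the recurrence.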
In some embodiments, step 130 may be implemented by the steps in Fig. 2a.
Although the embodiments of the present disclosure use ConvLSTM as one implementation of the machine learning model, other machine learning models are also applicable to the present disclosure, such as an FC-LSTM (Fully Connected Long Short-Term Memory) model.
As understood by those skilled in the art, in order for a machine learning model (e.g., a neural network) to have the required functions, before the machine learning model is used, it is also necessary to train the machine learning model with multiple samples, such as sample images and sample data. The trained machine learning model can then be used in the above method. For example, the required machine learning model can be trained and obtained in a supervised manner (with samples and labels corresponding to the samples).
Fig. 3 is a flowchart showing an embodiment of step 130 in Fig. 1.
As shown in Fig. 3, step 130 includes: step 1310, fusing the channel components of the current coding information; step 1320, fusing the channel components of the historical coding information; and step 1330, determining the posture of the mobile device.
In step 1310, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
For example, the current coding information is the output O_t of the first machine learning model at the current time. O_t has J channel components: O_t1, O_t2, ..., O_tJ. The correlations among O_t1, O_t2, ..., O_tJ are computed, and the corresponding weights of O_t1, O_t2, ..., O_tJ are determined according to the correlations. O_t1, O_t2, ..., O_tJ are weighted with the corresponding weights to obtain O'_t.
This is equivalent to selecting the channel components according to the spatial information of the current coding information. Such a technical solution enhances the channel components that are important for posture determination and suppresses the unimportant channel components, thereby improving the accuracy of posture determination.
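One way to realize this channel-wise selection is to derive each channel's weight from its average correlation with the other channels and pass it through a sigmoid gate. The disclosure only specifies that the weights come from inter-channel correlations; the particular gate form below is an illustrative assumption:

```python
import numpy as np

def fuse_channels(o_t):
    """Weight the J channel components of an encoding O_t of shape
    (J, H, W) by a gate computed from inter-channel correlations."""
    j = o_t.shape[0]
    flat = o_t.reshape(j, -1)
    corr = np.corrcoef(flat)             # (J, J) channel correlation matrix
    score = corr.mean(axis=1)            # each channel's mean correlation
    gate = 1.0 / (1.0 + np.exp(-score))  # first weights via a sigmoid gate
    return o_t * gate[:, None, None]     # weighted channel components O'_t

rng = np.random.default_rng(1)
o_t = rng.standard_normal((8, 4, 4))     # J=8 channels of a toy encoding
o_fused = fuse_channels(o_t)
print(o_fused.shape)  # (8, 4, 4)
```

The same gating applies unchanged to each stored historical encoding S_i in step 1320, yielding the fused S'_i.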
In step 1320, the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
For example, the set of stored historical coding information (effective information) is S, which contains I pieces of historical coding information S_1, S_2, ..., S_i, ..., S_I, where i is a positive integer with 1 ≤ i ≤ I. Any S_i has J channel components: S_i1, S_i2, ..., S_iJ. The correlations among S_i1, S_i2, ..., S_iJ are computed, and the corresponding weights of S_i1, S_i2, ..., S_iJ are determined according to the correlations. S_i1, S_i2, ..., S_iJ are weighted with the corresponding weights to obtain S'_i. These S'_i constitute the fused historical coding information set S'.
This is equivalent to selecting the channel components according to the spatial information of the historical coding information. Such a technical solution enhances the channel components that are important for posture determination and suppresses the unimportant channel components, thereby improving the accuracy of posture determination.
In step 1330, the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, steps 1310 and 1320 have no fixed execution order and may also be processed in parallel; alternatively, only step 1310 or only step 1320 may be executed.
Fig. 4 is a schematic diagram showing an embodiment of step 1320 in Fig. 3.
As shown in Fig. 4, any stored piece of historical coding information S_i has multiple channel components. According to the correlation coefficients among the channel components, the weight of each channel component is calculated using a gate function. The channel components are weighted to obtain the fused S'_i.
In some embodiments, step 130 may be implemented by the steps in Fig. 3.
Fig. 5 is a flowchart showing another embodiment of step 130 in Fig. 1.
As shown in Fig. 5, step 130 includes: step 1321, fusing the pieces of historical coding information; and step 1330', determining the posture of the mobile device.
In step 1321, the pieces of historical coding information are fused according to the correlation among them to obtain comprehensive historical coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
For example, the correlations among the historical coding information S_1, S_2, ..., S_i, ..., S_I are computed, and the corresponding weights of S_1, S_2, ..., S_I are determined according to the correlations. A weighted sum of S_1, S_2, ..., S_I yields the comprehensive historical coding information S̄.
In this way, the temporal continuity of the image frames is exploited, and the historical coding information is fused based on temporal information. Such a technical solution enhances the historical coding information that is important for posture determination and weakens the unimportant historical coding information, thereby improving the accuracy of posture determination.
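The temporal fusion of step 1321 can be sketched in the same spirit: a weight for each stored encoding S_i is derived from its correlation with the other encodings, and the weighted encodings are summed. The softmax normalization used here is an illustrative choice, not one mandated by the disclosure:

```python
import numpy as np

def fuse_history(history):
    """Fuse I historical encodings of shape (I, C, H, W) into one
    comprehensive encoding via correlation-derived third weights."""
    i = history.shape[0]
    flat = history.reshape(i, -1)
    corr = np.corrcoef(flat)                 # (I, I) correlation matrix
    score = corr.mean(axis=1)                # mean correlation per encoding
    w = np.exp(score) / np.exp(score).sum()  # softmax third weights
    return (history * w[:, None, None, None]).sum(axis=0)

rng = np.random.default_rng(2)
history = rng.standard_normal((5, 8, 4, 4))  # I=5 stored key-frame encodings
s_bar = fuse_history(history)                # comprehensive encoding
print(s_bar.shape)  # (8, 4, 4)
```

Encodings that agree with the rest of the stored history receive larger weights, while outlier encodings contribute less to the comprehensive result.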
In some embodiments, the channel components of the comprehensive historical coding information S̄ may be further fused according to the embodiment in Fig. 3; alternatively, the channel components of each piece of historical coding information may first be fused according to the embodiment in Fig. 3 to obtain S', and then the pieces of historical coding information in S' may be fused according to the embodiment in Fig. 5. In other words, the historical coding information may be fused first spatially or first temporally.
In step 1330', the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
Fig. 6 is a schematic diagram showing an embodiment of step 1321 in Fig. 5.
As shown in Fig. 6, the set S of stored historical coding information includes S_1, S_2, ..., S_i, ..., S_I. According to the correlation coefficients among S_1, S_2, ..., S_I, the corresponding weights of S_1, S_2, ..., S_I are calculated using a gate function. S_1, S_2, ..., S_I are weighted to obtain S'_1, S'_2, ..., S'_i, ..., S'_I, and S'_1, S'_2, ..., S'_I are summed to obtain the comprehensive historical coding information S̄.
In some embodiments, step 130 may be implemented by the steps in FIG. 7.
FIG. 7 is a flowchart showing yet another embodiment of step 130 in FIG. 1.
As shown in FIG. 7, step 130 includes: step 1322, splicing the current coding information and the historical coding information; and step 1330'', determining the posture of the mobile device.
In step 1322, the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information. That is, the current coding information and the historical coding information are treated as feature matrices, and each layer (i.e., each channel) of a matrix serves as one part in the splicing. For example, the splicing may be performed by a neural network model with two convolutional layers (e.g., with a 3×3 convolution kernel and a stride of 1).
In some embodiments, the historical coding information and the current coding information may be fused temporally and spatially before being spliced.
In step 1330'', the second machine learning model is used to determine the posture of the mobile device according to the output coding information.
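Step 1322 amounts to stacking the two feature maps channel-wise. A minimal sketch, with illustrative shapes not taken from the disclosure:

```python
import numpy as np

# Current and historical encodings as (channels, height, width) feature maps;
# the 64x8x8 shapes are illustrative, not specified by the disclosure.
current = np.random.default_rng(1).normal(size=(64, 8, 8))
history = np.random.default_rng(2).normal(size=(64, 8, 8))

# Step 1322: splice along the channel dimension (axis 0), so each channel of
# either input becomes one "part" of the output encoding. Per the text, the
# result would then pass through two 3x3, stride-1 convolutional layers.
output_encoding = np.concatenate([current, history], axis=0)
print(output_encoding.shape)  # (128, 8, 8)
```

The first 64 output channels are exactly the current encoding and the remaining channels are the historical encoding, so the subsequent convolutions can mix information across the two sources.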
The posture determination method provided by the embodiments of the present disclosure has been tested on the public autonomous-driving dataset KITTI; the average rotation error does not exceed 3 degrees per 100 meters, and the average translation error does not exceed 5%.
FIG. 8 is a block diagram showing a posture determination apparatus of a mobile device according to an embodiment of the present disclosure.
As shown in FIG. 8, the posture determination apparatus 8 of the mobile device includes one or more processors 81.
The processor 81 is configured to obtain an image difference feature between the current frame and the previous frame in a video stream captured by the mobile device. For example, the image difference feature is obtained through an optical flow network model.
The processor 81 is configured to: obtain current coding information according to the image difference feature by using a first machine learning model; and determine the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model. For example, at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
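The data flow through the two models can be sketched end to end. Everything below is a stand-in: a plain frame difference replaces the optical flow network, and simple linear maps replace the two learned models; only the sequence of stages (difference feature → encoding → pose) follows the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def image_difference_feature(prev_frame, cur_frame):
    # Placeholder for the optical flow network: a plain frame difference
    # stands in for the learned flow features (an assumption for this sketch).
    return cur_frame - prev_frame

def first_model(diff_feature, w_enc):
    # Stand-in for the first (encoding) model: a linear map plus nonlinearity
    # on the flattened difference feature.
    return np.tanh(w_enc @ diff_feature.ravel())

def second_model(current_code, history_codes, w_pose):
    # Stand-in for the second model: pools current + historical encodings and
    # regresses a 6-DoF pose (3 rotation + 3 translation parameters).
    pooled = np.concatenate([current_code, history_codes.mean(axis=0)])
    return w_pose @ pooled

prev_frame = rng.normal(size=(16, 16))
cur_frame = rng.normal(size=(16, 16))
w_enc = rng.normal(size=(32, 256)) * 0.05   # illustrative encoding weights
w_pose = rng.normal(size=(6, 64)) * 0.05    # illustrative pose-head weights

code = first_model(image_difference_feature(prev_frame, cur_frame), w_enc)
history = np.stack([code, code])            # pretend two stored encodings
pose = second_model(code, history, w_pose)
print(pose.shape)  # (6,)
```

In the actual apparatus both stand-in models would be trained networks (e.g., ConvLSTM models), and the history would hold encodings of earlier keyframes rather than copies of the current one.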
In some embodiments, the posture determination apparatus further includes a memory 82. The memory 82 is configured to store the coding information of the Nth frame as historical coding information when at least one of the motion distance or the posture change of the mobile device from the Nth frame to the (N-1)th frame exceeds a threshold.
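This storage rule is a simple keyframe test. A minimal sketch, with threshold values chosen purely for illustration (the disclosure does not specify them):

```python
import numpy as np

def maybe_store(history, encoding, distance_moved, pose_change,
                dist_threshold=0.5, pose_threshold=np.deg2rad(5)):
    """Store frame N's encoding as historical coding information when the
    motion distance OR the posture change since frame N-1 exceeds a
    threshold. The threshold values here are illustrative assumptions."""
    if distance_moved > dist_threshold or pose_change > pose_threshold:
        history.append(encoding)
    return history

history = []
maybe_store(history, "enc_1", distance_moved=0.8, pose_change=0.0)  # stored
maybe_store(history, "enc_2", distance_moved=0.1, pose_change=0.0)  # skipped
print(len(history))  # 1
```

Only frames with appreciable motion contribute encodings, which keeps the stored history compact while still covering the trajectory.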
In some embodiments, the processor 81 fuses the channel components of the current coding information according to the correlations among those channel components, and fuses the channel components of the historical coding information according to the correlations among those channel components. The processor 81 then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
For example, the processor 81 determines a first weight of each channel component according to the correlations among the channel components of the current coding information, and weights the channel components according to the first weights to obtain the fused current coding information.
For example, the processor 81 determines a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information, and weights the channel components according to the second weights to obtain the fused historical coding information.
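The channel-component fusion above can be sketched as a channel-attention-style weighting. The disclosure only states that the weights come from inter-channel correlations; the mean-correlation-through-sigmoid gate below is an assumption made for illustration.

```python
import numpy as np

def fuse_channels(encoding):
    """Fuse the channel components of one encoding of shape (C, H, W) by
    weighting each channel according to its correlation with the others.
    The mean inter-channel correlation and sigmoid gate are assumptions;
    the disclosure only specifies correlation-derived weights."""
    c = encoding.reshape(encoding.shape[0], -1)
    c = c - c.mean(axis=1, keepdims=True)
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-8)
    corr = c @ c.T                       # (C, C) channel correlations
    np.fill_diagonal(corr, 0.0)
    weights = 1.0 / (1.0 + np.exp(-corr.mean(axis=1)))   # sigmoid gate in (0, 1)
    return encoding * weights[:, None, None]

enc = np.random.default_rng(4).normal(size=(16, 4, 4))
fused = fuse_channels(enc)
print(fused.shape)  # (16, 4, 4)
```

The same routine serves as the "first weight" fusion on the current encoding and the "second weight" fusion on each historical encoding, since both apply per-channel weights derived from channel correlations.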
In some embodiments, the processor 81 fuses the pieces of historical coding information according to the correlations among them to obtain comprehensive historical coding information, and determines the posture of the mobile device according to the comprehensive historical coding information by using the second machine learning model.
For example, the processor 81 determines a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information, and performs a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the processor 81 splices the current coding information and the historical coding information along the channel dimension to generate output coding information, and determines the posture of the mobile device according to the output coding information by using the second machine learning model.
FIG. 9 is a block diagram showing a posture determination apparatus of a mobile device according to another embodiment of the present disclosure.
As shown in FIG. 9, the posture determination apparatus may take the form of a general-purpose computing device. The computer system includes a memory 910, a processor 920, and a bus 900 connecting the different system components.
The memory 910 may include, for example, a system memory and a non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs, and may include volatile storage media such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of the disclosed method, and includes, but is not limited to, magnetic disk storage, optical storage, and flash memory.
The processor 920 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Accordingly, each module, such as a judgment module or a determination module, may be implemented by a central processing unit (CPU) running instructions in a memory that execute the corresponding steps, or by a dedicated circuit that executes the corresponding steps.
The bus 900 may use any of a variety of bus structures, including, but not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, and a Peripheral Component Interconnect (PCI) bus.
The computer system may further include an input/output interface 930, a network interface 940, a storage interface 950, and so on. These interfaces 930, 940, and 950, as well as the memory 910 and the processor 920, may be connected through the bus 900. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 940 provides a connection interface for various networked devices. The storage interface 950 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
As shown in FIG. 10, the visual odometer 10 includes the posture determination apparatus 11 of any of the above embodiments, configured to determine the posture of the mobile device according to a video stream captured by the mobile device.
In some embodiments, the visual odometer 10 further includes an imaging device, such as an image sensor 12, for acquiring the video stream.
In some embodiments, the imaging device may be communicatively connected to the processor in the posture determination apparatus 11 wirelessly, for example via Bluetooth or Wi-Fi, or by wire, for example via a network cable or other cabling.
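A visual odometer turns the per-frame postures into a trajectory by composing the relative transforms. The sketch below uses 4×4 homogeneous matrices with an illustrative planar-motion parameterization (yaw about the vertical axis plus in-plane translation); this parameterization is an assumption, not taken from the disclosure.

```python
import numpy as np

def relative_pose(yaw, tx, tz):
    """Build a 4x4 homogeneous transform for one motion step: a yaw rotation
    about the vertical (Y) axis plus a translation in the ground plane.
    The planar parameterization is illustrative only."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
    T[0, 3], T[2, 3] = tx, tz
    return T

# Integrate per-frame relative poses into an absolute trajectory by
# right-multiplying each new relative transform onto the accumulated pose.
pose = np.eye(4)
trajectory = [pose[:3, 3].copy()]
for step in [relative_pose(0.0, 0.0, 1.0)] * 3:   # three 1 m forward steps
    pose = pose @ step
    trajectory.append(pose[:3, 3].copy())
print(trajectory[-1])  # [0. 0. 3.]
```

This accumulation is where the per-100-meter rotation and percentage translation errors quoted earlier become meaningful: small per-frame errors compound along the composed chain.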
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
So far, the posture determination apparatus of a mobile device, the posture determination method of a mobile device, the visual odometer, and the computer-readable storage medium according to the present disclosure have been described in detail. To avoid obscuring the concept of the present disclosure, some details well known in the art have not been described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination thereof. The above order of the steps of the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail through examples, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (21)

  1. A posture determination apparatus of a mobile device, comprising one or more processors, wherein the one or more processors are configured to:
    determine an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device;
    determine current coding information according to the image difference feature by using a first machine learning model; and
    determine a posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model.
  2. The posture determination apparatus according to claim 1, wherein the current frame is an Mth frame, M being a positive integer greater than 1;
    the posture determination apparatus further comprises a memory configured to:
    store coding information of an Nth frame as the historical coding information when at least one of a motion distance or a posture change of the mobile device from the Nth frame to an (N-1)th frame exceeds a threshold, N being a positive integer less than M.
  3. The posture determination apparatus according to claim 1, wherein determining the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using the second machine learning model comprises:
    fusing channel components of the current coding information according to correlations among the channel components of the current coding information to obtain fused current coding information;
    fusing channel components of the historical coding information according to correlations among the channel components of the historical coding information to obtain fused historical coding information; and
    determining the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
  4. The posture determination apparatus according to claim 3, wherein fusing the channel components of the current coding information comprises:
    determining a first weight of each channel component according to the correlations among the channel components of the current coding information; and
    weighting the channel components according to the first weights to obtain the fused current coding information.
  5. The posture determination apparatus according to claim 3, wherein fusing the channel components of the historical coding information comprises:
    determining a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information; and
    weighting the channel components according to the second weights to obtain the fused historical coding information.
  6. The posture determination apparatus according to claim 1, wherein the at least one piece of historical coding information comprises a plurality of pieces of historical coding information, and determining the posture of the mobile device by using the second machine learning model comprises:
    fusing the pieces of historical coding information according to correlations among the pieces of historical coding information to obtain comprehensive historical coding information; and
    determining the posture of the mobile device according to the comprehensive historical coding information and the current coding information by using the second machine learning model.
  7. The posture determination apparatus according to claim 6, wherein fusing the pieces of historical coding information comprises:
    determining a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information; and
    performing a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
  8. The posture determination apparatus according to claim 1, wherein determining the posture of the mobile device by using the second machine learning model comprises:
    splicing the current coding information and the historical coding information along a channel dimension to generate output coding information; and
    determining the posture of the mobile device according to the output coding information by using the second machine learning model.
  9. The posture determination apparatus according to any one of claims 1-8, wherein:
    the image difference feature is acquired through an optical flow network model; and
    at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
  10. A posture determination method of a mobile device, comprising:
    determining an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device;
    determining current coding information according to the image difference feature by using a first machine learning model; and
    determining a posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model.
  11. The posture determination method according to claim 10, wherein the current frame is an Mth frame, M being a positive integer greater than 1,
    the posture determination method further comprising:
    storing coding information of an Nth frame as the historical coding information when at least one of a motion distance or a posture change of the mobile device from the Nth frame to an (N-1)th frame exceeds a threshold, N being a positive integer less than M.
  12. The posture determination method according to claim 10, wherein determining the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using the second machine learning model comprises:
    fusing channel components of the current coding information according to correlations among the channel components of the current coding information to obtain fused current coding information;
    fusing channel components of the historical coding information according to correlations among the channel components of the historical coding information to obtain fused historical coding information; and
    determining the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
  13. The posture determination method according to claim 12, wherein fusing the channel components of the current coding information comprises:
    determining a first weight of each channel component according to the correlations among the channel components of the current coding information; and
    weighting the channel components according to the first weights to obtain the fused current coding information.
  14. The posture determination method according to claim 12, wherein fusing the channel components of the historical coding information comprises:
    determining a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information; and
    weighting the channel components according to the second weights to obtain the fused historical coding information.
  15. The posture determination method according to claim 10, wherein the at least one piece of historical coding information comprises a plurality of pieces of historical coding information, and
    determining the posture of the mobile device by using the second machine learning model comprises:
    fusing the pieces of historical coding information according to correlations among the pieces of historical coding information to obtain comprehensive historical coding information; and
    determining the posture of the mobile device according to the comprehensive historical coding information and the current coding information by using the second machine learning model.
  16. The posture determination method according to claim 15, wherein fusing the pieces of historical coding information comprises:
    determining a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information; and
    performing a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
  17. The posture determination method according to claim 10, wherein determining the posture of the mobile device by using the second machine learning model comprises:
    splicing the current coding information and the historical coding information along a channel dimension to generate output coding information; and
    determining the posture of the mobile device according to the output coding information by using the second machine learning model.
  18. The posture determination method according to any one of claims 10-17, wherein:
    the image difference feature is acquired through an optical flow network model; and
    at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
  19. A visual odometer, comprising:
    the posture determination apparatus according to any one of claims 1-9, configured to determine the posture of a mobile device according to a video stream captured by the mobile device.
  20. The visual odometer according to claim 19, further comprising:
    an image sensor configured to acquire the video stream.
  21. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the posture determination method according to any one of claims 10-18.
PCT/CN2020/075049 2019-03-15 2020-02-13 Mobile device posture determination apparatus and method, and visual odometer WO2020186943A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910199169.7 2019-03-15
CN201910199169.7A CN109798888B (en) 2019-03-15 2019-03-15 Posture determination device and method for mobile equipment and visual odometer

Publications (1)

Publication Number Publication Date
WO2020186943A1 true WO2020186943A1 (en) 2020-09-24

Family

ID=66563026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075049 WO2020186943A1 (en) 2019-03-15 2020-02-13 Mobile device posture determination apparatus and method, and visual odometer

Country Status (2)

Country Link
CN (1) CN109798888B (en)
WO (1) WO2020186943A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular vision odometer positioning method and system

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN109798888B (en) * 2019-03-15 2021-09-17 京东方科技集团股份有限公司 Posture determination device and method for mobile equipment and visual odometer
CN110595466B (en) * 2019-09-18 2020-11-03 电子科技大学 Lightweight inertial-assisted visual odometer implementation method based on deep learning
CN111028282A (en) * 2019-11-29 2020-04-17 浙江省北大信息技术高等研究院 Unsupervised pose and depth calculation method and system
CN112268564B (en) * 2020-12-25 2021-03-02 中国人民解放军国防科技大学 Unmanned aerial vehicle landing space position and attitude end-to-end estimation method
CN112651345B (en) * 2020-12-29 2023-11-10 深圳市优必选科技股份有限公司 Human body posture recognition model optimization method and device and terminal equipment

Citations (8)

Publication number Priority date Publication date Assignee Title
CN102519481A (en) * 2011-12-29 2012-06-27 中国科学院自动化研究所 Implementation method of binocular vision speedometer
CN106504265A (en) * 2015-09-08 2017-03-15 株式会社理光 Estimation optimization method, equipment and system
CN108332750A (en) * 2018-01-05 2018-07-27 深圳市功夫机器人有限公司 Robot localization method and terminal device
CN108491763A (en) * 2018-03-01 2018-09-04 北京市商汤科技开发有限公司 Three-dimensional scenic identifies unsupervised training method, device and the storage medium of network
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 A kind of visual odometry method and system based on light stream and deep learning
US20190066326A1 (en) * 2017-08-28 2019-02-28 Nec Laboratories America, Inc. Learning good features for visual odometry
US20190079533A1 (en) * 2017-09-13 2019-03-14 TuSimple Neural network architecture method for deep odometry assisted by static scene optical flow
CN109798888A (en) * 2019-03-15 2019-05-24 京东方科技集团股份有限公司 Posture determining device, method and the visual odometry of mobile device

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
WO2005099423A2 (en) * 2004-04-16 2005-10-27 Aman James A Automatic event videoing, tracking and content generation system
JP2009182870A (en) * 2008-01-31 2009-08-13 Toshiba Corp Form entry record management system and form entry record monitoring program
CN104463216B (en) * 2014-12-15 2017-07-28 北京大学 Eye movement mode data automatic obtaining method based on computer vision
WO2016179303A1 (en) * 2015-05-04 2016-11-10 Kamama, Inc. System and method of vehicle sensor management
JP6575325B2 (en) * 2015-11-27 2019-09-18 富士通株式会社 Camera position / orientation estimation apparatus, camera position / orientation estimation method, and camera position / orientation estimation program
CN106485729A (en) * 2016-09-29 2017-03-08 江苏云光智慧信息科技有限公司 A kind of moving target detecting method based on mixed Gauss model
CN108230328B (en) * 2016-12-22 2021-10-22 新沂阿凡达智能科技有限公司 Method and device for acquiring target object and robot
CN106643699B (en) * 2016-12-26 2023-08-04 北京互易科技有限公司 Space positioning device and positioning method in virtual reality system
US10309777B2 (en) * 2016-12-30 2019-06-04 DeepMap Inc. Visual odometry and pairwise alignment for high definition map creation
CN107423727B (en) * 2017-08-14 2018-07-10 河南工程学院 Face complex expression recognition methods based on neural network
CN107577651B (en) * 2017-08-25 2020-11-10 上海媒智科技有限公司 Chinese character font migration system based on countermeasure network
CN107561503B (en) * 2017-08-28 2020-08-14 哈尔滨工业大学 Adaptive target tracking filtering method based on multiple fading factors
CN107796397B (en) * 2017-09-14 2020-05-15 杭州迦智科技有限公司 Robot binocular vision positioning method and device and storage medium
CN108537848B (en) * 2018-04-19 2021-10-15 北京工业大学 Two-stage pose optimization estimation method for indoor scene reconstruction
CN109344840B (en) * 2018-08-07 2022-04-01 深圳市商汤科技有限公司 Image processing method and apparatus, electronic device, storage medium, and program product
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 A kind of monocular vision odometer method based on recursive convolution neural network
CN109040691B (en) * 2018-08-29 2020-08-28 一石数字技术成都有限公司 Scene video reduction device based on front-end target detection
CN109359578A (en) * 2018-10-09 2019-02-19 四川师范大学 Weighted Fusion triple channel eigengait characterizing method
CN109360226B (en) * 2018-10-17 2021-09-24 武汉大学 Multi-target tracking method based on time series multi-feature fusion
CN109448024B (en) * 2018-11-06 2022-02-11 深圳大学 Visual tracking method and system for constructing constraint correlation filter by using depth data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102519481A (en) * 2011-12-29 2012-06-27 中国科学院自动化研究所 Implementation method of binocular visual odometer
CN106504265A (en) * 2015-09-08 2017-03-15 株式会社理光 Estimation optimization method, equipment and system
US20190066326A1 (en) * 2017-08-28 2019-02-28 Nec Laboratories America, Inc. Learning good features for visual odometry
US20190079533A1 (en) * 2017-09-13 2019-03-14 TuSimple Neural network architecture method for deep odometry assisted by static scene optical flow
CN108332750A (en) * 2018-01-05 2018-07-27 深圳市功夫机器人有限公司 Robot localization method and terminal device
CN108491763A (en) * 2018-03-01 2018-09-04 北京市商汤科技开发有限公司 Unsupervised training method, device, and storage medium for a three-dimensional scene recognition network
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 Visual odometry method and system based on optical flow and deep learning
CN109798888A (en) * 2019-03-15 2019-05-24 京东方科技集团股份有限公司 Mobile device posture determination apparatus and method, and visual odometer

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular visual odometry positioning method and system

Also Published As

Publication number Publication date
CN109798888B (en) 2021-09-17
CN109798888A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
WO2020186943A1 (en) Mobile device posture determination apparatus and method, and visual odometer
JP7335274B2 (en) Systems and methods for geolocation prediction
US9542621B2 (en) Spatial pyramid pooling networks for image processing
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
US10339421B2 (en) RGB-D scene labeling with multimodal recurrent neural networks
US11755889B2 (en) Method, system and apparatus for pattern recognition
KR20200092894A (en) On-device classification of fingertip motion patterns into gestures in real-time
US20220012909A1 (en) Camera pose determination method and apparatus, and electronic device
KR102073162B1 (en) Small object detection based on deep learning
CN109583290B (en) System and method for improving visual feature detection using motion-related data
CN113807361B (en) Neural network, target detection method, neural network training method and related products
US9865061B2 (en) Constructing a 3D structure
CN109977872B (en) Motion detection method and device, electronic equipment and computer readable storage medium
CN111428805B (en) Method for detecting salient object, model, storage medium and electronic device
US20150213323A1 (en) Video anomaly detection based upon a sparsity model
CN113191318A (en) Target detection method and device, electronic equipment and storage medium
CN114742112A (en) Object association method and device and electronic equipment
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
Zhou et al. Learned monocular depth priors in visual-inertial initialization
JP2023036795A (en) Image processing method, model training method, apparatus, electronic device, storage medium, computer program, and self-driving vehicle
Wang et al. ATG-PVD: Ticketing parking violations on a drone
CN113762231B (en) End-to-end multi-pedestrian posture tracking method and device and electronic equipment
Kang et al. ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation
JP2023553630A (en) Keypoint-based behavioral localization
CN111339226B (en) Method and device for constructing map based on classification detection network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20773408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20773408

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 08/11/2021)