WO2020186943A1 - Mobile device posture determination apparatus and method, and visual odometer - Google Patents


Info

Publication number
WO2020186943A1
WO2020186943A1 · PCT/CN2020/075049 · CN2020075049W
Authority
WO
WIPO (PCT)
Prior art keywords
coding information
historical
posture
mobile device
information
Prior art date
Application number
PCT/CN2020/075049
Other languages
French (fr)
Chinese (zh)
Inventor
查红彬
薛飞
方奕庚
姜立
Original Assignee
京东方科技集团股份有限公司
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.) and 北京大学 (Peking University)
Publication of WO2020186943A1 publication Critical patent/WO2020186943A1/en

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 - Instruments for performing navigational calculations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/269 - Analysis of motion using gradient-based methods

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a posture determination apparatus of a mobile device, a posture determination method of a mobile device, a visual odometer, and a computer-readable storage medium.
  • the visual odometer can determine the position and posture of the robot by analyzing and processing related image sequences, and then record the entire trajectory of the robot.
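The trajectory-recording idea above can be sketched as follows. This is illustrative only: the 2-D pose representation (x, y, heading) and the `compose` helper are assumptions for the sketch, not part of the disclosure, which does not prescribe a pose parameterization.

```python
import numpy as np

# Illustrative only: a visual odometer chains per-frame relative poses into a
# trajectory. Here 2-D poses (x, y, theta) are composed in sequence.
def compose(pose, delta):
    """Apply relative motion `delta` (dx, dy, dtheta, expressed in the body
    frame) to absolute `pose` (x, y, theta)."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

# Three relative motions estimated from consecutive frame pairs.
trajectory = [(0.0, 0.0, 0.0)]
for delta in [(1.0, 0.0, 0.0), (1.0, 0.0, np.pi / 2), (1.0, 0.0, 0.0)]:
    trajectory.append(compose(trajectory[-1], delta))
print(trajectory[-1])  # final absolute pose, approx (2.0, 1.0, pi/2)
```

Chaining relative estimates like this accumulates drift, which is why the embodiments below additionally exploit historical (key-frame) encoding information.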
  • the visual odometer combines the image information of adjacent frames in the video stream and determines the camera pose of the corresponding frame by local-map optimization based on the geometric features of the images; alternatively, it determines the camera pose based on information provided by an IMU (Inertial Measurement Unit).
  • an apparatus for determining a posture of a mobile device, including one or more processors configured to: determine an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device; obtain current coding information using a first machine learning model according to the image difference feature; and determine the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
  • the current frame is the Mth frame, where M is a positive integer greater than 1. If at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the coding information of the Nth frame is stored as historical coding information, where N is a positive integer less than M.
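The key-frame condition above can be sketched as follows. The threshold values and the `KeyframeBuffer` helper are illustrative assumptions; the disclosure specifies only the "distance or posture change exceeds a threshold" test.

```python
import numpy as np

# Hypothetical key-frame buffer; names and thresholds are illustrative.
DIST_THRESHOLD = 0.5    # movement-distance threshold (units arbitrary)
ANGLE_THRESHOLD = 5.0   # posture-change threshold in degrees

def is_keyframe(distance, angle_change):
    """Frame N is a key frame if the movement distance OR the posture
    change from frame N-1 exceeds its threshold."""
    return distance > DIST_THRESHOLD or angle_change > ANGLE_THRESHOLD

class KeyframeBuffer:
    def __init__(self):
        self.history = []  # stored historical coding information S_1 ... S_I

    def maybe_store(self, coding_info, distance, angle_change):
        if is_keyframe(distance, angle_change):
            self.history.append(coding_info)

buf = KeyframeBuffer()
buf.maybe_store(np.zeros((4, 8, 8)), distance=0.1, angle_change=1.0)  # not stored
buf.maybe_store(np.ones((4, 8, 8)), distance=0.9, angle_change=1.0)   # stored
print(len(buf.history))  # 1
```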
  • according to the correlation between the channel components of the current coding information, the channel components are fused to obtain fused current coding information; according to the correlation between the channel components of the historical coding information, the channel components of the historical coding information are fused to obtain fused historical coding information; the second machine learning model then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information.
  • the first weight of each channel component is determined according to the correlation between the channel components of the current coding information; the channel components are weighted according to the first weights to obtain the fused current coding information.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • according to the correlation between the pieces of historical coding information, they are fused to obtain comprehensive historical coding information; according to the comprehensive historical coding information and the current coding information, the second machine learning model determines the posture of the mobile device.
  • the third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information; a weighted sum of the historical coding information is computed according to the third weights to obtain the comprehensive historical coding information.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information; according to the output coding information, the second machine learning model is used to determine the posture of the mobile device.
  • the image difference feature is acquired through an optical flow network model; at least one of the first machine learning model and the second machine learning model is a ConvLSTM (Convolutional Long Short-Term Memory) model.
  • a method for determining the posture of a mobile device, including: determining an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device; determining current coding information using a first machine learning model according to the image difference feature; and determining the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
  • the current frame is the Mth frame, where M is a positive integer greater than 1. If at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the coding information of the Nth frame is stored as historical coding information, where N is a positive integer less than M.
  • according to the correlation between the channel components of the current coding information, the channel components are fused to obtain fused current coding information; according to the correlation between the channel components of the historical coding information, the channel components of the historical coding information are fused to obtain fused historical coding information; the second machine learning model then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information.
  • the first weight of each channel component is determined according to the correlation between the channel components of the current coding information; the channel components are weighted according to the first weights to obtain the fused current coding information.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • the at least one piece of historical coding information includes multiple pieces of historical coding information; according to the correlation between the pieces of historical coding information, they are fused to obtain comprehensive historical coding information; according to the comprehensive historical coding information and the current coding information, a second machine learning model is used to determine the posture of the mobile device.
  • the third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information; a weighted sum of the historical coding information is computed according to the third weights to obtain the comprehensive historical coding information.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information; according to the output coding information, the second machine learning model is used to determine the posture of the mobile device.
  • the image difference feature is obtained through an optical flow network model; at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
  • a visual odometer, including the posture determination apparatus described in any of the foregoing embodiments, configured to determine the posture of the mobile device according to the video stream captured by the mobile device.
  • the visual odometer further includes an image sensor for acquiring the video stream.
  • a computer-readable storage medium having a computer program stored thereon, and when the program is executed by a processor, the posture determination method as described in any of the foregoing embodiments is implemented.
  • FIG. 1 is a flowchart showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 2a is a schematic diagram showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 2b is a schematic diagram showing the ConvLSTM used in a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 3 is a flowchart showing an embodiment of step 130 in FIG. 1;
  • FIG. 4 is a schematic diagram showing an embodiment of step 1320 in FIG. 3;
  • FIG. 5 is a flowchart showing another embodiment of step 130 in FIG. 1;
  • FIG. 6 is a schematic diagram showing an embodiment of step 1321 in FIG. 5;
  • FIG. 7 is a flowchart showing another embodiment of step 130 in FIG. 1;
  • FIG. 8 is a block diagram showing an apparatus for determining a posture of a mobile device according to an embodiment of the present disclosure;
  • FIG. 9 is a block diagram showing an apparatus for determining a posture of a mobile device according to another embodiment of the present disclosure;
  • FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
  • Fig. 1 is a flowchart showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the method includes: step 110, determining the image difference feature; step 120, determining the current encoding information; and step 130, determining the posture of the mobile device.
  • step 110 the image difference feature between the current frame and the previous frame in the video stream acquired by the mobile device is determined.
  • the mobile device may be a movable platform such as a robot, an unmanned vehicle, a drone, etc., and images are taken by a camera based on an image sensor such as a CCD or CMOS.
  • the image difference feature can be obtained through a convolutional neural network (CNN).
  • an optical flow network model, such as FlowNet (Learning Optical Flow with Convolutional Networks) or FlowNet 2.0 (FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks), can be used to obtain the image difference features.
  • two adjacent frames of images can be superimposed and input into the optical flow network model, and the feature extraction part of the optical flow network model is used to extract the image difference features.
  • the image difference feature is a high-dimensional feature, and the number of channels (such as 1024) of the high-dimensional feature can be determined according to the resolution of the current frame image.
  • the optical flow network model can perform multiple convolution operations on the superimposed images and, according to the convolution results, extract the offset of each pixel between two adjacent frames as the image difference feature.
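The superimpose-then-convolve step can be sketched as follows. This is schematic only: the real FlowNet feature extractor is a deep trained CNN, whereas here a single random-weight 3×3 convolution stands in to show the data flow and shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Naive valid 2-D convolution: x is (C_in, H, W), w is (C_out, C_in, 3, 3)."""
    c_out, c_in, kh, kw = w.shape
    _, h, wdt = x.shape
    out = np.zeros((c_out, h - kh + 1, wdt - kw + 1))
    for o in range(c_out):
        for i in range(c_in):
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    out[o, r, c] += np.sum(x[i, r:r+kh, c:c+kw] * w[o, i])
    return out

frame_prev = rng.random((3, 16, 16))   # previous RGB frame
frame_curr = rng.random((3, 16, 16))   # current RGB frame

# Superimpose the two adjacent frames along the channel dimension.
stacked = np.concatenate([frame_prev, frame_curr], axis=0)  # (6, 16, 16)

# Stand-in for the feature-extraction part of the optical flow network.
weights = rng.standard_normal((8, 6, 3, 3)) * 0.1
features = conv2d(stacked, weights)     # image difference features
print(stacked.shape, features.shape)    # (6, 16, 16) (8, 14, 14)
```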
  • high-dimensional redundant image information can be converted into high-level, abstract semantic features, which solves the problem that related technologies based on geometric features are susceptible to environmental factors (such as occlusion, lighting changes, dynamic objects, etc.), thereby improving the accuracy of posture determination.
  • the first machine learning model is used to determine the current encoding information according to the image difference characteristics.
  • the first machine learning model may be an RNN (Recurrent Neural Network) model, such as a ConvLSTM model.
  • historical coding information (that is, coding information corresponding to key frames) that has an important influence on pose determination can be filtered from the historical output of the RNN model as effective information.
  • the effective information can be fused with the current coding information to jointly determine the current posture of the mobile device. For example, if at least one of the movement distance or the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold, the Nth frame is determined to be a key frame, and the coding information of the Nth frame extracted by the RNN model is stored as historical coding information.
  • a second machine learning model is used to determine the posture of the mobile device according to the current encoding information and at least one historical encoding information.
  • the second machine learning model may be an RNN model, such as a ConvLSTM model. Using the RNN model to decode the encoded information, the posture of the mobile device can be determined.
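A ConvLSTM cell of the kind named here can be sketched minimally as follows. This is an assumption-laden sketch, not the patent's trained model: 1×1 convolutions (a per-pixel linear map over channels) replace the spatial kernels a real ConvLSTM would use, and the gate layout follows the standard LSTM (input i, forget f, output o, candidate g).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal ConvLSTM-style cell with 1x1 convolutions (illustrative)."""
    def __init__(self, c_in, c_hidden):
        # One weight matrix produces all four gates from [x; h].
        self.w = rng.standard_normal((4 * c_hidden, c_in + c_hidden)) * 0.1

    def step(self, x, h, c):
        # 1x1 conv == channel-mixing at every spatial location.
        z = np.einsum('oc,chw->ohw', self.w, np.concatenate([x, h], axis=0))
        i, f, o, g = np.split(z, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

cell = ConvLSTMCell(c_in=8, c_hidden=4)
h = np.zeros((4, 14, 14))
c = np.zeros((4, 14, 14))
for _ in range(3):                     # feed three image-difference features
    x = rng.random((8, 14, 14))
    h, c = cell.step(x, h, c)          # h plays the role of coding info O_t
print(h.shape)  # (4, 14, 14)
```

Because the hidden state is convolutional (channels × H × W) rather than a flat vector, spatial structure of the image difference features is preserved across time steps.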
  • the current posture determined based on the current coding information and the historical coding information is a posture obtained by global optimization over the range from the first frame to the current frame of the video stream (that is, an absolute posture).
  • the absolute posture is more accurate.
  • the ConvLSTM model does not need to rely on information provided by an IMU; it determines the attitude from visual information alone, thereby reducing the cost of attitude determination.
  • Fig. 2a is a schematic diagram showing a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the current coding information extracted at times 1 to T is x_1 to x_T.
  • the historical coding information stored at each time is S_2 to S_T.
  • the current coding information and historical coding information at each time are used as the input of the first machine learning model (such as ConvLSTM) to obtain output coding information O_1 to O_T at each time.
  • x_t, h_t, and o_t represent the input feature, the state variable, and the output at time t, respectively.
  • step 130 may be implemented by the steps in Figure 2a.
  • so that the machine learning model (such as a neural network) has the required functions, the method also includes, before using the machine learning model, a step of training it with multiple samples, such as sample images and sample data.
  • the trained machine learning model can then be used in the above methods.
  • the required machine learning model can be trained in a supervised manner (using samples and the labels corresponding to the samples).
  • FIG. 3 is a flowchart showing an embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1310, fusing each channel component of the current coded information; step 1320, fusing each channel component of the historical coded information; and step 1330, determining the posture of the mobile device.
  • step 1310 according to the correlation between the channel components of the current encoded information, the channel components of the current encoded information are fused.
  • the first weight of each channel component is determined according to the correlation between each channel component of the current encoded information; each channel component is weighted according to the first weight to obtain the current encoded information after fusion.
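The correlation-to-weight fusion just described can be sketched as follows. The disclosure does not fix the exact gate function, so the choice here (mean inter-channel correlation passed through a softmax) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_channels(o_t):
    """o_t: coding information with J channel components, shape (J, H, W).
    Returns the channel components weighted by correlation-derived weights."""
    j = o_t.shape[0]
    flat = o_t.reshape(j, -1)
    corr = np.corrcoef(flat)                   # (J, J) inter-channel correlations
    scores = corr.mean(axis=1)                 # aggregate correlation per channel
    w = np.exp(scores) / np.exp(scores).sum()  # "gate": softmax -> first weights
    return w[:, None, None] * o_t              # weighted channel components

o_t = rng.random((6, 8, 8))   # output O_t with J = 6 channel components
fused = fuse_channels(o_t)
print(fused.shape)  # (6, 8, 8)
```

The same pattern applies to the second weights of each piece of historical coding information: only the input tensor changes.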
  • the current coding information is the output O_t of the first machine learning model at the current moment.
  • O_t has J channel components: O_t1, O_t2, …, O_tJ.
  • step 1320 according to the correlation between the various channel components of the historical coding information, the various channel components of the historical coding information are fused.
  • the second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information; the channel components are weighted according to the second weights to obtain the fused historical coding information.
  • the set of stored historical coding information is S, which contains I pieces of historical coding information S_1, S_2, …, S_i, …, S_I, where i is a positive integer not greater than I.
  • any S_i has J channel components: S_i1, S_i2, …, S_iJ.
  • weighting S_i1, S_i2, …, S_iJ yields S'_i; these S'_i constitute the fused historical coding information set S'.
  • step 1330 the second machine learning model is used to determine the posture of the mobile device according to the fused current coding information and historical coding information.
  • step 1310 and step 1320 are not limited to the order described and can also be processed in parallel; alternatively, only step 1310 or only step 1320 may be executed.
  • FIG. 4 is a schematic diagram showing an embodiment of step 1320 in FIG. 3.
  • as shown in FIG. 4, any stored historical coding information S_i has a plurality of channel components. According to the correlation coefficients between the channel components, the weight of each channel component is calculated by a gate function; the channel components are then weighted to obtain the fused S'_i.
  • step 130 may be implemented through the steps in FIG. 3.
  • FIG. 5 is a flowchart showing another embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1321, fusing various historical coding information; and step 1330', determining the posture of the mobile device.
  • step 1321 according to the correlation between the historical coding information, the historical coding information is merged to obtain comprehensive historical coding information.
  • the third weight of each historical coding information is determined according to the correlation between each historical coding information; according to the third weight, each historical coding information is weighted and summed to obtain comprehensive historical coding information.
  • the correlation between the historical coding information S_1, S_2, …, S_i, …, S_I is calculated, and the weight corresponding to each of S_1, S_2, …, S_I is determined according to the correlation; a weighted sum of S_1, S_2, …, S_I is then computed to obtain the comprehensive historical coding information.
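The "third weight" fusion above can be sketched as follows. As before, the exact scoring function is not fixed by the disclosure; scoring each S_i by its mean correlation with the others and normalizing with a softmax is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def integrate_history(history):
    """history: (I, C, H, W) stack of historical coding information S_1..S_I.
    Returns the comprehensive historical coding information (C, H, W)."""
    i = history.shape[0]
    flat = history.reshape(i, -1)
    corr = np.corrcoef(flat)                    # (I, I) pairwise correlations
    scores = corr.mean(axis=1)                  # one score per S_i
    w = np.exp(scores) / np.exp(scores).sum()   # third weights (sum to 1)
    return np.tensordot(w, history, axes=1)     # weighted sum over the I pieces

history = rng.random((5, 4, 8, 8))              # S_1 ... S_5
s_comprehensive = integrate_history(history)
print(s_comprehensive.shape)  # (4, 8, 8)
```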
  • it is also possible to first fuse the channel components of the historical coding information according to the embodiment in FIG. 3 to obtain S', and then fuse the pieces of historical coding information in S' according to the embodiment in FIG. 5.
  • the historical coding information can be firstly integrated in space or time.
  • a second machine learning model is used to determine the posture of the mobile device according to the integrated historical coding information and current coding information.
  • FIG. 6 is a schematic diagram showing an embodiment of step 1321 in FIG. 5.
  • the set S of stored historical coding information includes S_1, S_2, …, S_i, …, S_I.
  • the correlation coefficients between S_1, S_2, …, S_I are calculated, and the weight corresponding to each of S_1, S_2, …, S_I is computed using a gate function.
  • weighting S_1, S_2, …, S_I yields S'_1, S'_2, …, S'_I.
  • S'_1, S'_2, …, S'_I are summed to obtain the integrated historical coding information.
  • step 130 may be implemented by the steps in FIG. 7.
  • FIG. 7 is a flowchart showing another embodiment of step 130 in FIG. 1.
  • step 130 includes: step 1322, splicing the current coding information and the historical coding information; and step 1330'', determining the posture of the mobile device.
  • the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information. That is, the current coding information and the historical coding information are treated as feature matrices, and each layer (i.e., each channel) of a matrix is used as a part for splicing. For example, the splicing can be performed by a neural network model with two convolutional layers (for example, a 3×3 convolution kernel with a stride of 1).
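The splicing step can be sketched as follows. The random convolution weights stand in for the two trained layers; only the concatenation along the channel dimension and the 3×3, stride-1 shape of the convolutions follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3_same(x, w):
    """x: (C_in, H, W); w: (C_out, C_in, 3, 3); stride 1, zero padding 1."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    c_out = w.shape[0]
    _, h, wd = x.shape
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for r in range(h):
            for c in range(wd):
                out[o, r, c] = np.sum(xp[:, r:r+3, c:c+3] * w[o])
    return out

current = rng.random((4, 8, 8))        # current coding information
historical = rng.random((4, 8, 8))     # one piece of historical coding info

# Splice along the channel dimension.
spliced = np.concatenate([current, historical], axis=0)   # (8, 8, 8)

# Two 3x3, stride-1 convolutional layers (random weights, illustrative).
w1 = rng.standard_normal((8, 8, 3, 3)) * 0.1
w2 = rng.standard_normal((8, 8, 3, 3)) * 0.1
out = conv3x3_same(conv3x3_same(spliced, w1), w2)  # output coding information
print(spliced.shape, out.shape)  # (8, 8, 8) (8, 8, 8)
```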
  • the historical coding information and the current coding information may be merged in time and space before splicing.
  • step 1330'' the second machine learning model is used to determine the posture of the mobile device according to the output coding information.
  • the posture determination method provided by the embodiments of the present disclosure was tested on the public autonomous driving dataset KITTI: the average rotation error did not exceed 3 degrees per 100 meters, and the average translation error did not exceed 5%.
  • Fig. 8 is a block diagram showing an apparatus for determining a posture of a mobile device according to an embodiment of the present disclosure.
  • the device 8 for determining the posture of the mobile device includes one or more processors 81.
  • the processor 81 is configured to obtain the image difference feature between the current frame and the previous frame in the video stream shot by the mobile device.
  • the image difference feature is obtained through the optical flow network model.
  • the processor 81 is configured to: use the first machine learning model to obtain current coding information according to the image difference characteristics; and use the second machine learning model to determine the posture of the mobile device according to the current coding information and at least one piece of historical coding information.
  • at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
  • the posture determination device further includes a memory 82.
  • the memory 82 is configured to store the encoding information of the Nth frame as historical encoding information when at least one of the movement distance or the posture change corresponding to the mobile device from the Nth frame to the N-1th frame exceeds a threshold.
  • the processor 81 fuses each channel component of the currently encoded information according to the correlation between each channel component of the currently encoded information.
  • the processor 81 fuses the various channel components of the historical coding information according to the correlation between the various channel components of the historical coding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the fused current coding information and historical coding information.
  • the processor 81 determines the first weight of each channel component according to the correlation between each channel component of the currently encoded information.
  • the processor 81 weights each channel component according to the first weight to obtain the current encoded information after fusion.
  • the processor 81 determines the second weight of each channel component according to the correlation between each channel component of each piece of historical coding information.
  • the processor 81 weights each channel component according to the second weight to obtain the fused historical coding information.
  • the processor 81 fuses various historical coding information according to the correlation between various historical coding information to obtain comprehensive historical coding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the comprehensive historical coding information and the current coding information.
  • the processor 81 determines the third weight of each historical encoding information according to the correlation between each historical encoding information.
  • the processor 81 performs a weighted summation on the historical coding information according to the third weight to obtain comprehensive historical coding information.
  • the processor 81 splices current encoding information and historical encoding information according to the channel dimension direction to generate output encoding information.
  • the processor 81 uses the second machine learning model to determine the posture of the mobile device according to the output code information.
  • FIG. 9 is a block diagram showing an apparatus for determining a posture of a mobile device according to another embodiment of the present disclosure.
  • the posture determination device can be expressed in the form of a general-purpose computing device.
  • the computer system includes a memory 910, a processor 920, and a bus 900 connecting different system components.
  • the memory 910 may include, for example, a system memory, a nonvolatile storage medium, and the like.
  • the system memory, for example, stores an operating system, application programs, a boot loader, and other programs.
  • the system memory may include volatile storage media, such as random access memory (RAM) and/or cache memory.
  • RAM random access memory
  • the non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of the display method.
  • Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, etc.
  • the processor 920 can be implemented by a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistors and other discrete hardware components.
  • DSP digital signal processor
  • ASIC application-specific integrated circuit
  • FPGA field programmable gate array
  • each module such as the judgment module and the determination module can be implemented by a central processing unit (CPU) running instructions for executing corresponding steps in the memory, or can be implemented by a dedicated circuit that executes the corresponding steps.
  • CPU central processing unit
  • the bus 900 can use any bus structure among a variety of bus structures.
  • the bus structure includes, but is not limited to, an industry standard architecture (ISA) bus, a microchannel architecture (MCA) bus, and a peripheral component interconnect (PCI) bus.
  • ISA industry standard architecture
  • MCA microchannel architecture
  • PCI peripheral component interconnect
  • the computer system may also include an input/output interface 930, a network interface 940, a storage interface 950, and so on. These interfaces 930, 940, 950, and the memory 910 and the processor 920 may be connected through a bus 900.
  • the input and output interface 930 can provide a connection interface for input and output devices such as a display, a mouse, and a keyboard.
  • the network interface 940 provides a connection interface for various networked devices.
  • the storage interface 950 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
  • FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
  • the visual odometer 10 includes the posture determination device 11 in any of the above embodiments, which is used to determine the posture of the mobile device according to the video stream shot by the mobile device.
  • the visual odometer 10 further includes an imaging device, such as an image sensor 12, for acquiring a video stream.
  • the imaging device can be communicatively connected to the processor in the attitude determination device 11 wirelessly, for example via Bluetooth or Wi-Fi, or by wire, for example via network cables or other cabling.
  • the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the method and system of the present disclosure may be implemented in many ways.
  • the method and system of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware.
  • the above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated.
  • the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure.
  • the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A mobile device posture determination apparatus and method and a visual odometer. The apparatus comprises one or more processors (81, 920) configured to: determine an image-difference feature with respect to a current frame and a preceding frame in a video stream acquired by a mobile device; determine, according to the image-difference feature, current encoding information by means of a first machine learning model; and determine, according to the current encoding information and at least one piece of historical encoding information, a posture of the mobile device by means of a second machine learning model.

Description

Mobile Device Posture Determination Apparatus and Method, and Visual Odometer
Cross-Reference to Related Applications
This application is based on, and claims priority to, CN Application No. 201910199169.7, filed on March 15, 2019, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a posture determination apparatus for a mobile device, a posture determination method for a mobile device, a visual odometer, and a computer-readable storage medium.
Background
A visual odometer can determine the position and posture of a robot by analyzing and processing the related image sequences, and can thereby record the entire trajectory traveled by the robot.
In the related art, a visual odometer combines the image information of adjacent frames in a video stream and determines the camera posture of the corresponding frame through local map optimization based on the geometric features of the images; alternatively, the camera posture is determined based on information provided by an IMU (Inertial Measurement Unit).
Summary
According to some embodiments of the present disclosure, there is provided a posture determination apparatus for a mobile device, comprising one or more processors configured to: determine an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; determine current coding information using a first machine learning model according to the image difference feature; and determine the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
In some embodiments, the current frame is the M-th frame, where M is a positive integer greater than 1; in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, where N is a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information to obtain fused historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
In some embodiments, the pieces of historical coding information are fused according to the correlation among the pieces of historical coding information to obtain comprehensive historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information; and the posture of the mobile device is determined using the second machine learning model according to the output coding information.
In some embodiments, the image difference feature is acquired through an optical flow network model; and at least one of the first machine learning model and the second machine learning model is a ConvLSTM (Convolutional Long Short-Term Memory Network) model.
According to other embodiments of the present disclosure, there is provided a posture determination method for a mobile device, comprising: determining an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; determining current coding information using a first machine learning model according to the image difference feature; and determining the posture of the mobile device using a second machine learning model according to the current coding information and at least one piece of historical coding information.
In some embodiments, the current frame is the M-th frame, where M is a positive integer greater than 1; in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, where N is a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information to obtain fused historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
In some embodiments, the at least one piece of historical coding information includes multiple pieces of historical coding information; the pieces of historical coding information are fused according to the correlation among them to obtain comprehensive historical coding information; and the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information; and the posture of the mobile device is determined using the second machine learning model according to the output coding information.
In some embodiments, the image difference feature is acquired through an optical flow network model; and at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
According to still other embodiments of the present disclosure, there is provided a visual odometer, comprising the posture determination apparatus according to any of the foregoing embodiments, configured to determine the posture of a mobile device according to a video stream captured by the mobile device.
In some embodiments, the visual odometer further comprises an image sensor configured to acquire the video stream.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the posture determination method according to any of the foregoing embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings described here are used to provide a further understanding of the present disclosure and constitute a part of this application. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
Fig. 1 is a flowchart showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2a is a schematic diagram showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2b is a schematic diagram showing the ConvLSTM used in a method for determining the posture of a mobile device according to an embodiment of the present disclosure;
Fig. 3 is a flowchart showing an embodiment of step 130 in Fig. 1;
Fig. 4 is a schematic diagram showing an embodiment of step 1320 in Fig. 3;
Fig. 5 is a flowchart showing another embodiment of step 130 in Fig. 1;
Fig. 6 is a schematic diagram showing an embodiment of step 1321 in Fig. 5;
Fig. 7 is a flowchart showing yet another embodiment of step 130 in Fig. 1;
Fig. 8 is a block diagram showing a posture determination apparatus for a mobile device according to an embodiment of the present disclosure;
Fig. 9 is a block diagram showing a posture determination apparatus for a mobile device according to another embodiment of the present disclosure;
Fig. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
It should be understood that the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. In addition, the same or similar reference numerals denote the same or similar components.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.
Unless otherwise specifically stated, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure. Meanwhile, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn according to actual proportional relationships. Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the granted specification. In all examples shown and discussed here, any specific value should be interpreted as merely exemplary rather than limiting; therefore, other examples of the exemplary embodiments may have different values. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
Fig. 1 is a flowchart showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 1, the method includes: step 110, determining an image difference feature; step 120, determining current coding information; and step 130, determining the posture of the mobile device.
In step 110, an image difference feature between the current frame and the previous frame in a video stream acquired by the mobile device is determined.
For example, the mobile device may be a movable platform such as a robot, an autonomous vehicle, or an unmanned aerial vehicle, and may capture images with a camera based on an image sensor such as a CCD or CMOS sensor.
For example, the image difference feature may be acquired through a convolutional neural network (CNN).
For example, the image difference feature may be acquired through an optical flow network model (FlowNet: Learning Optical Flow with Convolutional Networks).
For example, the image difference feature may be acquired through an optical flow network model (FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks).
In some embodiments, two adjacent frames of images may be stacked and input into the optical flow network model, and the feature extraction part of the optical flow network model is used to extract the image difference feature. The image difference feature is a high-dimensional feature, and the number of channels of the high-dimensional feature (e.g., 1024) may be determined according to the resolution of the current frame image. For example, the optical flow network model may perform multiple convolution operations on the stacked images and, according to the convolution results, extract the offset of each pixel between the two adjacent frames as the image difference feature.
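The frame-stacking input described above can be sketched as follows. This is a minimal illustration of forming the 6-channel input; the trained FlowNet feature extractor itself is not reproduced here, and the frame sizes are arbitrary example values:

```python
import numpy as np

def stack_adjacent_frames(prev_frame, cur_frame):
    """Stack two adjacent RGB frames along the channel axis, producing
    the 6-channel input an optical-flow network typically consumes."""
    assert prev_frame.shape == cur_frame.shape
    return np.concatenate([prev_frame, cur_frame], axis=-1)

# Two hypothetical 64x48 RGB frames from the video stream.
prev = np.zeros((48, 64, 3), dtype=np.float32)
cur = np.ones((48, 64, 3), dtype=np.float32)
stacked = stack_adjacent_frames(prev, cur)
print(stacked.shape)  # (48, 64, 6)
```

The feature-extraction convolutions of the optical flow network would then run over this stacked tensor.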
In this way, high-dimensional redundant image information can be converted into high-level, abstract semantic features, which solves the problem that related techniques based on geometric features are susceptible to environmental factors (such as occlusion, illumination changes, and dynamic objects), thereby improving the accuracy of posture determination.
In step 120, the current coding information is determined using a first machine learning model according to the image difference feature. For example, the first machine learning model may be an RNN (Recurrent Neural Network) model, such as a ConvLSTM model.
In some embodiments, historical coding information that has an important influence on posture determination (i.e., the coding information corresponding to key frames) may be selected from the historical outputs of the RNN model as effective information. The effective information may be fused with the current coding information to jointly determine the current posture of the mobile device. For example, in the case that at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the N-th frame is determined to be a key frame, and the coding information of the N-th frame extracted by the RNN model is stored as historical coding information.
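The key-frame criterion above can be sketched as a simple gate on motion magnitude. The threshold values and the scalar motion/posture-change inputs below are illustrative assumptions, not values specified by the disclosure:

```python
DIST_THRESHOLD = 0.5   # meters, illustrative value
ANGLE_THRESHOLD = 5.0  # degrees, illustrative value

history = []  # stored historical coding information (effective information)

def maybe_store_keyframe(encoding, motion_distance, posture_change_deg):
    """Store this frame's encoding as historical coding information if
    either the movement distance or the posture change relative to the
    previous frame exceeds its threshold."""
    if motion_distance > DIST_THRESHOLD or posture_change_deg > ANGLE_THRESHOLD:
        history.append(encoding)
        return True
    return False

# Frame with large motion -> stored as a key frame.
stored = maybe_store_keyframe("enc_frame_N", motion_distance=0.8, posture_change_deg=1.0)
# Frame with small motion -> not stored.
skipped = maybe_store_keyframe("enc_frame_N1", motion_distance=0.1, posture_change_deg=2.0)
print(stored, skipped, len(history))  # True False 1
```

In practice the stored encodings would be the ConvLSTM outputs rather than strings, and the motion estimates would come from the predicted poses.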
In step 130, the posture of the mobile device is determined using a second machine learning model according to the current coding information and at least one piece of historical coding information. For example, the second machine learning model may be an RNN model, such as a ConvLSTM model. By decoding the coding information with the RNN model, the posture of the mobile device can be determined.
The current posture determined based on the current coding information and the historical coding information is a posture determined by global optimization over the global range from the first frame to the current frame of the video stream (i.e., an absolute posture). Compared with the locally optimized posture (i.e., a relative posture) determined in the related art only within the local range of the current frame and the previous frame, the absolute posture is more accurate.
In addition, the ConvLSTM model does not need to rely on information provided by an IMU; the posture can be determined from visual information alone, thereby reducing the cost of posture determination.
Fig. 2a is a schematic diagram showing a method for determining the posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 2a, the current coding information extracted at times 1 to T is x_1 to x_T. The historical coding information stored at each time is S_2 to S_T. The current coding information and the historical coding information at each time are used as the input of the first machine learning model (e.g., ConvLSTM) to obtain the output coding information O_1 to O_T at each time. O_1 to O_T are input into the second machine learning model (e.g., ConvLSTM) to obtain the postures P_1 to P_T of the mobile device at each time.
As shown in Fig. 2b, a principle implementation of ConvLSTM is illustrated. x_t, h_t, and o_t denote the input feature, the state variable, and the output, respectively.
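A single ConvLSTM step in the spirit of Fig. 2b can be sketched as follows. This is a toy NumPy version with random, untrained weights and a naive convolution; a real implementation would use a deep-learning framework with learned parameters:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 2D convolution, 3x3 kernel, stride 1, zero 'same' padding.
    x: (C_in, H, W), w: (C_out, C_in, 3, 3) -> (C_out, H, W)."""
    c_in, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for o in range(w.shape[0]):
        for i in range(c_in):
            for dy in range(3):
                for dx in range(3):
                    out[o] += w[o, i, dy, dx] * xp[i, dy:dy + h, dx:dx + wd]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(x_t, h_t, c_t, weights):
    """One ConvLSTM step: x_t is the input feature, (h_t, c_t) the state
    variables; all gates are computed by convolutions over [x_t; h_t]."""
    z = np.concatenate([x_t, h_t], axis=0)     # stack along channels
    i = sigmoid(conv2d_same(z, weights["i"]))  # input gate
    f = sigmoid(conv2d_same(z, weights["f"]))  # forget gate
    g = np.tanh(conv2d_same(z, weights["g"]))  # candidate cell state
    o = sigmoid(conv2d_same(z, weights["o"]))  # output gate
    c_next = f * c_t + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(0)
C_IN, C_HID, H, W = 2, 4, 8, 8
weights = {k: 0.1 * rng.standard_normal((C_HID, C_IN + C_HID, 3, 3))
           for k in ("i", "f", "g", "o")}
x = rng.standard_normal((C_IN, H, W))
h = np.zeros((C_HID, H, W))
c = np.zeros((C_HID, H, W))
o_t, c = convlstm_step(x, h, c, weights)
print(o_t.shape)  # (4, 8, 8)
```

Because the gates are convolutions rather than fully connected layers, the spatial layout of the image difference feature is preserved through the recurrence.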
In some embodiments, step 130 may be implemented by the steps in Fig. 2a.
Although the embodiments of the present disclosure use ConvLSTM as one implementation of the machine learning model, other machine learning models are also applicable to the present disclosure, such as an FC-LSTM (Fully Connected Long Short-Term Memory) model.
As understood by those skilled in the art, in order for a machine learning model (e.g., a neural network) to have the required functions, before the machine learning model is used, it is also necessary to train the machine learning model with multiple samples, such as sample images and sample data. The trained machine learning model can then be used in the above method. For example, the required machine learning model can be trained and obtained in a supervised manner (with samples and labels corresponding to the samples).
Fig. 3 is a flowchart showing an embodiment of step 130 in Fig. 1.
As shown in Fig. 3, step 130 includes: step 1310, fusing the channel components of the current coding information; step 1320, fusing the channel components of the historical coding information; and step 1330, determining the posture of the mobile device.
In step 1310, the channel components of the current coding information are fused according to the correlation among the channel components of the current coding information.
In some embodiments, a first weight of each channel component is determined according to the correlation among the channel components of the current coding information; and the channel components are weighted according to the first weights to obtain the fused current coding information.
For example, the current coding information is the output O_t of the first machine learning model at the current time. O_t has J channel components: O_t1, O_t2, ..., O_tJ. The correlations among O_t1, O_t2, ..., O_tJ are computed, and the corresponding weights of O_t1, O_t2, ..., O_tJ are determined according to the correlations. O_t1, O_t2, ..., O_tJ are weighted with the corresponding weights to obtain O'_t.
This is equivalent to selecting the channel components according to the spatial information of the current coding information. Such a technical solution enhances the channel components that are important for posture determination and suppresses the unimportant channel components, thereby improving the accuracy of posture determination.
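One way to realize this channel-wise selection is to derive each channel's weight from its average correlation with the other channels and pass it through a sigmoid gate. The disclosure only specifies that the weights come from inter-channel correlations; the particular gate form below is an illustrative assumption:

```python
import numpy as np

def fuse_channels(o_t):
    """Weight the J channel components of an encoding O_t of shape
    (J, H, W) by a gate computed from inter-channel correlations."""
    j = o_t.shape[0]
    flat = o_t.reshape(j, -1)
    corr = np.corrcoef(flat)             # (J, J) channel correlation matrix
    score = corr.mean(axis=1)            # each channel's mean correlation
    gate = 1.0 / (1.0 + np.exp(-score))  # first weights via a sigmoid gate
    return o_t * gate[:, None, None]     # weighted channel components O'_t

rng = np.random.default_rng(1)
o_t = rng.standard_normal((8, 4, 4))     # J=8 channels of a toy encoding
o_fused = fuse_channels(o_t)
print(o_fused.shape)  # (8, 4, 4)
```

The same gating applies unchanged to each stored historical encoding S_i in step 1320, yielding the fused S'_i.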
In step 1320, the channel components of the historical coding information are fused according to the correlation among the channel components of the historical coding information.
In some embodiments, a second weight of each channel component is determined according to the correlation among the channel components of each piece of historical coding information; and the channel components are weighted according to the second weights to obtain the fused historical coding information.
For example, the set of stored historical coding information (effective information) is S, which contains I pieces of historical coding information S_1, S_2, ..., S_i, ..., S_I, where i is a positive integer with 1 ≤ i ≤ I. Any S_i has J channel components: S_i1, S_i2, ..., S_iJ. The correlations among S_i1, S_i2, ..., S_iJ are computed, and the corresponding weights of S_i1, S_i2, ..., S_iJ are determined according to the correlations. S_i1, S_i2, ..., S_iJ are weighted with the corresponding weights to obtain S'_i. These S'_i constitute the fused historical coding information set S'.
This is equivalent to selecting the channel components according to the spatial information of the historical coding information. Such a technical solution enhances the channel components that are important for posture determination and suppresses the unimportant channel components, thereby improving the accuracy of posture determination.
In step 1330, the posture of the mobile device is determined using the second machine learning model according to the fused current coding information and the fused historical coding information.
In some embodiments, steps 1310 and 1320 have no fixed execution order and may also be processed in parallel; alternatively, only step 1310 or only step 1320 may be executed.
Fig. 4 is a schematic diagram showing an embodiment of step 1320 in Fig. 3.
As shown in Fig. 4, any stored piece of historical coding information S_i has multiple channel components. According to the correlation coefficients among the channel components, the weight of each channel component is calculated using a gate function. The channel components are weighted to obtain the fused S'_i.
In some embodiments, step 130 may be implemented by the steps in Fig. 3.
Fig. 5 is a flowchart showing another embodiment of step 130 in Fig. 1.
As shown in Fig. 5, step 130 includes: step 1321, fusing the pieces of historical coding information; and step 1330', determining the posture of the mobile device.
In step 1321, the pieces of historical coding information are fused according to the correlation among them to obtain comprehensive historical coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation among the pieces of historical coding information; and the pieces of historical coding information are weighted and summed according to the third weights to obtain the comprehensive historical coding information.
For example, the correlations among the historical coding information S_1, S_2, ..., S_i, ..., S_I are computed, and the corresponding weights of S_1, S_2, ..., S_I are determined according to the correlations. A weighted sum of S_1, S_2, ..., S_I yields the comprehensive historical coding information S̄.
In this way, the temporal continuity of the image frames is exploited, and the historical coding information is fused based on temporal information. Such a technical solution enhances the historical coding information that is important for posture determination and weakens the unimportant historical coding information, thereby improving the accuracy of posture determination.
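The temporal fusion of step 1321 can be sketched in the same spirit: a weight for each stored encoding S_i is derived from its correlation with the other encodings, and the weighted encodings are summed. The softmax normalization used here is an illustrative choice, not one mandated by the disclosure:

```python
import numpy as np

def fuse_history(history):
    """Fuse I historical encodings of shape (I, C, H, W) into one
    comprehensive encoding via correlation-derived third weights."""
    i = history.shape[0]
    flat = history.reshape(i, -1)
    corr = np.corrcoef(flat)                 # (I, I) correlation matrix
    score = corr.mean(axis=1)                # mean correlation per encoding
    w = np.exp(score) / np.exp(score).sum()  # softmax third weights
    return (history * w[:, None, None, None]).sum(axis=0)

rng = np.random.default_rng(2)
history = rng.standard_normal((5, 8, 4, 4))  # I=5 stored key-frame encodings
s_bar = fuse_history(history)                # comprehensive encoding
print(s_bar.shape)  # (8, 4, 4)
```

Encodings that agree with the rest of the stored history receive larger weights, while outlier encodings contribute less to the comprehensive result.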
In some embodiments, the channel components of the comprehensive historical coding information S̄ may be further fused according to the embodiment in Fig. 3; alternatively, the channel components of each piece of historical coding information may first be fused according to the embodiment in Fig. 3 to obtain S', and then the pieces of historical coding information in S' may be fused according to the embodiment in Fig. 5. In other words, the historical coding information may be fused first spatially or first temporally.
In step 1330', the posture of the mobile device is determined using the second machine learning model according to the comprehensive historical coding information and the current coding information.
Fig. 6 is a schematic diagram showing an embodiment of step 1321 in Fig. 5.
As shown in Fig. 6, the set S of stored historical coding information includes S_1, S_2, ..., S_i, ..., S_I. According to the correlation coefficients among S_1, S_2, ..., S_I, the corresponding weights of S_1, S_2, ..., S_I are calculated using a gate function. S_1, S_2, ..., S_I are weighted to obtain S'_1, S'_2, ..., S'_i, ..., S'_I, and S'_1, S'_2, ..., S'_I are summed to obtain the comprehensive historical coding information S̄.
In some embodiments, step 130 may be implemented by the steps in FIG. 7.
FIG. 7 is a flowchart showing yet another embodiment of step 130 in FIG. 1.
As shown in FIG. 7, step 130 includes: step 1322, splicing the current coding information and the historical coding information; and step 1330'', determining the posture of the mobile device.
In step 1322, the current coding information and the historical coding information are spliced along the channel dimension to generate output coding information. That is, the current coding information and the historical coding information are treated as feature matrices, and each layer (i.e., each channel) of a matrix serves as one part in the splicing. For example, the splicing may be performed by a neural network model with two convolutional layers (e.g., with a 3×3 convolution kernel and a stride of 1).
In some embodiments, the historical coding information and the current coding information may be fused temporally and spatially before being spliced.
In step 1330'', the second machine learning model is used to determine the posture of the mobile device according to the output coding information.
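Step 1322 amounts to stacking the two feature maps channel-wise. A minimal sketch, with illustrative shapes not taken from the disclosure:

```python
import numpy as np

# Current and historical encodings as (channels, height, width) feature maps;
# the 64x8x8 shapes are illustrative, not specified by the disclosure.
current = np.random.default_rng(1).normal(size=(64, 8, 8))
history = np.random.default_rng(2).normal(size=(64, 8, 8))

# Step 1322: splice along the channel dimension (axis 0), so each channel of
# either input becomes one "part" of the output encoding. Per the text, the
# result would then pass through two 3x3, stride-1 convolutional layers.
output_encoding = np.concatenate([current, history], axis=0)
print(output_encoding.shape)  # (128, 8, 8)
```

The first 64 output channels are exactly the current encoding and the remaining channels are the historical encoding, so the subsequent convolutions can mix information across the two sources.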
The posture determination method provided by the embodiments of the present disclosure has been tested on the public autonomous-driving dataset KITTI; the average rotation error does not exceed 3 degrees per 100 meters, and the average translation error does not exceed 5%.
FIG. 8 is a block diagram showing a posture determination apparatus of a mobile device according to an embodiment of the present disclosure.
As shown in FIG. 8, the posture determination apparatus 8 of the mobile device includes one or more processors 81.
The processor 81 is configured to obtain an image difference feature between the current frame and the previous frame in a video stream captured by the mobile device. For example, the image difference feature is obtained through an optical flow network model.
The processor 81 is configured to: obtain current coding information according to the image difference feature by using a first machine learning model; and determine the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model. For example, at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
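The data flow through the two models can be sketched end to end. Everything below is a stand-in: a plain frame difference replaces the optical flow network, and simple linear maps replace the two learned models; only the sequence of stages (difference feature → encoding → pose) follows the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def image_difference_feature(prev_frame, cur_frame):
    # Placeholder for the optical flow network: a plain frame difference
    # stands in for the learned flow features (an assumption for this sketch).
    return cur_frame - prev_frame

def first_model(diff_feature, w_enc):
    # Stand-in for the first (encoding) model: a linear map plus nonlinearity
    # on the flattened difference feature.
    return np.tanh(w_enc @ diff_feature.ravel())

def second_model(current_code, history_codes, w_pose):
    # Stand-in for the second model: pools current + historical encodings and
    # regresses a 6-DoF pose (3 rotation + 3 translation parameters).
    pooled = np.concatenate([current_code, history_codes.mean(axis=0)])
    return w_pose @ pooled

prev_frame = rng.normal(size=(16, 16))
cur_frame = rng.normal(size=(16, 16))
w_enc = rng.normal(size=(32, 256)) * 0.05   # illustrative encoding weights
w_pose = rng.normal(size=(6, 64)) * 0.05    # illustrative pose-head weights

code = first_model(image_difference_feature(prev_frame, cur_frame), w_enc)
history = np.stack([code, code])            # pretend two stored encodings
pose = second_model(code, history, w_pose)
print(pose.shape)  # (6,)
```

In the actual apparatus both stand-in models would be trained networks (e.g., ConvLSTM models), and the history would hold encodings of earlier keyframes rather than copies of the current one.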
In some embodiments, the posture determination apparatus further includes a memory 82. The memory 82 is configured to store the coding information of the Nth frame as historical coding information when at least one of the motion distance or the posture change of the mobile device from the Nth frame to the (N-1)th frame exceeds a threshold.
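This storage rule is a simple keyframe test. A minimal sketch, with threshold values chosen purely for illustration (the disclosure does not specify them):

```python
import numpy as np

def maybe_store(history, encoding, distance_moved, pose_change,
                dist_threshold=0.5, pose_threshold=np.deg2rad(5)):
    """Store frame N's encoding as historical coding information when the
    motion distance OR the posture change since frame N-1 exceeds a
    threshold. The threshold values here are illustrative assumptions."""
    if distance_moved > dist_threshold or pose_change > pose_threshold:
        history.append(encoding)
    return history

history = []
maybe_store(history, "enc_1", distance_moved=0.8, pose_change=0.0)  # stored
maybe_store(history, "enc_2", distance_moved=0.1, pose_change=0.0)  # skipped
print(len(history))  # 1
```

Only frames with appreciable motion contribute encodings, which keeps the stored history compact while still covering the trajectory.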
In some embodiments, the processor 81 fuses the channel components of the current coding information according to the correlations among those channel components, and fuses the channel components of the historical coding information according to the correlations among those channel components. The processor 81 then determines the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
For example, the processor 81 determines a first weight of each channel component according to the correlations among the channel components of the current coding information, and weights the channel components according to the first weights to obtain the fused current coding information.
For example, the processor 81 determines a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information, and weights the channel components according to the second weights to obtain the fused historical coding information.
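The channel-component fusion above can be sketched as a channel-attention-style weighting. The disclosure only states that the weights come from inter-channel correlations; the mean-correlation-through-sigmoid gate below is an assumption made for illustration.

```python
import numpy as np

def fuse_channels(encoding):
    """Fuse the channel components of one encoding of shape (C, H, W) by
    weighting each channel according to its correlation with the others.
    The mean inter-channel correlation and sigmoid gate are assumptions;
    the disclosure only specifies correlation-derived weights."""
    c = encoding.reshape(encoding.shape[0], -1)
    c = c - c.mean(axis=1, keepdims=True)
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-8)
    corr = c @ c.T                       # (C, C) channel correlations
    np.fill_diagonal(corr, 0.0)
    weights = 1.0 / (1.0 + np.exp(-corr.mean(axis=1)))   # sigmoid gate in (0, 1)
    return encoding * weights[:, None, None]

enc = np.random.default_rng(4).normal(size=(16, 4, 4))
fused = fuse_channels(enc)
print(fused.shape)  # (16, 4, 4)
```

The same routine serves as the "first weight" fusion on the current encoding and the "second weight" fusion on each historical encoding, since both apply per-channel weights derived from channel correlations.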
In some embodiments, the processor 81 fuses the pieces of historical coding information according to the correlations among them to obtain comprehensive historical coding information, and determines the posture of the mobile device according to the comprehensive historical coding information by using the second machine learning model.
For example, the processor 81 determines a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information, and performs a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
In some embodiments, the processor 81 splices the current coding information and the historical coding information along the channel dimension to generate output coding information, and determines the posture of the mobile device according to the output coding information by using the second machine learning model.
FIG. 9 is a block diagram showing a posture determination apparatus of a mobile device according to another embodiment of the present disclosure.
As shown in FIG. 9, the posture determination apparatus may take the form of a general-purpose computing device. The computer system includes a memory 910, a processor 920, and a bus 900 connecting the different system components.
The memory 910 may include, for example, a system memory and a non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs, and may include volatile storage media such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of the disclosed method, and includes, but is not limited to, magnetic disk storage, optical storage, and flash memory.
The processor 920 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Accordingly, each module, such as a judgment module or a determination module, may be implemented by a central processing unit (CPU) running instructions in a memory that execute the corresponding steps, or by a dedicated circuit that executes the corresponding steps.
The bus 900 may use any of a variety of bus structures, including, but not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, and a Peripheral Component Interconnect (PCI) bus.
The computer system may further include an input/output interface 930, a network interface 940, a storage interface 950, and so on. These interfaces 930, 940, and 950, as well as the memory 910 and the processor 920, may be connected through the bus 900. The input/output interface 930 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 940 provides a connection interface for various networked devices. The storage interface 950 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
As shown in FIG. 10, the visual odometer 10 includes the posture determination apparatus 11 of any of the above embodiments, configured to determine the posture of the mobile device according to a video stream captured by the mobile device.
In some embodiments, the visual odometer 10 further includes an imaging device, such as an image sensor 12, for acquiring the video stream.
In some embodiments, the imaging device may be communicatively connected to the processor in the posture determination apparatus 11 wirelessly, for example via Bluetooth or Wi-Fi, or by wire, for example via a network cable or other cabling.
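A visual odometer turns the per-frame postures into a trajectory by composing the relative transforms. The sketch below uses 4×4 homogeneous matrices with an illustrative planar-motion parameterization (yaw about the vertical axis plus in-plane translation); this parameterization is an assumption, not taken from the disclosure.

```python
import numpy as np

def relative_pose(yaw, tx, tz):
    """Build a 4x4 homogeneous transform for one motion step: a yaw rotation
    about the vertical (Y) axis plus a translation in the ground plane.
    The planar parameterization is illustrative only."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
    T[0, 3], T[2, 3] = tx, tz
    return T

# Integrate per-frame relative poses into an absolute trajectory by
# right-multiplying each new relative transform onto the accumulated pose.
pose = np.eye(4)
trajectory = [pose[:3, 3].copy()]
for step in [relative_pose(0.0, 0.0, 1.0)] * 3:   # three 1 m forward steps
    pose = pose @ step
    trajectory.append(pose[:3, 3].copy())
print(trajectory[-1])  # [0. 0. 3.]
```

This accumulation is where the per-100-meter rotation and percentage translation errors quoted earlier become meaningful: small per-frame errors compound along the composed chain.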
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
So far, the posture determination apparatus of a mobile device, the posture determination method of a mobile device, the visual odometer, and the computer-readable storage medium according to the present disclosure have been described in detail. To avoid obscuring the concept of the present disclosure, some details well known in the art have not been described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The method and system of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination thereof. The above order of the steps of the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail through examples, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (21)

  1. A posture determination apparatus of a mobile device, comprising one or more processors, wherein the one or more processors are configured to:
    determine an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device;
    determine current coding information according to the image difference feature by using a first machine learning model; and
    determine a posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model.
  2. The posture determination apparatus according to claim 1, wherein the current frame is an Mth frame, M being a positive integer greater than 1;
    the posture determination apparatus further comprises a memory configured to:
    store coding information of an Nth frame as the historical coding information when at least one of a motion distance or a posture change of the mobile device from the Nth frame to an (N-1)th frame exceeds a threshold, N being a positive integer less than M.
  3. The posture determination apparatus according to claim 1, wherein determining the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using the second machine learning model comprises:
    fusing channel components of the current coding information according to correlations among the channel components of the current coding information to obtain fused current coding information;
    fusing channel components of the historical coding information according to correlations among the channel components of the historical coding information to obtain fused historical coding information; and
    determining the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
  4. The posture determination apparatus according to claim 3, wherein fusing the channel components of the current coding information comprises:
    determining a first weight of each channel component according to the correlations among the channel components of the current coding information; and
    weighting the channel components according to the first weights to obtain the fused current coding information.
  5. The posture determination apparatus according to claim 3, wherein fusing the channel components of the historical coding information comprises:
    determining a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information; and
    weighting the channel components according to the second weights to obtain the fused historical coding information.
  6. The posture determination apparatus according to claim 1, wherein the at least one piece of historical coding information comprises a plurality of pieces of historical coding information, and determining the posture of the mobile device by using the second machine learning model comprises:
    fusing the pieces of historical coding information according to correlations among the pieces of historical coding information to obtain comprehensive historical coding information; and
    determining the posture of the mobile device according to the comprehensive historical coding information and the current coding information by using the second machine learning model.
  7. The posture determination apparatus according to claim 6, wherein fusing the pieces of historical coding information comprises:
    determining a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information; and
    performing a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
  8. The posture determination apparatus according to claim 1, wherein determining the posture of the mobile device by using the second machine learning model comprises:
    splicing the current coding information and the historical coding information along a channel dimension to generate output coding information; and
    determining the posture of the mobile device according to the output coding information by using the second machine learning model.
  9. The posture determination apparatus according to any one of claims 1-8, wherein:
    the image difference feature is acquired through an optical flow network model; and
    at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
  10. A posture determination method of a mobile device, comprising:
    determining an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device;
    determining current coding information according to the image difference feature by using a first machine learning model; and
    determining a posture of the mobile device according to the current coding information and at least one piece of historical coding information by using a second machine learning model.
  11. The posture determination method according to claim 10, wherein the current frame is an Mth frame, M being a positive integer greater than 1,
    the posture determination method further comprising:
    storing coding information of an Nth frame as the historical coding information when at least one of a motion distance or a posture change of the mobile device from the Nth frame to an (N-1)th frame exceeds a threshold, N being a positive integer less than M.
  12. The posture determination method according to claim 10, wherein determining the posture of the mobile device according to the current coding information and at least one piece of historical coding information by using the second machine learning model comprises:
    fusing channel components of the current coding information according to correlations among the channel components of the current coding information to obtain fused current coding information;
    fusing channel components of the historical coding information according to correlations among the channel components of the historical coding information to obtain fused historical coding information; and
    determining the posture of the mobile device according to the fused current coding information and the fused historical coding information by using the second machine learning model.
  13. The posture determination method according to claim 12, wherein fusing the channel components of the current coding information comprises:
    determining a first weight of each channel component according to the correlations among the channel components of the current coding information; and
    weighting the channel components according to the first weights to obtain the fused current coding information.
  14. The posture determination method according to claim 12, wherein fusing the channel components of the historical coding information comprises:
    determining a second weight of each channel component according to the correlations among the channel components of each piece of historical coding information; and
    weighting the channel components according to the second weights to obtain the fused historical coding information.
  15. The posture determination method according to claim 10, wherein the at least one piece of historical coding information comprises a plurality of pieces of historical coding information, and
    determining the posture of the mobile device by using the second machine learning model comprises:
    fusing the pieces of historical coding information according to correlations among the pieces of historical coding information to obtain comprehensive historical coding information; and
    determining the posture of the mobile device according to the comprehensive historical coding information and the current coding information by using the second machine learning model.
  16. The posture determination method according to claim 15, wherein fusing the pieces of historical coding information comprises:
    determining a third weight of each piece of historical coding information according to the correlations among the pieces of historical coding information; and
    performing a weighted summation on the pieces of historical coding information according to the third weights to obtain the comprehensive historical coding information.
  17. The posture determination method according to claim 10, wherein determining the posture of the mobile device by using the second machine learning model comprises:
    splicing the current coding information and the historical coding information along a channel dimension to generate output coding information; and
    determining the posture of the mobile device according to the output coding information by using the second machine learning model.
  18. The posture determination method according to any one of claims 10-17, wherein:
    the image difference feature is acquired through an optical flow network model; and
    at least one of the first machine learning model and the second machine learning model is a convolutional long short-term memory (ConvLSTM) model.
  19. A visual odometer, comprising:
    the posture determination apparatus according to any one of claims 1-9, configured to determine the posture of a mobile device according to a video stream captured by the mobile device.
  20. The visual odometer according to claim 19, further comprising:
    an image sensor configured to acquire the video stream.
  21. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the posture determination method according to any one of claims 10-18.
PCT/CN2020/075049 2019-03-15 2020-02-13 Mobile device posture determination apparatus and method, and visual odometer WO2020186943A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910199169.7 2019-03-15
CN201910199169.7A CN109798888B (en) 2019-03-15 2019-03-15 Posture determination device and method for mobile equipment and visual odometer

Publications (1)

Publication Number Publication Date
WO2020186943A1 true WO2020186943A1 (en) 2020-09-24

Family

ID=66563026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075049 WO2020186943A1 (en) 2019-03-15 2020-02-13 Mobile device posture determination apparatus and method, and visual odometer

Country Status (2)

Country Link
CN (1) CN109798888B (en)
WO (1) WO2020186943A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular vision odometer positioning method and system

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN109798888B (en) * 2019-03-15 2021-09-17 京东方科技集团股份有限公司 Posture determination device and method for mobile equipment and visual odometer
CN110595466B (en) * 2019-09-18 2020-11-03 电子科技大学 Lightweight inertial-assisted visual odometer implementation method based on deep learning
CN111028282A (en) * 2019-11-29 2020-04-17 浙江省北大信息技术高等研究院 Unsupervised pose and depth calculation method and system
CN112268564B (en) * 2020-12-25 2021-03-02 中国人民解放军国防科技大学 Unmanned aerial vehicle landing space position and attitude end-to-end estimation method
CN112651345B (en) * 2020-12-29 2023-11-10 深圳市优必选科技股份有限公司 Human body posture recognition model optimization method and device and terminal equipment

Citations (8)

Publication number Priority date Publication date Assignee Title
CN102519481A (en) * 2011-12-29 2012-06-27 中国科学院自动化研究所 Implementation method of binocular vision speedometer
CN106504265A (en) * 2015-09-08 2017-03-15 株式会社理光 Estimation optimization method, equipment and system
CN108332750A (en) * 2018-01-05 2018-07-27 深圳市功夫机器人有限公司 Robot localization method and terminal device
CN108491763A (en) * 2018-03-01 2018-09-04 北京市商汤科技开发有限公司 Three-dimensional scenic identifies unsupervised training method, device and the storage medium of network
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 A kind of visual odometry method and system based on light stream and deep learning
US20190066326A1 (en) * 2017-08-28 2019-02-28 Nec Laboratories America, Inc. Learning good features for visual odometry
US20190079533A1 (en) * 2017-09-13 2019-03-14 TuSimple Neural network architecture method for deep odometry assisted by static scene optical flow
CN109798888A (en) * 2019-03-15 2019-05-24 京东方科技集团股份有限公司 Posture determining device, method and the visual odometry of mobile device

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
WO2005099423A2 (en) * 2004-04-16 2005-10-27 Aman James A Automatic event videoing, tracking and content generation system
JP2009182870A (en) * 2008-01-31 2009-08-13 Toshiba Corp Form entry record management system and form entry record monitoring program
CN104463216B (en) * 2014-12-15 2017-07-28 北京大学 Eye movement mode data automatic obtaining method based on computer vision
WO2016179303A1 (en) * 2015-05-04 2016-11-10 Kamama, Inc. System and method of vehicle sensor management
JP6575325B2 (en) * 2015-11-27 2019-09-18 富士通株式会社 Camera position / orientation estimation apparatus, camera position / orientation estimation method, and camera position / orientation estimation program
CN106485729A (en) * 2016-09-29 2017-03-08 江苏云光智慧信息科技有限公司 A kind of moving target detecting method based on mixed Gauss model
CN108230328B (en) * 2016-12-22 2021-10-22 新沂阿凡达智能科技有限公司 Method and device for acquiring target object and robot
CN106643699B (en) * 2016-12-26 2023-08-04 北京互易科技有限公司 Space positioning device and positioning method in virtual reality system
US10309777B2 (en) * 2016-12-30 2019-06-04 DeepMap Inc. Visual odometry and pairwise alignment for high definition map creation
CN107423727B (en) * 2017-08-14 2018-07-10 河南工程学院 Face complex expression recognition methods based on neural network
CN107577651B (en) * 2017-08-25 2020-11-10 上海媒智科技有限公司 Chinese character font migration system based on countermeasure network
CN107561503B (en) * 2017-08-28 2020-08-14 哈尔滨工业大学 Adaptive target tracking filtering method based on multiple fading factors
CN107796397B (en) * 2017-09-14 2020-05-15 杭州迦智科技有限公司 Robot binocular vision positioning method and device and storage medium
CN108537848B (en) * 2018-04-19 2021-10-15 北京工业大学 Two-stage pose optimization estimation method for indoor scene reconstruction
CN109344840B (en) * 2018-08-07 2022-04-01 深圳市商汤科技有限公司 Image processing method and apparatus, electronic device, storage medium, and program product
CN109272493A (en) * 2018-08-28 2019-01-25 中国人民解放军火箭军工程大学 A kind of monocular vision odometer method based on recursive convolution neural network
CN109040691B (en) * 2018-08-29 2020-08-28 一石数字技术成都有限公司 Scene video reduction device based on front-end target detection
CN109359578A (en) * 2018-10-09 2019-02-19 四川师范大学 Weighted Fusion triple channel eigengait characterizing method
CN109360226B (en) * 2018-10-17 2021-09-24 武汉大学 Multi-target tracking method based on time series multi-feature fusion
CN109448024B (en) * 2018-11-06 2022-02-11 深圳大学 Visual tracking method and system for constructing constraint correlation filter by using depth data

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102519481A (en) * 2011-12-29 2012-06-27 中国科学院自动化研究所 Implementation method of binocular visual odometer
CN106504265A (en) * 2015-09-08 2017-03-15 株式会社理光 Estimation optimization method, equipment and system
US20190066326A1 (en) * 2017-08-28 2019-02-28 Nec Laboratories America, Inc. Learning good features for visual odometry
US20190079533A1 (en) * 2017-09-13 2019-03-14 TuSimple Neural network architecture method for deep odometry assisted by static scene optical flow
CN108332750A (en) * 2018-01-05 2018-07-27 深圳市功夫机器人有限公司 Robot localization method and terminal device
CN108491763A (en) * 2018-03-01 2018-09-04 北京市商汤科技开发有限公司 Unsupervised training method, device, and storage medium for a three-dimensional scene recognition network
CN108648216A (en) * 2018-04-19 2018-10-12 长沙学院 Visual odometry method and system based on optical flow and deep learning
CN109798888A (en) * 2019-03-15 2019-05-24 京东方科技集团股份有限公司 Mobile device posture determination apparatus and method, and visual odometer

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112344922A (en) * 2020-10-26 2021-02-09 中国科学院自动化研究所 Monocular visual odometry positioning method and system

Also Published As

Publication number Publication date
CN109798888B (en) 2021-09-17
CN109798888A (en) 2019-05-24

Similar Documents

Publication Publication Date Title
WO2020186943A1 (en) Mobile device posture determination apparatus and method, and visual odometer
JP7335274B2 (en) Systems and methods for geolocation prediction
US9542621B2 (en) Spatial pyramid pooling networks for image processing
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
US10339421B2 (en) RGB-D scene labeling with multimodal recurrent neural networks
US11755889B2 (en) Method, system and apparatus for pattern recognition
KR20200092894A (en) On-device classification of fingertip motion patterns into gestures in real-time
US20220012909A1 (en) Camera pose determination method and apparatus, and electronic device
KR102073162B1 (en) Small object detection based on deep learning
CN109583290B (en) System and method for improving visual feature detection using motion-related data
CN113807361B (en) Neural network, target detection method, neural network training method and related products
US9865061B2 (en) Constructing a 3D structure
CN109977872B (en) Motion detection method and device, electronic equipment and computer readable storage medium
CN111428805B (en) Method for detecting salient object, model, storage medium and electronic device
US20150213323A1 (en) Video anomaly detection based upon a sparsity model
CN113191318A (en) Target detection method and device, electronic equipment and storage medium
CN114742112A (en) Object association method and device and electronic equipment
Al Mamun et al. Efficient lane marking detection using deep learning technique with differential and cross-entropy loss.
Zhou et al. Learned monocular depth priors in visual-inertial initialization
JP2023036795A (en) Image processing method, model training method, apparatus, electronic device, storage medium, computer program, and self-driving vehicle
Wang et al. ATG-PVD: Ticketing parking violations on a drone
CN113762231B (en) End-to-end multi-pedestrian posture tracking method and device and electronic equipment
Kang et al. ETLi: Efficiently annotated traffic LiDAR dataset using incremental and suggestive annotation
JP2023553630A (en) Keypoint-based behavioral localization
CN111339226B (en) Method and device for constructing map based on classification detection network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20773408

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20773408

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 08/11/2021)