Device and method for determining posture of mobile device, and visual odometer
Cross-reference to related applications
This application is based on and claims priority to CN application No. 201910199169.7, filed on March 15, 2019, the disclosure of which is hereby incorporated into this application in its entirety.
Technical field
The present disclosure relates to the field of computer technology, and in particular to a posture determination device for a mobile device, a posture determination method for a mobile device, a visual odometer, and a computer-readable storage medium.
Background
A visual odometer can determine the position and posture of a robot by analyzing and processing a sequence of related images, and can thereby record the entire trajectory traveled by the robot.
In the related art, a visual odometer either combines the image information of adjacent frames in a video stream and determines the camera pose of each frame through local map optimization based on the geometric features of the images, or determines the camera pose from information provided by an IMU (Inertial Measurement Unit).
Summary
According to some embodiments of the present disclosure, there is provided a posture determination device for a mobile device, including one or more processors configured to: determine an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; obtain current coding information from the image difference feature using a first machine learning model; and determine the posture of the mobile device from the current coding information and at least one piece of historical coding information using a second machine learning model.
In some embodiments, the current frame is the M-th frame, M being a positive integer greater than 1; in a case where at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, N being a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation between them to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation between them to obtain fused historical coding information; and the posture of the mobile device is determined from the fused current coding information and the fused historical coding information using the second machine learning model.
In some embodiments, a first weight of each channel component of the current coding information is determined according to the correlation between the channel components, and the channel components are weighted by the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component of each piece of historical coding information is determined according to the correlation between its channel components, and the channel components are weighted by the second weights to obtain the fused historical coding information.
In some embodiments, the pieces of historical coding information are fused according to the correlation between them to obtain integrated historical coding information, and the posture of the mobile device is determined from the integrated historical coding information and the current coding information using the second machine learning model.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information, and a weighted sum of the pieces is computed with the third weights to obtain the integrated historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information, and the posture of the mobile device is determined from the output coding information using the second machine learning model.
In some embodiments, the image difference feature is obtained through an optical flow network model, and at least one of the first machine learning model and the second machine learning model is a ConvLSTM (Convolutional Long Short-Term Memory Network) model.
According to other embodiments of the present disclosure, there is provided a method for determining a posture of a mobile device, including: determining an image difference feature between a current frame and a previous frame in a video stream acquired by the mobile device; determining current coding information from the image difference feature using a first machine learning model; and determining the posture of the mobile device from the current coding information and at least one piece of historical coding information using a second machine learning model.
In some embodiments, the current frame is the M-th frame, M being a positive integer greater than 1; in a case where at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the coding information of the N-th frame is stored as the historical coding information, N being a positive integer less than M.
In some embodiments, the channel components of the current coding information are fused according to the correlation between them to obtain fused current coding information; the channel components of the historical coding information are fused according to the correlation between them to obtain fused historical coding information; and the posture of the mobile device is determined from the fused current coding information and the fused historical coding information using the second machine learning model.
In some embodiments, a first weight of each channel component of the current coding information is determined according to the correlation between the channel components, and the channel components are weighted by the first weights to obtain the fused current coding information.
In some embodiments, a second weight of each channel component of each piece of historical coding information is determined according to the correlation between its channel components, and the channel components are weighted by the second weights to obtain the fused historical coding information.
In some embodiments, the at least one piece of historical coding information includes multiple pieces of historical coding information; the pieces are fused according to the correlation between them to obtain integrated historical coding information, and the posture of the mobile device is determined from the integrated historical coding information and the current coding information using the second machine learning model.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation between the pieces of historical coding information, and a weighted sum of the pieces is computed with the third weights to obtain the integrated historical coding information.
In some embodiments, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information, and the posture of the mobile device is determined from the output coding information using the second machine learning model.
In some embodiments, the image difference feature is obtained through an optical flow network model, and at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
According to still other embodiments of the present disclosure, there is provided a visual odometer, including the posture determination device according to any of the foregoing embodiments, configured to determine the posture of a mobile device from a video stream captured by the mobile device.
In some embodiments, the visual odometer further includes an image sensor configured to acquire the video stream.
According to yet other embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program that, when executed by a processor, implements the posture determination method according to any of the foregoing embodiments.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of this application. The exemplary embodiments of the present disclosure and their descriptions serve to explain the present disclosure and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2a is a schematic diagram of a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
Fig. 2b is a schematic diagram of the ConvLSTM used in a method for determining a posture of a mobile device according to an embodiment of the present disclosure;
Fig. 3 is a flowchart of an embodiment of step 130 in Fig. 1;
Fig. 4 is a schematic diagram of an embodiment of step 1320 in Fig. 3;
Fig. 5 is a flowchart of another embodiment of step 130 in Fig. 1;
Fig. 6 is a schematic diagram of an embodiment of step 1321 in Fig. 5;
Fig. 7 is a flowchart of yet another embodiment of step 130 in Fig. 1;
Fig. 8 is a block diagram of a posture determination device for a mobile device according to an embodiment of the present disclosure;
Fig. 9 is a block diagram of a posture determination device for a mobile device according to another embodiment of the present disclosure;
Fig. 10 is a block diagram of a visual odometer according to an embodiment of the present disclosure.
It should be understood that the parts shown in the drawings are not drawn to scale. In addition, the same or similar reference numerals denote the same or similar components.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure. It should also be understood that, for ease of description, the parts shown in the drawings are not drawn to scale. Techniques, methods, and equipment known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the granted specification. In all examples shown and discussed here, any specific value should be interpreted as merely exemplary rather than limiting; other examples of the exemplary embodiments may therefore have different values. Note that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Fig. 1 is a flowchart of a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 1, the method includes: step 110, determining an image difference feature; step 120, determining current coding information; and step 130, determining the posture of the mobile device.
In step 110, the image difference feature between the current frame and the previous frame in the video stream acquired by the mobile device is determined.
For example, the mobile device may be a movable platform such as a robot, an unmanned vehicle, or a drone, which captures images with a camera based on an image sensor such as a CCD or CMOS sensor.
For example, the image difference feature may be obtained through a convolutional neural network (CNN).
For example, the image difference feature may be obtained through an optical flow network model such as FlowNet (FlowNet: Learning Optical Flow with Convolutional Networks) or FlowNet 2.0 (FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks).
In some embodiments, two adjacent frames are stacked and input into the optical flow network model, and the feature extraction part of the model extracts the image difference feature. The image difference feature is a high-dimensional feature whose number of channels (for example, 1024) can be determined according to the resolution of the current frame. For example, the optical flow network model may apply multiple convolutions to the stacked images and, from the convolution results, extract the offset of each pixel between the two adjacent frames as the image difference feature.
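One plausible reading of "stacking" two adjacent frames is channel-wise concatenation, as in the "simple" FlowNet variant, which stacks both input images into one tensor. The sketch below assumes frames stored as nested lists in channel × height × width order; the type aliases and the function name are hypothetical and only illustrate the input layout, not the network itself.

```python
from typing import List

Channel = List[List[float]]  # H x W
Frame = List[Channel]        # C x H x W


def stack_adjacent_frames(prev: Frame, cur: Frame) -> Frame:
    """Stack the previous and current frames along the channel axis: 2 x (C,H,W) -> (2C,H,W)."""
    if len(prev) != len(cur):
        raise ValueError("frames must have the same number of channels")
    for a, b in zip(prev, cur):
        if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
            raise ValueError("frames must have the same spatial size")
    # Copy rows so the stacked input does not alias the original frames.
    return [[row[:] for row in ch] for ch in prev] + [[row[:] for row in ch] for ch in cur]


# Usage: two 3-channel 2x2 frames give a 6-channel network input.
prev = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
cur = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
stacked = stack_adjacent_frames(prev, cur)
```

With RGB frames this yields a 6-channel input; the learned feature extractor then produces the high-dimensional image difference feature.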
In this way, high-dimensional, redundant image information is converted into high-level, abstract semantic features. This addresses the problem that related techniques based on geometric features are easily affected by environmental factors (such as occlusion, illumination changes, and dynamic objects), thereby improving the accuracy of posture determination.
In step 120, the current coding information is determined from the image difference feature using the first machine learning model. For example, the first machine learning model may be an RNN (Recurrent Neural Network) model, such as a ConvLSTM model.
In some embodiments, historical coding information that has an important influence on posture determination (that is, the coding information of key frames) can be selected from the historical outputs of the RNN model as effective information. The effective information can be fused with the current coding information to jointly determine the current posture of the mobile device. For example, in a case where at least one of the movement distance or the posture change of the mobile device from the (N-1)-th frame to the N-th frame exceeds a threshold, the N-th frame is determined to be a key frame, and the coding information of the N-th frame extracted by the RNN model is stored as historical coding information.
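The key-frame rule above can be sketched as follows. The source does not say how the posture change is measured or what the thresholds are. This sketch assumes a hypothetical pose tuple (x, y, z, yaw in radians), uses the Euclidean distance as the movement distance and the wrapped yaw difference as the posture change, and takes threshold values supplied by the caller.

```python
import math
from typing import Any, List, Tuple

Pose = Tuple[float, float, float, float]  # hypothetical: (x, y, z, yaw in radians)


def wrap_angle(a: float) -> float:
    """Wrap an angle to [-pi, pi) so that 179 deg -> -179 deg counts as a 2 deg change."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def is_key_frame(prev: Pose, cur: Pose, dist_thresh: float, angle_thresh: float) -> bool:
    """Key frame if the movement distance OR the heading change exceeds its threshold."""
    distance = math.dist(prev[:3], cur[:3])
    angle_change = abs(wrap_angle(cur[3] - prev[3]))
    return distance > dist_thresh or angle_change > angle_thresh


class HistoryBuffer:
    """Keeps the coding information of key frames as historical coding information."""

    def __init__(self, dist_thresh: float, angle_thresh: float):
        self.dist_thresh = dist_thresh
        self.angle_thresh = angle_thresh
        self.history: List[Any] = []

    def update(self, prev_pose: Pose, cur_pose: Pose, coding_info: Any) -> bool:
        """Store coding_info if the (N-1)-th -> N-th frame motion makes frame N a key frame."""
        if is_key_frame(prev_pose, cur_pose, self.dist_thresh, self.angle_thresh):
            self.history.append(coding_info)
            return True
        return False
```

In a running system the poses would come from the estimator's own outputs for frames N-1 and N, and `coding_info` would be the RNN output for frame N.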
In step 130, the posture of the mobile device is determined from the current coding information and at least one piece of historical coding information using the second machine learning model. For example, the second machine learning model may be an RNN model, such as a ConvLSTM model, which decodes the coding information to determine the posture of the mobile device.
The current posture determined from both the current coding information and the historical coding information is globally optimized over the whole range from the first frame of the video stream to the current frame (that is, an absolute posture). It is more accurate than the locally optimized posture (that is, a relative posture) that the related art determines only within the local range of the current frame and the previous frame.
In addition, the ConvLSTM model does not need to rely on information provided by an IMU; it can determine the posture from visual information alone, which reduces the cost of posture determination.
Fig. 2a is a schematic diagram of a method for determining a posture of a mobile device according to an embodiment of the present disclosure.
As shown in Fig. 2a, the current coding information extracted at times 1 to $T$ is $x_1$ to $x_T$, and the historical coding information stored at each time is $S_2$ to $S_T$. The current coding information and the historical coding information at each time are input into the first machine learning model (such as a ConvLSTM) to obtain the output coding information $O_1$ to $O_T$ at each time. $O_1$ to $O_T$ are then input into the second machine learning model (such as a ConvLSTM) to obtain the postures $P_1$ to $P_T$ of the mobile device at each time.
Fig. 2b shows a schematic implementation of the ConvLSTM, in which $X_t$, $h_t$, and $o_t$ denote the input feature, the state variable, and the output, respectively.
In some embodiments, step 130 may be implemented by the steps in Fig. 3.
Although the embodiments of the present disclosure use ConvLSTM as an example implementation of the machine learning models, other machine learning models, such as an FC-LSTM (Fully Connected LSTM) model, are also applicable to the present disclosure.
As understood by those skilled in the art, for a machine learning model (such as a neural network) to have the required function, the model is trained with multiple samples (such as sample images and sample data) before it is used, and the trained model is then used in the methods above. For example, the required machine learning model can be obtained through supervised training, using samples and their corresponding labels.
Fig. 3 is a flowchart of an embodiment of step 130 in Fig. 1.
As shown in Fig. 3, step 130 includes: step 1310, fusing the channel components of the current coding information; step 1320, fusing the channel components of the historical coding information; and step 1330, determining the posture of the mobile device.
In step 1310, the channel components of the current coding information are fused according to the correlation between them.
In some embodiments, a first weight of each channel component is determined according to the correlation between the channel components of the current coding information, and the channel components are weighted by the first weights to obtain the fused current coding information.
For example, the current coding information is the output $O_t$ of the first machine learning model at the current time $t$. $O_t$ has $J$ channel components $O_{t1}, O_{t2}, \ldots, O_{tJ}$. The correlations between $O_{t1}, O_{t2}, \ldots, O_{tJ}$ are calculated, the corresponding weights of $O_{t1}, O_{t2}, \ldots, O_{tJ}$ are determined from these correlations, and $O_{t1}, O_{t2}, \ldots, O_{tJ}$ are weighted with the corresponding weights to obtain $O'_t$.
This amounts to selecting channel components according to the spatial information of the current coding information: channel components that are important for posture determination are amplified and unimportant ones are attenuated, which improves the accuracy of posture determination.
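The source specifies neither the correlation measure nor how correlations become weights. A minimal sketch of step 1310, assuming Pearson correlation between flattened channels, a score per channel equal to its mean absolute correlation with the other channels, and a softmax gate rescaled so the first weights average to 1. All of these choices, and the function names, are hypothetical; in a trained model the gate would be learned.

```python
import math
from typing import List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two flattened channel components (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def fuse_channels(channels: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Weight the J channel components O_t1..O_tJ by correlation-derived first weights.

    Hypothetical rule: a channel's score is its mean |correlation| with the other
    channels; a softmax gate turns scores into weights, rescaled to average 1.
    """
    J = len(channels)
    scores = []
    for j in range(J):
        others = [abs(pearson(channels[j], channels[k])) for k in range(J) if k != j]
        scores.append(sum(others) / len(others) if others else 1.0)
    weights = [J * w for w in softmax(scores)]
    fused = [[w * v for v in ch] for w, ch in zip(weights, channels)]
    return fused, weights
```

Whether strongly correlated channels should be amplified (as here) or treated as redundant and attenuated is itself an assumption; the rule only illustrates the "correlation, then weights, then weighting" structure of step 1310.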
In step 1320, the channel components of the historical coding information are fused according to the correlation between them.
In some embodiments, a second weight of each channel component is determined according to the correlation between the channel components of each piece of historical coding information, and the channel components are weighted by the second weights to obtain the fused historical coding information.
For example, the set of stored historical coding information (effective information) is $S$, which contains $I$ pieces of historical coding information $S_1, S_2, \ldots, S_i, \ldots, S_I$, where $1 \le i \le I$. Each $S_i$ has $J$ channel components $S_{i1}, S_{i2}, \ldots, S_{iJ}$. The correlations between $S_{i1}, S_{i2}, \ldots, S_{iJ}$ are calculated, the corresponding weights of $S_{i1}, S_{i2}, \ldots, S_{iJ}$ are determined from these correlations, and $S_{i1}, S_{i2}, \ldots, S_{iJ}$ are weighted with the corresponding weights to obtain $S'_i$. These $S'_i$ constitute the fused historical coding information set $S'$.
As with the current coding information, this amounts to selecting channel components according to the spatial information of the historical coding information: channel components that are important for posture determination are amplified and unimportant ones are attenuated, which improves the accuracy of posture determination.
In step 1330, the posture of the mobile device is determined from the fused current coding information and the fused historical coding information using the second machine learning model.
In some embodiments, steps 1310 and 1320 have no fixed execution order and may be performed in parallel; alternatively, only step 1310 or only step 1320 may be performed.
Fig. 4 is a schematic diagram of an embodiment of step 1320 in Fig. 3.
As shown in Fig. 4, each piece of stored historical coding information $S_i$ has multiple channel components. The weight of each channel component is calculated with a gate function from the correlation coefficients between the channel components, and the channel components are weighted to obtain the fused $S'_i$.
In some embodiments, step 130 may be implemented by the steps in Fig. 5.
Fig. 5 is a flowchart of another embodiment of step 130 in Fig. 1.
As shown in Fig. 5, step 130 includes: step 1321, fusing the pieces of historical coding information; and step 1330', determining the posture of the mobile device.
In step 1321, the pieces of historical coding information are fused according to the correlation between them to obtain integrated historical coding information.
In some embodiments, a third weight of each piece of historical coding information is determined according to the correlation between the pieces, and a weighted sum of the pieces is computed with the third weights to obtain the integrated historical coding information.
For example, the correlations between the pieces of historical coding information $S_1, S_2, \ldots, S_i, \ldots, S_I$ are calculated, and the corresponding weights $w_1, w_2, \ldots, w_I$ are determined from these correlations. The weighted sum $\sum_{i=1}^{I} w_i S_i$ is the integrated historical coding information.
In this way, the temporal continuity of the frames is exploited to fuse the historical coding information based on time information: historical coding information that is important for posture determination is strengthened and unimportant historical coding information is weakened, which improves the accuracy of posture determination.
In some embodiments, the channel components of the integrated historical coding information may further be fused according to the embodiment of Fig. 3. Alternatively, the channel components of each piece of historical coding information may first be fused according to the embodiment of Fig. 3 to obtain $S'$, and the pieces of historical coding information in $S'$ are then fused according to the embodiment of Fig. 5. In other words, the historical coding information may be fused either spatially first or temporally first.
In step 1330', the posture of the mobile device is determined from the integrated historical coding information and the current coding information using the second machine learning model.
Fig. 6 is a schematic diagram of an embodiment of step 1321 in Fig. 5.
As shown in Fig. 6, the set $S$ of stored historical coding information includes $S_1, S_2, \ldots, S_i, \ldots, S_I$. The corresponding weights of $S_1, S_2, \ldots, S_I$ are calculated with a gate function from the correlation coefficients between $S_1, S_2, \ldots, S_I$. Weighting $S_1, S_2, \ldots, S_i, \ldots, S_I$ gives $S'_1, S'_2, \ldots, S'_i, \ldots, S'_I$, and summing $S'_1, S'_2, \ldots, S'_I$ gives the integrated historical coding information.
在一些实施例中,步骤130可以通过图7中的步骤实现。In some embodiments, step 130 may be implemented by the steps in FIG. 7.
图7是示出图1中步骤130的又一个实施例的流程图。FIG. 7 is a flowchart showing another embodiment of step 130 in FIG. 1.
如图7所示,步骤130包括:步骤1322,拼接当前编码信息和历史编码信息;和步骤1330”,确定移动设备的姿态。As shown in FIG. 7, step 130 includes: step 1322, concatenating the current coding information and the historical coding information; and step 1330", determining the posture of the mobile device.
在步骤1322中,将当前编码信息和历史编码信息,按照通道维度方向拼接,生成输出编码信息。也就是说,以当前编码信息和历史编码信息为特征矩阵,以矩阵的每一层(即每一通道)为一个部分进行拼接。例如,可以通过具有两层卷积层(如卷积核大小为3×3,卷积步长为1)的神经网络模型进行拼接。In step 1322, the current coding information and the historical coding information are concatenated along the channel dimension to generate output coding information. That is, the current coding information and the historical coding information are treated as feature matrices, and they are concatenated with each layer (i.e., each channel) of a matrix as one unit. For example, the concatenation may be performed by a neural network model with two convolutional layers (e.g., with a 3×3 convolution kernel and a convolution stride of 1).
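To make "concatenation along the channel dimension" concrete, here is a minimal pure-Python sketch, assuming each piece of coding information is a list of C channels and each channel is an H×W list of lists. The names `concat_channels` and `conv3x3`, the zero padding, and the single output channel are illustrative assumptions; an actual model would use a deep-learning framework with learned kernels.

```python
from typing import List

Channel = List[List[float]]   # one H x W feature map
Coding = List[Channel]        # C channels, i.e. a C x H x W tensor


def concat_channels(current: Coding, history: List[Coding]) -> Coding:
    """Stack current and historical coding info along the channel axis."""
    h, w = len(current[0]), len(current[0][0])
    out: Coding = list(current)
    for past in history:
        for ch in past:
            # Concatenation along channels requires identical H and W.
            if len(ch) != h or len(ch[0]) != w:
                raise ValueError("spatial sizes must match")
            out.append(ch)
    return out


def conv3x3(x: Coding, kernels: List[List[List[float]]]) -> Channel:
    """One output channel of a zero-padded 3x3 convolution with stride 1.

    kernels[c] is the 3x3 kernel applied to input channel c; as in most
    deep-learning libraries, this is computed as a cross-correlation.
    """
    h, w = len(x[0]), len(x[0][0])
    out = [[0.0] * w for _ in range(h)]
    for k, ch in zip(kernels, x):
        for i in range(h):
            for j in range(w):
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:
                            out[i][j] += k[di + 1][dj + 1] * ch[ii][jj]
    return out


# Usage: 2 current channels + one 2-channel history entry -> 4 channels.
cur = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 1.0]]]
hist = [[[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]]
stacked = concat_channels(cur, hist)
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
mixed = conv3x3(stacked, [identity] * len(stacked))  # sums the 4 channels
```

Concatenation itself only stacks channels; it is the subsequent convolution that mixes current and historical information, which is why the text pairs the concatenation with convolutional layers.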
在一些实施例中,可以对历史编码信息、当前编码信息进行时间上和空间上的融合后再拼接。In some embodiments, the historical coding information and the current coding information may be fused temporally and spatially before being concatenated.
在步骤1330”中,根据输出编码信息,利用第二机器学习模型确定移动设备的姿态。In step 1330", the second machine learning model is used to determine the posture of the mobile device according to the output coding information.
本公开实施例提供的姿态确定方法,在公开无人驾驶数据集KITTI上进行了测试,平均旋转误差不超过每100米3度,平均平移误差不超过5%。The posture determination method provided by the embodiments of the present disclosure was tested on the public autonomous-driving dataset KITTI: the average rotation error did not exceed 3 degrees per 100 meters, and the average translation error did not exceed 5%.
图8是示出根据本公开一个实施例的移动设备的姿态确定装置的框图。Fig. 8 is a block diagram showing an apparatus for determining a posture of a mobile device according to an embodiment of the present disclosure.
如图8所示,移动设备的姿态确定装置8包括一个或多个处理器81。As shown in FIG. 8, the device 8 for determining the posture of the mobile device includes one or more processors 81.
处理器81被配置为获取移动设备拍摄的视频流中当前帧与上一帧之间的图像差别特征。例如,图像差别特征通过光流网络模型获取。The processor 81 is configured to obtain the image difference feature between the current frame and the previous frame in the video stream captured by the mobile device. For example, the image difference feature is obtained through an optical flow network model.
处理器81被配置为:根据图像差别特征,利用第一机器学习模型,获取当前编码信息;根据当前编码信息和至少一个历史编码信息,利用第二机器学习模型确定移动设备的姿态。例如,第一机器学习模型和第二机器学习模型中的至少一个为ConvLSTM模型。The processor 81 is configured to: use the first machine learning model to obtain current coding information according to the image difference characteristics; and use the second machine learning model to determine the posture of the mobile device according to the current coding information and at least one piece of historical coding information. For example, at least one of the first machine learning model and the second machine learning model is a ConvLSTM model.
在一些实施例中,姿态确定装置还包括存储器82。存储器82被配置为:在移动设备从第N帧到第N-1帧对应的运动距离或者姿态变化中的至少一个超过阈值的情况下,存储第N帧的编码信息作为历史编码信息。In some embodiments, the posture determination device further includes a memory 82. The memory 82 is configured to store the coding information of the Nth frame as historical coding information when at least one of the movement distance and the posture change of the mobile device between the (N-1)th frame and the Nth frame exceeds a threshold.
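The storage rule above can be sketched as a small keyframe-style buffer. This is an illustration under stated assumptions, not the disclosed implementation: the class name `HistoryStore`, the threshold values and units, the use of a single rotation angle to represent the posture change, and the `max_items` cap with oldest-first eviction are all hypothetical. The source only states that frame N's coding information is stored when the motion distance or the posture change exceeds a threshold.

```python
import math
from dataclasses import dataclass, field
from typing import Any, List, Sequence


@dataclass
class HistoryStore:
    """Keyframe-style buffer of historical coding information (illustrative)."""
    dist_threshold: float    # hypothetical: motion distance threshold, e.g. metres
    angle_threshold: float   # hypothetical: posture-change threshold, radians
    max_items: int = 8       # hypothetical cap; the source does not give one
    items: List[Any] = field(default_factory=list)

    def maybe_store(self, coding: Any, translation: Sequence[float],
                    rotation_angle: float) -> bool:
        """Store frame N's coding info if the motion from frame N-1 is large enough.

        translation: displacement between frames N-1 and N.
        rotation_angle: magnitude of the posture change between those frames.
        """
        distance = math.sqrt(sum(t * t for t in translation))
        # "At least one of" distance or posture change must exceed its threshold.
        if distance > self.dist_threshold or abs(rotation_angle) > self.angle_threshold:
            self.items.append(coding)
            if len(self.items) > self.max_items:
                self.items.pop(0)  # drop the oldest entry
            return True
        return False


store = HistoryStore(dist_threshold=0.5, angle_threshold=0.1, max_items=2)
store.maybe_store("f1", (0.1, 0.0, 0.0), 0.01)  # small motion: not stored
store.maybe_store("f2", (1.0, 0.0, 0.0), 0.0)   # large translation: stored
store.maybe_store("f3", (0.0, 0.0, 0.0), 0.2)   # large rotation: stored
store.maybe_store("f4", (0.0, 0.6, 0.0), 0.0)   # stored; "f2" is evicted
```

Storing only frames with significant motion keeps the history from filling up with near-duplicate coding information while the device is stationary.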
在一些实施例中,处理器81根据当前编码信息的各通道分量之间的相关性,对当前编码信息的各通道分量进行融合。处理器81根据历史编码信息的各通道分量之间的相关性,对历史编码信息的各通道分量进行融合。处理器81根据融合后的当前编码信息和历史编码信息,利用第二机器学习模型确定移动设备的姿态。In some embodiments, the processor 81 fuses the channel components of the current coding information according to the correlation among those channel components. The processor 81 fuses the channel components of the historical coding information according to the correlation among those channel components. The processor 81 uses the second machine learning model to determine the posture of the mobile device according to the fused current coding information and historical coding information.
例如,处理器81根据当前编码信息各通道分量之间的相关性,确定各通道分量的第一权重。处理器81根据第一权重,对各通道分量进行加权,得到融合后的当前编码信息。For example, the processor 81 determines a first weight for each channel component according to the correlation among the channel components of the current coding information. The processor 81 weights each channel component by its first weight to obtain the fused current coding information.
例如,处理器81根据每个历史编码信息的各通道分量之间的相关性,确定所述各通道分量的第二权重。处理器81根据第二权重,对各通道分量进行加权,得到融合后的历史编码信息。For example, the processor 81 determines the second weight of each channel component according to the correlation between each channel component of each piece of historical coding information. The processor 81 weights each channel component according to the second weight to obtain the fused historical coding information.
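The source says the first and second weights are determined from the correlation among channel components but does not say how. The sketch below assumes one simple choice: each channel's weight is a sigmoid of its mean Pearson correlation with the other channels, a squeeze-and-excitation-style gate swapped in for illustration. Channels are H×W maps flattened to lists; the function names are hypothetical. The same `fuse_channels` call would be applied to the current coding information (first weights) and to each piece of historical coding information (second weights).

```python
import math
from typing import List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.0  # a constant channel is treated as uncorrelated
    return cov / (sa * sb)


def fuse_channels(channels: List[List[float]]) -> Tuple[List[List[float]], List[float]]:
    """Re-weight each channel by a gate computed from its inter-channel correlation.

    channels: C channel components, each an H x W map flattened to a list.
    Returns (fused channels, per-channel weights).
    """
    c = len(channels)
    weights = []
    for i in range(c):
        corrs = [pearson(channels[i], channels[j]) for j in range(c) if j != i]
        score = sum(corrs) / len(corrs) if corrs else 0.0
        weights.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid gate in (0, 1)
    fused = [[w * v for v in ch] for w, ch in zip(weights, channels)]
    return fused, weights


chans = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
fused, w = fuse_channels(chans)
# Channels 0 and 1 agree with each other but oppose channel 2, so their mean
# correlation is about 0 (weight about 0.5); channel 2 opposes both others,
# so its mean correlation is about -1 (weight sigmoid(-1), about 0.27).
```

A sigmoid gate keeps every weight in (0, 1), so channels are attenuated rather than discarded; a learned gate in a trained network would typically replace the fixed mean-correlation score.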
在一些实施例中,处理器81根据各历史编码信息之间的相关性,对各历史编码信息进行融合,得到综合历史编码信息。处理器81根据综合历史编码信息,利用第二机器学习模型确定移动设备的姿态。In some embodiments, the processor 81 fuses the pieces of historical coding information according to the correlation among them to obtain integrated historical coding information. The processor 81 uses the second machine learning model to determine the posture of the mobile device according to the integrated historical coding information.
例如,处理器81根据各历史编码信息之间的相关性,确定各历史编码信息的第三权重。处理器81根据第三权重,对各历史编码信息进行加权求和,得到综合历史编码信息。For example, the processor 81 determines a third weight for each piece of historical coding information according to the correlation among the pieces of historical coding information. The processor 81 computes a weighted sum of the pieces of historical coding information using the third weights to obtain the integrated historical coding information.
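A minimal sketch of the third-weight fusion, assuming each piece of historical coding information is flattened to a list and assuming a softmax over each piece's mean Pearson correlation with the others as the gate. The source names a gate function but not its form, so the softmax and the name `fuse_history` are illustrative choices.

```python
import math
from typing import List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.0  # a constant entry is treated as uncorrelated
    return cov / (sa * sb)


def fuse_history(history: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Weighted sum of historical coding information (each flattened to a list).

    Each entry's score is its mean correlation with the other entries; a softmax
    over the scores gives the third weights, which sum to 1.
    """
    n = len(history)
    if n == 1:
        return list(history[0]), [1.0]
    scores = [
        sum(pearson(history[i], history[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    fused = [sum(w * h[k] for w, h in zip(weights, history))
             for k in range(len(history[0]))]
    return fused, weights


hist = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
integrated, w = fuse_history(hist)
# The two mutually consistent entries share most of the weight; the third,
# anti-correlated entry receives the smallest weight.
```

Because the weights sum to 1, the integrated historical coding information stays on the same scale as a single stored entry regardless of how many entries the history holds.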
在一些实施例中,处理器81将当前编码信息和历史编码信息,按照通道维度方向拼接,生成输出编码信息。处理器81根据输出编码信息,利用第二机器学习模型确定移动设备的姿态。In some embodiments, the processor 81 concatenates the current coding information and the historical coding information along the channel dimension to generate output coding information. The processor 81 uses the second machine learning model to determine the posture of the mobile device according to the output coding information.
图9是示出用于根据本公开另一个实施例的移动设备的姿态确定装置的框图。FIG. 9 is a block diagram showing an apparatus for determining a posture of a mobile device according to another embodiment of the present disclosure.
如图9所示,姿态确定装置可以通用计算设备的形式表现。计算机系统包括存储器910、处理器920和连接不同系统组件的总线900。As shown in FIG. 9, the posture determination device may take the form of a general-purpose computing device. The computer system includes a memory 910, a processor 920, and a bus 900 connecting the different system components.
存储器910例如可以包括系统存储器、非易失性存储介质等。系统存储器例如存储有操作系统、应用程序、引导装载程序(Boot Loader)以及其他程序等。系统存储器可以包括易失性存储介质,例如随机存取存储器(RAM)和/或高速缓存存储器。非易失性存储介质例如存储有执行姿态确定方法的对应实施例的指令。非易失性存储介质包括但不限于磁盘存储器、光学存储器、闪存等。The memory 910 may include, for example, a system memory and a non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs. The system memory may include volatile storage media, such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions for executing the corresponding embodiments of the posture determination method. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, and flash memory.
处理器920可以用通用处理器、数字信号处理器(DSP)、应用专用集成电路(ASIC)、现场可编程门阵列(FPGA)或其它可编程逻辑设备、分立门或晶体管等分立硬件组件方式来实现。相应地,诸如判断模块和确定模块的每个模块,可以通过中央处理器(CPU)运行存储器中执行相应步骤的指令来实现,也可以通过执行相应步骤的专用电路来实现。The processor 920 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Correspondingly, each module, such as the judgment module and the determination module, may be implemented by a central processing unit (CPU) executing instructions in the memory that perform the corresponding steps, or by a dedicated circuit that performs the corresponding steps.
总线900可以使用多种总线结构中的任意总线结构。例如,总线结构包括但不限于工业标准体系结构(ISA)总线、微通道体系结构(MCA)总线、外围组件互连(PCI)总线。The bus 900 can use any bus structure among a variety of bus structures. For example, the bus structure includes, but is not limited to, an industry standard architecture (ISA) bus, a microchannel architecture (MCA) bus, and a peripheral component interconnect (PCI) bus.
计算机系统还可以包括输入输出接口930、网络接口940、存储接口950等。这些接口930、940、950以及存储器910和处理器920之间可以通过总线900连接。输入输出接口930可以为显示器、鼠标、键盘等输入输出设备提供连接接口。网络接口940为各种联网设备提供连接接口。存储接口950为软盘、U盘、SD卡等外部存储设备提供连接接口。The computer system may also include an input/output interface 930, a network interface 940, a storage interface 950, and so on. These interfaces 930, 940, and 950, as well as the memory 910 and the processor 920, may be connected via the bus 900. The input/output interface 930 provides a connection interface for input and output devices such as a display, a mouse, and a keyboard. The network interface 940 provides a connection interface for various networked devices. The storage interface 950 provides a connection interface for external storage devices such as floppy disks, USB flash drives, and SD cards.
图10是示出根据本公开一个实施例的视觉里程计的框图。FIG. 10 is a block diagram showing a visual odometer according to an embodiment of the present disclosure.
如图10所示,视觉里程计10包括上述任一个实施例中的姿态确定装置11,用于根据移动设备拍摄的视频流确定所述移动设备的姿态。As shown in FIG. 10, the visual odometer 10 includes the posture determination device 11 in any of the above embodiments, which is used to determine the posture of the mobile device according to the video stream shot by the mobile device.
在一些实施例中,视觉里程计10还包括成像器件,例如图像传感器12,用于获取视频流。In some embodiments, the visual odometer 10 further includes an imaging device, such as an image sensor 12, for acquiring a video stream.
在一些实施例中,成像器件可以通过无线,例如蓝牙、Wi-Fi等方式与姿态确定装置11中的处理器通讯连接;也可以通过有线,例如网线、线缆、走线等与姿态确定装置11中的处理器通讯连接。In some embodiments, the imaging device may be communicatively connected to the processor in the posture determination device 11 wirelessly, for example via Bluetooth or Wi-Fi, or by wire, for example via a network cable, a cable, or wiring.
本领域内的技术人员应当明白,本公开的实施例可提供为方法、系统、或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机可用非瞬时性存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present disclosure may be provided as methods, systems, or computer program products. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
至此,已经详细描述了根据本公开的移动设备的姿态确定装置、移动设备的姿态确定方法、视觉里程计和计算机可读存储介质。为了避免遮蔽本公开的构思,没有描述本领域所公知的一些细节。本领域技术人员根据上面的描述,完全可以明白如何实施这里公开的技术方案。So far, the posture determination device of the mobile device, the posture determination method of the mobile device, the visual odometer, and the computer-readable storage medium according to the present disclosure have been described in detail. In order to avoid obscuring the concept of the present disclosure, some details well known in the art are not described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
可能以许多方式来实现本公开的方法和系统。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和系统。用于所述方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。The method and system of the present disclosure may be implemented in many ways. For example, the method and system of the present disclosure can be implemented by software, hardware, firmware or any combination of software, hardware, and firmware. The above-mentioned order of the steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present disclosure can also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
虽然已经通过示例对本公开的一些特定实施例进行了详细说明,但是本领域的技术人员应该理解,以上示例仅是为了进行说明,而不是为了限制本公开的范围。本领域的技术人员应该理解,可在不脱离本公开的范围和精神的情况下,对以上实施例进行修改。本公开的范围由所附权利要求来限定。Although some specific embodiments of the present disclosure have been described in detail through examples, those skilled in the art should understand that the above examples are only for illustration and not for limiting the scope of the present disclosure. Those skilled in the art should understand that the above embodiments can be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.