WO2023184795A1 - Panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse - Google Patents

Panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse

Info

Publication number
WO2023184795A1
WO2023184795A1 (PCT/CN2022/107421)
Authority
WO
WIPO (PCT)
Prior art keywords
image
compensation
pixel
image registration
network
Prior art date
Application number
PCT/CN2022/107421
Other languages
English (en)
French (fr)
Inventor
张晖
赵梦
赵海涛
朱洪波
Original Assignee
南京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京邮电大学 filed Critical 南京邮电大学
Priority to JP2023517808A priority Critical patent/JP2024515913A/ja
Publication of WO2023184795A1 publication Critical patent/WO2023184795A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/344Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06T5/80
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the invention belongs to the technical fields of deep learning and image processing, and in particular relates to a panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse.
  • Image registration maps one or more images onto a target image according to some optimal transformation.
  • the image registration method based on grayscale information is a process that seeks to maximize the similarity between the registered image and the image to be registered based on grayscale information. Similarity measurement methods generally use measures such as mutual information, squared difference, Hausdorff distance, cross-correlation coefficient, and sum of squared differences. Among them, mutual information is frequently used in image fusion.
  • the process of maximizing the similarity between the registered image and the image to be registered is actually an optimization process. By continuously adjusting the parameters of the transformation model, only when the similarity is maximized, can the parameters of the transformation model reach the optimal level.
  • the image to be registered is transformed according to the optimal model to complete the registration with the registration image.
  • Image registration based on feature information uses a feature extraction algorithm to extract features, and then obtains a transformation model by establishing a mapping relationship between feature points between the registered image and the image to be registered, thereby completing image registration.
  • Image registration is becoming increasingly important in medical image processing. By registering images of the same patient taken at different times, the dynamic changes of the patient's organs and pathology can be well analyzed, so that more accurate medical diagnosis can be made and more targeted treatment plans can be formulated.
  • Image registration is also a method for studying mechanical properties in the direction of material mechanics. The temperature, shape and other information collected by different sensors are fused and compared to obtain various values such as temperature field and deformation field. Then various numerical values are brought into the theoretical model to optimize parameters. Image registration can also be used for automatic tracking of moving targets, pattern recognition, and video analysis.
  • Technical problem solved: low image registration efficiency and poor registration quality when a moving foreground is present in the image due to camera clock desynchronization in existing virtual reality metaverse scenes.
  • a panoramic image registration method for a priori bidirectional compensation oriented to the virtual reality metaverse includes the following steps:
  • a ViBe-based background subtraction method is used to extract moving targets from the two images to be registered, adaptive deformation pixel correction is performed on the image foreground sets from which moving targets are extracted, image features are extracted with the SURF feature extraction algorithm, and moving-target matching is completed according to a feature threshold;
  • the sparse optical flow method is used to detect the motion direction and speed of the successfully matched moving targets, and the speed of each moving target, obtained from the difference between the physical times of the two cameras, is used to compensate that target;
  • the first image registration model is used to register the images that have undergone bidirectional compensation based on position prediction;
  • the second image registration model is used to register the images that have not undergone bidirectional compensation, including images for which target extraction failed and images for which target matching failed.
  • step S1 the background subtraction method based on Vibe is used to extract moving targets from the two images to be registered.
  • the process of adaptive deformation pixel correction for the image foreground set extracted from the moving targets includes the following steps:
  • N G (x) represents the spatial neighborhood of the pixel at position x
  • v y represents the color value of the pixel in the spatial neighborhood
  • the N sample points of the background model are generated by random sampling from the neighborhood pixels
  • Pixels are corrected according to the following formula:
  • (x d , y d ) represents the deformed pixel coordinates
  • (x c , y c ) represents the deformed central pixel coordinates
  • (x u , y u ) represents the pixel coordinates after correction
  • K 1 and K 2 represent the first-order and second-order radial distortion coefficients respectively
  • r represents the distance from the deformed coordinates to the central pixel coordinate
  • α and 1-α represent the relative position of the camera focal length:
  • f max and f min represent the maximum and minimum value of the camera focal length respectively
  • f c represents the actual focal length of the camera.
  • the sparse optical flow method is used to calculate the speed and direction of the moving target in the foreground, and the speeds in the x and y directions are u and v respectively:
  • the foreground set of the left image in the two images A and B to be registered is expressed as:
  • A N = {p A1 , p A2 … p AN };
  • the foreground set of the image on the right is expressed as:
  • p Ai and p Bi represent the pixels that constitute the foreground set of image A and image B;
  • the time t A for position compensation of image A and the time t B for position compensation of image B are respectively:
  • the compensated left foreground set is:
  • A N ′ = {p A1 ′, p A2 ′ … p AN ′}
  • p Ai ′ represents the pixels that constitute the compensated foreground set of image A
  • p Aix represents the pixel component in the x direction before compensation
  • p Aix ′ represents the pixel component after compensation in the x direction
  • u A represents the speed of the pixel in the x direction
  • p Aiy represents the pixel component in the y direction before compensation
  • p Aiy ′ represents the pixel component after compensation in the y direction
  • u B represents the speed of the pixel in the y direction
  • the foreground set of the right image is:
  • p Bi ′ represents the pixels that constitute the compensated foreground set of image B
  • p Bix represents the pixel component in the x direction before compensation
  • p Bix ′ represents the pixel component after compensation in the x direction
  • u B represents the speed of the pixel in the x direction
  • p Biy represents the pixel component in the y direction before compensation
  • p Biy ′ represents the pixel component after compensation in the y direction
  • v B represents the speed of the pixel in the y direction.
  • the first image registration model consists of three dense network blocks, a convolutional layer and a first pooling layer connected in sequence, wherein each dense network block is composed of a convolutional layer, a second pooling layer and a dense network connected in sequence; the output of the first pooling layer is flattened and connected to the regression network.
  • the regression network is composed of five fully connected layers connected in sequence and is used to output the parameters of the geometric transformation for image registration.
  • the second image registration model includes a feature extraction network, a feature matching network and a regression network connected in sequence, wherein the feature extraction network includes two convolutional layers and a pooling layer connected in sequence and is used to extract relevant features of the images to be registered and generate feature maps; two matching branches share weights, and each matching branch uses a correlation map to perform correlation calculations and match feature descriptors; the matching output is passed through the regression network to output the parameters of the geometric transformation for image registration.
  • the scenes of the virtual reality metaverse include virtual reality scenes.
  • the present invention provides a compensation plan for image registration efficiency problems caused by camera out-of-synchronization and foreground moving targets.
  • the background subtraction method is used to extract targets from the image, then the deformation of the two cameras is taken into account during target matching to perform adaptive deformation correction, and finally compensation operations are performed on the foregrounds of the two images.
  • the present invention also proposes an image registration algorithm based on a dense convolutional neural network, providing an end-to-end method for obtaining transformation parameters.
  • different registration schemes are used for images with or without bidirectional compensation to improve algorithm efficiency.
  • the present invention uses different registration methods for images captured by asynchronous cameras according to whether there are moving objects in the foreground.
  • the present invention uses adaptive deformation pixel correction and position prediction.
  • the two-way compensation can effectively avoid the offset caused by moving objects within the unsynchronized time difference, making the registration result more accurate.
  • the DSCNN-based image registration method proposed by the present invention can output the transformation parameters required for registration in an end-to-end manner. Compared with traditional image registration algorithms, the computation time is greatly reduced, and the method has broader application prospects.
  • Figure 1 is an overall flow chart of the panoramic image registration method for a priori bidirectional compensation of the virtual reality metaverse according to the present invention
  • Figure 2 is the target extraction flow chart
  • Figure 3 is a flow chart of target matching considering deformation
  • Figure 4 is an image registration flow chart based on bidirectional compensation
  • Figure 5 is the DSCNN image registration flow chart
  • Figure 6 is the DSCNN image registration network structure diagram
  • Figure 7 shows the structure diagram of the image registration network based on VGG-16.
  • the present invention proposes a panoramic image registration method for a priori two-way compensation for the virtual reality metaverse, which basically includes moving target extraction, moving target matching considering deformation, two-way compensation based on position prediction, and different registration methods for different foreground sets.
  • the scenes of the virtual reality metaverse include virtual reality scenes, VR monitoring scenes, etc. As shown in Figure 1, it specifically includes the following steps:
  • Step 1: The ViBe algorithm establishes a background model M(x) containing N sample values for each pixel in the image sequence to be detected, where v i represents the background sample with index i:
  • v(x) represents the color value of the pixel located at x in a given color space, and the background model of this pixel is M(x).
  • ViBe assumes that adjacent pixels have similar spatial distribution relationships. ViBe background model initialization can be completed using a single frame image. This technology can respond quickly when faced with sudden changes in the light source, that is, discard all original sample points and re-initialize:
  • N G (x) represents the spatial neighborhood of the pixel at position x
  • v y represents the color value of the pixel in this spatial neighborhood.
  • the N sample points of the background model are generated by random sampling from the neighborhood pixels.
  • Adaptive deformation pixel correction accounts for the fact that different focal lengths cause different types of deformation. For example, when the focal length is near the middle of the device's range, barrel deformation is likely to occur; when the focal length is too long, a mixed barrel-and-pincushion deformation is likely. Either kind of deformation negatively affects the matching results, so the following formula is proposed to correct the pixels:
  • (x d , y d ) represents the pixel coordinates where deformation occurs
  • (x c , y c ) represents the central pixel coordinates with deformation
  • (x u , y u ) represents the pixel coordinates after correction
  • K 1 and K 2 represent the first-order and second-order radial distortion coefficients respectively.
  • r represents the distance from the deformed coordinate to the center pixel coordinate.
  • α and 1-α represent the relative position of the camera's focal length:
  • f max and f min represent the maximum and minimum values of the camera's focal length respectively
  • f c represents the actual focal length of the camera.
  • Step 2 Use the LK optical flow method to calculate the displacement of the matched foreground moving target in each direction per unit time:
  • LK optical flow assumes that the 9 pixels within a square of side 3 undergo the same motion and combines their optical flow equations into a system of 9 equations in 2 unknowns, which is solved by least-squares fitting. The solution is as follows:
  • the compensated foreground sets of the left and right images to be registered are:
  • A N ′ = {p A1 ′, p A2 ′ … p AN ′} and B N ′ = {p B1 ′, p B2 ′ … p BN ′}
  • p Ai ′ represents the pixels that constitute the compensated foreground set of image A; p Aix and p Aiy represent the pixel components in the x and y directions before compensation, p Aix ′ and p Aiy ′ the components after compensation, and u A the speed of the pixel in the x direction
  • Δt is the difference between the physical times of the two images
  • p Bi ′ represents the pixels that constitute the compensated foreground set of image B; p Bix and p Biy represent the pixel components in the x and y directions before compensation, p Bix ′ and p Biy ′ the components after compensation, and u B and v B the speed of the pixel in the x and y directions
  • Step 3 Image A and image B to be registered respectively extract features through the DSCNN network.
  • the two feature extraction networks share weight parameters, and their outputs then pass through a regression network composed of 5 fully connected layers, which finally outputs the transformation parameters of the image registration.
  • the feature extraction network used in image registration is based on a dense convolutional neural network.
  • the network structure is composed of three dense network blocks, a convolutional layer and a pooling layer.
  • each dense network block consists of a convolutional layer, a pooling layer and a dense network, and the output is flattened and connected to the regression network.
  • ResNets and Highway Networks have used different methods to confirm that the shorter the path from the input to the end of the neural network, the better the vanishing-gradient problem can be alleviated.
  • ResNets shunt the signal through identity connections to obtain better gradient flow and information.
  • the present invention introduces dense network blocks into the feature extraction network, and proposes a feature extraction network based on dense networks, which not only ensures the extraction of more comprehensive feature information, but also does not cause the problem of gradient disappearance, as shown in Figure 6.
  • Each layer of the dense network block in the figure is directly connected, ensuring that the information flow between each layer of the network is maximized.
  • the input of each layer in the dense network block is the output of all previous layers, ensuring the feed-forward characteristics of the network, and the feature map of this layer will also be passed to the subsequent layers as input.
  • the regression network of DSCNN in the present invention is composed of 5 fully connected layers. Fully connected layers are used to find non-linear relationships between features and advanced reasoning. The final output is transformation parameters that can align the two images.
  • Step 4: Perform VGG16-based image registration for images in which no moving target is extracted or for which moving-target matching fails.
  • the core idea of the image registration algorithm based on bidirectional compensation proposed by the present invention is to use different registration algorithms for images with different foregrounds to achieve higher registration accuracy, as shown in Figure 1.
  • DSCNN-based image registration is performed for images that successfully detect and match moving foregrounds.
  • when no target is detected or matching is unsuccessful, VGG16-based image registration is performed directly, without the bidirectional compensation operation, to reduce the overall time complexity of the algorithm.
  • the image registration algorithm based on VGG16 mainly includes three parts: feature extraction network, feature matching network and regression network.
  • the feature extraction network part uses an improved VGG-16.
  • the two matching branches share weights; the matching network is used for feature-descriptor matching, and its output is passed through the regression network to output the parameters of the geometric transformation.
  • the feature extraction network adopts a standard convolutional neural network architecture.
  • the image to be registered is input into a convolutional neural network without fully connected layers to generate a feature map.
  • the convolutional neural network intercepts part of the VGG-16 network, and its network structure is shown in Figure 7.
  • classical geometric estimation for image registration discards the original descriptors and focuses on the similarity between descriptor pairs, since the pair similarities and spatial positions already contain the information needed for geometric estimation; the matching network of the present invention draws on this idea and adopts a correlation layer, considering only the spatial positions and similarities of descriptor pairs rather than the original descriptors.
  • the matching network of the present invention adopts a structure of a correlation layer and a normalization layer, where the correlation layer computes the similarity of all feature descriptors and the normalization layer processes and normalizes the similarity scores to remove ambiguous matches.

Abstract

A panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse, comprising: extracting moving targets from two images to be registered, performing adaptive deformation pixel correction on the image foreground sets from which moving targets are extracted, extracting image features, and completing moving-target matching; detecting the motion direction and speed of the successfully matched moving targets, calculating the speed of each moving target from the difference between the physical times of the two cameras, and compensating each moving target; and performing image registration on images for which target extraction failed, images for which target matching failed, and images that have undergone bidirectional compensation. The method can solve technical problems in existing virtual reality metaverse scenes, such as low image registration efficiency and poor registration quality when a moving foreground is present in the image due to camera clock desynchronization.

Description

Panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse
Technical Field
The present invention belongs to the technical fields of deep learning and image processing, and in particular relates to a panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse.
Background Art
Image registration maps one or more images onto a target image according to some optimal transformation. Grayscale-based image registration seeks to maximize the similarity between the registered image and the image to be registered on the basis of grayscale information. Similarity is generally measured with mutual information, squared difference, Hausdorff distance, the cross-correlation coefficient, the sum of squared differences and similar measures, among which mutual information is used frequently in image fusion. Maximizing the similarity between the registered image and the image to be registered is in fact an optimization process: the parameters of the transformation model are adjusted continuously, and only when the similarity is maximized do the parameters of the transformation model become optimal; the image to be registered is then transformed according to this optimal model to complete registration with the registered image. Feature-based image registration uses a feature extraction algorithm to extract features and then obtains the transformation model by establishing a mapping between feature points of the registered image and the image to be registered, thereby completing image registration.
Different types of images call for different registration algorithms and evaluation criteria. With the continuous advance of medical equipment, image registration has become increasingly important in medical image processing. Registering images of the same patient taken at different times allows the dynamic changes of the patient's organs and pathology to be analyzed well, so that more accurate medical diagnoses can be made and more targeted treatment plans formulated. Image registration is also a method for studying mechanical properties in material mechanics: temperature, shape and other information collected by different sensors are fused and compared to obtain quantities such as temperature fields and deformation fields, which are then fed into theoretical models to optimize parameters. Image registration can also be used for automatic tracking of moving targets, pattern recognition and video analysis.
However, in existing virtual reality metaverse scenes, camera clock desynchronization leads to technical problems such as low image registration efficiency and poor registration quality when a moving foreground is present in the image.
Summary of the Invention
Technical problem to be solved: in existing virtual reality metaverse scenes, camera clock desynchronization causes technical problems such as low image registration efficiency and poor registration quality when a moving foreground is present in the image.
Technical solution:
A panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse, the panoramic image registration method comprising the following steps:
S1. Use a ViBe-based background subtraction method to extract moving targets from the two images to be registered, perform adaptive deformation pixel correction on the image foreground sets from which moving targets are extracted, extract image features with the SURF feature extraction algorithm, and complete moving-target matching according to a feature threshold;
S2. Use the sparse optical flow method to detect the motion direction and speed of the successfully matched moving targets, calculate the logical time from the physical time of the cameras, obtain the speed of the moving targets from the difference between the physical times of the two cameras, and compensate each moving target;
S3. Build a first image registration model based on a dense convolutional neural network and a second image registration model based on the VGG16 network; perform image registration on images for which target extraction failed, images for which target matching failed, and images that have undergone bidirectional compensation, dividing the images into two classes according to whether they have undergone bidirectional compensation, where the first image registration model is used to register the images that have undergone bidirectional compensation based on position prediction, and the second image registration model is used to register the images that have not undergone bidirectional compensation, including images for which target extraction failed and images for which target matching failed.
Further, in step S1, the process of extracting moving targets from the two images to be registered with the ViBe-based background subtraction method and performing adaptive deformation pixel correction on the image foreground sets from which moving targets are extracted comprises the following steps:
For each pixel of the image sequence to be detected, build a background model M(x) containing N sample values, M(x) = {v_1, v_2 … v_N}, where v_i denotes the background sample with index i, i = 1, 2, …, N;
Update the background model M(x) as follows:
M_0(x) = {v_y (y | y ∈ N_G(x))}
where N_G(x) denotes the spatial neighborhood of the pixel at position x and v_y denotes the color value of a pixel in this spatial neighborhood; the N sample points of the background model are generated by random sampling from the neighborhood pixels;
Correct the pixels according to the following formulas:
[Correction formulas reproduced as formula images PCTCN2022107421-appb-000001 to appb-000003 in the original filing.]
where (x_d, y_d) are the coordinates of the deformed pixel, (x_c, y_c) are the coordinates of the deformation-center pixel, (x_u, y_u) are the corrected pixel coordinates, K_1 and K_2 are the first-order and second-order radial distortion coefficients respectively, r is the distance from the deformed coordinates to the center pixel coordinates, and α and 1-α represent the relative position of the camera focal length:
[Formula images PCTCN2022107421-appb-000004 and appb-000005 of the original filing.]
where f_max and f_min are the maximum and minimum values of the camera focal length respectively, and f_c is the actual focal length of the camera.
Further, the sparse optical flow method is used to calculate the speed and direction of the moving targets in the foreground, giving speeds u and v in the x and y directions respectively:
[Formula image PCTCN2022107421-appb-000006 of the original filing.]
where [formula images PCTCN2022107421-appb-000007 and appb-000008] denote the image gradients of the i-th pixel in the x and y directions and [formula image PCTCN2022107421-appb-000009] denotes the gradient of the i-th pixel over time;
In the two images to be registered, A and B, the foreground set of the left image is expressed as:
A_N = {p_A1, p_A2 … p_AN};
the foreground set of the right image is expressed as:
B_N = {p_B1, p_B2 … p_BN};
where p_Ai and p_Bi denote the pixels constituting the foreground sets of image A and image B;
The true physical time t′ of the right image and the time t of the left image satisfy t′ = t + Δt, where Δt is the difference between the physical times of the two images; the time t_A for position compensation of image A and the time t_B for position compensation of image B are respectively:
t_A = t + Δt/2
t_B = t′ − Δt/2 = t + Δt/2;
The compensated left foreground set is:
A_N′ = {p_A1′, p_A2′ … p_AN′}
[Compensation formulas reproduced as formula images PCTCN2022107421-appb-000010 and appb-000011 in the original filing.]
where p_Ai′ denotes the pixels constituting the compensated foreground set of image A, p_Aix denotes the pixel component in the x direction before compensation, p_Aix′ the pixel component after compensation in the x direction, u_A the speed of the pixel in the x direction, p_Aiy the pixel component in the y direction before compensation, p_Aiy′ the pixel component after compensation in the y direction, and u_B the speed of the pixel in the y direction;
The compensated right foreground set is:
B_N′ = {p_B1′, p_B2′ … p_BN′}
[Compensation formulas reproduced as formula images PCTCN2022107421-appb-000012 and appb-000013 in the original filing.]
where p_Bi′ denotes the pixels constituting the compensated foreground set of image B, p_Bix denotes the pixel component in the x direction before compensation, p_Bix′ the pixel component after compensation in the x direction, u_B the speed of the pixel in the x direction, p_Biy the pixel component in the y direction before compensation, p_Biy′ the pixel component after compensation in the y direction, and v_B the speed of the pixel in the y direction.
Further, in step S3, the first image registration model consists of three dense network blocks, a convolutional layer and a first pooling layer connected in sequence, where each dense network block consists of a convolutional layer, a second pooling layer and a dense network connected in sequence; the output of the first pooling layer is flattened and connected to the regression network, which consists of five fully connected layers connected in sequence and is used to output the parameters of the geometric transformation for image registration.
Further, in step S3, the second image registration model includes a feature extraction network, a feature matching network and a regression network connected in sequence, where the feature extraction network includes two convolutional layers and a pooling layer connected in sequence and is used to extract relevant features of the images to be registered and generate feature maps; the network includes two matching branches that share weights, each matching branch uses a correlation map to perform correlation calculations and match feature descriptors, and its output is passed through the regression network to output the parameters of the geometric transformation for image registration.
Further, the scenes of the virtual reality metaverse include virtual reality scenes.
The present invention provides a compensation scheme for the image registration efficiency problems caused by unsynchronized cameras and moving foreground targets. First, the background subtraction method is used to extract targets from the images; then, during target matching, the deformation of the two cameras is taken into account and adaptive deformation correction is performed; finally, compensation is applied to the foregrounds of the two images. The present invention also proposes an image registration algorithm based on a dense convolutional neural network, providing an end-to-end way of obtaining the transformation parameters, and uses different registration schemes depending on whether an image has undergone bidirectional compensation, improving the efficiency of the algorithm.
Beneficial effects:
First, for images captured by unsynchronized cameras, the present invention uses different registration methods depending on whether the foreground contains moving objects; when moving objects are present in the foreground, the adaptive deformation pixel correction and the bidirectional compensation based on position prediction of the present invention can effectively avoid the offset caused by moving objects within the unsynchronized time difference, making the registration result more accurate.
Second, the DSCNN-based image registration method proposed by the present invention can output the transformation parameters required for registration in an end-to-end manner; compared with traditional image registration algorithms, the computation time is greatly reduced, giving the method broader application prospects.
Brief Description of the Drawings
Figure 1 is the overall flow chart of the panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to the present invention;
Figure 2 is the target extraction flow chart;
Figure 3 is the flow chart of target matching considering deformation;
Figure 4 is the flow chart of image registration based on bidirectional compensation;
Figure 5 is the DSCNN image registration flow chart;
Figure 6 is the DSCNN image registration network structure diagram;
Figure 7 is the structure diagram of the image registration network based on VGG-16.
Detailed Description of the Embodiments
The following embodiments enable those skilled in the art to understand the present invention more fully, but do not limit the present invention in any way.
The present invention proposes a panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse, which basically comprises moving-target extraction, moving-target matching considering deformation, bidirectional compensation based on position prediction, and the use of different registration methods for different foreground sets. The scenes of the virtual reality metaverse include, for example, virtual reality scenes and VR monitoring scenes. As shown in Figure 1, the method specifically comprises the following steps:
Step 1: The ViBe algorithm builds, for each pixel of the image sequence to be detected, a background model M(x) containing N sample values, where v_i denotes the background sample with index i:
M(x) = {v_1, v_2 … v_N}
v(x) denotes the color value, in a given color space, of the pixel at position x, and the background model of this pixel is M(x).
ViBe assumes that adjacent pixels have similar spatial distributions. Initialization of the ViBe background model can be completed from a single frame, and the technique can respond quickly to sudden changes of the light source by discarding all original sample points and re-initializing:
M_0(x) = {v_y (y | y ∈ N_G(x))}
where N_G(x) denotes the spatial neighborhood of the pixel at position x and v_y denotes the color value of a pixel in this spatial neighborhood. In practice, the N sample points of the background model are generated by random sampling from the neighborhood pixels.
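For illustration, a minimal NumPy sketch of this single-frame initialization is given below; the function name, the choice of the 8-neighborhood and the default sample count N = 20 are assumptions of the sketch, not values taken from the patent.

```python
import numpy as np

def vibe_initialize(frame, n_samples=20):
    """Sketch of ViBe background-model initialization from a single frame:
    for every pixel, N samples are drawn at random from its 8-neighborhood.
    `frame` is an H x W (grayscale) or H x W x C array; borders are clamped."""
    h, w = frame.shape[:2]
    model = np.empty((n_samples,) + frame.shape, dtype=frame.dtype)
    offsets = np.array([(-1, -1), (-1, 0), (-1, 1), (0, -1),
                        (0, 1), (1, -1), (1, 0), (1, 1)])
    ys, xs = np.mgrid[0:h, 0:w]
    for k in range(n_samples):
        pick = offsets[np.random.randint(0, len(offsets), size=(h, w))]
        ny = np.clip(ys + pick[..., 0], 0, h - 1)   # neighbor row, clamped at the border
        nx = np.clip(xs + pick[..., 1], 0, w - 1)   # neighbor column, clamped at the border
        model[k] = frame[ny, nx]                    # one neighborhood sample per pixel
    return model  # M(x): n_samples color values per pixel
```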
Adaptive deformation pixel correction addresses the fact that different focal lengths cause different types of deformation: when the focal length is near the middle of the device's range, barrel deformation is likely to occur, and when the focal length is too long, a mixture of barrel and pincushion deformation is likely. Either kind of deformation negatively affects the matching result, so the following formulas are proposed to correct the pixels:
[Correction formulas reproduced as formula images PCTCN2022107421-appb-000014 to appb-000016 in the original filing.]
where (x_d, y_d) are the coordinates of the deformed pixel, (x_c, y_c) are the coordinates of the deformation-center pixel, (x_u, y_u) are the corrected pixel coordinates, and K_1 and K_2 are the first-order and second-order radial distortion coefficients respectively. r is the distance from the deformed coordinates to the center pixel coordinates. α and 1-α represent the relative position of the camera focal length:
[Formula images PCTCN2022107421-appb-000017 and appb-000018 of the original filing.]
where f_max and f_min are the maximum and minimum values of the camera focal length respectively, and f_c is the actual focal length of the camera.
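Since the correction formulas appear only as images in the filing, the sketch below assumes a standard first/second-order radial model whose strength is blended by the focal-length weight α; it illustrates how such a correction could be applied to the extracted foreground pixels and does not reproduce the patented formula.

```python
import numpy as np

def correct_pixels(points, center, k1, k2, f_c, f_min, f_max):
    """Hedged sketch of adaptive deformation pixel correction.
    points: (N, 2) array of deformed pixel coordinates (x_d, y_d)
    center: (x_c, y_c), the deformation-center pixel coordinates
    k1, k2: first- and second-order radial distortion coefficients K_1, K_2
    The form of alpha and the blending below are assumptions of this sketch."""
    alpha = (f_c - f_min) / (f_max - f_min)       # assumed form of the relative focal-length position
    c = np.asarray(center, dtype=float)
    d = np.asarray(points, dtype=float) - c       # offsets from the deformation center
    r2 = np.sum(d * d, axis=1, keepdims=True)     # squared distance r^2 to the center pixel
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2         # first- and second-order radial terms
    # blend the correction strength according to where f_c lies in [f_min, f_max]
    return c + d * (alpha * radial + (1.0 - alpha))
```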
Step 2: Use the LK optical flow method to calculate the displacement of the matched foreground moving targets in each direction per unit time:
I(x, y, t) = I(x + dx, y + dy, t + dt)
Expanding the right-hand side as a Taylor series and dividing both sides by dt gives:
f_x·u + f_y·v + f_t = 0
where [formula image PCTCN2022107421-appb-000019 of the original filing]; f_x and f_y denote the image gradients and f_t the temporal gradient. LK optical flow assumes that the 9 pixels within a square of side 3 undergo the same motion, so their optical flow equations form a system of 9 equations in 2 unknowns, which is solved by least-squares fitting. The solution is given by formula image PCTCN2022107421-appb-000020 of the original filing.
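As an illustration of this least-squares step (the closed-form solution itself appears only as a formula image in the filing), the sketch below assumes grayscale frames and a single interior 3×3 window; in practice a pyramidal tracker such as OpenCV's calcOpticalFlowPyrLK would typically be applied to the matched foreground points.

```python
import numpy as np

def lk_flow_at(prev, curr, x, y):
    """Sketch of the Lucas-Kanade step of Step 2: the 9 pixels of a 3x3 window
    around the interior pixel (x, y) are assumed to share one motion (u, v),
    giving 9 equations f_x*u + f_y*v + f_t = 0 solved by least squares.
    `prev` and `curr` are consecutive grayscale frames as float arrays."""
    fy, fx = np.gradient(prev)                    # spatial image gradients (rows = y, cols = x)
    ft = curr - prev                              # temporal gradient
    win = (slice(y - 1, y + 2), slice(x - 1, x + 2))
    A = np.stack([fx[win].ravel(), fy[win].ravel()], axis=1)   # 9 x 2 system matrix
    b = -ft[win].ravel()                                        # right-hand side
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)              # least-squares solution
    return u, v   # speed of the window in the x and y directions
```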
Because of the internal crystal oscillators of the cameras and other factors, there is a small time interval Δt between the right image and the left image; the true physical time t′ of the right image and the time t of the left image satisfy t′ = t + Δt. The moving target extracted from the left image to be registered at time t is denoted A_N = {p_A1, p_A2 … p_AN}, and the moving target extracted from the right image to be registered at the same time is denoted B_N = {p_B1, p_B2 … p_BN}. The position-compensation times for the two images are respectively:
t_A = t + Δt/2
t_B = t′ − Δt/2 = t + Δt/2.
After position compensation, the foreground sets of the left and right images to be registered are:
A_N′ = {p_A1′, p_A2′ … p_AN′}
[Compensation formulas reproduced as formula images PCTCN2022107421-appb-000021 and appb-000022 in the original filing.]
B_N′ = {p_B1′, p_B2′ … p_BN′}
[Compensation formulas reproduced as formula images PCTCN2022107421-appb-000023 and appb-000024 in the original filing.]
where p_Ai′ denotes the pixels constituting the compensated foreground set of A, p_Aix the pixel component in the x direction before compensation, p_Aix′ the pixel component after compensation in the x direction, u_A the speed of the pixel in the x direction, p_Aiy the pixel component in the y direction before compensation, p_Aiy′ the pixel component after compensation in the y direction, u_B the speed of the pixel in the y direction, and Δt the difference between the physical times of the two images; p_Bi′ denotes the pixels constituting the compensated foreground set of B, p_Bix the pixel component in the x direction before compensation, p_Bix′ the pixel component after compensation in the x direction, u_B the speed of the pixel in the x direction, p_Biy the pixel component in the y direction before compensation, p_Biy′ the pixel component after compensation in the y direction, and v_B the speed of the pixel in the y direction.
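The compensation formulas are likewise reproduced only as images; the sketch below assumes the natural reading of the text, namely that each foreground set is shifted by its own velocity over half of the time difference Δt (image A forward to t + Δt/2, image B backward to t′ − Δt/2), so the sign conventions are assumptions.

```python
def compensate_foregrounds(points_a, vel_a, points_b, vel_b, dt):
    """Hedged sketch of bidirectional compensation based on position prediction.
    points_a, points_b: lists of (x, y) foreground pixels of images A and B
    vel_a, vel_b: (u, v) pixel speeds of the foreground targets in A and B
    dt: difference between the physical times of the two images."""
    ua, va = vel_a
    ub, vb = vel_b
    half = dt / 2.0
    comp_a = [(x + ua * half, y + va * half) for (x, y) in points_a]  # A predicted forward by dt/2
    comp_b = [(x - ub * half, y - vb * half) for (x, y) in points_b]  # B predicted backward by dt/2
    return comp_a, comp_b
```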
Step 3: Image A and image B to be registered each pass through the DSCNN network to extract features; the two feature extraction networks share weight parameters, and their outputs then pass through a regression network composed of 5 fully connected layers, which finally outputs the transformation parameters of the image registration.
The feature extraction network used for image registration is based on a dense convolutional neural network. The network structure consists of three dense network blocks, a convolutional layer and a pooling layer, where each dense network block consists of a convolutional layer, a pooling layer and a dense network; the output is flattened and connected to the regression network. As the number of convolutional layers of a CNN increases, the input information and gradient information of the network gradually vanish with increasing depth. ResNets and Highway Networks have confirmed, in different ways, that the shorter the path from the input to the end of the neural network, the better the vanishing-gradient problem can be alleviated. ResNets shunt the signal through identity connections to obtain better gradient flow and information. FractalNets guarantee short paths in the network and reduce the impact of vanishing gradients. Therefore, the present invention introduces dense network blocks into the feature extraction network and proposes a dense-network-based feature extraction network, which both extracts relatively comprehensive feature information and avoids the vanishing-gradient problem, as shown in Figure 6. The layers of each dense network block in the figure are directly connected to one another, ensuring maximal information flow between the layers of the network. The input of each layer in the dense network block is the output of all previous layers, preserving the feed-forward nature of the network, and the feature map of the current layer is also passed to the subsequent layers as input. The regression network of the DSCNN in the present invention consists of 5 fully connected layers; the fully connected layers are used to find nonlinear relationships between features and to perform high-level reasoning. The final output is the transformation parameters that can align the two images.
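A PyTorch sketch of such a DSCNN registration network is given below. The channel widths, the number of layers inside each dense block, the fixed 4×4 pooled map and the 8 output parameters are illustrative assumptions; only the overall layout (three dense blocks plus a convolutional and a pooling layer, shared over the two images, followed by five fully connected layers) follows the description above.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """One 'dense network block': a convolutional layer, a pooling layer, then a
    small densely connected network in which each layer takes the concatenation
    of all previous outputs as input. Channel counts and depth are assumptions."""
    def __init__(self, in_ch, growth=16, n_layers=3):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_ch, growth, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.dense = nn.ModuleList([
            nn.Sequential(nn.Conv2d(growth * (i + 1), growth, 3, padding=1),
                          nn.ReLU(inplace=True))
            for i in range(n_layers)])
        self.out_channels = growth * (n_layers + 1)

    def forward(self, x):
        feats = [self.head(x)]
        for layer in self.dense:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense, feed-forward connectivity
        return torch.cat(feats, dim=1)

class DSCNNRegistration(nn.Module):
    """Sketch of the DSCNN registration model: a weight-shared feature extractor
    (three dense blocks, a convolutional layer and a pooling layer) applied to
    both images, and a regression head of five fully connected layers producing
    the transformation parameters (8 here, an assumed parameterization)."""
    def __init__(self, n_params=8):
        super().__init__()
        b1 = DenseBlock(3)
        b2 = DenseBlock(b1.out_channels)
        b3 = DenseBlock(b2.out_channels)
        self.extract = nn.Sequential(
            b1, b2, b3,
            nn.Conv2d(b3.out_channels, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(4))                      # fixed 4x4 map before flattening
        self.regress = nn.Sequential(                     # five fully connected layers
            nn.Linear(2 * 128 * 4 * 4, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 64), nn.ReLU(inplace=True),
            nn.Linear(64, n_params))

    def forward(self, img_a, img_b):
        fa = torch.flatten(self.extract(img_a), 1)        # same extractor: shared weights
        fb = torch.flatten(self.extract(img_b), 1)
        return self.regress(torch.cat([fa, fb], dim=1))   # transformation parameters
```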
Step 4: Perform VGG16-based image registration for images from which no moving target is extracted or for which moving-target matching fails. The core idea of the bidirectional-compensation-based image registration algorithm proposed by the present invention is to use different registration algorithms for images with different foregrounds so as to achieve higher registration accuracy, as shown in Figure 1. DSCNN-based image registration is performed for images in which a moving foreground is successfully detected and matched; when no target is detected or matching is unsuccessful, VGG16-based image registration is performed directly, without the bidirectional compensation operation, to reduce the overall time complexity of the algorithm.
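In pseudocode terms the dispatch can be sketched as follows; extract_and_match, bidirectional_compensate, register_dscnn and register_vgg16 are hypothetical helper names standing in for Steps 1–3 and the VGG16 branch, not functions defined by the patent.

```python
def register_pair(img_a, img_b, dt):
    """Sketch of the overall bidirectional-compensation registration pipeline:
    DSCNN registration for pairs with a detected and matched moving foreground
    (after compensation), VGG16-based registration otherwise."""
    fg_a, fg_b, matched = extract_and_match(img_a, img_b)      # Step 1: ViBe + SURF matching
    if matched:
        fg_a, fg_b = bidirectional_compensate(fg_a, fg_b, dt)  # Step 2: position prediction
        return register_dscnn(img_a, img_b)                    # Step 3: first model
    return register_vgg16(img_a, img_b)                        # Step 4: second model
```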
The VGG16-based image registration algorithm mainly comprises three parts: a feature extraction network, a feature matching network and a regression network. The feature extraction network part uses an improved VGG-16; the two matching branches share weights, the matching network performs feature-descriptor matching, and its output is passed through the regression network to output the parameters of the geometric transformation. The feature extraction network adopts a standard convolutional neural network architecture: the images to be registered are fed into a convolutional neural network without fully connected layers to generate feature maps. The convolutional neural network takes part of the VGG-16 network, and its structure is shown in Figure 7.
Classical geometric estimation for image registration discards the original descriptors and focuses on the degree of similarity between descriptor pairs, because the pair similarities and spatial positions already contain the information needed for geometric estimation; descriptor pairs are pruned by thresholding the similarity values and keeping only matches to the most similar neighbors. The matching network of the present invention also draws on this idea and adopts a correlation layer, considering only the spatial positions and similarities of descriptor pairs rather than the original descriptors. The matching network of the present invention adopts a structure of a correlation layer and a normalization layer, where the correlation layer computes the similarity of all feature descriptors and the normalization layer processes and normalizes the similarity scores to remove ambiguous matches.
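A PyTorch sketch of this branch is given below. The truncation point of VGG-16, the 15×15 working resolution of the correlation map and the six affine output parameters are assumptions of the sketch; the correlation and normalization steps follow the structure described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class VGG16Registration(nn.Module):
    """Sketch of the VGG16-based branch: a truncated, weight-shared VGG-16
    feature extractor, a correlation layer scoring all descriptor pairs, a
    normalization step suppressing ambiguous matches, and a regression head
    outputting the geometric transformation parameters."""
    def __init__(self, n_params=6):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.extract = nn.Sequential(*list(vgg.children())[:16],   # part of VGG-16 (through conv3_3)
                                     nn.AdaptiveAvgPool2d(15))     # fixed 15x15 feature map
        self.regress = nn.Sequential(
            nn.Conv2d(15 * 15, 128, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 5, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_params))

    def forward(self, img_a, img_b):
        fa = F.normalize(self.extract(img_a), dim=1)      # shared-weight feature extraction
        fb = F.normalize(self.extract(img_b), dim=1)
        b, c, h, w = fa.shape
        # correlation layer: similarity of every descriptor in A with every descriptor in B
        corr = torch.bmm(fb.view(b, c, h * w).transpose(1, 2), fa.view(b, c, h * w))
        corr = corr.view(b, h * w, h, w)
        corr = F.relu(F.normalize(corr, dim=1))           # normalize scores to remove ambiguous matches
        return self.regress(corr)                         # parameters of the geometric transformation
```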
It should be noted that the above description of the embodiments is only intended to help understand the method of the present application and its core idea. For those of ordinary skill in the art, several improvements and modifications can also be made to the present application without departing from the principles of the present application, and such improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (6)

  1. A panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse, characterized in that the panoramic image registration method comprises the following steps:
    S1. using a ViBe-based background subtraction method to extract moving targets from two images to be registered, performing adaptive deformation pixel correction on the image foreground sets from which moving targets are extracted, extracting image features with the SURF feature extraction algorithm, and completing moving-target matching according to a feature threshold;
    S2. using the sparse optical flow method to detect the motion direction and speed of the successfully matched moving targets, calculating the logical time from the physical time of the cameras, obtaining the speed of the moving targets from the difference between the physical times of the two cameras, and compensating each moving target;
    S3. building a first image registration model based on a dense convolutional neural network and a second image registration model based on the VGG16 network; performing image registration on images for which target extraction failed, images for which target matching failed, and images that have undergone bidirectional compensation, the images being divided into two classes according to whether they have undergone bidirectional compensation, wherein the first image registration model is used to register the images that have undergone bidirectional compensation based on position prediction, and the second image registration model is used to register the images that have not undergone bidirectional compensation, including images for which target extraction failed and images for which target matching failed.
  2. The panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to claim 1, characterized in that, in step S1, the process of extracting moving targets from the two images to be registered with the ViBe-based background subtraction method and performing adaptive deformation pixel correction on the image foreground sets from which moving targets are extracted comprises the following steps:
    for each pixel of the image sequence to be detected, building a background model M(x) containing N sample values, M(x) = {v_1, v_2 … v_N}, where v_i denotes the background sample with index i, i = 1, 2, …, N;
    updating the background model M(x) as follows:
    M_0(x) = {v_y (y | y ∈ N_G(x))}
    where N_G(x) denotes the spatial neighborhood of the pixel at position x and v_y denotes the color value of a pixel in this spatial neighborhood; the N sample points of the background model are generated by random sampling from the neighborhood pixels;
    correcting the pixels according to the following formulas:
    [Correction formulas reproduced as formula images PCTCN2022107421-appb-100001 to appb-100003 in the original filing.]
    where (x_d, y_d) are the coordinates of the deformed pixel, (x_c, y_c) are the coordinates of the deformation-center pixel, (x_u, y_u) are the corrected pixel coordinates, K_1 and K_2 are the first-order and second-order radial distortion coefficients respectively, r is the distance from the deformed coordinates to the center pixel coordinates, and α and 1-α represent the relative position of the camera focal length:
    [Formula images PCTCN2022107421-appb-100004 and appb-100005 of the original filing.]
    where f_max and f_min are the maximum and minimum values of the camera focal length respectively, and f_c is the actual focal length of the camera.
  3. The panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to claim 1, characterized in that, in step S2, the process of compensating each moving target comprises the following steps:
    using the sparse optical flow method to calculate the speed and direction of the moving targets in the foreground, giving speeds u and v in the x and y directions respectively:
    [Formula image PCTCN2022107421-appb-100006 of the original filing.]
    where [formula images PCTCN2022107421-appb-100007 and appb-100008] denote the image gradients of the i-th pixel in the x and y directions and [formula image PCTCN2022107421-appb-100009] denotes the gradient of the i-th pixel over time;
    in the two images to be registered, A and B, the foreground set of the left image is expressed as:
    A_N = {p_A1, p_A2 … p_AN};
    the foreground set of the right image is expressed as:
    B_N = {p_B1, p_B2 … p_BN};
    where p_Ai and p_Bi denote the pixels constituting the foreground sets of image A and image B;
    the true physical time t′ of the right image and the time t of the left image satisfy t′ = t + Δt, where Δt is the difference between the physical times of the two images; the time t_A for position compensation of image A and the time t_B for position compensation of image B are respectively:
    t_A = t + Δt/2
    t_B = t′ − Δt/2 = t + Δt/2;
    the compensated left foreground set is:
    A_N′ = {p_A1′, p_A2′ … p_AN′}
    [Compensation formulas reproduced as formula images PCTCN2022107421-appb-100010 and appb-100011 in the original filing.]
    where p_Ai′ denotes the pixels constituting the compensated foreground set of image A, p_Aix denotes the pixel component in the x direction before compensation, p_Aix′ the pixel component after compensation in the x direction, u_A the speed of the pixel in the x direction, p_Aiy the pixel component in the y direction before compensation, p_Aiy′ the pixel component after compensation in the y direction, and u_B the speed of the pixel in the y direction;
    the compensated right foreground set is:
    B_N′ = {p_B1′, p_B2′ … p_BN′}
    [Compensation formulas reproduced as formula images PCTCN2022107421-appb-100012 and appb-100013 in the original filing.]
    where p_Bi′ denotes the pixels constituting the compensated foreground set of image B, p_Bix denotes the pixel component in the x direction before compensation, p_Bix′ the pixel component after compensation in the x direction, u_B the speed of the pixel in the x direction, p_Biy the pixel component in the y direction before compensation, p_Biy′ the pixel component after compensation in the y direction, and v_B the speed of the pixel in the y direction.
  4. The panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to claim 1, characterized in that, in step S3, the first image registration model consists of three dense network blocks, a convolutional layer and a first pooling layer connected in sequence, wherein each dense network block consists of a convolutional layer, a second pooling layer and a dense network connected in sequence; the output of the first pooling layer is flattened and connected to the regression network, and the regression network consists of five fully connected layers connected in sequence and is used to output the parameters of the geometric transformation for image registration.
  5. The panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to claim 1, characterized in that, in step S3, the second image registration model includes a feature extraction network, a feature matching network and a regression network connected in sequence, wherein the feature extraction network includes two convolutional layers and a pooling layer connected in sequence and is used to extract relevant features of the images to be registered and generate feature maps; the network includes two matching branches, the two matching branches share weights, each matching branch uses a correlation map to perform correlation calculations and match feature descriptors, and its output is passed through the regression network to output the parameters of the geometric transformation for image registration.
  6. The panoramic image registration method with a priori bidirectional compensation for the virtual reality metaverse according to claim 1, characterized in that the scenes of the virtual reality metaverse include virtual reality scenes.
PCT/CN2022/107421 2022-03-28 2022-07-22 面向虚拟现实元宇宙的先验双向补偿的全景图像配准方法 WO2023184795A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023517808A JP2024515913A (ja) 2022-03-28 2022-07-22 仮想現実メタバース向けの事前双方向補償ベースのパノラマ画像位置合わせ方法

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210316082.5A CN114842058A (zh) 2022-03-28 2022-03-28 面向虚拟现实的先验驱动双向补偿的全景图像配准方法
CN202210316082.5 2022-03-28

Publications (1)

Publication Number Publication Date
WO2023184795A1 true WO2023184795A1 (zh) 2023-10-05

Family

ID=82563796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107421 WO2023184795A1 (zh) 2022-03-28 2022-07-22 面向虚拟现实元宇宙的先验双向补偿的全景图像配准方法

Country Status (3)

Country Link
JP (1) JP2024515913A (zh)
CN (1) CN114842058A (zh)
WO (1) WO2023184795A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010046309A1 (en) * 2000-03-30 2001-11-29 Toshio Kamei Method and system for tracking a fast moving object
CN104778653A (zh) * 2014-11-28 2015-07-15 浙江工商大学 一种图像配准的方法
CN110059699A (zh) * 2019-03-18 2019-07-26 中南大学 一种基于卷积神经网络的图像中天际线自动检测方法
CN114066955A (zh) * 2021-11-19 2022-02-18 安徽大学 一种红外光图像配准到可见光图像的配准方法

Also Published As

Publication number Publication date
JP2024515913A (ja) 2024-04-11
CN114842058A (zh) 2022-08-02

Similar Documents

Publication Publication Date Title
CN107392964B (zh) 基于室内特征点和结构线结合的室内slam方法
WO2020042419A1 (zh) 基于步态的身份识别方法、装置、电子设备
EP2426642B1 (en) Method, device and system for motion detection
CN102819847B (zh) 基于ptz移动摄像头的运动轨迹提取方法
CN110555408B (zh) 一种基于自适应映射关系的单摄像头实时三维人体姿态检测方法
CN105631899B (zh) 一种基于灰度纹理特征的超声图像运动目标跟踪方法
CN109389630B (zh) 可见光图像与红外图像特征点集确定、配准方法及装置
CN111639580B (zh) 一种结合特征分离模型和视角转换模型的步态识别方法
CN111783748A (zh) 人脸识别方法、装置、电子设备及存储介质
CN112419497A (zh) 基于单目视觉的特征法与直接法相融合的slam方法
CN111536981A (zh) 一种嵌入式的双目非合作目标相对位姿测量方法
CN113744315B (zh) 一种基于双目视觉的半直接视觉里程计
CN115035546B (zh) 三维人体姿态检测方法、装置及电子设备
CN111401113A (zh) 一种基于人体姿态估计的行人重识别方法
CN112989889B (zh) 一种基于姿态指导的步态识别方法
CN114463619B (zh) 基于集成融合特征的红外弱小目标检测方法
Saif et al. Crowd density estimation from autonomous drones using deep learning: challenges and applications
WO2023184795A1 (zh) 面向虚拟现实元宇宙的先验双向补偿的全景图像配准方法
Cai et al. A target tracking method based on KCF for omnidirectional vision
CN112381774A (zh) 一种基于多角度深度信息融合的奶牛体况评分方法及系统
CN115457127A (zh) 基于特征观测数和imu预积分的自适应协方差方法
CN112069997B (zh) 一种基于DenseHR-Net的无人机自主着陆目标提取方法及装置
CN111160115B (zh) 一种基于孪生双流3d卷积神经网络的视频行人再识别方法
JP7253967B2 (ja) 物体対応付け装置、物体対応付けシステム、物体対応付け方法及びコンピュータプログラム
Xu et al. Research on target tracking algorithm based on parallel binocular camera

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2023517808

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934617

Country of ref document: EP

Kind code of ref document: A1