CN102509104A - Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene - Google Patents

Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene

Info

Publication number
CN102509104A
CN102509104A CN201110299857A
Authority
CN
China
Prior art keywords
virtual
augmented reality
virtual objects
virtual-real
reality scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011102998574A
Other languages
Chinese (zh)
Other versions
CN102509104B (en)
Inventor
陈小武
赵沁平
穆珺
王哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN 201110299857 priority Critical patent/CN102509104B/en
Publication of CN102509104A publication Critical patent/CN102509104A/en
Application granted granted Critical
Publication of CN102509104B publication Critical patent/CN102509104B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Abstract

The invention relates to a confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene. The method comprises the following steps: selecting virtual-real classification features; constructing a pixel-level virtual-real classifier from these features; using the same features to extract region contrast features of augmented reality scenes and real scenes, and constructing a region-level virtual-real classifier; for a given test augmented reality scene, detecting with the pixel-level virtual-real classifier and a small detection window to obtain a virtual score map reflecting the virtual-real classification result of each pixel; defining a virtual confidence map and obtaining the virtual confidence map of the test augmented reality scene by thresholding the score map; deriving the rough shape and position of the virtual-object bounding box from the distribution of high virtual-response points in the virtual confidence map; and detecting in the test augmented reality scene with the region-level virtual-real classifier and a large detection window to obtain the final virtual-object detection result. The method can be applied to fields such as film and television production, digital entertainment, and education and training.

Description

Confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene
Technical field
The present invention relates to the fields of image processing, computer vision and augmented reality, and in particular to a confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene.
Background art
Augmented reality is a further extension of virtual reality. Through the necessary equipment, it places computer-generated virtual objects and the objectively existing real environment in the same augmented reality system, presenting to the user an augmented reality environment in which, in terms of sensory perception and experience, virtual objects and the real environment blend seamlessly. With the development of augmented reality technology and the appearance of augmented reality scenes with increasing photorealism, standards and criteria for measuring and evaluating the credibility of augmented reality scenes are urgently needed. Judging whether a scene is an augmented reality scene, and further detecting the virtual objects within it, is one approach to evaluating the image credibility of augmented reality scenes, and therefore has significant research value and application demand.
In 2011, researchers in Italy proposed an image-forgery discrimination method that can detect computer-generated components blended into a real scene. It is the only known prior work that takes augmented reality scenes as its object of study. However, its detection is not object-based: it only detects virtual components in the augmented reality scene, so the detection result may be a region or may be scattered, isolated points.
In 2005, researchers at Dartmouth College in the United States proposed a natural-image statistical model based on wavelet decomposition, using a support vector machine and linear discriminant analysis to classify virtual and real images. The method first decomposes the color image with a wavelet transform and extracts the fourth-order statistics (mean, variance, skewness, kurtosis) of the decomposition coefficients on each subband and orientation, together with fourth-order linear prediction error features between adjacent decomposition coefficients; it then trains classifiers with the support vector machine and linear discriminant analysis, and feeds the test set to the trained classifier to obtain the classification results. This method performs virtual-real classification on whole images, and its classification accuracy fluctuates considerably with the size of the region from which the virtual-real classification features are extracted.
In 2007, researchers in New York, USA proposed a method that uses color filter array interpolation detection features and the consistency of chromatic aberration within an image to distinguish virtual images from real images. The method first extracts these features from the positive and negative samples of the training set, feeds them into a support vector machine to train a classifier, and then feeds the test set to the trained classifier to obtain the classification results.
In 2009, researchers at the University of Alberta in Canada proposed a method for classifying virtual and real images using the consistency of image-block resampling parameters. The principle is that, when a virtual image is generated, operations such as rotation and scaling may be applied to a texture image in the course of mapping the texture onto a model surface, which makes the resampling parameters of different image blocks in the virtual image inconsistent. Virtual and real images can therefore be distinguished by detecting whether the resampling parameters of the image blocks are consistent. In this method, the estimation of the image-block resampling parameters is performed on the whole image.
In 2004, researchers at Compaq Computer's Cambridge Research Laboratory proposed a face detection method based on Haar filters and the AdaBoost classification algorithm. The method first extracts classification features from the training set and trains a classifier on the statistical features of faces and non-faces; the classification features extracted from the image to be detected are then fed into the classifier, and a cascade of classifiers is used to reduce the number of detection windows that must be evaluated, improving efficiency, until the final detection result is obtained. The feature extraction of this method is based on Haar filters, which describe the region contrasts produced by the inherent structure of the human face.
In 2005, researchers at the French national institute for research in computer science and automation (INRIA) proposed a person detection method using histograms of oriented gradients and a linear support vector machine. The method first applies color normalization to the input image, then computes the image gradients, accumulates pixel votes into orientation bins, normalizes overlapping spatial blocks, and generates the histogram of oriented gradients for each detection window; finally, a linear support vector machine classifier separates person from non-person regions to obtain the detection result. This method detects better than other detection methods, but requires the person in the image to remain roughly in an upright standing pose. Its feature extraction uses image gradient histograms, which describe the inherent characteristics of the human silhouette.
The above methods for distinguishing virtual from real images have in common that the virtual-real classification features they extract are not suitable for classifying an arbitrary given region of an image as virtual or real. Moreover, in existing object detection work, the objects handled generally have strong, easily described appearance characteristics available as prior information. By contrast, for virtual-object detection in an augmented reality scene, the detection target (the virtual object) has no easily described explicit appearance priors such as color, shape or size, so discrimination and detection are considerably more difficult.
Summary of the invention
Technical solution of the present invention: to overcome the deficiencies of the prior art, a confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene is provided. The method does not need to know in advance any appearance information about the virtual object, such as color, shape or size, nor the position of the virtual object in the augmented reality scene. Instead, it exploits the physical imaging differences that distinguish virtual objects from real images to extract virtual-real classification features, computes the region self-features and region contrast features of the positive and negative samples of the training set, and constructs a pixel-level virtual-real classifier and a region-level virtual-real classifier. On this basis, virtual-object discrimination and detection based on the virtual confidence map performs preliminary shape estimation and localization of the virtual object, followed by accurate detection.
The technical scheme adopted by the present invention, a confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene, comprises the following steps: build an augmented reality scene training dataset and, exploiting the physical imaging differences between virtual objects and real images, select the virtual-real classification features; on the training dataset, use the virtual-real classification features to extract the region self-features of augmented reality scenes and real scenes respectively, and build a pixel-level virtual-real classifier; on the training dataset, use the virtual-real classification features to extract the region contrast features of augmented reality scenes and real scenes respectively, and build a region-level virtual-real classifier; for a given test augmented reality scene, detect with the pixel-level virtual-real classifier and a small detection window to obtain a virtual score map reflecting the virtual-real classification result of each pixel; define the virtual confidence map and obtain the virtual confidence map of the test augmented reality scene by thresholding the virtual score map; from the distribution of high virtual-response points in the virtual confidence map, obtain the rough shape and position of the virtual-object bounding box; on the basis of this coarse localization, detect in the test augmented reality scene image with the region-level virtual-real classifier and a large detection window to obtain the final virtual-object detection result.
Build the augmented reality scene training dataset. In the training dataset, augmented reality scene images containing virtual objects are used as positive samples and real scene images as negative samples. Exploiting the physical imaging differences between virtual objects and real images, the virtual-real classification features are selected. The chosen features comprise: local statistics, surface gradient, second fundamental form, and Beltrami flow. These virtual-real classification features can be extracted for each pixel of an image.
On the training dataset, the virtual-real classification features are used to extract the region self-features of the augmented reality scenes and build the pixel-level virtual-real classifier. When building the pixel-level classifier, for augmented reality scene images only the virtual-object regions are chosen as positive sample regions, and for real scene images only regions similar to the virtual objects in the positive samples are chosen as negative sample regions. For a given image region, the virtual-real classification features (local statistics, surface gradient, second fundamental form, Beltrami flow) of every point in the region are computed; the moment-of-inertia compression method is then used to compress the virtual-real classification features of the region, yielding the region self-feature of that region. The set of region self-features of the positive and negative samples is fed into a support vector machine classifier for training, producing the pixel-level virtual-real classifier.
On the training dataset, the virtual-real classification features are used to extract the region contrast features of the augmented reality scenes and build the region-level virtual-real classifier. Each positive or negative sample region is itself treated as the object region to be judged, and the equal-area rectangular ring outside the region's bounding box is treated as the background region in which the object lies. The virtual-real classification features of every point in the object region and in the background region are extracted; the features of all points in the object region and in the background region are accumulated to form, respectively, the joint distribution histogram of the object-region features and the joint distribution histogram of the background-region features. The chi-square distance between the two histograms is computed and treated as a feature measuring the difference between the object and its surrounding background; it is called the region contrast feature. The set of region contrast features extracted from the positive and negative samples is fed into a support vector machine classifier for training, producing the region-level virtual-real classifier.
Virtual score map construction proceeds as follows: for the input augmented reality scene image, scan the whole image with a small detection window (window size in [10, 30] × [10, 30] pixels) and a small step (e.g. {1, 2, 3, 4, 5} pixels); compute the region self-feature of the small image block in each detection window; feed the region self-features of all small blocks into the pixel-level virtual-real classifier to obtain a region self-feature score for each block, where a high score indicates high confidence of the pixel-level classifier that the block is a virtual region. Because the detection window is very small relative to the whole image and the windows are densely distributed, the score of each small block can be mapped to the block's center pixel and taken as the virtual score of that pixel. This yields the virtual score map of the whole augmented reality scene image. The computation can be accelerated with a two-dimensional integral image.
Virtual confidence map construction proceeds as follows: threshold the virtual score map of the augmented reality scene image and record all points classified as virtual (positive); set a fixed percentage N% and record the top N% of the points classified as virtual, together with their positions in the original image — these are called high virtual-response points; set a fixed, relatively small constant M (e.g. M ∈ [10, 100]) and record the top M points classified as virtual, together with their positions in the original image — these are called highest virtual-response points. The parameters can be set so that the highest virtual-response points are also contained in the set of high virtual-response points, i.e. the highest virtual-response points are the part of the high virtual-response points with the largest virtual scores. The high virtual-response points, the highest virtual-response points and their positions in the original image together constitute the virtual confidence map.
The rough shape and position of the virtual-object bounding box are inferred as follows: divide the obtained virtual confidence map into five equal-area, possibly overlapping subregions and find the distribution center of the high virtual-response points in each subregion; treat each subregion center as a candidate virtual-object center point; expand the search outward from each center point to obtain a region where high virtual-response points are densely distributed; for each such dense region, approximately infer the candidate object shape of the region (expressed as a candidate virtual-object bounding box) and, combined with the position of the region, form a preliminary virtual-object candidate region; among the several preliminary virtual-object candidate regions, select the one with the largest weighted count of the high virtual-response points and highest virtual-response points it contains as the virtual-object candidate region; this region then carries the rough shape and position information of the virtual-object bounding box.
The coarse localization of the virtual object obtained above is then further refined to obtain the final detection result. The concrete steps are: take the region around the virtual-object candidate region whose area is twice that of the candidate region; within this region, construct several overlapping large detection windows with the same shape and size as the candidate region (the size of the large detection window generally ranges over [200, 500] × [200, 500] pixels, its exact length and width being those of the virtual-object bounding box of the candidate region); take the image block inside each large window and compute its region contrast feature; feed the region contrast features of the blocks in all large windows into the region-level virtual-real classifier, and select the window with the highest classification score as the final detection result for the virtual object.
Compared with the prior art, the beneficial effects of the present invention are:
(1) The present invention takes the virtual object in an augmented reality scene as the detection target; the virtual object can be distinguished and detected as a whole.
(2) The present invention builds a two-level virtual-real classifier, comprising a pixel-level virtual-real classifier and a region-level virtual-real classifier, meeting the needs of both confidence map construction and final virtual-object detection.
(3) The present invention builds a confidence map; based on the virtual confidence map, the approximate position and shape of the virtual object in the augmented reality scene can be obtained without prior information such as the virtual object's appearance, shape or position.
(4) The present invention needs no prior appearance information about the virtual object, such as color, shape or size, nor the position of the virtual object in the augmented reality scene; it therefore has wide applicability and can be generalized to fields such as film and television production, digital entertainment, and education and training.
Description of drawings
Fig. 1 is the overall design structure of the present invention;
Fig. 2 is the flowchart of virtual confidence map construction in the present invention;
Fig. 3 is the flowchart of virtual-object bounding box shape and position inference in the present invention;
Fig. 4 is the flowchart of obtaining candidate center points in the present invention;
Fig. 5 is the flowchart of the expanding search for regions where high virtual-response points are densely distributed in the present invention.
Embodiment
As shown in Fig. 1, the main steps of the present invention are as follows: build an augmented reality scene training dataset and, exploiting the physical imaging differences between virtual objects and real images, select the virtual-real classification features; on the training dataset, use the virtual-real classification features to extract the region self-features of the augmented reality scenes and build the pixel-level virtual-real classifier; on the training dataset, use the virtual-real classification features to extract the region contrast features of the augmented reality scenes and build the region-level virtual-real classifier; for a given test augmented reality scene, perform small-scale detection with the pixel-level virtual-real classifier to obtain a virtual score map reflecting the virtual-real classification result of each pixel; define the virtual confidence map and obtain the virtual confidence map of the test augmented reality scene by thresholding the virtual score map; from the distribution of high virtual-response points in the virtual confidence map, obtain the rough shape and position of the virtual-object bounding box; on the basis of this coarse localization, detect in the test augmented reality scene image with the region-level virtual-real classifier and a large detection window to obtain the final virtual-object detection result.
The training dataset is built to train the virtual-real classifiers. It consists of augmented reality scene images containing virtual objects as positive samples and real scene images as negative samples. When training the pixel-level classifier, for augmented reality scene images only the virtual-object regions are chosen as positive samples, and for real scene images only regions similar to the virtual objects in the positive samples are chosen as negative samples. When training the region-level classifier, for augmented reality scene images the virtual object together with an equal-area surrounding image region is chosen as the positive sample, and for real scene images a region similar to the virtual objects in the positive samples together with an equal-area surrounding image region is chosen as the negative sample.
Region self-feature extraction. For a given image region, compute the virtual-real classification features of every point in the region, comprising local statistics, surface gradient, second fundamental form, and Beltrami flow; then compress the virtual-real classification features of the region with the moment-of-inertia compression method to obtain the region self-feature of that region.
The physical meaning and computation of the local statistics, surface gradient, second fundamental form, and Beltrami flow are as follows.
The local statistics reflect small local edge structure. They are computed as follows: take any point P on the gray-scale version of the original image and the 3 × 3 pixel image block centered at P, and arrange the pixel values of the block in order into a 9-dimensional vector $x = [x_1, x_2, \ldots, x_9]$. The local statistic y at P is a 9-dimensional vector defined as
$$y = \frac{x - \bar{x}}{\|x - \bar{x}\|_D}, \qquad \bar{x} = \frac{1}{9}\sum_{i=1}^{9} x_i,$$
where $\|\cdot\|_D$ denotes the D-norm, defined over all pairs of neighbouring points $i \sim j$ in the image block as
$$\|z\|_D = \sqrt{\sum_{i \sim j} (z_i - z_j)^2}.$$
The local-statistics virtual-real classification feature at an arbitrary point p is the 9-dimensional vector y at that point.
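As a concrete illustration, the following sketch (my own, not code from the patent; it uses NumPy and takes the D-norm over the 4-connected neighbour pairs of the 3 × 3 block as described above) computes the 9-dimensional local-statistics vector for one block:

```python
import numpy as np

# 4-connected neighbour pairs inside a 3x3 block, indexed row-major 0..8
NEIGHBOR_PAIRS = [(i, i + 1) for i in range(9) if i % 3 != 2] + \
                 [(i, i + 3) for i in range(6)]

def local_statistics(block3x3):
    """9-dim local-statistics vector y = (x - mean(x)) / ||x - mean(x)||_D."""
    x = np.asarray(block3x3, dtype=float).reshape(9)
    z = x - x.mean()
    d_norm = np.sqrt(sum((z[i] - z[j]) ** 2 for i, j in NEIGHBOR_PAIRS))
    return z / d_norm if d_norm > 0 else np.zeros(9)  # guard flat blocks
```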
The surface gradient measures nonlinear change characteristics in the real-scene imaging process. The surface gradient S at any point of the image is defined in terms of the image gradient magnitude at that point,
$$|\nabla I| = \sqrt{I_x^2 + I_y^2},$$
where $I_x$ and $I_y$ denote the partial derivatives of the image in the x direction (horizontal) and y direction (vertical), and α is a constant, α = 0.25.
The surface-gradient virtual-real classification feature at an arbitrary point p is formed by joining the image pixel value I at that point with the surface gradient S at that point.
The second fundamental form describes the local convexity and concavity of the imaged surface. Its two components $\lambda_1$ and $\lambda_2$ are the two eigenvalues of the matrix A,
$$A = \frac{1}{1 + I_x^2 + I_y^2}\begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix},$$
where $I_x$, $I_y$ denote the partial derivatives of the image in the x and y directions, and $I_{xx}$, $I_{xy}$, $I_{yy}$ the second-order partial derivatives in the xx, xy and yy directions; the matrix A can be computed from this formula. Writing $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$, where $a_{11}, a_{12}, a_{21}, a_{22}$ are the four elements of A, the two eigenvalues $\lambda_1$ and $\lambda_2$ of A are
$$\{\lambda_1, \lambda_2\} = \frac{a_{11} + a_{22} \pm \sqrt{(a_{11} - a_{22})^2 + 4\, a_{12} a_{21}}}{2}, \qquad \lambda_1 \ge \lambda_2.$$
The second-fundamental-form virtual-real classification feature at an arbitrary point p is formed by joining the image gradient magnitude $|\nabla I|$ at that point with the two components $\lambda_1$, $\lambda_2$ of the second fundamental form at that point.
The Beltrami flow can be used to describe the correlation between different color channels. For color channel c (c ∈ {R, G, B}), the corresponding Beltrami flow $\Delta_g I^c$ is defined as
$$\Delta_g I^c = \frac{1}{\sqrt{|g|}}\,\partial_x\!\Big(\sqrt{|g|}\,\big(g^{xx}\partial_x I^c + g^{xy}\partial_y I^c\big)\Big) + \frac{1}{\sqrt{|g|}}\,\partial_y\!\Big(\sqrt{|g|}\,\big(g^{yx}\partial_x I^c + g^{yy}\partial_y I^c\big)\Big),$$
where $I^c$ denotes the image of color channel c (c ∈ {R, G, B}) of the original image; the operators $\partial_x$ and $\partial_y$ take the partial derivative of their argument in the x and y directions respectively; the matrix g is
$$g = \begin{pmatrix} 1 + (I_x^R)^2 + (I_x^G)^2 + (I_x^B)^2 & I_x^R I_y^R + I_x^G I_y^G + I_x^B I_y^B \\ I_x^R I_y^R + I_x^G I_y^G + I_x^B I_y^B & 1 + (I_y^R)^2 + (I_y^G)^2 + (I_y^B)^2 \end{pmatrix},$$
in which $I_x^R$, $I_y^R$ are the partial derivatives of the R channel (red) in the x and y directions, $I_x^G$, $I_y^G$ those of the G channel (green), and $I_x^B$, $I_y^B$ those of the B channel (blue); |g| is the determinant of g; and $g^{xx}$, $g^{xy}$, $g^{yy}$, $g^{yx}$ are given by
$$g^{-1} = \begin{pmatrix} g^{xx} & g^{xy} \\ g^{yx} & g^{yy} \end{pmatrix},$$
i.e. they are the four elements of the inverse of the matrix g.
The Beltrami-flow virtual-real classification feature at an arbitrary point p is formed by joining, for each color channel c (c ∈ {R, G, B}), the Beltrami flow component $\Delta_g I^c$ at that point with the image gradient magnitude of each color channel at that point.
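The following sketch (my own hedged illustration using NumPy finite differences, not code supplied by the patent) computes the Beltrami flow of one color channel of an RGB image following the formula above:

```python
import numpy as np

def beltrami_flow(rgb, channel):
    """Beltrami flow of one channel of an RGB float image of shape (H, W, 3)."""
    img = rgb.astype(float)
    dy = [np.gradient(img[..., k], axis=0) for k in range(3)]   # d/dy per channel
    dx = [np.gradient(img[..., k], axis=1) for k in range(3)]   # d/dx per channel
    # metric tensor g induced jointly by the three channels
    g11 = 1.0 + sum(d ** 2 for d in dx)
    g22 = 1.0 + sum(d ** 2 for d in dy)
    g12 = sum(dx[k] * dy[k] for k in range(3))
    det = g11 * g22 - g12 ** 2
    # inverse metric components g^xx, g^yy, g^xy (= g^yx)
    gxx, gyy, gxy = g22 / det, g11 / det, -g12 / det
    sq = np.sqrt(det)
    Icx, Icy = dx[channel], dy[channel]
    fx = sq * (gxx * Icx + gxy * Icy)
    fy = sq * (gxy * Icx + gyy * Icy)
    return (np.gradient(fx, axis=1) + np.gradient(fy, axis=0)) / sq
```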
After the four groups of virtual-real classification features of every point in a region have been computed (local statistics, surface gradient, second fundamental form, Beltrami flow), the moment-of-inertia compression method is used to compress them. The method proceeds as follows, considering any one of the four groups separately (each group is handled in the same way). Suppose the given region contains N points in total and the chosen group of virtual-real classification features of any point $P_i$ ($i = 1, \ldots, N$) has M dimensions (the value of M is fixed once one of the four feature groups has been chosen); write the feature vector of point $P_i$ as $v_i = (v_{i1}, \ldots, v_{iM})$. The feature vector $v_i$ of point $P_i$ is now regarded as a particle in the M-dimensional feature space: the particle is assigned a prescribed mass $m_i$, and its position coordinates in the feature space are $v_i = (v_{i1}, \ldots, v_{iM})$. The moment-of-inertia matrix J of the system formed by all N particles can then be computed with the rigid-body moment-of-inertia formula. J is an M × M matrix whose elements are written $J_{jk}$ ($j, k = 1, \ldots, M$) and computed as
$$J_{jk} = \sum_{i=1}^{N} m_i \left( |v_i|^2 \, \delta_{jk} - v_{ij}\, v_{ik} \right), \qquad j, k = 1, \ldots, M,$$
where $m_i$ is the mass of particle $P_i$, $v_i = (v_{i1}, \ldots, v_{iM})$ its position in feature space, $|v_i| = \sqrt{\sum_{j=1}^{M} v_{ij}^2}$ its Euclidean distance to the origin, and $\delta_{jk}$ the Kronecker delta, $\delta_{jk} = 1$ if $j = k$ and $0$ if $j \ne k$. This determines all elements $J_{jk}$ of the moment-of-inertia matrix J. By the symmetry of the moment-of-inertia matrix, $J_{jk} = J_{kj}$, so only the elements on and above the main diagonal, $J_{jk}$ with $j \le k$, are kept; these represent all the information of the original matrix J.
All elements $J_{jk}$ ($j, k = 1, \ldots, M$, $j \le k$) of the moment-of-inertia matrix are taken and joined with the centroid vector of all the particles and with the mean, variance, skewness and kurtosis of the distances $|v_i|$ of all the particles to the origin, forming one feature vector. This feature vector is the compressed representation, obtained by the moment-of-inertia compression method, of this group of virtual-real classification features over all points in the region. Joining the four compressed representations obtained in this way yields the region self-feature of the region. Because the moment-of-inertia matrix describes the distribution of many particles in feature space well, the moment-of-inertia compression can reduce the many data points of a region to a low dimension while preserving, to a large extent, the information of the original data distribution.
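A compact sketch of this compression for one feature group (my own illustration; the equal masses $m_i = 1/N$ are an assumption, since the patent's mass formula is not reproduced in this text):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def inertia_compress(V):
    """Compress an (N, M) set of per-point feature vectors into one vector
    via the inertia matrix J_jk = sum_i m_i (|v_i|^2 * d_jk - v_ij * v_ik)."""
    V = np.asarray(V, dtype=float)
    n, m = V.shape
    mass = np.full(n, 1.0 / n)                  # assumed equal masses m_i = 1/N
    r = np.linalg.norm(V, axis=1)               # distances |v_i| to the origin
    J = (mass * r**2).sum() * np.eye(m) - (V * mass[:, None]).T @ V
    upper = J[np.triu_indices(m)]               # J_jk with j <= k
    centroid = (mass[:, None] * V).sum(axis=0)  # centroid of all particles
    stats = [r.mean(), r.var(), skew(r), kurtosis(r)]
    return np.concatenate([upper, centroid, stats])
```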
Region contrast feature extraction. For a given image region, the region itself is regarded as the object region to be judged, and the equal-area rectangular ring immediately outside the region's bounding box is regarded as the background region in which the object lies. The virtual-real classification features of every point in the object region and in the background region are computed; the features of all points in the object region and in the background region are accumulated to form the joint distribution histograms of the object-region features and of the background-region features. The chi-square distance between the two joint distribution histograms is computed and treated as a feature measuring the contrast, or difference, between the object and its surrounding background; it is called the region contrast feature.
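A small sketch of this contrast measure (my own illustration; the binning scheme, the chi-square variant, and the restriction to low-dimensional feature groups are assumptions not specified in this text):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two normalized histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def region_contrast_feature(obj_feats, bg_feats, bins=8):
    """Joint histogram of per-point features inside the object region vs. the
    surrounding background ring, compared with the chi-square distance.
    obj_feats, bg_feats: (N, d) arrays with small d (e.g. one feature group)."""
    rng = [(min(o.min(), b.min()), max(o.max(), b.max()))
           for o, b in zip(obj_feats.T, bg_feats.T)]
    h_obj, _ = np.histogramdd(obj_feats, bins=bins, range=rng)
    h_bg, _ = np.histogramdd(bg_feats, bins=bins, range=rng)
    h_obj, h_bg = h_obj / h_obj.sum(), h_bg / h_bg.sum()
    return chi_square_distance(h_obj.ravel(), h_bg.ravel())
```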
The pixel-level virtual-real classifier and the region-level virtual-real classifier are built to judge, from the viewpoint of the region self-feature and of the region contrast feature respectively, whether a given region belongs to the area where the virtual object lies.
Pixel-level virtual-real classifier construction: the positive and negative samples of the training set are input; the region self-features of the positive and negative samples are extracted; and the set of extracted region self-features is fed into a support vector machine classifier for training, yielding the pixel-level virtual-real classifier. A characteristic of the pixel-level virtual-real classifier is that the feature compression method it uses makes its classification results scale-adaptive: even when the region self-feature to be classified is extracted from a region whose size differs markedly from the region sizes of the training set, the classifier still judges fairly accurately whether the given region belongs to the area where the virtual object lies. Specifically, although the pixel-level classifier is trained on the set of region self-features of the virtual-object regions in the training set, experimental results show that its classification results remain fairly accurate for region features from much smaller regions (sized [10, 30] × [10, 30] pixels). Because this classifier classifies small regions, and these small regions are used to approximately describe the pixel at the region center, it is called the pixel-level virtual-real classifier.
Region-level virtual-real classifier construction: the positive and negative samples of the training set are input; the region contrast features of the positive and negative samples are extracted; and the set of extracted region contrast features is fed into a support vector machine classifier for training, yielding the region-level virtual-real classifier. Because the classification feature used by the region-level virtual-real classifier reflects the difference in overall distribution between a region and the background in which it lies, the object to be detected can be discriminated and detected well as a whole.
Build the virtual score map. For the input augmented reality scene image, scan the whole image with a small detection window (window size [10, 30] × [10, 30] pixels) and a small step (e.g. {1, 2, 3, 4, 5} pixels); compute the region self-feature of the small image block in each detection window; feed the region self-features of all small blocks into the pixel-level virtual-real classifier to obtain a region self-feature score for each block, where a high score means the pixel-level classifier is highly confident that the block is a virtual region. Because the detection window is very small relative to the whole image and the windows are densely distributed, the region self-feature score of each small block can be mapped to the block's center pixel and taken as the virtual score of that pixel; this yields the virtual score map of the whole augmented reality scene image. Since computing the virtual-real classification features and compressing them into the region self-feature is time-consuming, and the region self-feature must be computed for a large number of overlapping image blocks, an integral-image method is used in this step to accelerate the computation. The virtual score map obtained in this way reflects well whether a point of the image belongs to the virtual object: experimental results show that points with high virtual scores generally concentrate in the virtual-object region, while points outside the virtual-object region generally have lower virtual scores.
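A simplified sketch of the score-map construction (my own illustration; `extract_region_self_feature` is a stand-in for the region self-feature computation described above, `pixel_level_svm` is assumed to expose a scikit-learn-style `decision_function`, and the integral-image speed-up is omitted):

```python
import numpy as np

def virtual_score_map(image, pixel_level_svm, extract_region_self_feature,
                      win=20, step=4):
    """Dense small-window scan; each window's classifier score is written to
    the window's center pixel, producing the virtual score map."""
    h, w = image.shape[:2]
    score_map = np.full((h, w), -np.inf)
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            block = image[y:y + win, x:x + win]
            feat = extract_region_self_feature(block)
            score = pixel_level_svm.decision_function([feat])[0]
            score_map[y + win // 2, x + win // 2] = score
    return score_map
```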
Build the virtual confidence map; its flow is shown in Fig. 2. First, the obtained virtual score map is thresholded, and all points classified as virtual (positive) are selected and recorded. A fixed percentage N% is set, and the top N% of the points classified as virtual, together with their positions in the original image, are selected and recorded; these points are called high virtual-response points. A fixed and relatively small constant M is set (e.g. M ∈ [10, 100]), and the top M points classified as virtual, together with their positions in the original image, are selected and recorded; these points are called highest virtual-response points. The number of highest virtual-response points is much smaller than the number of high virtual-response points. The high virtual-response points, the highest virtual-response points and their positions in the original image together constitute the virtual confidence map. The virtual confidence map built in this way reflects well whether a point of the image belongs to the virtual object: experimental results show that high virtual-response points generally concentrate in the virtual-object region, where they are distributed relatively densely; similarly, the highest virtual-response points generally appear only in the virtual-object region, and outside the virtual-object region highest virtual-response points generally do not appear. High virtual-response points and highest virtual-response points that appear outside the virtual-object region are referred to as noise points.
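A sketch of the thresholding step (my own illustration; N and M are the parameters named above, and a positive classifier score is taken as the "classified virtual" criterion):

```python
import numpy as np

def virtual_confidence_map(score_map, n_percent=10, m=50):
    """Select high and highest virtual-response points from the score map."""
    ys, xs = np.where(score_map > 0)              # points classified as virtual
    scores = score_map[ys, xs]
    order = np.argsort(scores)[::-1]              # descending by virtual score
    n_high = max(1, int(len(order) * n_percent / 100))
    high = [(ys[i], xs[i]) for i in order[:n_high]]              # high response
    highest = [(ys[i], xs[i]) for i in order[:min(m, n_high)]]   # top-M subset
    return high, highest
```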
The rough shape and position of the virtual-object bounding box are inferred as shown in Fig. 3, through the following steps: divide the map into subregions and obtain candidate center points; perform an expanding search to obtain the regions where high virtual-response points are densely distributed; determine the preliminary virtual-object candidate regions; and determine the virtual-object candidate region. Specifically, dividing into subregions means dividing the obtained virtual confidence map into five equal-area, possibly overlapping subregions. The flow for obtaining candidate center points is shown in Fig. 4: according to the distribution of the high virtual-response points in each subregion, the mean shift algorithm is used to find the distribution center of the high virtual-response points in that subregion; this center is called a candidate center point. The number of candidate center points is k (k ≤ 5; k < 5 corresponds to some subregion containing no high virtual-response points), and the center point of the region where the virtual object lies may be assumed to be among these k candidate center points. The expanding search, which obtains the regions where high virtual-response points are densely distributed, proceeds as shown in Fig. 5 (a code sketch follows below): for each candidate center point, take the candidate center point as the circle center and a radius that grows by a fixed step, dynamically constructing a circular search region that expands step by step; when the number of high virtual-response points in the current search region no longer increases, the border of the virtual-object region is considered to have been reached. Ideally, the stopping condition of the expanding search is that the increment in the number of high virtual-response points in the search region is zero as the search radius increases; but to eliminate the influence of the noise points present in the virtual confidence map, a noise suppression parameter is set and the stopping condition is strengthened: the search continues only while, as the search radius increases, the increment in the number of high virtual-response points in the search region exceeds the noise suppression parameter.
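A sketch of the expanding circular search (my own illustration; the radius step and noise suppression parameter values are assumptions):

```python
import numpy as np

def expanding_search(center, high_points, radius_step=5.0, noise_param=2):
    """Grow a circle around a candidate center until the count of high
    virtual-response points stops increasing by more than noise_param."""
    pts = np.asarray(high_points, dtype=float)        # (K, 2) array of (y, x)
    dists = np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1)
    radius, prev_count = radius_step, 0
    while True:
        count = int((dists <= radius).sum())
        if count - prev_count <= noise_param:          # growth stalled: stop
            break
        prev_count, radius = count, radius + radius_step
    return radius, pts[dists <= radius]                # dense-region points
```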
Determination of the preliminary virtual-object candidate regions: from each region where high virtual-response points are densely distributed, the candidate object shape of that region is approximately inferred and, combined with the position of the region, forms a preliminary virtual-object candidate region. When the expanding search stops and the dense region of high virtual-response points has been obtained, the set P of all high virtual-response points in that region is known, and the candidate object shape of the region, expressed as the shape of the candidate object bounding box, can be derived from P:
$$x_{\min} = \min(\{x \mid \langle x, y\rangle \in P\}); \qquad x_{\max} = \max(\{x \mid \langle x, y\rangle \in P\});$$
$$y_{\min} = \min(\{y \mid \langle x, y\rangle \in P\}); \qquad y_{\max} = \max(\{y \mid \langle x, y\rangle \in P\});$$
where $x_{\min}$, $x_{\max}$ are the minimum and maximum x coordinates, and $y_{\min}$, $y_{\max}$ the minimum and maximum y coordinates, of the area covered by the candidate object bounding box in image coordinates. This determines the position and shape of the candidate object bounding box in the image.
The candidate object shape of the region, combined with the position of the region (the candidate center point), constitutes a preliminary virtual-object candidate region.
Among the k preliminary virtual-object candidate regions obtained in this way, the one with the largest weighted count of the high virtual-response points and highest virtual-response points it contains is selected as the virtual-object candidate region; this region then carries the approximate shape and position information of the virtual-object bounding box.
The coarse localization of the virtual object obtained above is then further refined, to reduce errors that may have arisen in the computation of the virtual-object candidate region, and the final detection result is obtained. The concrete steps are: take the region around the virtual-object candidate region whose area is twice that of the candidate region; within this region, construct several overlapping large detection windows with the same shape and size as the candidate region (the size of the large detection window generally ranges over [200, 500] × [200, 500] pixels, its exact length and width being those of the virtual-object bounding box of the candidate region); take the image block inside each large window and compute its region contrast feature; feed the region contrast features of the blocks in all large windows into the region-level virtual-real classifier for classification, and select the window with the highest classification score as the final detection result for the virtual object.
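A sketch of this refinement step (my own illustration; `region_contrast_feature` and the trained `region_level_svm` stand in for the components described above, the search extent around the candidate box is an assumption, and a scikit-learn-style `decision_function` is assumed):

```python
import numpy as np

def refine_detection(image, candidate_box, region_level_svm,
                     region_contrast_feature, step=10):
    """Slide a window the size of the candidate box over a neighbourhood
    around it and keep the window the region-level classifier scores best."""
    y0, x0, bh, bw = candidate_box                    # top-left corner + size
    ry, rx = bh // 2, bw // 2                         # assumed search extent
    best, best_score = None, -np.inf
    for y in range(max(0, y0 - ry), min(image.shape[0] - bh, y0 + ry) + 1, step):
        for x in range(max(0, x0 - rx), min(image.shape[1] - bw, x0 + rx) + 1, step):
            feat = region_contrast_feature(image, (y, x, bh, bw))
            score = region_level_svm.decision_function([feat])[0]
            if score > best_score:
                best, best_score = (y, x, bh, bw), score
    return best, best_score
```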
The above is only a basic description of the present invention; any equivalent transformation made according to the technical scheme of the present invention falls within the protection scope of the present invention.
Parts of the present invention not described in detail belong to techniques well known to those skilled in the art.

Claims (8)

1. A confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene, characterized in that the implementation steps are as follows:
(1) build an augmented reality scene training dataset, with augmented reality images containing virtual objects as positive samples and real scene images as negative samples; and, exploiting the physical imaging differences between virtual objects and real images, select the virtual-real classification features;
(2) on the training dataset, use the virtual-real classification features to extract the region self-features of the augmented reality scenes and real scenes respectively, and build a pixel-level virtual-real classifier;
(3) on the training dataset, use the virtual-real classification features to extract the region contrast features of the augmented reality scenes and real scenes respectively, and build a region-level virtual-real classifier;
(4) for a given test augmented reality scene, detect with the pixel-level virtual-real classifier and a small detection window to obtain a virtual score map reflecting the virtual-real classification result of each pixel;
(5) define the virtual confidence map, and obtain the virtual confidence map of the test augmented reality scene by thresholding the virtual score map;
(6) based on the virtual confidence map, perform coarse localization of the virtual object to obtain the rough shape and position of the virtual-object bounding box;
(7) on the basis of the coarse localization of the virtual object, detect in the test augmented reality scene image with the region-level virtual-real classifier and a large detection window to obtain the final detection result for the virtual object.
2. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that the virtual-real classification features chosen in said step (1) comprise: local statistics, surface gradient, second fundamental form, and Beltrami flow; these virtual-real classification features can be extracted for each pixel of an image.
3. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that, when building the pixel-level classifier in said step (2), on the training dataset, for augmented reality scene images only the virtual-object regions are chosen as positive sample regions, and for real scene images only regions similar to the virtual objects in the positive samples are chosen as negative sample regions; for a given image region, the virtual-real classification features of every point in the region are computed; the moment-of-inertia compression method is used to compress the virtual-real classification features of the given positive and negative sample regions, yielding the region self-feature of each region; and the set of region self-features of the positive and negative samples is fed into a support vector machine classifier for training, producing the pixel-level virtual-real classifier.
4. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that, when building the region-level virtual-real classifier in said step (3), on the training dataset, each positive or negative sample region is itself regarded as the object region to be judged, and the equal-area rectangular ring outside the region's bounding box is regarded as the background region in which the object lies; the virtual-real classification features of every point in the object region and in the background region are extracted; the features of all points in the object region and in the background region are accumulated to form, respectively, the joint distribution histogram of the object-region features and the joint distribution histogram of the background-region features; the chi-square distance between the two histograms is computed and regarded as a feature measuring the difference between the object and its background, called the region contrast feature; and the set of region contrast features extracted from the positive and negative samples is fed into a support vector machine classifier for training, producing the region-level virtual-real classifier.
5. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that the virtual score map of said step (4) is constructed as follows: for the input augmented reality scene image, scan the whole image with a small detection window and a small step; compute the region self-feature of the small image block in each small detection window; feed the region self-features of all small blocks into the pixel-level virtual-real classifier to obtain a region self-feature score for each block, where a high score means the pixel-level classifier is highly confident that the block is a virtual region; because the detection window is very small relative to the whole image and the windows are densely distributed, the region self-feature score of each small block is mapped to the block's center pixel and taken as the virtual score of that pixel; this constitutes the virtual score map of the whole augmented reality scene image.
6. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that the virtual confidence map of said step (5) is constructed as follows: threshold the virtual score map of the augmented reality scene image and record all points classified as virtual (positive); set a fixed percentage N% and record the top N% of the points classified as virtual, together with their positions in the original image, these points being called high virtual-response points; set a fixed, relatively small constant M and record the top M points classified as virtual, together with their positions in the original image, these points being called highest virtual-response points; the parameters can be set so that the highest virtual-response points are also contained in the set of high virtual-response points, i.e. the highest virtual-response points are the part of the high virtual-response points with the largest virtual scores; the high virtual-response points, the highest virtual-response points and their positions in the original image together constitute the virtual confidence map.
7. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that the rough shape and position of the virtual-object bounding box in said step (6) are obtained as follows: divide the obtained virtual confidence map into five equal-area, possibly overlapping subregions and find the distribution center of the high virtual-response points in each subregion; regard each subregion center as a candidate virtual-object center point and expand the search outward from each center point to obtain the regions where high virtual-response points are densely distributed; for each such dense region, approximately infer the candidate object shape of the region and, combined with the position of the region, form a preliminary virtual-object candidate region; among the preliminary virtual-object candidate regions, select the one with the largest weighted count of the high virtual-response points and highest virtual-response points it contains as the virtual-object candidate region, this region then carrying the rough shape and position information of the virtual-object bounding box.
8. The confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene according to claim 1, characterized in that the virtual-object detection of said step (7) is specifically: sample densely around the virtual-object candidate region in the test augmented reality scene, construct several overlapping detection windows, classify them with the region-level virtual-real classifier, and choose the detection window with the best score as the final detection result for the virtual object.
CN 201110299857 2011-09-30 2011-09-30 Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene Expired - Fee Related CN102509104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110299857 CN102509104B (en) 2011-09-30 2011-09-30 Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110299857 CN102509104B (en) 2011-09-30 2011-09-30 Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene

Publications (2)

Publication Number Publication Date
CN102509104A true CN102509104A (en) 2012-06-20
CN102509104B CN102509104B (en) 2013-03-20

Family

ID=46221185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110299857 Expired - Fee Related CN102509104B (en) 2011-09-30 2011-09-30 Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene

Country Status (1)

Country Link
CN (1) CN102509104B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520904A (en) * 2009-03-24 2009-09-02 上海水晶石信息技术有限公司 Reality augmenting method with real environment estimation and reality augmenting system
WO2011084720A2 (en) * 2009-12-17 2011-07-14 Qderopateo, Llc A method and system for an augmented reality information engine and product monetization therefrom
CN101893935A (en) * 2010-07-14 2010-11-24 北京航空航天大学 Cooperative construction method for enhancing realistic table-tennis system based on real rackets

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102798583B (en) * 2012-07-13 2014-07-30 长安大学 Ore rock block degree measurement method based on improved FERRET
CN102798583A (en) * 2012-07-13 2012-11-28 长安大学 Ore rock block degree measurement method based on improved FERRET
US9858482B2 (en) 2013-05-28 2018-01-02 Ent. Services Development Corporation Lp Mobile augmented reality for managing enclosed areas
WO2015062164A1 (en) * 2013-10-31 2015-05-07 The Chinese University Of Hong Kong Method for optimizing localization of augmented reality-based location system
CN105654504A (en) * 2014-11-13 2016-06-08 丁业兵 Adaptive bandwidth mean value drift object tracking method based on rotary inertia
CN104780180B (en) * 2015-05-12 2019-02-12 国电物资集团有限公司电子商务中心 A kind of Virtual Reality Platform based on mobile terminal
CN104780180A (en) * 2015-05-12 2015-07-15 成都绿野起点科技有限公司 Virtual reality platform based on mobile terminals
CN104794754A (en) * 2015-05-12 2015-07-22 成都绿野起点科技有限公司 Distribution type virtual reality system
CN104869160A (en) * 2015-05-12 2015-08-26 成都绿野起点科技有限公司 Distributed virtual reality system based on cloud platform
CN104794754B (en) * 2015-05-12 2018-04-20 成都绿野起点科技有限公司 A kind of Distributed Virtual Reality System
CN104869160B (en) * 2015-05-12 2018-07-31 成都绿野起点科技有限公司 A kind of Distributed Virtual Reality System based on cloud platform
CN108492374A (en) * 2018-01-30 2018-09-04 青岛中兴智能交通有限公司 The application process and device of a kind of AR on traffic guidance
CN108492374B (en) * 2018-01-30 2022-05-27 青岛中兴智能交通有限公司 Application method and device of AR (augmented reality) in traffic guidance
CN110555358A (en) * 2018-06-01 2019-12-10 苹果公司 method and apparatus for detecting and identifying features in an AR/VR scene
CN110555358B (en) * 2018-06-01 2023-09-12 苹果公司 Method and apparatus for detecting and identifying features in an AR/VR scene
CN111739084A (en) * 2019-03-25 2020-10-02 上海幻电信息科技有限公司 Picture processing method, atlas processing method, computer device, and storage medium
CN111739084B (en) * 2019-03-25 2023-12-05 上海幻电信息科技有限公司 Picture processing method, atlas processing method, computer device, and storage medium
CN112270063A (en) * 2020-08-07 2021-01-26 四川航天川南火工技术有限公司 Sensitive parameter hypothesis testing method for initiating explosive system
CN112270063B (en) * 2020-08-07 2023-03-28 四川航天川南火工技术有限公司 Sensitive parameter hypothesis testing method for initiating explosive system
CN115346002A (en) * 2022-10-14 2022-11-15 佛山科学技术学院 Virtual scene construction method and rehabilitation training application thereof
CN115346002B (en) * 2022-10-14 2023-01-17 佛山科学技术学院 Virtual scene construction method and rehabilitation training application thereof
CN117315375A (en) * 2023-11-20 2023-12-29 腾讯科技(深圳)有限公司 Virtual part classification method, device, electronic equipment and readable storage medium
CN117315375B (en) * 2023-11-20 2024-03-01 腾讯科技(深圳)有限公司 Virtual part classification method, device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN102509104B (en) 2013-03-20

Similar Documents

Publication Publication Date Title
CN102509104B (en) Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene
CN105654021B (en) Method and apparatus of the detection crowd to target position attention rate
CN104063702B (en) Three-dimensional gait recognition based on shielding recovery and partial similarity matching
CN104536009B (en) Above ground structure identification that a kind of laser infrared is compound and air navigation aid
CN102932605B (en) Method for selecting camera combination in visual perception network
CN103927511B (en) image identification method based on difference feature description
CN101996401B (en) Target analysis method and apparatus based on intensity image and depth image
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
CN102270308B (en) Facial feature location method based on five sense organs related AAM (Active Appearance Model)
CN106529499A (en) Fourier descriptor and gait energy image fusion feature-based gait identification method
Zia et al. Revisiting 3d geometric models for accurate object shape and pose
CN105488809A (en) Indoor scene meaning segmentation method based on RGBD descriptor
Wang et al. Window detection from mobile LiDAR data
CN106250895A (en) A kind of remote sensing image region of interest area detecting method
CN106780552B (en) Anti-shelter target tracking based on regional area joint tracing detection study
CN105160317A (en) Pedestrian gender identification method based on regional blocks
CN102663391A (en) Image multifeature extraction and fusion method and system
CN103186775A (en) Human body motion recognition method based on mixed descriptor
CN104182765A (en) Internet image driven automatic selection method of optimal view of three-dimensional model
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN101655914A (en) Training device, training method and detection method
CN109902585A (en) A kind of three modality fusion recognition methods of finger based on graph model
CN103971106A (en) Multi-view human facial image gender identification method and device
CN105976376A (en) High resolution SAR image target detection method based on part model
McKeown et al. Performance evaluation for automatic feature extraction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130320

Termination date: 20150930

EXPY Termination of patent right or utility model