CN102509104B - Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene - Google Patents
- Publication number: CN102509104B (application CN201110299857A)
- Authority: CN (China)
- Prior art keywords: virtual, region, real, image, classifier
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Abstract
The invention relates to a confidence map-based method for distinguishing and detecting a virtual object in an augmented reality scene. The method comprises the following steps: selecting virtual-real classification features; constructing a pixel-level virtual-real classifier from these features; extracting the region contrast features of augmented reality scenes and real scenes with the virtual-real classification features and constructing a region-level virtual-real classifier; given a test augmented reality scene, detecting with the pixel-level virtual-real classifier and a small-size detection window to obtain a virtual score map reflecting the virtual-real classification result of each pixel; defining a virtual confidence map and obtaining the virtual confidence map of the test scene by thresholding; inferring the rough shape and position of the virtual object bounding box from the distribution of high virtual response points in the virtual confidence map; and detecting with the region-level virtual-real classifier and a large-size detection window in the test scene to obtain the final detection result of the virtual object. The method can be applied to fields such as film and television production, digital entertainment, and education and training.
Description
Technical Field
The invention relates to the field of image processing, computer vision and augmented reality, in particular to a confidence map-based method for distinguishing and detecting virtual objects in an augmented reality scene.
Background
Augmented reality is a further extension of virtual reality: with the necessary equipment, a computer-generated virtual object and an objectively existing real environment coexist in the same augmented reality system, so that an environment in which the virtual and the real are integrated is presented to the user in both perception and experience. As augmented reality technology develops and scenes of ever higher visual realism appear, standards and criteria for measuring and evaluating the credibility of augmented reality scenes are urgently needed. Judging whether a scene is an augmented reality scene, and further detecting the virtual object within it, serves as a way to evaluate the image credibility of an augmented reality scene and therefore has important research significance and application demand.
In 2011, researchers at the University of Trento, Italy proposed an image forgery identification method that can detect computer-generated components blended into real scenes. It is the only known existing work that deals with augmented reality scenes. However, its detection is not performed in units of objects; it only detects virtual components in the augmented reality scene, so the detection result may be a region or merely scattered points.
In 2005, researchers at Dartmouth College proposed a method for classifying virtual and real images based on a statistical model of wavelet-decomposed natural images, using a support vector machine and linear discriminant analysis. First, the first four order statistics (mean, variance, skewness and kurtosis) of the decomposition coefficients in each sub-band and direction of the wavelet-decomposed color image are extracted; the same four statistics of the linear prediction errors between adjacent decomposition coefficients are also considered. A classifier is then trained with a support vector machine and linear discriminant analysis, and the test set is input to the trained classifier to obtain the classification result. The virtual-real classification of this method is performed on the whole image, and its accuracy fluctuates greatly with the size of the region from which the classification features are extracted.
In 2007, researchers at a science and technology university in New York, USA proposed a method for distinguishing virtual from real images using the interpolation detection features of the color filter array and the consistency of color difference within the image. Features based on these two cues are first extracted from the positive and negative samples of a training set and input to a support vector machine to train a classifier; the test set is then input to the trained classifier to obtain the classification result.
In 2009, researchers at the University of Alberta, Canada proposed a method for classifying virtual and real images using the consistency of image-block resampling parameters. The principle is that operations such as rotation and scaling of texture images may be used when mapping surface textures onto models during virtual image generation, so the resampling parameters of the image blocks in a virtual image are inconsistent. Virtual and real images can therefore be distinguished by detecting whether the resampling parameters of the image blocks are consistent. In this method, the resampling parameters of image blocks are estimated over the whole image.
In 2004, researchers at the Cambridge Research Laboratory of Compaq Computer Corporation, USA proposed a face detection method using Haar-like filters and the AdaBoost classification algorithm. Classification features are first extracted from a training set and a classifier is trained on the statistics of face and non-face samples; the classification features of the image to be detected are then input to the classifier, a cascade of classifiers reduces the number of detection windows that must be evaluated to improve efficiency, and the detection result is finally obtained. The feature extraction of this method is based on Haar filters, which describe the region contrast produced by the inherent structure of the human face.
In 2005, researchers at the French national institute for research in computer science and automation (INRIA) proposed a method for detecting people using histograms of oriented gradients (HOG) and a linear support vector machine. The input picture is first color-normalized; image gradients are then computed, pixels are binned by gradient orientation, overlapping spatial blocks are contrast-normalized, and a histogram of oriented gradients is generated for each detection window; finally, a linear support vector machine classifies person/non-person regions to obtain the detection result. This method detects better than other detection methods, but requires the person in the picture to remain in a roughly upright standing posture. Its features, extracted as image gradient histograms, describe the inherent characteristics of the human body contour.
The above methods for distinguishing virtual from real images share a common limitation: the virtual-real classification features they extract are not suitable for classifying an arbitrarily given region of an image. Moreover, in existing object detection work, the objects being processed generally have strong, easily described appearance characteristics that serve as prior information. In contrast, when detecting virtual objects in an augmented reality scene, the detection targets (i.e., the virtual objects) have no explicit, easily described appearance priors such as color, shape or size, so distinguishing and detecting them is difficult.
Disclosure of Invention
The technical solution of the invention is as follows: the method needs neither any prior appearance information about the virtual object, such as color, shape or size, nor knowledge of the virtual object's position in the augmented reality scene. Instead, it extracts virtual-real classification features from the physical imaging differences that distinguish a virtual object from a real image, computes the region self-features and region contrast features of the positive and negative training samples, and constructs a pixel-level virtual-real classifier and a region-level virtual-real classifier. On this basis, the virtual object is first roughly shaped and located and then accurately detected through confidence map-based virtual object discrimination and detection.
The technical scheme adopted by the invention is as follows. The confidence map-based method for distinguishing and detecting a virtual object in an augmented reality scene comprises the following steps: construct an augmented reality scene training data set and select virtual-real classification features according to the physical imaging differences between virtual objects and real images; on the training data set, extract the region self-features of the augmented reality scenes and the real scenes with the virtual-real classification features and construct a pixel-level virtual-real classifier; on the training data set, extract the region contrast features of the augmented reality scenes and the real scenes with the virtual-real classification features and construct a region-level virtual-real classifier; given a test augmented reality scene, detect with the pixel-level virtual-real classifier and a small-size detection window to obtain a virtual score map reflecting the virtual-real classification result of each pixel; define a virtual confidence map and obtain the virtual confidence map of the test scene by thresholding the virtual score map; infer the rough shape and position of the virtual object bounding box from the distribution of high virtual response points in the virtual confidence map; on the basis of this rough localization, detect with the region-level virtual-real classifier and a large-size detection window in the test scene image to obtain the final detection result of the virtual object.
Construct the augmented reality scene training data set. In the training data set, augmented reality scene images containing a virtual object are taken as positive samples, and real scene images as negative samples. The virtual-real classification features are selected according to the physical imaging differences between virtual objects and real images. The selected features are: local statistics, surface gradient, second fundamental form, and Beltrami flow. A virtual-real classification feature can be extracted for every pixel of the image.
On the training data set, extract the region self-features of the augmented reality scenes with the virtual-real classification features and construct a pixel-level virtual-real classifier. When constructing the pixel-level classifier, only the virtual object region of each augmented reality scene image is selected as a positive sample region, and only regions similar to the virtual objects in the positive samples are selected from the real scene images as negative sample regions. For a given image region, the virtual-real classification features of each point in the region (local statistics, surface gradient, second fundamental form and Beltrami flow) are calculated; the features of the whole region are then compressed with the moment-of-inertia compression method to obtain the region self-feature of that region. The region self-feature sets of the positive and negative samples are input to a support vector machine for training, yielding the pixel-level virtual-real classifier.
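The training step described above (compressed region features for positive and negative samples, then a support vector machine) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature vectors are random stand-ins for the compressed region self-features, and the SVM is a small Pegasos-style linear trainer written directly in NumPy rather than a library SVM.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Minimal Pegasos-style linear SVM (stand-in for the patent's SVM training)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            if y[i] * (X[i] @ w + b) < 1:  # hinge-loss violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:
                w = (1 - eta * lam) * w
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)

# toy "region self-features": virtual regions (positive) vs real regions (negative)
rng = np.random.default_rng(1)
Xp = rng.normal(1.0, 0.3, (40, 8))    # hypothetical virtual-region features
Xn = rng.normal(-1.0, 0.3, (40, 8))   # hypothetical real-region features
X = np.vstack([Xp, Xn])
y = np.array([1] * 40 + [-1] * 40)
w, b = train_linear_svm(X, y)
acc = np.mean(predict(w, b, X) == y)
```

On well-separated synthetic features a linear classifier suffices; the patent leaves the SVM kernel unspecified here.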
On the training data set, extract the region contrast features of the augmented reality scenes with the virtual-real classification features and construct a region-level virtual-real classifier. Each positive or negative sample region is treated as the object region to be judged, and the rectangular region of equal area outside the region's bounding box is taken as the background region in which the object sits. The virtual-real classification features of every point in the object region and in the background region are extracted, and the features of all points in each region are accumulated into a joint distribution histogram, one for the object region and one for the background region. The chi-square distance between the two histograms is computed and regarded as a feature measuring the difference between the object and its background, called the region contrast feature. The region contrast feature sets of the positive and negative samples are input to a support vector machine for training, yielding the region-level virtual-real classifier.
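The region contrast feature above can be sketched as follows. The sketch simplifies in two labeled ways: it uses a single scalar per-pixel feature and a 1-D histogram instead of the patent's joint distribution histogram over four feature groups, and the background frame thickness is illustrative rather than exactly equal-area.

```python
import numpy as np

def chi_square(h1, h2, eps=1e-12):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def region_contrast(feat, box, frame=4, bins=16):
    """Histogram contrast between an object box and a surrounding frame.
    feat: 2-D per-pixel feature map; box: (r0, r1, c0, c1) half-open bounds."""
    r0, r1, c0, c1 = box
    obj = feat[r0:r1, c0:c1].ravel()
    # background: a frame around the box, clipped to the image
    R0, R1 = max(r0 - frame, 0), min(r1 + frame, feat.shape[0])
    C0, C1 = max(c0 - frame, 0), min(c1 + frame, feat.shape[1])
    mask = np.zeros(feat.shape, dtype=bool)
    mask[R0:R1, C0:C1] = True
    mask[r0:r1, c0:c1] = False
    bg = feat[mask]
    lo, hi = float(feat.min()), float(feat.max())
    if hi <= lo:
        hi = lo + 1.0  # avoid a degenerate histogram range
    h1, _ = np.histogram(obj, bins=bins, range=(lo, hi))
    h2, _ = np.histogram(bg, bins=bins, range=(lo, hi))
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    return chi_square(h1, h2)

img = np.zeros((40, 40))
img[15:25, 15:25] = 1.0  # bright "object" on a dark background
d_contrast = region_contrast(img, (15, 25, 15, 25))   # high contrast
d_uniform = region_contrast(np.zeros((40, 40)), (15, 25, 15, 25))  # no contrast
```

A region that differs sharply from its surroundings yields a chi-square distance near 1; a uniform region yields 0, which is the property the classifier exploits.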
Construct the virtual score map: for the input augmented reality scene image, scan the whole image with a small-size detection window (window size in [10, 30] × [10, 30] pixels) and a small moving step (e.g., {1, 2, 3, 4, 5} pixels); compute the region self-feature of the small image block in each detection window; input the region self-features of all small image blocks to the pixel-level virtual-real classifier to obtain a score for each block, where a high score means the classifier is highly confident the block is a virtual region. Because the detection windows are very small relative to the whole image and densely placed, the score of each small image block can be mapped to the block's central pixel as that pixel's virtual score, forming the virtual score map of the entire augmented reality scene image. This process can be accelerated with a two-dimensional integral image.
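The integral-image acceleration mentioned above can be sketched as follows. The per-pixel values here are random stand-ins for classifier scores, and the 5-pixel window on a 20×20 toy image is illustrative (the patent uses [10, 30]-pixel windows on full images); the point is only that every window sum is obtained in constant time from the cumulative table.

```python
import numpy as np

def window_means(score, k):
    """Mean of every k-by-k window via a zero-padded 2-D integral image."""
    S = np.zeros((score.shape[0] + 1, score.shape[1] + 1))
    S[1:, 1:] = score.cumsum(0).cumsum(1)  # integral image
    # four-corner lookup gives each window sum in O(1)
    s = S[k:, k:] - S[:-k, k:] - S[k:, :-k] + S[:-k, :-k]
    return s / (k * k)

rng = np.random.default_rng(0)
per_pixel = rng.random((20, 20))   # stand-in per-pixel classifier scores
m = window_means(per_pixel, 5)     # one value per window position
```

Each output entry can then be assigned to the window's central pixel to build the virtual score map.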
Construct the virtual confidence map: threshold the virtual score map of the augmented reality scene image and record the points with positive virtual scores. Given a fixed percentage N%, record the top N% of all positive-score points together with their positions in the original image; these are called high virtual response points. Given a fixed, relatively small constant M (e.g., M ∈ [10, 100]), record the top M positive-score points and their positions in the original image; these are called the highest virtual response points. With suitable parameter settings, the highest virtual response points are guaranteed to be contained in the set of high virtual response points, i.e., they are the subset of the high virtual response points with the largest virtual score values. The high virtual response points, the highest virtual response points and their positions in the original image together form the virtual confidence map.
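The thresholding step above (top N% of positive-score pixels as high virtual response points, top M of those as highest virtual response points) can be sketched as follows; the score map here is random noise standing in for real classifier output.

```python
import numpy as np

def virtual_confidence_map(score_map, pct=10, M=20):
    """Return (high, top): the top pct% of positive-score pixel coordinates,
    and the top M of those (so top is always a subset of high)."""
    pos = np.argwhere(score_map > 0)        # coordinates of positive scores
    vals = score_map[score_map > 0]         # same row-major order as pos
    order = np.argsort(vals)[::-1]          # descending by score
    n_high = max(1, int(len(vals) * pct / 100))
    high = pos[order[:n_high]]              # high virtual response points
    top = pos[order[:min(M, n_high)]]       # highest virtual response points
    return high, top

rng = np.random.default_rng(2)
smap = rng.normal(0, 1, (50, 50))           # stand-in virtual score map
high, top = virtual_confidence_map(smap, pct=10, M=20)
```

Clamping the top-M set to the top-N% set enforces the containment property stated in the text.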
The rough shape and position of the virtual object bounding box are inferred as follows: divide the virtual confidence map into five equal-area, mutually overlapping sub-regions and find the distribution center of the high virtual response points in each; treat each sub-region center as a candidate virtual object center point, search outward from each center to obtain a region where high virtual response points are densely distributed, approximate the candidate object shape in that region (expressed as a candidate virtual object bounding box), and combine it with the region's position information to form a preliminary virtual object candidate region; among the several preliminary candidate regions, select the one with the largest weighted count of the high and highest virtual response points it contains as the virtual object candidate region, which carries the rough shape and position of the virtual object bounding box.
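The outward expansion search can be sketched as follows. This is one plausible reading under a stated assumption: the stopping rule (stop growing when an expansion step captures too few new response points) is not spelled out in the text, and the synthetic point cluster merely stands in for high virtual response points.

```python
import numpy as np

def expand_box(points, center, grow=2, min_gain=1):
    """Grow a box outward from `center` while each step of `grow` pixels
    per side captures at least `min_gain` new response points (assumed rule)."""
    r, c = center
    r0 = r1 = r
    c0 = c1 = c
    inside = 0
    while True:
        nr0, nr1, nc0, nc1 = r0 - grow, r1 + grow, c0 - grow, c1 + grow
        hits = np.sum((points[:, 0] >= nr0) & (points[:, 0] <= nr1) &
                      (points[:, 1] >= nc0) & (points[:, 1] <= nc1))
        if hits - inside < min_gain:
            break
        r0, r1, c0, c1, inside = nr0, nr1, nc0, nc1, hits
    return (r0, r1, c0, c1), inside

# synthetic dense cluster of response points around (30, 40)
rng = np.random.default_rng(3)
pts = np.rint(rng.normal((30, 40), 3, (200, 2))).astype(int)
box, captured = expand_box(pts, (30, 40))
```

Running this from each of the five sub-region centers and keeping the box with the largest weighted point count would give the virtual object candidate region.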
The rough localization of the virtual object is then refined to obtain the final detection result. The specific steps are: take a region around the virtual object candidate region with twice its area, and within it construct several mutually overlapping large-size detection windows with the same shape and size as the candidate region (the large window size usually ranges over [200, 500] × [200, 500] pixels, with its length and width equal to those of the virtual object bounding box in the candidate region); take the image block in each large-size detection window and compute its region contrast feature; input the region contrast features of the image blocks in all large-size detection windows to the region-level virtual-real classifier for classification, and select the detection window with the highest score as the final detection result of the virtual object.
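The refinement step reduces to scoring a set of overlapping candidate windows and keeping the argmax. In the sketch below, a stand-in scorer (overlap with a hypothetical ground-truth box) replaces the region-level classifier score purely for illustration.

```python
import numpy as np

def best_window(score_fn, boxes):
    """Score each candidate window and return the highest-scoring one."""
    scores = [score_fn(b) for b in boxes]
    i = int(np.argmax(scores))
    return boxes[i], scores[i]

def iou(a, b):
    """Intersection-over-union of boxes (r0, r1, c0, c1)."""
    r0, r1 = max(a[0], b[0]), min(a[1], b[1])
    c0, c1 = max(a[2], b[2]), min(a[3], b[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    area = lambda x: (x[1] - x[0]) * (x[3] - x[2])
    return inter / (area(a) + area(b) - inter)

# overlapping same-size windows around a hypothetical true object box
gt = (10, 30, 10, 30)
cands = [(r, r + 20, c, c + 20) for r in range(0, 21, 5) for c in range(0, 21, 5)]
win, s = best_window(lambda b: iou(b, gt), cands)
```

Swapping the IoU scorer for the trained region-level classifier gives the patent's final detection step.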
Compared with the prior art, the invention has the beneficial effects that:
(1) the virtual object in the augmented reality scene is used as the detection object, and the virtual object in the augmented reality scene can be judged and detected as a whole.
(2) The invention constructs two levels of virtual-real classifiers, a pixel-level one and a region-level one, meeting the needs of both confidence map construction and final virtual object detection.
(3) The invention establishes a confidence map, and can obtain the approximate position and shape of the virtual object in the augmented reality scene based on the virtual confidence map under the condition of no prior information such as the appearance, shape, position and the like of the virtual object.
(4) The method does not need to know any appearance information of the virtual object in advance, such as color, shape, size and other prior information, and does not need to know the position of the virtual object in the augmented reality scene, has wide applicability, and can be widely applied and popularized to the fields of movie and television production, digital entertainment, education training and the like.
Drawings
FIG. 1 is an overall design structure of the present invention;
FIG. 2 is a flow chart of virtual confidence map construction of the present invention;
FIG. 3 is a flow diagram of virtual object bounding box shape, position inference of the present invention;
FIG. 4 is a flow chart of the present invention for obtaining candidate center points;
fig. 5 is a flow chart of the invention for expanding search and obtaining a high virtual response point distribution dense area.
Detailed Description
As shown in fig. 1, the main steps of the present invention are as follows: constructing an augmented reality scene training data set, and selecting virtual and real classification features by using the physical imaging difference of a virtual object and a real image; extracting the self characteristics of the region of the augmented reality scene by using the virtual and real classification characteristics on a training data set, and constructing a pixel-level virtual and real classifier; on a training data set, extracting the region contrast characteristics of an augmented reality scene by using the virtual and real classification characteristics, and constructing a region-level virtual and real classifier; giving a test augmented reality scene, and performing small-scale detection by using a pixel-level virtual-real classifier to obtain a virtual score map reflecting the virtual-real classification result of each pixel; defining a virtual confidence map, and obtaining the virtual confidence map of the tested augmented reality scene by thresholding on the basis of the virtual score map; obtaining the rough shape and position of the virtual object bounding box according to the distribution condition of the high virtual response points in the virtual confidence map; on the basis of rough positioning of the virtual object, detecting by using a region-level virtual-real classifier and a large-size detection window in the image of the test augmented reality scene to obtain a final detection result of the virtual object.
And constructing a training data set for training the virtual-real classifier. The training data set is composed of an augmented reality scene image containing a virtual object as a positive sample and a real scene image as a negative sample. When a pixel-level classifier is trained, only a virtual object area is selected as a positive sample for an augmented reality scene image; and for the real scene image, only the area similar to the virtual object in the positive sample is selected as the negative sample. When the region-level classifier is trained, selecting a virtual object and an image region with equal area around the virtual object as a positive sample for an augmented reality scene image; and for the real scene image, selecting a region similar to the virtual object in the positive sample and an image region with the same area around the region as a negative sample.
Region self-feature extraction. For a given image region, compute the virtual-real classification features of each point in the region, including local statistics, surface gradient, second fundamental form and Beltrami flow; then compress the virtual-real classification features of the region with the moment-of-inertia compression method to obtain the region self-feature of that region.
The physical meanings of the local statistic, the surface gradient, the second fundamental form and the Beltrami flow, and their calculation methods, are as follows:
The local statistics reflect local fine edge structure and are calculated as follows: take any point P on the grayscale version of the original image, take the 3 × 3 pixel image block centered on P, and arrange its pixel values in order into a 9-dimensional vector x = (x1, x2, ..., x9). The local statistic y of point P is the 9-dimensional vector

y = (x − x̄) / ‖x − x̄‖_D,

where x̄ is the mean of the nine pixel values. The D-norm is defined as

‖v‖_D = sqrt( Σ_{i~j} (v_i − v_j)² ),

where i~j ranges over all point pairs in the image block that are in a four-neighborhood relation.
The local-statistic virtual-real classification feature at any point p is the 9-dimensional vector y at that point.
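The local statistic can be computed as follows. One assumption is labeled here and in the code: the defining equation appears only as an image in the patent, so the sketch follows the standard natural-image-patch normalization (remove the patch mean, then divide by the D-norm over four-neighbor pairs), which matches the D-norm definition the text does give.

```python
import numpy as np

# four-neighbor pairs inside a 3x3 patch flattened row-major (indices 0..8)
PAIRS = [(i, i + 1) for i in range(9) if i % 3 != 2] + [(i, i + 3) for i in range(6)]

def d_norm(v):
    """D-norm: sqrt of summed squared differences over 4-neighbor pairs."""
    return np.sqrt(sum((v[i] - v[j]) ** 2 for i, j in PAIRS))

def local_statistic(patch3x3):
    """9-D local statistic of a 3x3 patch (mean removal is an assumed step)."""
    x = patch3x3.ravel().astype(float)
    x = x - x.mean()
    n = d_norm(x)
    return x / n if n > 0 else x

p = np.arange(9, dtype=float).reshape(3, 3)
y = local_statistic(p)
```

By construction the result is zero-mean and has unit D-norm, so it depends only on the patch's contrast pattern, which is what makes it an edge-structure descriptor.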
The surface gradient measures the nonlinear changes introduced in the imaging of a real scene. The surface gradient S at any point of the image is computed from the image gradient magnitude |∇I| = sqrt(Ix² + Iy²), where Ix and Iy are the derivatives of the image in the x (horizontal) and y (vertical) directions, together with a constant α = 0.25 (the defining formula appears only as an image in the original patent).
The surface-gradient virtual-real classification feature at any point p combines the image pixel value I at that point with the surface gradient S at that point.
The second fundamental form describes the degree of local relief of the image surface. Its two components λ1 and λ2 correspond to the two eigenvalues of a matrix A (given only as an image in the original patent).
The second-fundamental-form virtual-real classification feature at any point p combines the image gradient magnitude |∇I| at that point with the two components λ1 and λ2 of the second fundamental form at that point.
The Beltrami flow describes the correlation between different color channels. For a color channel c ∈ {R, G, B}, let I^c denote the corresponding channel image of the original image, and let I^c_x and I^c_y denote its partial derivatives in the x and y directions (so R_x, R_y are the derivatives of the red channel, G_x, G_y of the green channel, and B_x, B_y of the blue channel). The metric matrix g is

g = [ 1 + R_x² + G_x² + B_x²,   R_xR_y + G_xG_y + B_xB_y ]
    [ R_xR_y + G_xG_y + B_xB_y, 1 + R_y² + G_y² + B_y²   ],

|g| is the determinant of g, and g^{xx}, g^{xy}, g^{yx}, g^{yy} are the four element values of the inverse of g. The Beltrami flow Δ_g I^c of channel c is then

Δ_g I^c = (1/√|g|) [ ∂_x( √|g| (g^{xx} I^c_x + g^{xy} I^c_y) ) + ∂_y( √|g| (g^{yx} I^c_x + g^{yy} I^c_y) ) ].

The Beltrami-flow virtual-real classification feature at any point p combines the Beltrami flow components Δ_g I^c of the three color channels c ∈ {R, G, B} at that point with the image gradient magnitudes of the three channels.
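The Beltrami flow can be computed per channel as follows. Since the patent's own formula survives only as an image, this sketch implements the standard Beltrami operator over the metric induced by the (x, y, R, G, B) embedding, which matches the quantities the text does name (determinant |g| and the inverse-metric components g^{xx}, g^{xy}, g^{yy}); treat it as an assumed reconstruction.

```python
import numpy as np

def beltrami_flow(img):
    """Beltrami operator per color channel via central differences.
    img: H x W x 3 float array; returns an array of the same shape."""
    dx = np.gradient(img, axis=1)              # per-channel x-derivatives
    dy = np.gradient(img, axis=0)              # per-channel y-derivatives
    # induced metric g = I + sum over channels of grad Ic grad Ic^T
    gxx = 1 + np.sum(dx * dx, axis=2)
    gxy = np.sum(dx * dy, axis=2)
    gyy = 1 + np.sum(dy * dy, axis=2)
    det = gxx * gyy - gxy ** 2                 # |g|
    sq = np.sqrt(det)
    ixx, ixy, iyy = gyy / det, -gxy / det, gxx / det   # inverse metric
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        vx = sq * (ixx * dx[..., c] + ixy * dy[..., c])
        vy = sq * (ixy * dx[..., c] + iyy * dy[..., c])
        out[..., c] = (np.gradient(vx, axis=1) + np.gradient(vy, axis=0)) / sq
    return out

img = np.zeros((16, 16, 3))   # constant image: the flow should vanish
flow = beltrami_flow(img)
```

On a constant image every derivative vanishes and the operator returns zero, a quick sanity check on the construction.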
After the four groups of virtual-real classification features (local statistics, surface gradient, second fundamental form, and Beltrami flow) have been calculated for each point in the region, they are compressed with a moment-of-inertia compression method. The method proceeds as follows. Each of the four feature groups (local statistics, surface gradient, second fundamental form, Beltrami flow) is considered independently, and every group is processed in the same way. Suppose the given region contains N points and that for any point P_i (i = 1, …, N) one group of M-dimensional virtual-real classification features has been computed, where the value of M depends on which of the four feature groups is selected; the features of P_i are denoted v_i = (v_i1, …, v_iM). Each feature vector v_i is now regarded as a particle in the M-dimensional feature space: a mass m_i is assigned to the particle, and its position coordinate in the feature space is v_i = (v_i1, …, v_iM). The moment-of-inertia matrix J of the particle system formed by all N particles can then be computed with the rigid-body moment-of-inertia formula. J is an M × M matrix whose elements are denoted J_jk (j, k = 1, …, M) and are calculated as:
J_jk = Σ_i m_i ( |v_i|² δ_jk − v_ij v_ik ),  j, k = 1, …, M.
Here m_i is the mass of particle P_i; v_i = (v_i1, …, v_iM) is the position coordinate of point P_i in the feature space; |v_i| is the Euclidean distance from particle P_i to the origin of coordinates, i.e. |v_i| = sqrt(v_i1² + … + v_iM²); and δ_jk is the Kronecker delta, equal to 1 when j = k and 0 otherwise. From these quantities all elements J_jk (j, k = 1, …, M) of the moment-of-inertia matrix J can be determined. Since J is symmetric (J_jk = J_kj), only the elements on and above the main diagonal, J_jk (j, k = 1, …, M with j ≤ k), need to be kept; these elements carry all the information of the original matrix J.
All elements J_jk (j, k = 1, …, M, j ≤ k) of the moment-of-inertia matrix are taken and combined with the centroid vector of all particles and with the mean, variance, skewness, and kurtosis of the particle-to-origin distances |v_i| to form a feature vector. This feature vector is the compressed representation, under the moment-of-inertia compression method, of one group of virtual-real classification features over all points in the region. Concatenating the four compressed representations obtained from the four feature groups yields the region self-feature of the region. Because the moment-of-inertia matrix describes well how a set of particles is distributed in the feature space, the moment-of-inertia compression method can compress the many high-dimensional data points of a region while largely preserving the information about the original data distribution.
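As a concrete illustration, the compression step described above can be sketched in NumPy. This is a minimal sketch, not the patent's implementation: the per-particle mass assignment is not legible in the text, so uniform masses of 1/N are assumed here, and the function name is a choice of this sketch.

```python
import numpy as np

def inertia_compress(V, masses=None):
    """Compress N points of an M-dimensional feature group into one
    fixed-length vector: upper triangle of the inertia matrix J, the
    particle centroid, and four statistics of the origin distances."""
    N, M = V.shape
    if masses is None:
        masses = np.full(N, 1.0 / N)          # assumed uniform mass (not from the patent)
    r2 = np.sum(V * V, axis=1)                # |v_i|^2
    # J_jk = sum_i m_i * (|v_i|^2 * delta_jk - v_ij * v_ik)
    J = np.sum(masses * r2) * np.eye(M) - (V * masses[:, None]).T @ V
    upper = J[np.triu_indices(M)]             # main diagonal and above (symmetry)
    centroid = masses @ V                     # centroid of the particle system
    d = np.sqrt(r2)                           # distances |v_i| to the origin
    mu, sig = d.mean(), d.std()
    skew = np.mean(((d - mu) / sig) ** 3)
    kurt = np.mean(((d - mu) / sig) ** 4)
    return np.concatenate([upper, centroid, [mu, d.var(), skew, kurt]])
```

The output length is M(M+1)/2 + M + 4, independent of N, which is what gives the pixel-level classifier its size adaptability.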
Extraction of the region contrast feature. For a given image region, the region itself is treated as the object region to be judged, and a rectangular region of equal area lying outside and adjacent to the region's bounding box is taken as the background region of the object. The virtual-real classification features of every point in the object region and in the background region are computed, and the features of all points in each region are accumulated into a joint distribution histogram, one for the object region and one for the background region. The chi-square distance between the two joint distribution histograms is then calculated; this distance measures the contrast, or difference, between the object and the background it lies in, and is called the region contrast feature.
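The histogram comparison above can be sketched as follows. The patent does not fix the exact chi-square normalization; the common symmetric form 0.5·Σ(a−b)²/(a+b) is assumed here.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two feature histograms.
    The symmetric 0.5 * sum((a-b)^2 / (a+b)) form is one reasonable
    choice; eps guards against empty bins."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

A large distance between the object-region and background-region histograms indicates strong object/background contrast, which is exactly what the region contrast feature is meant to capture.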
Pixel-level and region-level virtual-real classifiers are constructed to judge, from the viewpoint of the region self-feature and of the region contrast feature respectively, whether a given region belongs to the region occupied by a virtual object.
Construction of the pixel-level virtual-real classifier: the positive and negative samples of the training set are input; the region self-features of the positive and negative samples are extracted; and the extracted region self-feature sets of the positive and negative samples are fed to a support vector machine classifier for training, yielding the pixel-level virtual-real classifier. A notable property of this classifier is that, thanks to the feature compression method, its classification results adapt to region size: even when region self-features are extracted from regions whose size differs markedly from that of the training-set regions, the classifier still decides accurately whether a given region belongs to the region where a virtual object is located. Specifically, although the pixel-level classifier is trained on the region self-features of the virtual-object regions in the training set, experiments show that it remains accurate on the region self-features of much smaller regions (of size [10, 30] × [10, 30] pixels). Since the classifier is applied to such small regions, and each small region approximately describes the pixel at its centre, it is called a pixel-level virtual-real classifier.
Construction of the region-level virtual-real classifier: the positive and negative samples of the training set are input; the region contrast features of the positive and negative samples are extracted; and the extracted region contrast feature sets are fed to a support vector machine classifier for training, yielding the region-level virtual-real classifier. Because the classification feature it uses reflects the difference in overall distribution between a region and its background, this classifier is better suited to discriminating and detecting the object as a whole.
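Both classifier construction steps follow the same recipe: stack positive and negative feature sets and train an SVM. A minimal sketch follows; scikit-learn and the RBF kernel are assumptions of this sketch, as the patent only specifies "a support vector machine classifier".

```python
import numpy as np
from sklearn.svm import SVC

def train_virtual_real_classifier(pos_features, neg_features):
    """Train a virtual/real SVM from positive (virtual-object) and
    negative (real-scene) sample feature sets.  Works for both the
    pixel-level (region self-features) and the region-level (region
    contrast features) classifiers described above."""
    X = np.vstack([pos_features, neg_features])
    y = np.concatenate([np.ones(len(pos_features)),    # 1 = virtual
                        np.zeros(len(neg_features))])  # 0 = real
    clf = SVC(kernel="rbf")   # kernel choice is an assumption, not from the patent
    return clf.fit(X, y)
```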
Construction of the virtual score map. For an input augmented reality scene image, the whole image is scanned with a small detection window (of size [10, 30] × [10, 30] pixels) moved in small steps (e.g. {1, 2, 3, 4, 5} pixels). The region self-feature of the small image block inside each detection window is calculated, and the self-features of all small image blocks are fed to the pixel-level virtual-real classifier, giving each block a region self-feature score; a high score means the pixel-level classifier is highly confident that the block belongs to a virtual region. Because the detection windows are very small relative to the whole image and densely placed, the score of each small image block can be mapped onto the block's centre pixel as the virtual score of that pixel; the scores of all centre pixels together form the virtual score map of the entire augmented reality scene image. Since the virtual-real feature calculation and feature compression involved in computing region self-features are time-consuming, and the region self-features of the many overlapping image blocks would otherwise have to be computed one by one, an integral image method is adopted in this step to accelerate the computation. The virtual score map built in this way reflects well whether a point of the image belongs to a virtual object: experiments show that points with high virtual scores are generally concentrated in the region where the virtual object is located and, conversely, that the points inside the virtual object region generally receive high virtual scores.
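The scan-and-map step can be sketched as below. This is a naive sketch without the integral-image acceleration mentioned above; `score_fn` stands in for feature extraction plus the pixel-level classifier, and the window size and step are placeholders within the stated ranges.

```python
import numpy as np

def virtual_score_map(image, score_fn, win=20, step=2):
    """Scan the image with a small detection window and write each
    window's classifier score to the window's centre pixel, forming
    the virtual score map (unvisited pixels stay at 0)."""
    H, W = image.shape[:2]
    smap = np.zeros((H, W))
    for y in range(0, H - win + 1, step):
        for x in range(0, W - win + 1, step):
            patch = image[y:y + win, x:x + win]
            smap[y + win // 2, x + win // 2] = score_fn(patch)
    return smap
```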
Construction of the virtual confidence map; the flow is shown in FIG. 2. The obtained virtual score map is first thresholded: points with positive virtual scores are selected and recorded. A fixed percentage N% is set, and the top N% of all points with positive virtual scores, together with their positions on the original image, are selected and recorded; these points are called high virtual response points. A fixed and relatively small constant M is also set (e.g. M ∈ [10, 100]), and the top M of all points with positive virtual scores, together with their positions on the original image, are selected and recorded; these points are called highest virtual response points. The number of highest virtual response points is much smaller than the number of high virtual response points. The high virtual response points, the highest virtual response points, and their position information on the original image together form the virtual confidence map. The virtual confidence map reflects well whether a point of the image belongs to a virtual object: experiments show that high virtual response points are generally concentrated in the region where the virtual object is located and, conversely, that high virtual response points are densely distributed over the points inside the virtual object region; similarly, highest virtual response points generally appear only in the region where the virtual object is located and, conversely, most highest virtual response points appear inside that region.
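The thresholding step can be sketched as follows; the default values for N% and M are placeholders within the stated ranges.

```python
import numpy as np

def virtual_confidence_points(score_map, top_percent=10, top_m=50):
    """Threshold a virtual score map into high / highest virtual
    response points (as (row, col) positions), sorted by score."""
    ys, xs = np.nonzero(score_map > 0)            # keep positive scores only
    scores = score_map[ys, xs]
    order = np.argsort(-scores)                   # descending by score
    n_high = max(1, int(len(order) * top_percent / 100))
    high = [(int(ys[i]), int(xs[i])) for i in order[:n_high]]
    highest = [(int(ys[i]), int(xs[i])) for i in order[:min(top_m, n_high)]]
    return high, highest   # highest is a subset of high by construction
```

Taking the highest response points as a prefix of the sorted high response points guarantees the containment property stated in claim step (5).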
High virtual response points and highest virtual response points that appear outside the virtual object region are referred to as noise points.
The flow for inferring the rough shape and position of the virtual object bounding box is shown in FIG. 3 and comprises: dividing sub-regions and obtaining candidate centre points; expanding the search to obtain regions where high virtual response points are densely distributed; determining preliminary virtual object candidate regions; and determining the virtual object candidate region. Specifically, sub-region division splits the obtained virtual confidence map into five overlapping sub-regions of equal area. The flow for obtaining candidate centre points is shown in FIG. 4: according to the distribution of high virtual response points in each sub-region, the centre of that distribution is found with a mean shift algorithm and is called a candidate centre point. The number of candidate centre points is k (k ≤ 5, where k < 5 corresponds to some sub-regions containing no high virtual response points), and it may be assumed, without loss of generality, that the centre of the region corresponding to the virtual object is among the k candidate centre points. The expanding search that finds the regions of densely distributed high virtual response points proceeds as shown in FIG. 5: for each candidate centre point, circular search regions of steadily increasing size are constructed dynamically, centred at the candidate point with a radius grown by a fixed step; the boundary of the virtual object region is considered found when the number of high virtual response points inside the current search region stops increasing. Ideally, the condition for stopping the expansion is that the increment in the number of high virtual response points is zero as the search radius grows, but to suppress the influence of the noise points present in the virtual confidence map, a noise-suppression parameter is set and the stopping condition is strengthened: the expansion continues only while the number of high virtual response points in the search region increases by more than the noise-suppression parameter as the radius grows.
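The expanding circular search can be sketched as follows; the step length and noise-suppression parameter are illustrative values, as the patent leaves them unspecified.

```python
import numpy as np

def expand_search(center, points, step=5.0, noise_thresh=2, max_radius=500.0):
    """Grow a circular search region around a candidate centre until the
    count of high virtual response points inside it stops increasing by
    more than the noise-suppression parameter.  Returns the final radius
    and the points inside it."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1)
    radius, prev = step, np.sum(d <= step)
    while radius < max_radius:
        cur = np.sum(d <= radius + step)
        if cur - prev <= noise_thresh:     # growth fell to the noise level: stop
            break
        radius, prev = radius + step, cur
    return radius, pts[d <= radius]
```

A distant noise point contributes at most one extra response per radius increment, so it cannot keep the expansion going past the dense cluster.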
Determination of the preliminary virtual object candidate regions: in each region where high virtual response points are densely distributed, the shape of the candidate object is computed approximately and combined with the position information of the region to form a preliminary virtual object candidate region. When the expanding search stops, a region of densely distributed high virtual response points is obtained and the set P of all high virtual response points in it is known; from P the shape of the candidate object in the region, i.e. the shape of the candidate object bounding box, is obtained as:
x_min = min({x | <x, y> ∈ P});  x_max = max({x | <x, y> ∈ P});
y_min = min({y | <x, y> ∈ P});  y_max = max({y | <x, y> ∈ P});
where x_min and x_max denote the minimum and maximum values in the x direction, and y_min and y_max the minimum and maximum values in the y direction, of the image coordinates of the region covered by the candidate object bounding box. From these values the position and shape of the candidate bounding box relative to the image can be determined.
The candidate object shape in each region is combined with the position information of the region (the candidate centre point position) to form a preliminary virtual object candidate region.
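The bounding-box extrema defined above amount to a one-line computation over the point set P:

```python
def candidate_bounding_box(P):
    """Axis-aligned bounding box (x_min, x_max, y_min, y_max) of the
    high virtual response points P, each given as an <x, y> pair."""
    xs = [x for x, y in P]
    ys = [y for x, y in P]
    return min(xs), max(xs), min(ys), max(ys)
```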
From the k preliminary virtual object candidate regions, the one with the largest weighted count of the high virtual response points and highest virtual response points it contains is selected as the virtual object candidate region; this region carries the rough shape and position information of the virtual object bounding box.
The rough localization of the virtual object obtained above is then further refined to reduce errors that may arise while computing the virtual object candidate region, yielding the final virtual object detection result. The specific steps are: a region around the virtual object candidate region with twice its area is taken; within this region, a number of mutually overlapping large detection windows with the same shape and size as the virtual object candidate region are constructed (the large detection windows typically fall in the size range [200, 500] × [200, 500] pixels, their exact width and height being those of the virtual object bounding box in the candidate region); the image block in each large detection window is taken and its region contrast feature computed; and the region contrast features of the image blocks in all large detection windows are fed to the region-level virtual-real classifier for classification, the window with the highest score being selected as the final virtual object detection result.
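The refinement step can be sketched as a local sliding-window maximization. `score_fn` stands in for region contrast feature extraction plus the region-level classifier; the search step is an illustrative value.

```python
import numpy as np

def refine_detection(candidate_box, score_fn, image_shape, step=10):
    """Slide detection windows of the candidate box's size over a
    neighbourhood roughly twice the candidate area (clipped to the
    image) and keep the window the classifier scores highest.
    candidate_box = (x0, y0, w, h); score_fn(x, y, w, h) -> score."""
    x0, y0, w, h = candidate_box
    H, W = image_shape
    xs = range(max(0, x0 - w // 2), min(W - w, x0 + w // 2) + 1, step)
    ys = range(max(0, y0 - h // 2), min(H - h, y0 + h // 2) + 1, step)
    best, best_score = None, -np.inf
    for y in ys:
        for x in xs:
            s = score_fn(x, y, w, h)
            if s > best_score:
                best, best_score = (x, y, w, h), s
    return best, best_score
```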
The above is only a basic description of the present invention; any equivalent change made according to the technical solution of the present invention falls within the protection scope of the present invention.
Details not described herein belong to the common knowledge of those skilled in the art.
Claims (1)
1. The method for distinguishing and detecting the virtual object of the augmented reality scene based on the confidence map is characterized by comprising the following implementation steps of:
(1) constructing an augmented reality scene training data set by taking an augmented reality image containing a virtual object as a positive sample and a real scene image as a negative sample; selecting virtual and real classification features by using the physical imaging difference of the virtual object and the real image;
(2) on a training data set, respectively extracting the self characteristics of the regions of an augmented reality scene and a real scene by using the virtual and real classification characteristics, and constructing a pixel-level virtual and real classifier;
(3) on the training data set, respectively extracting the region comparison characteristics of the augmented reality scene and the real scene by using the virtual-real classification characteristics, and constructing a region-level virtual-real classifier;
(4) a testing augmented reality scene is given, a pixel-level virtual-real classifier and a small-size detection window are used for detection, and a virtual score map reflecting the virtual-real classification result of each pixel is obtained;
(5) defining a virtual confidence map, and obtaining the virtual confidence map of the tested augmented reality scene by thresholding on the basis of the virtual score map;
(6) based on the virtual confidence map, carrying out virtual object rough positioning to obtain the rough shape and position of the virtual object bounding box;
(7) on the basis of rough positioning of the virtual object, detecting by using a region-level virtual-real classifier and a large-size detection window in a test augmented reality scene image to obtain a final detection result of the virtual object;
the virtual and real classification features selected in the step (1) comprise: local statistics, surface gradient, the second fundamental form, and the Beltrami flow, wherein the virtual and real classification features corresponding to each pixel point of the image can be extracted and obtained;
when the pixel-level classifier is constructed in the step (2), only the virtual object region is selected as a positive sample region for the augmented reality scene image on the training data set; for the real scene image, only the area similar to the virtual object in the positive sample is selected as the negative sample area; for a given image area, calculating the virtual and real classification characteristics of each point in the area; compressing the virtual and real classification features of the given positive and negative sample regions by using a rotational inertia compression method to obtain the self-features of the regions corresponding to the regions; inputting the region self characteristic set of the positive and negative samples into a support vector machine classifier for training to obtain a pixel-level virtual-real classifier;
when the region-level virtual-real classifier is constructed in the step (3), regarding the positive and negative sample regions as object regions to be judged on a training data set; the rectangular region with the same area outside the region bounding box is taken as a background region where the object is located; respectively extracting the virtual and real classification characteristics of each point in the object region and the background region; counting the virtual and real classification features corresponding to all points in the object region and the background region, and respectively forming a joint distribution histogram of the object region features and a joint distribution histogram of the background region features; calculating the chi-square distance between the two histograms, and regarding the chi-square distance as a characteristic for measuring the difference between the object and the background where the object is located, wherein the characteristic is called as a regional contrast characteristic; inputting the extracted region comparison feature set of the positive and negative samples into a support vector machine classifier for training to obtain a region-level virtual-real classifier;
the step (4) of constructing the virtual score map comprises the following steps: for an input augmented reality scene image, scanning the whole image by using a small-size detection window in a small moving step length; calculating the self characteristics of the area of the small image block in each small-size detection window; inputting the self-characteristics of the regions of all the small image blocks into a pixel-level virtual-real classifier to obtain the self-characteristic score of the region of each small image block, wherein the high score indicates that the pixel-level classifier has high certainty factor for classifying the image block into a virtual region; because the size of the detection window is very small relative to the whole image and the detection window is densely distributed, the characteristic score of the area of each small image block can be mapped to the central pixel of the image block and used as the virtual score of the central pixel point; therefore, a virtual score map of the whole augmented reality scene image is formed;
the method for constructing the virtual confidence map in the step (5) comprises the following steps: thresholding is carried out on a virtual score map of an augmented reality scene image, and points with positive virtual scores are recorded; setting a fixed percentage N%, and recording the first N% of all points with positive virtual scores and the positions of the points on the original image, wherein the points are called high virtual response points; setting a fixed and relatively small constant M, and recording the front M points of all points with positive virtual scores and the positions of the points on the original image, wherein the points are called as the highest virtual response points; the parameter setting can ensure that the highest virtual response point is also contained in the set where the high virtual response points are located, namely the highest virtual response point is a part with the highest virtual score value in the high virtual response points; synthesizing the high virtual response point, the highest virtual response point and the position information of the original image where the high virtual response point and the highest virtual response point are located to form a virtual confidence map;
the method for obtaining the rough shape and the position of the bounding box of the virtual object in the step (6) comprises the following steps: dividing the obtained virtual confidence map into five equal-area and overlapped sub-regions, and respectively obtaining the distribution center of the high virtual response point in each sub-region; taking the center of the sub-region as a candidate virtual object center point, and respectively carrying out outward expansion search from each center point to obtain a region with high virtual response point distribution density; for the areas with high virtual response points densely distributed, respectively and approximately calculating the shapes of the candidate objects in the areas, and combining the position information of the areas to form a virtual object preliminary candidate area; in the preliminary virtual object candidate region, selecting one with the largest weighting number according to the number of high virtual response points and the highest virtual response points contained in the preliminary virtual object candidate region, and taking the selected one as a virtual object candidate region, wherein the region contains rough shape and position information of a virtual object bounding box;
the virtual object detection method in the step (7) is specifically as follows: densely sampling around a virtual object candidate region in a test augmented reality scene, constructing a plurality of mutually overlapped detection windows, classifying by using a region-level virtual-real classifier, and selecting the detection window with the best score as a final detection result of the virtual object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110299857 CN102509104B (en) | 2011-09-30 | 2011-09-30 | Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102509104A CN102509104A (en) | 2012-06-20 |
CN102509104B true CN102509104B (en) | 2013-03-20 |
Family
ID=46221185
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110299857 Expired - Fee Related CN102509104B (en) | 2011-09-30 | 2011-09-30 | Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102509104B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11830214B2 (en) * | 2018-06-01 | 2023-11-28 | Apple Inc. | Methods and devices for detecting and identifying features in an AR/VR scene |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102798583B (en) * | 2012-07-13 | 2014-07-30 | 长安大学 | Ore rock block degree measurement method based on improved FERRET |
WO2014193342A1 (en) | 2013-05-28 | 2014-12-04 | Hewlett-Packard Development Company, L.P. | Mobile augmented reality for managing enclosed areas |
WO2015062164A1 (en) * | 2013-10-31 | 2015-05-07 | The Chinese University Of Hong Kong | Method for optimizing localization of augmented reality-based location system |
CN105654504A (en) * | 2014-11-13 | 2016-06-08 | 丁业兵 | Adaptive bandwidth mean value drift object tracking method based on rotary inertia |
CN104780180B (en) * | 2015-05-12 | 2019-02-12 | 国电物资集团有限公司电子商务中心 | A kind of Virtual Reality Platform based on mobile terminal |
CN104794754B (en) * | 2015-05-12 | 2018-04-20 | 成都绿野起点科技有限公司 | A kind of Distributed Virtual Reality System |
CN104869160B (en) * | 2015-05-12 | 2018-07-31 | 成都绿野起点科技有限公司 | A kind of Distributed Virtual Reality System based on cloud platform |
CN108492374B (en) * | 2018-01-30 | 2022-05-27 | 青岛中兴智能交通有限公司 | Application method and device of AR (augmented reality) in traffic guidance |
CN111739084B (en) * | 2019-03-25 | 2023-12-05 | 上海幻电信息科技有限公司 | Picture processing method, atlas processing method, computer device, and storage medium |
CN112270063B (en) * | 2020-08-07 | 2023-03-28 | 四川航天川南火工技术有限公司 | Sensitive parameter hypothesis testing method for initiating explosive system |
CN115346002B (en) * | 2022-10-14 | 2023-01-17 | 佛山科学技术学院 | Virtual scene construction method and rehabilitation training application thereof |
CN117315375B (en) * | 2023-11-20 | 2024-03-01 | 腾讯科技(深圳)有限公司 | Virtual part classification method, device, electronic equipment and readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101520904B (en) * | 2009-03-24 | 2011-12-28 | 上海水晶石信息技术有限公司 | Reality augmenting method with real environment estimation and reality augmenting system |
WO2011084720A2 (en) * | 2009-12-17 | 2011-07-14 | Qderopateo, Llc | A method and system for an augmented reality information engine and product monetization therefrom |
CN101893935B (en) * | 2010-07-14 | 2012-01-11 | 北京航空航天大学 | Cooperative construction method for enhancing realistic table-tennis system based on real rackets |
- 2011-09-30 CN CN 201110299857 patent/CN102509104B/en not_active Expired - Fee Related
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102509104B (en) | Confidence map-based method for distinguishing and detecting virtual object of augmented reality scene | |
CN104933414B (en) | A kind of living body faces detection method based on WLD-TOP | |
Zheng et al. | Non-local scan consolidation for 3D urban scenes | |
CN103116763B (en) | A kind of living body faces detection method based on hsv color Spatial Statistical Character | |
Avgerinakis et al. | Recognition of activities of daily living for smart home environments | |
US11106897B2 (en) | Human facial detection and recognition system | |
US9626585B2 (en) | Composition modeling for photo retrieval through geometric image segmentation | |
CN110309860A (en) | The method classified based on grade malignancy of the convolutional neural networks to Lung neoplasm | |
WO2019071976A1 (en) | Panoramic image saliency detection method based on regional growth and eye movement model | |
CN105139004A (en) | Face expression identification method based on video sequences | |
CN113052185A (en) | Small sample target detection method based on fast R-CNN | |
EP3073443B1 (en) | 3d saliency map | |
CN107301643B (en) | Well-marked target detection method based on robust rarefaction representation Yu Laplce's regular terms | |
Yi et al. | Motion keypoint trajectory and covariance descriptor for human action recognition | |
CN104834909B (en) | A kind of new image representation method based on Gabor comprehensive characteristics | |
CN107025444A (en) | Piecemeal collaboration represents that embedded nuclear sparse expression blocks face identification method and device | |
CN112801037A (en) | Face tampering detection method based on continuous inter-frame difference | |
CN108564043B (en) | Human body behavior recognition method based on space-time distribution diagram | |
JP2011150605A (en) | Image area dividing device, image area dividing method, and image area dividing program | |
CN107230220B (en) | Novel space-time Harris corner detection method and device | |
CN102968793B (en) | Based on the natural image of DCT domain statistical property and the discrimination method of computer generated image | |
CN117456376A (en) | Remote sensing satellite image target detection method based on deep learning | |
CN102163343A (en) | Three-dimensional model optimal viewpoint automatic obtaining method based on internet image | |
Sun et al. | Devil in the details: Delving into accurate quality scoring for DensePose | |
Liu et al. | Energy-based global ternary image for action recognition using sole depth sequences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20130320 Termination date: 20150930 |
EXPY | Termination of patent right or utility model |