CN104298758A - Multi-perspective target retrieval method - Google Patents

Multi-perspective target retrieval method

Info

Publication number
CN104298758A
CN104298758A (application CN201410566595.7A)
Authority
CN
China
Prior art keywords
view
representative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410566595.7A
Other languages
Chinese (zh)
Inventor
刘安安
苏育挺
曹群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201410566595.7A priority Critical patent/CN104298758A/en
Publication of CN104298758A publication Critical patent/CN104298758A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-perspective target retrieval method comprising the following steps: acquiring a retrieval target input by a user and the view sets of the objects in a database; extracting features from the retrieval target and the database objects with an image feature extraction algorithm; clustering each feature-extracted view set and extracting a representative view for each cluster; assigning each representative view an initial weight according to the size of its cluster, then updating the weights using the relationships between the representative views to produce the final weights; constructing a weighted bipartite graph from the representative views of the two view sets and their weights; and finding the optimal matching of the weighted bipartite graph with a bipartite graph matching algorithm to obtain the similarity between the retrieval target and each object in the database, ranking the similarities, and outputting the ranked list as the retrieval result. The method improves the accuracy of multi-perspective target retrieval.

Description

Multi-view target retrieval method
Technical Field
The invention relates to the field of image retrieval, in particular to a multi-view target retrieval method.
Background
Objects in real life are spatial, and the human eye perceives them three-dimensionally. Conventional cameras capture only two-dimensional planar views of an object, whereas RGB-D (three primary colors plus depth) cameras, such as the Kinect somatosensory camera, capture two-dimensional information together with the corresponding depth information, overcoming this limitation. Compared with images, three-dimensional models convey far richer perceptual detail and come closer to the three-dimensional impression formed by the human eye, so they better match human cognition.
How to obtain three-dimensional models is therefore a major problem that must be considered. Building a new set of models for every task would involve an enormous workload, inevitably consume a great deal of effort and time, and be beyond ordinary users, so it is clearly unrealistic. Three-dimensional models used to be built by hand or acquired with a three-dimensional scanner, which was difficult and inconvenient. The situation has since improved greatly: models can now be searched for and downloaded over the network, and the number of sharable three-dimensional models is growing explosively. It is therefore necessary to rely on the network and make full use of the existing model resources[1]. The rapid development of network technology and the emergence of many search-engine systems have made the sharing and propagation of three-dimensional model resources far more convenient. Helping users retrieve a desired model quickly and accurately from the mass data in a database, i.e. three-dimensional model retrieval technology, has accordingly become a pressing problem and an active research topic.
Multi-view target retrieval algorithms fall into two main classes: text-based retrieval and content-based retrieval[2]. Text-based retrieval is mature because its algorithms are simple to implement, and it is very widely applied; but it has an inherent defect: text carries too little information to describe the geometry, topology, texture and other rich properties of a three-dimensional object accurately and effectively, so it is not suitable for three-dimensional model retrieval. Content-based retrieval, in contrast, requires little human intervention, gives vivid visual results, and achieves high retrieval accuracy: the machine automatically computes and extracts internal features of the three-dimensional model, and a specific algorithm computes the similarity between the query model and the models in the database, building a feature retrieval index that supports the desired browsing and retrieval functions.
When computing the similarity between two objects, current multi-view target retrieval algorithms mostly calculate only the Euclidean distance between the corresponding views of the two objects, ignoring the relevance and relative importance of the views of the same object; their retrieval accuracy therefore leaves room for improvement.
Disclosure of Invention
The invention provides a multi-view target retrieval method, described in detail below:
a method of multi-perspective target retrieval, the method comprising the steps of:
(1) acquiring a retrieval target input by a user and a view set of an object in a database;
(2) performing feature extraction on the search target and the view set of the object in the database by using an image feature extraction algorithm;
(3) clustering the view sets after feature extraction by adopting a clustering method, and extracting a representative view of each type;
(4) determining the corresponding initial weight of each representative view according to the scale of the class, updating the weight by using the relationship between the representative views, and generating the final weight;
(5) constructing a weighted bipartite graph by using representative views of the two view sets and weight values of the representative views;
(6) seeking the optimal matching of the weighted bipartite graph with a bipartite graph matching algorithm, obtaining the similarity between the retrieval target and each object in the database, ranking the similarities, and outputting the ranked result as the retrieval output.
The technical scheme provided by the invention has the following beneficial effects: by clustering the acquired view set of a three-dimensional object, extracting representative views, assigning them weights, and combining this with optimal bipartite graph matching, the similarity between the retrieval target and each database object is obtained, which improves the accuracy of multi-view target retrieval. The updated weights incorporate both the relationships between the representative views and the cluster sizes, and the similarity obtained by bipartite graph matching captures the correlation between the representative views of the two models, which works better than simply computing Euclidean distances.
Drawings
Fig. 1 is a flowchart of a multi-view target retrieval method.
Fig. 2 is a precision-recall curve comparison of the three algorithms on the ETH database.
FIG. 3 shows NN, FT, and ST comparisons for three algorithms in the ETH database.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
The embodiment of the invention provides a multi-view target retrieval method, and with reference to fig. 1, the method comprises the following steps:
101: acquiring a retrieval target input by a user and a view set of an object in a database;
The view sets of the retrieval target input by the user and of the objects in the database are corresponding sets of two-dimensional views representing the three-dimensional objects. These two-dimensional views can be obtained by photographing a real three-dimensional object with a real camera, or by shooting a virtual three-dimensional object with the virtual camera of 3D software (e.g., 3D MAX).
102: performing feature extraction on the search target and the view set of the object in the database by using an image feature extraction algorithm;
the method can adopt the current popular image visual characteristic extraction algorithm to extract and characterize the characteristics of the view set, and the embodiment of the invention adopts the gradient direction histogram which can effectively characterize the shape and the structural characteristics of the image without loss of generality[3](history of organized Gradient, short for HOG operator) to perform feature characterization.
The HOG operator is computed as follows: the gradient of each local area of a view is calculated, and a histogram of gradient directions is constructed by statistical means, forming the gradient-direction-histogram operator feature that describes the original view.
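For illustration, a minimal sketch of this computation using scikit-image's hog; the parameter values (9 orientation bins, 8×8-pixel cells, 2×2-cell blocks) are common defaults and an assumption here, since the patent does not fix them:

```python
# Sketch: HOG descriptor for one two-dimensional view (step 102).
# Parameter values are illustrative defaults, not specified by the patent.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def hog_feature(view_rgb: np.ndarray) -> np.ndarray:
    gray = rgb2gray(view_rgb)                # gradients are computed on intensity
    return hog(gray,
               orientations=9,               # bins of the gradient-direction histogram
               pixels_per_cell=(8, 8),       # local area over which gradients are pooled
               cells_per_block=(2, 2),       # normalization blocks
               feature_vector=True)          # flatten to one descriptor per view
```

Each view's descriptor plays the role of the feature vector used in the distances of step 103.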
103: clustering the view sets after feature extraction by adopting a clustering method, and extracting a representative view of each type;
Any currently popular view clustering algorithm can be used to cluster the feature-extracted view set; without loss of generality, the embodiment adopts the classical K-means clustering method[4].
The K-means clustering method proceeds as follows: first, the number of clusters K is determined, and K views are initially selected as cluster centers; each remaining view is assigned to the nearest class according to its distance from each cluster center; the mean of the views in each class is then recomputed to form a new cluster center; and this process is repeated until the clustering converges. For example, the view set of a three-dimensional model M is written as $V = \{v_1^M, v_2^M, \ldots, v_{n_M}^M\}$, where $v_i^M$ is a two-dimensional view in V, i is the view index, M is the three-dimensional model, and $n_M$ is the number of views. Each view is represented by the HOG operator of step 102, and the Euclidean distance between two views $v_i^M$ and $v_j^M$ is then

$$d(v_i^M, v_j^M) = \sqrt{(f_i - f_j)^T (f_i - f_j)}$$

where i and j are view indices, $f_i$ and $f_j$ are the HOG feature vectors of $v_i^M$ and $v_j^M$ respectively, and T denotes the transpose.

After K-means clustering of this view set, K view subsets are obtained, i.e. $V = \{V_1, V_2, \ldots, V_K\}$, and the views within each subset are visually similar. For each class, the sum of the Euclidean distances between each view and the other views in the class is calculated, and the view with the smallest sum is selected as the representative view of the class, giving the set of K representative views $\{rv_1, rv_2, \ldots, rv_i, \ldots, rv_K\}$, where $rv_i$ is the i-th representative view and i is the representative view index. The value of K is generally determined subjectively, mainly with reference to the number of views in the view set; K = 15 in this experiment.
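As a concrete sketch of this step, the following uses scikit-learn's K-means on the HOG matrix of one model and selects each cluster's representative by the minimum-total-distance rule; K = 15 follows the embodiment, while the function and variable names are illustrative:

```python
# Sketch of step 103: cluster one model's views and pick a representative
# view per cluster (minimum summed Euclidean distance to its classmates).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def representative_views(features: np.ndarray, k: int = 15):
    """features: (n_views, d) matrix of HOG descriptors for one model.
    Returns the indices of the K representative views and the cluster sizes
    (the sizes feed the initial weights of step 104)."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    reps, sizes = [], []
    for c in range(k):
        members = np.where(labels == c)[0]
        d = cdist(features[members], features[members])  # pairwise Euclidean distances
        reps.append(members[np.argmin(d.sum(axis=1))])   # smallest total distance wins
        sizes.append(len(members))
    return np.array(reps), np.array(sizes)
```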
104: determining the corresponding initial weight of each representative view according to the scale of the class, updating the weight by using the relationship between the representative views, and generating the final weight;
the specific method comprises the following steps:
1) generating an initial weight;
according to the formula
$$p_{rv_i}^0 = \frac{|N(i)|}{|A|}$$
the initial weight $p_{rv_i}^0$ of each representative view is determined; |N(i)| is the number of views in the i-th class, and |A| is the number of views of the model M. This gives the initial weight vector $p^0 = (p_{rv_1}^0, p_{rv_2}^0, \ldots, p_{rv_k}^0)$.
2) Generating a final weight;
Making the weight of each representative view depend only on the size of its class is not accurate enough, and the problem is more pronounced when one representative view lies very close to another. The weights must therefore be updated taking the relationships between the selected representative views into account.
First, an association graph is constructed to describe the relationships between the representative views: each node represents one representative view, and the edge between two nodes represents the correlation $r(rv_1, rv_2)$ between the two representative views $rv_1$ and $rv_2$.
According to the formula
$$r(rv_1, rv_2) = \exp\left(-\frac{d(rv_1, rv_2)^2}{\sigma^2}\right)$$
the correlation between two representative views $rv_1$ and $rv_2$ is found. The value of σ is generally determined empirically; in the embodiment, the variance of the distances between all representative views is selected as the parameter. $d(rv_1, rv_2)$ is the Euclidean distance between the two representative views $rv_1$ and $rv_2$.
Secondly, according to the formula
$$t(rv_1, rv_2) = \frac{r(rv_1, rv_2)}{\sum_i r(rv_1, rv_i)}$$
the transition probability from representative view $rv_1$ to representative view $rv_2$ is found, where $r(rv_1, rv_2)$ is the correlation between the two representative views $rv_1$ and $rv_2$.
Finally, according to the formula
$$p_{rv_j}^{n+1} = \gamma\, p_{rv_j}^0 + (1-\gamma)\sum_{i \neq j} t(rv_i, rv_j)\, p_i^n, \qquad j = 1, 2, \ldots, k$$
the final weight of each representative view is found. $p_{rv_1}^{n+1}, p_{rv_2}^{n+1}, \ldots, p_{rv_k}^{n+1}$ are the weights of the 1st, 2nd, …, k-th representative views after the (n+1)-th iteration; $p_{rv_1}^0, p_{rv_2}^0, \ldots, p_{rv_k}^0$ are the initial weights of the 1st, 2nd, …, k-th representative views; γ is a parameter that determines the importance of the initial weights, selected as 0.8 in this embodiment; $t(rv_i, rv_j)$ is the transition probability from the i-th to the j-th representative view; $p_i^n$ is the weight of the i-th representative view after the n-th iteration; k is the number of clusters, and 1 ≤ i ≤ k.
Experience shows that the process converges after a few iterations and can then be stopped; the number of iterations is set to 5 in the embodiment. This gives the final weight vector $p^f = (p_{rv_1}^f, p_{rv_2}^f, \ldots, p_{rv_k}^f)$.
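To make the update concrete, here is a numpy sketch of step 104 with the embodiment's settings (γ = 0.8, 5 iterations); using the empirical variance of the pairwise distances for σ² and excluding self-transitions are our reading of the text, not explicit prescriptions:

```python
# Sketch of step 104: initial weights from cluster sizes, then the iterative
# update p^{n+1} = γ·p^0 + (1-γ)·Tᵀp^n with Gaussian-correlation transitions.
import numpy as np
from scipy.spatial.distance import cdist

def final_weights(rep_feats: np.ndarray, cluster_sizes: np.ndarray,
                  gamma: float = 0.8, n_iter: int = 5) -> np.ndarray:
    p0 = cluster_sizes / cluster_sizes.sum()         # p0_i = |N(i)| / |A|
    d = cdist(rep_feats, rep_feats)                  # d(rv_i, rv_j)
    sigma2 = d[np.triu_indices_from(d, k=1)].var()   # empirical sigma^2 (an assumption)
    r = np.exp(-d ** 2 / sigma2)                     # correlations r(rv_i, rv_j)
    np.fill_diagonal(r, 0.0)                         # drop self-correlation (i != j)
    t = r / r.sum(axis=1, keepdims=True)             # row-normalized transitions t(i -> j)
    p = p0.copy()
    for _ in range(n_iter):                          # converges after a few iterations
        p = gamma * p0 + (1.0 - gamma) * t.T @ p
    return p                                         # final weight vector p^f
```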
105: constructing a weighted bipartite graph by using the representative views of the two view sets and the corresponding weight values;
Let $Q = \{rv_1^a, rv_2^a, \ldots, rv_{n_a}^a\}$ be the representative view set of the retrieval target A, where $rv_1^a, rv_2^a, \ldots, rv_{n_a}^a$ are the 1st, 2nd, …, $n_a$-th representative views of A and $n_a$ is the number of representative views of the retrieval target; let $R = \{rv_1^b, rv_2^b, \ldots, rv_{n_b}^b\}$ be the representative view set of an object B in the database, where $rv_1^b, rv_2^b, \ldots, rv_{n_b}^b$ are the 1st, 2nd, …, $n_b$-th representative views of B and $n_b$ is the number of representative views of the object; the two sets are accompanied by the final weight vectors of the retrieval target A and of the object B respectively. A weighted bipartite graph is constructed in turn between the retrieval target A and every object in the database. Specifically, the weighted bipartite graph is constructed as follows:
1) establishing a new set R';
Since the numbers of representative views in the representative view sets Q and R are not necessarily equal, the dimensions are unified first. In the present embodiment, assume $n_a \geq n_b$; then $n_a - n_b$ new elements are appended to R. For $j = 1, 2, \ldots, n_a$: if $j > n_b$, then $rv_j^b$ is empty and its weight is 0. Both view sets then contain the same number of representative views, allowing the subsequent computation and comparison. This establishes the new set R'.
2) Calculating the weight $g_{i,j}$ of each edge;
Each edge $g_{i,j}$ ($i, j = 1, 2, \ldots, n_a$) in the weighted bipartite graph represents the relationship between a representative view $rv_i^a$ of the retrieval target A and a representative view $rv_j^b$ of an object in the database.
According to the formula
$$g_{i,j} = \begin{cases} \dfrac{1}{2}\left(p_{rv_i^a}^f + p_{rv_j^b}^f\right) \times d(rv_i^a, rv_j^b) & \text{if } j \leq n_b, \\ 0 & \text{otherwise} \end{cases}$$
the weight $g_{i,j}$ of each edge is obtained, where $p_{rv_i^a}^f$ and $p_{rv_j^b}^f$ are the weights of the representative view $rv_i^a$ of the retrieval target A and of the representative view $rv_j^b$ of the object B in the database respectively, and $d(rv_i^a, rv_j^b)$ is the Euclidean distance between $rv_i^a$ and $rv_j^b$.
3) Constructing a weighted bipartite graph;
In this embodiment, a weighted bipartite graph G = {Q, R', U} is built from the representative view set Q of the retrieval target A and the representative view set R' of an object B in the database. Each node in the node set Q represents one representative view in the representative view set Q; each node in the node set R' represents one representative view in the representative view set R'; and the edge set U = {$g_{i,j}$} represents the weighted relationships between all representative views of the retrieval target A and all representative views of the object B.
In this way, a weighted bipartite graph is constructed between the retrieval target A and each object in the database in turn.
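A minimal sketch of this construction, combining the padding of step 1) with the edge weights of step 2); the $n_a \geq n_b$ assumption follows the embodiment, and the names are illustrative:

```python
# Sketch of step 105: pad the smaller representative set with empty views of
# weight 0, then fill the edge-weight matrix g[i, j] of the bipartite graph.
import numpy as np
from scipy.spatial.distance import cdist

def edge_matrix(feats_a: np.ndarray, w_a: np.ndarray,
                feats_b: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """feats_*: (n_*, d) HOG descriptors of the representative views;
    w_*: their final weights from step 104. Assumes n_a >= n_b."""
    n_a, n_b = len(feats_a), len(feats_b)
    g = np.zeros((n_a, n_a))                  # padded columns j > n_b stay 0
    d = cdist(feats_a, feats_b)               # d(rv_i^a, rv_j^b)
    g[:, :n_b] = 0.5 * (w_a[:, None] + w_b[None, :]) * d
    return g
```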
106: seeking the optimal matching of the weighted bipartite graph with a bipartite graph matching algorithm, obtaining the similarity between the retrieval target and each object in the database, ranking the similarities, and taking the ranked result as the retrieval output.
The optimal matching of the weighted bipartite graph can be found with any currently popular bipartite graph matching algorithm; without loss of generality, the Kuhn-Munkres algorithm[5] is adopted.
1) Finding the best match
The Kuhn-Munkres algorithm is applied to the constructed weighted bipartite graph G = {Q, R', U}; under the one-to-one matching constraint, the subgraph $\Lambda_M$ with the minimum total edge weight is obtained as the optimal matching of the bipartite graph, and the similarity between the retrieval target A and the object B in the database is obtained by summing the weights.
Based on the objective function of maximum-weight bipartite matching
$$\Lambda_M = \arg\max_{\Lambda_k \in \Lambda} \sum_{1 \leq i \leq n} c_{a_k(i), b_k(i)} = \arg\max_{\Lambda_k \in \Lambda} \sum_{1 \leq i \leq n} \left(G - g_{a_k(i), b_k(i)}\right)$$
and the similarity value formula
$$S_{Match} = \max_{\Lambda_k \in \Lambda} \sum_{1 \leq i \leq n} \left(G - g_{a_k(i), b_k(i)}\right)$$
the optimal matching $\Lambda_M$ and the corresponding similarity value $S_{Match}$ are solved. $\Lambda_k$ denotes a bipartite graph matching; Λ is the set of all possible bipartite graph matchings; $c_{a_k(i), b_k(i)}$ is an element of the n×n edge-efficiency matrix C[6], i.e. the weight of the edge formed by the two matched nodes $a_k(i)$ and $b_k(i)$ in the bipartite graph; G is a constant slightly larger than $\max(g_{i,j})$; the argmax function finds the argument with the maximum value, and the max function finds the maximum value.
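As an illustrative sketch of this step, SciPy's linear_sum_assignment (a Hungarian/Kuhn-Munkres solver) can stand in for the matcher; setting G to max(g) plus a small epsilon is one reading of "slightly larger than max(g_ij)":

```python
# Sketch of step 106: optimal one-to-one matching on the edge matrix g and
# the similarity value S_Match. SciPy's solver is an implementation choice,
# not mandated by the patent.
import numpy as np
from scipy.optimize import linear_sum_assignment

def similarity(g: np.ndarray) -> float:
    G = g.max() + 1e-6                                        # constant G > max(g_ij)
    rows, cols = linear_sum_assignment(G - g, maximize=True)  # maximize sum of (G - g)
    return float((G - g)[rows, cols].sum())                   # S_Match
```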
2) Similarity ranking
The objects in the database are sorted by their similarity $S_{Match}$ to the retrieval target in descending order; a larger $S_{Match}$ means a higher similarity between the two. The sorted result is output as the retrieval result.
Experiment of
1. Experiment database
The database used in the experiment is the publicly shared ETH database, which contains 80 three-dimensional models in 8 classes of 10 objects each: apple, car, cow, cup, dog, horse, pear and tomato.
2. Evaluation criteria
Four evaluation criteria[7] were applied in the experiment:
(1) Nearest neighbor (NN): the percentage of queries whose closest match belongs to the query class.
(2) First tier (FT): the recall within the top K nearest-neighbor matches, where K is the cardinality of the query class; K = 10 in this experiment.
(3) Second tier (ST): the recall within the top 2K nearest-neighbor matches, where K is the cardinality of the query class.
(4) Precision-recall curve: based on the Average Recall (AR) and Average Precision (AP) used in performance evaluation for three-dimensional object retrieval.
AR and AP are solved according to the following formulas, and the precision-recall curve is drawn:
$$\text{Recall} = \frac{N_z}{N_r}$$
where Recall is the recall value, $N_z$ is the number of correctly retrieved objects, and $N_r$ is the number of all relevant objects.
$$\text{Precision} = \frac{N_z}{N_{all}}$$
where Precision is the precision value, and $N_{all}$ is the number of all retrieved objects.
$$AR = \sum_{i=1}^{N_m} \text{Recall}(i)$$
where $N_m$ is the number of three-dimensional model classes, and Recall(i) is the recall of class i.
$$AP = \sum_{i=1}^{N_m} \text{Precision}(i)$$
where Precision(i) is the precision of class i.
3. Comparison algorithm
The method was compared experimentally with two methods:
ED[8] (A 3D Model Retrieval Based on the Elevation Descriptor): a 3D retrieval algorithm based on the elevation (height) descriptor.
CCFV[9] (Camera Constraint-Free View-Based 3D Object Retrieval): a view-based 3D retrieval algorithm under free camera views.
4. Results of the experiment
The precision-recall curves of the three algorithms on the ETH database are compared in fig. 2, where the ordinate is precision and the abscissa is recall. The larger the area enclosed by the precision-recall curve and the coordinate axes, the better the retrieval performance.
The NN, FT and ST comparisons of the three algorithms on the ETH database are shown in fig. 3. Larger NN, FT and ST values represent better retrieval performance.
In the precision-recall comparison, the area enclosed by the curve of the proposed method and the coordinate axes is the largest, clearly superior to ED and CCFV. On the ETH database, the NN, FT and ST indexes of the method are respectively 16.25%, 6% and 4.25% higher than those of the CCFV algorithm, and 17.5%, 13.88% and 13% higher than those of the ED algorithm. The experimental results show that the method achieves better retrieval performance than ED and CCFV.
References
[1] Jia Hui, Liu Jianyuan, Zhang Jiang. Research on the construction and retrieval of a semantic web of three-dimensional model libraries [J]. Journal of Xi'an University of Posts and Telecommunications, 2012, 17(3): 53-57.
[2] Zhen Buchuan. Research on content-based 3D model retrieval technology [D]. Zhejiang University, 2004.
[3] Dalal N, Triggs B. Histograms of oriented gradients for human detection [C] // Computer Vision and Pattern Recognition (CVPR 2005), IEEE Computer Society Conference on. IEEE, 2005: 886-893.
[4] King, Queen, Von Johnson, et al. A survey of the K-means clustering algorithm [J]. Electronic Design Engineering, 2012, 20(7). DOI: 10.3969/j.issn.1674-6236.2012.07.008.
[5] Huajian Xin. Semantic-based Web service discovery and algorithm research [D]. Changsha University, 2010.
[6] Gao Y, Dai Q, Wang M, et al. 3D model retrieval using weighted bipartite graph matching [J]. Signal Processing: Image Communication, 2011, 26(1): 39-47.
[7] Gao Y, Dai Q, Zhang N Y. 3D model comparison using spatial structure circular descriptor [J]. Pattern Recognition, 2010, 43(3): 1142-1151.
[8] Shih J L, Lee C H, Wang J T. A new 3D model retrieval approach based on the elevation descriptor [J]. Pattern Recognition, 2007, 40(1): 283-295.
[9] Gao Y, Tang J, Hong R, et al. Camera constraint-free view-based 3-D object retrieval [J]. IEEE Transactions on Image Processing, 2012, 21(4): 2269-2281.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A method for multi-view object retrieval, the method comprising:
(1) acquiring a retrieval target input by a user and a view set of an object in a database;
(2) performing feature extraction on the search target and the view set of the object in the database by using an image feature extraction algorithm;
(3) clustering the view sets after feature extraction by adopting a clustering method, and extracting a representative view of each type;
(4) determining the corresponding initial weight of each representative view according to the scale of the class, updating the weight by using the relationship between the representative views, and generating the final weight;
(5) constructing a weighted bipartite graph by using representative views of the two view sets and weight values of the representative views;
(6) seeking the optimal matching of the weighted bipartite graph with a bipartite graph matching algorithm, obtaining the similarity between the retrieval target and each object in the database, ranking the similarities, and taking the ranked result as the retrieval output.
2. The method of claim 1, wherein the performing of feature extraction on the search target and the view-set of the object in the database by using the image feature extraction algorithm specifically comprises:
and respectively calculating the gradient of the local area of each view, and constructing a gradient direction histogram by using a statistical means, thereby forming an operator characteristic of the gradient direction histogram for describing the original view.
3. The method for multi-view target retrieval according to claim 1, wherein the clustering operation of the feature-extracted view set by using the clustering method specifically comprises:
firstly, determining the accurate number K to be clustered, initially selecting K views as clustering centers, and assigning each remaining view to the nearest class according to the distance between the view and each clustering center; recalculating the average value of the views in each class to form a new clustering center; this process is repeated until the clustering converges.
4. The method for multi-view object retrieval according to claim 1, wherein the representative view is specifically:
and calculating the sum of Euclidean distances between each view and other views in each class, and selecting the view with the minimum sum of the Euclidean distances with other views as a representative view.
5. The method of claim 1, wherein the initial weight is specifically:
$$p_{rv_i}^0 = \frac{|N(i)|}{|A|}$$
wherein |N(i)| is the number of views in the i-th cluster, and |A| is the number of views of the model M.
6. The method of claim 1, wherein the final weight is specifically:
$$p_{rv_j}^{n+1} = \gamma\, p_{rv_j}^0 + (1-\gamma)\sum_{i \neq j} t(rv_i, rv_j)\, p_i^n, \qquad j = 1, 2, \ldots, k$$
wherein $p_{rv_1}^{n+1}, p_{rv_2}^{n+1}, \ldots, p_{rv_k}^{n+1}$ are the weights of the 1st, 2nd, …, k-th representative views after the (n+1)-th iteration; $p_{rv_1}^0, p_{rv_2}^0, \ldots, p_{rv_k}^0$ are the initial weights of the 1st, 2nd, …, k-th representative views; γ is a parameter that determines the importance of the initial weights; $t(rv_i, rv_j)$ is the transition probability from the i-th to the j-th representative view; $p_i^n$ is the weight of the i-th representative view after the n-th iteration; k is the number of clusters, and 1 ≤ i ≤ k.
CN201410566595.7A 2014-10-22 2014-10-22 Multi-perspective target retrieval method Pending CN104298758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410566595.7A CN104298758A (en) 2014-10-22 2014-10-22 Multi-perspective target retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410566595.7A CN104298758A (en) 2014-10-22 2014-10-22 Multi-perspective target retrieval method

Publications (1)

Publication Number Publication Date
CN104298758A true CN104298758A (en) 2015-01-21

Family

ID=52318483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410566595.7A Pending CN104298758A (en) 2014-10-22 2014-10-22 Multi-perspective target retrieval method

Country Status (1)

Country Link
CN (1) CN104298758A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868324A (en) * 2016-03-28 2016-08-17 天津大学 Multi-view target retrieving method based on implicit state model
CN106503270A (en) * 2016-12-09 2017-03-15 厦门大学 A kind of 3D target retrieval methods based on multiple views and Bipartite Matching
CN106557533A (en) * 2015-09-24 2017-04-05 杭州海康威视数字技术股份有限公司 A kind of method and apparatus of many image retrieval-by-unifications of single goal
WO2017124697A1 (en) * 2016-01-20 2017-07-27 北京百度网讯科技有限公司 Information searching method and apparatus based on picture
GB2569979A (en) * 2018-01-05 2019-07-10 Sony Interactive Entertainment Inc Image generating device and method of generating an image
CN110263196A (en) * 2019-05-10 2019-09-20 南京旷云科技有限公司 Image search method, device, electronic equipment and storage medium
CN112818451A (en) * 2021-02-02 2021-05-18 盈嘉互联(北京)科技有限公司 VGG-based BIM model optimal visual angle construction method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040249809A1 (en) * 2003-01-25 2004-12-09 Purdue Research Foundation Methods, systems, and data structures for performing searches on three dimensional objects
CN101398854A (en) * 2008-10-24 2009-04-01 清华大学 Video fragment searching method and system
CN101599077A (en) * 2009-06-29 2009-12-09 清华大学 A kind of method of retrieving three-dimensional objects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040249809A1 (en) * 2003-01-25 2004-12-09 Purdue Research Foundation Methods, systems, and data structures for performing searches on three dimensional objects
CN101398854A (en) * 2008-10-24 2009-04-01 清华大学 Video fragment searching method and system
CN101599077A (en) * 2009-06-29 2009-12-09 清华大学 A kind of method of retrieving three-dimensional objects

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE GAO ET AL: "3D model retrieval using weighted bipartite graph matching", 《SIGNAL PROCESSING: IMAGE COMMUNICATION》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557533A (en) * 2015-09-24 2017-04-05 杭州海康威视数字技术股份有限公司 A kind of method and apparatus of many image retrieval-by-unifications of single goal
CN106557533B (en) * 2015-09-24 2020-03-06 杭州海康威视数字技术股份有限公司 Single-target multi-image joint retrieval method and device
WO2017124697A1 (en) * 2016-01-20 2017-07-27 北京百度网讯科技有限公司 Information searching method and apparatus based on picture
CN105868324A (en) * 2016-03-28 2016-08-17 天津大学 Multi-view target retrieving method based on implicit state model
CN106503270A (en) * 2016-12-09 2017-03-15 厦门大学 A kind of 3D target retrieval methods based on multiple views and Bipartite Matching
CN106503270B (en) * 2016-12-09 2020-02-14 厦门大学 3D target retrieval method based on multi-view and bipartite graph matching
GB2569979A (en) * 2018-01-05 2019-07-10 Sony Interactive Entertainment Inc Image generating device and method of generating an image
US10848733B2 (en) 2018-01-05 2020-11-24 Sony Interactive Entertainment Inc. Image generating device and method of generating an image
GB2569979B (en) * 2018-01-05 2021-05-19 Sony Interactive Entertainment Inc Rendering a mixed reality scene using a combination of multiple reference viewing points
CN110263196A (en) * 2019-05-10 2019-09-20 南京旷云科技有限公司 Image search method, device, electronic equipment and storage medium
CN110263196B (en) * 2019-05-10 2022-05-06 南京旷云科技有限公司 Image retrieval method, image retrieval device, electronic equipment and storage medium
CN112818451A (en) * 2021-02-02 2021-05-18 盈嘉互联(北京)科技有限公司 VGG-based BIM model optimal visual angle construction method

Similar Documents

Publication Publication Date Title
CN104298758A (en) Multi-perspective target retrieval method
Yang et al. Efficient image retrieval via decoupling diffusion into online and offline processing
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN103136355B (en) A kind of Text Clustering Method based on automatic threshold fish-swarm algorithm
CN110674407A (en) Hybrid recommendation method based on graph convolution neural network
CN105205135B (en) A kind of 3D model retrieval methods and its retrieval device based on topic model
CN102004786B (en) Acceleration method in image retrieval system
CN104834693A (en) Depth-search-based visual image searching method and system thereof
CN103473327A (en) Image retrieval method and image retrieval system
CN105320764B (en) A kind of 3D model retrieval method and its retrieval device based on the slow feature of increment
CN111859004B (en) Retrieval image acquisition method, retrieval image acquisition device, retrieval image acquisition equipment and readable storage medium
CN106844620B (en) View-based feature matching three-dimensional model retrieval method
CN107291825A (en) With the search method and system of money commodity in a kind of video
CN105138977A (en) Face identification method under big data environment
CN109033172A (en) A kind of image search method of deep learning and approximate target positioning
CN105868706A (en) Method for identifying 3D model based on sparse coding
CN106095920A (en) Distributed index method towards extensive High dimensional space data
CN105844230A (en) Remote sensing image segmentation method based on cloud platform
Buvana et al. Content-based image retrieval based on hybrid feature extraction and feature selection technique pigeon inspired based optimization
Ye et al. Query-adaptive remote sensing image retrieval based on image rank similarity and image-to-query class similarity
CN104850620B (en) A kind of spatial scene data retrieval method based on spatial relationship
CN101599077B (en) Method for retrieving three-dimensional object
CN106971005A (en) Distributed parallel Text Clustering Method based on MapReduce under a kind of cloud computing environment
Gabryel A bag-of-features algorithm for applications using a NoSQL database
Yang et al. Weakly supervised class-agnostic image similarity search based on convolutional neural network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150121

WD01 Invention patent application deemed withdrawn after publication