CN101354728B - Method for measuring similarity based on interval weights - Google Patents

Method for measuring similarity based on interval weights

Info

Publication number
CN101354728B
Authority
CN
China
Prior art keywords
feature vector
interval
image
weights
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008102229984A
Other languages
Chinese (zh)
Other versions
CN101354728A (en)
Inventor
黄祥林
杨丽芳
李荫碧
吕锐
张洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN2008102229984A
Publication of CN101354728A
Application granted
Publication of CN101354728B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a similarity measurement method based on interval weights, belonging to the field of multimedia retrieval. The method measures the similarity between the feature vectors of any two images. First, the difference between each pair of corresponding components of the two feature vectors is calculated, and all component differences are normalized to [0, 1]; the range [0, 1] is divided into intervals and a weight is assigned to each interval. Second, the interval into which each normalized difference falls is determined, and the corresponding weight is obtained. Finally, all the weights are summed and averaged, and the average is taken as the similarity measurement value of the two images' feature vectors. By dividing intervals and assigning a single weight to each interval, the method ignores differences among components quantized to the same interval, and the distribution of weights across intervals reflects the importance of each interval to the similarity measurement. This can improve image retrieval efficiency during similarity matching.

Description

A similarity measurement method based on interval weights
Technical field
The present invention is a similarity measurement method based on interval weights and belongs to the field of multimedia retrieval.
Background art
The choice of similarity measurement method is a key step in content-based image retrieval (CBIR). In a CBIR system, the user first submits a query image; the system performs feature extraction on the query image to obtain its feature vector, then uses a similarity measurement algorithm to match that feature vector against the feature vectors in the image feature library, and finally returns the images similar to the query image to the user. Commonly used similarity measures include the city block distance, the Euclidean distance, and so on.
Digital images stored in a computer may differ even when they show the same content, because noise is introduced during scanning or transmission (for example, in e-books obtained by scanning) or because lighting varies during capture. In addition, related images also differ within local regions. During matching, the user therefore wants the system to be robust: to ignore these small differences and still retrieve the related images. Existing similarity measurement methods do not divide the per-dimension differences between image feature components into intervals and assign weights to them so as to reduce the influence of such differences on the overall matching process.
Summary of the invention
The present invention proposes a similarity measurement method based on interval weights. The method divides the difference between each pair of feature components into intervals and assigns weights, and can thereby improve the retrieval efficiency of image retrieval.
The overall idea of the invention is as follows. The invention measures the similarity between the feature vectors of any two images. First, the difference between each pair of corresponding components of the two feature vectors is calculated, and all component differences are normalized to [0, 1]. Next, [0, 1] is divided into intervals and a weight is assigned to each interval. Then the interval into which each normalized difference falls is determined, giving its weight. Finally, all the weights are summed and averaged, and this average is taken as the similarity measurement value of the two image feature vectors.
Specific innovation: the difference between each pair of corresponding components of the two feature vectors is normalized, divided into intervals, and assigned a weight. By dividing intervals and giving a single weight to each interval, the invention ignores differences among components quantized to the same interval, and the weight assigned to each interval reflects that interval's importance to the similarity measurement.
The technical scheme of the invention is as follows. The images used by the image retrieval system may be in bmp format (or other formats) and are stored on a computer hard disk or a removable storage medium. The user first selects a query image, and the computer then performs the corresponding computation and processing. The main process is: the computer system receives the query image input by the user, and the retrieval system then processes it.
The specific method steps are:
First, the retrieval system preprocesses all images in the image library offline in advance and extracts their features, obtaining the feature vectors of all images in the library and forming the image feature vector library. The user then inputs a query image; the retrieval system preprocesses it and extracts its features to obtain the query image's feature vector, matches that feature vector against the feature vectors in the image feature vector library by similarity, and returns the images most similar to the query image to the user.
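The matching-and-return stage of this flow can be sketched as follows. This is only an illustration under stated assumptions: feature vectors are taken as already extracted, the similarity measure is passed in as a function where a larger value means more similar, and the names `retrieve`, `feature_library`, and `top_k` are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def retrieve(
    query_vector: Vector,
    feature_library: Dict[str, Vector],
    similarity: Callable[[Vector, Vector], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Rank library images by similarity to the query and return the best top_k.

    feature_library maps a (hypothetical) image id to its precomputed feature vector.
    """
    scored = [
        (image_id, similarity(query_vector, vector))
        for image_id, vector in feature_library.items()
    ]
    # Larger score = more similar; ties are broken by image id so the order is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]
```

Any measure with the same calling convention can be plugged in as `similarity`, which makes it straightforward to compare different measures on the same feature library.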
The specific method for similarity matching between the feature vector of the query image and the feature vectors in the image feature vector library is as follows:
Given the feature vectors A and B of two images to be matched, let $A = \{a_i\}$ and $B = \{b_i\}$, where $a_i$ is the i-th component of A, $b_i$ is the i-th component of B, $i = 0, 1, \ldots, L-1$, and L, an integer, is the length of the feature vector. The similarity between A and B is calculated with the interval-weight-based similarity measurement method proposed by the invention, in the following steps:
1) Take the difference between each pair of corresponding components $a_i$, $b_i$ ($i = 0, 1, \ldots, L-1$) of the feature vectors A and B of the two images, and then normalize the difference to [0, 1]:
First calculate the difference $a_i - b_i$ between each pair of corresponding components, then normalize it with the following formula:
$$\beta_i = \frac{|a_i - b_i|}{\max(|a_i|, |b_i|, |a_i - b_i|)}, \quad i = 0, 1, \ldots, L-1$$
where $\beta_i$ is the normalized value of $a_i - b_i$, with range [0, 1].
2) Divide the interval [0, 1] non-uniformly (or uniformly), and assign a weight to each resulting interval:
First, divide [0, 1] non-uniformly (or uniformly) into N intervals (N is usually an integer from 4 to 8), namely $[K_0 = 0, K_1), [K_1, K_2), \ldots, [K_{k-1}, K_k), \ldots, [K_{N-1}, K_N = 1]$. Then assign a weight $W_k$ to each interval $[K_{k-1}, K_k)$ ($k = 1, 2, \ldots, N$; $W_k$ generally ranges from 0 to 10). $W_k$ may be assigned according to the actual situation and may be an integer or a decimal.
3) Determine the weight $Q_i$ corresponding to each normalized difference $\beta_i \in [0, 1]$ between image feature components:
If the normalized difference satisfies $\beta_i \in [K_{k-1}, K_k)$, the weight corresponding to $\beta_i$ is the weight $W_k$ assigned to the interval $[K_{k-1}, K_k)$ in step 2), i.e. $Q_i = W_k$, where $i = 0, 1, \ldots, L-1$ and $k = 1, 2, \ldots, N$.
4) Sum all the weights obtained in step 3) and take the average to obtain the similarity measurement value $S_{A,B}$ between feature vectors A and B:
$$S_{A,B} = \frac{1}{L} \sum_{i=0}^{L-1} Q_i$$
By dividing intervals and giving a single weight to each interval, the invention ignores differences among components quantized to the same interval, and the weight assigned to each interval reflects that interval's importance to the similarity measurement. During similarity matching this can improve the efficiency of image retrieval.
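Steps 1) to 4) can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the interval boundaries are passed as the inner values $K_1, \ldots, K_{N-1}$ (with $K_0 = 0$ and $K_N = 1$ implied), and the handling of the case $a_i = b_i = 0$, where the normalization formula becomes 0/0, is an assumption the source does not address.

```python
from bisect import bisect_right
from typing import Sequence


def normalized_difference(a: float, b: float) -> float:
    """Step 1: beta = |a - b| / max(|a|, |b|, |a - b|), a value in [0, 1]."""
    denom = max(abs(a), abs(b), abs(a - b))
    if denom == 0.0:
        # Assumption (not in the source): a = b = 0 gives 0/0; treat it as no difference.
        return 0.0
    return abs(a - b) / denom


def interval_weight(beta: float, inner_bounds: Sequence[float],
                    weights: Sequence[float]) -> float:
    """Steps 2-3: return W_k for the interval [K_{k-1}, K_k) that contains beta.

    inner_bounds holds K_1 .. K_{N-1} in ascending order; weights holds W_1 .. W_N.
    """
    # bisect_right counts the boundaries <= beta, which is the 0-based interval index,
    # so a value equal to a boundary falls into the interval that starts there.
    return weights[bisect_right(inner_bounds, beta)]


def interval_weight_similarity(a: Sequence[float], b: Sequence[float],
                               inner_bounds: Sequence[float],
                               weights: Sequence[float]) -> float:
    """Step 4: average of the per-component weights Q_i."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    if len(weights) != len(inner_bounds) + 1:
        raise ValueError("need exactly one weight per interval")
    total = sum(
        interval_weight(normalized_difference(x, y), inner_bounds, weights)
        for x, y in zip(a, b)
    )
    return total / len(a)
```

With weights that decrease as the difference grows, a larger return value indicates more similar vectors, so results would be ranked in descending order of this value.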
Description of drawings
Fig. 1: overall flow block diagram of the image retrieval system
Fig. 2: schematic diagram of the interval weights
Fig. 3(a): average recall obtained by system retrieval
Fig. 3(b): average precision obtained by system retrieval
Embodiment
The invention is further described below with reference to the accompanying drawings:
The technical scheme of this embodiment is shown in Fig. 1:
First, the retrieval system preprocesses all images in the image library offline in advance and extracts their features, obtaining the feature vectors of all images in the library and forming the image feature vector library. The user then inputs a query image; the retrieval system preprocesses it and extracts its features to obtain the query image's feature vector, matches that feature vector against the feature vectors in the image feature vector library by similarity, and returns the images most similar to the query image to the user.
The images used by the image retrieval system may be in bmp format (or other formats) and are stored on a computer hard disk or a removable storage medium. The user first selects a query image, and the computer then performs the corresponding computation and processing. The computer processing is: the computer system receives the query image input by the user, and the retrieval system then processes it. The computer in this embodiment is a Tsinghua Tongfang microcomputer (Intel(R) Celeron(R) CPU 3.20 GHz, 1.25 GB memory, 80 GB hard disk), and the method is implemented in VC++ 6.0.
The specific method for similarity matching between the feature vector of the query image and the feature vectors in the image feature vector library in this embodiment is as follows:
The image database used is a document image database obtained by scanning with a scanner. The feature vectors of the document images are extracted by the density distribution feature extraction algorithm; the resulting feature vectors A and B have a length of 2 × the number of image blocks M, i.e. L = 2M. (In this embodiment, M varies with the number of blocks used in each experiment; the experimental system provides 9 specific values of M.)
1) Let feature vector $A = \{a_i\}$ and feature vector $B = \{b_i\}$, where $a_i$ is the i-th component of A, $b_i$ is the i-th component of B, $i = 0, 1, \ldots, L-1$, and L, an integer, is the length of the feature vector. Take the difference between each pair of corresponding components $a_i$, $b_i$ of the feature vectors A and B of the two images and normalize it to [0, 1]; let the normalized value be $\beta_i$. The specific normalization method is:
First calculate the difference $a_i - b_i$ between each pair of corresponding components, then normalize it with the following formula:
$$\beta_i = \frac{|a_i - b_i|}{\max(|a_i|, |b_i|, |a_i - b_i|)}, \quad i = 0, 1, \ldots, L-1$$
where $\beta_i$ is the normalized value of $a_i - b_i$, with range [0, 1].
2) Divide the interval [0, 1] (a non-uniform division is used here), and assign a weight to each resulting interval:
First, [0, 1] is divided into 4 intervals: [0, 0.02), [0.02, 0.05), [0.05, 0.1) and [0.1, 1]. Then a weight $W_k$ ($k = 1, 2, 3, 4$) is assigned to each interval: $W_1 = 5$ for [0, 0.02); $W_2 = 2$ for [0.02, 0.05); $W_3 = 1$ for [0.05, 0.1); $W_4 = 0$ for [0.1, 1].
3) Determine the weight $Q_i$ corresponding to each normalized difference $\beta_i \in [0, 1]$ between image feature components:
The range of $\beta_i$ is divided into 4 intervals: when $\beta_i$ is in [0, 0.02), $Q_i = 5$; when $\beta_i$ is in [0.02, 0.05), $Q_i = 2$; when $\beta_i$ is in [0.05, 0.1), $Q_i = 1$; when $\beta_i$ is in [0.1, 1], $Q_i = 0$. The distance (similarity measurement value) $S_{A,B}$ between feature vector A and feature vector B is then:
$$S_{A,B} = \frac{1}{L} \sum_{i=0}^{L-1} Q_i$$
$$Q_i = \begin{cases}
5, & \beta_i \in [0, 0.02) \\
2, & \beta_i \in [0.02, 0.05) \\
1, & \beta_i \in [0.05, 0.1) \\
0, & \beta_i \in [0.1, 1]
\end{cases}$$
Fig. 2 is a schematic diagram of the interval weights used to calculate the distance between feature vector A and feature vector B.
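As a worked illustration of this embodiment's intervals and weights, the sketch below hard-codes the four intervals and evaluates two small vectors. The vectors are hypothetical values chosen only to hit each interval once, and treating the 0/0 case as a zero difference is an assumption not stated in the source.

```python
def q_weight(beta: float) -> int:
    """Piecewise weight Q_i for the embodiment's four intervals."""
    if beta < 0.02:
        return 5   # [0, 0.02)
    if beta < 0.05:
        return 2   # [0.02, 0.05)
    if beta < 0.1:
        return 1   # [0.05, 0.1)
    return 0       # [0.1, 1]


def embodiment_similarity(a, b):
    """Average of Q_i over all components, following steps 1) to 3) above."""
    total = 0
    for x, y in zip(a, b):
        denom = max(abs(x), abs(y), abs(x - y))
        # Assumption: when both components are 0 the difference is taken as 0.
        beta = abs(x - y) / denom if denom else 0.0
        total += q_weight(beta)
    return total / len(a)


# Hypothetical 4-component feature vectors (illustrative values only).
A = [100.0, 100.0, 100.0, 100.0]
B = [100.0, 97.0, 92.0, 50.0]
print(embodiment_similarity(A, B))  # 2.0
```

Here the normalized differences are 0, 0.03, 0.08 and 0.5, which map to the weights 5, 2, 1 and 0, so the average is 8 / 4 = 2.0. The maximum possible value under these weights is 5, reached when every component difference falls in [0, 0.02).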
The document image database of this embodiment contains 900 images; every image has 9 related images obtained from the same document image through various transformations, and each retrieval returns 13 images. Fig. 3 shows the average recall and average precision per image measured when the system randomly selects a number of images and uses the city block distance, the Euclidean distance, and the interval-weight-based similarity measurement method of the invention respectively. Recall and precision are defined as follows:
Recall is denoted by R and precision by P.
$$R = \frac{N_A}{N_A + N_C}$$
$$P = \frac{N_A}{N_B + N_A}$$
where $N_A$ is the number of relevant images returned by the retrieval, $N_B$ is the number of irrelevant images returned by the retrieval, and $N_C$ is the number of relevant images in the image library that were not retrieved.
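The two definitions translate directly into code. The sketch below is illustrative only; the counts in the example are hypothetical and are not experimental results from the patent, and returning an error for an empty denominator is a design assumption.

```python
def recall(n_a: int, n_c: int) -> float:
    """R = N_A / (N_A + N_C): share of all relevant images that were returned."""
    if n_a + n_c == 0:
        raise ValueError("no relevant images exist, recall is undefined")
    return n_a / (n_a + n_c)


def precision(n_a: int, n_b: int) -> float:
    """P = N_A / (N_A + N_B): share of returned images that are relevant."""
    if n_a + n_b == 0:
        raise ValueError("no images were returned, precision is undefined")
    return n_a / (n_a + n_b)


# Hypothetical query: 13 images returned, 9 of them relevant (N_A = 9, N_B = 4),
# and 1 relevant image left unretrieved in the library (N_C = 1).
print(recall(9, 1))     # 0.9
print(precision(9, 4))  # 9/13, about 0.692
```

Because each retrieval in the embodiment returns a fixed number of images, $N_A + N_B$ is constant per query, so precision and recall there differ only by their denominators.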
In Fig. 3(a) and (b), the abscissa takes values from the one-dimensional array count1 = [1, 2, 3, 4, 5, 6, 7, 8, 9], each value denoting a different number of blocks into which the image is divided. The corresponding block-count array is count2 = [1×1, 2×2, 4×4, 8×8, 12×12, 16×16, 24×24, 32×32, 48×48]. That is, abscissa value 1 means the image is divided into 1×1, i.e. 1 block, M = 1; abscissa value 2 means the image is divided into 2×2, i.e. 4 blocks, M = 4; abscissa value 3 means the image is divided into 4×4, i.e. 16 blocks, M = 16; and so on. From the abscissa one can therefore tell how many blocks the image was divided into in each density-distribution-feature experiment, which gives the value of M.
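The mapping from an abscissa value to the block count M, and from there to the feature vector length L = 2M stated earlier for this embodiment, can be written out as a small lookup. The names `GRID_SIDES` and `blocks_and_length` are hypothetical.

```python
# Side length of the block grid for abscissa values 1..9 (count2 = side x side).
GRID_SIDES = [1, 2, 4, 8, 12, 16, 24, 32, 48]


def blocks_and_length(abscissa: int) -> tuple:
    """Return (M, L) for an abscissa value of Fig. 3, where L = 2M."""
    if not 1 <= abscissa <= len(GRID_SIDES):
        raise ValueError("abscissa must be between 1 and 9")
    side = GRID_SIDES[abscissa - 1]
    m = side * side
    return m, 2 * m


for x in range(1, 10):
    print(x, blocks_and_length(x))
```

For example, abscissa value 2 gives M = 4 and L = 8, and abscissa value 9 gives M = 2304 and L = 4608.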
As can be seen from Fig. 3, under all 9 partitioning modes the recall and precision obtained with the interval-weight-based similarity measurement method proposed by the invention are always better than those of the Euclidean distance. Because the recall and precision curves are similar in this experimental system, and the recall and precision obtained with the Euclidean distance are always the lowest, only the recall distributions obtained with the proposed method and with the city block distance are analyzed here.
The recall curves in Fig. 3(a) show that only when the image is divided into 48 × 48 blocks is the recall obtained with the interval-weight-based method slightly lower than that of the city block distance; in all other cases it is better than or equal to the city block distance. In particular, when the document image is divided into 2 × 2 blocks, the recall of the proposed method reaches 92.3%, whereas the recall of the city block distance is only 5.3%.
The experimental results show that the invention, combined with certain feature extraction algorithms, can improve the efficiency of image retrieval.
The invention is further explained below:
1) Simplification of the algorithm: when the feature vectors used for similarity matching are normalized feature vectors, i.e. every component $a_i$, $b_i$ of feature vectors A and B has been normalized to [0, 1], the algorithm can be simplified to $\beta_i = |a_i - b_i|$, with the subsequent steps as above. Because the simplified algorithm omits the division in the computation, it can greatly improve the computation speed.
2) Division of the interval [0, 1] in which $\beta_i$ lies: divide according to how each sub-interval affects the similarity, but the division should not be too fine. Usually 4 to 8 intervals suffice, which preserves a certain accuracy of the similarity measure while reducing the amount of computation.
3) Assignment of interval weights: assign according to each interval's importance to the similarity; the weights may be set from experimental experience or by some other optimization algorithm.
4) The similarity measurement method proposed by the invention is mainly applied to similarity comparison of high-dimensional feature values, for example in multimedia retrieval.
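The simplification in point 1) can be sketched as below, assuming every component of both vectors already lies in [0, 1]. The function name and the boundary and weight arguments are illustrative; the intervals and weights in the usage example are those of the embodiment.

```python
from bisect import bisect_right
from typing import Sequence


def simplified_similarity(a: Sequence[float], b: Sequence[float],
                          inner_bounds: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Interval-weight similarity with beta_i = |a_i - b_i| (no division).

    Assumes all components of a and b are already normalized to [0, 1].
    inner_bounds holds K_1 .. K_{N-1}; weights holds W_1 .. W_N.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("feature vectors must be non-empty and of equal length")
    total = 0.0
    for x, y in zip(a, b):
        beta = abs(x - y)  # already within [0, 1] for normalized inputs
        total += weights[bisect_right(inner_bounds, beta)]
    return total / len(a)


# Usage with the embodiment's intervals and weights on hypothetical vectors.
print(simplified_similarity([0.5, 0.5, 0.5], [0.5, 0.53, 0.9],
                            [0.02, 0.05, 0.1], [5, 2, 1, 0]))
```

Note that $|a_i - b_i|$ is generally not numerically equal to the normalized difference of step 1) (for 0.5 and 0.53 it is 0.03 rather than 0.03 / 0.53), so interval boundaries tuned for one form may need revisiting for the other.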

Claims (1)

1. A similarity measurement method based on interval weights, the specific steps being: first, the retrieval system preprocesses all images in the image library offline in advance and extracts their features, obtaining the feature vectors of all images in the library and forming the image feature vector library; then the user inputs a query image, and the retrieval system preprocesses the query image and extracts its features to obtain the feature vector of the query image, performs similarity matching between the feature vector of the query image and the feature vectors in the image feature vector library, and returns the images most similar to the query image to the user; characterized in that the specific method for similarity matching between the feature vector of the query image and the feature vectors in the image feature vector library is as follows:
Let the feature vectors of the two images to be matched be A and B respectively, with $A = \{a_i\}$ and $B = \{b_i\}$; $a_i$ is the i-th component of feature vector A, $b_i$ is the i-th component of feature vector B, $i = 0, 1, \ldots, L-1$, and L, an integer, is the length of the feature vector;
1) take the difference $a_i - b_i$ between each pair of corresponding components $a_i$, $b_i$ ($i = 0, 1, \ldots, L-1$) of the feature vectors A and B of the two images, then normalize the difference with the following formula:
$$\beta_i = \frac{|a_i - b_i|}{\max(|a_i|, |b_i|, |a_i - b_i|)}, \quad i = 0, 1, \ldots, L-1$$
where $\beta_i$ is the normalized value of $a_i - b_i$, with range [0, 1];
2) divide the interval [0, 1] non-uniformly or uniformly, and assign a weight to each resulting interval:
first, divide [0, 1] non-uniformly or uniformly into N intervals, where N is an integer between 4 and 8, namely $[K_0 = 0, K_1), [K_1, K_2), \ldots, [K_{k-1}, K_k), \ldots, [K_{N-1}, K_N = 1]$; then assign a weight $W_k$ to each interval $[K_{k-1}, K_k)$, where $W_k$ ranges from 0 to 10 and $k = 1, 2, \ldots, N$;
3) determine the weight $Q_i$ corresponding to each normalized difference $\beta_i$ between image feature components:
if the normalized difference satisfies $\beta_i \in [K_{k-1}, K_k)$, the weight $Q_i$ corresponding to $\beta_i$ is the weight assigned to the interval $[K_{k-1}, K_k)$ in step 2), i.e. $Q_i = W_k$, where $i = 0, 1, \ldots, L-1$ and $k = 1, 2, \ldots, N$;
4) sum all the weights obtained in step 3) and take the average to obtain the similarity measurement value $S_{A,B}$ between feature vectors A and B:
$$S_{A,B} = \frac{1}{L} \sum_{i=0}^{L-1} Q_i.$$
CN2008102229984A 2008-09-26 2008-09-26 Method for measuring similarity based on interval weights Expired - Fee Related CN101354728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102229984A CN101354728B (en) 2008-09-26 2008-09-26 Method for measuring similarity based on interval weights

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102229984A CN101354728B (en) 2008-09-26 2008-09-26 Method for measuring similarity based on interval weights

Publications (2)

Publication Number Publication Date
CN101354728A CN101354728A (en) 2009-01-28
CN101354728B true CN101354728B (en) 2010-06-09

Family

ID=40307535

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102229984A Expired - Fee Related CN101354728B (en) 2008-09-26 2008-09-26 Method for measuring similarity based on interval weights

Country Status (1)

Country Link
CN (1) CN101354728B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622366B (en) * 2011-01-28 2014-07-30 阿里巴巴集团控股有限公司 Similar picture identification method and similar picture identification device
JP5500194B2 (en) * 2012-03-22 2014-05-21 日本電気株式会社 Captured image processing apparatus and captured image processing method
CN104571468B (en) * 2013-10-11 2017-11-03 中国移动通信集团广东有限公司 A kind of method and apparatus for handling digital picture feature
CN105139012A (en) * 2015-08-25 2015-12-09 长沙市麓智信息科技有限公司 Appearance patent retrieving system based on image pre-processing
CN106355142A (en) * 2016-08-24 2017-01-25 深圳先进技术研究院 A Method and Device for Recognizing Human Falling State
CN108241868B (en) * 2016-12-26 2021-02-02 浙江宇视科技有限公司 Method and device for mapping objective similarity to subjective similarity of image
CN107103620B (en) * 2017-04-17 2020-01-07 北京航空航天大学 Depth extraction method of multi-optical coding camera based on spatial sampling under independent camera view angle
CN110083743B (en) * 2019-03-28 2021-11-16 哈尔滨工业大学(深圳) Rapid similar data detection method based on unified sampling
CN111738194B (en) * 2020-06-29 2024-02-02 深圳力维智联技术有限公司 Method and device for evaluating similarity of face images

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1595427A (en) * 2004-07-05 2005-03-16 南京大学 Digital human face image recognition method based on selective multi-eigen space integration
CN101211341A (en) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Image intelligent mode recognition and searching method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
黄春木, 周利莉. Density distribution feature and its application in binary image retrieval (密度分布特征及其在二值图像检索中的应用). 中国图象图形学报 (Journal of Image and Graphics), 2008, 13(2): 307-311. *

Also Published As

Publication number Publication date
CN101354728A (en) 2009-01-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100609

Termination date: 20140926

EXPY Termination of patent right or utility model