CN1920822A - Interactive calligraphic character K approaching search method - Google Patents

Publication number
CN1920822A
Authority
CN
China
Prior art keywords
word
inquiry
nearest neighbors
index
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610053409
Other languages
Chinese (zh)
Other versions
CN100401304C (en)
Inventor
庄越挺
吴飞
庄毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CNB2006100534095A
Publication of CN1920822A
Application granted
Publication of CN100401304C
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an interactive k-nearest-neighbor query method for calligraphic characters. It supports interactive, semantics-based indexing and retrieval of calligraphic characters, in which the user can adjust the index to improve query accuracy. The method first computes, under a given threshold, the distance between each pair of calligraphic characters in the character library to generate a partial distance map, and builds a B+-tree index over that map. When the user submits a sample calligraphic character, the system retrieves the characters similar to it; through dynamic feedback, the user then selects the characters that share the sample's semantics. The invention can therefore adjust the partial distance map dynamically according to the user's feedback, excluding irrelevant characters and maintaining high query accuracy.

Description

Interactive calligraphic character K approaching search method
Technical field
The present invention relates to the fields of databases and multimedia, and in particular to an interactive k-nearest-neighbor query method for calligraphic characters.
Background technology
Chinese calligraphy works from successive dynasties embody traditional Chinese aesthetic sentiment, philosophical thought, and cultural psychology, and are treasures of traditional Chinese culture. They are usually preserved on stone, bone, metal, bamboo, paper, and similar media, which are inconvenient to carry and easily damaged; this hinders resource sharing and the reuse of cultural resources. Digitizing, describing, and managing these calligraphy works, and offering retrieval of Chinese calligraphic characters to users through digital library portals, makes the resources shareable. It helps art lovers appreciate the artistic beauty of different dynasties, authors, and styles and study how a calligrapher's style changed over the years, and it helps history enthusiasts study history and historical culture, bringing art and history back to life. Promoting Chinese civilization and history in this way, and making Chinese calligraphy easier to learn and appreciate, has great social and academic significance as well as substantial social and economic benefit.
Thousands of years of change in China have given the same Chinese character many different calligraphic forms. Calligraphic characters have the following characteristics:
(1) Stroke distortion: horizontal strokes are not level, vertical strokes are not straight, and corners become arcs. Sometimes strokes are even twisted deliberately for aesthetic effect, as in "withered" characters.
(2) Complexity: styles differ; strokes that should be connected are left unconnected, while strokes that should not be connected are joined. According to statistics, a Chinese character has 12.71 strokes on average [1]; the size of each stroke depends on the total number of strokes in the character, and the size of each segment depends on the total number of segments in the stroke.
(3) Ambiguity: because the works have weathered many historical vicissitudes and natural influences, some strokes may be blurred.
In essence, calligraphic characters are a form of handwriting. Handwriting recognition has been studied extensively: reference [3] surveys the mainstream techniques for online and offline handwriting recognition, and several fairly successful studies have appeared, such as the recognition of George Washington's manuscripts in [4] and the classification of Hebrew calligraphic handwriting styles in [5]. However, few publications address the retrieval and indexing of Chinese calligraphic characters. In [2], Shi Bole et al. proposed a content-based retrieval method for ancient books that computes the centroids of the Chinese characters in a multi-level manner and successfully retrieves neatly written characters; for calligraphy works that are irregularly written and come from different dynasties, however, the method is not effective.
High-dimensional indexing has been studied for some 20 years [7], and the techniques fall into three main classes. The first class consists of tree indexes based on data and space partitioning, such as the R-tree [8] and its variants [9, 10]. These tree indexes suit only low dimensionality: as the dimension grows, their performance often falls below that of a sequential scan, because the overlap of query regions grows quickly and query speed drops sharply, producing the "curse of dimensionality". The second class represents the original vectors approximately, as in the VA-file [11] and the IQ-tree [12]; the basic idea is to speed up the sequential scan by compressing the high-dimensional points and storing approximations. However, the information lost through compression and quantization leaves the precision after the first filtering step unsatisfactory, and although the number of disk I/Os is reduced, decoding the bit strings and computing the upper and lower bounds of the distance to the query point incur a high CPU cost. The last class converts high-dimensional data into one-dimensional data for querying, and includes the NB-Tree [13] and iDistance [14]. The NB-Tree maps each high-dimensional point to one dimension by computing its distance to the origin O(0, 0, ..., 0) and then indexes these distance values with a B+-tree, turning a high-dimensional query into a one-dimensional range query. Although it returns results quickly, it cannot reduce the search space effectively, and its range-query efficiency deteriorates rapidly when the dimension is very high. The NB-Tree uses a single reference point, whereas iDistance uses multiple reference points; by combining multiple reference points with clustering, iDistance effectively narrows the search range in the high-dimensional space and improves query precision, but its query efficiency depends heavily on the choice of reference points and on data clustering and partitioning. In addition, iDistance inevitably loses information when mapping high-dimensional data to one-dimensional distances, so its query precision is not ideal; in the worst case the search space covers almost the entire high-dimensional space.
[1] Wu Youshou, Ding Xiaoqing. Chinese Character Recognition: Principles, Methods and Implementation. Beijing: Higher Education Press, 1992.
[2] Shi Bole, Zhang Liang, Wang Yong, Chen Zhifeng. A computer-based content retrieval method for ancient books based on visual similarity. Journal of Software, 12(9), 2001, pp. 1336-1342.
[3] R. Plamondon and S. N. Srihari. On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000, pp. 63-84.
[4] T. M. Rath, S. Kane, A. Lehman, E. Partridge and R. Manmatha. Indexing for a Digital Library of George Washington's Manuscripts: A Study of Word Matching Techniques. Technical Report, Center for Intelligent Information Retrieval, University of Massachusetts, 2002.
[5] Itay Bar Yosef, Klara Kedem, et al. Classification of Hebrew Calligraphic Handwriting Styles: Preliminary Results. In Proc. of the First International Workshop on Document Image Analysis for Libraries (DIAL'04), Palo Alto, California, 2004, pp. 299-305.
[6] Yueting Zhuang, Xiafeng Zhang, et al. Retrieval of Chinese Calligraphic Character Image. In Proc. of PCM 2004.
[7] Christian Böhm, Stefan Berchtold, Daniel Keim. Searching in High-dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys 33(3), 2001.
[8] A. Guttman. R-tree: A dynamic index structure for spatial searching. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1984, pp. 47-54.
[9] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1990, pp. 322-331.
[10] S. Berchtold, D. A. Keim and H. P. Kriegel. The X-tree: An index structure for high-dimensional data. In Proc. 22th Int. Conf. on Very Large Data Bases, 1996, pp. 28-37.
[11] R. Weber, H. Schek and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. 24th Int. Conf. on Very Large Data Bases, 1998, pp. 194-205.
[12] S. Berchtold, C. Böhm, H. P. Kriegel, J. Sander and H. V. Jagadish. Independent quantization: An index compression technique for high-dimensional data spaces. In Proc. 16th Int. Conf. on Data Engineering, 2000, pp. 577-588.
[13] M. J. Fonseca and J. A. Jorge. NB-Tree: An Indexing Structure for Content-Based Retrieval in Large Databases. In Proc. of the 8th International Conference on Database Systems for Advanced Applications, Kyoto, Japan, Mar 2003, pp. 267-274.
[14] H. V. Jagadish, B. C. Ooi, K. L. Tan, C. Yu, R. Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Data Base Systems, 30, 2, 364-397, June 2005.
Summary of the invention
The object of the present invention is to improve query precision and the performance of k-nearest-neighbor queries by providing an interactive k-nearest-neighbor query method for calligraphic characters.
The technical solution adopted by the present invention is:
1) first, every character in the high-dimensional space is treated as a reference character; under its corresponding virtual distance threshold, the candidate characters similar to it are computed, and looping over all characters generates a partial distance map, over which a B+-tree index is built; afterwards, on each user query, the partial distance map is adjusted dynamically through relevance feedback;
2) the nearest-neighbor character V_p of the query character V_q is found by hypersphere-center relocation based on hierarchical clustering and uniform start distance;
3) the pseudo k-nearest-neighbor query (Pk-NN) is completed using the nearest-neighbor character V_p and relevance feedback, and the query result is returned.
The hypersphere-center relocation based on hierarchical clustering and uniform start distance (USD) works as follows: the calligraphic characters are hierarchically clustered into T classes, and each character after clustering can be expressed as:
Word(V_i) ::= <number (i), class number (CID)>   (3)
The USD of the character is then combined with the number of the class the character belongs to, giving the character's index key, as in formula (4):
$$\mathrm{key}(V_i) = \mathrm{CID} + \frac{\mathrm{USD}(V_i)}{\mathrm{MAX\_USD}} \tag{4}$$
where CID is the number of the class to which character V_i belongs and MAX_USD is a constant set large enough that the maximum query range for each character is [CID, CID+1]. Finally, a B+-tree index is built over the n keys;
For a query character V_q, let ε be the minimum radius needed to find its nearest-neighbor character; this value is estimated from the statistical distribution of the nearest-neighbor distance Δ of each calligraphic character in the character library. When the user submits a query character V_q, the method first loops T times, with ε as the radius, to determine which class hyperspheres intersect the query hypersphere Θ(V_q, ε); within those class hyperspheres it then finds the nearest-neighbor character V_p of the query character as the candidate new hypersphere center. Similarly, when two hyperspheres intersect, the character nearest to V_q is obtained first and then compared with the candidate nearest neighbor from the previous iteration to find the character closest to V_q. Finally, when two hyperspheres are disjoint, the loop continues, and the nearest-neighbor character V_p of V_q is ultimately obtained;
The pseudo k-nearest-neighbor query (Pk-NN) is completed using the nearest-neighbor character V_p and relevance feedback: because relevance feedback is introduced, when k is large the pseudo k-NN query for V_q returns fewer than k nearest-neighbor characters, and when k is small it returns k nearest-neighbor characters.
Beneficial effects of the present invention: the retrieval efficiency for calligraphic characters is markedly improved, while the query precision of the index continues to improve with the user's relevance feedback, so that users can quickly obtain calligraphic characters with the same semantics.
Description of drawings
Fig. 1 is a schematic diagram of the architecture of the interactive calligraphic character k-nearest-neighbor query system;
Fig. 2 is a flow diagram of the interactive calligraphic character k-nearest-neighbor query method;
Fig. 3(a) is a schematic diagram of the virtual query radius for the character "it" when VDT(V_p) ≥ Δ + r;
Fig. 3(b) is a schematic diagram of the virtual query radius for the character "it" when VDT(V_p) < Δ + r;
Fig. 4 is an example of the Gaussian distribution of Δ;
Fig. 5 is a schematic diagram of hypersphere-center relocation;
Fig. 6 is a schematic diagram of the approximate minimum enclosing hypersphere;
Fig. 7 is a retrieval example without feedback;
Fig. 8 is a retrieval example with feedback.
Specific implementation method
The interactive calligraphic character k-nearest-neighbor query method is implemented in the following steps:
(1) Partial distance map index:
To support efficient and accurate content-based similarity queries over calligraphic characters, an interactive high-dimensional index structure tailored to calligraphic character retrieval is proposed: the partial distance map (PDM). By incorporating the user's relevance feedback, it narrows the search space more effectively, improving query efficiency while maintaining high precision. The basic idea of the PDM index is that a query for a character V_q is completed through its nearest-neighbor (most similar) character V_p and the pregenerated partial distance map.
Observation of calligraphic character retrieval results shows that, for any given calligraphic character V_i, characters whose distance to it is less than 150 (defined below as MAX_VDT) are all quite likely to be similar to it; conversely, two characters whose distance exceeds 150 cannot be similar. It therefore suffices to consider only the characters within distance 150 of a character as candidates for index keys. Moreover, for a character V_i, the distance to the farthest character that is still similar to it (defined below as VDT(V_i)) may differ from character to character and can be set through the user's relevance feedback. In the PDM, therefore, each calligraphic character serves in turn as a reference character, and the neighboring characters whose distance to it is below a certain threshold supply the index keys.
Definition 1 (virtual distance threshold). Given two calligraphic characters V_i and V_j, the virtual distance threshold of V_i, written VDT(V_i), is the distance between V_i and V_j, where V_j is the character designated through the user's relevance feedback as similar to V_i and farthest from it. Formally: VDT(V_i) = d(V_i, V_j), where V_j is the character farthest from V_i among those similar to V_i, and V_i, V_j ∈ Ω.
For example, as shown in Fig. 3, consider a given query calligraphic character V_q (the symbol relating V_q to Ω is missing in the source text), and let V_p be the calligraphic character nearest to V_q. There must exist a character V_r that has the same semantics as V_p and is farthest from it, so the distance between V_r and V_p is denoted VDT(V_p). Different calligraphic characters V_i have different VDTs. The virtual distance threshold table (VDTT) records the VDT of each character; updating the VDTT and revising the PDM through the user's relevance feedback keeps precision consistently high.
Definition 2 (partial distance map). The partial distance map (written PDM) is represented as an adjacency list, in which each entry d_ij ∈ PDM denotes the distance between the i-th character and its j-th neighboring character.
The virtual distance threshold table (written VDTT) is the sequence recording the VDT of each character, expressed as VDTT = <<1, VDT(V_1)>, <2, VDT(V_2)>, ..., <n, VDT(V_n)>>, where VDT(V_i) is the virtual distance threshold of the i-th character.
Definition 3 (maximum virtual distance threshold). The maximum virtual distance threshold (written MAX_VDT) is the initial virtual distance threshold of every character and is no smaller than any character's own VDT, i.e. MAX_VDT ≥ max{VDT(V_1), VDT(V_2), ..., VDT(V_n)}.
For calligraphic characters, MAX_VDT is set empirically to 150, meaning that the initial VDT value of every character in the VDTT is 150. The VDT of each character is then adjusted step by step according to the user's relevance feedback. The incremental maintenance of the VDTT, a continuous and dynamic process, is given below. After each pseudo k-nearest-neighbor query by the user (written PkNNQuery), the VDTT is updated dynamically from the relevance feedback, in two cases. Note that for VDTT updates, MIN_K, the user-set minimum count of all characters in the library that are similar to the query character, is generally set to 40. Only when k is greater than MIN_K can the returned candidates include every character in the library similar to V_q without omission (that is, recall is 100%); otherwise the user is not permitted to give relevance feedback. In addition, the nearest neighbor V_p of V_q is obtained in step 2 of the algorithm by hypersphere-center relocation of V_q. flag[V_p] = TRUE indicates that V_p has already received relevance feedback.
Input: VDTT, PDM index RI, V_q
Output: the updated VDTT and PDM index
(1) enter loop
(2)   S ← PkNNQuery(V_q, k);
(3)   if k > MIN_K and flag[V_p] = FALSE then
(4)     through the user's relevance feedback, obtain the character V_r that is similar to V_p and farthest from it;
(5)     compute the distance between V_p and V_r and update the VDTT;
(6)   else if k < MIN_K and flag[V_p] = TRUE then
(7)     through the user's relevance feedback, obtain the character V_r that is similar to V_p and farthest from it;
(8)     if VDT(V_p) < d(V_p, V_r) then
(9)       add V_r to the PDM index and update the VDTT;
(10)  return the updated VDTT and PDM index;
(11) end loop;
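The two update cases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the VDTT and PDM are held in plain dictionaries, `similar_ids` stands for the characters the user marked as similar during feedback, `dist` is any distance function, and setting `flag[p]` after the first update is an assumption (the listing does not say when the flag changes).

```python
MAX_VDT = 150.0  # initial threshold for every character, per the text
MIN_K = 40       # typical MIN_K value given in the text


def update_vdtt(vdtt, pdm, flag, p, k, similar_ids, dist):
    """Apply one round of relevance feedback for the nearest neighbor p.

    vdtt: dict id -> VDT value; pdm: dict id -> {neighbor id: distance};
    flag: dict id -> bool, True once p has received feedback;
    similar_ids: ids the user marked as similar to p (hypothetical input);
    dist: callable (i, j) -> distance between characters i and j.
    """
    if not similar_ids:
        return
    # The similar character farthest from p (V_r in the listing).
    r = max(similar_ids, key=lambda j: dist(p, j))
    d = dist(p, r)
    if k > MIN_K and not flag.get(p, False):
        # Case 1: first feedback with a large k, so the VDT is set directly.
        vdtt[p] = d
        flag[p] = True  # assumption: mark p as having received feedback
    elif k < MIN_K and flag.get(p, False):
        # Case 2: later feedback may only widen the threshold.
        if vdtt[p] < d:
            pdm.setdefault(p, {})[r] = d
            vdtt[p] = d
```

For instance, starting from `vdtt = {0: MAX_VDT}`, a first feedback round with k = 50 replaces the initial 150 with the distance to the farthest similar character, and a later round with k = 10 raises it only if a more distant similar character is reported.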
Unlike other distance-based indexing methods, in the partial distance map every character in the high-dimensional space is treated as a reference character, and the similarity distance of each candidate within its radius is computed with the character's own VDT as the upper limit of the radius (distance). In this way, the n characters of the high-dimensional space are converted into O(n × k) distance values in one dimension, where k << n. To query these distance values quickly, an efficient index must be built over them. Because the similarity distance between any two calligraphic characters is far greater than 1, the distances must also be normalized so that the distance between any two characters after processing is less than or equal to 1. The index key for a calligraphic character V_i can then be expressed as:
$$\mathrm{key}(V_i) = i + \frac{d(V_i, V_j)}{\mathrm{MAX\_VDT}} \tag{5}$$
These one-dimensional keys are indexed with a B+-tree. From formula (5) it can be seen that the maximum query range for a single character is [i, i+1]. The generation algorithm for the PDM index follows; it comprises the initialization of the VDTT and the PDM index (lines 1-3) and the construction of the index over the PDM (lines 4-6), where the function TransValue() denotes the conversion of distance values.
Input: calligraphic character library Ω
Output: PDM index RI
(1) for each character V_i in the calligraphic character library Ω
(2)   initialize the VDTT;
(3) create the B+-tree index RI;
(4) in a double loop, if d(V_i, V_j) is less than VDT(V_i) then
(5)   convert the distance value d(V_i, V_j) to a key and insert it into the B+-tree;
(6) return the PDM index RI;
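A minimal Python sketch of this construction is given below. It is illustrative only: a sorted list searched with `bisect` stands in for the B+-tree, characters are identified by their position in the input list, and the names `pdm_key`, `build_pdm_index`, and `range_query` are hypothetical. The key follows the form i + d(V_i, V_j) / MAX_VDT described above.

```python
import bisect

MAX_VDT = 150.0  # maximum virtual distance threshold from the text


def pdm_key(i, d):
    """One-dimensional key for a neighbor at distance d from reference i."""
    return i + d / MAX_VDT


def build_pdm_index(chars, dist, vdtt=None):
    """Return a sorted list of (key, neighbor id) pairs (B+-tree stand-in)."""
    n = len(chars)
    vdtt = vdtt or {i: MAX_VDT for i in range(n)}  # initialize the VDTT
    entries = []
    for i in range(n):          # every character acts as a reference
        for j in range(n):
            if i == j:
                continue
            d = dist(chars[i], chars[j])
            if d < vdtt[i]:     # keep only neighbors under the threshold
                entries.append((pdm_key(i, d), j))
    entries.sort()
    return entries


def range_query(index, i, r):
    """Ids of indexed neighbors of character i within distance r."""
    lo = bisect.bisect_left(index, (pdm_key(i, 0.0), -1))
    hi = bisect.bisect_right(index, (pdm_key(i, r), float("inf")))
    return [j for _, j in index[lo:hi]]
```

Because every stored distance is below MAX_VDT, all keys for reference character i fall in [i, i+1), so a range query for one character never touches the entries of another.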
(2) Hypersphere-center relocation based on clustering and uniform start distance:
Hypersphere-center relocation finds the character V_p nearest to the query character V_q. The present invention accelerates the nearest-neighbor (1-NN) query with an indexing method based on clustering and uniform start distance: the calligraphic characters are hierarchically clustered in advance into T classes, and each character after clustering can be expressed as:
Word(V_i) ::= <number (i), class number (CID)>   (6)
The uniform start distance of the character is then combined with the number of the class the character belongs to, giving its index key, as in formula (7):
$$\mathrm{key}(V_i) = \mathrm{CID} + \frac{\mathrm{USD}(V_i)}{\mathrm{MAX\_USD}} \tag{7}$$
where CID is the number of the class to which character V_i belongs and MAX_USD is a constant set large enough that the maximum query range for each character is [CID, CID+1]. Finally, a B+-tree index is built over the n keys.
For a query character V_q, let ε be the minimum radius needed to find its nearest-neighbor character. This value can be estimated from the statistical distribution of the nearest-neighbor distance Δ of each calligraphic character in the character library. As shown in Fig. 4, the frequencies with which the Δ values of the characters fall into different ranges follow a Gaussian distribution (the red line shows the Gaussian fit), so the maximum-likelihood estimate of the corresponding σ can be obtained. By the "3σ rule", a normally distributed random variable X satisfies P(μ − 3σ < X ≤ μ + 3σ) = 0.9974; that is, when the range of X is 3σ, the probability of capturing the nearest-neighbor character is 99.74%, which is close to 100%. Therefore ε = 3σ.
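The estimate of the radius can be sketched as follows, assuming the nearest-neighbor distances Δ have already been collected into a list. `statistics.pstdev` computes the population standard deviation, which is the maximum-likelihood estimate of σ for a Gaussian; the function name `estimate_epsilon` is hypothetical.

```python
import statistics


def estimate_epsilon(nn_distances):
    """Return the search radius 3 * sigma from nearest-neighbor distances."""
    # Population standard deviation = maximum-likelihood sigma for a Gaussian.
    sigma = statistics.pstdev(nn_distances)
    return 3.0 * sigma
```

For the sample `[2, 4, 4, 4, 5, 5, 7, 9]` the mean is 5 and the population variance is 4, so σ = 2 and the estimated radius is 6.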
After the user submits a query character V_q, as shown in Fig. 5, the algorithm first loops T times (T being the number of clusters), using ε as the radius, to determine the positional relationship between the query hypersphere Θ(V_q, ε) and each class hypersphere (line 2). When a class hypersphere contains Θ(V_q, ε) (line 3), a sub-range query is run through the index, the distance to V_q is computed for each candidate returned, and the character V_p with the smallest distance is taken as the candidate new hypersphere center (line 4), after which the loop exits (line 5). Similarly, when the two hyperspheres intersect (line 6), the character nearest to V_q is obtained first (line 7) and then compared with the candidate nearest neighbor from the previous iteration (line 8); because intersection with the other class hyperspheres must still be checked, the loop does not end here. Finally, when the two hyperspheres are disjoint (line 9), the loop continues (line 10). The hypersphere-center relocation algorithm is as follows:
Input: calligraphic character library Ω and query example character V_q
Output: the nearest-neighbor character V_p of V_q
(1) initialize;
(2) for each class hypersphere Θ(O_j, CR_j)
(3)   if Θ(O_j, CR_j) contains Θ(V_q, ε) then
(4)     in the j-th class hypersphere, return the character V_p nearest to V_q and exit the loop;
(5)   if Θ(O_j, CR_j) intersects Θ(V_q, ε) then
(7)     in the j-th class hypersphere, return the character V_p nearest to V_q;
(8)     compare it with the candidate nearest neighbor obtained previously and return the nearer one;
(9)   otherwise continue the loop until it ends;
(10) return the nearest-neighbor character V_p
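The containment, intersection, and disjointness tests can be sketched in Python as below. This is a simplified illustration, not the patented procedure: each cluster is assumed to be a `(center, radius, members)` triple, and the members of a qualifying cluster are scanned by brute force instead of through the B+-tree sub-range query described in the text.

```python
import math


def relocate_center(q, eps, clusters, dist):
    """Return the member nearest to q among clusters touching the query sphere.

    clusters: list of (center, radius, members) triples (assumed layout);
    dist: callable (a, b) -> distance. Returns None if no cluster qualifies.
    """
    best, best_d = None, math.inf
    for center, radius, members in clusters:
        d_centers = dist(q, center)
        if d_centers > radius + eps:
            continue  # hyperspheres are disjoint: skip this class
        for m in members:  # brute-force scan replaces the index lookup
            d = dist(q, m)
            if d < best_d:
                best, best_d = m, d
        if d_centers + eps <= radius:
            break  # class hypersphere contains the query hypersphere
    return best
```

With one-dimensional points, clusters `[(0.0, 5.0, [0.0, 3.0, 4.5]), (20.0, 5.0, [18.0, 22.0])]`, query 4.0 and ε = 2.0, only the first cluster intersects the query sphere and the function returns 4.5.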
(3) Pseudo k-nearest-neighbor query algorithm
Given the characteristics of the PDM-based calligraphic character index, the present invention proposes an improved k-NN query: the pseudo k-nearest-neighbor query (written Pk-NN). Because relevance feedback is introduced, when k is large the approximate k-NN query for V_q is not guaranteed to return k nearest-neighbor characters: the number of characters in the calligraphy library that are similar to V_q is limited and may be smaller than the k set by the user, hence the name pseudo k-NN query. Note that without relevance feedback, the Pk-NN query reduces to an ordinary k-NN query.
The search range of the PDM-based pseudo k-NN query is shown by the dashed circle in Fig. 6 (the approximate minimum enclosing hypersphere), where the shaded area represents the true query range. The query has two stages, as shown in Fig. 2: first the nearest neighbor V_p of the query character V_q is found by hypersphere-center relocation, and then the pseudo k-NN query based on V_p is performed, which in essence obtains the k nearest calligraphic characters by repeatedly invoking a range query. The steps are as follows. Given V_q and k, the nearest neighbor V_p is found by hypersphere-center relocation of V_q (line 1); initialization follows, and the distance between V_q and V_p is computed (line 2). The algorithm then enters a loop that starts the range query with a small radius (lines 4-5). When the number of candidates obtained exceeds k, a loop (line 8) finds the (‖S‖ − k − 1) characters in the candidate set S that are farthest from V_q and deletes them (lines 6-7), which yields the k nearest characters, and the while loop is exited (line 9). Otherwise, when the query radius r exceeds the virtual query radius of V_p, the query stops (line 10). In this case the number of candidates returned may be less than k.
Input: query character V_q, k
Output: query result S
(1) obtain V_p by hypersphere-center relocation of V_q;
(2) initialize;
(3) while the number of candidates ‖S‖ is not greater than k and bStop = FALSE, continue the loop;
(4)   increase the radius r;
(5)   run a range query of radius r on V_p to obtain the query result S;
(6)   if the number of candidates returned ‖S‖ is greater than k then
(7)     delete from the candidates the ‖S‖ − k − 1 characters farthest from V_q and exit the loop;
(8)   else if r > VQR(V_p) then
(9)     there are not k characters similar to the query character; exit the loop;
(10) end loop and return the result S;
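The expanding-radius loop can be sketched in Python as follows. Several details are assumptions made for illustration: the PDM row of V_p is passed in as a list of `(distance from V_p, id)` pairs, `vqr` stands for the virtual query radius VQR(V_p), `dist_to_q` returns a candidate's distance to V_q, the radius grows by a fixed `step`, and the sketch keeps exactly the k candidates nearest to V_q when trimming.

```python
def pknn_query(k, neighbors_of_p, vqr, dist_to_q, step=10.0):
    """Pseudo k-NN: grow the radius around V_p until more than k hits or r > vqr.

    neighbors_of_p: list of (distance from V_p, id) pairs (assumed PDM row);
    vqr: virtual query radius of V_p; dist_to_q: callable id -> distance to V_q.
    """
    r, result = 0.0, []
    while len(result) <= k:
        r += step  # enlarge the query radius
        # Range query of radius r around V_p.
        result = [j for d, j in neighbors_of_p if d <= r]
        if len(result) > k:
            # Too many candidates: keep the k nearest to the query character.
            result = sorted(result, key=dist_to_q)[:k]
            break
        if r > vqr:
            break  # fewer than k similar characters exist
    return result
```

With four neighbors at distances 5, 12, 25, and 40 from V_p, a call with k = 2 stops once the radius reaches 30 and returns the two candidates nearest to V_q, while a call with k = 10 and `vqr=30.0` stops after the radius passes 30 and returns only the four characters found.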
As shown in Fig. 7, when the user submits the character "day", the candidates similar in shape to this character are retrieved from the calligraphic character library through the PDM index; the user can then use relevance feedback to judge which of these candidates have the same semantics as "day" and which do not. In this way the partial distance map is updated dynamically, so that the retrieval system maintains high precision. Fig. 7 shows the result of interactive calligraphic character retrieval without feedback.
Similarly, as shown in Fig. 8, when the user submits the character "topic", the candidates similar to "topic" are retrieved through the PDM index. In this way the partial distance map is updated dynamically, so that the retrieval system maintains high precision. Fig. 8 shows the result of interactive calligraphic character retrieval with feedback.

Claims (3)

1. interactive calligraphic character K approaching search method is characterized in that:
1) at first reference word all is used as in each word in the higher dimensional space, under corresponding pseudo range fault value condition, calculates and obtain the candidate similar respectively, by circulation, generate a local distance figure, and this figure is set up the index of setting based on B+ to this word; By each inquiry of user, dynamically adjust this local distance figure afterwards by relevant feedback;
2) employing is found inquiry V based on the hypersphere heart reorientation of hierarchical clustering and unitized start distance qArest neighbors word V p
3) by arest neighbors word V pFinish pseudo-k neighbour with relevant feedback and inquire about Pk-NN, return Query Result.
2. a kind of interactive calligraphic character K approaching search method according to claim 1, it is characterized in that, described employing is based on the hypersphere heart reorientation of hierarchical clustering and unitized start distance USD: by writing brush word is carried out hierarchical clustering, it is gathered into T class, and each word after the cluster can be expressed as:
Word (V i): :=<numbering (i), the numbering of affiliated class (CID)〉(1)
Then that it is corresponding USD combines the index key assignments that obtains this word with the numbering of this word place class, as the formula (2):
key ( V i ) = CID + USD ( V i ) MAX _ USD . . . ( 2 )
Wherein CID represents word V qThe numbering of affiliated class, MAX_USD is a constant, is provided with to be wide enough so that the maximum query context of each word is [CID, CID+1], at last n key assignments is set up based on B+ tree index;
For inquiry word V q, to find the needed least radius value of its arest neighbors word be ε in order, the size of this value is estimated to obtain by the statistical distribution situation to the nearest neighbor distance Δ of each writing brush word in the calligraphy character library; When the user submits an inquiry word V to q, at first be that radius passes through T cycle calculations judgement and inquiry hypersphere Θ (V with ε q, ε) crossing class hypersphere; In these class hyperspheres, try to achieve the arest neighbors word V of inquiry word then pThe new hypersphere heart as the candidate; In like manner, when two hyperspheres intersect, obtain earlier and V qThe word of arest neighbors is made comparisons with candidate's arest neighbors word that the last time circulation obtains then, tries to achieve apart from V qNearest word; At last, when two hyperspheres are all non-intersect, continue circulation, finally obtain word V qArest neighbors word V p
3. The interactive calligraphic character K approaching search method according to claim 1, characterized in that said completing the pseudo-k-nearest-neighbor query (Pk-NN) using the nearest-neighbor character V_p and relevance feedback comprises: introducing relevance feedback, such that when k is large, the pseudo-k-NN query of V_q returns fewer than k nearest-neighbor characters, and when k is small, the pseudo-k-NN query of V_q returns k nearest-neighbor characters.
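The index key of formula (2) and the nearest-neighbor loop of claim 2 can be illustrated with a minimal Python sketch. Several things here are assumptions and do not come from the patent text: characters are stood in for by numeric feature tuples compared with Euclidean distance, USD(V_i) is taken to be the distance from V_i to the center of its class, a sorted list searched with `bisect` stands in for the B+ tree, and the function names (`build_index`, `class_radii`, `nearest_neighbor`) and the value of `MAX_USD` are hypothetical.

```python
import bisect
import math

MAX_USD = 100.0  # assumed constant; must exceed every start distance


def build_index(points, labels, centers):
    """Compute key(V_i) = CID + USD(V_i) / MAX_USD and sort by key.

    A sorted key list stands in for the B+ tree (assumption).
    USD is assumed to be the distance to the class center.
    """
    entries = []
    for i, (p, cid) in enumerate(zip(points, labels)):
        usd = math.dist(p, centers[cid])
        if usd >= MAX_USD:
            raise ValueError("MAX_USD is too small for this data")
        entries.append((cid + usd / MAX_USD, i))
    entries.sort()
    keys = [k for k, _ in entries]
    ids = [i for _, i in entries]
    return keys, ids


def class_radii(points, labels, centers):
    """Radius of each class hypersphere: farthest member from its center."""
    radii = [0.0] * len(centers)
    for p, cid in zip(points, labels):
        radii[cid] = max(radii[cid], math.dist(p, centers[cid]))
    return radii


def nearest_neighbor(q, eps, points, centers, radii, keys, ids):
    """Loop over the T classes and keep the closest character within eps."""
    best_id, best_d = None, float("inf")
    for cid, (c, r) in enumerate(zip(centers, radii)):
        d_qc = math.dist(q, c)
        if d_qc - r > eps:
            continue  # query hypersphere misses this class hypersphere
        # Triangle inequality bounds the start distance of any match.
        lo = cid + max(0.0, d_qc - eps) / MAX_USD
        hi = cid + min(r, d_qc + eps) / MAX_USD
        start = bisect.bisect_left(keys, lo)
        stop = bisect.bisect_right(keys, hi)
        for i in ids[start:stop]:
            d = math.dist(q, points[i])
            if d <= eps and d < best_d:  # compare with previous candidate
                best_id, best_d = i, d
    return best_id, best_d
```

Because every start distance is below `MAX_USD`, the keys of class CID fall inside [CID, CID+1), so one range scan per intersecting class is enough. A real implementation would use the patent's own character distance and a disk-based B+ tree rather than an in-memory list.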
CNB2006100534095A 2006-09-14 2006-09-14 Interactive calligraphic character K approaching search method Expired - Fee Related CN100401304C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100534095A CN100401304C (en) 2006-09-14 2006-09-14 Interactive calligraphic character K approaching search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100534095A CN100401304C (en) 2006-09-14 2006-09-14 Interactive calligraphic character K approaching search method

Publications (2)

Publication Number Publication Date
CN1920822A true CN1920822A (en) 2007-02-28
CN100401304C CN100401304C (en) 2008-07-09

Family

ID=37778548

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100534095A Expired - Fee Related CN100401304C (en) 2006-09-14 2006-09-14 Interactive calligraphic character K approaching search method

Country Status (1)

Country Link
CN (1) CN100401304C (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101324904B (en) * 2008-07-04 2010-08-11 西安交通大学 High-dimension index structure technique of equipment failure cases based on distance measurement
CN108460137A (en) * 2018-03-09 2018-08-28 广西师范大学 A kind of range query data fragmentation optimization method based on merging deviation threshold
CN108460137B (en) * 2018-03-09 2021-07-20 广西师范大学 Range query data fragmentation optimization method based on combined deviation threshold
CN109446293A (en) * 2018-11-13 2019-03-08 嘉兴学院 A kind of parallel higher-dimension nearest Neighbor
CN109446293B (en) * 2018-11-13 2021-12-10 嘉兴学院 Parallel high-dimensional neighbor query method

Also Published As

Publication number Publication date
CN100401304C (en) 2008-07-09

Similar Documents

Publication Publication Date Title
Fernando et al. Mining mid-level features for image classification
CN101055585A (en) System and method for clustering documents
CN1717685A (en) Information storage and retrieval
Cha et al. The GC-tree: a high-dimensional index structure for similarity search in image databases
CN101079033A (en) Integrative searching result sequencing system and method
CN1967536A (en) Region based multiple features Integration and multiple-stage feedback latent semantic image retrieval method
CN1503167A (en) Information storage and retrieval
CN1920818A (en) Transmedia search method based on multi-mode information convergence analysis
CN1858737A (en) Method and system for data searching
CN1920831A (en) Method and system for managing object information on network
CN1746891A (en) Information handling
Zhao et al. Approximate k-NN graph construction: a generic online approach
CN101030230A (en) Image searching method and system
CN106919658B (en) A kind of large-scale image words tree search method and system accelerated based on GPU
CN1920822A (en) Interactive calligraphic character K approaching search method
Mosbah et al. Distance selection based on relevance feedback in the context of CBIR using the SFS meta-heuristic with one round
Prasomphan Toward Fine-grained Image Retrieval with Adaptive Deep Learning for Cultural Heritage Image.
CN107391647B (en) Patent retrieval method and system for carrying out word embedding expansion under composite domain view angle
Mao et al. On optimizing distance-based similarity search for biological databases
Mohamed et al. Quantized ranking for permutation-based indexing
Mathkour et al. A comprehensive survey on genome sequence analysis
Riemenschneider et al. Image retrieval by shape-focused sketching of objects
Zheng et al. Tensor index for large scale image retrieval
CN1920821A (en) Calligraphic character search method based on data lattice
Lee et al. An efficient method of computing the k-dominant skyline efficiently by partition value

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20080709

Termination date: 20120914