CN1920822A

CN1920822A - Interactive calligraphic character K approaching search method

Info

Publication number: CN1920822A
Application number: CN 200610053409
Authority: CN
Inventors: 庄越挺; 吴飞; 庄毅
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2006-09-14
Filing date: 2006-09-14
Publication date: 2007-02-28
Anticipated expiration: 2026-09-14
Also published as: CN100401304C

Abstract

The invention relates to an interactive k-nearest-neighbor (k-NN) query method for calligraphic characters. It provides interactive, semantics-based indexing and retrieval of calligraphic characters, in which the user can adjust the index to improve query precision. The method first computes, under a given threshold, the distance between each pair of calligraphic characters in the character library to generate a local distance map, and builds a B+-tree-based index on that map. When the user submits a sample calligraphic character, the system retrieves the characters similar to it; the user then marks, through dynamic feedback, the characters that are semantically the same as the sample. The local distance map is adjusted dynamically according to this feedback, so that irrelevant characters are excluded and a high query precision is maintained.

Description

Interactive calligraphic character K approaching search method

Technical field

The present invention relates to the fields of databases and multimedia, and in particular to an interactive k-nearest-neighbor query method for calligraphic characters.

Background technology

The calligraphy works of China's successive dynasties embody traditional Chinese aesthetic sensibility, philosophical thought and cultural psychology, and are treasures of traditional Chinese culture. They are usually preserved on stone, bone, metal, bamboo, paper and similar media, which are inconvenient to carry and easily damaged; this hinders resource sharing and the reuse of these cultural resources. Digitizing, describing and managing these works, and offering users a retrieval service for Chinese calligraphic characters through digital library portals, makes the resources shareable. It helps art lovers appreciate the artistic beauty of different dynasties, authors and styles and study how a calligrapher's style changed over the years, and it helps history enthusiasts study history and historical culture, bringing art and history back to life. This is of great social and academic significance for promoting Chinese civilization and history, presenting China's cultural heritage, and making Chinese calligraphy easier to study and appreciate, and it brings considerable social and economic benefit.

Over thousands of years of evolution, the same Chinese character has come to be written in different calligraphic forms. Calligraphic characters have the following properties:

(1) Stroke deformation: horizontal strokes are not level, vertical strokes are not straight, and the corner of a turning stroke becomes an arc. Sometimes strokes are even twisted deliberately for aesthetic effect, as in "withered" characters.

(2) Complexity: depending on the style, strokes that should be connected are left unconnected, and strokes that should not be connected are joined. According to statistics, a Chinese character has 12.71 strokes on average [1]; the size of each stroke depends on the total number of strokes in the character, and the size of each segment depends on the total number of segments in the stroke.

(3) Blurriness: because the works have gone through historical vicissitudes and been affected by natural factors, some strokes may be blurred.

In essence, a calligraphic character is a kind of handwriting. Handwriting recognition has been studied extensively: document [3] reviews the mainstream techniques for online and offline handwriting recognition, and some fairly successful studies have appeared, such as word matching on George Washington's manuscripts [4] and the classification of Hebrew calligraphic handwriting styles [5]. However, few documents address the retrieval and indexing of Chinese calligraphic characters. In [2], Shi Baile et al. proposed a content retrieval method for ancient books that computes the centroids of Chinese characters at multiple levels and successfully retrieves Chinese characters written in a standard form in ancient books; for calligraphy works that are written non-standardly and come from different dynasties, however, this method is unlikely to be effective.

High-dimensional indexing has been studied for about twenty years [7]. The techniques fall mainly into three classes. The first class consists of tree indexes based on data and space partitioning, such as the R-tree [8] and its variants [9, 10]. These tree indexes suit only low-dimensional data: as the dimensionality grows, their performance often falls below that of a sequential scan, because the overlap of query regions grows quickly and query speed drops sharply, producing the "curse of dimensionality". The second class represents the original vectors approximately, as in the VA-file [11] and the IQ-tree [12]. Its basic idea is to speed up the sequential scan by compressing the high-dimensional points and storing approximations. However, the information lost through compression and quantization makes the precision after the first filtering step unsatisfactory, and although the number of disk I/Os is reduced, decoding the bit strings and computing upper and lower bounds on the distance to the query point incurs a high CPU cost. The last class converts high-dimensional data into one-dimensional data for querying, and includes the NB-Tree [13] and iDistance [14]. The NB-Tree maps each high-dimensional point to one dimension by computing its distance to the origin O(0, 0, ..., 0), then indexes these distance values with a B+ tree, so that a high-dimensional query becomes a one-dimensional range query. Although it returns results quickly, it cannot reduce the search space effectively, and range-query efficiency deteriorates rapidly when the dimensionality is very high. The NB-Tree uses a single reference point, whereas iDistance uses multiple reference points; by combining multiple reference points with clustering, iDistance effectively narrows the search range in the high-dimensional space and improves query precision, but its efficiency depends heavily on the choice of reference points and on the clustering and partitioning of the data. Moreover, because information is inevitably lost when high-dimensional data are mapped to one-dimensional distances, its query precision is not ideal. In the worst case, the search space can cover almost the whole high-dimensional space.

[1] Wu Youshou, Ding Xiaoqing. Chinese Character Recognition: Principles, Methods and Implementation. Beijing: Higher Education Press, 1992.

[2] Shi Baile, Zhang Liang, Wang Yong, Chen Zhifeng. A computer content retrieval method for ancient books based on visual similarity. Journal of Software, 12(9), 2001, pp. 1336-1342.

[3] R. Plamondon and S. N. Srihari. On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, January 2000, pp. 63-84.

[4] T. M. Rath, S. Kane, A. Lehman, E. Partridge and R. Manmatha. Indexing for a Digital Library of George Washington's Manuscripts: A Study of Word Matching Techniques. Technical Report, Center for Intelligent Information Retrieval, University of Massachusetts, 2002.

[5] Itay Bar Yosef, Klara Kedem, et al. Classification of Hebrew Calligraphic Handwriting Styles: Preliminary Results. In Proc. of the First International Workshop on Document Image Analysis for Libraries (DIAL'04), Palo Alto, California, 2004, pp. 299-305.

[6] Yueting Zhuang, Xiafeng Zhang, et al. Retrieval of Chinese Calligraphic Character Image. In Proc. of PCM 2004.

[7] Christian Böhm, Stefan Berchtold, Daniel Keim. Searching in High-dimensional Spaces: Index Structures for Improving the Performance of Multimedia Databases. ACM Computing Surveys, 33(3), 2001.

[8] A. Guttman. R-tree: A dynamic index structure for spatial searching. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, 1984, pp. 47-54.

[9] N. Beckmann, H.-P. Kriegel, R. Schneider, B. Seeger. The R*-tree: An Efficient and Robust Access Method for Points and Rectangles. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 1990, pp. 322-331.

[10] S. Berchtold, D. A. Keim and H. P. Kriegel. The X-tree: An index structure for high-dimensional data. In Proc. 22nd Int. Conf. on Very Large Data Bases, 1996, pp. 28-37.

[11] R. Weber, H. Schek and S. Blott. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In Proc. 24th Int. Conf. on Very Large Data Bases, 1998, pp. 194-205.

[12] S. Berchtold, C. Böhm, H. P. Kriegel, J. Sander and H. V. Jagadish. Independent quantization: An index compression technique for high-dimensional data spaces. In Proc. 16th Int. Conf. on Data Engineering, 2000, pp. 577-588.

[13] M. J. Fonseca and J. A. Jorge. NB-Tree: An Indexing Structure for Content-Based Retrieval in Large Databases. In Proc. of the 8th International Conference on Database Systems for Advanced Applications, Kyoto, Japan, Mar 2003, pp. 267-274.

[14] H. V. Jagadish, B. C. Ooi, K. L. Tan, C. Yu, R. Zhang. iDistance: An Adaptive B+-tree Based Indexing Method for Nearest Neighbor Search. ACM Transactions on Database Systems, 30(2), pp. 364-397, June 2005.

Summary of the invention

The object of the present invention is to provide an interactive k-nearest-neighbor query method for calligraphic characters that improves both the precision and the performance of k-nearest-neighbor queries.

The technical solution adopted by the present invention to solve its technical problem is:

1) First, every character in the high-dimensional space is used as a reference character; under its corresponding virtual distance threshold (VDT), the candidate characters similar to it are computed, and by looping over all characters a local distance map is generated, on which a B+-tree-based index is built. Afterwards, the local distance map is adjusted dynamically through relevance feedback on each user query;

2) The nearest-neighbor character V_p of the query character V_q is found by hypersphere center relocation based on hierarchical clustering and the unified start distance;

3) A pseudo k-nearest-neighbor query (Pk-NN) is completed using the nearest-neighbor character V_p and relevance feedback, and the query result is returned.

The hypersphere center relocation based on hierarchical clustering and the unified start distance (USD) works as follows. The calligraphic characters are grouped into T classes by hierarchical clustering, and each clustered character can be expressed as:

Character(V_i) ::= <number (i), number of the class it belongs to (CID)>  (3)

Its USD is then combined with the number of its class to obtain the index key of the character, as shown in formula (4):

key(V_i) = CID + \frac{USD(V_i)}{MAX\_USD}  (4)

where CID is the number of the class to which character V_i belongs and MAX_USD is a constant set large enough that the maximum query range of each character is [CID, CID+1]. Finally, a B+-tree index is built on the n keys.

For a query character V_q, let ε be the minimum radius needed to find its nearest-neighbor character; this value is estimated from the statistical distribution of the nearest-neighbor distance Δ of each calligraphic character in the library. When the user submits a query character V_q, T loop iterations with radius ε first determine which class hyperspheres intersect the query hypersphere Θ(V_q, ε). Within these class hyperspheres, the character V_p nearest to the query character is then taken as the candidate new hypersphere center. Likewise, when two hyperspheres intersect, the character nearest to V_q is found first and then compared with the candidate nearest neighbor from the previous iteration to obtain the character nearest to V_q. Finally, when two hyperspheres are disjoint, the loop continues. The result is the nearest-neighbor character V_p of V_q.

Completing the pseudo k-nearest-neighbor query (Pk-NN) using the nearest-neighbor character V_p and relevance feedback: because relevance feedback is introduced, when k is large the pseudo k-NN query on V_q returns fewer than k nearest-neighbor characters, and when k is small it returns k nearest-neighbor characters.

Beneficial effects of the present invention: the retrieval efficiency for calligraphic characters is significantly improved, and the query precision of the index keeps improving with the user's relevance feedback, so that the user can quickly obtain calligraphic characters with the same semantics.

Description of drawings

Fig. 1 is a schematic diagram of the architecture of the interactive calligraphic character k-nearest-neighbor query system;

Fig. 2 is a flow chart of the interactive calligraphic character k-nearest-neighbor query method;

Fig. 3(a) is a schematic diagram of the virtual query radius for the character "it" (之) when VDT(V_p) ≥ Δ + r;

Fig. 3(b) is a schematic diagram of the virtual query radius for the character "it" (之) when VDT(V_p) < Δ + r;

Fig. 4 is a schematic diagram of an example Gaussian distribution of Δ;

Fig. 5 is a schematic diagram of hypersphere center relocation;

Fig. 6 is a schematic diagram of the approximate minimum bounding hypersphere;

Fig. 7 is a schematic diagram of a retrieval example without feedback;

Fig. 8 is a schematic diagram of a retrieval example with feedback.

Detailed description of the embodiments

The specific implementation steps of the interactive k-nearest-neighbor query method for calligraphic characters are as follows:

(1) Local distance map index:

To support efficient and accurate content-based similarity queries on calligraphic characters, an interactive high-dimensional index structure tailored to calligraphic character retrieval is proposed: the local distance map (PDM). By incorporating the user's relevance feedback, the search space can be narrowed more effectively, which maintains high precision while improving query efficiency. The basic idea of the PDM index is that a query for a character V_q is answered through its nearest-neighbor (most similar) character V_p and the pre-generated local distance map.

Observation of calligraphic character retrieval results shows that, for any given calligraphic character V_i, characters whose distance to it is less than 150 (defined below as MAX_VDT) are all likely to be similar to it; in other words, two characters whose distance exceeds 150 cannot be similar. Therefore only characters at a distance of less than 150 from a character need to be considered as candidate index keys. Furthermore, the distance from a character V_i to the farthest character that is still similar to it (defined below as VDT(V_i)) may differ from character to character and can be set through the user's relevance feedback. In the PDM, therefore, each calligraphic character in turn serves as a reference character, and the neighboring characters whose distance to it is below a certain threshold are used as index keys.

Definition 1 (virtual distance threshold). Given two calligraphic characters V_i and V_j, the virtual distance threshold of V_i, denoted VDT(V_i), is the distance between V_i and V_j, where V_j is the character that the user's relevance feedback designates as similar to V_i and that is farthest from V_i. Formally: VDT(V_i) = d(V_i, V_j), where V_i, V_j ∈ Ω and V_j is similar to V_i and farthest from it.

For example, as shown in Fig. 3, given a query calligraphic character V_q with V_q ∉ Ω, let V_p be the calligraphic character nearest to V_q. There must exist a character V_r that has the same semantics as V_p and is farthest from it, so the distance between V_r and V_p is denoted VDT(V_p). Different calligraphic characters V_i have different VDTs. The virtual distance threshold table (VDTT) records the VDT of each character; updating the VDTT and revising the PDM through the user's relevance feedback keeps precision consistently high.

Definition 2 (local distance map). The local distance map (PDM) is represented as an adjacency list in which each entry d_ij ∈ PDM is the distance between the i-th character and its j-th neighboring character.

The virtual distance threshold table (VDTT) is the sequence recording the VDT of each character, expressed as VDTT = <<1, VDT(V_1)>, <2, VDT(V_2)>, ..., <n, VDT(V_n)>>, where VDT(V_i) is the virtual distance threshold of the i-th character.

Definition 3 (maximum virtual distance threshold). The maximum virtual distance threshold (MAX_VDT) is the initial virtual distance threshold of every character and is no smaller than any character's own VDT, i.e. MAX_VDT ≥ max{VDT(V_1), VDT(V_2), ..., VDT(V_n)}.

For calligraphic characters, MAX_VDT is set empirically to 150, meaning that the initial VDT of every character in the VDTT is 150. The VDT of each character is then adjusted step by step according to the user's relevance feedback. The incremental maintenance of the VDTT, given below, is a continuous and dynamic process. The VDTT is updated from the relevance feedback of each pseudo k-nearest-neighbor query (denoted PkNNQuery), and two cases are distinguished. For these updates, MIN_K is the user-set minimum number of characters in the library that are similar to the query character, and is generally set to 40. Only when k is greater than MIN_K do the returned candidates include all characters in the library that are similar to V_q (i.e. recall is 100%); otherwise the user is not authorized to give relevance feedback. In addition, the nearest-neighbor character V_p of V_q is obtained by the hypersphere center relocation of V_q in step (2) of the method. flag[V_p] = TRUE indicates that V_p has already received relevance feedback.

Input: VDTT, PDM index RI, V_q

Output: the updated VDTT and PDM index

(1) loop

(2) S ← PkNNQuery(V_q, k);

(3) if k > MIN_K and flag[V_p] = FALSE then

(4) obtain from the user's relevance feedback the similar character V_r farthest from V_p;

(5) compute the distance between V_p and V_r and update the VDTT;

(6) else if k < MIN_K and flag[V_p] = TRUE then

(7) obtain from the user's relevance feedback the similar character V_r farthest from V_p;

(8) if VDT(V_p) < d(V_p, V_r) then

(9) add V_r to the PDM index and update the VDTT;

(10) return the updated VDTT and PDM index;

(11) end loop;
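The two update cases of this maintenance procedure can be sketched in Python. This is only an illustration under stated assumptions: the function name `update_vdt`, the dictionary layout of the VDTT and the flag table, and the idea of passing in the already measured distance d(V_p, V_r) are all hypothetical, and setting the flag after the first feedback is an inference from the meaning of flag[V_p]. Inserting V_r into the PDM index in the second case is left out of the sketch.

```python
MAX_VDT = 150.0  # initial VDT of every character, per the text
MIN_K = 40       # typical value given in the text


def update_vdt(vdtt, flag, p, k, d_pr):
    """Apply one round of relevance feedback for the nearest neighbour p.

    vdtt: dict mapping character id -> current VDT
    flag: dict mapping character id -> True once feedback has been given
    d_pr: distance from V_p to V_r, the farthest character the user marked
          as similar (hypothetical input; obtained from the user in practice)
    Returns True if the VDTT was changed.
    """
    if k > MIN_K and not flag.get(p, False):
        # Case 1: first feedback for p; replace the initial MAX_VDT.
        vdtt[p] = d_pr
        flag[p] = True  # assumption: the flag is set after the first feedback
        return True
    if k < MIN_K and flag.get(p, False) and vdtt[p] < d_pr:
        # Case 2: later feedback reveals a similar character farther away,
        # so the threshold grows (the PDM insertion of V_r is omitted here).
        vdtt[p] = d_pr
        return True
    return False
```

A character that starts at VDT 150 and receives first feedback with d(V_p, V_r) = 92.5 is tightened to 92.5; a later query that finds a similar character at distance 101 widens the threshold again, while one at distance 80 leaves it unchanged.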

Unlike other distance-based indexing methods, in the local distance map every character in the high-dimensional space is taken as a reference character, and the similarity distance of each candidate character within its radius is computed with the character's own VDT as the upper bound on the radius (distance). The n characters of the high-dimensional space are thus converted into O(n × k) one-dimensional distance values, where k << n. To query these distance values quickly, an efficient index must be built on them. Because the similarity distance between two calligraphic characters is much greater than 1, it is also normalized so that the distance between any two characters is at most 1 after processing. For a calligraphic character V_i, the index key can then be expressed as:

key(V_i) = i + \frac{d(V_i, V_j)}{MAX\_VDT}  (5)

These one-dimensional keys are indexed with a B+ tree. Formula (5) shows that the maximum query range of a single character is [i, i+1]. The algorithm for generating the local distance map index is given below; it consists of initializing the VDTT and the PDM index (lines 1-3) and building the PDM index (lines 4-6), where the function TransValue() denotes the conversion of a distance value into a key.

Input: calligraphic character library Ω

Output: PDM index RI

(1) for each character V_i in the calligraphic character library Ω

(2) initialize the VDTT;

(3) create the B+-tree index RI;

(4) in a double loop, if d(V_i, V_j) is less than VDT(V_i) then

(5) convert the distance value d(V_i, V_j) into a key with TransValue() and insert it into the B+ tree;

(6) return the PDM index RI;
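The index generation can be sketched in Python as follows. This is an illustration rather than the patented implementation: a sorted list searched with `bisect` stands in for the B+ tree, the helper names `pdm_key`, `build_pdm_index` and `neighbours_within` are hypothetical, the distance function is supplied by the caller, and skipping the pair (i, i) is an assumption. Because every stored distance is below MAX_VDT, the keys of reference character i fall in [i, i+1), so a range query on one character touches one contiguous key interval.

```python
import bisect

MAX_VDT = 150.0  # maximum virtual distance threshold from the text


def pdm_key(i, dist, max_vdt=MAX_VDT):
    """Key of a neighbour of reference character i: i + d / MAX_VDT."""
    return i + dist / max_vdt


def build_pdm_index(n, dist, vdtt):
    """Build the sorted (key, neighbour id) list for n characters.

    dist: function (i, j) -> distance between characters i and j
    vdtt: list holding the current VDT of each character
    """
    index = []
    for i in range(n):
        for j in range(n):
            if i != j and dist(i, j) < vdtt[i]:
                index.append((pdm_key(i, dist(i, j)), j))
    index.sort()
    return index


def neighbours_within(index, i, r, max_vdt=MAX_VDT):
    """Neighbours of reference character i stored at distance <= r."""
    lo = bisect.bisect_left(index, (i, -1))
    hi = bisect.bisect_right(index, (pdm_key(i, r, max_vdt), float("inf")))
    return [j for _, j in index[lo:hi]]
```

For five characters placed at positions 0, 10, 40, 200 and 230 on a line, with absolute difference as the distance and every VDT at 150, the index holds 8 entries instead of the 20 ordered pairs, and the neighbours of character 0 within radius 20 are just character 1.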

(2) Hypersphere center relocation based on clustering and the unified start distance:

Hypersphere center relocation finds the character V_p nearest to the query character V_q. The present invention accelerates the nearest-neighbor (1-NN) query with an indexing method based on clustering and the unified start distance: the calligraphic characters are grouped in advance into T classes by hierarchical clustering, and each clustered character can be expressed as:

Character(V_i) ::= <number (i), number of the class it belongs to (CID)>  (6)

Its unified start distance (USD) is then combined with the number of its class to obtain its index key, as shown in formula (7):

key(V_i) = CID + \frac{USD(V_i)}{MAX\_USD}  (7)

where CID is the number of the class to which character V_i belongs and MAX_USD is a constant set large enough that the maximum query range of each character is [CID, CID+1]. Finally, a B+-tree index is built on the n keys.
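The effect of this key layout can be seen in a small Python sketch. It is only an illustration: the USD values are taken as given inputs (the text does not spell out how they are computed), the value chosen for MAX_USD is hypothetical, a sorted list stands in for the B+ tree, and the function names are made up for the example. Since USD/MAX_USD stays below 1, all keys of class CID fall in [CID, CID+1) and each class occupies one contiguous run of the index.

```python
import bisect

MAX_USD = 100.0  # hypothetical constant; it must exceed every USD value


def usd_key(cid, usd, max_usd=MAX_USD):
    """Index key: class number plus the normalized unified start distance."""
    if not 0 <= usd < max_usd:
        raise ValueError("USD must lie in [0, MAX_USD)")
    return cid + usd / max_usd


def build_usd_index(chars):
    """chars: iterable of (char_id, cid, usd); returns sorted (key, char_id)."""
    return sorted((usd_key(cid, usd), char_id) for char_id, cid, usd in chars)


def chars_in_class(index, cid):
    """All characters of one class, found as the key interval [cid, cid+1)."""
    lo = bisect.bisect_left(index, (cid,))
    hi = bisect.bisect_left(index, (cid + 1,))
    return [char_id for _, char_id in index[lo:hi]]
```

With characters 0 and 2 in class 2 (USD 30 and 10), character 1 in class 0 (USD 75) and character 3 in class 1 (USD 5), the sorted index lists the characters in the order 1, 3, 2, 0, and the lookup for class 2 returns characters 2 and 0 from a single contiguous slice.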

For a query character V_q, let ε be the minimum radius needed to find its nearest-neighbor character. This value can be estimated from the statistical distribution of the nearest-neighbor distance Δ of each calligraphic character in the library. As shown in Fig. 4, the frequencies with which the Δ values fall into different ranges follow a Gaussian distribution (the red line is the Gaussian fit), so the maximum likelihood estimate of the corresponding σ can be obtained. According to the "3σ rule", a normally distributed random variable X satisfies P(μ − 3σ < X ≤ μ + 3σ) = 0.9974; that is, when the range of X is 3σ, the probability of capturing the nearest-neighbor character is 99.74%, which is close to 100%. Therefore ε = 3σ.
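The estimate of ε can be sketched with the Python standard library. The sketch assumes that characters are represented as feature vectors compared by Euclidean distance, which the text does not specify, and the function names are hypothetical. For a Gaussian sample the maximum likelihood estimate of σ is the population standard deviation (divisor n), which is what `statistics.pstdev` computes; the sketch then follows the text's choice ε = 3σ.

```python
import math
import statistics


def nearest_neighbour_distances(points):
    """Delta for each point: distance to its nearest other point."""
    deltas = []
    for i, p in enumerate(points):
        deltas.append(
            min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        )
    return deltas


def estimate_epsilon(deltas):
    """epsilon = 3 * sigma, with sigma the ML estimate for a Gaussian."""
    sigma = statistics.pstdev(deltas)  # divisor n, i.e. the ML estimator
    return 3.0 * sigma
```

For Δ values of 10, 12, 14, 16 and 18 the variance is 8, so σ is about 2.83 and ε about 8.49.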

After the user submits a query character V_q, as shown in Fig. 5, T loop iterations (T being the number of clusters) with radius ε first determine the positional relationship between the hypersphere Θ(V_q, ε) and each class hypersphere (line 2). When a class hypersphere contains Θ(V_q, ε) (line 3), a local range query is run through the index, the distance to V_q is computed for each candidate character obtained, the character V_p with the smallest distance is taken as the candidate new hypersphere center (line 4), and the loop is exited (line 5). Likewise, when the two hyperspheres intersect (line 6), the character nearest to V_q is found first (line 7) and then compared with the candidate nearest neighbor obtained in the previous iteration (line 8); the loop is not ended, so that intersections with other class hyperspheres can still be checked. Finally, when the two hyperspheres are disjoint (line 9), the loop continues (line 10). The hypersphere center relocation algorithm is as follows:

Input: calligraphic character library Ω and query example character V_q

Output: nearest-neighbor character V_p of V_q

(1) initialize;

(2) for each class hypersphere Θ(O_j, CR_j)

(3) if Θ(O_j, CR_j) contains Θ(V_q, ε) then

(4) find the character V_p nearest to V_q in the j-th class hypersphere;

(5) exit the loop;

(6) else if Θ(O_j, CR_j) intersects Θ(V_q, ε) then

(7) find the character V_p nearest to V_q in the j-th class hypersphere;

(8) compare it with the candidate nearest neighbor obtained previously and keep the nearer one;

(9) else

(10) continue the loop until it ends;

(11) return the nearest-neighbor character V_p
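The relocation loop can be sketched in Python under a few stated assumptions: characters are feature vectors compared by Euclidean distance, each class hypersphere is given as a (center, radius, members) triple, and a linear scan over the members replaces the index-based local range query. The name `relocate_center` is hypothetical. One small variation from the listing: in the containment case the sketch keeps the best candidate seen so far instead of returning the nearest member of that class unconditionally.

```python
import math


def relocate_center(clusters, q, eps):
    """Find the stored character nearest to query q.

    clusters: list of (center, radius, members) class hyperspheres
    eps: radius of the query hypersphere around q
    Returns the nearest member found, or None if every class is disjoint.
    """
    best, best_d = None, math.inf
    for center, radius, members in clusters:
        gap = math.dist(center, q)
        contains = gap + eps <= radius    # class sphere contains the query sphere
        intersects = gap <= radius + eps  # the two spheres overlap
        if not intersects:
            continue                      # disjoint: move on to the next class
        for m in members:                 # linear scan stands in for the index
            d = math.dist(m, q)
            if d < best_d:
                best, best_d = m, d
        if contains:
            break                         # containment: stop searching
    return best
```

With one class centred at (0, 0) with radius 5 and another at (10, 0) with radius 3, a query at (6.5, 0) with ε = 2 intersects both classes and the nearest member (8, 0) is found in the second one, whereas a query at (0.5, 0) with ε = 1 lies inside the first class and the loop stops there.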

(3) Pseudo k-nearest-neighbor query algorithm

In view of the characteristics of the PDM-based calligraphic character index, the present invention proposes an improvement of the k-NN query: the pseudo k-nearest-neighbor query (Pk-NN). Because relevance feedback is introduced, when k is large the approximate k-NN query on V_q is not guaranteed to return k nearest-neighbor characters: the number of characters in the calligraphy library that are similar to V_q is limited and may be smaller than the k set by the user, hence the name pseudo k-NN query. Note that without relevance feedback the Pk-NN query reduces to an ordinary k-NN query.

The search range of the PDM-based pseudo k-NN query is shown by the dashed circle in Fig. 6 (the approximate minimum bounding hypersphere), where the shaded area is the actual query range. The query has two stages, as shown in Fig. 2: first the nearest-neighbor character V_p of the query character V_q is found by hypersphere center relocation, and then the pseudo k-NN query based on V_p is carried out, which in essence obtains the k nearest calligraphic characters by repeatedly calling the range query algorithm. The specific steps are as follows. Given V_q and k, the nearest-neighbor character V_p is first found by hypersphere center relocation of V_q (line 1); then the variables are initialized and the distance between V_q and V_p is computed (line 2). Finally the loop is entered: it starts the range query with a small radius (lines 4-5); when the number of candidates obtained exceeds k, the (‖S‖−k−1) characters in the candidate set S that are farthest from the query character V_q are found by a loop (line 8) and deleted (lines 6-7), which leaves the k nearest-neighbor characters, and the while loop is exited (line 9). Otherwise, when the query radius r exceeds the virtual query radius of V_p, the query stops (line 10). Note that in this case the number of candidates returned may be less than k.

Input: query character V_q, k

Output: query result S

(1) obtain V_p by hypersphere center relocation of V_q;

(2) initialize;

(3) while the number of candidates ‖S‖ is not greater than k and bStop = FALSE

(4) increase the radius r;

(5) run a range query of radius r on V_p to obtain the query result S;

(6) if the number of candidates returned ‖S‖ is greater than k then

(7) delete from the candidates the ‖S‖−k−1 characters farthest from V_q and exit the loop;

(8) else if r > VQR(V_p) then

(9) fewer than k characters are similar to the query character; exit the loop;

(10) end loop; return the result S;
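The expanding range query can be sketched in Python. Several things here are assumptions made for illustration: the function name `pknn_query`, the two dictionaries of precomputed distances (to V_p, as would be read from the PDM, and to V_q), the fixed radius increment `step`, and the virtual query radius `vqr` passed in as a plain number. The sketch trims the candidate set to exactly k characters, which is one reading of the deletion step; the count written in the listing is ‖S‖−k−1.

```python
def pknn_query(dist_to_p, dist_to_q, k, vqr, step):
    """Pseudo k-NN query around the relocated center V_p.

    dist_to_p: dict char_id -> distance to V_p (as stored in the PDM)
    dist_to_q: dict char_id -> distance to the query character V_q
    vqr: virtual query radius of V_p; the search never grows far past it
    step: radius increment per iteration (must be positive)
    Returns at most k character ids, nearest to V_q first.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    r = 0.0
    while True:
        r += step
        # Range query of radius r around V_p.
        candidates = [c for c, d in dist_to_p.items() if d <= r]
        candidates.sort(key=lambda c: dist_to_q[c])
        if len(candidates) > k:
            return candidates[:k]  # drop the characters farthest from V_q
        if r > vqr:
            return candidates      # fewer than k similar characters exist
```

With five stored neighbours and k = 2 the loop stops as soon as three candidates fall inside the radius and returns the two nearest to V_q; with k = 5 and a virtual query radius of 40 the loop ends once r passes 40 and returns only the four candidates found, which is the "pseudo" case described in the text.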

As shown in Fig. 7, when the user submits the character "sky" (天), the candidate characters similar in shape to it are retrieved from the calligraphy library through the PDM index; the user can then indicate through relevance feedback which of these candidates have the same semantics as "sky" and which do not. In this way the local distance map is updated dynamically so that the retrieval system maintains high precision. Fig. 7 shows the result of interactive calligraphic character retrieval without feedback.

Similarly, as shown in Fig. 8, when the user submits the character "topic" (题), the candidate characters similar to "topic" are retrieved through the PDM index. In this way the local distance map is updated dynamically so that the retrieval system maintains high precision. Fig. 8 shows the result of interactive calligraphic character retrieval with feedback.

Claims

1. An interactive k-nearest-neighbor query method for calligraphic characters, characterized in that:

2. The interactive k-nearest-neighbor query method for calligraphic characters according to claim 1, characterized in that the hypersphere center relocation based on hierarchical clustering and the unified start distance (USD) is as follows: the calligraphic characters are grouped into T classes by hierarchical clustering, and each clustered character can be expressed as:

Character(V_i) ::= <number (i), number of the class it belongs to (CID)>  (1)

Its USD is then combined with the number of its class to obtain the index key of the character, as shown in formula (2):

key(V_i) = CID + \frac{USD(V_i)}{MAX\_USD}  (2)

3. The interactive k-nearest-neighbor query method for calligraphic characters according to claim 1, characterized in that the pseudo k-nearest-neighbor query (Pk-NN) is completed using the nearest-neighbor character V_p and relevance feedback: because relevance feedback is introduced, when k is large the pseudo k-NN query on V_q returns fewer than k nearest-neighbor characters, and when k is small it returns k nearest-neighbor characters.