CN101030216A

CN101030216A - Method for matching text string based on parameter characteristics

Info

Publication number: CN101030216A
Application number: CNA2007100487900A
Authority: CN
Inventors: 丁光耀
Original assignee: Individual
Current assignee: Individual
Priority date: 2007-04-02
Filing date: 2007-04-02
Publication date: 2007-09-05
Also published as: WO2008119297A1

Abstract

A character string matching method based on characteristic parameters comprises: calculating the matching relationship between the characters of a text and those of a search term; calculating characteristic parameters from that matching relationship; calculating a characteristic matching degree from the characteristic parameters and the lengths of the text and the search term; and returning the calculated characteristic matching degree.

Description

Character string matching method based on characteristic parameters

Technical field

The present invention relates to a character string matching method, and specifically to a character string matching method based on characteristic parameters.

Background art

Dictionary retrieval is the most basic application of string matching technology. The retrieval techniques of existing dictionary retrieval products fall into two classes: those based on exact matching and those based on inexact matching. Exact-matching retrieval cannot tolerate errors, whereas inexact-matching retrieval tolerates a small number of errors in the user's input and therefore has greater practical value.

For over 40 years, research on inexact string matching, both in China and abroad, has relied on distance calculations based on error factors; the most commonly used are the Levenshtein distance and the ED (Edit Distance) distance. The error factors that affect the distance mainly include insertion, deletion, replacement and exchange errors. Such error-factor-based distance calculations have intrinsic problems that make dictionary retrieval results too coarse and limit fault tolerance. The problems are mainly the following:

1) Existing research on inexact string matching based on error-factor distances follows a line of thinking driven by problem phenomena, such as insertion, deletion, replacement, exchange and reversal errors. These error phenomena are not completely independent: they can be converted into one another and take diverse forms, which makes them typical problem phenomena. For example, a replacement error can be represented qualitatively by one insertion error plus one deletion error, and an exchange error can likewise be represented by one insertion error plus one deletion error. Some error factors are therefore not independent concepts, and this is one of the main reasons why string matching still lacks a scientific classification system.

2) Describing a string matching problem in terms of error factors is polymorphic (one problem admits several descriptions), which directly affects string matching and the ranking of retrieval results. Table 1 shows how a specific matching problem admits multiple error-factor descriptions.

Text	Term	Description based on error factors
ABCDEFGH	ABCDEFGH	Exact match (substring match): no deletion, insertion, exchange or other errors allowed
ABCDEFGH	CDEF	Exact match (substring match): deletions at the front and back allowed
ABCDEFGH	CDF	Inexact match: 1. one deletion (E); or 2. some deletions at the front and back plus one deletion (E) in the middle; or 3. one insertion (F); or 4. one replacement (E, F); etc.
ABCDEFGH	CEDF	Inexact match: 1. one exchange (DE, ED); or 2. two replacements (D, E), (E, D); or 3. one insertion (D) and one deletion (D); etc.
ABCDEFGH	CEFD	Inexact match: 1. one deletion (D) and one insertion (D); or 2. two insertions (C), (D); or 3. two insertions (E), (F); etc.
ABCDEFGH	ACEFXD	Inexact match: 1. two deletions (B), (D) and two insertions (X), (D); or 2. two deletions (B), (D) and two replacements (G, X), (H, D); etc.

Table 1

Table 1 ignores the quantitative effect of the distance calculation. Even so, the error-factor description of the matching problem for a particular text and term admits several qualitative representations; this polymorphism in describing one and the same problem makes classification and processing difficult.

3) Inexact string matching based on error-factor distances quantifies all error factors uniformly through a single distance calculation, such as ED (Edit Distance). This blurs the distinction between characters affected by different error factors, so the matching result expressed by the distance is too coarse. For example, a distance of 2 may mean two insertion errors, two deletion errors, two replacement errors, or two mixed errors. Moreover, because error-factor concepts are not independent and the nature of the errors in a match is uncertain, the matching state cannot be refined further in terms of error factors. When dictionary retrieval computes a matching degree, it therefore lacks finer-grained parameters describing the actual match, which hinders a reasonable ranking of retrieved results.

4) Existing dictionary retrieval rarely considers the influence of psychology, cognitive science, linguistics or behavioural science. In fact, characters differ in how easily they are recognized, depending on factors such as their position in the word, their pronunciation and their visual prominence: some characters are easy to remember, while others are hard to remember or fade from memory over time. This is a major cause of inexact input. Retrieval should therefore take into account how the cognitive differences among the characters of a word affect dictionary retrieval results.
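To make point 3) concrete, the following is a minimal Python sketch of the textbook Levenshtein distance (unit cost for insertion, deletion and replacement). It is the standard dynamic-programming definition, not code from this patent, and the typed strings are illustrative inputs chosen here: four quite different error patterns against "what" all collapse to the same distance of 2.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 0 if the characters are equal)
            ))
        prev = curr
    return prev[-1]


# Four different error patterns against "what", all reported as distance 2.
for typed, kind in [("wa", "two deletions"), ("whatxy", "two insertions"),
                    ("wxyt", "two replacements"), ("whta", "one exchange")]:
    print(f"{typed:7s} {kind:17s} {levenshtein('what', typed)}")
```

Plain Levenshtein distance has no exchange operation, so the transposition in "whta" is charged as two replacements; the single number 2 does not reveal which kind of error occurred.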

These problems show that the theoretical foundation of string matching is unsound and incomplete. They directly affect the reasonable ranking of dictionary retrieval results and limit fault tolerance; and although matching methods are diverse, they are difficult to apply in an integrated way. These problems urgently need to be solved.

Summary of the invention

The objective of the invention is to overcome the above deficiencies of the prior art by providing a character string matching method based on characteristic parameters that ranks retrieval results more reasonably and has strong fault tolerance.

To achieve the above object, in the character string matching method based on characteristic parameters of the present invention, a text stored in a storage device and a search term entered through an input device are given, and the method is characterized in that an information processing device performs string matching based on characteristic parameters on the given text and search term, with the following steps:

Step A) Calculate the matching relationship between the characters of the text and those of the search term;

Step B) Calculate characteristic parameters from that matching relationship. The characteristic parameters comprise a dispersion number, reflecting how many text characters lie scattered between the search term's characters where they appear in the text; a crossing number, reflecting how many of the search term's characters appear in the text in crossed (out-of-order) positions; and a non-perfect number, reflecting how many of the search term's characters do not appear in the text;

Step C) Calculate the characteristic matching degree from the characteristic parameters and the lengths of the text and the search term;

Step D) Output the characteristic matching degree.

Compared with the prior art, the invention has the following beneficial effects:

First, Table 2 compares the existing descriptions based on error factors with the descriptions based on the three characteristics of the present invention, and clearly illustrates the difference.

Text	Term	Existing description based on error factors	Description based on the three characteristics of the present invention
ABCDEFGH	ABCDEFGH	Exact match (substring match): no deletion, insertion, exchange or other errors allowed	Exact match (substring match): no dispersion, crossing or incompleteness allowed
ABCDEFGH	CDEF	Exact match (substring match): only deletions at the front and back allowed	Exact match (substring match): no dispersion, crossing or incompleteness allowed
ABCDEFGH	CDF	Inexact match: 1. one deletion (E); or 2. some deletions at the front and back plus one deletion in the middle; or 3. one insertion (F); or 4. one replacement (E, F); etc.	Dispersed match: only one dispersion (E)
ABCDEFGH	CEDF	Inexact match: 1. one exchange (DE, ED); or 2. two replacements (D, E), (E, D); or 3. one insertion (D) and one deletion (D); etc.	Crossed match: only one crossing (D)
ABCDEFGH	CEFD	Inexact match: 1. one deletion (D) and one insertion (D); or 2. two insertions (C), (D); or 3. two insertions (E), (F); etc.	Crossed match: only one crossing (D)
ABCDEFGH	ACEFXD	Inexact match: 1. two deletions (B), (D) and two insertions (X), (D); or 2. two deletions (B), (D) and two replacements (G, X), (H, D); etc.	Dispersed, crossed and incomplete match: one dispersion (B), one crossing (D), one incompleteness (X)

Table 2

As can be seen, the existing error-factor approach admits multiple descriptions of the string matching problem for a particular text and term; this polymorphism in describing one and the same problem makes fine-grained classification and processing difficult. With the three characteristics of the present invention, each problem has exactly one description, which accurately reflects the character correspondence between the text and the term.

For the two substring-matching cases in Table 2, the existing error-factor approach gives two different descriptions, whereas the present invention gives a single description for both, which better fits the definition of a substring.

The comparison in the table also makes clear the difference between the dispersion characteristic and deletion errors, between the crossing characteristic and exchange errors, and between the incompleteness characteristic and insertion errors.

The three characteristic parameters adopted by the present invention, namely dispersion, crossing and incompleteness, are distinct concepts with distinct properties and are mutually independent. Error factors are external manifestations of these three characteristics; approaching string matching through the three characteristic parameters therefore reveals the underlying regularities of the string matching problem more scientifically.

Second, because existing error-factor concepts are not independent and the nature of the errors in a match is uncertain, the matching state cannot be refined further in terms of error factors. When dictionary retrieval computes a matching degree, it therefore lacks finer-grained parameters describing the actual match, which hinders a reasonable ranking of retrieved results.

In contrast, the three characteristic parameters adopted by the present invention (dispersion, crossing and incompleteness) are distinct, mutually independent concepts. Computing the characteristic matching degree from these three parameters makes it possible to weigh the influence of each characteristic separately, so that the computed degree reflects the similarity between text and term more accurately. As a result, ranking all matching dictionary words by the characteristic matching degree obtained with the present invention is more reasonable and fault tolerance is greatly improved, overcoming the defect that error-factor distance results are too coarse and ill-suited to a comprehensive calculation of the matching degree.

Embodiment

The character string matching method based on characteristic parameters of the present invention is described in further detail below with reference to embodiments.

An electronic English dictionary is an electronic dictionary library made up of English words. Electronic English dictionary retrieval means: given an input English word, i.e., the search term P, perform string matching against each word in the library, i.e., each text T, and, based on the matching results, sort and output the qualifying words so that the user can choose conveniently.

The core technology of an electronic English dictionary is string matching: its results directly determine the final ranking of all retrieved words and are an important indicator of the dictionary's retrieval effectiveness.

In this embodiment, the text stored in the storage device is $T = t_1 t_2 \cdots t_n$ and the search term entered through the input device is $P = p_1 p_2 \cdots p_m$, where $t_i$ and $p_j$ ($1 \le i \le n$, $1 \le j \le m$) are characters and m, n > 0. Step A, calculating the matching relationship between the characters of the text and the search term, consists of the following steps:

Step a) Stably sort the search term P

Sort all characters of the search term $P = p_1 p_2 \cdots p_m$ in stable ascending order and store them in an array PT in memory, which also records each character's original position in the term. The characters stored in PT form the character sub-array PTc, and the positions form the position sub-array PTp;

Step b) Stably sort the text T

Sort all characters of the text $T = t_1 t_2 \cdots t_n$ in stable ascending order and store them in an array WT in memory, which also records each character's original position in the text. The characters stored in WT form the character sub-array WTc, and the positions form the position sub-array WTp;

Step c) Parameter initialization

The array POS in memory, with one element per term character, stores the positions of corresponding characters and is initialized to -1 throughout; non-perfect number = 0; position W in array WT = 1; position P in array PT = 1; maximum position = 0; minimum position = n;

Step d) Check whether the loop has finished

If array WT or array PT has been fully compared, go to step f;

Step e) Compare

Compare the character at position W of WTc with the character at position P of PTc, and handle the following cases:

If WTc[W] < PTc[P], increase W by 1 and go to step d;

If WTc[W] > PTc[P], increase P by 1 and the non-perfect number by 1, and go to step d;

If WTc[W] = PTc[P], store WTp[W] in array POS at index PTp[P], i.e., POS[PTp[P]] = WTp[W]; if WTp[W] > maximum position, set the maximum position to WTp[W]; if WTp[W] < minimum position, set the minimum position to WTp[W]; then increase W and P by 1 each and go to step d;

Step f) Finish

The result is the array POS, the maximum position, the minimum position, position P and the non-perfect number, which together represent the matching relationship between the characters of the text and the search term.

Sorting the text and the search term before computing the character matching relationship improves computation speed.

The time complexity is O(k log₂ k), where k = max(m, n).
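A minimal Python sketch of step A, assuming the text and search term are ordinary Python strings compared in code-point order. The function name `match_relation` and its return layout are illustrative choices, not part of the patent; positions are 1-based as in the description, while the loop cursors are 0-based internally.

```python
def match_relation(text, term):
    """Step A: sort-merge the characters of text T and search term P.

    Returns (pos, max_pos, min_pos, p_pos, nonperfect), where pos[j - 1] is the
    1-based text position matched to term character j (or -1 if unmatched) and
    p_pos is the description's 1-based position P when the loop stopped.
    """
    n, m = len(text), len(term)
    # Steps a) and b): stable ascending sort of (character, original position)
    pt = sorted(((c, j) for j, c in enumerate(term, 1)), key=lambda e: e[0])
    wt = sorted(((c, i) for i, c in enumerate(text, 1)), key=lambda e: e[0])
    # Step c): initialisation (w and p are 0-based cursors standing for W and P)
    pos = [-1] * m
    nonperfect = 0
    w = p = 0
    max_pos, min_pos = 0, n
    # Step d): stop as soon as either sorted array is exhausted
    while w < n and p < m:
        wc, wp = wt[w]
        pc, pp = pt[p]
        # Step e): three-way comparison of the current characters
        if wc < pc:          # text character absent from the term: skip it
            w += 1
        elif wc > pc:        # term character absent from the text
            p += 1
            nonperfect += 1
        else:                # same character: record the correspondence
            pos[pp - 1] = wp
            max_pos = max(max_pos, wp)
            min_pos = min(min_pos, wp)
            w += 1
            p += 1
    # Step f): p + 1 converts the 0-based cursor back to the 1-based position P
    return pos, max_pos, min_pos, p + 1, nonperfect


print(match_relation("what", "whta"))  # ([1, 2, 4, 3], 4, 1, 5, 0)
```

Python's `sorted` is guaranteed stable, so equal characters keep their original order, as steps a) and b) require.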

In another embodiment, building on the previous one, step B (calculating the characteristic parameters from the matching relationship between the characters of the text and the search term) comprises:

Step a) non-perfect number = non-perfect number + m - P + 1 (term characters not yet compared when the loop stopped are also counted as unmatched);

Step b) dispersion number = maximum position - minimum position + 1 - m + non-perfect number (the length of the matched span in the text minus the number of matched characters);

Step c) crossing number: calculated from the contents of array POS.

The crossing number can be calculated as follows:

(1) Find the length of the maximum interval ascending sequence (the longest strictly increasing, not necessarily contiguous, subsequence) of POS;

(2) crossing number = m - non-perfect number - length of the maximum interval ascending sequence.
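A sketch of step B in Python, taking the step-A outputs as plain arguments (the names are hypothetical). For the maximum interval ascending sequence it uses the common bisect-based longest-increasing-subsequence routine from the standard `bisect` module, which yields the same length as the LPOS procedure described below; because the positive entries of POS are distinct, `bisect_left` picks the same slot as that procedure's "first element greater than" search.

```python
from bisect import bisect_left


def longest_increasing_len(values):
    """Length of the longest strictly increasing subsequence, O(k log k)."""
    tails = []  # tails[i] = smallest last value of an increasing run of length i + 1
    for v in values:
        k = bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)


def characteristic_params(pos, max_pos, min_pos, p_pos, nonperfect, m):
    """Step B: return (non-perfect number, dispersion number, crossing number)."""
    # a) term characters not yet compared when the loop stopped are unmatched too
    nonperfect = nonperfect + m - p_pos + 1
    matched = m - nonperfect
    # b) matched span in the text minus the matched characters inside it
    #    (identical to max - min + 1 - m + non-perfect number)
    dispersion = max_pos - min_pos + 1 - matched
    # c) matched characters outside the longest in-order run of POS (-1 excluded)
    crossing = matched - longest_increasing_len([x for x in pos if x > 0])
    return nonperfect, dispersion, crossing


print(characteristic_params([1, 2, 4, 3], 4, 1, 5, 0, 4))  # (0, 0, 1)
```

The arguments in the usage line are what step A yields for text "what" and search term "whta": nothing is missing, nothing is dispersed, and one character is crossed.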

The length of the maximum interval ascending sequence can be found as follows:

(1) Initialization

Copy all values greater than zero in array POS, in order, into a temporary array in memory, and append an end mark to it. If the temporary array has 0 elements (not counting the end mark), return 0 as the length of the maximum interval ascending sequence; if it has 1 element, return 1. Otherwise, use an array LPOS in memory to hold the maximum interval ascending sequence and initialize its first element to the first value of the temporary array; LP indicates the position currently being processed in LPOS and is initialized to 1; take the second value of the temporary array as the comparison data;

(2) Check whether to finish

If the comparison data is the end mark, go to step (4);

(3) Handle two cases according to the comparison

If the comparison data is greater than LPOS[LP], increase LP by 1, store the comparison data in LPOS[LP], take the next value of the temporary array as the comparison data, and go to step (2);

If the comparison data is less than LPOS[LP], binary-search LPOS from its first position for the first element greater than the comparison data and overwrite that element with the comparison data; then take the next value of the temporary array as the comparison data and go to step (2);

(4) Length of the maximum interval ascending sequence = LP.

Array POS stores, during matching, the positions in the text at which the search term's characters are matched. Apart from the value -1, all entries of POS are positive integers and mutually distinct. Because -1 marks an unmatched character, it is excluded from the maximum interval ascending sequence.

The length of the maximum interval ascending sequence of POS means the following: among the entries of POS, find the longest subsequence whose values increase; the number of entries in that subsequence is its length.

The strict definition of the maximum interval ascending sequence and its length is as follows:

Definition: Let $a_1 a_2 \cdots a_n$ (with $a_i \ne a_j$ for $i \ne j$) be any sequence of mutually comparable elements. If the subsequence $a_{k_1} a_{k_2} \cdots a_{k_m}$ is a longest subsequence satisfying

1. $1 \le k_1 < k_2 < \cdots < k_m \le n$, and

2. $a_{k_1} < a_{k_2} < \cdots < a_{k_m}$,

then $a_{k_1} a_{k_2} \cdots a_{k_m}$ is called the maximum interval ascending sequence of $a_1 a_2 \cdots a_n$, and its number of elements $m$ is called its length.

For example, for the sequence 7, 8, 9, 1, 2, 6, 3, 4, 12:

the maximum interval ascending sequence is 1, 2, 3, 4, 12;

its length is 5.
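The LPOS procedure above can be sketched in Python as follows (the function name is hypothetical). A Python list grows as needed, so simply iterating to the end of the temporary array takes the place of the explicit end mark, and `bisect_right` from the standard `bisect` module performs the binary search for the first element greater than the comparison data.

```python
from bisect import bisect_right


def max_interval_ascending_len(pos):
    """Length of the maximum interval ascending sequence of POS (LPOS procedure)."""
    temp = [x for x in pos if x > 0]   # step (1): drop the -1 "unmatched" markers
    if len(temp) <= 1:                 # empty -> 0, single element -> 1
        return len(temp)
    lpos = [temp[0]]                   # LPOS starts with the first value (LP = 1)
    for data in temp[1:]:              # reaching the end replaces the end mark of step (2)
        if data > lpos[-1]:            # step (3), first case: extend, LP grows by 1
            lpos.append(data)
        else:                          # step (3), second case: overwrite first element > data
            lpos[bisect_right(lpos, data)] = data
    return len(lpos)                   # step (4): length = LP


print(max_interval_ascending_len([7, 8, 9, 1, 2, 6, 3, 4, 12]))  # 5
```

LPOS holds, for each length, the smallest possible last element of an ascending subsequence of that length, so only its length (LP) is guaranteed to be meaningful; its contents need not form an actual subsequence of the input, although for this example they do end as 1, 2, 3, 4, 12.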

The length of the maximum interval ascending sequence yields the crossing number of a text-term match. The crossing number works together with the dispersion number and the non-perfect number to compute the characteristic matching degree, which is then used to rank the retrieved texts so as to meet the user's retrieval needs.

The time complexity of this algorithm is O(m log₂ m).

In another embodiment, building on the step A and step B embodiments above, step C (calculating the characteristic matching degree from the characteristic parameters and the lengths of the text and the search term) comprises:

Step a) Calculate the number of relevant characters actually matched between the search term and the text:

relevant character number = 2 × (m - non-perfect number);

Step b) Calculate influence factor 1 of the characteristic parameters on the characteristic matching degree:

influence factor 1 = k₁ × crossing number;

Step c) Calculate influence factor 2 of the characteristic parameters on the characteristic matching degree:

influence factor 2 = q₁ × non-perfect number + q₂ × crossing number + q₃ × dispersion number;

Step d) Calculate the characteristic matching degree:

characteristic matching degree = (relevant character number - influence factor 1) ÷ (m + n + influence factor 2);

Here k₁, q₁, q₂ and q₃ are the weight coefficients of the characteristic parameters in the characteristic matching degree; k₁ is a real number with 0 ≤ k₁ ≤ 2, and q₁, q₂, q₃ are non-negative real numbers. Different values can be selected for different products and application scenarios, which changes the characteristic matching degree of each retrieved text and hence the ranking of the retrieved texts. In one specific application the weight coefficients are k₁ = 2/3, q₁ = 1, q₂ = 2/3, q₃ = 1/3.

The influence factors are introduced so that, for different products and application environments, the weight of each characteristic parameter in the characteristic matching degree can be balanced, making the ranking of retrieval results better suited to the users of a particular environment.

In this embodiment, the characteristic matching degree is a real number between 0 and 1 inclusive.
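A direct Python transcription of step C, with the default weights set to the values quoted above (k₁ = 2/3, q₁ = 1, q₂ = 2/3, q₃ = 1/3); the function name is hypothetical. As a worked check, text "what" (n = 4) and search term "whta" (m = 4) give non-perfect number 0, dispersion number 0 and crossing number 1, so the degree is (8 - 2/3) ÷ (8 + 2/3) ≈ 0.846, the value listed for "what" in Example 1.

```python
def characteristic_matching_degree(m, n, nonperfect, dispersion, crossing,
                                   k1=2/3, q1=1.0, q2=2/3, q3=1/3):
    """Step C: characteristic matching degree from the three parameters."""
    relevant = 2 * (m - nonperfect)                                # step a)
    factor1 = k1 * crossing                                        # step b)
    factor2 = q1 * nonperfect + q2 * crossing + q3 * dispersion    # step c)
    return (relevant - factor1) / (m + n + factor2)                # step d)


print(round(characteristic_matching_degree(4, 4, 0, 0, 1), 3))  # 0.846
```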

Example 1

The following are the retrieval results and characteristics of an electronic English dictionary retrieval example designed according to the methods of the step A, B and C embodiments above.

In this example, the electronic English dictionary library contains 6,000 common English words; the weight coefficients are k₁ = 2/3, q₁ = 1, q₂ = 2/3, q₃ = 1/3; results are sorted by characteristic matching degree in descending order, and only the top five words are output.

1. Dispersed retrieval

When entering an English word, any of its characters may be omitted.

For example, the target word is "wonderful".

The input is "wdfl".

Retrieval results: 1. wonderful 2. handful 3. unfold 4. wind 5. windy

Characteristic matching degrees: 0.546 0.487 0.444 0.375 0.343

2. Crossed retrieval

When entering an English word, its characters may be transposed (crossed) arbitrarily.

For example, the target word is "what".

The input is "whta".

Retrieval results: 1. what 2. wheat 3. watch 4. hat 5. white

Characteristic matching degrees: 0.846 0.733 0.625 0.615 0.581

3. Non-perfect characters allowed

When entering an English word, erroneous characters are allowed.

For example, the target word is "error".

The input is "irror".

Retrieval results: 1. mirror 2. error 3. terror 4. terrorist 5. territory

Characteristic matching degrees: 0.909 0.727 0.667 0.636 0.622

4. Combined example

When entering an English word, dispersion, crossing and incompleteness may occur together.

For example, the target word is "marvelous".

The input is "mvrilus".

Retrieval results: 1. marvelous 2. various 3. survival 4. minus 5. visual

Characteristic matching degrees: 0.607 0.600 0.536 0.522 0.520

In this input, a, e and o are omitted, there is one erroneous character (i), and there is one crossing (v, r).

5. Special example

For example, the target word is "marvelous".

The input is "mrxxxxxxlus".

Retrieval results: 1. marvelous 2. muscular 3. marxist 4. marxism 5. luxurious

Characteristic matching degrees: 0.366 0.317 0.312 0.312 0.302
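To check the method against the figures above, here is a self-contained Python sketch that chains steps A, B and C and ranks a word list. The five-word list is a hypothetical stand-in for the 6,000-word library, so the ranks only reflect those five words; `feature_match` and `search` are illustrative names. With the weights quoted above it reproduces the degrees listed for what, wheat, watch, hat and white in the crossed-retrieval example.

```python
from bisect import bisect_left


def feature_match(text, term, k1=2/3, q1=1.0, q2=2/3, q3=1/3):
    """Characteristic matching degree of one (text, term) pair, steps A to C."""
    n, m = len(text), len(term)
    # Step A: merge the stably sorted (character, 1-based position) pairs
    wt = sorted(((c, i) for i, c in enumerate(text, 1)), key=lambda e: e[0])
    pt = sorted(((c, j) for j, c in enumerate(term, 1)), key=lambda e: e[0])
    pos = [-1] * m
    w = p = nonperfect = 0
    max_pos, min_pos = 0, n
    while w < n and p < m:
        if wt[w][0] < pt[p][0]:
            w += 1                       # text character not in the term
        elif wt[w][0] > pt[p][0]:
            p += 1                       # term character not in the text
            nonperfect += 1
        else:
            pos[pt[p][1] - 1] = wt[w][1]
            max_pos = max(max_pos, wt[w][1])
            min_pos = min(min_pos, wt[w][1])
            w += 1
            p += 1
    # Step B: term characters left over when the loop stops are unmatched too
    nonperfect += m - p                  # equals "+ m - P + 1" with 1-based P = p + 1
    matched = m - nonperfect
    dispersion = max_pos - min_pos + 1 - matched
    tails = []                           # longest increasing subsequence of POS
    for x in pos:
        if x > 0:
            k = bisect_left(tails, x)
            if k == len(tails):
                tails.append(x)
            else:
                tails[k] = x
    crossing = matched - len(tails)
    # Step C: characteristic matching degree
    factor1 = k1 * crossing
    factor2 = q1 * nonperfect + q2 * crossing + q3 * dispersion
    return (2 * matched - factor1) / (m + n + factor2)


def search(term, words, top=5):
    """Rank words by descending characteristic matching degree (rounded to 3 places)."""
    scored = [(word, round(feature_match(word, term), 3)) for word in words]
    scored.sort(key=lambda e: e[1], reverse=True)
    return scored[:top]


print(search("whta", ["hat", "white", "watch", "what", "wheat"]))
# [('what', 0.846), ('wheat', 0.733), ('watch', 0.625), ('hat', 0.615), ('white', 0.581)]
```

The same function gives 0.909 for mirror and 0.727 for error against "irror", and 0.607 for marvelous against "mvrilus", matching the values listed above.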

In another embodiment of the character string matching method based on characteristic parameters of the present invention, each character of the text $T = t_1 t_2 \cdots t_n$ has a correspondingly stored cognitive weight, forming the cognitive weight sequence of the text $W = w_1 w_2 \cdots w_n$ with $w_1 + w_2 + \cdots + w_n = 1$; the cognitive weight $w_i$ ($1 \le i \le n$) represents the probability that the character is recognized in the text $t_1 t_2 \cdots t_n$;

Building on the step A and step B embodiments above, step C (calculating the characteristic matching degree from the characteristic parameters and the lengths of the text and the search term) comprises:

Step a) relevant character number = 2 × (m - non-perfect number);

Step b) influence factor 1 = k₁ × crossing number;

Step c) influence factor 2 = q₁ × non-perfect number + q₂ × crossing number + q₃ × dispersion number;

Step d) Calculate the sum of the cognitive weights of the matched characters in the text:

Using the positions of the matched text characters recorded in array POS, sum the cognitive weights of all matched characters;

Step e) characteristic matching degree = [(relevant character number - influence factor 1) ÷ (m + n + influence factor 2)] × cognitive weight sum.

Here k₁, q₁, q₂ and q₃ are the weight coefficients of the characteristic parameters in the characteristic matching degree; k₁ is a real number with 0 ≤ k₁ ≤ 2, and q₁, q₂, q₃ are non-negative real numbers. Different values can be selected for different products and application scenarios, which changes the characteristic matching degree of each retrieved text and hence the ranking of the retrieved texts.
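A sketch of steps d) and e) in Python, assuming the degree from the preceding steps has already been computed and that the text's cognitive weights are given as a list in which weight $w_i$ is stored at index i - 1. The function and argument names are hypothetical.

```python
def weighted_degree(base_degree, pos, weights):
    """Steps d) and e): scale a degree by the cognitive weights of matched characters.

    pos: step-A array of 1-based text positions (-1 marks an unmatched term character).
    weights: cognitive weights w_1..w_n of the text, stored at indices 0..n-1.
    """
    weight_sum = sum(weights[i - 1] for i in pos if i > 0)  # step d)
    return base_degree * weight_sum                          # step e)


what_weights = [2 / 5.9, 1.2 / 5.9, 1.1 / 5.9, 1.6 / 5.9]  # Example 2 weights of "what"
print(weighted_degree(1.0, [1, 2, 4], what_weights))        # 4.8 / 5.9, about 0.81
```

With these weights, a term that matches only w, h and t has its degree multiplied by 4.8/5.9, while a full match keeps the degree unchanged because the weights sum to 1.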

Adding cognitive weights in this way refines the calculation of the characteristic matching degree. It reflects people's actual cognitive differences and draws on ideas from several disciplines, including psychology, behavioural science, linguistics and statistics, which makes it especially suitable for dictionary retrieval. It increases the weight of easily recognized characters and reduces the weight of error-prone characters, so the ranking of retrieval results based on the resulting characteristic matching degree better meets users' needs.

The main factors that determine cognitive weights are:

1) whether the character is the first character of the word;

2) whether the character is the initial character of a syllable in the pronunciation;

3) whether the character is pronounced regularly within its syllable, or plays an obvious role in it;

4) whether the character is visually prominent in the word;

5) whether the character belongs to the word, etc.

Example 2

Based on the embodiment above of the character string matching method based on characteristic parameters with cognitive weights, this example adds cognitive weights to electronic English dictionary retrieval. Here the electronic English dictionary is an electronic dictionary library made up of English words and their cognitive weights.

The method of Example 2 is essentially the same as that of Example 1; the only difference is that each character of the text T has a stored cognitive weight, which adds the influence of cognitive weights to the calculation of the characteristic matching degree.

One way of calculating cognitive weights is as follows:

1. Whether the character is the first character of the word: score 0.4;

2. Whether the character is the initial character of a syllable in the pronunciation: score 0.3;

3. Whether the character is pronounced regularly within its syllable, or plays an obvious role in it: score 0.1;

4. Whether the character is visually prominent in the word: score 0.2;

5. The character belongs to the word: score 1.

For example, consider the English word "what":

character w satisfies 1, 2, 3, 4 and 5, so the score of w = 2;

character h satisfies 4 and 5, so the score of h = 1.2;

character a satisfies 3 and 5, so the score of a = 1.1;

character t satisfies 2, 3, 4 and 5, so the score of t = 1.6.

The total score of "what" is 5.9, and the cognitive weight of each character is:

cognitive weight of w = score of w / total score of "what" = 2/5.9;

cognitive weight of h = score of h / total score of "what" = 1.2/5.9;

cognitive weight of a = score of a / total score of "what" = 1.1/5.9;

cognitive weight of t = score of t / total score of "what" = 1.6/5.9.

The cognitive weight sequence of the English word "what" is therefore 2/5.9, 1.2/5.9, 1.1/5.9, 1.6/5.9.
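The scoring scheme above can be sketched in Python. Deciding which criteria a character satisfies is a linguistic judgement that the example makes by hand, so the sketch takes the satisfied criterion numbers as input rather than computing them; `SCORES` and `cognitive_weights` are hypothetical names.

```python
# Scores of the five criteria listed above (criterion number -> score).
SCORES = {1: 0.4, 2: 0.3, 3: 0.1, 4: 0.2, 5: 1.0}


def cognitive_weights(criteria_per_char):
    """Return (raw scores, total score, normalised weights) for one word.

    criteria_per_char: for each character, the set of criterion numbers it meets.
    """
    raw = [sum(SCORES[c] for c in crit) for crit in criteria_per_char]
    total = sum(raw)
    return raw, total, [r / total for r in raw]


# "what": w meets 1-5, h meets 4 and 5, a meets 3 and 5, t meets 2-5
raw, total, weights = cognitive_weights([{1, 2, 3, 4, 5}, {4, 5}, {3, 5}, {2, 3, 4, 5}])
print([round(r, 1) for r in raw], round(total, 1))  # [2.0, 1.2, 1.1, 1.6] 5.9
```

Dividing by the total guarantees that the weights of a word sum to 1, as the cognitive-weight embodiment requires.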

The embodiments above show that, compared with existing distance calculation based on error factors, the characteristic parameters used by the present invention to calculate the characteristic matching degree are independent concepts, and the calculated parameters reflect the differences between the text and the search term on each characteristic in finer detail. A characteristic matching degree calculated from the three characteristic parameters therefore reflects the matching state more reasonably, and the ranking of dictionary retrieval results better meets users' actual needs.

They also show that the character string matching method based on characteristic parameters of the present invention has strong fault-tolerant retrieval capability and is well suited to dictionary retrieval.

Although illustrative embodiments of the invention have been described above, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes within the spirit and scope of the invention as defined and determined by the appended claims are obvious, and all inventions and creations that make use of the inventive concept fall within the scope of protection.

Claims

1. A character string matching method based on characteristic parameters, wherein a text stored in a storage device and a search term entered through an input device are given, characterized in that an information processing device performs string matching based on characteristic parameters on the given text and search term, with the following steps:

Step D) Output the characteristic matching degree.

2. The character string matching method based on characteristic parameters according to claim 1, wherein the text stored in the storage device is $T = t_1 t_2 \cdots t_n$ and the search term entered through the input device is $P = p_1 p_2 \cdots p_m$, where $t_i$ and $p_j$ ($1 \le i \le n$, $1 \le j \le m$) are characters and m, n > 0, characterized in that step A (calculating the matching relationship between the characters of the text and the search term) comprises:

Step a) Stably sort the search term P

Step b) Stably sort the text T

Step c) Parameter initialization

Step d) Check whether the loop has finished

Step e) Compare

Step f) Finish

3. The character string matching method based on characteristic parameters according to claim 2, characterized in that step B (calculating the characteristic parameters from the matching relationship between the characters of the text and the search term) comprises:

Step a) non-perfect number = non-perfect number + m - P + 1;

4. The character string matching method based on characteristic parameters according to claim 3, characterized in that step C (calculating the characteristic matching degree from the characteristic parameters and the lengths of the text and the search term) comprises:

relevant character number = 2 × (m - non-perfect number);

influence factor 1 = k₁ × crossing number;

Step d) Calculate the characteristic matching degree

5. The character string matching method based on characteristic parameters according to claim 4, characterized in that the weight coefficients take the values k₁ = 2/3, q₁ = 1, q₂ = 2/3, q₃ = 1/3.

6. The character string matching method based on characteristic parameters according to claim 3, characterized in that the crossing number is calculated by the following steps:

(1) Find the length of the maximum interval ascending sequence;

7. The character string matching method based on characteristic parameters according to claim 6, characterized in that the length of the maximum interval ascending sequence is found by the following steps:

(1) Initialization

(2) Check whether to finish

If the comparison data is the end mark, go to step (4);

(3) Handle two cases according to the comparison

(4) Length of the maximum interval ascending sequence = LP.

8. The character string matching method based on characteristic parameters according to claim 3, characterized in that each character of the text $T = t_1 t_2 \cdots t_n$ has a correspondingly stored cognitive weight, forming the cognitive weight sequence of the text $W = w_1 w_2 \cdots w_n$ with $w_1 + w_2 + \cdots + w_n = 1$, where the cognitive weight $w_i$ ($1 \le i \le n$) represents the probability that the character is recognized in the text $t_1 t_2 \cdots t_n$;

step C (calculating the characteristic matching degree from the characteristic parameters and the lengths of the text and the search term) comprises:

relevant character number = 2 × (m - non-perfect number);

influence factor 1 = k₁ × crossing number;