Method for quick word identification
The present invention relates to a method for quick word identification, and in particular to a method for quickly identifying Spanish derived words. The method is applicable to programmed electronic devices (such as electronic dictionaries) and to software (such as word processors or computerized dictionaries). The invention can also be used for spelling correction.
In many types of electronic or computerized dictionaries, if the query word entered by the user cannot be found in the existing database, the user receives either a rejection such as "No such word found" or the words closest in spelling or pronunciation. However, apart from misspellings, the query word may sometimes be a derived or compound form of a basic word that is simply not included in the database. In that case, such a response is inappropriate and useless.
In European languages such as Spanish and French, each word usually has many derived forms; in Spanish in particular, a single verb can have more than 100 inflected forms. Spanish vocabulary can be classified into root words, compound words, derived words and combinations thereof. A root word is itself a basic word; a compound word consists of two or more words; a derived word is obtained by changing the infix or the suffix of a root word, or both. At present, common electronic dictionaries include only root words and some of their frequently used derived forms, which clearly cannot meet practical needs.
A simple solution to the above problem is to include all derived forms of every root word in the electronic dictionary. However, storing the entire Spanish vocabulary requires a large amount of storage space, which is uneconomical, and entering the data takes a great deal of time. Therefore, for electronic devices such as Spanish electronic dictionaries and software such as Spanish word processors, a method that needs less storage space and identifies, searches and corrects words more efficiently would be of great benefit and practical value.
In view of the shortcomings of conventional word identification devices and software, the present invention provides a fast word identification method.
The object of the present invention is to provide a fast search method for Spanish derived words.
Furthermore, the present invention provides a coding method, for computers or word processors, that supports fast searching of Spanish derived words.
The present invention also provides an efficient method for checking Spanish spelling and offering candidates close to the query word in spelling or pronunciation as corrections for a misspelled query word.
The disclosed method mainly comprises the following steps:
(1) collecting and classifying all the rules by which derived words are obtained from Spanish root words;
(2) encoding these rules with a coding method; and
(3) sorting the encoded rules to form a lookup table.
In addition, according to the present invention, the search procedure for each query word is as follows:
(1) first search for the query word in the root-word database; if it is found, output the stored data for the word and stop; otherwise
(2) look up the affixes of the word in the affix lookup table; if any root word is found, output the stored data for that root word and stop; otherwise
(3) correct the spelling by offering words close to the query word in spelling or pronunciation.
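The three-stage procedure above can be sketched as a simple cascade. This is only an illustrative sketch: the names `look_up`, `root_db` and `reduce_to_roots` are hypothetical, and Python's `difflib.get_close_matches` stands in for the spelling correction stage, which the text does not specify.

```python
import difflib

def look_up(query, root_db, reduce_to_roots):
    """Return (kind, words) following the three-stage search order."""
    # Stage 1: exact match in the root-word database.
    if query in root_db:
        return ("exact", [query])
    # Stage 2: reduce the query to candidate roots and keep those in the database.
    # reduce_to_roots is a hypothetical callable wrapping the affix lookup table.
    roots = [root for root in reduce_to_roots(query) if root in root_db]
    if roots:
        return ("derived", roots)
    # Stage 3: fall back to close spellings (difflib is a stand-in technique).
    return ("suggestions", difflib.get_close_matches(query, list(root_db), n=3))

# Toy data, assumed for illustration only.
root_db = {"querer": "to want", "hablar": "to speak"}
reduce_to_roots = lambda word: ["querer"] if word == "quiero" else []

print(look_up("hablar", root_db, reduce_to_roots))
print(look_up("quiero", root_db, reduce_to_roots))
print(look_up("hablr", root_db, reduce_to_roots))
```

Each stage runs only when the previous one fails, so an exact dictionary hit never pays the cost of affix reduction.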
Furthermore, according to the present invention, the search for each possible suffix-type derived word can be carried out by reversing the word. The letter order of the query word is first reversed, so that all possible suffixes can be obtained starting from the beginning of the reversed word, although these suffixes are in reverse order. The suffixes obtained are then compared with the known suffixes in the previously prepared lookup table. According to the maximum match principle, the matching suffix is retained, the corresponding root-word suffixes are found, and they are substituted for the suffix of the query word to obtain possible candidate root words. Finally, the main database is searched to see whether these candidate root words exist.
Fig. 1 is a diagram of the process, in the present invention, of forming the lookup table of rules that reduce Spanish derived words to root words.
Fig. 2 is a flowchart of a typical word search procedure of the Spanish electronic dictionary in the present invention.
Fig. 3 is a functional block diagram of the subroutine that identifies derived words in the present invention.
Fig. 4A is a flowchart of the subroutine that searches for the suffix reduction rule of the query word.
Fig. 4B is a flowchart of the subroutine that searches for the infix reduction rule of the query word.
Fig. 4C is a flowchart of the subroutine that confirms possible root words among the candidates.
Figs. 5A, 5B, 5C and 5D are screens captured from this product, the Oxford electronic dictionary, showing the process from input through search to output, with "quiero" as the example.
To solve the problem of identifying Spanish derived words, the derivation rules of all Spanish words are classified by part of speech. The following are some example change rules for the various parts of speech:
(1) Nouns:
--words ending in a vowel: plural +s
--words ending in a consonant: plural +es
--examples of irregular changes:
“z”->“ces”;
rubí->rubíes;
bisturí->bisturíes;
bambú->bambúes;
jersey->jerseys; etc.
(2) Adjectives:
Adjectives distinguish feminine and masculine gender. Therefore, for example, each adjective ending in o can take four suffix forms: +'o', +'a', +'os' and +'as'. Some adjectives ending in a consonant can also take the forms +'a', +'as' and +'es' in addition to the base form.
(3) Adverbs:
One class of Spanish adverbs is formed by putting an adjective into its feminine form and then adding 'mente', so the suffix change is 'o'->'amente'.
(4) Verbs:
This is the most complicated case. In Spanish, each verb can have more than 100 forms. Even when the forms rarely used in modern Spanish are excluded, nearly 60 derived forms remain. Many of these are irregular, with irregularities not only in the suffix but also within the word.
Table one shows some of the affix change rules collected for the present invention.
Table one
Type of change | Derived-word affix | Replacement affix |
1. Verb suffix change | a | o,ar,er,ir |
| aba | ar |
| abais | ar |
| abamos | ar |
| … | … |
2. Irregular verb suffix change | he | haber |
| has | haber |
| ha | haber |
| hemos | haber |
| … | … |
3. Irregular plural change | ces | z |
| … | … |
4. Adjective-to-adverb change | amente | o |
| … | … |
5. In-word verb change | ar | ac |
| br | b |
| c | z,qu |
| dr | d,n |
| habéis | haber |
Spanish has about 2,400 such change rules, including suffix rules (changes of the suffix), in-word rules (changes of the infix), and rules in which the suffix and the interior of the word change together. A lookup table containing all the reduction rules can therefore be formed, so that for the infix (or suffix) of a particular derived word, all possible infixes (or suffixes) of the original word can be found (step 2 in Fig. 1). In other words, each derived-word infix (or suffix) is associated with one or more root-word infixes (or suffixes). Specifically, the letter order of each suffix in the lookup table is reversed, and the reversed suffixes are arranged in Spanish alphabetical order (step 4). The infixes are likewise sorted alphabetically (step 6). This greatly speeds up the subsequent search. In a preferred embodiment of the present invention, the rules are also encoded by way of the alphabetically sorted derived-word suffixes (step 8), rather than simply being accumulated into one large table. The resulting lookup table thus comprises three parts: an alphabetical index table for the suffix reduction table (Table two, which lists part of the index), the suffix reduction rule table (Table three, which lists some of the suffix reduction rules), and the infix reduction rule table (Table four, which lists some of the infix reduction rules).
Table two
Spanish letter | Lower bound | Upper bound |
…… | … | … |
I | 37 | 37 |
J | -- | -- |
K | -- | -- |
L | 38 | 38 |
M | -- | -- |
N | 39 | 73 |
O | 74 | 96 |
P | -- | -- |
Q | -- | -- |
R | 97 | 98 |
S | 99 | 193 |
T | -- | -- |
U | -- | -- |
…… | … | … |
Table three
Reversed derived-word suffix | Replacement root-word suffix |
…… | …… |
nard | er,ir |
o | ar,er,ir,r |
oda | ar |
odi | er,ir |
odna | ar |
odne | er,ir |
odnei | er,eír,ir |
odney | er,ir |
og | cer,er,ir,cir,r |
ogi | er |
ohce | acer |
oj | er,cir |
oserpmi | imprimir |
otircse | escribir |
otirf | freír |
otleuv | volver |
otor | romper |
otreiba | abrir |
otreum | morir |
otseup | poner |
otsiv | ver |
otsivorp | proveer |
ovu | ar |
oy | ir |
raey | er,ir |
…… | …… |
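The relationship between Table two and Table three can be illustrated with a short sketch that derives the letter index from the sorted, reversed suffixes. The function name `build_letter_index` is hypothetical, plain code-point ordering is assumed instead of full Spanish collation, and the entry number 73 for 'nard' is inferred from the bounds in Table two rather than stated in the text.

```python
def build_letter_index(sorted_keys, first_entry=1):
    """Map each initial letter to the (lower, upper) entry numbers it spans."""
    index = {}
    for number, key in enumerate(sorted_keys, start=first_entry):
        letter = key[0].upper()
        # Keep the first entry number seen for this letter as the lower bound.
        lower, _ = index.get(letter, (number, number))
        index[letter] = (lower, number)
    return index

# Reversed suffixes copied from the excerpt of Table three, already sorted.
EXCERPT = [
    "nard",
    "o", "oda", "odi", "odna", "odne", "odnei", "odney", "og", "ogi",
    "ohce", "oj", "oserpmi", "otircse", "otirf", "otleuv", "otor",
    "otreiba", "otreum", "otseup", "otsiv", "otsivorp", "ovu", "oy",
    "raey",
]

# Assumption: 'nard' is entry 73, the upper bound of letter N in Table two.
index = build_letter_index(EXCERPT, first_entry=73)
print(index["O"])
```

Under that assumption the 'O' section comes out as entries 74 to 96, which agrees with Table two. The bounds for 'N' and 'R' differ from Table two only because the excerpt shows a single entry for each of those letters.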
Table four
Derived-word infix | Replacement root-word infix |
…… | …… |
i | e |
ic | ac |
ie | e,i |
is | er |
iz | ac |
j | g |
o | u |
qu | c |
ub | ab |
ue | o |
ue | u |
up | ab |
us | on |
uv | en |
ye | e |
z | c |
zc | c |
zg | c |
According to these tables, for a word that cannot be found directly in the main database of a Spanish dictionary or electronic dictionary, the infix (or suffix) of the derived word can be replaced with every possible infix (or suffix) given by the reduction rules. This constructs all candidate root words of the word, and each candidate is then looked up in the basic root-word database.
The present invention can likewise be applied to Spanish electronic dictionaries, Spanish word processors and the like. For concreteness, however, this specification uses a Spanish electronic dictionary as the example in describing the invention.
Fig. 2 is a flowchart of a typical word search procedure of the Spanish electronic dictionary in the present invention. First, the user is asked to enter a query word (step 10).
After receiving the query word, the electronic dictionary searches its main database for a word with identical spelling (step 12); this database usually contains root words and their commonly used derived forms. If the database contains the query word, the electronic dictionary outputs the data for that word directly, then ends the search and waits for the user's next instruction (step 22).
If the query word is not in the database, the derived-word recognition subroutine is invoked (step 14), as shown in Fig. 3. First, the affix lookup table is loaded (step 26). The suffix search and the infix search are then executed in order (step 28 and step 30), and their results are finally compared with the main database (step 32).
Figs. 4A and 4B are the search flowcharts for suffix-type and infix-type derived words, respectively. According to a preferred embodiment of the present invention, the letter order of the query word is reversed before the suffix search (step 36), because this makes the suffix easy to extract. The first letter of the reversed word defines the search section in the lookup table (step 38). A search method is then used to find, within this section, the suffixes identical to the first n letters of the reversed word, where n is a natural number increasing in sequence from 1 (step 40 and step 44). As soon as no identical suffix can be found in the table, the search stops (step 42). If any suffix was found, a set of derived-word suffix reduction rules is obtained (step 46); otherwise, the query word is regarded as having no suffix change (step 52).
In this and the subsequent procedure, the maximum match principle is used to determine the possible affix. The principle states: if, during the search for a given class of affix (suffix or infix), several possible derived-word affixes of the query word are found in the lookup table, only the one with the most letters is retained to supply the reduction rule.
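The maximum match principle amounts to keeping the longest of the affixes that matched. A minimal sketch, with the hypothetical helper name `maximum_match`:

```python
def maximum_match(matched_affixes):
    """Return the matched affix with the most letters, or None if none matched."""
    if not matched_affixes:
        return None
    # max() with key=len keeps the first affix among those of equal length.
    return max(matched_affixes, key=len)

print(maximum_match(["i", "ie"]))
print(maximum_match([]))
```

The text does not say how ties between equally long affixes are resolved; returning the first one is an assumption of this sketch.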
Therefore, after the suffix search described above, at most one change rule remains. This rule is used to replace the suffix of the query word with each of the associated root-word suffixes, forming a group of possible root words (step 50).
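The suffix search and replacement can be sketched as follows. This is an interpretation rather than the patented implementation: the search is assumed to continue while some table entry in the section still begins with the first n letters of the reversed word, so that long entries such as 'oserpmi' remain reachable. The rule excerpt comes from Table three; the function names are hypothetical, and a simple filter stands in for the index-table lookup.

```python
# Excerpt of Table three: reversed derived-word suffix -> root-word suffixes.
SUFFIX_RULES = {
    "o": ["ar", "er", "ir", "r"],
    "oda": ["ar"],
    "odi": ["er", "ir"],
    "oserpmi": ["imprimir"],
    "otsiv": ["ver"],
}

def find_suffix(word, rules):
    """Return the longest reversed suffix of `word` that has a rule, or None."""
    if not word:
        return None
    reversed_word = word[::-1]
    # Stand-in for the index table: keep only keys with the same first letter.
    section = [key for key in rules if key[0] == reversed_word[0]]
    best = None
    for n in range(1, len(reversed_word) + 1):
        prefix = reversed_word[:n]
        section = [key for key in section if key.startswith(prefix)]
        if not section:
            break  # nothing in the section begins with these n letters
        if prefix in section:
            best = prefix  # longer exact matches overwrite shorter ones
    return best

def suffix_candidates(word, rules):
    """Replace the matched suffix with each associated root-word suffix."""
    key = find_suffix(word, rules)
    if key is None:
        return []
    stem = word[: len(word) - len(key)]
    return [stem + root_suffix for root_suffix in rules[key]]

print(suffix_candidates("quiero", SUFFIX_RULES))
print(suffix_candidates("hablado", SUFFIX_RULES))
```

Because later exact matches overwrite earlier ones, the function applies the maximum match principle without a separate pass.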
Then, according to Fig. 4 B, take first letter and back affixe (step 54) away from query word.Once more, affixe letter in these is carried out the search (step 56 and step 60) of middle affixe reduction table.So, if any, will obtain another group rule change (step 62).Use maximum match principle once more, obtain also meta-rule (step 64), the middle affixe (step 66) in the candidate root speech that had before found to replace.Otherwise query word just is regarded as not having middle affixe to change (step 68).
Finally, the results of the two searches (suffix and infix) are combined into a new group of possible root words for comparison with the main database. As shown in Fig. 4C, each candidate in the new group is taken in turn and searched for in the main database until all have been checked (steps 70, 72, 74 and 76). If any one of them is recognized, that root word and its related data are output; if several are found, all of these candidates are output for the user to choose from (step 80).
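The confirmation step can be sketched as a filter over the candidates followed by a three-way decision. The names `confirm_roots` and `report`, and the result labels, are hypothetical.

```python
def confirm_roots(candidates, main_db):
    """Keep the candidates that exist in the main database, in order, without repeats."""
    confirmed = []
    for candidate in candidates:
        if candidate in main_db and candidate not in confirmed:
            confirmed.append(candidate)
    return confirmed

def report(query, candidates, main_db):
    """Decide what to output once all candidates have been checked."""
    roots = confirm_roots(candidates, main_db)
    if not roots:
        return ("spelling_correction", [query])  # hand over to spelling correction
    if len(roots) == 1:
        return ("root", roots)
    return ("choose", roots)  # several roots: let the user pick

main_db = {"querer", "ver", "venir"}  # toy database, assumed for illustration
print(report("quiero", ["quierer", "querer", "quirer"], main_db))
```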
Otherwise, that is, when no reduction rule for a suffix or infix can be found or no possible root word can be formed, the query word is passed to the spelling correction subroutine (step 82). The words closest in spelling or pronunciation are output for the user to choose from (step 20).
The whole procedure ends once the data related to the query word has been output, whether for a possible root word or for a word obtained by spelling correction (step 22).
The following explains how the present invention operates, taking 'quiero' as an example.
Suppose the query word 'quiero' is not included in the main database of the dictionary. The word is then provisionally regarded as a possible derived-word candidate, and the following steps are carried out.
First, the suffix search (Fig. 4A) is performed on the query word. According to the present invention, 'quiero' is reversed to 'oreiuq' for the suffix search (step 36), so the possible "suffixes" are now 'o', 'or', 'ore' and so on, which are the reversals of the original suffixes. Because the first letter is 'o', according to a preferred embodiment of the present invention the 'o-' section, spanning entries 74 to 96 of the reduction rules in the lookup table, is selected (step 38) so that identical "suffixes" can be retrieved quickly.
The first letter 'o' of the reversed word is picked out and compared with the selected section of the lookup table (step 40), which yields the reduction rule { 'o' -> 'ar', 'er', 'ir', 'r' }. The next letter 'r' is then appended to 'o' to form 'or' for further comparison (step 44). However, 'or' has no reduction rule, so the suffix search for this derived word stops (step 42).
Thus, in this example, 'o' is the only possible "suffix" according to the maximum match principle, which means that 'o' is the possible suffix of the candidate derived word. Following its reduction rule, the procedure replaces the suffix 'o' of the original query word with 'ar', 'er', 'ir' and 'r'. This forms the first group of possible root words {quierar, quierer, quierir, quierr} for further comparison with the main database (step 50).
The next procedure is the infix search (Fig. 4B). First, the first letter of the query word and the maximally matched suffix 'o' are removed, leaving the infix 'uier'. The search first tries to match 'u' (step 56), but 'u' has no reduction rule. Next, 'i' is compared (step 58) and the reduction rule { 'i' -> 'e' } is found (step 56). The search continues until no further reduction rule can be found (step 60). After all comparisons, two reduction rules have been found: { 'i' -> 'e' } and { 'ie' -> 'i', 'e' } (step 62). According to the maximum match principle, 'ie' is the maximally matched infix, so only the rule { 'ie' -> 'i', 'e' } is retained for the subsequent replacement (step 64). Finally, 'ie' in the words of the first group is replaced with 'i' and 'e' to form the second group of possible root words (step 66).
The complete set of possible root words is now the union of the first and second groups, namely {quierar, quierer, quierir, quierr, querar, querer, querir, querr, quirar, quirer, quirir, quirr}.
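The twelve candidates can be reproduced with a few lines of string manipulation. This sketch hard-codes the two rules retained in the example and only illustrates the arithmetic of the union; it does not perform the table search itself.

```python
stem = "quier"                          # 'quiero' with the suffix 'o' removed
root_suffixes = ["ar", "er", "ir", "r"]  # rule retained for the suffix 'o'
root_infixes = ["e", "i"]                # rule retained for the infix 'ie'

# First group: suffix replacement only.
first_group = [stem + suffix for suffix in root_suffixes]

# Second group: replace the first 'ie' in each word of the first group.
second_group = [
    word.replace("ie", infix, 1)
    for infix in root_infixes
    for word in first_group
]

candidates = first_group + second_group
print(len(candidates))
print(candidates)
```

Four suffix replacements combined with the unchanged infix plus two infix replacements give 4 x 3 = 12 candidates, of which only 'querer' is a real Spanish root word.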
The words in this new set are picked out one by one (step 70 in Fig. 4C) and searched for in the main database (step 72) until all candidates have been checked (step 74). In this example, only 'querer' is found to be a valid root word of 'quiero' (step 78), so it is output to the user (step 80).
Fig. 5A shows the image that appears on the screen when the user enters the query word 'quiero' into this product, the Oxford Spanish electronic dictionary. At the same time, the electronic dictionary lists the words closest in spelling. Fig. 5B shows the screen when input is complete. Fig. 5C shows the transient screen displayed while the electronic dictionary is searching. Fig. 5D shows the output of the search.
With the present invention, a great deal of memory in an electronic dictionary can be saved. For example, the Oxford Spanish electronic dictionary stores only 18,361 words, occupying 161 KB, yet can recognize 500,000 words. By contrast, storing all 500,000 words would require 4 MB of ROM, so the saving is nearly 25-fold.
Although a full description has been given above for only one specific embodiment, the scope of the present invention should not be limited thereto. Because the essence of the invention lies in building the affix reduction rule lookup table and the search method that goes with it, various modifications can be made to the lookup table, and other search methods can also be used. The scope of the present invention should be defined by the appended claims.