CN100440207C - Chinese dictionary search engine and method for quick positioning words in Chinese dictionary - Google Patents

Info

Publication number
CN100440207C
Authority
CN
China
Prior art keywords
chinese
dictionary
chinese words
gbk
words
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB200410104045XA
Other languages
Chinese (zh)
Other versions
CN1632798A (en)
Inventor
谭帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vimicro Corp
Original Assignee
Vimicro Corp
Application filed by Vimicro Corp
Priority to CNB200410104045XA
Publication of CN1632798A
Application granted
Publication of CN100440207C

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a method for quickly locating a Chinese character in a dictionary: an index table is formed from the in-dictionary offsets of the Chinese characters in a Chinese dictionary relative to the dictionary's start position; the in-dictionary offset of an input Chinese character is obtained from the index table according to the character's GB2312/GBK code; and the character is located directly from the in-dictionary offset and the start position of the dictionary. The invention also discloses a Chinese dictionary search engine. In this technical scheme an input Chinese character is located in the Chinese dictionary by means of the ordering of the GB2312/GBK codes of Chinese characters, which makes location fast and light on resources.

Description

Chinese dictionary search engine and method for quickly locating characters in a Chinese dictionary
Technical field
The invention relates to a Chinese dictionary search engine and to a method for quickly locating characters in a Chinese dictionary.
Background art
Common techniques for searching text include traversal and binary search. Searches of Chinese text generally use traversal or similar methods, that is, the entries are compared one by one until a match is found.
The time such a search takes is highly unpredictable: a simple algorithm needs considerable time to produce a result, while a more sophisticated one places heavy demands on processing efficiency and memory. These methods are therefore unsuitable for searching a medium-sized body of Chinese text such as a Chinese dictionary. In short, locating characters in a Chinese dictionary with existing search methods is slow and hence inefficient.
Summary of the invention
The invention provides a Chinese dictionary search engine and a method for quickly locating characters in a Chinese dictionary, in order to solve the inefficiency of the prior art when locating characters in a Chinese dictionary.
To solve this problem, the invention provides the following technical schemes:
A method for quickly locating a character in a Chinese dictionary, wherein the Chinese characters in the dictionary are GB2312/GBK encoded. The method comprises the following steps:
A. establishing a correspondence between the GB2312/GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining the position of the character's code in the index table from said start position and said in-table offset, and reading the in-dictionary offset;
E. judging whether the GB2312/GBK code of the input character is empty; if so, ending the location process and indicating that the input character is not in the dictionary; otherwise performing step F;
F. locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
A method for quickly locating a character in a Chinese dictionary, wherein the Chinese characters in the dictionary are GB2312/GBK encoded. The method comprises the following steps:
A. storing the in-dictionary offsets of the Chinese characters in the dictionary, relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters, to form an index table;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the position in the index table of the in-dictionary offset of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining the position in the index table of the in-dictionary offset of the input character from the position of the first character's offset and said in-table offset, and reading the in-dictionary offset;
E. judging whether the in-dictionary offset corresponding to the input character is empty; if so, ending the location process and indicating that the input character is not in the dictionary; otherwise reading the in-dictionary offset from that position in the index table and locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
A Chinese dictionary search engine for quickly locating an input Chinese character in a Chinese dictionary, wherein the Chinese characters in the dictionary are GB2312/GBK encoded. The engine comprises:
a first module, for establishing and storing the correspondence between the GB2312/GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary;
a second module, for receiving the input Chinese character and using its GB2312/GBK code to obtain the corresponding in-dictionary offset from the first module;
a third module, for judging whether the GB2312/GBK code of the input character is empty; if so, ending the location process and indicating that the input character is not in the dictionary; otherwise locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it;
wherein the second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character, wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining the position of the character's code in the index table from the results of the first unit and the second unit, and reading the in-dictionary offset.
A Chinese dictionary search engine for quickly locating an input Chinese character in a Chinese dictionary, wherein the Chinese characters in the dictionary are GB2312/GBK encoded. The engine comprises:
a first module, for storing the in-dictionary offsets of the Chinese characters in the dictionary, relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters;
a second module, for receiving the input Chinese character and calculating the position in the first module of the in-dictionary offset corresponding to the character, from the GB2312/GBK table number given by the first byte of the character's code and the row number and column number in the GB2312/GBK table given by the second byte;
a third module, for judging whether the in-dictionary offset corresponding to the character is empty; if so, ending the location process and indicating that the input character is not in the dictionary; otherwise reading the in-dictionary offset from that position in the index table, locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it;
wherein the second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character, wherein the index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining the position of the character's code in the index table from the results of the first unit and the second unit, and reading the in-dictionary offset.
The invention stores the in-dictionary offsets of the Chinese characters in a Chinese dictionary, relative to the dictionary's start position, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters, to form an index table. The in-dictionary offset of an input character is looked up by its GB2312/GBK code, and the character is then located from the in-dictionary offset and the start position of the dictionary. As a result, the search time for each character is essentially fixed and controllable. Locating a character is fast and efficient, and the location process uses few resources, saving memory and storage space.
Description of drawings
Fig. 1 and Fig. 2 are schematic diagrams of the two index tables of the invention;
Fig. 3 and Fig. 4 are flowcharts of character location using the index tables shown in Fig. 1 and Fig. 2 respectively;
Fig. 5 is a structural block diagram of the Chinese dictionary search engine of the invention.
Embodiment
The invention locates an input Chinese character in a Chinese dictionary by relying on the ordering of the GB2312 or GBK codes of Chinese characters. The core idea is as follows: the in-dictionary offsets of the characters in the dictionary, relative to the dictionary's start position, form an index table; the in-dictionary offset of an input character is obtained from the index table according to the character's GB2312/GBK code; and the character is then located directly from the in-dictionary offset and the start position of the dictionary.
GBK and GB2312 are both nationally defined Chinese character code tables in which a character is described by two bytes. GBK is an extension of GB2312. This embodiment is described mainly using GBK as the example.
The basic layout of a GBK table is as follows:
[Table CC of the GBK code table: column labels 0 to F run across the top and row labels 4 to F run down the side, with one Chinese character per cell. The characters themselves did not survive translation and are not reproduced here; the original also embeds images C20041010404500092 to C20041010404500094 for individual characters in rows 8 and B.]
The GBK code of a Chinese character is two bytes. The first byte is the number of the table, in the range 81~FE; the high four bits of the second byte are the row label, in the range 4~F, and the low four bits of the second byte are the column label, in the range 0~F. For example, the code of the character "Tan" in the table above is CCB7.
GBK therefore has FE-81+1 = 254-129+1 = 126 tables in total; each table has F-4+1 = 15-4+1 = 12 rows and F-0+1 = 15-0+1 = 16 columns. Each table thus contains 16 (columns) * 12 (rows) = 192 characters, for a total of 126 (tables) * 192 (characters) = 24192 characters.
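As a minimal sketch of the decomposition just described, the following Python splits a two-byte code into table number, row label and column label, and reproduces the counts above. The names `split_code`, `TABLES` and so on are illustrative and do not come from the patent; the ranges are the ones the text states.

```python
# Ranges as stated in the text (hypothetical constant names).
TABLE_FIRST, TABLE_LAST = 0x81, 0xFE   # first byte: table number
ROW_FIRST, ROW_LAST = 0x4, 0xF         # high four bits of the second byte
COLUMNS = 16                           # low four bits of the second byte: 0..F

TABLES = TABLE_LAST - TABLE_FIRST + 1          # 126
ROWS = ROW_LAST - ROW_FIRST + 1                # 12
CHARS_PER_TABLE = ROWS * COLUMNS               # 192
TOTAL_CHARS = TABLES * CHARS_PER_TABLE         # 24192


def split_code(code: bytes) -> tuple[int, int, int]:
    """Return (table number, row label, column label) of a two-byte code."""
    if len(code) != 2:
        raise ValueError("expected a two-byte code")
    table, second = code
    row, column = second >> 4, second & 0x0F
    if not TABLE_FIRST <= table <= TABLE_LAST:
        raise ValueError("first byte outside 81~FE")
    if not ROW_FIRST <= row <= ROW_LAST:
        raise ValueError("row label outside 4~F")
    return table, row, column


# The example code CCB7: table CC, row B, column 7.
table, row, column = split_code(bytes.fromhex("CCB7"))
```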
The characters in the dictionary may be arranged in any order, provided that the words beginning with a character immediately follow that character, so that once the character has been located the words beginning with it can be looked up quickly. For example:
The people
The people
Personnel
Ah
Auntie
The younger sister
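The layout rule in the example can be sketched as follows. This is only an illustration under several assumptions that the patent does not state: entries are GBK-encoded and terminated by a newline byte, the function name `build_dictionary` is hypothetical, and the sample characters are a guess at what the translated example list stood for.

```python
def build_dictionary(groups):
    """Lay out head characters, each immediately followed by its words.

    groups: list of (head_character, [words beginning with it]), in any order.
    Returns the dictionary bytes and a map from each head character's
    two-byte code to its offset from the start of the dictionary.
    """
    blob = bytearray()
    offsets = {}
    for head, words in groups:
        # The in-dictionary offset of a character is where its group begins.
        offsets[head.encode("gbk")] = len(blob)
        for entry in [head, *words]:
            blob += entry.encode("gbk") + b"\n"   # assumed entry terminator
    return bytes(blob), offsets


# Assumed sample data; two head characters, each followed by two words.
dictionary, offsets = build_dictionary([
    ("人", ["人民", "人员"]),
    ("阿", ["阿姨", "阿妹"]),
])
```

The offsets recorded here are the values that an index table would later store for each character.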
In this embodiment the index table can be built in two ways (without being limited to these two):
The first way uses the correspondence between the GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary, with the GBK code of the character serving as the index.
As shown in Fig. 1, the GBK codes of the Chinese characters are arranged in ascending order in the index table, and the offset of each character in the dictionary relative to the dictionary's start position occupies 4 bytes. As the figure shows, the first GBK table, i.e. the table numbered "81", is at the very front of the index table, and the characters of that table are arranged row by row. For example, the index entry at the start of the table is the code "8140", which represents the first character of GBK table 81; the value corresponding to this code is that character's offset in the dictionary relative to the dictionary's start position. The code following the start position is "8141", which represents the second character of GBK table 81, and so on.
To obtain the in-dictionary offset of an input Chinese character, the corresponding offset is looked up in the index table by the character's GBK code.
The second way builds the index table by storing the in-dictionary offsets of the Chinese characters in the dictionary, relative to the dictionary's start position, in the order of the GBK code tables and of the GBK codes of the corresponding characters.
As shown in Fig. 2, the in-dictionary offsets in the index table are arranged in the same order as the ascending GBK codes of the characters. As the figure shows, the first 4 consecutive bytes of the index table hold the in-dictionary offset of the first character (GBK code "8140") of the first GBK table, i.e. the table numbered "81"; the next 4 consecutive bytes hold the offset of the second character, and so on.
While building the index table, if the character represented by a GBK code is not in the dictionary, the code at the corresponding position is set to empty in the index table of Fig. 1, and the in-dictionary offset at the corresponding position is set to empty in the index table of Fig. 2. Setting to empty means writing a specific marker that differs from every other code or in-dictionary offset.
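A minimal sketch of building the offset-only index table of Fig. 2 is given below. The text fixes the 4-byte slot size and the ordering; the empty marker `0xFFFFFFFF`, the little-endian byte order and the names `build_offset_index` and `slot_position` are assumptions made for illustration.

```python
import struct

SLOT = 4                  # bytes per in-dictionary offset (from the text)
EMPTY = 0xFFFFFFFF        # assumed "empty" marker, distinct from real offsets
TABLES, CHARS_PER_TABLE = 126, 192


def slot_position(code: bytes) -> int:
    """Byte position in the index table of the slot for a two-byte code."""
    return ((code[0] - 0x81) * CHARS_PER_TABLE + (code[1] - 0x40)) * SLOT


def build_offset_index(offsets: dict[bytes, int]) -> bytes:
    """Build the Fig. 2 style table: one slot per code, in ascending code order."""
    # Start with every slot empty, then fill in the characters that exist.
    index = bytearray(struct.pack("<I", EMPTY) * (TABLES * CHARS_PER_TABLE))
    for code, offset in offsets.items():
        struct.pack_into("<I", index, slot_position(code), offset)
    return bytes(index)


# Hypothetical input: the character CCB7 starts 13 bytes into the dictionary.
index = build_offset_index({bytes.fromhex("CCB7"): 13})
```

With these sizes the whole table occupies 126 * 192 * 4 = 96768 bytes regardless of how many characters the dictionary actually contains, which is what makes the lookup time fixed.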
Accordingly, the character location procedure differs with the index table used. Locating a character with the index table of Fig. 1 proceeds as shown in Fig. 3:
Step 1: establish the correspondence between the GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary, as shown in Fig. 1.
Step 2: from the GBK table number given by the first byte of the input Chinese character and the start position of the index table, determine the start position in the index table of the code of the first character of the GBK table containing the input character.
Step 3: from the row number and column number in the GBK table given by the second byte of the input character, calculate the in-table offset of the character relative to the first character of that table.
Step 4: determine the position of the character's code in the index table from said start position and said in-table offset, and read the in-dictionary offset.
Step 5: locate the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
So that a missing character can be reported promptly, step 4 checks whether the corresponding GBK code is empty before reading the in-dictionary offset; if it is, the location process ends and the input character is reported as not being in the dictionary; otherwise the process continues with step 5.
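Steps 1 to 5 can be sketched as follows for the variant of the Fig. 1 table in which each 2-byte code is stored contiguously with its 4-byte offset. The empty marker `00 00`, the little-endian offsets, the newline entry terminator and the function names are assumptions for illustration, and the code assumes the input lies within the ranges 81~FE and 40~FF.

```python
import struct

RECORD = 2 + 4               # 2-byte GBK code followed by a 4-byte offset
EMPTY_CODE = b"\x00\x00"     # assumed "empty" marker; not a valid GBK code


def record_position(code: bytes) -> int:
    table_start = (code[0] - 0x81) * 192 * RECORD   # step 2
    in_table = (code[1] - 0x40) * RECORD            # step 3
    return table_start + in_table                   # first half of step 4


def build_index(offsets: dict[bytes, int]) -> bytes:
    """Step 1: store (code, offset) records in ascending code order."""
    index = bytearray((EMPTY_CODE + bytes(4)) * (126 * 192))
    for code, offset in offsets.items():
        pos = record_position(code)
        index[pos:pos + RECORD] = code + struct.pack("<I", offset)
    return bytes(index)


def locate(index: bytes, dictionary: bytes, code: bytes):
    """Return the dictionary entry for the character, or None if absent."""
    pos = record_position(code)
    if index[pos:pos + 2] == EMPTY_CODE:            # empty check in step 4
        return None
    (offset,) = struct.unpack_from("<I", index, pos + 2)
    end = dictionary.index(b"\n", offset)           # step 5: jump straight there
    return dictionary[offset:end]


# Hypothetical two-entry dictionary: codes C8CB and CCB7, newline-terminated.
dictionary = b"\xc8\xcb\n\xcc\xb7\n"
index = build_index({b"\xc8\xcb": 0, b"\xcc\xb7": 3})
found = locate(index, dictionary, b"\xcc\xb7")
missing = locate(index, dictionary, b"\x81\x40")
```

No comparison against other entries takes place: the position is computed arithmetically, so every lookup costs the same.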
Locating a character with the index table of Fig. 2 proceeds as shown in Fig. 4:
Step 11: store the in-dictionary offsets of the Chinese characters in the dictionary, relative to the start position of the dictionary, in the order of the GBK code tables and of the GBK codes of the corresponding characters, to form the index table;
Step 12: from the GBK table number given by the first byte of the input Chinese character and the start position of the index table, determine the start position in the index table of the in-dictionary offset of the first character of the GBK table containing the input character;
Step 13: from the row number and column number in the GBK table given by the second byte of the input character, calculate the in-table offset of the character relative to the first character of that table;
Step 14: determine the position in the index table of the in-dictionary offset of the input character from the position of the first character's offset and said in-table offset, and read the in-dictionary offset;
Step 15: locate the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
So that a missing character can be reported promptly, step 14 also checks whether the offset is empty before using it; if it is, the location process ends and the input character is reported as not being in the dictionary; otherwise the process continues with step 15.
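Steps 12 to 15 can be sketched as a single function over an already built Fig. 2 table. The empty marker `0xFFFFFFFF`, the little-endian byte order and the name `locate_by_offset` are assumptions for illustration; the function returns a view of the dictionary beginning at the located character.

```python
import struct

SLOT = 4                 # bytes per in-dictionary offset (from the text)
EMPTY = 0xFFFFFFFF       # assumed "empty" marker


def locate_by_offset(index, dictionary: bytes, code: bytes):
    """Return a view of the dictionary starting at the character, or None."""
    table_start = (code[0] - 0x81) * 192 * SLOT      # step 12
    in_table = (code[1] - 0x40) * SLOT               # step 13
    (offset,) = struct.unpack_from("<I", index, table_start + in_table)  # step 14
    if offset == EMPTY:                              # character not in dictionary
        return None
    return memoryview(dictionary)[offset:]           # step 15


# Hypothetical data: an all-empty table with one slot filled for code CCB7.
dictionary = b"\xc8\xcb\n\xcc\xb7\n"
index = bytearray(b"\xff" * (126 * 192 * SLOT))
struct.pack_into("<I", index, ((0xCC - 0x81) * 192 + (0xB7 - 0x40)) * SLOT, 3)

view = locate_by_offset(index, dictionary, b"\xcc\xb7")
absent = locate_by_offset(index, dictionary, b"\x81\x40")
```

Compared with the Fig. 1 layout, this table stores no codes at all, so it is smaller, at the cost of needing an offset value reserved as the empty marker.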
As can be seen from the above, the two procedures differ in the index table and in how the offset is obtained from it. With the index table of Fig. 1, the position of the code in the index table is located from the GBK code of the input character; with the index table of Fig. 2, the position of the corresponding offset in the index table is located from the GBK code of the input character. The calculation is illustrated below for an in-dictionary offset that occupies 4 bytes.
(1) Calculate the position number of the GBK table containing the character from the first byte of the GBK code of the input character:
7E (total number of tables) - (FE - first byte) - 1, where the tables start at 81 and end at FE, and the starting table 81 has position number 0.
(2) Start position in the index table of the GBK table containing the input character:
position number * 192 (characters per table) * N. For the index of Fig. 1, N is 2, i.e. the number of bytes of each character's GBK code (if each code is stored contiguously with its in-dictionary offset, that is, two bytes of code immediately followed by the 4-byte in-dictionary offset, N is 2+4, i.e. 6 bytes); for Fig. 2, N is 4, i.e. the number of bytes occupied by each in-dictionary offset.
(3) Calculate the in-table offset of the character from the second byte of the GBK code of the input character:
[192 (characters per table) - (256 - second byte (hex))] * N, where 256 is the last code (FF) + 1. For the index of Fig. 1, N is 2+4, i.e. the number of bytes of each character's GBK code plus its in-dictionary offset; for Fig. 2, N is 4, i.e. the number of bytes occupied by each in-dictionary offset.
(4) Adding the in-table offset to the start position of the GBK table in the index table gives the position in the index table from which the in-dictionary offset of the input character is obtained.
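Formulas (1) to (4) can be checked with a few lines of Python. The sketch below implements them literally as written and evaluates them for the example code CCB7 with N = 4 (the Fig. 2 table); the function names are illustrative only.

```python
def table_position_number(first_byte: int) -> int:
    # (1) 7E (total number of tables) - (FE - first byte) - 1
    return 0x7E - (0xFE - first_byte) - 1


def table_start(first_byte: int, n: int) -> int:
    # (2) position number * 192 (characters per table) * N
    return table_position_number(first_byte) * 192 * n


def in_table_offset(second_byte: int, n: int) -> int:
    # (3) [192 - (256 - second byte)] * N
    return (192 - (256 - second_byte)) * n


def index_position(code: bytes, n: int) -> int:
    # (4) table start position + in-table offset
    return table_start(code[0], n) + in_table_offset(code[1], n)


# Worked example for CCB7 with 4-byte slots:
#   (1) 126 - (254 - 204) - 1 = 75
#   (2) 75 * 192 * 4         = 57600
#   (3) (192 - 73) * 4       = 476
#   (4) 57600 + 476          = 58076
position = index_position(bytes.fromhex("CCB7"), 4)
```

Formula (1) simplifies to first byte - 81 (hex) and formula (3) to (second byte - 40 (hex)) * N, which is the form an implementation would most likely use.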
Based on the above description, the invention likewise provides a Chinese dictionary search engine that quickly locates an input Chinese character in a Chinese dictionary; the search engine works together with the processor, memory and input device of a computer apparatus to locate Chinese characters. As shown in Fig. 5, the search engine comprises:
a first module, for establishing and storing the correspondence between the GB2312/GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary;
a second module, for receiving the input Chinese character and using its GB2312/GBK code to obtain the corresponding in-dictionary offset from the first module;
a third module, for locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it.
The second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining the position of the character's code in the index table from the results of the first unit and the second unit, and reading the in-dictionary offset.
Similarly, another Chinese dictionary search engine (whose structure is also as shown in Fig. 5) comprises:
a first module, for storing the in-dictionary offsets of the Chinese characters in the dictionary, relative to the start position of the dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding characters;
a second module, for receiving the input Chinese character, calculating the position in the first module of the in-dictionary offset corresponding to the character from the GB2312/GBK table number given by the first byte of the character's code and the row number and column number in the GB2312/GBK table given by the second byte, and reading the in-dictionary offset from the first module;
a third module, for locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset, and outputting it.
The second module comprises:
a first unit, for calculating, from the GB2312/GBK table number given by the first byte of the input character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
a second unit, for calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
a third unit, for determining the position of the character's code in the index table from the results of the first unit and the second unit, and reading the in-dictionary offset.
Chinese characters encoded in GB2312 are located in the same way as described above, so the details are not repeated. Clearly, the index table of the invention may also take other forms, and those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims (5)

1. A method for quickly locating a character in a Chinese dictionary, wherein the Chinese characters in the dictionary are GB2312/GBK encoded, characterized in that the method comprises the following steps:
A. establishing a correspondence between the GB2312/GBK code of each Chinese character in the dictionary and that character's in-dictionary offset relative to the start position of the dictionary;
B. determining, from the GB2312/GBK table number given by the first byte of the input Chinese character and the start position of the index table, the start position in the index table of the code of the first character of the GB2312/GBK table containing the input character;
C. calculating, from the row number and column number in the GB2312/GBK table given by the second byte of the input character, the in-table offset of the character relative to the first character of that table;
D. determining the position of the character's code in the index table from said start position and said in-table offset, and reading the in-dictionary offset;
E. judging whether the GB2312/GBK code of the input character is empty; if so, ending the location process and indicating that the input character is not in the dictionary; otherwise performing step F;
F. locating the input character directly in the dictionary from the start storage position of the dictionary and said in-dictionary offset.
2. The method of claim 1, characterized in that said correspondence is stored in an index table, with the GB2312/GBK codes serving as the index and arranged in ascending order.
3. A method for quickly locating a Chinese word in a Chinese dictionary, the Chinese words in the Chinese dictionary being encoded in GB2312/GBK, characterized in that the method comprises the following steps:
A. saving the in-dictionary offsets of the Chinese words in the Chinese dictionary, each relative to the starting position of the Chinese dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding Chinese words, so as to form an index table;
B. according to the GB2312/GBK table number represented by the first byte of the input Chinese word and the starting position of said index table, determining the position, within said index table, of the in-dictionary offset corresponding to the first Chinese word of the GB2312/GBK table that contains the input word;
C. according to the row number and column number, within the GB2312/GBK table, represented by the second byte of the input Chinese word, calculating the in-table offset of the input word relative to the first Chinese word of that GB2312/GBK table;
D. determining the position, within said index table, of the in-dictionary offset corresponding to the input word from the position of the offset of said first Chinese word and said in-table offset, and reading the in-dictionary offset;
E. judging whether the in-dictionary offset corresponding to the input word is empty; if so, ending the location and indicating that the input word is not in the Chinese dictionary; otherwise reading the in-dictionary offset from its position in said index table, and directly locating the input Chinese word in the Chinese dictionary according to the starting storage position of the Chinese dictionary and said in-dictionary offset.
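The empty-offset check and the direct positioning of step E could look like the sketch below. It assumes a packed index table of 4-byte little-endian offsets and a sentinel value for "empty"; the claim specifies neither, so `EMPTY`, `ENTRY`, `empty_index` and `locate` are hypothetical names chosen for this example.

```python
import struct

EMPTY = 0xFFFFFFFF              # assumed sentinel marking an empty offset
ENTRY = struct.Struct("<I")     # assumed entry format: 4-byte unsigned offset


def empty_index():
    """Index table with every GB2312 slot marked empty."""
    return bytearray(b"\xff" * (94 * 94 * ENTRY.size))


def locate(code, index_table, dict_start=0):
    """Return the absolute position of the word, or None if it is absent."""
    first, second = code
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        raise ValueError("not a two-byte GB2312 code")
    slot = (first - 0xA1) * 94 + (second - 0xA1)
    (offset,) = ENTRY.unpack_from(index_table, slot * ENTRY.size)
    if offset == EMPTY:
        return None             # word is not in the dictionary
    return dict_start + offset  # starting position plus in-dictionary offset
```

A lookup therefore costs one multiplication, two additions and one table read, regardless of the size of the dictionary.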
4. A Chinese dictionary search engine for quickly locating an input Chinese word in a Chinese dictionary, the Chinese words in the Chinese dictionary being encoded in GB2312/GBK, characterized by comprising:
a first module, configured to establish and save a correspondence between the GB2312/GBK code of each Chinese word in the Chinese dictionary and the in-dictionary offset of that word relative to the starting position of the Chinese dictionary;
a second module, configured to receive the input Chinese word and to use the GB2312/GBK code of that word to obtain the corresponding in-dictionary offset from said first module;
a third module, configured to judge whether the GB2312/GBK code of the input Chinese word is empty; if so, to end the location and indicate that the input word is not in the Chinese dictionary; otherwise, to directly locate the input Chinese word in the Chinese dictionary according to the starting storage position of the Chinese dictionary and said in-dictionary offset, and to output it;
wherein said second module comprises:
a first unit, configured to calculate, according to the GB2312/GBK table number represented by the first byte of the input Chinese word and the starting position of an index table, the starting position, within said index table, of the code of the first Chinese word of the GB2312/GBK table that contains the input word; wherein said index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order of code;
a second unit, configured to calculate, according to the row number and column number, within the GB2312/GBK table, represented by the second byte of the input Chinese word, the in-table offset of the input word relative to the first word of that GB2312/GBK table;
a third unit, configured to determine the position of the code of the input word in the index table from the results of the first unit and the second unit, and to read the in-dictionary offset.
5. A Chinese dictionary search engine for quickly locating an input Chinese word in a Chinese dictionary, the Chinese words in the Chinese dictionary being encoded in GB2312/GBK, characterized by comprising:
a first module, configured to save the in-dictionary offsets of the Chinese words in the Chinese dictionary, each relative to the starting position of the Chinese dictionary, in the order of the GB2312/GBK code tables and of the GB2312/GBK codes of the corresponding Chinese words;
a second module, configured to receive the input Chinese word and to calculate the position, within said first module, of the in-dictionary offset corresponding to that word, according to the GB2312/GBK table number represented by the first byte of the code of the input word and the row number and column number, within the GB2312/GBK table, represented by the second byte;
a third module, configured to judge whether the in-dictionary offset corresponding to the input word is empty; if so, to end the location and indicate that the input word is not in the Chinese dictionary; otherwise, to read the in-dictionary offset from its position in said index table, to directly locate the input Chinese word in the Chinese dictionary according to the starting storage position of the Chinese dictionary and said in-dictionary offset, and to output it;
wherein said second module comprises:
a first unit, configured to calculate, according to the GB2312/GBK table number represented by the first byte of the input Chinese word and the starting position of the index table, the starting position, within said index table, of the code of the first Chinese word of the GB2312/GBK table that contains the input word; wherein said index table stores said correspondence, with the GB2312/GBK codes serving as the index and arranged in ascending order of code;
a second unit, configured to calculate, according to the row number and column number, within the GB2312/GBK table, represented by the second byte of the input Chinese word, the in-table offset of the input word relative to the first word of that GB2312/GBK table;
a third unit, configured to determine the position of the code of the input word in the index table from the results of the first unit and the second unit, and to read the in-dictionary offset.
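One possible way to arrange the three modules of the engine as software components is sketched below. The class names, the NUL-terminated record format and the use of `None` for an empty offset are assumptions made for this example and do not come from the claims; only the GB2312 byte range is handled.

```python
class OffsetIndex:
    """First module: in-dictionary offsets kept in GB2312 code order."""

    def __init__(self):
        self.slots = [None] * (94 * 94)    # None marks an empty offset


class OffsetReader:
    """Second module: turns a two-byte code into an in-dictionary offset."""

    def __init__(self, index):
        self.index = index

    @staticmethod
    def table_start(first_byte):           # first unit
        return (first_byte - 0xA1) * 94

    @staticmethod
    def in_table_offset(second_byte):      # second unit
        return second_byte - 0xA1

    def read_offset(self, code):           # third unit
        slot = self.table_start(code[0]) + self.in_table_offset(code[1])
        return self.index.slots[slot]


class DictionaryEngine:
    """Third module: empty check, direct positioning and output."""

    def __init__(self, dictionary, index):
        self.dictionary = dictionary       # bytes; records end with NUL
        self.reader = OffsetReader(index)

    def lookup(self, word):
        code = word.encode("gb2312")
        if len(code) != 2:
            raise ValueError("expected one two-byte GB2312 character")
        offset = self.reader.read_offset(code)
        if offset is None:
            return None                    # word is not in the dictionary
        end = self.dictionary.index(b"\x00", offset)
        return self.dictionary[offset:end].decode("gbk")
```

In use, the index is filled once when the dictionary is built, after which `DictionaryEngine(dictionary, index).lookup(word)` returns the record text or `None`.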
CNB200410104045XA 2004-12-31 2004-12-31 Chinese dictionary search engine and method for quick positioning words in Chinese dictionary Expired - Fee Related CN100440207C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200410104045XA CN100440207C (en) 2004-12-31 2004-12-31 Chinese dictionary search engine and method for quick positioning words in Chinese dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200410104045XA CN100440207C (en) 2004-12-31 2004-12-31 Chinese dictionary search engine and method for quick positioning words in Chinese dictionary

Publications (2)

Publication Number Publication Date
CN1632798A CN1632798A (en) 2005-06-29
CN100440207C true CN100440207C (en) 2008-12-03

Family

ID=34848200

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200410104045XA Expired - Fee Related CN100440207C (en) 2004-12-31 2004-12-31 Chinese dictionary search engine and method for quick positioning words in Chinese dictionary

Country Status (1)

Country Link
CN (1) CN100440207C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118777B (en) * 2007-08-22 2011-06-22 无锡中星微电子有限公司 Playing method of multimedia container format file and indexes reading method thereof
CN102609510B (en) * 2012-02-06 2014-05-28 中国农业银行股份有限公司 Chinese name data processing method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1196535A (en) * 1997-04-15 1998-10-21 英业达股份有限公司 Method for automatic marking pronunciation symbol
CN1295295A (en) * 1999-11-04 2001-05-16 英业达集团(西安)电子技术有限公司 Word looking-up method for electronic dictionary with fast polling index structure

Also Published As

Publication number Publication date
CN1632798A (en) 2005-06-29

Similar Documents

Publication Publication Date Title
US8010344B2 (en) Dictionary word and phrase determination
CN101199122B (en) Using language models to expand wildcards
US8412517B2 (en) Dictionary word and phrase determination
CN102449579B (en) All-in-one chinese character input method
CN110019647B (en) Keyword searching method and device and search engine
CN107545044A (en) A kind of tables of data method for building up, electronic equipment and storage medium
CN102385609A (en) Enhancing search-result relevance ranking using uniform resource locators for queries containing non-encoding characters
CN100464333C (en) File name generating method and device in file distribution system
KR20140056231A (en) Detecting source languages of search queries
CN101715579A (en) Language independent index storage system and retrieval method
US9158758B2 (en) Retrieval of prefix completions by way of walking nodes of a trie data structure
WO2014047214A1 (en) Hierarchical ordering of strings
US7366984B2 (en) Phonetic searching using multiple readings
CN109918682B (en) Text labeling method and device
CN104281275A (en) Method and device for inputting English
CN100440207C (en) Chinese dictionary search engine and method for quick positioning words in Chinese dictionary
CN101930474A (en) Chinese character simple stroke search method
CN102385597B (en) The fault-tolerant searching method of a kind of POI
CN103049095A (en) Tibetan language input method of embedded device
CN101436203B (en) Recording index method and apparatus
CN102981607A (en) Computer-implemented method of arranging text items in a predefined order
CN110489603A (en) A kind of method for information retrieval, device and vehicle device
CN2869995Y (en) Chinese-character searching engine
CN1496062A (en) Intelligent information processing method in network and its system
CN102004598B (en) Media player and character input method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20081203

Termination date: 20111231