CN1248024A - Chinese character search method using decoding - Google Patents

Publication number: CN1248024A
Application number: CN 99113849
Authority: CN (China)
Other languages: Chinese (zh)
Other versions: CN1116647C (en)
Inventor: 汪文虎
Original Assignee: Individual
Current Assignee: Individual (the listed assignees may be inaccurate)
Legal status: Granted; Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Priority: CN 99113849, granted as CN1116647C
Prior art keywords: chinese character, character, chinese, decoding, search method

Landscapes: Document Processing Apparatus (AREA)

Abstract

On a Chinese platform, without calling any Chinese character input method, ASCII characters are typed on the keyboard according to a common, simple encoding rule to form a character string. The data being searched are decoded from their Chinese character internal codes, using the same rule as was used for the input string, to return a second character string. The two strings are compared: the result is "true" if they are equal or if the searched string contains the search string, and "false" otherwise. Chinese character search is thereby achieved.

Description

Chinese character search method using decoding
The present invention relates to a method of information retrieval by encoding and decoding in the field of computer application technology, and in particular to a Chinese character retrieval technique for computer systems that uses decoding.
The world has entered the information age, and information processing, including information retrieval, has increasingly become a daily necessity for ordinary people. For example, a large library holds millions or even tens of millions of volumes, and finding the book one needs begins with retrieval. A compact disc today holds several gigabytes or more and can store hundreds of millions of Chinese characters or tens of thousands of MIDI pieces, more than most people read or listen to in a lifetime; without retrieval, a user would not know where to start. On the stamp market, a collector faces tens of thousands of philatelic items and their prices, and must also rely on retrieval to find items that suit both taste and budget. Without a good retrieval method, finding the information one needs is like looking for a needle in a haystack.
In addition, the large population of Chinese computer users requires software that can handle Chinese characters. After more than twenty years of unremitting effort, the input and output of Chinese characters are quite mature, but Chinese character retrieval still needs improvement, and solving its difficulties has become urgent.
As shown in Figure 1, the current Chinese character retrieval flow is: select a Chinese character input method on the Chinese platform, type the search term as a "template" string, then fetch the data strings to be searched one segment at a time; if some segment is exactly equal to the template, the result is "true". Current methods must call a Chinese character input method and enter the term as Chinese characters, and the search field generally cannot contain symbols or Western-language text, which often cannot be recognized.
In practice, however, this kind of Chinese character retrieval runs into various difficulties, such as:
1. Some software running on a Chinese platform can output Chinese characters but cannot accept them as input, so Chinese characters are difficult to use as search terms;
2. Some software does accept Chinese input, but a term that mixes Chinese characters with Western-language text and symbols can cause problems or make operation quite tedious because of mode switching;
3. In today's information-rich society, many ordinary Chinese people with limited education find Chinese input methods hard to master, yet they too have a great deal of information to select and retrieve, which calls for a correspondingly simple Chinese character search method.
In view of the above, one object of the present invention is to provide a Chinese character search method using decoding. The operator need not call a Chinese character input method, and need not even know one, to carry out Chinese character retrieval.
Another object of the present invention is to use a simple, direct coding so that operation is simple and convenient and retrieval efficiency is improved, achieving an easy, fast and efficient search.
An electronic computer is a machine that processes information, and the information it handles includes not only numbers but also text, graphics, sound and any other physical quantity that can be converted into an electrical signal. Inside the computer, data, addresses and arithmetic and control operations all use binary, so the computer must encode and decode the information it inputs and outputs, and the code it uses is the American Standard Code for Information Interchange, ASCII. ASCII has in effect been accepted internationally, and the Chinese national standard GB 1988 "7-bit coded character set for information interchange" is essentially equivalent to the basic ASCII code. GB 2311 "Code extension techniques for the 7-bit coded character set for information processing interchange" and GB 11383 "Structure and coding rules of 8-bit codes for information processing interchange" specify how extended codes are used. GB 15273 (equivalent to ISO 8859) specifies the extended code assignments for the alphabets of several non-English languages.
Scripts with a large number of characters, such as Chinese, are stored in the computer using two adjacent single-byte codes, i.e. in double-byte form. This pair of codes is called the internal code, so Chinese characters are stored in the computer as internal codes.
A Chinese character set is the mapping table between internal codes and Chinese characters, and is the standard by which a computer stores Chinese characters. The internal code is what is stored in the computer, and it follows the principle of one character per code and one code per character.
The Chinese character sets currently used on computers fall mainly into two classes.
I. Basic sets:
1. GB 2312 "Chinese character coded character set for information interchange: basic set", the national standard zone-position code, GB code for short. It has 94 zones of 94 positions each and contains 3755 level-1 and 3008 level-2 Chinese characters, 6763 simplified characters in all.
2. GB/T 12345 "Chinese character coded character set for information interchange: supplementary set", GBFT for short, containing 6866 traditional characters. Although GB/T 12345 is called a supplementary set, it is in fact a basic set: its level-1 and level-2 character sets are the same as those of GB 2312, and it merely adds 103 characters affected by the character simplification of the 1960s.
II. Extended sets: Chinese has a very large number of distinct characters, and to meet the needs of editing and collating ancient texts the national standards office has issued:
GB 7589 "Chinese character coded character set for information interchange: second supplementary set"
GB 13131 "Chinese character coded character set for information interchange: third supplementary set"
GB 7590 "Chinese character coded character set for information interchange: fourth supplementary set"
GB 13132 "Chinese character coded character set for information interchange: fifth supplementary set"
The third and fifth supplementary sets are the traditional-character counterparts of the second and fourth. Although these supplementary sets are mandatory standards, few people seem to use them. This may be because they all occupy the same 94 zones of 94 positions, so that switching between character sets has to be done with the ESC "escape" control command, which makes programming somewhat troublesome. Beyond enlarging the character repertoire, and to overcome the limitation that earlier Chinese software could only switch between simplified and traditional characters rather than use both at once, ISO/IEC 10646.1/GB 13000.1 "CJK unified Chinese character coded character set" was developed. It allows simplified, traditional, Japanese and Korean Chinese characters to be used together; it is called CJK for short ("China, Japan, Korea") and contains 20902 Chinese characters. Hong Kong and Taiwan use the CNS 11643 standard, commonly known as the Big5 character set, which contains 13053 traditional characters. The large character set currently used on computers in mainland China, GBK ("Chinese character internal code extension specification"), contains the CJK Chinese characters and the non-Chinese characters of Big5, plus 80 Chinese character radicals and components.
The key to Chinese localization is the encoding and decoding of input and output, each of which takes two steps. For input, the Chinese character is first encoded into an external code according to some rule, and the external code is then converted into a two-byte internal code for storage. For output, the stored internal code is first read, and the character set is then consulted to output the corresponding Chinese character. The codes typed as consecutive keystrokes in a Chinese input method are called external codes, and an external code does not necessarily satisfy "one code, one character". When entering Chinese characters, the user types short, easily remembered external codes, and the computer converts them automatically into internal codes for storage. Whatever method is used to enter Chinese characters, they must be encoded on entry. The retrieval method of the present invention is tied directly only to the internal code and has no direct relation to external codes or input methods. Conversely, Chinese characters entered by any method can be retrieved with the decoding method of the present invention.
All computer software is a program, and typical software is a source program written in an algorithmic language. Chinese-localized source programs come in several types:
1. Chinese characters are used in character strings and their related input and output;
2. The reserved words of the algorithmic language are in Chinese;
3. The variable names are in Chinese.
That is, localization does not require Chinese throughout; Chinese, Western-language text, Arabic numerals and symbols can be mixed as needed to obtain the best result. Chinese character retrieval is therefore, in essence, string comparison: the result is "true" when the retrieved string and the search string are found to be equal.
The technical scheme of the present invention is as follows:
A Chinese character search method using decoding comprises the following steps:
(1) load the Chinese character retrieval system;
(2) select the type of character set;
(3) input the search term;
(4) generate the "search string" by the encoding method;
(5) read the data to be searched;
(6) generate the "alias string" by the decoding method;
(7) compare the "search string" with the "alias string";
(8) perform the related operations on the retrieved data;
(9) check whether the data have ended;
(10) move to the next data field;
(11) end the retrieval.
The present invention changes the approach and operating flow of Chinese character retrieval: no Chinese character input method is loaded on the Chinese platform; instead, basic ASCII characters are typed on the keyboard according to some common, simple coding rule to form a character string. Separately, the data string being searched is decoded from its Chinese internal codes, using the same coding rule as the input string, to return another character string. The two strings are compared, and if they are equal, or the searched string contains the search string, the result is "true".
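The flow just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PINYIN_INITIAL` table, the function names `make_alias` and `matches`, and the choice to drop characters missing from the table are all hypothetical and not taken from the patent; a real table would cover the whole character set.

```python
# Hypothetical excerpt of a pinyin first-letter table (assumption, for
# illustration only); a real table would cover the whole character set.
PINYIN_INITIAL = {"汉": "h", "字": "z", "检": "j", "索": "s", "语": "y", "言": "y"}


def make_alias(text):
    """Decode searched text into an alias string.

    Chinese characters found in the table become their pinyin first letter,
    ASCII characters are copied unchanged, and anything else is dropped
    (a simplification made for this sketch).
    """
    out = []
    for ch in text:
        if ch in PINYIN_INITIAL:
            out.append(PINYIN_INITIAL[ch])
        elif ord(ch) < 128:
            out.append(ch)
    return "".join(out)


def matches(search_string, searched_text):
    """Return True if the alias equals or contains the typed search string."""
    return search_string in make_alias(searched_text)


print(make_alias("汉字检索"))
print(matches("zj", "汉字检索"))
print(matches("yy", "汉字检索"))
```

The user types plain ASCII such as `zj`; only the searched data passes through the table, so no Chinese input method is involved at any point.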
The common, simple coding method mentioned above, by which the search string is composed of basic ASCII characters typed on the keyboard, is usually the pinyin first-letter method: the search uses the first letter of the pinyin of each Chinese character in the search term, whether that letter is an initial or a final. This method requires no coding rules to be memorized and few keystrokes, and it is easy to learn, easy to understand and simple to operate for anyone who knows pinyin.
Because the proposed search method needs no Chinese character input module, it can be used where Chinese characters cannot be input but can be output, and it avoids the difficulty of switching between Chinese, Western-language text and symbols during input. Moreover, the coding method used with the invention is in practice a brief one, so retrieval takes fewer keystrokes and saves machine time. Users who cannot use a Chinese input method can therefore still retrieve Chinese characters with the present invention, and users who can use one can improve their retrieval efficiency.
In summary, compared with traditional Chinese character search methods, the present invention has the following notable effects:
1. Chinese characters can be retrieved without calling a Chinese character input method, and the searcher need not even know one;
2. Entering the search term is simplified, and operation is very easy;
3. In systems holding large amounts of information, Chinese character retrieval is faster than with traditional methods;
4. No retrieval format needs to be prescribed: the search characters can be leading, middle or trailing characters.
Description of the drawings:
Fig. 1 is a schematic flow chart of a conventional Chinese character search method;
Fig. 2 is a schematic flow chart of the Chinese character search method of the present invention;
Fig. 3 is a flow chart of a computer program for this search method.
The present invention is described in detail below with reference to the drawings.
As shown in Figure 2, the present invention changes the approach and operating flow of Chinese character retrieval: no Chinese character input method is loaded on the Chinese platform; instead, basic ASCII characters are typed on the keyboard according to some common, simple coding rule to form a character string. Separately, the data string being searched is decoded from its Chinese internal codes, using the same coding rule as the input string, to return another character string. The two strings are compared, and if they are equal, or the searched string contains the search string, the result is "true".
Because the proposed search method needs no Chinese character input module, it can be used where Chinese characters cannot be input but can be output, and it avoids the difficulty of switching between Chinese, Western-language text and symbols during input. Moreover, the coding method used with the invention is normally a brief one, so retrieval takes fewer keystrokes and saves machine time. Users who cannot use a Chinese input method can therefore still retrieve Chinese characters with the present invention, and users who can use one can improve their retrieval efficiency.
The steps in Figure 2 are described in detail below.
1. Load the Chinese character retrieval system
The Chinese character search method of the present invention is installed on the target computer system. It suits various operating systems and application software, such as DOS, CCDOS and WINDOWS. It only extends the functions of the operating system and does not replace the original operating system.
2. Select the type of character set
After the retrieval system starts, it determines the character set type automatically. One way to do so is to read the characters and internal codes at several fixed positions in the character library of the host Chinese system and compare them with reference internal codes preset in the program; the preset whose internal codes match identifies the character set in use.
The decoding method used by this retrieval system can be applied to various double-byte coded character sets, such as GB 2312 "Chinese character coded character set for information interchange: basic set"; GB/T 12345 "Chinese character coded character set for information interchange: supplementary set"; the Taiwan CNS 11643 standard (Big5); the CJK character set of ISO/IEC 10646; as well as GB 12052 Korean script, GB 8045 Mongolian script, GB 12050 Uyghur script, and the double-byte character sets of other nationalities and countries. Of these, the GB 2312, GB 12345 and CNS 11643 Chinese character sets are the most commonly used. Once started, the system automatically identifies the character set in the computer and adapts to it.
3. Input the search term
In principle, the decoding used in this search method can reverse any input method or indexing scheme for Chinese characters. Take input methods as an example: although the many distinctive coding schemes each have their strengths, most input methods have not been accepted by the majority of users and are hard to put into practice. The present invention uses the pinyin first letters of the characters in a Chinese word or phrase as the input search term. The operator only needs a basic knowledge of Mandarin pinyin, no prior training is required, keystrokes are few, and the scheme is the most likely to be widely accepted. Besides pinyin first letters, the present invention can also use various existing input methods to enter the search term, such as double pinyin, full pinyin, phonetic input, Wubi (five-stroke) input, Zheng code input and Wang code input.
This search method can also use abbreviated codes or reduced keys (reduced keys serve keyboards with few keys, such as a telephone keypad). Abbreviated codes reduce the number of keystrokes. The typical case is the pinyin first letter, which gives one code per Chinese character; double pinyin, with one initial and one final, is also an abbreviated code. Existing double pinyin schemes do not distinguish initials from finals by the letter used, so a Chinese character can only be represented by two letters. Pinyin first letters use 23 letters in all, and of the remaining letters i, u and v, both i and u are finals. If, in addition, the symbol shown in Figure A9911384900112, 1 and 0 are used to represent a, e and o respectively (the symbol and 0 by resemblance in shape, 1 by resemblance in sound), initials and finals can be told apart, and retrieval can use one initial plus one final, an initial alone, or a final alone. Retrieval by initials and finals has some advantage when the items being searched are short.
The correspondence for the improved initial and final input scheme is as follows:
Pinyin:   a o e b p m f d t n l g k h j q x z c s r w y   (z, c, s include zh, ch, sh)
Keyboard: [symbol shown in Figure A9911384900121] 0 1 b p m f d t n l g k h j q x z c s r w y
For hardware with few keys, such as telephones and remote controls, the reduced-key method can be used. For example, as early as the 1920s the "one or the nine student's dictionary" used nine codes, 1 2 3 4 5 6 7 8 9, to look up Chinese characters by stroke order. Reduced keys are rarely needed, however, because remote controls with more than 23 keys are very common.
4. Generate the "search string" by the decoding method
Following the program, the input is converted into a "search string" that can be compared.
This search method can be used for mixed retrieval of Chinese, Western-language text and symbols, and can distinguish Chinese from Western text, upper case from lower case, and symbols. For example: (1) lower-case letters represent Chinese characters and upper-case letters represent English, with English matched case-insensitively; this is the usual convention of Chinese input methods; (2) upper-case letters represent Chinese characters and lower-case letters represent English, with English matched case-insensitively; (3) English is case-sensitive, and Chinese is uniformly mapped to lower-case (or upper-case) letters; (4) English is case-insensitive, and Chinese is uniformly mapped to lower-case (or upper-case) letters. Because the first-letter sequences of Chinese terms usually differ markedly from the letter sequences of English words, modes (3) and (4) do not retrieve many redundant items.
For example, take the retrieved object "汉化Qbasic语言" (Chinese-localized Qbasic language).
In this search method the data are converted as follows:

Data                                         String content     GB internal code, hexadecimal
Raw data                                     汉化Qbasic语言      BA BA BB AF 51 42 61 73 69 63 D3 EF D1 D4
English converted to upper case              汉化QBASIC语言      BA BA BB AF 51 42 41 53 49 43 D3 EF D1 D4
Chinese converted to pinyin first letters    hhQBASICyy         68 68 51 42 41 53 49 43 79 79
The search term may be "hhQB*". This form denotes a first-field match: the "*" is a marker meaning that nothing is required of what follows. The alias string generated from the retrieved item is "68 68 51 42 41 53 49 43 79 79", and the search string generated from the search term is "68 68 51 42". When the strings are compared, the first 4 bytes of the alias are taken, and they are 68 68 51 42: the strings coincide and the retrieval succeeds.
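The worked example can be reproduced at byte level with Python's built-in `gb2312` codec. This is a sketch only: `INITIAL_BY_CODE` is a hypothetical four-entry excerpt of a decoding table, `to_alias` is an illustrative name, and the mode assumed is the one in the example (English upper-cased, Chinese mapped to lower-case first letters). The hexadecimal codes in the example spell the English part as "QBasic" (51 42 61 ...), so that casing is used here.

```python
# Hypothetical excerpt of a decoding table: GB 2312 internal code -> pinyin
# first letter. Only the four characters of the example are listed.
INITIAL_BY_CODE = {
    bytes.fromhex("BABA"): "h",  # 汉
    bytes.fromhex("BBAF"): "h",  # 化
    bytes.fromhex("D3EF"): "y",  # 语
    bytes.fromhex("D1D4"): "y",  # 言
}


def to_alias(raw):
    """Upper-case single-byte letters and map double-byte Chinese characters
    to lower-case first letters; unknown double-byte codes are dropped."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] < 0xA1:                      # single-byte (ASCII) character
            out += bytes([raw[i]]).upper()
            i += 1
        else:                                  # double-byte internal code
            out += INITIAL_BY_CODE.get(raw[i:i + 2], "").encode("ascii")
            i += 2
    return bytes(out)


raw = "汉化QBasic语言".encode("gb2312")
alias = to_alias(raw)

term = "hhQB*"                                 # trailing * = first-field match
prefix = term.rstrip("*").encode("ascii")

print(raw.hex(" ").upper())                    # internal codes of the raw data
print(alias.hex(" "))                          # alias string as hex bytes
print(alias.startswith(prefix))                # result of the comparison
```

Treating the trailing `*` as "match the leading bytes only" follows the description of the `hhQB*` term; how other `*` placements are handled is left out of this sketch.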
In mixed Chinese and Western-language retrieval, "symbols" means the symbols of the basic ASCII code, which occupy one byte just like an English letter. Double-byte symbols in the Chinese character set cannot be used as search content and are skipped during retrieval, but they are allowed to appear in the text being searched.
5. Read the data to be searched
Following the program's commands, the computer automatically reads the data to be searched from memory, hard disk, floppy disk, CD, a network, an optical data link, or even a large database.
6. Generate the "alias string" by the decoding method
During retrieval, the string being searched is examined and processed two bytes at a time and decoded into an "alias string". Taking the GB 2312 and GB/T 12345 character sets as an example: (1) characters outside the Chinese character set, i.e. single-byte codes below 161, are copied unchanged; (2) Chinese characters (levels 1 and 2) are converted to the corresponding code character according to the decoding table; (3) the non-Chinese-character part of the double-byte range, i.e. the zones before zone 16 and after zone 87 (or zone 90 for GB/T 12345), is discarded.
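The three rules can be sketched as a small byte-level decoder for GB 2312 data. The table contents, the helper `build_table`, and the choice to drop Chinese characters missing from the table are assumptions made for illustration; zone and position are taken as the byte value minus 0xA0, which is the usual relation between GB 2312 internal codes and zone-position codes.

```python
FIRST_HANZI_ZONE, LAST_HANZI_ZONE = 16, 87   # Chinese-character zones of GB 2312


def build_table(samples):
    """Build a {(zone, position): code_char} table from sample characters.
    The samples are a hypothetical excerpt, not a full decoding table."""
    table = {}
    for ch, code in samples.items():
        b1, b2 = ch.encode("gb2312")
        table[(b1 - 0xA0, b2 - 0xA0)] = code
    return table


TABLE = build_table({"汉": "h", "化": "h", "语": "y", "言": "y"})


def decode(data, table):
    """Decode GB 2312 bytes into an alias string using the three rules."""
    out = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 161:
            out.append(chr(b1))              # rule 1: copy single-byte codes
            i += 1
            continue
        if i + 1 < len(data):
            zone, pos = b1 - 0xA0, data[i + 1] - 0xA0
            if FIRST_HANZI_ZONE <= zone <= LAST_HANZI_ZONE:
                # rule 2: Chinese character -> code character from the table
                out.append(table.get((zone, pos), ""))
            # rule 3: other double-byte zones (symbols etc.) are discarded
        i += 2
    return "".join(out)


print(decode("汉化、Qbasic语言。".encode("gb2312"), TABLE))
```

The double-byte punctuation marks in the sample sit in zone 1, so they disappear from the alias while the ASCII letters pass through untouched.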
The strings to be searched can be converted to "alias strings" ahead of time and stored, or the aliases can be generated on the fly at retrieval time. Which to use depends on the length of the data being searched and on whether it is relatively stable. If the total length is under 100,000 bytes, decoding time is negligible compared with the time taken to type the search characters, and there is no need to store aliases. If the data change frequently, as with network information or CDs being read, conversion ahead of time is not possible. Conversely, large and stable data such as a library catalogue can be converted to aliases ahead of time and stored, saving machine time on every retrieval. If the alias strings are also kept in sorted order, a first-field search can find matches quickly by bisection.
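A first-field lookup over pre-sorted aliases might look like the following sketch, which uses Python's standard `bisect` module. The function names and the sample aliases are hypothetical; the aliases are assumed to have been produced by the decoding step beforehand.

```python
import bisect


def build_index(aliases):
    """Sort (alias, record_number) pairs once, ahead of retrieval time."""
    return sorted((alias, n) for n, alias in enumerate(aliases))


def find_by_prefix(index, prefix):
    """Return the record numbers whose alias starts with `prefix`.

    Bisection jumps to the first alias >= prefix; all aliases sharing the
    prefix are contiguous from there, so the scan stops at the first miss.
    """
    start = bisect.bisect_left(index, (prefix,))
    hits = []
    for alias, n in index[start:]:
        if not alias.startswith(prefix):
            break
        hits.append(n)
    return hits


# Hypothetical precomputed aliases for four catalogue records.
index = build_index(["hhQBASICyy", "zgls", "hzjs", "hhyy"])
print(find_by_prefix(index, "hh"))
print(find_by_prefix(index, "hz"))
```

Bisection only helps for first-field matches; a middle-field or last-field search still has to scan every alias.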
From the programming point of view, the objects being searched are all data files, whose structure can be classified as unformatted sequential files, formatted sequential files and random-access files. How "alias strings" are stored ahead of time depends on the form of the object: a formatted sequential file is converted to a two-dimensional string array; an unformatted sequential file to a simple string variable or a one-dimensional string array; and a random-access file to an array of record variables. These arrays are kept in memory, and the original data for a matching element can be located from its subscript and the mapping to the original data.
The decoding used by this search method differs from phrase-based Chinese input methods: it includes no fixed or extensible dictionary. It is an adaptive system that automatically seeks matches against the strings being searched.
The decoding in this search method needs a decoding table matched to the internal code, which produces the mapping between keyboard characters and the characters being searched. The decoding table is equivalent to a two-dimensional array, one dimension corresponding to the zone of the Chinese coded character set and the other to the position within the zone. When the decoding uses the CHAR type, for example one character per Chinese character, the table can instead be a one-dimensional array with one array element per zone. A one-dimensional array occupies fewer storage units and loads faster. Appendix 1 lists the decoding table that maps GB 2312 "Chinese character coded character set for information interchange: basic set" to pinyin first letters in lower-case Latin letters. On a PII-300 microcomputer it is read from hard disk into memory in about 2.6 milliseconds and occupies less than 8K of memory, so it consumes very few resources. The decoding program is also not complicated and is easy to embed in a chip for use in intelligent instruments and household appliances.
7. Compare the "search string" with the "alias string"
The "alias string" is compared with the "search string". If they match, the result is "true"; the matching position can be recorded if needed, the matching part can be shown in a different colour or font so that it is easy to identify, and the original string can be displayed or printed. If they do not match, the result is "false".
This search method can be combined with logical operations. The most common is AND, which judges several search fields together. In document retrieval, "*" is usually used as the separator between search fields, indicating an AND over those fields. For example, the retrieved object "汉化Qbasic语言" (Chinese-localized Qbasic language) can be retrieved with "hh*hh". Combined retrieval is even more valuable for keywords or subjects. Taking a chemical catalyst query as an example, to look for platinum-containing hydrocracking catalysts the pinyin first-letter search term would be "*j*jq*lh*chj*". If the term is changed to "*bj*jq*chj*", the matching range widens: it is no longer limited to hydrocracking and also covers platinum catalysts used for hydrorefining, hydro-upgrading, hydrogenation hardening of oils and so on.
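An AND over "*"-separated fields can be sketched as below. The function name `and_match`, the `ordered` flag and the two catalyst aliases are hypothetical; every field is treated here as "may occur anywhere in the alias", which is a simplification of the matching modes the method allows.

```python
def and_match(alias, term, ordered=False):
    """Return True if every '*'-separated field of `term` occurs in `alias`.

    With ordered=True the fields must also occur left to right without
    overlapping (an assumed reading of ordered matching).
    """
    fields = [f for f in term.split("*") if f]
    if not ordered:
        return all(f in alias for f in fields)
    pos = 0
    for f in fields:
        found = alias.find(f, pos)
        if found < 0:
            return False
        pos = found + len(f)
    return True


# Hypothetical aliases of two catalogue entries (platinum hydrocracking and
# platinum hydrorefining catalysts), written directly as first letters.
cracking = "bjjqlhchj"
refining = "bjjqjzchj"

print(and_match(cracking, "*j*jq*lh*chj*"))   # narrow term
print(and_match(refining, "*j*jq*lh*chj*"))   # "lh" is missing
print(and_match(refining, "*bj*jq*chj*"))     # wider term
```

Dropping the `lh` field is what widens the second term: both sample entries satisfy it, while only the first satisfies the narrow term.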
In addition, a field can be required to be a "full match", a "first-field match", a "last-field match" or a "middle-field match". Middle fields can further require "ordered matching" or "unordered matching", thereby satisfying various retrieval needs. These conditions can be reflected in the program that performs the decoding retrieval.
8. Other steps
If the comparison yields "true", the original retrieved data are displayed or printed, with the matching part shown in a different colour or font so that it is easy to identify. Other related operations can be performed at the same time, such as running the program found. The system then checks whether the data being searched have ended: if so, the retrieval is complete; if not, it moves to the next data field and starts a new cycle.
If the comparison yields "false", the system determines whether to continue retrieving: if not, the retrieval is complete; if so, it moves to the next data field and starts a new retrieval.
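The loop over data fields described above can be sketched as a generator. The table `INITIALS`, the functions `decode` and `search`, and the sample records are hypothetical; the position reported is the offset in the alias string, and mapping it back to the original text for highlighting is left out of this sketch.

```python
# Hypothetical excerpt of a first-letter table (assumption for illustration).
INITIALS = {"汉": "h", "化": "h", "语": "y", "言": "y"}


def decode(text):
    """Alias of `text`: table lookup for Chinese, ASCII copied, rest dropped."""
    return "".join(INITIALS.get(ch, ch if ord(ch) < 128 else "") for ch in text)


def search(records, search_string):
    """Yield (record_number, record, position_in_alias) for each match."""
    for n, record in enumerate(records):       # read data, then next field
        alias = decode(record)                  # generate the alias string
        pos = alias.find(search_string)         # compare the two strings
        if pos >= 0:                            # "true": hand back the record
            yield n, record, pos
    # the loop ends when the data are exhausted: retrieval finished


records = ["汉化Qbasic语言", "语言", "Qbasic"]
for hit in search(records, "yy"):
    print(hit)
```

Because `search` is a generator, the caller decides after each hit whether to keep going, which corresponds to the "continue or stop" decision in the flow.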
In addition, the compatibility of this search method follows the compatibility of the internal codes. For example, GBK covers CJK, and CJK covers GB 2312. The level-1 and level-2 character set of GB 12345 is the same as that of GB 2312, merely displayed in traditional form. A decoding table for GB 2312 internal codes can therefore be used to retrieve GBK, CJK, GB 2312 and GB 12345 data. The only restriction is that search terms can use only the characters covered by the GB 2312 decoding; other Chinese characters can still appear in the text being searched and be displayed and printed.
Conversely, the GBK decoding method is backward compatible and can be used to retrieve CJK, GB 12345 and GB 2312 internal codes. The only limitation is that, for GB 12345 and GB 2312, the eight-bit C1 set of ISO 4873/GB 11383 cannot be used at the same time for non-English Western single-byte characters, which causes no difficulty in practical use.
Application example 1
The WIN 95 operating system introduced long file names of up to 255 characters, which thoroughly solved the inability of the PC's 8.3 file names to convey a file's title; it improved the file retrieval function to cope with the rapid growth of computer storage and of the amount of information; and it provided several ways of launching files, which makes operation convenient. However, the retrieval facilities under WINDOWS are not yet adequate, and searching with the DIR command under DOS requires a DOS Chinese platform. Here the Chinese character retrieval decoding method is built into an executable program that retrieves Chinese and English path names and file names, so that paths and files can be found easily without loading a Chinese character system.
Only a first field is used when searching for a path name. The path names on the disk or CD are read into memory and converted into aliases using a pinyin initial database matched to GB 2312 ("Chinese Coded Character Set for Information Interchange, Basic Set"); the aliases are then compared as character strings with the search term typed on the keyboard to find the matching path names. If exactly one path name qualifies, the program proceeds automatically to the file name search; if more than one path name matches the search string, all matching path names are listed with their sequence numbers for the user to choose and confirm.
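The path name search can be sketched as follows in Python. Everything named here is hypothetical: the small `INITIALS` table replaces the real pinyin initial database, the sample paths are invented, and the sketch assumes the first-field match is applied to the last component of each path.

```python
from pathlib import PureWindowsPath

# Hypothetical miniature pinyin initial table, for illustration only.
INITIALS = {"音": "y", "乐": "y", "文": "w", "档": "d", "艺": "y", "术": "s"}

PATHS = [r"C:\音乐", r"C:\文档", r"C:\艺术", r"C:\Windows"]


def decode(name: str) -> str:
    """Convert a name into its alias; ASCII letters are lowercased."""
    return "".join(
        INITIALS.get(ch, ch.lower() if ch.isascii() else ch) for ch in name
    )


def find_paths(term: str, paths: list[str]) -> list[str]:
    """First-field match: the alias of the directory name starts with term."""
    term = term.lower()
    return [p for p in paths
            if decode(PureWindowsPath(p).name).startswith(term)]


def report(matches: list[str]) -> str:
    """A unique match is entered directly; several are listed with numbers."""
    if len(matches) == 1:
        return f"entering {matches[0]}"
    return "\n".join(f"{n}. {p}" for n, p in enumerate(matches, 1))
```

For instance, `find_paths("yy", PATHS)` selects a single directory, while `find_paths("w", PATHS)` returns one Chinese and one English path name that the user would then choose between by number.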
File names are retrieved in the same way. Because file names are longer, several search fields may be combined in one search; the search format indicates whether a field is a first field, a last field, or a middle field, and for middle fields it can specify whether ordered matching is required. If the result is unique, the file is run automatically; otherwise all matching file names are displayed, with the matched part shown in an eye-catching color, for the user to choose from.
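The source does not spell out the search format, so the following Python sketch adopts one possible convention purely as an assumption: fields are separated by "*", a query without a leading "*" anchors its first field at the start of the alias, a query without a trailing "*" anchors its last field at the end, and a query without any "*" demands a full match. The function name `field_match` and the `ordered` flag are hypothetical.

```python
def field_match(pattern: str, alias: str, ordered: bool = True) -> bool:
    """Match an alias against first, middle, and last search fields."""
    parts = pattern.lower().split("*")
    alias = alias.lower()
    if len(parts) == 1:
        return alias == parts[0]               # no separator: full match
    head, tail = parts[0], parts[-1]
    if len(head) + len(tail) > len(alias):
        return False                           # first and last would overlap
    if not alias.startswith(head) or not alias.endswith(tail):
        return False
    body = alias[len(head):len(alias) - len(tail)]
    middle = [p for p in parts[1:-1] if p]
    if not ordered:                            # unordered: presence is enough
        return all(p in body for p in middle)
    pos = 0
    for p in middle:                           # ordered: left to right
        found = body.find(p, pos)
        if found < 0:
            return False
        pos = found + len(p)
    return True
```

Under this convention `field_match("z*", "zals")` is a first-field match and `field_match("*ls", "zals")` a last-field match, while `field_match("*jq*bj*", "hbjjqlhchj")` succeeds only with `ordered=False` because "bj" precedes "jq" in the alias.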
The software offers two search modes, "include subdirectories" and "exclude subdirectories". For example, WIN 95 includes Beethoven's piece "To Alice" (Für Elise), which many domestic discs title "Alice". With the technique of the present invention, keying in als is enough for this file to be selected and displayed automatically, and, if a driver is installed, the piece is played.
Take reading a hard disk as an example: with about 1000 paths on drive C, the time a P II-300 microcomputer spends finding the matching Chinese path names is about 10 to 20% of the mechanical running time of the disk. If data are read more slowly, from a floppy disk, CD, or communication cable, retrieval accounts for an even smaller share of the time. Keyboard input is the slow part: input speed cannot exceed 400 keys per minute, and most people do not reach 250 keys per minute; moreover, more time is spent thinking while typing than striking keys. The present invention removes the need to load a Chinese character input method and makes typing simpler, which saves time and shortens the time a retrieval takes. It is therefore an efficient search method.
This method overcomes the difficulty that, when a target program is running under WINDOWS, Chinese characters are hard to input and Chinese character retrieval is therefore also hard; it also avoids the cumbersome operations needed when the input string mixes Chinese characters, Western text, and symbols. It is highly adaptable and flexible.
Application example 2
For book retrieval, patent retrieval, telephone number retrieval, customs tariff retrieval, and the like, the data structure of the objects is generally fairly simple and can be expressed as a table; each row of the table has several data items that reflect the features of the retrieved object. Book retrieval, patent retrieval, telephone number retrieval, customs tariff retrieval, and so on are therefore all examples of this kind of tabular data query. Take business cards as an example. A business card usually includes items such as name, position, organization, organization address, telephone, fax, and web address; these data are both the objects of retrieval and the basis for retrieval. Typically one or several known items are entered and all the data are to be output. The retrieval system lets the user choose the input items. The distinguishing feature of the present invention is that the Chinese character part of the retrieval can be carried out in the Western-language input state, and the typed content can be abbreviated. To find the business card of a person named "inventor", it is enough to key in "fmz", which is easier than keying in "Fa Mingzhe"; to narrow the search, the pinyin initials of the company name can be keyed into the company column. Conversely, to find "patent and trademark agency" among the business cards, keying *zlsbdl* into the company column finds the business cards of the staff of all companies in this line of business.
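The business card query can be sketched as a search over table rows in Python. The card data, the column names, the `INITIALS` table, and the function `search_cards` are all invented for illustration; the sketch assumes that every entered column must match and that a term matches when it occurs anywhere in the alias of that column.

```python
# Hypothetical miniature pinyin initial table, for illustration only.
INITIALS = {
    "发": "f", "明": "m", "者": "z", "方": "f", "志": "z",
    "专": "z", "利": "l", "商": "s", "标": "b", "代": "d", "理": "l",
    "化": "h", "工": "g", "公": "g", "司": "s",
}

# Invented sample cards.
CARDS = [
    {"name": "发明者", "company": "专利商标代理公司", "phone": "555-0101"},
    {"name": "方明志", "company": "化工公司", "phone": "555-0102"},
    {"name": "Li Ming", "company": "专利商标代理公司", "phone": "555-0103"},
]


def decode(text: str) -> str:
    """Convert a cell into its alias; ASCII letters are lowercased."""
    return "".join(
        INITIALS.get(ch, ch.lower() if ch.isascii() else ch) for ch in text
    )


def search_cards(cards, **criteria):
    """Return the cards whose aliases contain every entered term (AND)."""
    wanted = {col: term.strip("*").lower() for col, term in criteria.items()}
    return [card for card in cards
            if all(term in decode(card[col]) for col, term in wanted.items())]


by_name = search_cards(CARDS, name="fmz")
narrowed = search_cards(CARDS, name="fmz", company="*zlsbdl*")
by_company = search_cards(CARDS, company="*zlsbdl*")
```

In this sample, "fmz" alone matches two names that share the same initials, and adding the company term narrows the result to one card, which mirrors the narrowing step described in the paragraph.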
In summary, the present invention makes Chinese character retrieval easy to learn and easy to use: there is no need to call a Chinese character input method or to memorize complicated Chinese character input rules, and retrieval that mixes Chinese and Western text becomes simpler and faster. The above embodiments further describe the present invention, but the invention is not limited to them.

Claims (16)

1. A Chinese character search method using decoding, characterized by comprising the following steps:
(1) loading the Chinese character retrieval system,
(2) selecting the type of character set,
(3) inputting the search term,
(4) generating a "search character string" by the encoding method,
(5) reading the data to be retrieved,
(6) generating an "alias character string" by the decoding method,
(7) comparing the "search character string" with the "alias character string",
(8) performing the related operations on the retrieved data,
(9) checking whether the data have ended,
(10) entering the next data field,
(11) ending the retrieval.
2. The Chinese character search method using decoding according to claim 1, characterized in that the search term in step (3) can be input by a phonetic input method, that is, only the pinyin initial letter of each character of the search term is taken, which may be either an initial consonant or a simple or compound final; the correspondence between the pinyin letters and the keys of a universal keyboard is as follows:
Phonetic: a o e b p m f d t n l g k h j q x z c s r w y
Keyboard: 01 b p m f d t n l g k h j q x z c s r w y
(Figure A9911384900021)
where z, c, s include zh, ch, sh.
3. The Chinese character search method using decoding according to claim 1, characterized in that the decoding method of step (6) is directed to the following double-byte character code sets for Chinese characters, Korean, Mongolian, Uyghur, and the scripts of other countries and nationalities for which a computer character set has been established: GB 2312 "Chinese Coded Character Set for Information Interchange, Basic Set", GB/T 12345 "Chinese Coded Character Set for Information Interchange, Supplementary Set", the Taiwan CNS 11643 standard, BIG 5 (Big5 code), the CJK character repertoire of ISO/IEC 10646, GB 12052 Korean script, GB 8045 Mongolian script, and GB 12050 Uyghur script; it is especially suitable for Chinese character systems.
4. The Chinese character search method using decoding according to claim 1, characterized in that the selection of the character set type can judge the type of character set automatically and match it, forming an adaptive system.
5. The Chinese character search method using decoding according to claim 1, characterized in that the Chinese character search method depends only on the computer's internal code and is independent of the external code; that is, whichever input method was used to enter the Chinese characters, the decoding method of step (6) can decode them, for example: the double-pinyin input method, the full-pinyin input method, the phonetic input method, the five-stroke input method, the Zheng code input method, the Wang code input method, and the pinyin initial input method.
6. The Chinese character search method using decoding according to claim 1 or 2, characterized in that abbreviated keys and abbreviated codes can be used when inputting the search term in step (3).
7. The Chinese character search method using decoding according to claim 1, characterized in that in step (3) Chinese, Western text, symbols, and mixed combinations of them can be input together when inputting the search term.
8. The Chinese character search method using decoding according to claim 1, characterized in that after step (6) generates the "alias character string" by the decoding method, this "alias character string" can be stored.
9. The Chinese character search method using decoding according to claim 1, characterized in that the decoding method of step (6) can be combined with logical operations, the most common being AND.
10. The Chinese character search method using decoding according to claim 1, characterized in that the decoding method of step (6) can require a search field to be a "full match", a "first-field match", a "middle-field match", or a "last-field match".
11. The Chinese character search method using decoding according to claim 1, characterized in that the data retrieved in step (3) can be: computer file directories, file names, file contents, and keywords; tabular data in forms and databases, such as personal names, place names, organization names, goods names, and abstracts; network service settings containing Chinese characters or other double-byte scripts; titles of books, patents, and documents, telephone numbers, and customs tariffs; and information stored in household appliances and smart instruments that contain Chinese characters or other double-byte scripts, in particular large-scale program information stored on optical discs.
12. The Chinese character search method using decoding according to claim 1, characterized in that the Chinese character search method can call the Chinese character input method of the original Chinese platform during retrieval.
13. The Chinese character search method using decoding according to claim 4, characterized in that the Chinese character search method does not include fixed and extensible character libraries.
14. The Chinese character search method using decoding according to claim 10, characterized in that the "middle-field match" can be divided into "ordered matching" and "unordered matching".
15. The Chinese character search method using decoding according to claim 11, characterized in that the Chinese character retrieval system can be carried on disks, CDs, or other external recording media for use in electronic computers or computerized smart devices, and can also be burned into ROM or EPROM semiconductor devices for use in various electrical appliances and telecommunication facilities.
16. The Chinese character search method using decoding according to claim 1, characterized in that step (9), after the search is complete, can display or print the retrieved data, or render the matched part in a different color or font, and can also perform operations on the related data or execute a program.
CN 99113849 1999-07-06 1999-07-06 Chinese character search method using decoding Expired - Fee Related CN1116647C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 99113849 CN1116647C (en) 1999-07-06 1999-07-06 Chinese character search method using decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 99113849 CN1116647C (en) 1999-07-06 1999-07-06 Chinese character search method using decoding

Publications (2)

Publication Number Publication Date
CN1248024A true CN1248024A (en) 2000-03-22
CN1116647C CN1116647C (en) 2003-07-30

Family

ID=5276995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 99113849 Expired - Fee Related CN1116647C (en) 1999-07-06 1999-07-06 Chinese character search method using decoding

Country Status (1)

Country Link
CN (1) CN1116647C (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100429648C (en) * 2003-05-28 2008-10-29 洛昆多股份公司 Automatic segmentation of texts comprising chunsk without separators
CN101201829B (en) * 2006-12-15 2011-06-15 英业达股份有限公司 Chinese character library system as well as character code display method thereof
CN103649944A (en) * 2012-06-27 2014-03-19 株式会社新明系统 Device and method for inputting Chinese words
CN103649944B (en) * 2012-06-27 2016-05-04 株式会社新明系统 Chinese word input unit and method
CN105426389A (en) * 2015-10-26 2016-03-23 武汉微创光电股份有限公司 Fuzzy retrieval locating method based on UI directory tree view
CN113760246A (en) * 2021-09-06 2021-12-07 网易(杭州)网络有限公司 Application program text language processing method and device, electronic equipment and storage medium
CN113760246B (en) * 2021-09-06 2023-08-11 网易(杭州)网络有限公司 Application text language processing method and device, electronic equipment and storage medium
CN115391495A (en) * 2022-10-28 2022-11-25 强企宝典(山东)信息科技有限公司 Method, device and equipment for searching keywords in Chinese context

Also Published As

Publication number Publication date
CN1116647C (en) 2003-07-30

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee