CN1719390A - Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval - Google Patents
- Publication number
- CN1719390A
- Authority
- CN
- China
- Prior art keywords
- chinese character
- character
- full
- information
- searching
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The present invention provides a method for searching for and entering rare Chinese characters by running a full-text search over the full information of Chinese characters. The full information of a Chinese character includes its glyph shape, pronunciation, meaning, stroke count, codes, and so on.
Description
Technical field
The invention belongs to the technical field of computer Chinese character input methods, and specifically relates to a general search-and-entry method that uses the full information of Chinese characters to obtain rare Chinese characters by full-text search.
Background technology
China is one of the world's four great ancient civilizations, with a continuous cultural heritage built up over five thousand years of history, and Chinese characters are the basic element through which that civilization is passed on. After thousands of years of invention and development, roughly 70,000 Chinese characters have been handed down, of which about three to four thousand are in common use. Years of research and practice have solved the problem of entering common characters on a computer reasonably well, but no good solution exists for entering the large number of rare characters. Even well-educated people seldom know or use these characters. Their user base is narrow, and because current computers still cannot handle them, rare characters are essentially "dead characters". In fact, the international character encoding standard Unicode has already placed about 50,000 rare Chinese characters in its four-byte extension area to overcome the shortage of code points in two-byte encodings, and Microsoft began pre-installing a very large character set of nearly 70,000 Chinese characters in the common software MS Word many years ago, yet neither development has attracted attention in China. An overlooked fact is that rare characters appear in large numbers in ancient Chinese books and records. Together with the common characters they make up "Chinese characters" as the carrier of Chinese civilization, and they are themselves part of the nation's precious cultural heritage. In ordinary ancient books, about one character in a thousand is a rare character with a four-byte code, and in dictionary-type ancient books such as the Shuowen Jiezi ("Origin of Chinese Characters") and the Kangxi Dictionary, four-byte characters can account for about three to five percent of the text. Clearly, a Chinese knowledge base built without these rare characters is incomplete.
At present the mainstream Chinese input methods fall into two kinds, input by pronunciation and input by glyph shape, and both have shortcomings for rare characters. First, most people do not recognize rare characters at all, and many of these characters have lost their pronunciation over the course of history or have an uncertain pronunciation, so entering rare characters by pronunciation alone does not work. Second, although shape-based entry does not suffer from that problem, existing shape-based input methods such as Wubi (the "five-stroke" method) usually require long prior study and training, which is why they have not become dominant among non-typists. Ordinary users, such as researchers who need to work with many rare characters, cannot pick these methods up quickly.
For the 3,000 to 10,000 or so common Chinese characters, the input methods that Chinese users have relied on for 23 years leave little room for a major breakthrough. For 70,000 characters, however, and for the still larger number of characters or symbols that will need processing in the future, ordinary input methods show many drawbacks. Faced with a character set this large and unwieldy, a conceptually new input method is urgently needed.
Summary of the invention
The present invention addresses the shortcomings of existing input methods for rare Chinese characters. It starts from three observations: people generally have incomplete information about any given rare character, rare characters are difficult to enter, and most rare characters are combinations of simple characters. It proposes a method for entering rare characters that uses the full information of Chinese characters, that is, the objective features of each character: glyph shape, pronunciation (if any), meaning (if any), stroke count, and common codes (such as the Wubi code that every rare character is certain to have). Over the course of history these features have settled into fixed standards. By collecting them and then using full-text search to find every character that matches the features the user enters, the method completes the entry of the rare character. It meets the needs of the specialized group of users who research and enter rare characters.
The general search-and-entry method of the present invention, which uses the full information of Chinese characters to obtain rare Chinese characters by full-text search, comprises the following steps:
(1) the rare Chinese characters are collated to obtain the rare-character set that the input method needs to handle;
(2) the full information of this character set is compiled from the objective features of each character, and a database is built;
(3) when entering a character, the user describes the desired rare character in pinyin or digits, and a full-text search is run over the full information of every character in the rare-character set;
(4) the search results are sorted by the numerical stroke information of the characters and output.
These steps implement general search-and-entry of rare Chinese characters: the user only needs to pick the desired character from the sorted search results.
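The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Entry` type, the `search` function, the placeholder labels `char_A` to `char_C`, and all of the data are hypothetical, and plain substring matching stands in for whatever full-text search engine an implementation would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    char: str      # placeholder label standing in for a rare character
    info: str      # full information flattened to pinyin/digit tokens
    strokes: int   # total stroke count, used to order the results


# Steps (1)-(2): a toy database. Every entry here is invented for illustration.
DATABASE = [
    Entry("char_A", "wei mu kou 8", 8),
    Entry("char_B", "wei shui 5", 5),
    Entry("char_C", "li mu 12", 12),
]


def search(query, database=DATABASE):
    """Step (3): keep entries whose info contains every query term.
    Step (4): order the hits by stroke count."""
    terms = query.lower().split()
    hits = [e for e in database if all(t in e.info for t in terms)]
    hits.sort(key=lambda e: e.strokes)
    return [e.char for e in hits]


print(search("wei"))     # candidates ordered by stroke count
print(search("wei mu"))  # a second term narrows the candidates
```

The user would then pick the wanted character from the returned list, as the text describes.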
Characteristics of the present invention:
It provides a retrieval-based input method for rare Chinese characters, designed around the characteristics of the people who use such characters;
It uses all the information available about a rare character, including glyph shape, meaning, pronunciation, and stroke count, and completes entry of the character by full-text search over that information;
It is compatible with simpler input methods, such as pinyin input or the Wubi ("five-stroke") method.
Description of drawings
The present invention is described in further detail below with reference to the accompanying drawing:
Fig. 1 is a flow chart of the general search-and-entry method for rare Chinese characters of the present invention.
Embodiment
The preferred embodiment of the present invention is described in more detail below with reference to the accompanying drawing.
Fig. 1 shows the flow chart of the general search-and-entry method for rare Chinese characters of the present invention. The method comprises the following steps:
(1) The rare Chinese characters are collated to obtain the rare-character set that the input method needs to handle. The rare-character set can be managed with Unicode. The rare characters referred to here are mainly Chinese characters outside the basic set of the national standard GB 2312-80 (Chinese character set code for information interchange).
(2) The full information of this character set is compiled from the objective features of each character, and a database is built. The information can be represented in pinyin or digits, or in some other coded form. The full information referred to here includes meaning, glyph shape, strokes, pronunciation, and common codes.
Compiling the full information means collecting the meaning, glyph shape, stroke, pronunciation, and common-code information.
To collect meaning information, we used the relevant entries in ancient dictionaries such as the Shuowen Jiezi and the Kangxi Dictionary to compile the meaning of each character in the rare-character set one by one;
Pronunciation information includes not only the pinyin pronunciation of the rare character (if any) but also its ancient pronunciations (if any), such as its archaic and ancient Chinese readings;
To collect shape information, the total stroke count of each rare character and its stroke count outside the radical (if any) are compiled, and each rare character is also decomposed by shape and described in text. Most rare characters are formed by combining common simple characters, or by adding strokes to or removing strokes from a common simple character, so they can be described in terms of those simple characters. A minority of rare characters are non-composite characters with few strokes. For these, the basic strokes, for example dot, horizontal, left-falling, right-falling, and fold, together with the stroke count, solve the problem of entering few-stroke, non-composite rare characters.
Some commonly used public codes, such as the Wubi code of the rare character, are also included in the full information, so that a user familiar with the Wubi input method can enter a rare character by typing its Wubi code.
All of this information is represented in pinyin or digits and entered in the full-information database.
(3) When entering a character, the user describes the desired rare character in pinyin or digits, and a full-text search is run over the full information of every character in the rare-character set.
(4) The search results are sorted by the numerical stroke information of the characters and output.
These steps implement general search-and-entry of rare Chinese characters: the user only needs to pick the desired character from the sorted search results.
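Step (2) describes few-stroke, non-composite characters by their basic strokes plus a stroke count. A minimal sketch of that description might look as follows. The single-letter codes (d, h, s, p, n, z, y) follow the stroke abbreviations used in this description's worked example; the function names and the sample stroke sequence are hypothetical.

```python
# Basic strokes keyed by pinyin name, mapped to one-letter codes.
STROKE_CODES = {
    "dian": "d",  # dot
    "heng": "h",  # horizontal
    "shu": "s",   # vertical
    "pie": "p",   # left-falling
    "na": "n",    # right-falling
    "zhe": "z",   # fold, folding downward
    "yi": "y",    # fold, folding upward
}


def encode_strokes(stroke_names):
    """Turn a sequence of stroke names into a compact letter string."""
    return "".join(STROKE_CODES[name] for name in stroke_names)


def stroke_query(stroke_names):
    """Combine the stroke letters with the stroke count, separated by a space."""
    return f"{encode_strokes(stroke_names)} {len(stroke_names)}"


print(stroke_query(["heng", "pie", "na"]))  # prints "hpn 3"
```

A string such as this could be stored in the shape field of a record or typed as part of a query.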
The full-information input method for rare Chinese characters of the present invention mainly solves the problem of computer entry of Chinese characters outside the basic set of the national standard GB 2312-80 (Chinese character set code for information interchange). In the course of its evolution every Chinese character has acquired a fixed shape and meaning, and most characters also have one or more fixed pronunciations. These fixed features make up the full information of a Chinese character. For a rare character, the user's knowledge of these objective features is incomplete, and the advantage of the full-information input method is that the user can make the fullest use of whatever items of information he or she knows about the character. The user only needs to enter any information known about the character to complete its entry. The more information is entered, the fewer candidate characters remain.
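The scope stated in this paragraph, characters outside GB 2312-80, can be tested with Python's built-in `gb2312` codec, and the four-byte characters mentioned in the background can be detected from their UTF-16 length. This is a sketch only: the function names are hypothetical, and treating every character outside GB 2312-80 as "rare" is the rough working definition used in this description, not a linguistic judgment.

```python
def in_gb2312(ch):
    """True if the character can be encoded in GB 2312."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


def needs_four_bytes(ch):
    """True if the character lies outside the Basic Multilingual Plane,
    so that UTF-16 needs a surrogate pair (four bytes) to encode it."""
    return len(ch.encode("utf-16-le")) == 4


def is_rare(ch):
    """Working definition from the text: a Chinese character outside GB 2312-80."""
    return not in_gb2312(ch)


for ch in ["\u4e2d", "\u3400", "\U00020000"]:
    print(f"U+{ord(ch):04X}", is_rare(ch), needs_four_bytes(ch))
```

The three sample code points are a common character (U+4E2D), a character from CJK Extension A (U+3400), and one from CJK Extension B (U+20000).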
The implementation of the method is further illustrated below with the search-and-entry of some specific rare characters:
For example, let the structure of the full-information table be: {head character: pronunciation; meaning; glyph shape; stroke count; Wubi code}.
The full information of the following rare Chinese characters is then, respectively:
{[character glyph not reproduced]: ?; ?; "ask" + "heart"; 11; fiyn} (The pronunciation is unknown and the meaning is unknown.)
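One way to hold such a record in code is sketched below. The `FullInfo` class and its field names are hypothetical. The shape value "wen xin" is an assumed pinyin rendering of the "ask" and "heart" components, and the head character is replaced by a placeholder because its glyph is not reproduced here. Unknown pronunciation and meaning are stored as `None` and simply left out of the searchable text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FullInfo:
    char: str                # head character (placeholder below)
    sound: Optional[str]     # pronunciation in pinyin, None if unknown
    meaning: Optional[str]   # meaning in pinyin, None if unknown
    shape: str               # shape description in pinyin
    strokes: int             # total stroke count
    wubi: str                # Wubi code

    def index_text(self):
        """Flatten the known fields into one space-separated string
        that a full-text search can run over."""
        parts = [self.sound, self.meaning, self.shape, str(self.strokes), self.wubi]
        return " ".join(p for p in parts if p)


# The record from the text, with an assumed pinyin shape description.
record = FullInfo(char="?", sound=None, meaning=None,
                  shape="wen xin", strokes=11, wubi="fiyn")
print(record.index_text())  # prints "wen xin 11 fiyn"
```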
Before a search engine processes the data, the full information must be simplified. This covers simplifying the content and abbreviating the basic strokes: dot (dian = d), horizontal (heng = h), vertical (shu = s), left-falling (pie = p), right-falling (na = n), fold (zhe = z, folding downward), and yi (y, folding upward). The result is then represented in pinyin or digits and entered in the database. After this processing, the full information above becomes:
When entering a character, the user can type one or more items of information about it, depending on how much he or she knows. The program takes this input, runs a full-text search over the full information of every character in the rare-character set, finds all rare characters that match, and outputs the results sorted by stroke information. The user then picks the required character from the results, which completes the entry. Because full-text search is used, the items the user types need not follow any particular order, and the shape information may cover only part of the character's shape or a shape that merely resembles it, so the method is very flexible.
For example, to enter a character from the example above, the user can type its pronunciation "wei", or its meaning "wei", or its stroke count "8", or all or part of its shape description, such as "san", "zhua san", or "zhua zhe san", or its Wubi code "endf" or "end", or any combination of these, such as pronunciation + shape "wei zhua zhe san" or stroke count + Wubi code + meaning "8 endf wei". Combined items are separated by spaces. Of course, the more information the user enters, the fewer candidate characters remain.
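The example queries can be checked against a single record with a few lines of Python. This is a sketch under assumptions: the token list is built from the values quoted in the text (pronunciation and meaning "wei", shape "zhua zhe san", stroke count 8, Wubi code "endf"), combined queries are written with a space between every item, and matching each query term as a prefix of some stored token is one simple stand-in for full-text search, chosen here so that the partial code "end" also matches.

```python
# Tokens of the example record: sound, meaning, shape parts, stroke count, Wubi code.
RECORD_TOKENS = ["wei", "wei", "zhua", "zhe", "san", "8", "endf"]


def matches(query, tokens=RECORD_TOKENS):
    """True if every space-separated query term is a prefix of some token.
    The order of the terms does not matter."""
    terms = query.lower().split()
    return all(any(tok.startswith(term) for tok in tokens) for term in terms)


QUERIES = [
    "wei", "8", "san", "zhua san", "zhua zhe san",
    "endf", "end", "wei zhua zhe san", "8 endf wei",
]

for q in QUERIES:
    print(q, matches(q))

print(matches("endf 8 wei"))  # same items in a different order
print(matches("9 wei"))       # a wrong stroke count rules the record out
```

Every query from the text matches, reordering the items makes no difference, and a wrong item excludes the record, which is how extra information narrows the candidate list.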
Advantage of the present invention and technique effect:
Most users of rare Chinese characters are researchers, who generally have not mastered shape-based input methods such as Wubi. Entering characters by their full information lets the user exploit as much information as possible and complete the entry of a rare character by retrieval. The advantages of the method are that it requires no study and no radical decomposition, and that the character features the user types need not follow any particular order, so the method is very flexible.
Although specific embodiments of the invention and the accompanying drawing are disclosed for purposes of illustration, to help the reader understand and implement the invention, those skilled in the art will appreciate that various substitutions, changes, and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawing.
Claims (5)
1. A search-and-entry method that uses the full information of Chinese characters to obtain rare characters by full-text search, comprising the following steps:
1) rare Chinese characters are collated to obtain the rare-character set that the input method needs to handle;
2) the full information of this character set is compiled from the objective features of each character, and a database is built;
3) when entering a character, the user describes the desired rare character in pinyin or digits, and a full-text search is run over the full information of every character in the rare-character set;
4) the search results are sorted by the numerical stroke information of the characters and output.
2. The search-and-entry method of claim 1, which uses the full information of Chinese characters to obtain rare characters by full-text search, characterized in that the rare-character set is managed with Unicode.
3. The search-and-entry method of claim 1, which uses the full information of Chinese characters to obtain rare characters by full-text search, characterized in that the rare characters are mainly Chinese characters outside the basic set of the national standard GB 2312-80 (Chinese character set code for information interchange).
4. The search-and-entry method of claim 1, which uses the full information of Chinese characters to obtain rare characters by full-text search, characterized in that the full information of a Chinese character comprises meaning, glyph shape, stroke, pronunciation, and common-code information.
5. The search-and-entry method of claim 1, which uses the full information of Chinese characters to obtain rare characters by full-text search, characterized in that, further, the user only needs to pick the desired character from the sorted search results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200510012219 CN1719390A (en) | 2005-07-18 | 2005-07-18 | Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200510012219 CN1719390A (en) | 2005-07-18 | 2005-07-18 | Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1719390A true CN1719390A (en) | 2006-01-11 |
Family
ID=35931237
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200510012219 Pending CN1719390A (en) | 2005-07-18 | 2005-07-18 | Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1719390A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102566769A (en) * | 2010-12-13 | 2012-07-11 | 腾讯科技(深圳)有限公司 | Chinese character input method and Chinese character input system |
CN102736741A (en) * | 2011-04-12 | 2012-10-17 | 腾讯科技(深圳)有限公司 | Pinyin input method and system of Chinese characters |
CN105069171A (en) * | 2015-08-31 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | Chinese character query method and system |
CN105425976A (en) * | 2015-06-11 | 2016-03-23 | 周连惠 | Rarely-used Chinese character input method |
CN110413810A (en) * | 2019-07-31 | 2019-11-05 | 中国工商银行股份有限公司 | Uncommon word processing method and system |
- 2005-07-18: CN application CN 200510012219 (CN1719390A), active, Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102566769A (en) * | 2010-12-13 | 2012-07-11 | 腾讯科技(深圳)有限公司 | Chinese character input method and Chinese character input system |
CN102566769B (en) * | 2010-12-13 | 2015-11-25 | 深圳市世纪光速信息技术有限公司 | Chinese character input method and system |
CN102736741A (en) * | 2011-04-12 | 2012-10-17 | 腾讯科技(深圳)有限公司 | Pinyin input method and system of Chinese characters |
CN105425976A (en) * | 2015-06-11 | 2016-03-23 | 周连惠 | Rarely-used Chinese character input method |
CN105069171A (en) * | 2015-08-31 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | Chinese character query method and system |
CN105069171B (en) * | 2015-08-31 | 2018-07-13 | 百度在线网络技术(北京)有限公司 | Chinese character inquiry method and system |
CN110413810A (en) * | 2019-07-31 | 2019-11-05 | 中国工商银行股份有限公司 | Uncommon word processing method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1719390A (en) | Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval | |
CN1841281A (en) | Chinese phonetic transcription input method using digital code for mobile phone | |
CN1510554A (en) | Embedded applied Chinese character inputting method | |
CN87100555A (en) | Double stroke-order Chinese character input scheme of computer and keyboard thereof | |
CN1177271C (en) | Four-stroke number code input method for characters and words and without duplication code and its keyboard | |
CN1306376C (en) | Treatng and input method for graphics of Naxi figure and character | |
CN1275127C (en) | Chinese characters input method according to stroke sequence and keyboard thereof | |
CN1208712C (en) | <<Chinese character structure> input method> | |
CN1050915C (en) | Indication method for computer inputting Chinese characters | |
CN1195262C (en) | Method for inputting Chinese characters by numeral keys | |
CN1029046C (en) | Chinese character radicals and strokes input method | |
CN1673935A (en) | Jiaguwen (inscriptions on bones or tortoise shells of the Shang Dynasty) computer inputting method | |
CN2476059Y (en) | Keyboard for Jiang code input method | |
CN1043381C (en) | Four-stroke digit look-up method for Chinese characters | |
CN1425975A (en) | Stroke digital Chinese character input method | |
CN100384196C (en) | Handset Chinese <<I type>> input method | |
CN1100288C (en) | Four-stroke sequential syllable Chinese character coding method | |
CN1811779A (en) | Retrieve method for using unicode super-large-scale character set containing four bits in system containing personal name and place name | |
CN1445644A (en) | Digitalization method to express Chinese characters and its keyboard | |
CN1178344A (en) | Four tone inputting method for Chinese characters | |
CN1165996A (en) | Free Chinese character input method and its keyboard | |
CN101556505A (en) | Chinese character input method, small-sized numerical keyboard and Chinese character input system | |
CN1259696A (en) | Six key digital Chinese character coding input method and keyboard | |
CN1095502A (en) | Character spectrum Chinese character coding method (Yan Di and Huang Di, two legendary rulers of remote antiquity's sign indicating number) and keyboard thereof | |
CN1202044A (en) | Method for coding and checking Chinese characters by ten-stroke order |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |