CN1719390A - Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval

Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval

Info

Publication number
CN1719390A
Authority
CN
China
Prior art keywords
chinese character
character
full
information
searching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200510012219
Other languages
Chinese (zh)
Inventor
钱则侃
王宏源
赵锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN 200510012219 priority Critical patent/CN1719390A/en
Publication of CN1719390A publication Critical patent/CN1719390A/en
Pending legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method for searching for and entering rare Chinese characters by running a full-text search over the full information of Chinese characters. The full information of a character includes its glyph shape, pronunciation, meaning, stroke count, input codes, and similar features.

Description

Character search and entry method for obtaining rare characters by full-text search over Chinese character full information
Technical field
The invention belongs to the technical field of computer input methods for Chinese characters, and specifically relates to a general character search and entry method that uses the full information of Chinese characters to obtain rare Chinese characters by full-text search.
Background technology
China is one of the world's four great ancient civilizations, with a continuous cultural heritage built over 5,000 years, and Chinese characters are the basic element through which that civilization is passed on. After thousands of years of invention and development, about 70,000 characters have been handed down, of which some three to four thousand are in common use. Years of research and practice have reasonably solved the computer entry of common characters, but the entry of the large number of rare characters has never been solved well. Even among well-educated people, few recognize or use these characters. Their user base is narrow, and because present computers still cannot handle them properly, rare characters are essentially "dead characters". In fact, the international character coding standard Unicode has placed the codes of about 50,000 rare Chinese characters in the four-byte supplementary plane area to overcome the shortage of code positions in two-byte coding, and Microsoft many years ago preinstalled in the common software MS Word a very large character set totaling nearly 70,000 Chinese characters, yet none of this has attracted public attention. An overlooked fact is that these rare characters occur in large numbers in China's ancient books and records. Together with the common characters they make up "Chinese characters" as the carrier of Chinese civilization, and they are themselves part of the nation's precious cultural heritage. In ordinary ancient books, about one character in a thousand is a four-byte rare character, and in dictionary-type ancient books such as the Shuowen Jiezi and the Kangxi Dictionary, four-byte characters can account for about three to five percent of the text. Clearly, a Chinese knowledge base built without these rare characters is incomplete.
At present, the main Chinese input methods fall into just two kinds: input by pronunciation and input by glyph shape. For rare characters, both have shortcomings. First, for phonetic input: most people do not recognize rare characters at all, and many characters have lost their pronunciation in the course of history or have an uncertain pronunciation, so entering rare characters by pronunciation alone does not work. Second, shape-based entry does not share the problems of phonetic input, but existing shape-based methods such as the Wubi ("Five-stroke") input method usually require long prior study and training. This restriction means that shape-based methods currently have little reach among people who are not professional typists. Ordinary users, such as researchers who need to use many rare characters, cannot master such a method quickly.
For the 3,000 to 10,000 or so common characters, the input methods that Chinese users have relied on for 23 years leave little room for major breakthroughs. But for handling 70,000 characters, and the still larger number of characters or symbols that will need processing in future, ordinary input methods show many drawbacks and deficiencies. Given that such a large and disordered body of characters objectively exists, a conceptually new input method is urgently needed.
Summary of the invention
The invention addresses the deficiencies of current input methods for rare Chinese characters. It starts from the fact that people generally have only incomplete knowledge of a rare character, and from the observation that most rare characters are composed of simple Chinese characters. It proposes a method of entering rare characters that uses the full information of Chinese characters, that is, the objective features belonging to each character: glyph shape, pronunciation (if any), meaning (if any), stroke count, and common codes (for example the Wubi code, which necessarily exists for every rare character). Because these features became fixed standards in the course of historical evolution, they can be collected in advance. Full-text search is then used to find every character that matches the features the user enters, which completes the entry of the rare character. The method meets the needs of the specialized group of users who research and enter rare characters.
The general character search and entry method according to the invention, which uses Chinese character full information to obtain rare characters by full-text search, comprises the following steps:
(1) Collate the rare characters to obtain the rare character set that the input method needs to handle;
(2) Collate the full information of this rare character set according to the objective features of each character, and build a database;
(3) When entering input, the user describes the rare character to be entered in pinyin or numeric form, and full-text search is run over the full information of each character in the rare character set;
(4) Sort the search results using the numeric stroke information of the characters, and output them.
These steps accomplish general search-based entry of rare characters: the user only needs to pick the desired character from the sorted search results.
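The four steps above can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the implementation described in the patent: the record layout, the function names `build_database` and `search`, and the labels `char_A`, `char_B`, `char_C` (stand-ins for rare characters that cannot be reproduced here) are all hypothetical.

```python
# Hypothetical records: (label, searchable description, stroke count).
# The labels stand in for rare characters that cannot be typed here.
RECORDS = [
    ("char_A", "wei wei pie p zhe z zhe z heng h shu s jiu 12 wnnv", 12),
    ("char_B", "wei wei zhua zhe z san 8 endf end", 8),
    ("char_C", "qiu xin 11 fiyn", 11),
]


def build_database(records):
    """Steps 1-2: take the collated character set and index its full information."""
    return [
        {"char": label, "tokens": text.split(), "strokes": strokes}
        for label, text, strokes in records
    ]


def search(database, query):
    """Steps 3-4: match every query word, then sort the hits by stroke count."""
    wanted = query.lower().split()
    hits = [
        entry for entry in database
        if all(word in entry["tokens"] for word in wanted)
    ]
    return [entry["char"] for entry in sorted(hits, key=lambda e: e["strokes"])]


db = build_database(RECORDS)
print(search(db, "wei"))     # every entry described with "wei", fewest strokes first
print(search(db, "wei 12"))  # adding the stroke count narrows the candidates
```

The user would then pick the wanted character from the returned list, which corresponds to the final selection step.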
Characteristics of the present invention:
It provides a rare-character input method with character retrieval, designed around the characteristics of the people who use rare characters;
It uses all the information available for a rare character, including glyph shape, meaning, pronunciation and stroke count, in a full-text search, and completes the entry of the rare character through that search;
It is compatible with simple input methods such as pinyin input or the Wubi ("Five-stroke") input method.
Description of drawings
The invention is described in further detail below with reference to the accompanying drawing:
Fig. 1 is a flow diagram of the general character search and entry method for rare Chinese characters according to the invention.
Embodiment
The preferred embodiment of the invention is described in more detail below with reference to the accompanying drawing.
Fig. 1 shows the flow diagram of the general character search and entry method for rare Chinese characters. The method comprises the following steps:
(1) Collate the rare characters to obtain the rare character set that the input method needs to handle. The rare character set can be managed using Unicode. Rare characters here mainly means characters outside the basic set of national standard GB2312-80 "Chinese Character Set Code for Informati".
(2) Collate the full information of this rare character set according to the objective features of each character, and build a database. The information can be expressed in pinyin or numeric form, or in other coded forms. The full information referred to here includes meaning, glyph shape, strokes, pronunciation and common codes.
Collating the full information involves collecting the meaning, glyph shape, stroke, pronunciation and common-code information.
For meaning information, the relevant entries in ancient dictionaries such as the Shuowen Jiezi and the Kangxi Dictionary were used to collate the meaning of each character in the set, one by one;
For pronunciation information, the collection covers not only the pinyin reading of a rare character (if any) but also historical readings such as its Old Chinese and Middle Chinese pronunciations (if any);
For glyph-shape information, the total stroke count of each rare character and the stroke count excluding its radical (if any) can be collated, and each rare character is also decomposed by shape and described in text. Most rare characters are formed by combining common simple characters, or are obtained from a common simple character by adding or removing strokes, so they can be described in terms of those common simple characters. A minority of rare characters are non-composite characters with few strokes. For these, the basic strokes (for example dot, horizontal, left-falling, right-falling and turn) are used together with the stroke count, which solves the entry problem for few-stroke non-composite rare characters.
Commonly used codes, such as the Wubi code of a rare character, are also included in the full information, so a user familiar with the Wubi input method can also enter a rare character by typing its Wubi code.
All of this information is expressed in pinyin or numeric form and entered into the full-information database.
(3) When entering input, the user describes the rare character to be entered in pinyin or numeric form, and full-text search is run over the full information of each character in the rare character set.
(4) Sort the search results using the numeric stroke information of the characters, and output them.
These steps accomplish general search-based entry of rare characters: the user only needs to pick the desired character from the sorted search results.
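One possible layout for the full-information database of step (2) is sketched below using Python's standard `sqlite3` module. The table name `full_info`, the column names, the `CharInfo` class and the placeholder labels are assumptions made for illustration; the text only says that the information is stored in pinyin or numeric form, and does not specify a schema or a database engine.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CharInfo:
    head: str      # the character itself (placeholder labels in this sketch)
    sound: str     # pinyin without tone marks, "" if unknown
    meaning: str   # meaning written in pinyin, "" if unknown
    shape: str     # component and stroke description in pinyin
    strokes: int   # total stroke count
    codes: str     # common codes such as Wubi, space separated


ROWS = [
    CharInfo("char_A", "wei", "wei", "pie p zhe z zhe z heng h shu s jiu", 12, "wnnv"),
    CharInfo("char_B", "wei", "wei", "zhua zhe z san", 8, "endf end"),
    CharInfo("char_C", "", "", "qiu xin", 11, "fiyn"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE full_info ("
    "head TEXT PRIMARY KEY, sound TEXT, meaning TEXT, "
    "shape TEXT, strokes INTEGER, codes TEXT)"
)
conn.executemany(
    "INSERT INTO full_info VALUES (?, ?, ?, ?, ?, ?)",
    [(r.head, r.sound, r.meaning, r.shape, r.strokes, r.codes) for r in ROWS],
)

# Characters whose pronunciation is unknown remain reachable through other fields.
unknown_sound = conn.execute(
    "SELECT head, shape FROM full_info WHERE sound = '' ORDER BY strokes"
).fetchall()
print(unknown_sound)
```

Keeping empty strings for unknown pronunciation or meaning reflects the "(if any)" qualifiers in the text: a missing field does not prevent the character from being stored or found.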
The rare-character input method of the invention, based on Chinese character full information, mainly solves the computer entry of characters outside the basic set of national standard GB2312-80 "Chinese Character Set Code for Informati". In the course of its evolution, each Chinese character has acquired a fixed glyph shape and meaning, and most characters also have one or more fixed pronunciations. These fixed features make up the full information of the character. For a rare character, the user's knowledge of these objective features is incomplete, and the advantage of the full-information method is precisely that the user can make the fullest use of whatever one or more pieces of information he or she knows about the character in order to enter it. The user only needs to enter any information known about the character to complete its entry. The more information is entered, the fewer candidate characters remain.
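Whether a given character falls within the scope described above, that is, outside the GB2312-80 basic set, can be checked mechanically. The sketch below relies on Python's built-in `gb2312` codec and on code point values; the helper names `in_gb2312` and `needs_four_bytes` are illustrative and do not come from the patent.

```python
def in_gb2312(ch: str) -> bool:
    """Return True if the character can be encoded with the GB2312 codec."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False


def needs_four_bytes(ch: str) -> bool:
    """Return True if the character lies outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


# A common character, a character known to be absent from GB2312,
# and the first code point of CJK Unified Ideographs Extension B.
for ch in ["中", "镕", "\U00020000"]:
    print(
        f"U+{ord(ch):04X}",
        "in GB2312" if in_gb2312(ch) else "outside GB2312",
        "four-byte" if needs_four_bytes(ch) else "two-byte range",
    )
```

Characters above U+FFFF take four bytes in both UTF-8 and UTF-16, which is what is meant by four-byte rare characters in this context.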
The implementation of the method is further illustrated below through the search-based entry of some specific rare characters.
Suppose, for example, that the structure of the Chinese character full-information table is: {head character: pronunciation; meaning; glyph shape; stroke count; Wubi code}.
The full information of the following rare characters is then, respectively:
{[Figure A20051001221900061]: wéi; ancient form of the character wei (为); 撇折折横竖臼 (left-falling stroke, turn, turn, horizontal, vertical, "mortar" component); 12; wnnv}
{[Figure A20051001221900062]: wéi; ancient form of the character wei (为); 爪三 ("claw" component, "three" component); 8; endf, end}
{[Figure A20051001221900063]: ?; ?; 求心 ("seek" component, "heart" component); 11; fiyn} (Pronunciation unknown, meaning unknown.)
When a search engine is used for the actual processing, the full information must first be simplified. This includes simplifying the content and abbreviating the basic strokes: dot (d), horizontal (h), vertical (shu = s), left-falling (pie = p), right-falling (n), turn (z, turning downward) and yi (y, turning upward). The information is then expressed in pinyin or numeric form and entered into the database. After this processing, the full information above becomes:
{[Figure A20051001221900064]: wei; wei; pie p zhe z zhe z heng h shu s jiu; 12; wnnv}
{[Figure A20051001221900065]: wei; wei; zhua zhe z san; 8; endf, end}
{[Figure A20051001221900066]: ?; ?; qiu xin; 11; fiyn} (Pronunciation unknown, meaning unknown.)
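The stroke abbreviation step can be illustrated with a small helper that appends the one-letter code after each basic stroke name in a pinyin shape description. The letter codes follow the list given in the text. The pinyin spellings `dian`, `na` and `yi` for dot, right-falling and the upward turn are assumptions (the text spells out only `pie`, `shu`, `heng` and `zhe`), and the function name `simplify_shape` is hypothetical.

```python
# Letter codes for the basic strokes, as listed in the text.
# The pinyin keys "dian", "na" and "yi" are assumed spellings.
STROKE_CODES = {
    "dian": "d",  # dot
    "heng": "h",  # horizontal
    "shu": "s",   # vertical
    "pie": "p",   # left-falling
    "na": "n",    # right-falling
    "zhe": "z",   # turn, downward
    "yi": "y",    # turn, upward
}


def simplify_shape(description: str) -> str:
    """Append the one-letter code after each basic stroke name."""
    out = []
    for word in description.lower().split():
        out.append(word)
        code = STROKE_CODES.get(word)
        if code is not None:
            out.append(code)
    return " ".join(out)


print(simplify_shape("pie zhe zhe heng shu jiu"))
print(simplify_shape("zhua zhe san"))
```

Storing both the stroke name and its letter lets a user type either form, which appears to be the intent of the processed entries above. Component names such as `jiu` or `zhua` are left unchanged because they are not basic strokes.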
When entering a character, the user can type any one or more pieces of information known about it, depending on how much he or she knows. Based on this input, the program runs a full-text search over the full information of each character in the rare character set, finds every rare character that matches the input, and outputs the search results sorted by stroke information. The user then looks for the desired rare character among the results, which completes the entry. Because full-text search is used, the information entered does not need to follow any particular order, and a shape description may contain only part of the character's shape, or the shape of something that resembles the character, so the method is very flexible.
In the example above, suppose the user wants to enter the second character (Figure A20051001221900062). This can be done by typing its pronunciation "wei", its meaning "wei", its stroke count "8", all or part of its shape description such as "san", "zhua san" or "zhua zhe san", its Wubi code "endf" or "end", or any combination of these, such as pronunciation plus shape "wei zhua zhe san", or stroke count plus Wubi code plus meaning "8 endf wei". The combined pieces of information are separated by spaces. Naturally, the more information the user enters, the fewer candidate characters remain at the end.
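The order-independent, space-separated matching described here could look roughly like the sketch below. The `strict` switch, which also admits partial matches and ranks them by the number of matching terms, is an assumption added to illustrate the remark about partial or merely similar shape descriptions; the patent does not say how such near matches are ranked. The labels and the function name `lookup` are hypothetical.

```python
ENTRIES = [
    # (label, stroke count, searchable text); labels are placeholders
    ("char_A", 12, "wei wei pie p zhe z zhe z heng h shu s jiu 12 wnnv"),
    ("char_B", 8, "wei wei zhua zhe z san 8 endf end"),
    ("char_C", 11, "qiu xin 11 fiyn"),
]


def lookup(query: str, strict: bool = True):
    """Match space-separated query terms in any order.

    strict=True keeps only entries containing every term; strict=False also
    keeps partial matches and ranks them by how many terms they contain.
    Ties are broken by stroke count, fewest strokes first.
    """
    terms = set(query.lower().split())
    scored = []
    for label, strokes, text in ENTRIES:
        matched = len(terms & set(text.split()))
        if matched == len(terms) or (not strict and matched > 0):
            scored.append((-matched, strokes, label))
    return [label for _, _, label in sorted(scored)]


for q in ["san", "wei zhua zhe san", "san zhe zhua wei", "8 endf wei"]:
    print(q, "->", lookup(q))
print(lookup("wei zhua jiu", strict=False))
```

Treating the query as a set of terms is what makes the input order irrelevant: "wei zhua zhe san" and "san zhe zhua wei" select the same candidate.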
Advantages and technical effects of the invention:
Most users of rare characters are researchers, who generally are not proficient in shape-based input methods such as Wubi. With entry based on Chinese character full information, as much information as possible can be used, and the rare character is entered by retrieval. The advantages of the method are that it requires no learning and uses no character roots, and that the character features the user enters need not follow any particular order, so the method is very flexible.
Although a specific embodiment of the invention and the accompanying drawing are disclosed for purposes of illustration, to help in understanding and implementing the invention, those skilled in the art will appreciate that various substitutions, changes and modifications are possible without departing from the spirit and scope of the invention and the appended claims. The invention should therefore not be limited to what is disclosed in the preferred embodiment and the drawing.

Claims (5)

1. A character search and entry method for obtaining rare characters by full-text search over Chinese character full information, comprising the following steps:
1) collating rare Chinese characters to obtain the rare character set that the input method needs to handle;
2) collating the full information of this rare character set according to the objective features of each character, and building a database;
3) when the user enters input, describing the rare character to be entered in pinyin or numeric form, and running a full-text search over the full information of each character in the rare character set;
4) sorting the search results using the numeric stroke information of the characters, and outputting them.
2. The character search and entry method for obtaining rare characters by full-text search over Chinese character full information as claimed in claim 1, characterized in that the rare character set is managed using Unicode.
3. The character search and entry method for obtaining rare characters by full-text search over Chinese character full information as claimed in claim 1, characterized in that the rare characters are mainly characters outside the basic set of national standard GB2312-80 "Chinese Character Set Code for Informati".
4. The character search and entry method for obtaining rare characters by full-text search over Chinese character full information as claimed in claim 1, characterized in that the Chinese character full information comprises meaning, glyph shape, stroke, pronunciation and common-code information.
5. The character search and entry method for obtaining rare characters by full-text search over Chinese character full information as claimed in claim 1, characterized in that, further, the user only needs to select the character to be entered from the sorted search results.
CN 200510012219 2005-07-18 2005-07-18 Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval Pending CN1719390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200510012219 CN1719390A (en) 2005-07-18 2005-07-18 Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200510012219 CN1719390A (en) 2005-07-18 2005-07-18 Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval

Publications (1)

Publication Number Publication Date
CN1719390A true CN1719390A (en) 2006-01-11

Family

ID=35931237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200510012219 Pending CN1719390A (en) 2005-07-18 2005-07-18 Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval

Country Status (1)

Country Link
CN (1) CN1719390A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566769A (en) * 2010-12-13 2012-07-11 腾讯科技(深圳)有限公司 Chinese character input method and Chinese character input system
CN102736741A (en) * 2011-04-12 2012-10-17 腾讯科技(深圳)有限公司 Pinyin input method and system of Chinese characters
CN105069171A (en) * 2015-08-31 2015-11-18 百度在线网络技术(北京)有限公司 Chinese character query method and system
CN105425976A (en) * 2015-06-11 2016-03-23 周连惠 Rarely-used Chinese character input method
CN110413810A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Uncommon word processing method and system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566769A (en) * 2010-12-13 2012-07-11 腾讯科技(深圳)有限公司 Chinese character input method and Chinese character input system
CN102566769B (en) * 2010-12-13 2015-11-25 深圳市世纪光速信息技术有限公司 Chinese character input method and system
CN102736741A (en) * 2011-04-12 2012-10-17 腾讯科技(深圳)有限公司 Pinyin input method and system of Chinese characters
CN105425976A (en) * 2015-06-11 2016-03-23 周连惠 Rarely-used Chinese character input method
CN105069171A (en) * 2015-08-31 2015-11-18 百度在线网络技术(北京)有限公司 Chinese character query method and system
CN105069171B (en) * 2015-08-31 2018-07-13 百度在线网络技术(北京)有限公司 Chinese character inquiry method and system
CN110413810A (en) * 2019-07-31 2019-11-05 中国工商银行股份有限公司 Uncommon word processing method and system

Similar Documents

Publication Publication Date Title
CN1719390A (en) Character searching and logging method for obtaining rarely used word using Chinese character full information and full text retrieval
CN1841281A (en) Chinese phonetic transcription input method using digital code for mobile phone
CN1510554A (en) Embedded applied Chinese character inputting method
CN87100555A (en) Double stroke-order Chinese character input scheme of computer and keyboard thereof
CN1177271C (en) Four-stroke number code input method for characters and words and without duplication code and its keyboard
CN1306376C (en) Treatng and input method for graphics of Naxi figure and character
CN1275127C (en) Chinese characters input method according to stroke sequence and keyboard thereof
CN1208712C (en) <<Chinese character structure> input method>
CN1050915C (en) Indication method for computer inputting Chinese characters
CN1195262C (en) Method for inputting Chinese characters by numeral keys
CN1029046C (en) Chinese character radicals and strokes input method
CN1673935A (en) Jiaguwen (inscriptions on bones or tortoise shells of the Shang Dynasty) computer inputting method
CN2476059Y (en) Keyboard for Jiang code input method
CN1043381C (en) Four-stroke digit look-up method for Chinese characters
CN1425975A (en) Stroke digital Chinese character input method
CN100384196C (en) Handset Chinese <<I type>> input method
CN1100288C (en) Four-stroke sequential syllable Chinese character coding method
CN1811779A (en) Retrieve method for using unicode super-large-scale character set containing four bits in system containing personal name and place name
CN1445644A (en) Digitalization method to express Chinese characters and its keyboard
CN1178344A (en) Four tone inputting method for Chinese characters
CN1165996A (en) Free Chinese character input method and its keyboard
CN101556505A (en) Chinese character input method, small-sized numerical keyboard and Chinese character input system
CN1259696A (en) Six key digital Chinese character coding input method and keyboard
CN1095502A (en) Character spectrum Chinese character coding method (Yan Di and Huang Di, two legendary rulers of remote antiquity's sign indicating number) and keyboard thereof
CN1202044A (en) Method for coding and checking Chinese characters by ten-stroke order

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication