CN101122905A - Method for associating classical book database with historical geographic information system for supporting four bytes - Google Patents

Method for associating classical book database with historical geographic information system for supporting four bytes

Info

Publication number
CN101122905A
Authority
CN
China
Prior art keywords
ideograph
ancient books
historical
unicode
historical geography
Legal status
Pending
Application number
CNA2006100891656A
Other languages
Chinese (zh)
Inventor
张向辉
冯健康
王宏源
赵锋
Current Assignee
Individual
Original Assignee
Individual
Priority date
2006-08-08
Filing date
2006-08-08
Publication date
2008-02-13
Application filed by Individual
Priority to CNA2006100891656A
Publication of CN101122905A
Status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for associating a classical-book (ancient-text) database with a historical geographic information system, with support for four-byte characters. Using a search engine that supports East Asian ideographs with four-byte Unicode codes, the method builds query indexes for both the classical-book data and the historical geographic information. Simplified and traditional forms, Chinese, Japanese and Korean forms, and variant forms of East Asian ideographs (including four-byte characters) are mapped to one another; a conversion correspondence table maps the ancient and modern names, common names and aliases of East Asian ideographic place names (including four-byte characters) onto converted keywords; and queries issued under a defined query logic return the relevant records. The invention addresses two problems: existing classical-book data cannot be matched to or associated with historical geographic information because of four-byte East Asian ideographs, and the way current historical geographic information systems display, store, query and search four-byte characters may not be compatible with other systems.

Description

Method for associating a classical-book database with a historical geographic information system, with support for four-byte characters
Technical field
The invention belongs to the field of computer technology, and specifically relates to a method for associating a classical-book database with a historical geographic information system, with support for four-byte characters.
Background art
The formal name of Unicode is "Universal Multiple-Octet Coded Character Set", abbreviated UCS. UCS specifies how characters of many scripts are represented with multiple bytes. More than 70,000 East Asian ideographs have so far been encoded in UCS, most of them rare Chinese characters. UCS has two forms, UCS-2 and UCS-4. As the names suggest, UCS-2 uses two bytes per character and UCS-4 uses four bytes (in practice only 31 bits are used; the most significant bit must be 0). These encodings are stored and transmitted according to the UTF (UCS Transformation Format) standards; common UTF standards include UTF-7, UTF-8 and UTF-16.
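To make the byte counts concrete, the Python sketch below compares a common ideograph inside the Basic Multilingual Plane (永, U+6C38) with U+20000, the first character of CJK Unified Ideographs Extension B, which lies outside it. The two sample characters are our own choice for illustration.

```python
# Compare how many bytes a BMP ideograph and a supplementary-plane
# ideograph occupy in common Unicode encoding forms.
BMP_CHAR = "\u6c38"                # 永, U+6C38, inside the BMP
SUPPLEMENTARY_CHAR = "\U00020000"  # U+20000, CJK Extension B


def encoded_lengths(ch: str) -> dict:
    """Return the byte length of `ch` in UTF-8, UTF-16 and UTF-32 (UCS-4)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # -be variant: no byte-order mark
        "utf-32": len(ch.encode("utf-32-be")),
    }


def utf16_code_units(ch: str) -> list:
    """Return the 16-bit code units; a supplementary character yields a surrogate pair."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for ch in (BMP_CHAR, SUPPLEMENTARY_CHAR):
        units = [hex(u) for u in utf16_code_units(ch)]
        print(f"U+{ord(ch):04X}", encoded_lengths(ch), units)
```

A system limited to two-byte (UCS-2) units therefore has no single unit in which to store U+20000; it needs UTF-16 surrogate pairs, UTF-8, or UCS-4 (UTF-32).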
A classical-book database is an electronic resource database built by collating ancient literature, excavated documents and similar material and entering it into a computer. At present, the ancient-book retrieval system of Beijing Shidai Hantang Science and Technology Co. (http://www.neohytung.com) and its database application, the "Longyu Hantang Classical Texts Database" (http://www.dragoninfo.cn), are classical-book databases that support the UCS-4 standard; they provide query, display and other functions for more than 70,000 Chinese characters, including a large number of rare characters.
A geographic information system (GIS) is a computer-based tool for mapping and analyzing objects and events on the Earth. GIS technology integrates common database operations (such as querying and statistical analysis) with the distinctive visualization and geographic-analysis capabilities of maps. As the technology has matured, importing historical data into a GIS no longer poses any technical obstacle, and the results can be published through Internet-based distributed GIS.
A historical geographic information system (HGIS) is a GIS database of base geographic data for each historical period. Built on GIS technology, it represents how the spatial distribution of basic geographic information changes over time, and offers users concise data query and retrieval, thematic map compilation, and linking to their own data. For example, by entering keywords such as a historical date and a place name, a user can query the historical geographic information of a particular place at a particular time.
Existing HGISs related to China include the "Chinese Civilization in Time and Space" infrastructure of Academia Sinica, Taiwan (http://ccts.sinica.edu.tw), which stores and transmits text in the Big5 encoding, and the "China Historical Geographic Information System" of the Center for Historical Geographical Studies, Fudan University (http://yugong.fudan.edu.cn/Ichg/Chgis_Intr.asp), which stores and transmits text in the GB2312 encoding. Both can query and display only double-byte characters, and cannot query any text that contains four-byte characters conforming to the Unicode standard. For Chinese characters outside UCS-2, these systems display picture substitutes or self-coined characters and store them with self-defined codes.
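The limitation can be checked with Python's built-in codecs: a double-byte legacy character set such as GB2312 or Big5 simply has no code for a supplementary ideograph, whereas UTF-8 does. This sketch only tests representability in each encoding; it makes no claim about how the systems cited above are implemented internally.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` is representable in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


COMMON = "\u6c38"      # 永, present in both GB2312 and Big5
EXT_B = "\U00020000"   # U+20000, CJK Extension B, outside both legacy sets

if __name__ == "__main__":
    for codec in ("gb2312", "big5", "utf-8"):
        print(codec, can_encode(COMMON, codec), can_encode(EXT_B, codec))
```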
Historical geographic information and Chinese classical-book data are closely related. On the one hand, collating and researching classical-book data depends on historical geographic information: for example, research on the life of the poet Li Bai is more effective and accurate with the help of an HGIS, which can present time, place and persons together in a single query. On the other hand, historical geographic information needs the rich content of classical-book data as a supplement: for example, in research on the changes of course of the Yellow River, matching the descriptions of those changes in classical-book data against historical geographic information makes it easier to reach accurate conclusions, so the classical-book database can supply historical evidence for historical-geography research. Because the rare Chinese characters that require four-byte Unicode codes occur mainly in proper nouns such as ancient place names and personal names, the problem of rare characters must be addressed when combining an HGIS with a Chinese classical-book database.
Existing historical information systems usually handle Chinese characters outside UCS-2 by self-coining characters or by substituting pictures. Picture substitution solves only the display problem of non-UCS-2 characters inside the system; self-coined characters also solve the display problem, but they are incompatible with other systems. Because neither approach conforms to any international or national standard, the content of conventional HGISs cannot be fully displayed and saved with general-purpose browsers outside the system, and East Asian ideographs with four-byte codes cannot be searched or queried. As a result, current systems cannot establish normal, comprehensive association between classical-book databases and HGISs.
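Self-coined characters are typically placed in Unicode's Private Use Area (PUA); that is an assumption here, since the systems above do not document their codes. It also explains the incompatibility: another system has no way of knowing what a private code such as U+E001 stands for. The sketch below flags private-use code points and rewrites them through a mapping table, the kind of conversion to standard Unicode that step C of the embodiment calls for. Picture substitutes have no code point at all and would have to be identified by hand first. The mapping entry is hypothetical.

```python
import unicodedata
from typing import Dict, List


def private_use_chars(text: str) -> List[str]:
    """Return the distinct Private Use Area characters in `text`, in order of appearance."""
    seen: List[str] = []
    for ch in text:
        if unicodedata.category(ch) == "Co" and ch not in seen:
            seen.append(ch)
    return seen


def migrate(text: str, pua_map: Dict[str, str]) -> str:
    """Rewrite self-coined PUA characters as standard Unicode characters.

    Raises ValueError for PUA characters missing from `pua_map`, so unmapped
    coinages are reviewed instead of being indexed as opaque private codes.
    """
    missing = [f"U+{ord(c):04X}" for c in private_use_chars(text) if c not in pua_map]
    if missing:
        raise ValueError("unmapped private-use characters: " + ", ".join(missing))
    return "".join(pua_map.get(ch, ch) for ch in text)


# Hypothetical mapping: a legacy system coined U+E001 for the character
# that Unicode encodes as U+20000.
LEGACY_MAP: Dict[str, str] = {"\ue001": "\U00020000"}
```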
Summary of the invention
In view of the above, the main object of the invention is to provide a method for associating a Chinese classical-book database with a historical geographic information system, both supporting four-byte characters.
To enable retrieval of text containing four-byte characters, corresponding indexes must be built for the text content of both the historical geographic information and the classical-book data, and proper-noun information such as all place names and personal names in them must be extracted to form a conversion correspondence table.
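A conversion correspondence table can be modeled as a map from every name form to the set of forms recorded as equivalent. The sketch below assumes the table is supplied as groups of equivalent names; the sample group (historical names of Kaifeng) is illustrative only, and a real table would be extracted from both data sources and carry era information as well.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_alias_table(groups: Iterable[Tuple[str, ...]]) -> Dict[str, Set[str]]:
    """Map each name form to every form listed in the same group(s)."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for group in groups:
        for form in group:
            table[form].update(group)
    return dict(table)


def expand_name(table: Dict[str, Set[str]], name: str) -> Set[str]:
    """All recorded equivalents of `name`; an unknown name maps to itself."""
    return table.get(name, set()) | {name}


# Sample group for illustration: historical names of Kaifeng.
SAMPLE_GROUPS = [("开封", "汴梁", "汴京")]
```

Here a form appearing in two groups is linked only to the members of those groups, not transitively to everything reachable through them; a union-find structure would be the usual choice if full transitive equivalence is wanted.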
The two systems are associated as follows:
A. When associating from the classical-book database to the HGIS, the place-name information of a record in the classical-book database is converted through the conversion correspondence table; the converted proper-noun information, such as place names with the era appended, is then searched in the historical geographic information index, thereby linking the record to the HGIS.
B. When associating from the HGIS to the classical-book database, the place-name information of a record in the HGIS is converted through the conversion correspondence table; the converted information, such as place names with the era appended, is then searched in the classical-book data index, thereby linking the record to the classical-book database.
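A minimal sketch of the two association directions, assuming each side exposes its records as a place name plus an era span, and interpreting "appending the era" as requiring the eras to overlap; the patent leaves the exact query logic open. The schema, ids, place names and years are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Entry:
    """A place mention with the era it refers to (hypothetical schema)."""
    source_id: str
    place: str
    start_year: int
    end_year: int  # inclusive; equal to start_year for a single year


def keywords_for(place: str, aliases: Dict[str, Set[str]]) -> Set[str]:
    """Converted keywords: the place name plus all of its recorded aliases."""
    return aliases.get(place, set()) | {place}


def associate(query: Entry, targets: List[Entry], aliases: Dict[str, Set[str]]) -> List[str]:
    """Return ids of targets whose place matches any alias of the query place
    and whose era overlaps the query era. The same function serves direction A
    (book record -> HGIS) and direction B (HGIS -> book records)."""
    names = keywords_for(query.place, aliases)
    return [
        t.source_id
        for t in targets
        if t.place in names
        and t.start_year <= query.end_year
        and query.start_year <= t.end_year
    ]
```

For example, a book record mentioning "PlaceOld" in year 600 would match an HGIS entry "PlaceNew" valid for 501 to 900, provided the alias table lists the two names as equivalent.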
Advantages and technical effects of the invention:
Because four-byte rare Chinese characters are an unavoidable part of both classical-book databases and HGISs, it is currently difficult to establish comprehensive association between the two. The invention uses a search engine that supports four-byte Unicode East Asian ideographs, together with modules that support mutual conversion among simplified, traditional, Chinese, Japanese, Korean and variant forms of four-byte Unicode East Asian ideographs, and mutual correspondence among the names used in different periods, common names and aliases of place names and of major physical geographic features (mountains, rivers, lakes, deserts, coastlines) that contain four-byte Unicode East Asian ideographs. This greatly improves the validity and comprehensiveness of the association between historical geographic information and classical-book data. It also extends the functions of both the HGIS and the Chinese classical-book database: with this method, users can work with the two systems more conveniently, obtain the information they need more effectively, and carry out research more efficiently.
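The simplified, traditional and variant correspondence described above operates on individual characters, so a multi-character keyword fans out into several spellings. The sketch assumes a small hand-written variant map (长/長 are a simplified/traditional pair; 衖 is a variant of 巷); a real system would load a full variant database, and the cap on results is our own safeguard against combinatorial growth.

```python
from itertools import product
from typing import Dict, List, Set

# Illustrative variant map: each character lists all interchangeable forms.
VARIANTS: Dict[str, Set[str]] = {
    "长": {"长", "長"},  # simplified / traditional
    "長": {"长", "長"},
    "巷": {"巷", "衖"},  # standard form / variant form
    "衖": {"巷", "衖"},
}


def expand_keyword(keyword: str, variants: Dict[str, Set[str]] = VARIANTS,
                   limit: int = 256) -> Set[str]:
    """Spellings of `keyword` obtained by swapping each character (code point)
    for its registered variants; stops after `limit` distinct results."""
    choices: List[List[str]] = [sorted(variants.get(ch, {ch})) for ch in keyword]
    results: Set[str] = set()
    for combo in product(*choices):
        results.add("".join(combo))
        if len(results) >= limit:
            break
    return results
```

Because iterating a Python string yields code points, a four-byte ideograph in the keyword is looked up as one character rather than as two surrogate halves.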
Description of the drawings
Fig. 1 is a schematic diagram of how the invention associates the classical-book database with the historical geographic information system.
Embodiment
The invention builds indexes for the classical-book data and the historical geographic information and combines them with special handling of four-byte characters. The implementation steps are as follows:
A. In the classical-book database, record information related to each document, such as place and time. For example, for a poem, add the year and place of composition; for excavated documents such as oracle-bone inscriptions, bronze inscriptions, bamboo and silk manuscripts, seals and other cultural relics, add the find spot and the date of the object. This makes the association between classical-book data and historical geographic information more accurate, effective and comprehensive.
B. Use a search engine that supports four-byte Unicode East Asian ideographs to build a search index over all document content in the classical-book database (including the content recorded in step A). Building the index means analyzing document content that contains four-byte characters in a way that supports four-byte characters, and thereby constructing inverted lists; these inverted lists are then used for query and retrieval. Document content containing four-byte characters means classical-book data stored in some format, including the additional historical-geographic content supplemented in step A.
C. Use a search engine that supports four-byte Unicode East Asian ideographs to build a search index over the place names, times and related supplementary information in the HGIS. Information containing four-byte Unicode East Asian ideographs, in particular picture substitutes and self-coined characters standing in for four-byte characters, must first be converted to standard Unicode encoding. As in step B, inverted-list indexes are built here for the text content of the HGIS. That text content includes place names, times and supplementary descriptions of related events. Place names cover settlements, administrative divisions and territories, and physical geographic features; times combine several dating systems, such as Common Era years, reign-title dating and the sexagenary (Heavenly Stems and Earthly Branches) cycle.
D. Extract all place names and era information from the historical geographic information and the classical-book data to form the conversion correspondence table; this table contains four-byte characters. For example, "永巷" in the HGIS corresponds in the classical-book database to a variant written form (machine-translated as "Yong Alley") and to "永" followed by a rare character that the original reproduces only as an image (Figure A20061008916500061).
E. Using the mutual correspondence among simplified, traditional, Chinese, Japanese, Korean and variant forms of East Asian ideographs (including four-byte Unicode characters), together with the conversion correspondence table, map historical geographic information to the corresponding simplified and traditional forms, modern place names, and the common names and aliases used in different periods for place names containing four-byte Unicode East Asian ideographs.
F. When associating from the classical-book database to the HGIS, convert information such as the place names of a record in the classical-book database as in step E to obtain a set of search keywords, and query them in the index built in step C according to a defined query logic. This returns historical geographic information related to the record, thereby linking it to the HGIS.
G. When associating from the HGIS to the classical-book database, convert information such as the place names of a record in the HGIS as in step E to obtain a set of search keywords, and query them in the index built in step B according to a defined query logic. This returns classical-book data related to the record, thereby linking it to the classical-book database.
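Steps B, C, F and G all rest on an inverted index that treats a four-byte ideograph as a single character. The description does not specify how text is tokenized; the sketch below assumes a character-level positional index (a common choice for unsegmented classical Chinese), matches each converted keyword as a contiguous phrase, and OR-combines the keywords. The sample documents used with it are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Index = Dict[str, Set[Tuple[str, int]]]  # character -> {(document id, position)}


def build_index(docs: Dict[str, str]) -> Index:
    """Positional inverted index keyed by Unicode code point.

    Iterating a Python str yields code points, so a four-byte ideograph such
    as U+20000 becomes one index term, never two UTF-16 surrogate halves.
    """
    index: Index = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            index[ch].add((doc_id, pos))
    return dict(index)


def phrase_search(index: Index, phrase: str) -> Set[str]:
    """Ids of documents containing `phrase` as a contiguous run of characters."""
    if not phrase:
        return set()
    starts = set(index.get(phrase[0], set()))
    for offset, ch in enumerate(phrase[1:], start=1):
        postings = index.get(ch, set())
        starts = {(d, p) for d, p in starts if (d, p + offset) in postings}
    return {d for d, _ in starts}


def search_any(index: Index, keywords: Iterable[str]) -> Set[str]:
    """OR-combine phrase matches over all converted keywords."""
    hits: Set[str] = set()
    for kw in keywords:
        hits |= phrase_search(index, kw)
    return hits
```

In languages whose strings are UTF-16 internally, such as Java or JavaScript, the indexing loop should iterate with `String.codePoints()` or `for...of` rather than by `char` index, or supplementary ideographs will be split.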
Through the above steps, the Chinese classical-book database and the HGIS can be associated with each other over a very large character set that includes four-byte Unicode characters. To aid understanding of the association method, the implementation steps and the drawing of the invention are disclosed here so that those skilled in the art can understand it. Because a character set containing four-byte Unicode characters naturally also contains single-byte and double-byte characters, various replacements, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims; the invention is therefore not limited to the examples and the content disclosed in the drawing.

Claims (5)

1. A method for associating a classical-book database with a historical geographic information system that supports four-byte East Asian ideographs, comprising: using a search engine that supports four-byte Unicode East Asian ideographs to build separate index databases for the text content of the historical geographic information and the text content of the classical-book data.
2. A method for associating a classical-book database with a historical geographic information system that supports four-byte East Asian ideographs, comprising: extracting proper-noun information, such as place names and personal names, from both the classical-book data and the historical geographic information; and building a synonym table and a proper-name conversion correspondence table containing four-byte Unicode East Asian ideographs, so that the names used in different periods, common names and aliases of proper nouns such as personal names and place names are mapped to one another as keywords.
3. The method for associating a classical-book database with a historical geographic information system that supports four-byte East Asian ideographs according to claim 1 or 2, characterized in that the converted proper-noun information, such as place names, with information such as the era appended, is searched in the historical geographic information index, thereby associating with the historical geographic information system.
4. The method for associating a classical-book database with a historical geographic information system that supports four-byte East Asian ideographs according to claim 1 or 2, characterized in that the converted proper-noun information, such as place names, with information such as the era appended, is searched in the classical-book data index, thereby associating with the classical-book database.
5. The method for associating a classical-book database with a historical geographic information system that supports four-byte East Asian ideographs according to any one of claims 1 to 4, characterized in that, when the classical-book database and the historical geographic information system are searched, or when the conversion correspondence table is called, keywords are converted through the mutual correspondence among simplified, traditional, Chinese, Japanese, Korean and variant forms of four-byte Unicode East Asian ideographs.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2006100891656A CN101122905A (en) 2006-08-08 2006-08-08 Method for associating classical book database with historical geographic information system for supporting four bytes

Publications (1)

Publication Number Publication Date
CN101122905A true CN101122905A (en) 2008-02-13

Family

ID=39085242

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2006100891656A Pending CN101122905A (en) 2006-08-08 2006-08-08 Method for associating classical book database with historical geographic information system for supporting four bytes

Country Status (1)

Country Link
CN (1) CN101122905A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823874A (en) * 2014-02-27 2014-05-28 北京六间房科技有限公司 Special character search method and system
CN104615706A (en) * 2015-01-30 2015-05-13 南京师范大学 Spatial integrating method of Chinese historical classic information
CN105183844A (en) * 2015-09-06 2015-12-23 国家基础地理信息中心 Method for obtaining rarely-used Chinese character library in basic geographic information data
CN105280086A (en) * 2014-06-05 2016-01-27 卡西欧计算机株式会社 Learning support apparatus, and data output method in learning support apparatus
CN107577819A (en) * 2017-09-30 2018-01-12 百度在线网络技术(北京)有限公司 A kind of content of text shows method, apparatus, computer equipment and storage medium
CN117494811A (en) * 2023-11-20 2024-02-02 南京大经中医药信息技术有限公司 Knowledge graph construction method and system for Chinese medicine books

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823874A (en) * 2014-02-27 2014-05-28 北京六间房科技有限公司 Special character search method and system
CN105280086A (en) * 2014-06-05 2016-01-27 卡西欧计算机株式会社 Learning support apparatus, and data output method in learning support apparatus
CN105280086B (en) * 2014-06-05 2019-04-05 卡西欧计算机株式会社 Data output method in Learning support device and Learning support device
CN104615706A (en) * 2015-01-30 2015-05-13 南京师范大学 Spatial integrating method of Chinese historical classic information
CN104615706B (en) * 2015-01-30 2018-04-24 南京师范大学 A kind of spatially integrate method of Chinese history ancient books and records information
CN105183844A (en) * 2015-09-06 2015-12-23 国家基础地理信息中心 Method for obtaining rarely-used Chinese character library in basic geographic information data
CN107577819A (en) * 2017-09-30 2018-01-12 百度在线网络技术(北京)有限公司 A kind of content of text shows method, apparatus, computer equipment and storage medium
CN117494811A (en) * 2023-11-20 2024-02-02 南京大经中医药信息技术有限公司 Knowledge graph construction method and system for Chinese medicine books
CN117494811B (en) * 2023-11-20 2024-05-28 南京大经中医药信息技术有限公司 Knowledge graph construction method and system for Chinese medicine books

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20090320

Address after: Beijing City, Chaoyang District Street heading for the small village compound No. 12 room 901

Applicant after: Wang Fei

Address before: Beijing City, Chaoyang District Street heading for the small village compound No. 12 room 901

Applicant before: Wang Hongyuan

ASS Succession or assignment of patent right

Owner name: WANG FEI

Free format text: FORMER OWNER: WANG HONGYUAN

Effective date: 20090320

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20080213