CN101419605B - Electronic dictionary inquiry method for implementing repeated word list - Google Patents

Electronic dictionary inquiry method for implementing repeated word list

Info

Publication number
CN101419605B
Authority
CN
China
Prior art keywords
layer
prefix
word
node
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810027792A
Other languages
Chinese (zh)
Other versions
CN101419605A (en)
Inventor
王建民
麦灿章
黄达尧
陈佳鹏
罗笑南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University
Priority to CN200810027792A
Publication of CN101419605A
Application granted
Publication of CN101419605B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a query method for an electronic dictionary that supports a repeated word list. An index layer, a prefix layer, and a data layer are established in the lexicon of the electronic dictionary. The index layer contains indexes to the nodes of the prefix layer. The prefix layer contains the names of words and their phonetic information. The data layer contains the detailed explanation of each word. A binary-search index is established in the index layer. The prefix layer arranges words according to specific needs and allows a word to appear more than once. The data layer stores the different data attributes. The offset into the data layer is read from a prefix-layer node, which leads to the data-layer node for that word. Each prefix-layer node also stores the length of the previous prefix-layer node; subtracting that length from the offset of the current node yields a pointer to the previous prefix-layer node. The prefix layer can therefore be traversed, and the detailed information of the word can be obtained.

Description

A query method for an electronic dictionary that implements a repeated word list
Technical field
The present invention relates to the design of the lexicon of an electronic dictionary, and in particular to a query method for an electronic dictionary that implements a repeated word list.
Background art
An electronic dictionary must be able to retrieve a required word quickly, so it must use an index. Dictionaries without an index are seldom used, because they require the entries to be kept in order and, for convenient access, generally require every record to have a fixed length.
Electronic dictionary storage is currently divided into two parts: an index file and a data file. Index files can be single-level, two-level, or multi-level. Many index structures are in use, but the storage framework is generally a two-level one: the first level is the index to the words, and the second level is the explanation of each word, as shown in Fig. 1.
Indexing can take many forms. In one, each prefix (that is, each headword) serves as an index key, and repeated prefixes may or may not be included. In another, the first letter or first few letters of the prefix serve as the key. Single-level and two-level indexes are common in electronic dictionaries; multi-level indexes are used only for special processing and are otherwise rare.
The explanation of a word can take different forms depending on the requirements of the particular dictionary. It generally includes common information such as the word name, phonetic symbols, definitions, and example sentences. Some specialized dictionaries add their own material, such as word grammar, cultural notes, historical stories, and appendices (including language notes, numerals and classifiers, and so on).
The function of an electronic dictionary is to find the explanation of a word quickly through the index and to display the detailed explanation on the screen for the user to browse.
Summary of the invention
In view of the simple structural design of current electronic dictionaries and their fixed word lists, the object of the invention is to propose a query method for an electronic dictionary that implements a repeated word list, and to design the related data structures and representations under this framework.
To achieve this object, the technical scheme adopted for the lexicon structure is as follows:
An index layer, a prefix layer, and a data layer are established in the lexicon of the electronic dictionary, wherein: the index layer contains the index to the prefix-layer nodes; the prefix layer contains the name and phonetic symbols of each word; the data layer contains the detailed explanation of each word; and an index based on binary search is established in the index layer. The prefix layer arranges words according to specific needs and supports repeated occurrences of a word. The data layer stores the different data attributes.
The data-layer offset is obtained from a prefix-layer node, and the data-layer node corresponding to the word is entered. The prefix-layer node contains the length of the previous prefix-layer node; subtracting that length from the offset of the current node gives the pointer to the previous prefix-layer node, which makes traversal within the prefix layer possible.
The detailed information of the word is obtained.
The lexicon consists of three layers: the index layer, the prefix layer, and the data layer.
Because the lexicon is based on a three-level storage structure, querying a word requires the index layer, the prefix layer, and the data layer to call one another. The indexes in all three layers are expressed as offsets relative to the starting position of the layer. Word lookup under the three-level storage format is shown in Fig. 2.
(1) Index layer: an index strategy can be chosen to provide the index into the prefix layer.
Different index strategies, such as a tree index or a hash index, can be used as required. The present invention adopts binary search: the search method is simple and fast, and because only the index to the prefix layer is stored, very little storage space is needed.
(2) Prefix layer: arranges words according to specific needs and supports repeated words.
In a dictionary, words always appear in some order, generally sorted by word name or by phonetic symbol. The prefix layer adopted by the invention supports repeated words; for example, homophones may appear repeatedly in the dictionary. Because the design of the prefix layer also supports displaying words out of sorted order, users can define their own word lists for display.
(3) Data layer: stores the different data attributes.
The data-layer offset is obtained from a prefix-layer node, and the data-layer node corresponding to the word is entered.
The detailed information of the word is obtained. Because the data layer stores the attributes of the word information in the dictionary, the corresponding view is displayed according to the specific display requirements of the system, for example example sentences, grammar, or appendix notes.
The prefix-layer node contains the length of the previous prefix-layer node. Subtracting that length from the offset of the current node gives the pointer to the previous prefix-layer node, which makes traversal within the prefix layer possible, for example scrolling by line or by screen in the displayed word list.
In the prefix layer, only the first of the repeated prefix-layer nodes of a word is mapped to an index-layer node, and all repeated prefix-layer nodes of the same word correspond to the same data-layer node. This mapping reduces data redundancy and uses storage space more efficiently.
Beneficial effects of the invention:
The invention combines the storage architectures and retrieval techniques of current electronic dictionaries and proposes a structure for an embedded electronic dictionary based on three-level storage. Under this framework the storage is divided into an index layer, a prefix layer, and a data layer: the index layer can use a chosen index strategy; the prefix layer arranges words according to specific needs and supports repeated occurrences of a word; the data layer stores the explanations. The invention adopts a binary-search index, which finds the input word quickly and needs little storage. If the word list in the prefix layer is sorted, a sparse index can also be used to make the index layer smaller still. Users can define their own word lists. Only the first of the repeated prefix-layer nodes of a word is mapped to an index-layer node, and all repeated prefix-layer nodes of the same word correspond to the same data-layer node. This mapping reduces data redundancy and uses storage space more efficiently.
Description of drawings
Fig. 1 shows the index structure based on two-level storage.
Fig. 2 shows the word lookup flow and the framework of the three-level storage format.
Fig. 3 shows the data representation of the index layer and the prefix layer.
Fig. 4 shows the index-layer data representation based on a sparse index.
Fig. 5 shows the structure of a repeated prefix-layer node.
Fig. 6 shows the data representation of the prefix layer and the data layer.
Fig. 7 shows the data structure of the data layer.
Embodiment
The invention is further described below with reference to the drawings:
(1) Index layer design
The index layer is the first layer of the three-level storage structure. Given the word entered by the user, the index layer quickly finds the position of that word in the prefix layer, and the word name and phonetic symbols of the word are displayed.
The index layer uses a response mode based on a window procedure: each time the user enters a character, the system performs the index search automatically, without the user pressing the confirm key, and refreshes the screen with the new word list.
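The search-as-you-type behaviour described above can be sketched as follows. This is a minimal illustration that assumes the sorted word names are already available as an in-memory Python list; the function name `screen_for` and the `rows` parameter (the number of list rows on the screen) are hypothetical and do not come from the patent.

```python
from bisect import bisect_left

def screen_for(words, typed, rows=5):
    """Return the screenful of words starting at the first word >= typed.

    `words` must be sorted in ascending order, as the index layer is.
    """
    start = bisect_left(words, typed)  # binary search for the insertion point
    return words[start:start + rows]

words = ["apple", "apply", "bat", "bath", "cat", "dog"]
# Simulate the user typing "b", then "ba", then "bat": the list is
# refreshed after every keystroke, with no confirm key involved.
for typed in ("b", "ba", "bat"):
    print(typed, "->", screen_for(words, typed, rows=3))
```

Each keystroke costs one binary search, so refreshing the list stays cheap even for a large lexicon.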
If the prefix layer contains words that appear repeatedly, such as words with several pronunciations, those occurrences share a single index node, which points to the first occurrence in the prefix layer, as shown in Fig. 3.
The index layer is a fast index area based on binary search, sorted in ascending order by the word name that serves as the key in the second layer. Each index-layer node is 4 bytes and records the offset of the word in the prefix layer.
The data structure of the index layer is shown in the table below.
Table 1: Data structure of the index layer
Bytes | Description
4 | Total number of words, n
4 | Pointer to word 1 in the prefix layer
4 | Pointer to word 2 in the prefix layer
... | ...
4 | Pointer to word n-1 in the prefix layer
4 | Pointer to word n in the prefix layer
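A lookup over the layout of Table 1 might look like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the patent does not give a byte order, so little-endian is assumed; word names are assumed to be ASCII; only common (non-repeated) prefix-layer nodes are generated; and the helper names `pack_common_node`, `build_layers`, `word_at`, and `lookup` are hypothetical.

```python
import struct

def pack_common_node(name, sound, data_offset, prev_len):
    """Pack one common prefix-layer node: wordLen, name, soundLen, sound,
    4-byte data-layer offset, 1-byte length of the previous node."""
    n, s = name.encode("ascii"), sound.encode("ascii")
    return (bytes([len(n)]) + n + bytes([len(s)]) + s
            + struct.pack("<I", data_offset) + bytes([prev_len]))

def build_layers(entries):
    """entries: (name, sound, data_offset) tuples sorted by name.
    Returns (index_layer, prefix_layer) as bytes."""
    prefix, offsets, prev_len = bytearray(), [], 0
    for name, sound, data_offset in entries:
        node = pack_common_node(name, sound, data_offset, prev_len)
        offsets.append(len(prefix))      # offset of this node in the prefix layer
        prefix += node
        prev_len = len(node)
    # Index layer: 4-byte word count, then one 4-byte offset per word.
    index = struct.pack("<I", len(offsets)) + b"".join(
        struct.pack("<I", o) for o in offsets)
    return index, bytes(prefix)

def word_at(prefix, offset):
    """Read the word name of the common node at `offset`."""
    word_len = prefix[offset]
    return prefix[offset + 1:offset + 1 + word_len].decode("ascii")

def lookup(index, prefix, target):
    """Binary search the index layer; return the prefix-layer offset or None."""
    (count,) = struct.unpack_from("<I", index, 0)
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        (offset,) = struct.unpack_from("<I", index, 4 + 4 * mid)
        word = word_at(prefix, offset)
        if word == target:
            return offset
        if word < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

index, prefix = build_layers([("apple", "", 0), ("bat", "bat", 10), ("cat", "kat", 20)])
print(lookup(index, prefix, "bat"))
```

The index layer holds only offsets, so each comparison reads the word name from the prefix layer; that indirection is what keeps the index itself at 4 bytes per word.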
If the nodes in the prefix layer are sorted, a sparse index can be used, which occupies less space, as shown in Fig. 4.
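The text does not spell out how the sparse index is laid out, so the following is only a sketch of the general technique: index every `step`-th word, binary search those keys, then scan the short run of words that follows. It works on an in-memory sorted list rather than on the byte layout, and the names `build_sparse`, `sparse_lookup`, and `step` are assumptions made for illustration.

```python
from bisect import bisect_right

def build_sparse(words, step):
    """Keep one (word, position) index entry for every `step`-th word."""
    return [(words[i], i) for i in range(0, len(words), step)]

def sparse_lookup(words, sparse, step, target):
    """Find `target` via the sparse index; return its position or None."""
    keys = [key for key, _ in sparse]
    slot = bisect_right(keys, target) - 1   # last indexed word <= target
    if slot < 0:
        return None                         # target sorts before every word
    start = sparse[slot][1]
    # Sequential scan of at most `step` words after the indexed one.
    for i in range(start, min(start + step, len(words))):
        if words[i] == target:
            return i
    return None

words = ["ant", "bat", "cat", "dog", "eel", "fox", "gnu"]
sparse = build_sparse(words, 3)             # indexes "ant", "dog", "gnu"
print(sparse_lookup(words, sparse, 3, "eel"))
```

The index shrinks by roughly a factor of `step`, at the cost of a short linear scan after each binary search.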
(2) prefix layer design
The prefix layer is the middle layer connecting the index layer and the data layer, and it links the data together. The information in the prefix layer includes the word name, the phonetic symbols, and a pointer into the data layer. Operations on the prefix layer implement line scrolling and page turning. Through the pointer into the data layer, the user can press the confirm key to view the detailed explanation of the word.
When the user looks up a word, a list of words is displayed on the screen. Some dictionaries need to show only the word names, while others must also show the phonetic symbols. The data in the prefix layer therefore includes both the name and the phonetic symbols of each word.
The lexicon design should also support paging through the word list: the user can view the previous or next page of the list with the page keys.
Because repeated words can occur in the word list, two data types are designed for prefix-layer nodes: the common prefix-layer node and the repeated prefix-layer node.
In an embedded environment, to save storage space, the two node types begin with the same leading byte, whose value identifies the type: if this byte is 0, the node is a repeated node and is read as a repeated prefix-layer node; otherwise it is a common prefix-layer node. A repeated prefix-layer node contains all the information of a common prefix-layer node plus the two pointers of a doubly linked list, which hold the offsets of the previous and next occurrences of the word. The relation between repeated and common prefix-layer nodes is shown in Fig. 5.
The relation between the prefix layer and the data layer when repeated words are used is shown in Fig. 6.
A common prefix-layer node contains the word name, the phonetic symbols, and the pointer into the data layer. To support scrolling up and down by line and by page, each node also stores the total length of the previous prefix-layer node, so that the node of the previous word can be reached easily. The data structure of the common prefix-layer node is shown in the table below.
Table 2: Data structure of the common prefix-layer node
Bytes | Description | Remarks
1 | Length of the word name, wordLen | Stored as an ASCII character; wordLen > 0
wordLen | Spelling of the word name |
1 | Length of the phonetic symbols, soundLen | Stored as an ASCII character; 0 if there are none
soundLen | Spelling of the phonetic symbols | Field is absent if soundLen is 0
4 | Offset into the third layer (the data layer) |
1 | Total length of the previous prefix-layer node, totalLen | Stored as an ASCII character; 0 for the first node. totalLen = 7 + wordLen + soundLen, totalLen < 255 (where wordLen and soundLen are those of the previous prefix-layer node)
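Reading a node laid out as in Table 2, and stepping back to the previous node by subtracting the stored totalLen, could be sketched as below. Little-endian byte order and ASCII text are assumptions, since the patent specifies neither, and `CommonNode`, `parse_common`, and `prev_offset` are hypothetical names.

```python
import struct
from typing import NamedTuple, Optional

class CommonNode(NamedTuple):
    name: str
    sound: str
    data_offset: int   # offset of the word's node in the data layer
    prev_len: int      # total length of the previous prefix-layer node
    size: int          # total length of this node in bytes

def parse_common(buf: bytes, off: int) -> CommonNode:
    """Parse the common prefix-layer node that starts at `off`."""
    word_len = buf[off]
    if word_len == 0:
        raise ValueError("first byte 0 marks a repeated node, not a common one")
    pos = off + 1
    name = buf[pos:pos + word_len].decode("ascii"); pos += word_len
    sound_len = buf[pos]; pos += 1
    sound = buf[pos:pos + sound_len].decode("ascii"); pos += sound_len
    (data_offset,) = struct.unpack_from("<I", buf, pos); pos += 4
    prev_len = buf[pos]; pos += 1
    return CommonNode(name, sound, data_offset, prev_len, pos - off)

def prev_offset(buf: bytes, off: int) -> Optional[int]:
    """Offset of the previous node, or None when `off` is the first node."""
    prev_len = parse_common(buf, off).prev_len
    return None if prev_len == 0 else off - prev_len

# Two hand-built nodes: "cat" (first node) followed by "dog".
node_a = b"\x03cat\x03kat" + (0).to_bytes(4, "little") + b"\x00"
node_b = b"\x03dog\x00" + (40).to_bytes(4, "little") + bytes([len(node_a)])
buf = node_a + node_b
print(parse_common(buf, len(node_a)), prev_offset(buf, len(node_a)))
```

Moving forward needs no stored field, because the size of the current node follows from its own wordLen and soundLen; only the backward step relies on totalLen.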
When a prefix-layer node is a repeated node, the repeated-node data structure is used. A repeated prefix-layer node contains all the information of a common prefix-layer node, plus pointers to the previous and next repeated prefix-layer nodes of the word, so that the user can easily find the correct place where the repeated word occurs. The data structure of the repeated prefix-layer node is shown in the table below.
Table 3: Data structure of the repeated prefix-layer node
Bytes | Description | Remarks
1 | Repeated-word flag | Set to 0
4 | Previous repeated word | Offset in the prefix layer of the previous occurrence; -1 if there is none
4 | Next repeated word | Offset in the prefix layer of the next occurrence; -1 if there is none
1 | Length of the word name, wordLen | Stored as an ASCII character
wordLen | Spelling of the word name |
1 | Length of the phonetic symbols, soundLen | Stored as an ASCII character; 0 if there are none
soundLen | Spelling of the phonetic symbols | Field is absent if soundLen is 0
4 | Offset into the third layer (the data layer) |
1 | Total length of the previous prefix-layer node, totalLen | Stored as an ASCII character; 0 if there is none
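The doubly linked list of Table 3 can be followed as in the sketch below, which packs two occurrences of one word and collects their offsets through the "next" pointers. Assumptions: little-endian byte order, signed 32-bit pointers so that -1 can mark a missing neighbour, and ASCII text. The names `pack_repeated`, `parse_repeated`, and `occurrences` are hypothetical.

```python
import struct

def pack_repeated(prev_rep, next_rep, name, sound, data_offset, prev_len):
    """Pack a repeated prefix-layer node in the field order of Table 3."""
    n, s = name.encode("ascii"), sound.encode("ascii")
    return (b"\x00"                                  # flag byte: repeated node
            + struct.pack("<ii", prev_rep, next_rep) # -1 means "no such node"
            + bytes([len(n)]) + n + bytes([len(s)]) + s
            + struct.pack("<I", data_offset) + bytes([prev_len]))

def parse_repeated(buf, off):
    """Parse the repeated prefix-layer node that starts at `off`."""
    if buf[off] != 0:
        raise ValueError("not a repeated node: flag byte is non-zero")
    prev_rep, next_rep = struct.unpack_from("<ii", buf, off + 1)
    pos = off + 9
    word_len = buf[pos]
    name = buf[pos + 1:pos + 1 + word_len].decode("ascii"); pos += 1 + word_len
    sound_len = buf[pos]
    sound = buf[pos + 1:pos + 1 + sound_len].decode("ascii"); pos += 1 + sound_len
    (data_offset,) = struct.unpack_from("<I", buf, pos)
    return {"prev": prev_rep, "next": next_rep, "name": name, "sound": sound,
            "data_offset": data_offset, "prev_len": buf[pos + 4]}

def occurrences(buf, first_off):
    """Follow the 'next' pointers from the first occurrence of a word."""
    found, off = [], first_off
    while off != -1:
        found.append(off)
        off = parse_repeated(buf, off)["next"]
    return found

# "lead" with two pronunciations; both nodes share data-layer offset 100.
first = pack_repeated(-1, 24, "lead", "li:d", 100, 0)      # 24 bytes long
second = pack_repeated(0, -1, "lead", "led", 100, len(first))
buf = first + second
print(occurrences(buf, 0))
```

In this sketch the two occurrences happen to be adjacent; in a user-defined word list they could sit anywhere in the prefix layer, which is why explicit offsets are stored.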
(3) data Layer design
The data layer stores the detailed explanation of each word, including its grammatical information, its definitions, the example sentences under each definition, and so on.
The data layer structure consists of the total number of display data blocks followed by the display data blocks themselves. The information for a word is made up of the attributes of that word, and each attribute is one display data block. If some attributes are nested, they are divided into several display data blocks according to the practical needs of the system.
The structure in the data layer contains the total number of display data blocks and each display data block. A display data block is a variable-length data stream. The overall data structure of the data layer is shown in Fig. 7. The data structure of a display data block is shown in the table below.
Table 4: Data structure of a word display data block
Bytes | Description | Remarks
1 | Attribute of the display data block | Stored as an ASCII character
1 | Length of this display block, Length | Stored as an ASCII character; the length of this data block
Length | Content of the display block |
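A data-layer node built from the block count followed by blocks in the layout of Table 4 might be written and read as in this sketch. It assumes that the count and every length fit in a single byte (below 255) and that block content is UTF-8 text. The attribute codes `ATTR_EXPLANATION` and `ATTR_EXAMPLE` are hypothetical, since the patent says attributes are numbered but does not list the numbers.

```python
ATTR_EXPLANATION = 1   # hypothetical attribute code
ATTR_EXAMPLE = 2       # hypothetical attribute code

def pack_entry(blocks):
    """blocks: list of (attribute, text). Returns one data-layer node."""
    if len(blocks) >= 255:
        raise ValueError("block count does not fit in one byte")
    out = bytearray([len(blocks)])            # total number of display blocks
    for attr, text in blocks:
        payload = text.encode("utf-8")
        if len(payload) >= 255:
            raise ValueError("block length does not fit in one byte")
        out += bytes([attr, len(payload)]) + payload
    return bytes(out)

def parse_entry(buf, off=0):
    """Read the data-layer node at `off` back into (attribute, text) pairs."""
    count, pos, blocks = buf[off], off + 1, []
    for _ in range(count):
        attr, length = buf[pos], buf[pos + 1]
        blocks.append((attr, buf[pos + 2:pos + 2 + length].decode("utf-8")))
        pos += 2 + length                     # skip to the next block
    return blocks

entry = pack_entry([(ATTR_EXPLANATION, "a small domesticated feline"),
                    (ATTR_EXAMPLE, "The cat sat on the mat.")])
print(parse_entry(entry))
```

Because each block carries its own attribute and length, a viewer can skip blocks it does not display without understanding their content.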
In the data layer, the total number of data blocks and the length of each block are normally stored in one byte, that is, as a value no greater than 255. In real dictionaries, however, some words have a block count or block length greater than 255, so a variable-length representation of numbers is used.
When a number Num is less than 255, it is stored in one byte. When Num is 255 or greater, it is stored in 3 bytes: the first byte is the tag 0xFF and the following two bytes hold the value, so the range of Num becomes 0 to 65535. This range is fully sufficient for an electronic dictionary. The variable-length number representation is shown in the table below.
Table 5: Variable-length number representation
Bytes | Description | Remarks
1 | Tag 0xFF | Stored as an ASCII character; indicates that the number is not less than 255
2 | Length | Unsigned integer, range 255 to 65535
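The variable-length number scheme can be sketched as an encode and decode pair. The byte order of the two-byte value is not stated in the text, so little-endian is an assumption here, and the function names `encode_num` and `decode_num` are hypothetical.

```python
def encode_num(num: int) -> bytes:
    """Encode 0..65535: one byte below 255, else tag 0xFF plus two bytes."""
    if not 0 <= num <= 0xFFFF:
        raise ValueError("number out of range 0..65535")
    if num < 0xFF:
        return bytes([num])
    return b"\xff" + num.to_bytes(2, "little")

def decode_num(buf: bytes, off: int = 0):
    """Return (value, bytes_consumed) for the number stored at `off`."""
    first = buf[off]
    if first != 0xFF:
        return first, 1
    return int.from_bytes(buf[off + 1:off + 3], "little"), 3

for value in (0, 254, 255, 300, 65535):
    encoded = encode_num(value)
    print(value, encoded.hex(), decode_num(encoded))
```

Since 0xFF is reserved as the tag, the value 255 itself must take the three-byte form, which matches the range of 255 to 65535 given for the two-byte field.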

Claims (6)

1. A query method for an electronic dictionary that implements a repeated word list, characterized in that the method comprises:
establishing an index layer, a prefix layer, and a data layer in the lexicon of the electronic dictionary, wherein: the index layer contains the index to the prefix-layer nodes; the prefix layer contains the name and phonetic symbols of each word; the data layer contains the detailed explanation of each word; an index based on binary search is established in the index layer; the prefix layer arranges words according to specific needs and supports repeated occurrences of a word; and the data layer stores the different data attributes;
obtaining the data-layer offset from a prefix-layer node and entering the data-layer node corresponding to the word, wherein the prefix-layer node contains the length of the previous prefix-layer node, and the pointer to the previous prefix-layer node is obtained by subtracting that length from the offset of the current node, which makes traversal within the prefix layer possible; prefix-layer nodes comprise common prefix-layer nodes and repeated prefix-layer nodes; the node structure to which a node belongs is determined from the first byte of the prefix-layer node; and a repeated prefix-layer node of a word has the data structure of a common prefix-layer node and additionally contains pointers to the previous and next repeated prefix-layer nodes of the word;
obtaining the detailed information of the word.
2. The query method of claim 1, characterized in that the word list in the prefix layer is sorted and a sparse index is further used.
3. The query method of claim 1, characterized in that a prefix-layer node contains the length of the previous prefix-layer node, the word name, the phonetic symbols of the word, and the pointer to the corresponding data-layer node.
4. The query method of claim 1, characterized in that prefix-layer nodes comprise common prefix-layer nodes and repeated prefix-layer nodes; the node structure to which a node belongs is determined from the first byte of the prefix-layer node; and a repeated prefix-layer node of a word has the data structure of a common prefix-layer node and additionally contains pointers to the previous and next repeated prefix-layer nodes of the word.
5. The query method of claim 1, characterized in that, among the repeated prefix-layer nodes of the same word in the prefix layer, only the first repeated prefix-layer node corresponds to the index layer.
6. The query method of claim 1, characterized in that the data layer contains the total number of display data blocks and each display data block; each display block of a word is a variable-length data stream, and the attribute of the data in each display block is defined by a number.
CN200810027792A 2008-04-30 2008-04-30 Electronic dictionary inquiry method for implementing repeated word list Expired - Fee Related CN101419605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810027792A CN101419605B (en) 2008-04-30 2008-04-30 Electronic dictionary inquiry method for implementing repeated word list

Publications (2)

Publication Number Publication Date
CN101419605A CN101419605A (en) 2009-04-29
CN101419605B true CN101419605B (en) 2012-10-10

Family

ID=40630397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810027792A Expired - Fee Related CN101419605B (en) 2008-04-30 2008-04-30 Electronic dictionary inquiry method for implementing repeated word list

Country Status (1)

Country Link
CN (1) CN101419605B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751473A (en) * 2009-12-31 2010-06-23 中兴通讯股份有限公司 The searching of a kind of amendment record item, renewal and method for synchronous and data sync equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101124579A (en) * 2005-02-24 2008-02-13 富士施乐株式会社 Word translation device, translation method, and translation program
CN101145155A (en) * 2007-10-24 2008-03-19 中山大学 Electronic dictionary data memory format and its searching method

Also Published As

Publication number Publication date
CN101419605A (en) 2009-04-29


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20121010

Termination date: 20140430