CN101419605B - Electronic dictionary inquiry method for implementing repeated word list - Google Patents
- Publication number
- CN101419605B (application CN200810027792A)
- Authority
- CN
- China
- Prior art keywords
- layer
- prefix
- word
- node
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a query method for an electronic dictionary that implements a repeated word list. An index layer, a prefix layer and a data layer are established in the lexicon of the electronic dictionary. The index layer contains indexes to the nodes of the prefix layer; the prefix layer contains the names of words and their phonetic information; the data layer contains the detailed explanation information of words. An indexing scheme based on binary search is established in the index layer. The prefix layer arranges words according to specific needs and supports repeated occurrences of a word. The data layer stores different data attributes. The data-layer offset is obtained from a prefix-layer node, which leads to the data-layer node corresponding to the word. Each prefix-layer node contains the length of the previous prefix-layer node; subtracting that length from the offset of the current node yields the pointer to the previous prefix-layer node. Traversal within the prefix layer is thereby realized, and the detailed information of the word is obtained.
Description
Technical field
The present invention relates to the design of the lexicon of an electronic dictionary, and in particular to a query method for an electronic dictionary that implements a repeated word list.
Background art
An electronic dictionary must be able to retrieve a required word quickly, so it must use an index. Lexicons without an index are seldom used, because without an index the lexicon must be kept in sorted order and, for convenient access, each record is generally required to be of fixed length.
Electronic dictionary storage is at present generally divided into two parts: an index file and a data file. Index files can generally be divided into single-level, two-level and multi-level indexes. Many index structures are in use, but the storage architecture is generally two-level: the first level is the index to the words, and the second level is the explanation information of the words, as shown in Fig. 1.
Indexing can take various forms. One form uses each headword (prefix) as an index key, with or without repeated headwords; another uses the first letter or first few letters of the headword as the index key. Single-level and two-level indexes are common in electronic dictionaries; multi-level indexes are applied only to special processing and are seldom used.
The explanation information of a word can take different forms depending on the requirements of the specific dictionary. It generally includes the word name, phonetic symbols, explanations, example sentences and other general information; some specialized dictionaries also carry their own special content, such as word grammar, cultural notes, historical stories and appendices (including language notes, measure words, etc.).
The function of an electronic dictionary is to find the explanation information of a word quickly through the index and to display the detailed explanation of the word on the screen for the user to browse.
Summary of the invention
In view of the simplicity of the structural design of current electronic dictionaries and the fixed nature of their word lists, the object of the present invention is to propose a query method for an electronic dictionary that implements a repeated word list, and to design the related data structures and representations under this framework.
To achieve the above object, the technical solution adopted by this lexicon design is as follows:
An index layer, a prefix layer and a data layer are established in the lexicon of the electronic dictionary, wherein: the index layer contains indexes to the prefix-layer nodes; the prefix layer contains the names and phonetic symbol information of words; the data layer contains the detailed explanation information of words; an indexing scheme based on binary search is established in the index layer; the prefix layer is used to arrange words according to specific needs and supports repeated occurrences of a word; and the data layer is used to store different data attributes;
The data-layer offset is obtained from a node of the prefix layer, and the data-layer node corresponding to the word is entered. The prefix layer contains the length of the previous prefix-layer node; subtracting that length from the offset of the current node yields the pointer to the previous prefix-layer node, and traversal within the prefix layer is thereby realized;
The detailed information of the word is obtained.
The lexicon consists of three layers: the index layer, the prefix layer and the data layer.
Because the lexicon of the electronic dictionary is based on a three-level storage structure, looking up a word requires the index layer, the prefix layer and the data layer to call one another. Indexes into each of the three layers are offsets from the start of that layer. Word lookup under the three-level storage format is shown in Fig. 2.
(1) Index layer: an index strategy can be chosen to implement the index into the prefix layer.
Different index strategies can be used as required, such as a tree index or a hash index. The present invention adopts binary search: the search method is simple and fast and, because only indexes into the prefix layer are stored, very little storage space is needed.
(2) Prefix layer: words are arranged according to specific needs, and repeated words are supported.
In a lexicon, words always appear in some order, generally sorted by word name or phonetic symbol. The prefix layer adopted by the invention supports repeated words, for example words with similar pronunciation that occur repeatedly in the lexicon. Because the design of the prefix layer also supports displaying words in unsorted order, the user can define a custom word list to display.
(3) Data layer: the data layer stores different data attributes.
The data-layer offset is obtained from a node of the prefix layer, and the data-layer node corresponding to the word is entered.
The detailed information of the word is obtained. Because the data layer stores the attributes of the word information in the lexicon, the corresponding view is displayed according to the specific display requirements of the system, for example example sentences, grammar, appendices and explanations.
The prefix layer contains the length of the previous prefix-layer node. Subtracting that length from the offset of the current node yields the pointer to the previous prefix-layer node, which makes traversal within the prefix layer possible, for example scrolling by line or by screen in the displayed word list.
In the prefix layer, among the repeated prefix-layer nodes of the same word, only the first is mapped to an index-layer node, and all repeated prefix-layer nodes of the same word correspond to the same data-layer node. This mapping reduces data redundancy and uses storage space more efficiently.
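The mapping just described can be illustrated with a small in-memory model. This is only a sketch: list positions stand in for the byte offsets used in the patent, and the function name `build_layers` and the sample words are hypothetical.

```python
def build_layers(word_list, explanations):
    """Build a toy three-layer lexicon.

    word_list: words in display order, possibly with repeats.
    explanations: mapping from word to its explanation text.
    """
    data_layer = []      # one entry per distinct word
    data_pos = {}        # word -> position in data_layer
    prefix_layer = []    # one node per list entry: (word, data position)
    first_seen = {}      # word -> position of its first prefix node
    for word in word_list:
        if word not in data_pos:
            data_pos[word] = len(data_layer)
            data_layer.append(explanations[word])
        if word not in first_seen:
            first_seen[word] = len(prefix_layer)
        prefix_layer.append((word, data_pos[word]))
    # Index layer: only first occurrences, ordered by word name.
    index_layer = [first_seen[w] for w in sorted(first_seen)]
    return index_layer, prefix_layer, data_layer


index_layer, prefix_layer, data_layer = build_layers(
    ["lead", "apple", "lead", "bow"],
    {"lead": "entry for lead", "apple": "entry for apple", "bow": "entry for bow"},
)
print(index_layer)    # [1, 3, 0]
print(prefix_layer)   # both "lead" nodes carry the same data position
```

The word "lead" appears twice in the prefix layer but has a single index entry and a single data-layer entry, which is the redundancy saving the paragraph above refers to.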
Beneficial effects of the present invention:
The present invention combines the storage architecture and retrieval techniques of current electronic dictionaries and proposes a structure for an embedded electronic dictionary based on a three-level storage structure. Under this framework the storage structure of the electronic dictionary is divided into an index layer, a prefix layer and a data layer: the index layer can choose its index strategy; the prefix layer can arrange words according to specific needs and supports repeated occurrences of a word; the data layer stores the explanation information. The invention adopts binary search as the indexing scheme, which not only finds the input word faster but also requires less storage. If the word list in the prefix layer is sorted, a sparse index can further be used so that the index layer takes even less space. The user can define a custom word list. Among the repeated prefix-layer nodes of the same word, only the first is mapped to an index-layer node, and all repeated prefix-layer nodes of the same word correspond to the same data-layer node. This mapping reduces data redundancy and uses storage space more efficiently.
Brief description of the drawings
Fig. 1 shows an index structure based on two-level storage.
Fig. 2 is a schematic flowchart of word lookup under the three-level storage format.
Fig. 3 shows the data representation of the index layer and the prefix layer.
Fig. 4 shows the data representation of the index layer based on a sparse index.
Fig. 5 is a structural diagram of the repeated prefix-layer nodes.
Fig. 6 shows the data representation of the prefix layer and the data layer.
Fig. 7 shows the data structure of the data layer.
Embodiment
The present invention is further described below with reference to the accompanying drawings:
(1) Index layer design
The index layer is the first layer of the three-level storage structure of the electronic dictionary. Given the word entered by the user, the index layer is used to find the position of that word in the prefix layer quickly, and the word name and phonetic symbol of the word are displayed.
The index layer uses a response mode based on a window procedure: each time the user enters a character, without having to press the confirm key, the system automatically performs an index search for the word, updates the screen of words currently shown, and displays the new word list.
If a word occurs repeatedly in the prefix layer, as with polyphonic words (words with more than one pronunciation), those occurrences use only one index node, and this index node points to the first occurrence in the prefix layer, as shown in Fig. 3.
The index layer of the electronic dictionary is a fast index area based on binary search, sorted in ascending order with the word name in the second (prefix) layer as the key. Each index-layer node is 4 bytes and records the offset of the word in the prefix layer.
The data structure of the index layer is shown in the table below.
Table 1. Data structure of the index layer
Bytes | 4 | 4 | 4 | ... | 4 | 4 |
Description | Total number of words, n | Offset of word 1 in the prefix layer | Offset of word 2 in the prefix layer | ... | Offset of word n-1 in the prefix layer | Offset of word n in the prefix layer |
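A minimal sketch of a lookup over the Table 1 layout follows. The patent does not state a byte order, so little-endian is assumed here; the prefix-layer node is reduced to a length byte plus the word name, and the helper names (`build`, `name_at`, `lookup`) are hypothetical.

```python
import struct


def pack_node(name: bytes) -> bytes:
    """Reduced prefix-layer node for this sketch: 1-byte length + word name."""
    return bytes([len(name)]) + name


def build(words):
    """Return (index_layer, prefix_layer) for a list of distinct words."""
    prefix, offsets = b"", {}
    for w in words:
        offsets[w] = len(prefix)      # offset from the start of the prefix layer
        prefix += pack_node(w)
    ordered = sorted(words)           # index entries ascend by word name
    index = struct.pack("<I", len(ordered))
    index += b"".join(struct.pack("<I", offsets[w]) for w in ordered)
    return index, prefix


def name_at(prefix: bytes, off: int) -> bytes:
    """Read the word name of the prefix-layer node at byte offset `off`."""
    return prefix[off + 1: off + 1 + prefix[off]]


def lookup(index: bytes, prefix: bytes, target: bytes) -> int:
    """Binary search over the 4-byte index entries; returns a prefix offset or -1."""
    (n,) = struct.unpack_from("<I", index, 0)
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        (off,) = struct.unpack_from("<I", index, 4 + 4 * mid)
        name = name_at(prefix, off)
        if name == target:
            return off
        if name < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


index, prefix = build([b"pear", b"apple", b"fig"])
print(lookup(index, prefix, b"fig"))    # 11
print(lookup(index, prefix, b"kiwi"))   # -1
```

Note that the index entries hold only offsets; each comparison reads the word name from the prefix layer, which is why the index layer itself stays small.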
If the nodes in the prefix layer are sorted, a sparse index can be used instead, which takes less space, as shown in Fig. 4.
(2) Prefix layer design
The prefix layer is the middle layer connecting the index layer and the data layer and serves as the data link between them. The information in the prefix layer comprises the word name, the phonetic symbol and a pointer into the data layer. Operations on the prefix layer implement line scrolling and page turning. Through the pointer into the data layer, the user can press the confirm key to view the detailed explanation of the word.
When the user looks up a word, a list of words is shown on the screen. Some dictionaries only need to show the word name, while others also need to show the phonetic symbol. The data in our prefix layer therefore comprises both the word name and the phonetic symbol of the word.
The lexicon design should also support page turning in the word list: the user can view the previous or next page of the word list with the page keys.
The word list may contain repeated words. For this reason we designed two data types for prefix-layer nodes: the common prefix-layer node and the repeated prefix-layer node.
In an embedded environment, to save storage space, the two node types share a common leading byte, so the type of a node can be determined from its first byte: if this byte is 0, the node is a repeated node and is read as a repeated prefix-layer node; otherwise it is a common prefix-layer node. A repeated prefix-layer node contains all the information of a common prefix-layer node and additionally holds the pointers of a doubly linked list, which give the offsets of the previous and the next occurrence of the word. The relationship between repeated and common prefix-layer nodes is shown in Fig. 5.
The relationship between the prefix layer and the data layer when repeated words are used is shown in Fig. 6.
A common prefix-layer node comprises the word name, the phonetic symbol and a pointer into the data layer. Because line scrolling and page turning up and down must be supported, each prefix-layer node also stores the total length of the previous prefix-layer node, so that it is easy to jump to the prefix-layer node of the previous word. The data structure of a common prefix-layer node is shown in the table below.
Table 2. Data structure of a common prefix-layer node
Bytes | Description | Remarks |
---|---|---|
1 | Length of the word name, wordLen | Expressed as an ASCII code; the length of this word name; wordLen > 0 |
wordLen | Spelling of the word name | |
1 | Length of the phonetic symbol, soundLen | Expressed as an ASCII code; 0 if there is none |
soundLen | Spelling of the phonetic symbol | Absent if the phonetic symbol length is 0 |
4 | Offset into the third layer (data layer) | |
1 | Total length of the previous prefix-layer node, totalLen | Expressed as an ASCII code; 0 for the first node. totalLen = 7 + wordLen + soundLen, totalLen < 255 (where wordLen and soundLen are those of the previous prefix-layer node) |
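The Table 2 layout and the backward step it enables can be sketched as follows. Byte order is not specified in the patent, so little-endian is assumed for the 4-byte offset; the function names and the sample entries are hypothetical.

```python
import struct


def encode_common(name: bytes, sound: bytes, data_off: int, prev_len: int) -> bytes:
    """Serialize one common node in the field order of Table 2."""
    if not (0 < len(name) < 256 and len(sound) < 256 and 0 <= prev_len < 255):
        raise ValueError("value does not fit a one-byte length field")
    return (bytes([len(name)]) + name        # wordLen, word name
            + bytes([len(sound)]) + sound    # soundLen, phonetic symbol (may be empty)
            + struct.pack("<I", data_off)    # data-layer offset (byte order assumed)
            + bytes([prev_len]))             # totalLen of the previous node, 0 if first


def decode_common(layer: bytes, off: int) -> dict:
    """Parse the common node that starts at byte offset `off`."""
    p = off
    word_len = layer[p]
    p += 1
    name = layer[p:p + word_len]
    p += word_len
    sound_len = layer[p]
    p += 1
    sound = layer[p:p + sound_len]
    p += sound_len
    (data_off,) = struct.unpack_from("<I", layer, p)
    p += 4
    prev_len = layer[p]
    p += 1
    return {"name": name, "sound": sound, "data_off": data_off,
            "prev_len": prev_len, "size": p - off}


def build_prefix_layer(entries):
    """entries: iterable of (name, sound, data_off). Returns (layer, node offsets)."""
    layer, offsets, prev_len = b"", [], 0
    for name, sound, data_off in entries:
        node = encode_common(name, sound, data_off, prev_len)
        offsets.append(len(layer))
        layer += node
        prev_len = len(node)                 # 7 + wordLen + soundLen
    return layer, offsets


def step_back(layer: bytes, off: int):
    """Offset of the previous node, or None when `off` is the first node."""
    prev_len = decode_common(layer, off)["prev_len"]
    return None if prev_len == 0 else off - prev_len


layer, offsets = build_prefix_layer([
    (b"ant", b"ant", 0), (b"bee", b"", 40), (b"cat", b"kat", 80),
])
print(offsets)                        # [0, 13, 23]
print(step_back(layer, offsets[2]))   # 13
```

Moving forward needs no stored field, since the next node starts at the current offset plus the current node's own size; only the backward direction requires the stored totalLen.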
When a prefix-layer node is a repeated prefix node, the data structure of the repeated prefix node is used. A repeated prefix-layer node not only contains all the information of a common prefix-layer node but also holds pointers to the previous and the next repeated prefix-layer node of the same word, so that the user can easily find the occurrence of the repeated word that is actually needed. The data structure of a repeated prefix-layer node is shown in the table below.
Table 3. Data structure of a repeated prefix-layer node
Bytes | Description | Remarks |
---|---|---|
1 | Repeated-word flag | Set to 0 |
4 | Previous repeated word | Offset in the prefix layer of the previous occurrence of this word; -1 if there is none |
4 | Next repeated word | Offset in the prefix layer of the next occurrence of this word; -1 if there is none |
1 | Length of the word name, wordLen | Expressed as an ASCII code; the length of this word name |
wordLen | Spelling of the word name | |
1 | Length of the phonetic symbol, soundLen | Expressed as an ASCII code; 0 if there is none |
soundLen | Spelling of the phonetic symbol | Absent if the phonetic symbol length is 0 |
4 | Offset into the third layer (data layer) | |
1 | Total length of the previous prefix-layer node, totalLen | Expressed as an ASCII code; 0 if there is none |
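The sketch below shows one way to tell the two node types apart by the first byte and to follow the links between occurrences of a repeated word. It assumes little-endian byte order and signed 4-byte link fields (so that -1 can be stored); neither is stated in the patent, and all function names are hypothetical.

```python
import struct

NO_REPEAT = -1  # Table 3: link value meaning "no such occurrence"


def _common_fields(name: bytes, sound: bytes, data_off: int, prev_len: int) -> bytes:
    """Fields shared by both node types (the Table 2 layout)."""
    return (bytes([len(name)]) + name + bytes([len(sound)]) + sound
            + struct.pack("<I", data_off) + bytes([prev_len]))


def encode_common(name, sound, data_off, prev_len):
    return _common_fields(name, sound, data_off, prev_len)


def encode_repeat(prev_rep, next_rep, name, sound, data_off, prev_len):
    # Flag byte 0, then the two 4-byte links, then the common fields.
    return (b"\x00" + struct.pack("<ii", prev_rep, next_rep)
            + _common_fields(name, sound, data_off, prev_len))


def decode_node(layer: bytes, off: int) -> dict:
    """Parse either node type; the first byte decides which layout applies."""
    p = off
    is_repeat = layer[p] == 0          # a common node starts with wordLen > 0
    prev_rep = next_rep = NO_REPEAT
    if is_repeat:
        prev_rep, next_rep = struct.unpack_from("<ii", layer, p + 1)
        p += 9                         # flag byte + two 4-byte links
    word_len = layer[p]
    name = layer[p + 1:p + 1 + word_len]
    p += 1 + word_len
    sound_len = layer[p]
    p += 1 + sound_len                 # skip the phonetic symbol
    (data_off,) = struct.unpack_from("<I", layer, p)
    return {"repeat": is_repeat, "prev_rep": prev_rep, "next_rep": next_rep,
            "name": name, "data_off": data_off}


def occurrences(layer: bytes, off: int):
    """Follow the next-occurrence links starting from the node at `off`."""
    found = []
    while off != NO_REPEAT:
        found.append(off)
        off = decode_node(layer, off)["next_rep"]
    return found


second_off = 32  # a 20-byte repeated node and a 12-byte common node precede it
first = encode_repeat(NO_REPEAT, second_off, b"lead", b"", 100, 0)
middle = encode_common(b"apple", b"", 200, len(first))
second = encode_repeat(0, NO_REPEAT, b"lead", b"", 100, len(middle))
layer = first + middle + second
print(occurrences(layer, 0))           # [0, 32]
```

Both "lead" nodes carry the same data-layer offset (100 in this example), matching the rule that all occurrences of a word share one data-layer node.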
(3) Data layer design
The data layer stores the detailed explanation information of words, including grammatical information, explanations, the example sentences under each explanation, and so on.
The data-layer structure consists of the total number of display data blocks and the display data blocks themselves. The information of a word is made up of the attributes that constitute the word, and each attribute is one display data block. If some attributes are nested, the nested attribute is split into several display data blocks according to the practical needs of the system.
The structure in the data layer comprises the total number of display data blocks followed by each display data block. A display data block is a variable-length data stream. The overall data structure of the data layer is shown in Fig. 7. The data structure of a display data block is shown in the table below.
Table 4. Data structure of a word's display data block
Bytes | Description | Remarks |
---|---|---|
1 | Attribute of the display data block | Expressed as an ASCII code |
1 | Length of this display block, Length | Expressed as an ASCII code; the length of this data block |
Length | Content of the display block | |
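A minimal sketch of reading one data-layer node built from Table 4 blocks is given below. It covers only the simple case where the block count and every block length fit in one byte; the attribute codes (1 for an explanation, 2 for an example sentence) are invented for illustration and do not come from the patent.

```python
def encode_entry(blocks):
    """blocks: list of (attribute code, content bytes). One-byte lengths only."""
    if len(blocks) >= 255:
        raise ValueError("too many blocks for a one-byte count in this sketch")
    out = bytes([len(blocks)])                  # total number of display blocks
    for attr, content in blocks:
        if len(content) >= 255:
            raise ValueError("block too long for a one-byte length in this sketch")
        out += bytes([attr, len(content)]) + content
    return out


def decode_entry(layer: bytes, off: int):
    """Return the list of (attribute, content) blocks of the node at `off`."""
    count = layer[off]
    p = off + 1
    blocks = []
    for _ in range(count):
        attr, length = layer[p], layer[p + 1]
        blocks.append((attr, layer[p + 2:p + 2 + length]))
        p += 2 + length                          # advance to the next block
    return blocks


entry = encode_entry([(1, b"a soft heavy metal"), (2, b"Lead is dense.")])
print(decode_entry(entry, 0))
```

Because every block carries its own attribute and length, a viewer can skip blocks whose attribute it does not need to display without parsing their content.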
In the data layer, the total number of data blocks and the length of each block are normally expressed in one byte, which limits them to values no greater than 255. In actual dictionaries, however, some words have a total number of data blocks or a data block length greater than this. We therefore use a variable-length representation of numbers.
When a number Num is less than 255, it is expressed in one byte. When Num is not less than 255, it is expressed in three bytes: the first byte is the flag 0xFF and the following two bytes hold the value, so the expressible range of Num becomes 0 to 65535. This range is fully sufficient for an electronic dictionary. The variable-length number representation is shown in the table below.
Table 5. Variable-length number representation
Bytes | Description | Remarks |
---|---|---|
1 | Flag byte 0xFF | Expressed as an ASCII code; indicates that the number is not less than 255 |
2 | Length | Unsigned integer, range 255 to 65535 |
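The variable-length scheme can be sketched as a pair of helper functions. The byte order of the two-byte value is not given in the patent and is assumed to be little-endian here; `encode_num` and `decode_num` are hypothetical names.

```python
import struct

FLAG = 0xFF  # first byte of the three-byte form


def encode_num(num: int) -> bytes:
    """Encode 0..65535: one byte below 255, otherwise 0xFF plus two bytes."""
    if not 0 <= num <= 0xFFFF:
        raise ValueError("number out of the representable range 0..65535")
    if num < FLAG:
        return bytes([num])
    return bytes([FLAG]) + struct.pack("<H", num)   # byte order assumed


def decode_num(buf: bytes, off: int = 0):
    """Return (value, number of bytes consumed) for the number at `off`."""
    if buf[off] != FLAG:
        return buf[off], 1
    (value,) = struct.unpack_from("<H", buf, off + 1)
    return value, 3


print(encode_num(254))                  # b'\xfe'
print(encode_num(255))                  # b'\xff\xff\x00'
print(decode_num(encode_num(40000)))    # (40000, 3)
```

The value 255 itself has to use the three-byte form, because a lone 0xFF byte would be read as the flag; this matches the range 255 to 65535 given in Table 5.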
Claims (6)
1. A query method for an electronic dictionary implementing a repeated word list, characterized in that the method comprises:
establishing an index layer, a prefix layer and a data layer in the lexicon of the electronic dictionary, wherein: the index layer contains indexes to the prefix-layer nodes; the prefix layer contains the names and phonetic symbol information of words; the data layer contains the detailed explanation information of words; an indexing scheme based on binary search is established in the index layer; the prefix layer is used to arrange words according to specific needs and supports repeated occurrences of a word; and the data layer is used to store different data attributes;
obtaining the data-layer offset from a node of the prefix layer and entering the data-layer node corresponding to the word, wherein the prefix layer contains the length of the previous prefix-layer node, the pointer to the previous prefix-layer node is obtained by subtracting the length of the previous prefix-layer node from the offset of the current node, and traversal within the prefix layer is thereby realized; the prefix-layer nodes consist of common prefix-layer nodes and repeated prefix-layer nodes; which node structure a node belongs to can be determined from the first byte of the prefix-layer node; and the repeated prefix-layer node of a word not only has the data structure of a common prefix-layer node but also contains pointers to the previous and the next repeated prefix-layer node of that word;
obtaining the detailed information of the word.
2. The query method according to claim 1, characterized in that the word list in the prefix layer is sorted and a sparse index is further used.
3. The query method according to claim 1, characterized in that a prefix-layer node contains the length of the previous prefix-layer node, the word name, the phonetic symbol of the word and a pointer to the corresponding data-layer node.
4. The query method according to claim 1, characterized in that the prefix-layer nodes consist of common prefix-layer nodes and repeated prefix-layer nodes; which node structure a node belongs to can be determined from the first byte of the prefix-layer node; and the repeated prefix-layer node of a word not only has the data structure of a common prefix-layer node but also contains pointers to the previous and the next repeated prefix-layer node of that word.
5. The query method according to claim 1, characterized in that, among the repeated prefix-layer nodes of the same word in the prefix layer, only the first corresponds to the index layer.
6. The query method according to claim 1, characterized in that the data layer contains the total number of display data blocks and each display data block; each display block of a word is a variable-length data stream, and the attribute of the data in the display block is defined by a number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810027792A CN101419605B (en) | 2008-04-30 | 2008-04-30 | Electronic dictionary inquiry method for implementing repeated word list |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101419605A CN101419605A (en) | 2009-04-29 |
CN101419605B true CN101419605B (en) | 2012-10-10 |
Family
ID=40630397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200810027792A Expired - Fee Related CN101419605B (en) | 2008-04-30 | 2008-04-30 | Electronic dictionary inquiry method for implementing repeated word list |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101419605B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101751473A (en) * | 2009-12-31 | 2010-06-23 | 中兴通讯股份有限公司 | The searching of a kind of amendment record item, renewal and method for synchronous and data sync equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101124579A (en) * | 2005-02-24 | 2008-02-13 | 富士施乐株式会社 | Word translation device, translation method, and translation program |
CN101145155A (en) * | 2007-10-24 | 2008-03-19 | 中山大学 | Electronic dictionary data memory format and its searching method |
Also Published As
Publication number | Publication date |
---|---|
CN101419605A (en) | 2009-04-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20121010 Termination date: 20140430 |