CN101499081A - Words language structure tree building method

Publication number
CN101499081A
Authority
CN
China
Prior art keywords
language
code information
tree
node
word language
Legal status
Pending
Application number
CNA2008100573987A
Other languages
Chinese (zh)
Inventor
赵文银
Current Assignee
BEIJING KINGQUE DIGITAL TECHNOLOGY Co Ltd
Original Assignee
BEIJING KINGQUE DIGITAL TECHNOLOGY Co Ltd
Priority date
2008-02-01
Filing date
2008-02-01
Publication date
2009-08-05
Application filed by BEIJING KINGQUE DIGITAL TECHNOLOGY Co Ltd

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to text (word-language) management technology and computer data-structure technology, and in particular to a method for constructing a word-language structure tree. The method comprises: rules for converting text into a spatial position, and the conversion itself; design of the rules for the code information of the word-language structure tree; a method for synthesizing and managing that code information; a method for parsing and recognizing that code information; and processing the text on a computer or any other device capable of computation and storage to obtain the corresponding node of the tree. The method manages text simply, directly and efficiently: by managing the tree, a user achieves the same effect as managing the text itself. Because the tree has a regular structure, processing it is considerably more efficient. The method is highly practical.

Description

Method for constructing a word-language structure tree
Technical field
The present invention relates to text (word-language) management technology and computer data-structure technology, and in particular to a method for constructing a word-language structure tree.
Background art
As informatization deepens worldwide, information has an ever greater influence on people's lives.
Take the Internet as an example: a large amount of new information is published on it every day, and people create wealth by finding valuable information online. Faced with such a huge, rapidly growing and disordered mass of information, managing it efficiently and in an orderly way becomes critically important.
At present there is no good method that solves this problem. In most cases, people manage large volumes of information by building text indexes and managing keywords.
Current conventional methods are relatively complex and not intuitive, and system efficiency is constrained by other systems such as databases and hardware. Moreover, once the amount of information or the number of keywords reaches a certain scale, hardware and other system limits greatly increase the computer's workload in practice and reduce efficiency. Finally, because these methods are not structured, but instead use a smaller set of unordered keywords to manage a large body of unordered information, the processing workload becomes large once the keyword set reaches a certain order of magnitude, and scalability is poor.
Summary of the invention
In view of this, the present invention provides a method for constructing a word-language structure tree. With this method, each piece of text is mapped simply and conveniently to a unique node of the structure tree, so that managing text is converted into managing nodes of the tree. Because the tree's structure is efficient and regular, text management becomes much faster and requires less work. When the method is used to manage a mass of information in the same way as keywords, the keywords are structured into ordered information within a controlled range, so the method is not limited by the number of keywords, processing is fast and largely unaffected by that number, and the method is practical.
To achieve this object, the invention provides a method for constructing a word-language structure tree, the method comprising:
A. rules for converting text into a spatial position, and the conversion itself;
B. design of the rules for the code information of the word-language structure tree;
C. a method for synthesizing and managing the code information of the word-language structure tree;
D. a method for parsing and recognizing the code information of the word-language structure tree;
E. processing the text on a computer or any other device capable of computation and storage to obtain the corresponding node of the tree.
Step A comprises:
A1. mapping the text into an N-dimensional space to obtain a unique spatial point P = {0, 1, 2, ..., N-1}, i.e. a point with N coordinates;
A2. using several spaces or increasing the number of dimensions to handle the case in which different texts map to the same point in the space, thereby improving resistance to "collisions";
The mapping from text to space is one-way: the text cannot be recovered from its spatial position.
A3. converting the spatial position into an address field: for a specified N-dimensional space, the spatial position P has N coordinates; the N coordinate values are divided in order into L groups of M values each; the M values in each group are added together (maximum value < L*M), giving L new numbers; each number is divided by Y, giving L remainders less than Y and L quotients ("multiples"); the remainders and quotients are combined in order to obtain an address field K made up of integers;
A4. converting the address field into tree nodes, where a main node is the integer value of an address-field element modulo 64, and a factor node is that value's quotient by 64;
A5. placing the quotients by 64 of every two address-field elements in the same factor node.
The length of the address field grows in increments of two elements, so its total length is a multiple of 2.
The conversion from text to tree node is one-way: the text cannot be recovered from the tree node.
The probability that different texts map to the same node can be reduced by increasing the number of layers of the structure tree, or by building a further subtree under that node.
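Step A leaves the text-to-space mapping of A1 unspecified. The Python sketch below is only an illustration under stated assumptions: it uses SHAKE-256 from the standard `hashlib` module to derive N pseudo-random bytes and reduces each one to the range 0..N-1, so equal texts always give the same point and the point cannot be turned back into the text. The salted extra spaces in `map_to_spaces` are one hypothetical way to realize A2 ("use a plurality of spaces"); the function names and salts are not from the patent.

```python
import hashlib


def map_to_point(text: str, n: int = 64, salt: bytes = b"") -> tuple[int, ...]:
    """Step A1 sketch: map text to a point with n coordinates, each in 0..n-1.

    Assumption: the patent names no mapping; SHAKE-256 is used here because it
    yields any requested number of bytes. The mapping is one-way.
    """
    if not 1 <= n <= 256:
        # byte % n can only reach every value 0..n-1 when n <= 256
        raise ValueError("this sketch supports 1..256 dimensions")
    digest = hashlib.shake_256(salt + text.encode("utf-8")).digest(n)
    return tuple(byte % n for byte in digest)


def map_to_spaces(text: str, n: int = 64, spaces: int = 2) -> tuple[tuple[int, ...], ...]:
    """Step A2 sketch: map the same text into several independent spaces.

    Each space is distinguished by a hypothetical one-byte salt, so two texts
    collide only if they land on the same point in every space.
    """
    return tuple(map_to_point(text, n, salt=bytes([i])) for i in range(spaces))


if __name__ == "__main__":
    point = map_to_point("hello")
    print(len(point), min(point) >= 0, max(point) < 64)  # 64 True True
```

Raising `n` or `spaces` lowers the chance of a collision at the cost of more coordinates to process, which is the trade-off A2 describes.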
Step B comprises:
The code information of the word-language structure tree consists of two or more digits, characters or symbols arranged in order. The code information of any tree node is built up from independent segment codes, and it is hereditary, i.e. it carries the information of the nodes above it.
Code information is read in left-to-right order.
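One plausible reading of the hereditary property in step B is that a node's code is its parent's code with one more segment appended, so every prefix of a code, read left to right, names an ancestor. The sketch below models codes as tuples of segment values under that assumption; the type alias and function names are hypothetical.

```python
Code = tuple[int, ...]  # a node's code information: segment values in left-to-right order


def child_code(parent: Code, segment: int) -> Code:
    """A child inherits its parent's whole code and appends one segment."""
    if not 0 <= segment <= 63:  # step C2 limits segment codes to 0..63
        raise ValueError("segment codes range over 0..63")
    return parent + (segment,)


def ancestor_codes(code: Code) -> list[Code]:
    """Read the code left to right: each prefix is the code of one ancestor,
    from the root (empty code) down to the node itself."""
    return [code[:i] for i in range(len(code) + 1)]


def is_ancestor(a: Code, b: Code) -> bool:
    """True if node a lies on the path from the root to node b."""
    return len(a) <= len(b) and b[: len(a)] == a


root: Code = ()
node = child_code(child_code(root, 5), 7)
print(node, ancestor_codes(node))  # (5, 7) [(), (5,), (5, 7)]
```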
Step C comprises:
C1. code information is an ordered sequence of independent segment codes, and the word-language structure tree is the set formed by a group of such code information;
C2. there are at most 64 segment codes, namely the numbers 0, 1, ..., 62, 63;
C3. each text maps to a unique node of the structure tree;
C4. text can therefore be managed by managing code information.
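Steps C1 and C2 describe code information as an ordered string of segment codes, each taking one of 64 values, and the embodiment (step 102) writes each segment as 2 characters. The patent does not disclose which characters stand for which values, so the sketch below uses a hypothetical 8-symbol alphabet: two base-8 digits give exactly 8 * 8 = 64 combinations. The dictionary at the end is one assumed way to hold the tree as a collection of codes (C1) and to manage texts through them (C4).

```python
ALPHABET = "ABCDEFGH"  # hypothetical symbols for the base-8 digits 0..7


def encode_segment(value: int) -> str:
    """Write one segment code (0..63) as two characters."""
    if not 0 <= value <= 63:
        raise ValueError("segment code must be in 0..63")
    high, low = divmod(value, 8)
    return ALPHABET[high] + ALPHABET[low]


def encode_code(segments: list[int]) -> str:
    """Concatenate segment codes left to right into a node's code information."""
    return "".join(encode_segment(v) for v in segments)


def decode_code(code: str) -> list[int]:
    """Split code information back into its segment values."""
    if len(code) % 2:
        raise ValueError("code length must be a multiple of 2")
    return [
        ALPHABET.index(code[i]) * 8 + ALPHABET.index(code[i + 1])
        for i in range(0, len(code), 2)
    ]


# C1/C4 sketch: the tree as a collection of codes, each holding the texts mapped to it.
tree: dict[str, list[str]] = {}
tree.setdefault(encode_code([0, 10, 63]), []).append("example text")
print(list(tree))  # ['AABCHH']
```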
Step D comprises:
Code information contains the complete path followed by the text during construction;
code information is recognized by identifying each segment code and the layer in which it sits;
the meaning of a complete code is the set of the meanings of all its segment codes;
code information contains both vertical and horizontal positional relations;
code information is a two-dimensional table in which each segment code is one data point;
the structure tree is a set made up of many such two-dimensional tables.
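Step D does not define the two axes of its two-dimensional table. Assuming the vertical position is the layer (depth) of a segment and the horizontal position is its value among the 64 possible segment codes, a node's code becomes a small 0/1 table with one marked cell per layer, and the tree is a collection of such tables. The sketch below illustrates only that assumed reading; the function names are hypothetical.

```python
def code_to_points(segments: list[int]) -> list[tuple[int, int]]:
    """Turn each segment into a (layer, value) data point.

    Assumption: layer (1-based depth) is the vertical position and the segment
    value 0..63 is the horizontal position.
    """
    return [(layer, value) for layer, value in enumerate(segments, start=1)]


def code_to_table(segments: list[int], width: int = 64) -> list[list[int]]:
    """Render the points as a 2-D table: one row per layer, one column per value."""
    table = [[0] * width for _ in segments]
    for layer, value in code_to_points(segments):
        table[layer - 1][value] = 1
    return table


def tree_as_tables(codes: list[list[int]]) -> list[list[list[int]]]:
    """The structure tree viewed as a set of 2-D tables, one per node code."""
    return [code_to_table(code) for code in codes]


table = code_to_table([5, 7, 9])
print([row.index(1) for row in table])  # [5, 7, 9]: one marked cell per layer
```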
Step E comprises:
E1. reading the input text and mapping it to a position P in a space of specified dimension N;
E2. transforming the spatial position P into an address field made up of several integers;
E3. transforming the address field into a node of a structure tree;
E4. managing the input text by managing tree nodes.
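Steps E3 and E4, together with the single-root property in the following paragraph, amount to storing each text at the end of a path of segment values and finding it again by walking that path. The minimal in-memory sketch below assumes the segment values have already been produced by steps E1 and E2; the `StructureTree` and `Node` names are hypothetical, and a real system could instead persist the codes to files or a database. Lookup walks one level per layer, so its cost depends on the depth of the tree rather than on how many texts are stored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[int, "Node"] = field(default_factory=dict)  # keyed by segment value 0..63
    texts: list[str] = field(default_factory=list)  # texts mapped to this node


class StructureTree:
    """Hypothetical in-memory word-language structure tree with a single root."""

    def __init__(self) -> None:
        self.root = Node()  # the one and only starting point of the tree

    def insert(self, segments: list[int], text: str) -> None:
        """E3/E4: walk (creating as needed) the path given by the segments."""
        node = self.root
        for value in segments:
            if not 0 <= value <= 63:
                raise ValueError("segment codes range over 0..63")
            node = node.children.setdefault(value, Node())
        node.texts.append(text)  # several texts may share a node if they collide

    def find(self, segments: list[int]) -> list[str]:
        """Follow the path one layer at a time; cost grows with depth, not size."""
        node = self.root
        for value in segments:
            node = node.children.get(value)
            if node is None:
                return []
        return list(node.texts)


tree = StructureTree()
tree.insert([56, 56, 9, 1, 56, 1, 56, 23, 8], "first text")
print(tree.find([56, 56, 9, 1, 56, 1, 56, 23, 8]))  # ['first text']
```

Keeping a list of texts at each node is a design choice for the sketch: it tolerates the collisions that step A2 and the subtree approach are meant to reduce.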
The word-language structure tree has one and only one root, which is the starting point of the tree.
The tree can in principle have any number of layers, but the number of layers is determined by the address field produced when the text is converted.
The code information of the structure tree can be stored in files, databases, file directories or other hardware media.
As the above scheme shows, the method for constructing a word-language structure tree provided by the invention has the following effects:
1. It establishes a unique correspondence between text and the structure tree, so that text can be managed intuitively, conveniently and efficiently by managing the structure tree;
2. Because looking up text is converted into locating a node on the structure tree, locating speed is essentially unaffected by the amount of text: as the amount of text grows, the time needed to locate an entry does not grow linearly with it. The method is simple, practical and fast.
Brief description of the drawings
Fig. 1 is a flowchart of forming a word-language structure tree according to the invention;
Fig. 2 is an example of converting text into a structure-tree node according to the invention;
Fig. 3 is a schematic diagram of a word-language structure tree according to the invention.
Detailed description of the embodiments
To make the features and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
The method of forming a word-language structure tree is described in detail using various kinds of character symbols as an example.
Fig. 1 is a flowchart of forming a word-language structure tree according to the invention.
Step 101: rules and conversion formula for converting text into a spatial position.
The text is mapped into a space of specified dimension N. The higher the dimension, the larger the space, and the lower the probability that different texts map to the same position P.
The spatial position P is converted into an address field K made up of several integers.
Take an N = 64 dimensional space as an example. The text is mapped to a spatial position P consisting of 64 coordinates. These 64 coordinate values are divided in order into 4 groups of 16 values each, and the values in each group are added together (maximum value < 16*64 = 4*256), giving 4 new numbers. Each number is divided by 256, giving 4 remainders less than 256 and 4 quotients ("multiples"). The 4 quotients are combined in order into 2 numbers: the first quotient is the tens digit of the first number and the second quotient its units digit; the third quotient is the tens digit of the second number and the fourth quotient its units digit. Denoting the remainders Y1, Y2, Y3, Y4 and the two combined numbers B1, B2, the resulting address field is K = Y1:Y2:B1:Y3:Y4:B2.
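The N = 64 example can be checked with a short calculation. The sketch below assumes each of the 64 coordinates is an integer in 0..63 (consistent with the stated bound 16*64) and otherwise follows the text literally: 4 groups of 16, sums divided by 256, remainders Y1..Y4, and quotients packed as tens and units digits into B1 and B2. Because each quotient is at most 3, the decimal packing is unambiguous. The function name is hypothetical.

```python
def address_field(p: list[int]) -> tuple[int, int, int, int, int, int]:
    """Compute K = Y1:Y2:B1:Y3:Y4:B2 from a 64-coordinate point P (step 101 example)."""
    if len(p) != 64 or not all(0 <= c <= 63 for c in p):
        raise ValueError("expected 64 coordinates, each in 0..63 (assumed range)")
    # 4 groups of 16 coordinates, taken in order; each sum is below 16*64 = 4*256
    sums = [sum(p[i:i + 16]) for i in range(0, 64, 16)]
    y = [s % 256 for s in sums]   # remainders Y1..Y4, each below 256
    q = [s // 256 for s in sums]  # quotients ("multiples"), each in 0..3
    b1 = 10 * q[0] + q[1]         # 1st quotient = tens digit, 2nd = units digit
    b2 = 10 * q[2] + q[3]         # 3rd quotient = tens digit, 4th = units digit
    return (y[0], y[1], b1, y[2], y[3], b2)


print(address_field(list(range(64))))  # (120, 120, 1, 120, 120, 23)
```

With P = (0, 1, ..., 63) the group sums are 120, 376, 632 and 888, so every remainder is 120 and the quotients 0, 1, 2, 3 pack into B1 = 1 and B2 = 23.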
The address field is converted into node code information on the structure tree.
Taking the address field K = Y1:Y2:B1:Y3:Y4:B2 as an example, each of Y1, Y2, B1, Y3, Y4 and B2 is divided by 64 to obtain a remainder and a quotient. Each remainder serves as an independent processing unit (node) on the structure tree; the quotients of Y1 and Y2 are combined into one node, those of B1 and Y3 into another, and those of Y4 and B2 into a third. This yields a tree with 9 layers.
The number of layers of tree nodes produced by converting a piece of text is a multiple of 3.
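The conversion from K to 9 layers can be sketched as follows. The text fixes which quotients share a node (Y1 with Y2, B1 with Y3, Y4 with B2) but not the order of the layers or how two quotients are packed into one segment. The sketch assumes each pair contributes its two remainders followed by their shared factor node, and packs the quotients as 8*qa + qb, which stays within 0..63 while each quotient is below 8. Both choices, and the function name, are hypothetical.

```python
def address_to_layers(k: tuple[int, ...], base: int = 64) -> list[int]:
    """Turn an address field into per-layer segment values (steps A4/A5).

    Each element contributes its remainder mod 64 as a main node; the
    quotients of each consecutive pair share one factor node.
    """
    if len(k) % 2:
        raise ValueError("the address field grows two elements at a time")
    layers: list[int] = []
    for a, b in zip(k[0::2], k[1::2]):  # (Y1, Y2), (B1, Y3), (Y4, B2)
        qa, ra = divmod(a, base)
        qb, rb = divmod(b, base)
        if qa >= 8 or qb >= 8:
            raise ValueError("quotient too large for the assumed 8*qa + qb packing")
        layers += [ra, rb, 8 * qa + qb]  # two main nodes, then their factor node
    return layers


print(address_to_layers((120, 120, 1, 120, 120, 23)))  # [56, 56, 9, 1, 56, 1, 56, 23, 8]
```

Each pair of address-field elements yields exactly 3 layers, which is why a 6-element address field gives 9 layers and, in general, the layer count is a multiple of 3.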
Step 102: design of the rules for code information.
In the word-language structure tree of this example, codes are represented by characters, each segment code is 2 characters long, and codes are ordered from left to right.
Step 103: the word-language structure tree obtains the information for each segment of the text by processing the segment codes; the code information of a tree node is obtained by combining its segment codes, and the set of all tree nodes constitutes the word-language structure tree.
The word-language structure tree can be stored in files, databases, file directories or other hardware media.
Step 104: the node code information is read and decomposed according to the rules, taking every 3 segment codes as a group. The vertical position of each segment code is recorded, the meaning of each segment code is analyzed, and its horizontal position is recorded. Combining these results yields the detailed meaning of the complete node code information.
Step 105: the input text is read and mapped to a position P in a space of specified dimension N; the spatial position P is transformed into an address field made up of several integers; the address field is transformed into a node of a structure tree; and the input text is managed by managing tree nodes.
Through the above steps, a word-language structure tree can be created; see Fig. 2 and Fig. 3.
Fig. 2 shows an example of text being transformed into a node code on the word-language tree according to the invention.
The input may be Chinese characters, characters of other languages or symbols; the corresponding tree-node code information has 18 characters.
For example, if the input is "Chinese character", the tree-node code is sddndfbdfnbfsafibb. Each segment code is 2 characters long, and the 9 segments (9 layers) are divided into 3 groups of 3 layers each.
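The example code can be decomposed mechanically using the stated rules (2 characters per segment, 3 layers per group), which reproduces the layer-by-layer listing of Fig. 3. The function below is a hypothetical helper; it only splits the string and does not interpret what each segment means, since the patent does not disclose the character assignment.

```python
def split_code(code: str, seg_len: int = 2, group_size: int = 3) -> list[tuple[int, int, str]]:
    """Split node code information into (layer, group, segment) triples."""
    if len(code) % seg_len:
        raise ValueError("code length must be a multiple of the segment length")
    segments = [code[i:i + seg_len] for i in range(0, len(code), seg_len)]
    return [
        (layer, (layer - 1) // group_size + 1, segment)
        for layer, segment in enumerate(segments, start=1)
    ]


for layer, group, segment in split_code("sddndfbdfnbfsafibb"):
    print(f"layer {layer}, group {group}: {segment}")
```

The first three lines printed are `layer 1, group 1: sd`, `layer 2, group 1: dn` and `layer 3, group 1: df`, matching steps 302 to 304, and the last is `layer 9, group 3: bb`.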
Fig. 3 is a schematic diagram of how the node for "Chinese character" is distributed on the word-language tree.
Step 301: the root of the word-language structure tree.
Step 302: the node of the text in layer 1 of the tree; segment code sd, group 1.
Step 303: the node of the text in layer 2 of the tree; segment code dn, group 1.
Step 304: the node of the text in layer 3 of the tree; segment code df, group 1.
Step 305: the node of the text in layer 4 of the tree; segment code bd, group 2.
Step 306: the node of the text in layer 5 of the tree; segment code fn, group 2.
Step 307: the node of the text in layer 6 of the tree; segment code bf, group 2.
Step 308: the node of the text in layer 7 of the tree; segment code sa, group 3.
Step 309: the node of the text in layer 8 of the tree; segment code fi, group 3.
Step 310: the node of the text in layer 9 of the tree; segment code bb, group 3.
The method of creating a word-language structure tree has been described in detail above using only the structure tree for character symbols as an example. The method provided by the invention applies equally to other fields, where the implementation is essentially the same as described above and is therefore not repeated here.
The above are only specific embodiments of the invention and are not intended to limit its scope of protection.

Claims (10)

1. A method for constructing a word-language structure tree, characterized in that the method comprises:
A. rules for converting text into a spatial position, and the conversion itself;
B. design of the rules for the code information of the word-language structure tree;
C. a method for synthesizing and managing the code information of the word-language structure tree;
D. a method for parsing and recognizing the code information of the word-language structure tree;
E. processing the text on a computer or any other device capable of computation and storage to obtain the corresponding node of the tree.
2. The method according to claim 1, characterized in that step A comprises:
A1. mapping the text into an N-dimensional space to obtain a unique spatial point P = {0, 1, 2, ..., N-1}, i.e. a point with N coordinates;
A2. using several spaces or increasing the number of dimensions to handle the case in which different texts map to the same point in the space, thereby improving resistance to "collisions";
the mapping from text to space is one-way, and the text cannot be recovered from its spatial position;
A3. converting the spatial position into an address field: for a specified N-dimensional space, the spatial position P has N coordinates; the N coordinate values are divided in order into L groups of M values each; the M values in each group are added together (maximum value < L*M), giving L new numbers; each number is divided by Y, giving L remainders less than Y and L quotients ("multiples"); the remainders and quotients are combined in order to obtain an address field K made up of integers;
A4. converting the address field into tree nodes, where a main node is the integer value of an address-field element modulo 64, and a factor node is that value's quotient by 64;
A5. placing the quotients by 64 of every two address-field elements in the same factor node.
3. The method according to claim 2, wherein the length of the address field grows in increments of two elements, and its total length is a multiple of 2;
the conversion from text to tree node is one-way, and the text cannot be recovered from the tree node;
the probability that different texts map to the same node is reduced by increasing the number of layers of the structure tree, or by building a further subtree under that node.
4. The method according to claim 1, characterized in that step B comprises:
the code information of the word-language structure tree consists of two or more digits, characters or symbols arranged in order; the code information of any tree node is built up from independent segment codes, and it is hereditary, i.e. it carries the information of the nodes above it;
code information is read in left-to-right order.
5. The method according to claim 1, characterized in that step C comprises:
C1. code information is an ordered sequence of independent segment codes, and the word-language structure tree is the set formed by a group of such code information;
C2. there are at most 64 segment codes, namely the numbers 0, 1, ..., 62, 63;
C3. each text maps to a unique node of the structure tree;
C4. text can be managed by managing code information.
6. The method according to claim 1, characterized in that step D comprises:
code information contains the complete path followed by the text during construction;
code information is recognized by identifying each segment code and the layer in which it sits;
the meaning of a complete code is the set of the meanings of all its segment codes;
code information contains both vertical and horizontal positional relations;
code information is a two-dimensional table in which each segment code is one data point;
the structure tree is a set made up of many such two-dimensional tables.
7. The method according to claim 1, characterized in that step E comprises:
E1. reading the input text and mapping it to a position P in a space of specified dimension N;
E2. transforming the spatial position P into an address field made up of several integers;
E3. transforming the address field into a node of a structure tree;
E4. managing the input text by managing tree nodes.
8. The method according to claim 1, characterized in that the word-language structure tree has one and only one root, which is the starting point of the tree.
9. The method according to claim 1, characterized in that the word-language structure tree can in principle have any number of layers, but the number of layers is determined by the address field produced when the text is converted.
10. The method according to claim 1, characterized in that the code information of the structure tree is stored in files, databases, file directories or other hardware media.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008100573987A CN101499081A (en) 2008-02-01 2008-02-01 Words language structure tree building method

Publications (1)

Publication Number Publication Date
CN101499081A 2009-08-05

Family

ID=40946155

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008100573987A Pending CN101499081A (en) 2008-02-01 2008-02-01 Words language structure tree building method

Country Status (1)

Country Link
CN (1) CN101499081A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268170A (en) * 2014-09-12 2015-01-07 电子科技大学 QPF font library organization method
CN104268170B (en) * 2014-09-12 2017-05-17 电子科技大学 QPF font library organization method
CN106021286A (en) * 2016-04-29 2016-10-12 东北电力大学 Method for language understanding based on language structure
CN111506378A (en) * 2020-04-17 2020-08-07 腾讯科技(深圳)有限公司 Method, device and equipment for previewing text display effect and storage medium
CN111506378B (en) * 2020-04-17 2021-09-28 腾讯科技(深圳)有限公司 Method, device and equipment for previewing text display effect and storage medium

Similar Documents

Publication Publication Date Title
CN101673307B (en) Space data index method and system
CN110929042B (en) Knowledge graph construction and query method based on power enterprise
Munro et al. Space-efficient construction of compressed indexes in deterministic linear time
EP2924594B1 (en) Data encoding and corresponding data structure in a column-store database
US9665600B2 (en) Method for implementing database
CN105630803B (en) The method and apparatus that Document image analysis establishes index
CN107798054A (en) A kind of range query method and device based on Trie
CN106777163A (en) IP address institute possession querying method and system based on RBTree
CN107145526B (en) Reverse-nearest neighbor query processing method for geographic social keywords under road network
CN106407201A (en) Data processing method and apparatus
CN102867049B (en) Chinese PINYIN quick word segmentation method based on word search tree
CN107766433A (en) A kind of range query method and device based on Geo BTree
CN1920831A (en) Method and system for managing object information on network
CN104573022A (en) Data query method and device for HBase
CN105574212A (en) Image retrieval method for multi-index disk Hash structure
CN104021123A (en) Method and system for data transfer
CN106326475A (en) High-efficiency static hash table implement method and system
CN108197313B (en) Dictionary indexing method for realizing space optimization through 16-bit Trie tree
CN103002061A (en) Method and device for mutual conversion of long domain names and short domain names
Hernández-Illera et al. Serializing RDF in compressed space
CN103324763A (en) Presenting method for tree-form data structure of mobile phone terminal
CN113535788A (en) Retrieval method, system, equipment and medium for marine environment data
CN100476824C (en) Method and system for storing element and method and system for searching element
CN101499081A (en) Words language structure tree building method
CN114443656A (en) Customizable automated data model analysis tool and use method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20090805