CN102646118B - Data indexing method and device - Google Patents


Info

Publication number
CN102646118B
Authority
CN
China
Prior art keywords
tree
index
indexed object
lifetime
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210039265.3A
Other languages
Chinese (zh)
Other versions
CN102646118A (en)
Inventor
王恩东
文中领
刘正伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd filed Critical Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201210039265.3A priority Critical patent/CN102646118B/en
Publication of CN102646118A publication Critical patent/CN102646118A/en
Priority to PCT/CN2013/071627 priority patent/WO2013123867A1/en
Application granted granted Critical
Publication of CN102646118B publication Critical patent/CN102646118B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data indexing method and device and relates to the field of data management, and the method and the device are used for solving the problem that the traditional indexing technology can not meet the needs of large data retrieval. The method comprises the following steps: creating an index tree of at least one indexed object R; establishing a hash structure according to the ID (identifier) of the at least one indexed object R; and storing the lifetime of the indexed object (at each position of the index tree) in the hash structure. The technical scheme provided by the invention is applicable to large-scale relation data management systems, and an effect of high-efficiency and high-accuracy data indexing is achieved.

Description

Data indexing method and device
Technical field
The present invention relates to the field of data management, and in particular to a data indexing method and device.
Background
In recent decades, data management technology has developed rapidly and has played a prominent role in the national economy. Large-scale relational database management systems (RDBMS), represented by Oracle, DB2 and SQL Server, are an indispensable core of many large management information systems, CRM software in particular. Meanwhile, semi-structured data management technology, represented by the Extensible Markup Language (XML), also has a place in data exchange and in the management of data that lacks a strict structure. All of these technologies place high demands on data quality and on the accuracy of the data to be processed. When the raw data is of low quality, it must first be preprocessed to improve its quality. Taking a department's human resources management system as an example, information such as employees' personal details, salary and benefits, and routine assessments must be accurate. In fields such as economics, the military and telecommunications, however, data uncertainty is ubiquitous: whether a data item exists may be unknown, and each attribute value may contain error. Although preprocessing can improve the quality of a raw data set, it may also lose some properties of that data set, so that high-quality query results cannot be returned. A typical application background is as follows.
Location-based service (LBS) is a key problem in the field of mobile computing. A location-based service tracks a moving object (or user), locates the position of the object (or user) on an electronic map, and provides spatial information services on that basis. In such applications, the position of the moving object is limited by the particular technical means used (for example, Global Positioning System (GPS) technology) and carries a certain error. Although this error can be reduced gradually as the technology improves, the "location privacy" problem is becoming increasingly prominent. The position of a moving object is very important, and some users are unwilling to disclose it publicly, so as to avoid trouble. The purpose of "location privacy" is to reduce precision: at a given moment, the moving object is not at a "point" in space but within a "region", which protects privacy. Meanwhile, each service provider can still provide the corresponding service based on this "region" information, for example, querying facilities such as hospitals and hotels near the moving object.
Indexing is an important part of data management technology. Relational databases often use the B+ tree and its variants to index one-dimensional data; in multidimensional and spatio-temporal data management, the R-tree and its variants are widely used. These indexing techniques can all significantly improve query processing speed. Likewise, indexing needs attention when processing uncertain data. In some query tasks, such as top-k queries, the probability value of a tuple is also very important, so a one-dimensional index needs to be created on the probability dimension; in that case traditional indexing techniques are effective. But traditional indexing techniques cannot solve every problem.
When the value of each tuple must be described by a probability distribution function, and that function cannot be specified in advance, the efficiency of traditional indexing techniques drops significantly and cannot meet application requirements.
Summary of the invention
The invention provides a data indexing method and device, which solve the problem that traditional indexing techniques cannot meet the needs of large-scale data retrieval.
A data indexing method, comprising:
Creating an index tree of at least one indexed object R;
Establishing a hash structure according to the ID of the at least one indexed object;
Storing, in the hash structure, the lifetime of the indexed object at each position in the index tree.
Preferably, creating the index tree of the at least one indexed object R comprises:
Creating a TPR-Tree as the top layer;
Linking at least one 2-dimensional R-Tree under the TPR-Tree;
Linking each R-Tree to a one-dimensional R-Tree through a hash link.
Preferably, storing in the hash structure the lifetime of the indexed object at each position in the index tree is specifically:
Storing, in the hash structure, the lifetime during which the indexed object is in the TPR-Tree, the 2-dimensional R-Tree, or the one-dimensional R-Tree.
Preferably, the above data indexing method further comprises:
When a time-interval query or a time-slice query is performed on any indexed object, looking up the lifetime of the indexed object through the hash structure;
Determining, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree.
Preferably, the lifetime is specifically the time interval during which the indexed object remains in the same state.
The present invention also provides a data indexing device, comprising:
An index tree creation module, configured to create an index tree of at least one indexed object R;
A hash structure generation module, configured to establish a hash structure according to the ID of the at least one indexed object;
An association module, configured to store, in the hash structure, the lifetime of the indexed object at each position in the index tree.
Preferably, the index tree creation module comprises:
A first creating unit, configured to create a TPR-Tree as the top layer;
A second creating unit, configured to link at least one 2-dimensional R-Tree under the TPR-Tree;
A third creating unit, configured to link each R-Tree to a one-dimensional R-Tree through a hash link.
Preferably, the above data indexing device further comprises:
An index module, configured to, when a time-interval query or a time-slice query is performed on any indexed object, look up the lifetime of the indexed object through the hash structure, and determine, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree.
The invention provides a data indexing method and device in which a hash structure is established according to the ID of at least one indexed object, an index tree of the at least one indexed object R is created, and the lifetime of the indexed object at each position in the index tree is stored in the hash structure. Hash indexing and tree indexing are thus combined to index the data, which improves indexing efficiency and precision and solves the problem that traditional indexing techniques cannot meet the needs of large-scale data retrieval.
Brief description of the drawings
Fig. 1 is a flowchart of a data indexing method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the index tree structure used in the embodiments of the present invention;
Fig. 3 is a schematic diagram of the association between the hash structure and the index tree in the embodiments of the present invention;
Fig. 4 is a schematic structural diagram of a data indexing device provided by Embodiment 3 of the present invention.
Detailed description of the embodiments
To solve the above problems, embodiments of the present invention provide a data indexing method and device. The embodiments are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another arbitrarily.
First, Embodiment 1 of the present invention is described with reference to the accompanying drawings.
This embodiment provides a data indexing method that can index uncertain data for management. Traditional solutions generally use either a tree index or a hash index, and each has its advantages and disadvantages. For example, tree indexing suits random data access, whereas hash indexing suits sequentially organized data, such as a broadcast channel. Tree indexing is very effective for clustered data broadcast, whereas clustering has little effect on the performance of hash indexing. Hash indexing is particularly suitable for indexing multi-attribute data. Tree indexing provides a more accurate and complete global view based on index values: on a tree index, a client can quickly find the arrival time of the desired data, which naturally shortens the tune-in time. Because a hash index does not contain global information about the data frames, it can only help the client judge whether the current data frame is relevant to the query. The effectiveness of this filtering depends largely on the average failure rate of the hash index.
The flow by which the data indexing method of this embodiment completes data indexing is shown in Fig. 1 and comprises:
Step 101: create an index tree of at least one indexed object R;
In this embodiment, the top layer of the index tree is a TPR-Tree, below which are multiple 2-dimensional R-Trees; each 2-dimensional R-Tree is hash-linked to a one-dimensional R-Tree. The index tree structure used in the embodiments of the present invention is shown in Fig. 2.
The TPR-tree is a multiway balanced tree with the structure of an R-tree. Each non-leaf node in the tree consists of several (TPBR, Point) entries, where TPBR is the time-parameterized bounding rectangle that currently encloses the corresponding child and Point is a pointer to the child node. Each leaf node consists of several (TPBR, ObjectID) entries, where TPBR is the time-parameterized bounding rectangle that encloses the corresponding moving object and ObjectID is a pointer to the moving object, through which the details of the moving object can be obtained.
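As an illustration of the TPBR entries described above, the following is a minimal sketch, not the patent's own implementation. It assumes the usual TPR-tree convention that each edge of the rectangle moves linearly with a constant velocity bound from a reference time; the class name TPBR and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TPBR:
    """Time-parameterized bounding rectangle (2-D), anchored at time t_ref."""
    t_ref: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    # Velocity bounds of the four edges (assumed constant while the entry is valid).
    vx_min: float
    vy_min: float
    vx_max: float
    vy_max: float

    def at(self, t: float) -> tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max) of the rectangle at time t."""
        dt = t - self.t_ref
        return (
            self.x_min + self.vx_min * dt,
            self.y_min + self.vy_min * dt,
            self.x_max + self.vx_max * dt,
            self.y_max + self.vy_max * dt,
        )

    def contains(self, x: float, y: float, t: float) -> bool:
        """True if point (x, y) lies inside the rectangle as it stands at time t."""
        x0, y0, x1, y1 = self.at(t)
        return x0 <= x <= x1 and y0 <= y <= y1


# A rectangle that grows in x in both directions and drifts upward in y.
box = TPBR(t_ref=0.0, x_min=0.0, y_min=0.0, x_max=2.0, y_max=2.0,
           vx_min=-1.0, vy_min=0.0, vx_max=1.0, vy_max=0.5)
```

Because the rectangle is a function of time, a single stored entry can answer containment tests at different query times without being rewritten.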
The R-tree is an extension of the B-tree to multidimensional space. It partitions spatial objects by extent; each node corresponds to a region and to one disk page. The disk page of a non-leaf node stores the region extents of all its child nodes, and the regions of all child nodes of a non-leaf node fall within its own region; the disk page of a leaf node stores the bounding rectangles of all spatial objects within its region. The number of children a node may have is bounded above and below: the lower bound ensures effective use of disk space, and the upper bound ensures that each node corresponds to one disk page. When an insertion causes a node to require more space than one disk page, the node is split into two. The R-tree is a dynamic index structure: queries can be carried out concurrently with insertions or deletions, and the tree does not need to be reorganized periodically.
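The capacity limit and node split described above can be sketched as follows. This is a simplified illustration under stated assumptions: it handles a single leaf node only, and it splits by sorting on the x-centre and cutting in half rather than using one of the published R-tree split heuristics. The names Rect, mbr and LeafNode are hypothetical.

```python
from dataclasses import dataclass, field

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def mbr(rects: list[Rect]) -> Rect:
    """Minimum bounding rectangle enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


@dataclass
class LeafNode:
    max_entries: int                      # upper limit: what fits in one disk page
    entries: list[Rect] = field(default_factory=list)

    def insert(self, rect: Rect) -> "LeafNode | None":
        """Add rect; on overflow, split this node and return the new sibling."""
        self.entries.append(rect)
        if len(self.entries) <= self.max_entries:
            return None
        # Simplified split (an assumption, not a published heuristic):
        # order by x-centre and cut the entry list in half.
        self.entries.sort(key=lambda r: (r[0] + r[2]) / 2)
        half = len(self.entries) // 2
        sibling = LeafNode(self.max_entries, self.entries[half:])
        self.entries = self.entries[:half]
        return sibling


leaf = LeafNode(max_entries=3)
sibling = None
for r in [(0, 0, 1, 1), (8, 8, 9, 9), (1, 1, 2, 2), (9, 9, 10, 10)]:
    sibling = leaf.insert(r) or sibling   # the fourth insert overflows and splits
```

After the split, the parent would store mbr(leaf.entries) and mbr(sibling.entries) as the region extents of its two children.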
Step 102: establish a hash structure according to the ID of the at least one indexed object;
For all indexed objects, a hash structure (hash table) can be built from their IDs.
Step 103: store, in the hash structure, the lifetime of the indexed object at each position in the index tree;
In this step, the hash structure stores the lifetime during which each indexed object is in the TPR-Tree, in an R-Tree, or in a (2-dimensional R-Tree + 1-dimensional R-Tree) combination.
The association between the hash structure and the index tree is shown in Fig. 3.
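Steps 102 and 103 can be pictured with a small sketch. It assumes the hash structure is an in-memory table keyed by object ID whose buckets hold one lifetime entry per stay in an index structure; the names Lifetime and LifetimeTable and the labels "TPR", "R" and "2D-R+1D-R" are hypothetical and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lifetime:
    index_name: str                  # "TPR", "R" or "2D-R+1D-R" (assumed labels)
    t_start: float
    t_end: Optional[float] = None    # None means the record is still current


class LifetimeTable:
    """Hash structure keyed by object ID; each bucket lists the object's lifetimes."""

    def __init__(self) -> None:
        self._by_id: dict[str, list[Lifetime]] = defaultdict(list)

    def record(self, object_id: str, index_name: str, t_start: float) -> None:
        """Note that the object entered the named index structure at t_start."""
        self._by_id[object_id].append(Lifetime(index_name, t_start))

    def lifetimes(self, object_id: str) -> list[Lifetime]:
        """Return a copy of the lifetimes stored for the object (empty if unknown)."""
        return list(self._by_id.get(object_id, []))


table = LifetimeTable()
table.record("user-42", "TPR", t_start=0.0)
```

Keying by ID gives constant-time expected access to an object's history, which is what lets a later query skip index structures the object was never in.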
Step 104: when a time-interval query or a time-slice query is performed on any indexed object, look up the lifetime of the indexed object through the hash structure;
Step 105: determine, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree;
In this step, for time-interval queries and time-slice queries, the lifetime found in the hash structure directly determines from which index structure of the tree index the search should start.
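Steps 104 and 105 can be sketched as a routing function. The sketch assumes closed lifetime intervals and treats a time-slice query as the degenerate interval [t, t]; the function names, the index labels and the sample table are hypothetical.

```python
from typing import NamedTuple


class Lifetime(NamedTuple):
    index_name: str   # which index structure held the object (assumed labels)
    t_start: float
    t_end: float      # float("inf") for a record that is still current


def indexes_for_interval(lifetimes: list[Lifetime],
                         q_start: float, q_end: float) -> list[str]:
    """Index structures whose lifetimes overlap the closed query interval."""
    hits: list[str] = []
    for lt in lifetimes:
        overlaps = lt.t_start <= q_end and q_start <= lt.t_end
        if overlaps and lt.index_name not in hits:
            hits.append(lt.index_name)
    return hits


def indexes_for_timeslice(lifetimes: list[Lifetime], t: float) -> list[str]:
    """A time-slice query is handled as the degenerate interval [t, t]."""
    return indexes_for_interval(lifetimes, t, t)


# Hypothetical hash structure: object ID -> lifetimes in chronological order.
lifetime_table = {
    "user-42": [
        Lifetime("TPR", 0.0, 10.0),
        Lifetime("2D-R+1D-R", 10.0, 50.0),
        Lifetime("TPR", 50.0, float("inf")),
    ],
}
```

For example, a time-slice query at t = 20 for "user-42" is routed only to the combined 2-dimensional and 1-dimensional R-tree index, so the TPR-tree need not be searched.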
Embodiment 2 of the present invention is described below with reference to the accompanying drawings.
This embodiment provides a data indexing method; the following describes how the method is applied in a mobile communication environment. Over the past decade, with the development of wireless communication and positioning technology, location-based services (LBS) have been very widely applied. Suppose a communications company needs to track the real-time position of each mobile phone user so as to allocate reasonable bandwidth to particular areas, keeping communication smooth and avoiding congestion; or it needs to know the current position of a person who raises an alarm by mobile phone. Both require real-time tracking of mobile phone users' positions. The movement of a person carrying a mobile phone may be complicated, but it is always one of the following: stationary, quasi-stationary, unrestricted low-speed movement, or restricted high-speed movement (which usually requires a vehicle).
An R-tree can index the stationary and quasi-stationary states of mobile phone users; a TPR-tree can index their unrestricted low-speed movement; and a (2-dimensional R-tree + 1-dimensional R-tree) combination can index users who are quasi-stationary inside an object undergoing restricted high-speed movement.
Every record in this hybrid index has a lifetime, that is, the time interval during which the speed and direction of the object's movement remain unchanged, denoted [t_start, t_end].
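To make the lifetime definition concrete, the sketch below derives [t_start, t_end] intervals from a time-ordered list of velocity samples. It assumes exact equality of velocity vectors marks "unchanged" motion (a real system would likely use a tolerance); the function name and the sample format are hypothetical.

```python
Sample = tuple[float, float, float]  # (t, vx, vy), sorted by t


def split_into_lifetimes(samples: list[Sample]) -> list[tuple[float, float]]:
    """Group samples into maximal runs during which the velocity vector is unchanged."""
    intervals: list[tuple[float, float]] = []
    if not samples:
        return intervals
    t_start, vx, vy = samples[0]
    t_prev = t_start
    for t, cur_vx, cur_vy in samples[1:]:
        if (cur_vx, cur_vy) != (vx, vy):
            intervals.append((t_start, t))      # velocity changed at t: close the run
            t_start, vx, vy = t, cur_vx, cur_vy
        t_prev = t
    intervals.append((t_start, t_prev))         # close the final run at the last sample
    return intervals


track = [(0, 1, 0), (1, 1, 0), (2, 1, 0), (3, 0, 2), (4, 0, 2)]
runs = split_into_lifetimes(track)
```

Here the object moves along x until t = 3 and then along y, so two lifetimes result, each of which would correspond to one record in the hybrid index.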
For example, when a mobile phone user sets out for the railway station, the user is first indexed by the TPR-tree as a low-speed moving object. After the train departs, the original record is logically deleted (by setting t_end to the current time) and a new special record is inserted into the TPR-tree; this record points to the corresponding high-speed moving object index. After the user arrives at the destination, the special record is in turn logically deleted and an ordinary record is inserted into the TPR-tree.
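The railway-station example can be traced with a short sketch of logical deletion. It assumes a flat record list standing in for the TPR-tree (no spatial indexing is performed) and represents an open-ended lifetime with infinity; the names Record and TprRecords and the time values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    object_id: str
    kind: str                 # "ordinary", or "special" (points to the high-speed index)
    t_start: float
    t_end: float = math.inf   # open-ended while the record is current


class TprRecords:
    """Stand-in for the record list of the TPR-tree; no spatial indexing is done."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def switch_state(self, object_id: str, kind: str, now: float) -> Record:
        """Logically delete the object's current record and insert a new one."""
        for rec in self.records:
            if rec.object_id == object_id and rec.t_end == math.inf:
                rec.t_end = now          # logical deletion: the history is kept
        new = Record(object_id, kind, now)
        self.records.append(new)
        return new


tpr = TprRecords()
tpr.switch_state("user-42", "ordinary", now=0.0)    # sets out for the station
tpr.switch_state("user-42", "special", now=10.0)    # train departs
tpr.switch_state("user-42", "ordinary", now=50.0)   # arrives at the destination
```

Because old records are closed rather than removed, queries about past time intervals can still find where the object was indexed.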
For all indexed objects, a hash structure can be built from their IDs; this structure also stores the lifetime during which each object is in the TPR-tree, in an R-tree, or in a (2-dimensional R-tree + 1-dimensional R-tree) combination. For time-interval queries and time-slice queries, the lifetime found in the hash structure directly determines from which index structure the search should start.
Embodiment 3 of the present invention is described below with reference to the accompanying drawings.
This embodiment provides a data indexing device whose structure is shown in Fig. 4, comprising:
An index tree creation module 401, configured to create an index tree of at least one indexed object R;
A hash structure generation module 402, configured to establish a hash structure according to the ID of the at least one indexed object;
An association module 403, configured to store, in the hash structure, the lifetime of the indexed object at each position in the index tree.
Preferably, the index tree creation module 401 comprises:
A first creating unit, configured to create a TPR-Tree as the top layer;
A second creating unit, configured to link at least one 2-dimensional R-Tree under the TPR-Tree;
A third creating unit, configured to link each R-Tree to a one-dimensional R-Tree through a hash link.
Preferably, the above data indexing device further comprises:
An index module 404, configured to, when a time-interval query or a time-slice query is performed on any indexed object, look up the lifetime of the indexed object through the hash structure, and determine, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree.
Embodiments of the present invention provide a data indexing method and device in which a hash structure is established according to the ID of at least one indexed object, an index tree of the at least one indexed object R is created, and the lifetime of the indexed object at each position in the index tree is stored in the hash structure. Hash indexing and tree indexing are thus combined to index the data, which improves indexing efficiency and precision and solves the problem that traditional indexing techniques cannot meet the needs of large-scale data retrieval. Combining the tree index and hash index techniques effectively improves the efficiency of multidimensional data management.
A person of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program flow. The computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, apparatus, device or component); when executed, it performs one of the steps of the method embodiments or a combination thereof.
Alternatively, all or some of the steps of the above embodiments may be implemented with integrated circuits: the steps may each be made into a separate integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any particular combination of hardware and software.
Each device, functional module or functional unit in the above embodiments may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network of multiple computing devices.
When a device, functional module or functional unit in the above embodiments is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the claims.

Claims (6)

1. A data indexing method, characterized by comprising:
Creating an index tree of at least one indexed object R, comprising: creating a TPR-Tree as the top layer; linking at least one 2-dimensional R-Tree under the TPR-Tree; and linking the 2-dimensional R-Tree to a one-dimensional R-Tree through a hash link;
Establishing a hash structure according to the ID of the at least one indexed object;
Storing, in the hash structure, the lifetime of the indexed object at each position in the index tree.
2. The data indexing method according to claim 1, characterized in that storing in the hash structure the lifetime of the indexed object at each position in the index tree is specifically:
Storing, in the hash structure, the lifetime during which the indexed object is in the TPR-Tree, the 2-dimensional R-Tree, or the one-dimensional R-Tree.
3. The data indexing method according to claim 2, characterized in that the method further comprises:
When a time-interval query or a time-slice query is performed on any indexed object, looking up the lifetime of the indexed object through the hash structure;
Determining, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree.
4. The data indexing method according to claim 1, 2 or 3, characterized in that the lifetime is specifically the time interval during which the indexed object remains in the same state.
5. A data indexing device, characterized by comprising:
An index tree creation module, configured to create an index tree of at least one indexed object R;
A hash structure generation module, configured to establish a hash structure according to the ID of the at least one indexed object;
An association module, configured to store, in the hash structure, the lifetime of the indexed object at each position in the index tree;
The index tree creation module comprises:
A first creating unit, configured to create a TPR-Tree as the top layer;
A second creating unit, configured to link at least one 2-dimensional R-Tree under the TPR-Tree;
A third creating unit, configured to link the 2-dimensional R-Tree of the second creating unit to a one-dimensional R-Tree through a hash link.
6. The data indexing device according to claim 5, characterized in that the device further comprises:
An index module, configured to, when a time-interval query or a time-slice query is performed on any indexed object, look up the lifetime of the indexed object through the hash structure, and determine, according to the lifetime of the indexed object corresponding to each position in the index tree, the position of the index corresponding to the indexed object in the index tree.
CN201210039265.3A 2012-02-20 2012-02-20 Data indexing method and device Active CN102646118B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210039265.3A CN102646118B (en) 2012-02-20 2012-02-20 Data indexing method and device
PCT/CN2013/071627 WO2013123867A1 (en) 2012-02-20 2013-02-18 Data indexing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210039265.3A CN102646118B (en) 2012-02-20 2012-02-20 Data indexing method and device

Publications (2)

Publication Number Publication Date
CN102646118A CN102646118A (en) 2012-08-22
CN102646118B true CN102646118B (en) 2014-11-05

Family

ID=46658937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210039265.3A Active CN102646118B (en) 2012-02-20 2012-02-20 Data indexing method and device

Country Status (2)

Country Link
CN (1) CN102646118B (en)
WO (1) WO2013123867A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102646118B (en) * 2012-02-20 2014-11-05 浪潮(北京)电子信息产业有限公司 Data indexing method and device
CN102915382A (en) * 2012-11-21 2013-02-06 亚信联创科技(中国)有限公司 Method and device for carrying out data query on database based on indexes
CN105786932B (en) * 2014-12-26 2020-03-27 北大医疗信息技术有限公司 Query method and query device for clinical business in medical system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012908A (en) * 2010-11-12 2011-04-13 浙江大学 Method for inquiring visible neighbours of moving objects in environment with barriers

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1203433C (en) * 2002-06-26 2005-05-25 联想(北京)有限公司 Data storing and query combination method in a flush type system
CN101256579A (en) * 2008-04-08 2008-09-03 中兴通讯股份有限公司 Method for inquesting data organization in database
CN102646118B (en) * 2012-02-20 2014-11-05 浪潮(北京)电子信息产业有限公司 Data indexing method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102012908A (en) * 2010-11-12 2011-04-13 浙江大学 Method for inquiring visible neighbours of moving objects in environment with barriers

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李焕梅.《移动点对象Hash-R索引及反向最近邻查询》.《中国优秀硕士学位论文全文数据库》.2011,3.3.3节,图3-5. *
李贞海.《交通网络中移动对象全时态索引研究与实现》.《中国优秀硕士学位论文全文数据库》.2011,2.7.2节. *
金泽峰.《TRP-树在基于位置服务系统中的引用研究》.《中国优秀硕士学位论文全文数据库》.2008,4.2.1节,4.3.3节,图4.8. *

Also Published As

Publication number Publication date
WO2013123867A1 (en) 2013-08-29
CN102646118A (en) 2012-08-22

Similar Documents

Publication Publication Date Title
CN103294790B (en) A kind of space and time order towards GPS track data indexes and search method
CN106649656B (en) Database-oriented space-time trajectory big data storage method
CN111382226B (en) Database query and retrieval method and device and electronic equipment
CN104750681B (en) A kind of processing method and processing device of mass data
CN103714134B (en) Network flow data index method and system
CN104133867A (en) DOT in-fragment secondary index method and DOT in-fragment secondary index system
CN103729447A (en) Method for fast searching database
CN103023970A (en) Method and system for storing mass data of Internet of Things (IoT)
CN109582677B (en) R tree index optimization method of multi-granularity distributed read-write lock based on child nodes
CN104239377A (en) Platform-crossing data retrieval method and device
US10762068B2 (en) Virtual columns to expose row specific details for query execution in column store databases
CN109582678B (en) R tree index optimization method of multi-granularity distributed read-write lock based on leaf nodes
US11553023B2 (en) Abstraction layer for streaming data sources
CN103714163A (en) Pattern management method and system of NoSQL database
CN103177120A (en) Index-based XPath query mode tree matching method
CN102646118B (en) Data indexing method and device
CN102982034B (en) The searching method and search system of Internet website information
CN117112691A (en) Storage method of big data-oriented multi-storage engine database
CN104750860B (en) A kind of date storage method of uncertain data
CN110134511A (en) A kind of shared storage optimization method of OpenTSDB
CN104408183A (en) Data import method and device of data system
WO2017000592A1 (en) Data processing method, apparatus and system
CN103699556A (en) Digital local chronicle information system for compiling local chronicle and geographical information
CN101813485B (en) Electronic map based on geographic information data and navigation method thereof
CN106250443A (en) The method and system of data base's complex text inquiry are solved based on internal memory full-text search

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant