CN107818147A - Distributed temporal index system based on Voronoi diagram - Google Patents
- Publication number: CN107818147A (application number CN201710976062.XA)
- Authority: CN (China)
- Prior art keywords: point, voronoi, data, cluster, points
- Legal status: Pending (as listed by Google Patents; an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
Abstract
A distributed temporal index system based on a Voronoi diagram, belonging to the field of data indexing, which addresses the problem of improving the indexing efficiency of existing data query methods. The key technical points are: the distance between each object r in data set R, each object s in data set S, and the representative points p is calculated, and each object r, s is assigned to its nearest representative point P; objects r in R and objects s in S that share the same nearest representative point are collected into one Voronoi cell, producing m Voronoi cells as partitions, and <VC_m, List(P_i)> pairs are output. Effect: the space cost is greatly reduced, so space efficiency is very high.
Description
Technical field
The invention belongs to the field of data indexing and relates to big data processing and the application of spatial query algorithms.
Background
With the rapid development of mobile communication and location-based service technologies, technologies such as cloud computing, big data, the Internet of Things, mobile computing and spatial positioning are gradually maturing, and data from GPS, cameras, Bluetooth and similar sources keep growing. Large amounts of spatial data are emerging, which poses huge challenges for the storage and processing of all kinds of spatial data and objects.
When big data is processed, long computation times and low efficiency of spatio-temporal data retrieval are frequently encountered. Traditional single-computer systems support only a limited number of threads and perform poorly in parallel and distributed settings, and the computing resources of a single machine are usually limited (for example, by the size of the hard disk or memory, or by limited CPU computing power), so they cannot be applied directly.
Indexing has an important influence on the efficiency of large-scale data access. New spatial indexing methods had to be incorporated into traditional database processing engines, which led to the R-tree structure. An R-tree is essentially an extension of the B+ tree to multidimensional data. Many algorithms perform nearest neighbor (NN) queries on R-tree indexes, but they all execute tasks in a single thread on a single computer. When the data scale grows rapidly, distributed database systems must be used for indexing and data queries.
Summary of the invention
To improve the indexing efficiency of existing data query methods, the present invention provides the following scheme:
A distributed temporal index system based on a Voronoi diagram, in which a plurality of instructions are stored, the instructions being suitable for being loaded and executed by a processor: an inverted Voronoi index is built with Spark; given two data sets R and S in a d-dimensional space, Spark splits the input according to its default mechanism and several mappers run in parallel; the default reducer is used in the Spark job; before the map functions start, representative points p are obtained with a pre-clustering algorithm and loaded into the main memory of each map task;
In each map process, the input split is read record by record with TextInputFormat, which reads data from the file into the Mapper instance; the distance between each object r in data set R, each object s in data set S, and the representative points p is calculated, and each object r, s is assigned to its nearest representative point P; objects r in R and objects s in S that share the same nearest representative point are collected into one Voronoi cell, producing m Voronoi cells as partitions, and <VC_m, List(P_i)> pairs are output; given a query point p, its closest partition or set of closest partitions is identified, and the mapper outputs each object r, s of the original data sets together with the id of its partition VC_m (the closest partition or set of closest partitions); the mapper output is written to the file system used by Spark.
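The map step above can be sketched as follows. This is a minimal pure-Python sketch, not Spark code: the record layout `(dataset_tag, obj_id, coords)` and the function names `nearest_rep` and `map_partition` are hypothetical, and the representative points are assumed to be already in memory, as the text describes for each map task.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

Point = Tuple[float, ...]
Record = Tuple[str, str, Point]  # (dataset tag "R" or "S", object id, coordinates)


def nearest_rep(obj: Point, reps: Sequence[Point]) -> int:
    """Return the index m of the representative point closest to obj (Euclidean)."""
    return min(range(len(reps)), key=lambda m: math.dist(obj, reps[m]))


def map_partition(records: Iterable[Record], reps: Sequence[Point]) -> Iterator[Tuple[int, str, str]]:
    """Hypothetical mapper body: tag every r in R and s in S with the id of the
    Voronoi cell (nearest representative point) it falls into."""
    for tag, obj_id, coords in records:
        yield nearest_rep(coords, reps), tag, obj_id


if __name__ == "__main__":
    reps = [(0.0, 0.0), (10.0, 0.0)]
    split = [("R", "r1", (1.0, 1.0)), ("S", "s1", (9.0, -1.0)), ("S", "s2", (4.0, 0.0))]
    print(list(map_partition(split, reps)))  # [(0, 'R', 'r1'), (1, 'S', 's1'), (0, 'S', 's2')]
```

Because the representative points are fixed before the map phase starts, each record can be assigned independently, which is what lets the mappers run in parallel without coordination.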
A Voronoi diagram divides a space into multiple disjoint polygons. For any point inside a polygon, its nearest generating point is the point that generated that polygon; each polygon in the diagram is called the Voronoi cell associated with its point p, and p is the nearest neighbor of every point in the cell of p.
The inverted Voronoi index consists of two parts: a master index containing all cluster centres, and a secondary index containing the queue of objects stored in each partition VC.
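A minimal in-memory sketch of this two-part structure, assuming Euclidean distance and string object ids. The class name `InvertedVoronoiIndex` and its methods are illustrative only and are not part of any library.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, ...]


@dataclass
class InvertedVoronoiIndex:
    # Master index: all cluster centres (one representative point per cell).
    centres: List[Point]
    # Secondary index: cell id -> queue of ids of the objects stored in that cell.
    cells: Dict[int, List[str]] = field(default_factory=lambda: defaultdict(list))

    def add(self, cell_id: int, obj_id: str) -> None:
        self.cells[cell_id].append(obj_id)

    @classmethod
    def build(cls, centres: Iterable[Point], objects: Iterable[Tuple[str, Point]]) -> "InvertedVoronoiIndex":
        """Assign each (obj_id, coords) to the queue of its nearest centre."""
        idx = cls(list(centres))
        for obj_id, coords in objects:
            m = min(range(len(idx.centres)), key=lambda k: math.dist(coords, idx.centres[k]))
            idx.add(m, obj_id)
        return idx
```

Only object ids are kept in the queues; the heavier geometry lives in the master index, one point per cell.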
In the described distributed temporal index system based on inverted Thiessen polygons, the representative points are obtained as follows: internal cluster points and adjacent points are determined; the data around the internal cluster points are clustered, and the cluster centres selected after clustering are indexed. The required data are the adjacent points connected to an internal cluster point. Taking that internal cluster point as the centre, a circle containing the adjacent cluster centre points is drawn, and the triangle having this circle as its circumscribed circle is taken as a Delaunay triangle. In this method, Delaunay triangles are built separately for two different internal cluster points, and the two Delaunay triangles, sharing the adjacent points as common vertices, form a Delaunay triangulation. The data objects are thereby divided into several large partitions, one representative point of each cluster is selected as the representative point, each divided object is clustered into one Voronoi cell, and each Voronoi cell contains the ids of its objects.
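The Delaunay step rests on the circumscribed circle of a triangle. The defining property of a Delaunay triangle is that its circumcircle contains no other input point in its interior. The sketch below computes a 2-D circumcircle and applies that empty-circle test; it illustrates only the geometric check, not the full procedure above for choosing internal cluster points and adjacent points, and the function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def circumcircle(a: Point, b: Point, c: Point) -> Tuple[Point, float]:
    """Centre and radius of the circle through the three vertices of a 2-D triangle."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("collinear points have no circumcircle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), a)


def is_delaunay(tri: Tuple[Point, Point, Point], points: Iterable[Point], eps: float = 1e-12) -> bool:
    """Empty-circumcircle test: no other point lies strictly inside the circumcircle."""
    centre, r = circumcircle(*tri)
    return all(math.dist(p, centre) >= r - eps for p in points if p not in tri)
```

Points exactly on the circle are tolerated (cocircular configurations), which is the usual convention for the empty-circle criterion.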
The Voronoi diagram is given by

$$VD(P) = \{V(p_1), V(p_2), \ldots, V(p_m)\}$$

where $VD(P)$ is the collection of Voronoi cells over $P$ and $V(p_1)$ is the Voronoi cell of $p_1$. This set associated with all the given points is called the Voronoi diagram generated by $P$ under the distance function Dist(). The Voronoi cell of each point $p$ contains all points that are closer to $p$ than to any other point, so the nearest neighbor of a query point $q$ is the point whose closed Voronoi cell contains $q$;
A Voronoi cell is defined in the space $\mathbb{R}^d$ with respect to a set of $n$ points $P = \{p_1, p_2, \ldots, p_n\}$. The partition VC associated with point $p_i$ is the region

$$VC(p_i) = \{\, p \mid d(p, p_i) \le d(p, p_j),\ \forall j \in I_n,\ j \ne i \,\}$$

and this region is called the Voronoi cell associated with $p_i$;

where $p$ is a given point or query point, $d(p, p_i)$ is the Euclidean distance between $p$ and $p_i$, $n \ge 2$, $p_i \ne p_j$ for $i \ne j$, and $i, j \in I_n = \{1, \ldots, n\}$; $i$ ranges over all values in $1, \ldots, n$, and for each value of $i$, $j$ ranges over all values in $1, \ldots, n$ except $i$.
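The cell definition translates directly into a membership predicate. A minimal sketch, assuming points are coordinate tuples and using `math.dist` for the Euclidean distance $d$; the function name is illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def in_voronoi_cell(p: Point, i: int, sites: Sequence[Point]) -> bool:
    """True if p lies in VC(p_i) = {p | d(p, p_i) <= d(p, p_j) for all j != i}.

    Points on a shared boundary (equal distances) belong to every adjacent cell,
    as the non-strict inequality in the definition implies.
    """
    if len(sites) < 2:
        raise ValueError("the definition assumes n >= 2 sites")
    d_i = math.dist(p, sites[i])
    return all(d_i <= math.dist(p, s) for j, s in enumerate(sites) if j != i)
```

For example, with sites (0, 0) and (4, 0), the point (2, 0) is equidistant and therefore belongs to both cells.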
Beneficial effects: the present invention uses a Voronoi-diagram indexing method. Because it uses a multidimensional Voronoi index, the index supports spatial data integration and is suitable for indexing multidimensional data sets, so it can support massive and multidimensional data sets. Moreover, ideal storage of spatial objects requires very little space, because only the representative-point information of each object needs to be stored; this greatly reduces space cost and makes space efficiency very high. By using inverted Thiessen polygons to index distributed medical spatio-temporal regions, this solution has an important influence on the efficiency of large-scale data access.
Brief description of the drawings
Fig. 1. Voronoi diagram;
Fig. 2. Schematic diagram of the inverted Voronoi diagram index;
Fig. 3. Illustrative example of the present invention;
Fig. 4. Schematic diagram of building the Delaunay triangulation.
Detailed description of the embodiments
Embodiment 1: A distributed temporal index method based on a Voronoi diagram, executed by the distributed temporal index system based on a Voronoi diagram, in which a plurality of instructions are stored, the instructions being suitable for being loaded and executed by a processor. The steps are as follows: an inverted Voronoi index is built with Spark; two data sets R and S are given in a d-dimensional space; Spark, an existing computing engine, splits the input according to its default mechanism and several mappers run in parallel; the default reducer is used in the Spark job; before the map functions start, representative points p are obtained with a pre-clustering algorithm and loaded into the main memory of each map task;
In each map process, the input split is read record by record with TextInputFormat, which reads data from the file into the Mapper instance; the distance between each object r in data set R, each object s in data set S, and the representative points p is calculated, and each object r, s is assigned to its nearest representative point P; objects r in R and objects s in S that share the same nearest representative point are collected into one Voronoi cell, producing m Voronoi cells as partitions, and <VC_m, List(P_i)> pairs are output, where P_i is the series of nearest representative points obtained and i is the position index of the point; given a query point p, its closest partition or set of closest partitions is identified, and the mapper outputs each object r, s of the original data sets together with the id of its partition VC_m (the closest partition or set of closest partitions); the mapper output is written to the file system used by Spark.
The Voronoi diagram divides a space into multiple disjoint polygons. For any point inside a polygon, its nearest generating point is the point that generated that polygon; each polygon in the diagram is called the Voronoi cell associated with its point p, and p is the nearest neighbor of every point in the cell of p.
The Voronoi diagram is given by

$$VD(P) = \{V(p_1), V(p_2), \ldots, V(p_m)\}$$

where $VD(P)$ is the collection of Voronoi cells over $P$ and $V(p_1)$ is the Voronoi cell of $p_1$. This set associated with all the given points is called the Voronoi diagram generated by $P$ under the distance function Dist(). The Voronoi cell of each point $p$ contains all points that are closer to $p$ than to any other point, so the nearest neighbor of a query point $q$ is the point whose closed Voronoi cell contains $q$;
A Voronoi cell is defined in the space $\mathbb{R}^d$ with respect to a set of $n$ points $P = \{p_1, p_2, \ldots, p_n\}$. The partition VC associated with point $p_i$ is the region

$$VC(p_i) = \{\, p \mid d(p, p_i) \le d(p, p_j),\ \forall j \in I_n,\ j \ne i \,\}$$

and this region is called the Voronoi cell associated with $p_i$;

where $p$ is a given point or query point, $d(p, p_i)$ is the Euclidean distance between $p$ and $p_i$, $n \ge 2$, $p_i \ne p_j$ for $i \ne j$, and $i, j \in I_n = \{1, \ldots, n\}$; $i$ ranges over all values in $1, \ldots, n$, and for each value of $i$, $j$ ranges over all values in $1, \ldots, n$ except $i$.
The inverted Voronoi index consists of two parts: a master index containing all cluster centres, and a secondary index containing the queue of objects stored in each partition VC.
Representative points are obtained as follows: internal cluster points and adjacent points are determined; the data around the internal cluster points are clustered, and the cluster centres selected after clustering are indexed. The required data are the adjacent points connected to an internal cluster point. Taking that internal cluster point as the centre, a circle containing the adjacent cluster centre points is drawn, and the triangle having this circle as its circumscribed circle is taken as a Delaunay triangle. In this method, Delaunay triangles are built separately for two different internal cluster points, and the two Delaunay triangles, sharing the adjacent points as common vertices, form a Delaunay triangulation. The data objects are thereby divided into several large partitions, one representative point of each cluster is selected as the representative point, each divided object is clustered into one Voronoi cell, and each Voronoi cell contains the ids of its objects.
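The text does not specify the pre-clustering algorithm. As a stand-in only, the sketch below uses plain Lloyd's k-means with a naive initialisation (the first k points) to produce cluster centres that could serve as representative points; the method actually used in this invention may differ, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means, a stand-in for the unspecified pre-clustering step.

    Initialisation simply takes the first k points; the returned centres can be
    used as the representative points of k Voronoi cells.
    """
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= len(points)")
    centres = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:  # assignment step: each point joins its nearest centre
            groups[min(range(k), key=lambda c: math.dist(p, centres[c]))].append(p)
        # update step: move each centre to its cluster mean (keep it if the cluster is empty)
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centres[c]
               for c, g in enumerate(groups)]
        if new == centres:  # converged: assignments no longer change the centres
            break
        centres = new
    return centres
```

A production pipeline would use a better initialisation (for example k-means++) or a single-pass method such as canopy clustering, since the point of pre-clustering is to be cheap relative to the main job.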
Embodiment 2: This embodiment further supplements or explains Embodiment 1. As shown in Fig. 1, a Voronoi diagram divides a space into multiple disjoint polygons. For any point inside a polygon, its nearest generating point is the point that generated that polygon. Each polygon in the diagram is called the Voronoi cell associated with its point p, so p is the nearest neighbor of every point in its cell. Therefore, in a K-nearest-neighbor search based on Voronoi diagrams, the data point p of each Voronoi cell can be used to verify whether it is a neighbor of a given query point q. An inverted index, commonly used in text similarity search, determines the positions of records by attribute value.
Voronoi diagram (Voronoi Diagram, VD): the set associated with all the given points, $VD(P) = \{V(p_1), V(p_2), \ldots, V(p_m)\}$, is called the Voronoi diagram generated by $P$ under the distance function Dist(). Here the Voronoi cell of each point $p$ contains all points that are closer to $p$ than to any other point, so the nearest neighbor of a query point $q$ is the point whose closed Voronoi cell contains $q$. Fig. 1 shows a Voronoi diagram of 8 points in two-dimensional Euclidean space.
Voronoi cell (Voronoi Cell, VC): in the space $\mathbb{R}^d$, let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of $n$ points, where $n \ge 2$, $p_i \ne p_j$ for $i \ne j$, and $i, j \in I_n = \{1, \ldots, n\}$. The region

$$VC(p_i) = \{\, p \mid d(p, p_i) \le d(p, p_j),\ \forall j \ne i \,\}$$

where $d(p, p_i)$ is the Euclidean distance between $p$ and $p_i$, is called the Voronoi cell associated with $p_i$.
Our inverted Voronoi index combines an inverted index with a Voronoi index to produce a new index that has the advantages of both. Specifically, the inverted Voronoi index is a large-scale spatial data structure that stores mapped data points. Given a large data set P containing data objects in Euclidean space, each object is clustered into one Voronoi cell to index the data set, and the Voronoi diagram can be expressed as VC(P) = {VC_1, VC_2, ..., VC_m}. The cells of VC(P) serve as the keys of the inverted index, and the ids of all data objects {P_i} ∈ VC_m are stored in a queue as the value. That is, each Voronoi cell contains a large number of object ids.
Such a system faces the following issues:
S1. The data to be processed are very large;
S2. Query points occur randomly and are not contained in the data set, and the data set may be skewed;
S3. The data model and distance are defined in multidimensional Euclidean space.
The inverted Voronoi index (Inverted Voronoi Index, IVI) consists of two parts:
S1. A master index containing all cluster centres;
S2. A secondary index containing the queue of objects stored in each VC. The inverted index serves to efficiently locate the data objects in the queues adjacent to the query object. Given a query, we identify the closest VC or a set of several nearby VCs, then gather the queue elements of these VCs to obtain the kNN query result.
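The query procedure can be sketched as follows, assuming the master index is a list of centres, the secondary index maps cell ids to lists of object ids, and object coordinates are available in a dictionary (all hypothetical in-memory stand-ins for the distributed structures). The parameter `n_cells`, which controls how many nearby VCs are scanned, is an assumption.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_query(q: Point, k: int, centres: Sequence[Point], cells: Dict[int, List[str]],
              coords: Dict[str, Point], n_cells: int = 2) -> List[str]:
    """Approximate kNN over an inverted Voronoi index (hypothetical layout).

    1. Master index: rank cells by the distance from q to their centre.
    2. Secondary index: collect the object-id queues of the n_cells closest cells.
    3. Return the k candidates nearest to q.
    """
    close = heapq.nsmallest(n_cells, range(len(centres)), key=lambda c: math.dist(q, centres[c]))
    candidates = [oid for c in close for oid in cells.get(c, [])]
    return heapq.nsmallest(k, candidates, key=lambda oid: math.dist(q, coords[oid]))
```

Scanning only the nearest cells makes the result approximate: a true neighbor stored in an unscanned cell can be missed, so a larger `n_cells` trades extra work for better recall.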
Fig. 2 illustrates an IVI containing two-dimensional spatial objects. Based on the Voronoi partitioning, the objects are divided into 6 partitions. For simplicity, we select P as the set of representative points; each object is therefore assigned to the same Voronoi cell as its nearest representative point. Intuitively, the inverted Voronoi diagram index partitions a multidimensional space into multiple Voronoi cells organized in inverted form.
Our IVI therefore has the following advantages:
S1. Support for massive data sets: because the inverted Voronoi diagram index structure inherits the form of an inverted index, this indexing scheme is naturally suited to distributed processing.
S2. Support for multiple dimensions: by using a multidimensional Voronoi index, the index supports spatial data integration and is suitable for indexing multidimensional data sets.
S3. Space efficiency: ideal storage of spatial objects requires very little space. Because only the representative-point information of each object needs to be stored, space cost is greatly reduced.
Building the inverted Voronoi diagram index with Spark
We now describe how IVI is built with Spark. Because a Voronoi diagram can be obtained by splitting it into, and then merging, multiple sub-Voronoi partitions (VP), building the inverted Voronoi index fits the Spark model: the sub-VPs are merged to obtain the final Voronoi diagram.
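Merging works because every map task uses the same representative points, so a given cell id refers to the same cell in every split. Under that assumption, merging the partial results of several mappers is a per-cell union of their queues, as in this sketch (the function name is hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def merge_partitions(partials: Iterable[Dict[int, List[str]]]) -> Dict[int, List[str]]:
    """Union per-mapper partial indexes (cell id -> object ids) into one index.

    Assumes every mapper used the same representative points, so equal cell ids
    denote the same Voronoi cell in every split.
    """
    merged: Dict[int, List[str]] = defaultdict(list)
    for part in partials:
        for cell_id, ids in part.items():
            merged[cell_id].extend(ids)
    return dict(merged)
```

The merge is associative, so partial results can be combined in any grouping, which is the property a distributed reduce step relies on.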
As shown in Algorithm 1, two data sets R and S are given in a d-dimensional space. Spark splits the input according to its default mechanism, and several mappers run in parallel. In the Spark job we use the default reducer. Before the map functions start, we obtain representative points p with a fast pre-clustering algorithm and load them into the main memory of each map task.
Then, in each map process, the input split is read record by record with TextInputFormat (according to the input format in the distributed file system); TextInputFormat reads data from the file into the Mapper instance. The distance between each object r, s and the points p is calculated, and each r, s is assigned to its nearest representative point P. In lines 2-3 of the algorithm, each point is collected into a Voronoi cell, producing m Voronoi cells; in lines 4-6, <VC_m, List(P_i)> pairs are output, and the mapper outputs each object r, s of the original data set (R or S) together with the id of its nearest partition VC_m.
Finally, in lines 8-10 of the algorithm, the mapper output is written to the file system used by Spark through a customized MultipleOutputFormat function, according to our own needs; this function determines how the task results are written back to the underlying persistent storage. Algorithm 1 details the pseudocode for building the Spark-based Voronoi index structure. With IVI, once the representative points are given, Spark jobs can be launched to partition the data and collect data information for each partition.
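The role of the custom output step, routing each record to an output chosen by its key, can be mimicked locally with one file per Voronoi cell. This is only an analogy for what a `MultipleOutputFormat`-style sink does; the function name and the `vc-XXXXX.tsv` file naming are hypothetical, and the sketch writes to a local directory rather than to a distributed file system.

```python
import os
import tempfile
from collections import defaultdict
from typing import Iterable, List, Tuple


def write_partitions(pairs: Iterable[Tuple[int, str, str]], out_dir: str) -> List[str]:
    """Write (cell_id, dataset_tag, obj_id) records to one file per Voronoi cell.

    The record's key (cell id) picks the output file, loosely mirroring a
    MultipleOutputFormat-style sink; file names are hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    grouped = defaultdict(list)
    for cell_id, tag, obj_id in pairs:
        grouped[cell_id].append(f"{tag}\t{obj_id}")
    paths = []
    for cell_id in sorted(grouped):
        path = os.path.join(out_dir, f"vc-{cell_id:05d}.tsv")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(grouped[cell_id]) + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in write_partitions([(1, "R", "r1"), (0, "S", "s1"), (1, "S", "s2")], d):
            print(os.path.basename(p))  # vc-00000.tsv, then vc-00001.tsv
```

Keeping the dataset tag in each line lets a later join step tell objects from R and S apart within the same cell.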
Embodiment 3: Today, with the rapid development of medical and social security services and steadily rising living standards, people's demand for medical services has become more humanized and personalized, and more and more people need more convenient and complete medical services. At the same time, with the rapid development of mobile communication and location-based service technologies, technologies such as cloud computing, big data, the Internet of Things, mobile computing and spatial positioning are gradually maturing, and data from GPS, cameras, Bluetooth and similar sources keep growing. Large amounts of spatial data are emerging, which poses huge challenges for the storage and processing of all kinds of spatial data and objects. Applications in the medical and health care industry, such as electronic health records, nursing call-centre systems and large medical databases, are also developing rapidly, and mobile medical technologies play an increasing role in improving operating efficiency, improving medical services and reducing medical costs.
However, geographical conditions in China differ greatly, economic development is uneven, and medical resources are unevenly distributed; medical levels also differ greatly between highly developed areas and remote areas. At the same time, with migration from rural areas to cities and the rapid development of industries such as tourism, already high population mobility has grown exponentially. A patient who has just arrived in a new place often does not know where to see a doctor after falling ill; registration may require booking a hospital several months in advance, the patient may travel back and forth between several hospitals, and much time and money is ultimately wasted on transport while the disease goes untreated. In daily life, when emergency treatment is needed, we often do not know which hospitals are nearby, which hospital can handle the condition, or which is closer to the patient and offers better service; the resulting delay can prevent timely treatment and even lead to fatal tragedies.
Although many hospitals now have their own websites that allow advance registration and queries, and online consultation has become easy, China has a very large number of hospitals, it is difficult to tell genuine medical websites from fake ones, and the qualifications of online doctors cannot be certified. In addition, PC equipment is not portable, so when complex queries or family distress calls are needed, seeking medical help and consultation becomes extremely difficult.
In recent years, with the arrival of the medical big data era, more data about medical resources have become available, and the concept of mobile medicine has emerged. Mobile medicine means using mobile communication technologies and devices to provide medical services and medical information suitable for the public, anytime and anywhere. The rapid development of the Internet, mobile communication, multimedia and other technologies, especially 3G and 4G, has brought significant progress to mobile medical technology. In recent years, however, it has been found that big data processing of such mobile medical data often suffers from long computation times and low efficiency of spatio-temporal data retrieval. Traditional single-computer systems support only a limited number of threads and perform poorly in parallel and distributed settings, and the computing resources of a single machine are usually limited (for example, by the size of the hard disk or memory, or by limited CPU computing power), so they cannot be applied directly to processing large-scale mobile medical data. This brings a series of challenges to big data query and processing in mobile medical systems.
It is well known that indexing has an important influence on the efficiency of large-scale data access. New spatial indexing methods had to be incorporated into traditional database processing engines, which led to the R-tree structure. An R-tree is essentially an extension of the B+ tree to multidimensional data. Many algorithms perform nearest neighbor (NN) queries on R-tree indexes, but they all execute tasks in a single thread on a single computer. When the data scale grows rapidly, distributed database systems must be used for indexing and data queries.
This embodiment applies the Voronoi-diagram-based distributed temporal index method of Embodiment 1 or 2 to medical calling. There are currently three kinds of medical call systems: bus-based medical intercom systems, IP-network semi-digital medical intercom systems, and IP-network medical information intercom systems. These systems have significant limitations: they can only transmit information over short distances and cannot operate when the patient is outside the transmission range. A medical call system that executes the Voronoi-diagram-based distributed temporal index method is not subject to these limitations and can effectively improve nearest neighbor query efficiency over large areas in a distributed environment. This makes the invention particularly important, especially for patients with sudden illnesses or patients who need closer attention: such patients need better service, and a device is needed that better supports communication between the services patients need and medical staff, providing a good medical environment.
In a system capable of executing the Voronoi-diagram-based distributed temporal index method, patient information is classified by attribute and used to establish internal cluster points. When a patient uses the medical call system, the system analyses attributes of the patient information to determine what kind of help the patient most needs, for example help requiring substantial medical expertise or help with everyday inconvenience. The patient information is then treated as discrete point data, and the point of the nearest Thiessen polygon is found, which identifies the help the patient most needs so that the patient receives the best assistance.
In the present invention, because the system capable of executing the Voronoi-diagram-based distributed temporal index method uses a multidimensional Voronoi index, the index supports spatial data integration and is suitable for indexing multidimensional data sets, so it can support massive and multidimensional data sets. Moreover, ideal storage of spatial objects requires very little space, because only the representative-point information of each object needs to be stored; this greatly reduces space cost, makes space efficiency very high, and enables patients to get help in time.
In another embodiment, the inverted Voronoi diagram index is built with Spark. Two medical data sets R and S are given in a 3-dimensional space: R is the medical resource data set, containing data describing medical resources such as doctors, medical equipment and locations; S is the patient data set, containing data describing patients such as case information, locations and conditions. The two data sets are uploaded to HDFS, and Spark splits them according to its default mechanism. Several mappers run in parallel. In the Spark job we use the default reducer. Before the map functions start, we use a fast pre-clustering algorithm to obtain the representative points p of the medical resources in each region and load them into the main memory of each map task.
Then, in each map process, the input split is read record by record with TextInputFormat (according to the input format in the distributed file system); TextInputFormat streams data from the file into the Mapper instance. The distance between each medical resource object r, each patient object s and the points p is calculated, and each r, s is assigned to its nearest representative point P. In the algorithm, the objects of each medical resource representative point are collected into one Voronoi cell, producing m Voronoi cells (in a real scenario, a large medical resource set is divided into m medical areas of the same nature, such as the medical centres of a city, and each area has a representative point standing for one medical resource, for example a Grade-A tertiary hospital). When executed, the program outputs <VC_m, List(P_i)> pairs, and the mapper outputs each object r, s of the original data set (R or S) together with the id of its nearest partition VC_m. According to our needs, the mapper output is written to the file system used by Spark through a customized MultipleOutputFormat function, which determines how task results are written back to the underlying persistent storage. With the inverted medical IVI, given a query request from a patient user, for example finding a hospital that meets the diagnosis and treatment needs of a case from nationwide medical data, we can launch Spark jobs to partition the data and collect data information for each partition. The key of the inverted index locates the medical resource representative point, namely a representative hospital; the specific data of that hospital are then used to find the relevant medical resources, which are fed back to the patient. In this way, Spark can use thousands of computers to find relevant data in large-scale medical resources quickly and in a distributed manner.
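The medical lookup can be sketched on a small in-memory IVI. Everything here is hypothetical illustration: resources are assumed to carry a location and a set of offered services, the representative point of each cell stands for a major hospital, and the service filter stands in for "the specific data of that hospital".

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Resource = Tuple[Point, Set[str]]  # (location, services offered), hypothetical schema


def find_resources(patient: Point, need: str, centres: Sequence[Point],
                   cells: Dict[int, List[str]], resources: Dict[str, Resource]) -> Tuple[int, List[str]]:
    """Hypothetical IVI lookup for a patient request.

    1. Key: the cell whose representative point (e.g. a major hospital) is nearest.
    2. Value: the queue of resource ids clustered into that cell.
    3. Keep resources offering the needed service, nearest to the patient first.
    """
    key = min(range(len(centres)), key=lambda c: math.dist(patient, centres[c]))
    hits = [rid for rid in cells.get(key, []) if need in resources[rid][1]]
    return key, sorted(hits, key=lambda rid: math.dist(patient, resources[rid][0]))
```

Only the queue of the nearest cell is scanned, so a suitable resource just across a cell boundary would not be returned; a fuller version could fall back to neighbouring cells when the list comes back empty.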
Claims (5)
1. A distributed temporal index system based on a Voronoi diagram, in which a plurality of instructions are stored, the instructions being suitable for being loaded and executed by a processor: an inverted Voronoi index is built with Spark; given two data sets R and S in a d-dimensional space, Spark splits the input according to its default mechanism and several mappers run in parallel; the default reducer is used in the Spark job; before the map functions start, representative points p are obtained with a pre-clustering algorithm and loaded into the main memory of each map task;
In each map process, the input split is read record by record with TextInputFormat, which reads data from the file into the Mapper instance; the distance between each object r in data set R, each object s in data set S, and the representative points p is calculated, and each object r, s is assigned to its nearest representative point P; objects r in R and objects s in S that share the same nearest representative point are collected into one Voronoi cell, producing m Voronoi cells as partitions, and <VC_m, List(P_i)> pairs are output; given a query point p, its closest partition or set of closest partitions is identified, and the mapper outputs each object r, s of the original data sets together with the id of its partition VC_m (the closest partition or set of closest partitions); the mapper output is written to the file system used by Spark.
2. The distributed temporal index system based on inverted Thiessen polygons according to claim 1, characterized in that:
the Voronoi diagram divides a space into multiple disjoint polygons; for any point inside a polygon, its nearest generating point is the point that generated that polygon; each polygon in the diagram is called the Voronoi cell associated with its point p, and p is the nearest neighbor of every point in the cell of p.
3. The distributed temporal index system based on inverted Thiessen polygons as claimed in claim 1, characterized in that: the inverted Voronoi index comprises two parts: a primary index containing all cluster centres, and a secondary index containing the object queue stored in each partition VC.
4. The distributed temporal index system based on inverted Thiessen polygons as claimed in claim 1, characterized in that the representative points are obtained as follows: internal cluster points and adjacent points are determined; the data of the internal cluster points are clustered, and after clustering the cluster centres are selected for indexing; the required data are the adjacent points connected to an internal cluster point. Taking an internal cluster point as the centre, a circle containing the adjacent cluster-centre points is constructed, and a triangle having this circle as its circumcircle is taken as a Delaunay triangle. In this method, Delaunay triangles are constructed separately for two different internal cluster points, and the two Delaunay triangles, sharing the adjacent points as common vertices, form a Delaunay triangulation. The data objects are divided into several large partitions, one cluster representative point of each is selected as its representative point, and each divided object in each Voronoi grid carries the id of the object clustered into that Voronoi cell.
5. The distributed temporal index system based on inverted Thiessen polygons as claimed in claim 4, characterized in that:
the Voronoi diagram is $VD(P) = \{V(p_1), V(p_2), \ldots, V(p_m)\}$, where $VD(P)$ is the Voronoi diagram over the point set $P$ and $V(p_1)$ is the Voronoi cell of $p_1$. The collection associated with all the given points is called the Voronoi diagram generated by $P$ under the distance function $dist(\cdot)$; the Voronoi cell of each point $p$ necessarily contains every point $q$ that is closer to $p$ than to any other point, so the nearest neighbour of a query point $q$ is given by the closed Voronoi cell that contains it;
for a set of $n$ points $P = \{p_1, p_2, \ldots, p_n\}$ in a $D$-dimensional space $R$, the partition VC of point $p_i$ is the region $VC(p_i)$; if
$$VC(p_i) = \{\, p \mid d(p, p_i) \le d(p, p_j) \,\},$$
then this region is called the Voronoi cell associated with $p_i$;
where $p$ is a given point or query point, $d(p, p_i)$ is the Euclidean distance between $p$ and $p_i$, $i$ and $j$ are indices, $n \ge 2$, $p_1 \neq p_2$, $i \neq j$, and $i, j \in I_n = \{1, \ldots, n\}$; $i$ takes every value in $1, \ldots, n$, and for each value of $i$, $j$ takes every value in $1, \ldots, n$ other than $i$.
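The map-side partitioning of claim 1, the two-part index of claim 3, and the cell condition of claim 5 can be illustrated with a small single-process Python sketch. It is a simulation under stated assumptions, not the patented Spark job: the records, representative points, and function names (`nearest`, `map_partition`, `build_index`, `in_cell`) are hypothetical, distances are Euclidean, and ties go to the lower representative index. Each object of R and S is assigned to its nearest representative point, objects are grouped into Voronoi cells as `<VC_m, list>` pairs, and the result is split into a primary index of centres and a secondary index of per-cell object queues.

```python
import math
from collections import defaultdict


def nearest(point, reps):
    """Index of the representative point closest to `point` (lowest index wins ties)."""
    return min(range(len(reps)), key=lambda i: math.dist(point, reps[i]))


def map_partition(records, reps):
    """Emit (cell_id, (dataset, obj_id, point)) for every object, as a mapper would."""
    for dataset, obj_id, point in records:
        yield nearest(point, reps), (dataset, obj_id, point)


def build_index(records, reps):
    """Group mapper output by cell: primary = cell centres, secondary = per-cell queues."""
    secondary = defaultdict(list)
    for cell_id, obj in map_partition(records, reps):
        secondary[cell_id].append(obj)
    primary = {cell_id: reps[cell_id] for cell_id in secondary}
    return primary, dict(secondary)


def in_cell(p, i, reps):
    """Claim-5 condition: d(p, p_i) <= d(p, p_j) for every j != i."""
    return all(math.dist(p, reps[i]) <= math.dist(p, reps[j])
               for j in range(len(reps)) if j != i)


if __name__ == "__main__":
    reps = [(0.0, 0.0), (10.0, 10.0)]
    records = [("R", "r1", (1.0, 2.0)), ("R", "r2", (9.0, 8.0)),
               ("S", "s1", (2.0, 1.0)), ("S", "s2", (11.0, 9.0))]
    primary, secondary = build_index(records, reps)
    print(primary)       # {0: (0.0, 0.0), 1: (10.0, 10.0)}
    print(secondary[0])  # [('R', 'r1', (1.0, 2.0)), ('S', 's1', (2.0, 1.0))]
```

Because every object is placed by the nearest-representative rule, each one satisfies the claim-5 inequality for its own cell; the `in_cell` helper checks that directly.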
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710976062.XA CN107818147A (en) | 2017-10-19 | 2017-10-19 | Distributed temporal index system based on Voronoi diagram |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818147A (en) | 2018-03-20 |
Family ID: 61608145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710976062.XA Pending CN107818147A (en) | 2017-10-19 | 2017-10-19 | Distributed temporal index system based on Voronoi diagram |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818147A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101799939A (en) * | 2010-04-02 | 2010-08-11 | 天津大学 | Rapid and self-adaptive generation algorithm of intermediate viewpoint based on left and right viewpoint images |
CN101959218A (en) * | 2009-10-25 | 2011-01-26 | 苏州大学 | Method for detecting event region based on splay tree |
CN104834719A (en) * | 2015-05-12 | 2015-08-12 | 北京比酷天地文化股份有限公司 | Database system applied to real-time big data scene |
CN105205169A (en) * | 2015-10-12 | 2015-12-30 | 中国电子科技集团公司第二十八研究所 | Distributed image index and retrieval method |
CN105760469A (en) * | 2016-02-05 | 2016-07-13 | 大连大学 | High-dimensional approximate image retrieval method based on inverted LSH in cloud computing environment |
CN105760468A (en) * | 2016-02-05 | 2016-07-13 | 大连大学 | Large-scale image querying system based on inverted locality-sensitive hash indexing in mobile environment |
CN106209989A (en) * | 2016-06-29 | 2016-12-07 | 山东大学 | Spatial data concurrent computational system based on spark platform and method thereof |
CN106528773A (en) * | 2016-11-07 | 2017-03-22 | 山东首讯信息技术有限公司 | Graph computation system and method based on spatial data management supported by the Spark platform |
Timeline: 2017-10-19, application CN201710976062.XA (CN107818147A) filed in CN; status: active, pending
Non-Patent Citations (1)
Title |
---|
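Wu Xiaobing (吴晓兵): "Research on a distributed reverse nearest neighbor query method based on Voronoi diagrams" (基于Voronoi图的分布式反最近邻查询方法研究), China Master's Theses Full-text Database, Information Science and Technology series * |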
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022063150A1 (en) * | 2020-09-27 | 2022-03-31 | 阿里云计算有限公司 | Data storage method and device, and data query method and device |
CN112925789A (en) * | 2021-02-24 | 2021-06-08 | 东北林业大学 | Spark-based space vector data memory storage query method and system |
CN112925789B (en) * | 2021-02-24 | 2022-12-20 | 东北林业大学 | Spark-based space vector data memory storage query method and system |
CN115272769A (en) * | 2022-08-10 | 2022-11-01 | 中国科学院地理科学与资源研究所 | Automatic moon impact pit extraction method and device based on machine learning |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Kaur et al. | Efficient resource management system based on 4vs of big data streams | |
JP2017523485A (en) | Techniques for tiling location-based information with server control | |
CN107818147A (en) | Distributed temporal index system based on Voronoi diagram | |
Ho et al. | Distributed graph database for large-scale social computing | |
Li et al. | Aggregated multi-attribute query processing in edge computing for industrial IoT applications | |
CN110300122A (en) | Internet of Things electronic information processing system and method | |
CN105117497A (en) | Ocean big data master-slave index system and method based on Spark cloud network | |
CN104850593A (en) | Big data-based emergency supplies data storage and circulation monitoring method | |
Zeng et al. | Data visualization for air quality analysis on bigdata platform | |
Ding et al. | ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms | |
Li et al. | Efficient subspace skyline query based on user preference using MapReduce | |
Duan et al. | Distributed in-memory vocabulary tree for real-time retrieval of big data images | |
Xia et al. | DAPR-tree: a distributed spatial data indexing scheme with data access patterns to support Digital Earth initiatives | |
CN107844532A (en) | Large-scale nearest neighbor method based on MapReduce and inverted Thiessen polygons | |
CN107679216A (en) | Distributed temporal index method based on inverted Thiessen polygons for mobile healthcare, and its application | |
CN107766495A (en) | Distributed temporal index method based on Voronoi diagram | |
Khedr | Decomposable algorithm for computing k-nearest neighbours across partitioned data | |
Liu et al. | A Novel DBSCAN Clustering Algorithm via Edge Computing‐Based Deep Neural Network Model for Targeted Poverty Alleviation Big Data | |
CN107766496A (en) | Large-scale nearest neighbor query system based on MapReduce and inverted Thiessen polygons | |
Akhtar et al. | Challenges in managing real-time data in health information system (HIS) | |
Wang et al. | A novel visual analytics approach for clustering large-scale social data | |
CN113724007A (en) | Patient population selection method, device, equipment and computer-readable storage medium | |
Xie et al. | Application of Internet of Things Sensor in Intelligent Art‐Aided Design | |
Jogar et al. | Chronic diseases prediction over bigdata by using machine learning | |
Chen et al. | Spatio-temporal keywords queries in HBase |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180320 |