CN109977149A - Crime big data point pattern analysis method based on G-function and improvement KD tree - Google Patents


Publication number
CN109977149A
Authority
CN
China
Prior art keywords
tree
crime
point
function
closest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910204662.3A
Other languages
Chinese (zh)
Inventor
何雨情
杨立涛
白璐斌
黄舒哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201910204662.3A
Publication of CN109977149A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • G06Q50/265: Personal security, identity or safety

Abstract

The invention discloses a crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation. To meet the current demand for processing crime big data, the invention combines improved KD-tree parallel computation with the nearest-neighbor distance point pattern analysis method (the G-function) to provide a method that can rapidly analyze the spatial distribution pattern of crime. The method groups the crime point events in space into clusters, builds a KD tree for each cluster, and computes the nearest-neighbor distances of the crime point events in each KD tree in parallel. Breaking the whole into parts and processing the blocks in parallel speeds up computation and improves the utilization of computing resources.

Description

Crime big data point pattern analysis method based on G-function and improved KD tree
Technical field
The invention belongs to the field of big data mining and relates to a crime big data point pattern analysis method, in particular to a novel crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation.
Background art
With the development of Internet technology, the world has entered the era of big data. In form, big data is a collection of massive related data; in practical application, it refers to the ability to collect and analyze large amounts of information. Over the past decade, the informatization of public security organs has advanced by leaps and bounds: a police information network has been established that reaches vertically to the grassroots and horizontally to the edges, the business of every police branch is managed through information systems, and massive basic business data have accumulated. Crime data in particular are large in volume, scattered, and complex in composition, and information is difficult to extract from them, so the traditional mode of crime event analysis and management is overburdened and urgently needs deep transformation. By collecting, organizing, classifying, and analyzing massive data, spatial distribution characteristics of crime that traditional means cannot detect can be obtained, and the great value contained in the data can be mined.
The most famous example of spatial point pattern research in geography is Snow's cholera map, which brought an end to the cholera epidemic that occurred in London in 1853. Quantitative analysis of spatial point distribution patterns has been prevalent since the quantitative revolution of the 1960s and is widely used in earth science research, for example in studies of settlement density (Dacey, 1962; King, 1962) and of drumlin distribution in glaciated areas (Trenhaile, 1971). With the rise of geographic information system technology, point pattern analysis, as an important part of spatial analysis, has been studied in depth and widely applied, and models such as the nearest-neighbor distance algorithm (G-function) have emerged.
The nearest-neighbor distance algorithm (G-function) can analyze the spatial distribution characteristics of crime event points, but computing it requires finding the nearest neighbor of every point event. Faced with today's massive raw crime data, the traditional traversal search must compute, one by one, the distance between the center point and every other point in range; this takes too long, wastes resources, and is inefficient.
Summary of the invention
To solve the above technical problem, the present invention provides a novel crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation.
The technical solution adopted by the present invention is a crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation, characterized by comprising the following steps:
Step 1: Data preprocessing;
Input the coordinates of all crime event points to be processed and divide the points into several clusters with a clustering algorithm. A threshold is set to judge whether a cluster is too large; if it is, that cluster is clustered again, until the number of points in each cluster is appropriate. A KD tree is then built for each cluster using a parallel computing strategy;
Step 2: Searching for nearest neighbors;
For each point to be processed, look up the cluster it belongs to and determine the KD tree of that cluster. Then search the KD tree for the point's nearest neighbor and compute the nearest-neighbor distance $d_{\min}$. This is repeated until all input points have been processed, giving the nearest-neighbor distances of all points;
Step 3: Computing the G-function;
Sort the nearest-neighbor distances of all points by magnitude and compute the range $R$ and the class interval $D$ of the nearest-neighbor distances, where $R = \max(d_{\min}) - \min(d_{\min})$. Count cumulatively the number of points up to the upper limit of each class interval and compute the cumulative frequency $G(d)$;
Step 4: Performing a significance test and obtaining the analysis result;
The Monte Carlo random simulation method is used. If the probability $\Pr(\hat{G}(d) > U(d))$ that the distribution function $\hat{G}(d)$ exceeds the upper bound $U(d)$ of the random simulations and the probability $\Pr(\hat{G}(d) < L(d))$ that it falls below their lower bound $L(d)$ satisfy the significance condition, the result meets the significance test index. The curve of $G(d)$ against distance $d$ is then output and used to judge the spatial distribution pattern of the point data. The cumulative frequency of crime events changes with distance $d$: if the point events tend to be clustered in space, the G-function value rises rapidly at short distances; if the events in the point pattern tend to be dispersed, the G-function value increases more slowly.
The present invention improves on the traditional traversal search. It searches for nearest neighbors with an improved KD-tree algorithm (a KD tree is a tree data structure that partitions K-dimensional data space), processes the partitioned crime data point sets in parallel, computes the corresponding function, performs a significance test, and judges the spatial distribution pattern of crime point events from the resulting function curve. This greatly increases computation speed, addresses the complex computation problems posed by today's massive crime data, and further improves the utilization of computing resources.
Description of the drawings
Fig. 1 is the flowchart of an embodiment of the present invention;
Fig. 2 is the G-function analysis graph of crime data for California, USA, in an embodiment of the present invention.
Detailed description of the embodiment
To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the drawings and an embodiment. It should be understood that the embodiment described here serves only to illustrate and explain the present invention and is not intended to limit it.
Analyzing the spatial distribution pattern of crime events with the nearest-neighbor distance algorithm (G-function), a spatial point pattern analysis method, requires finding the nearest neighbor of every crime point. The traditional traversal search follows a fixed order and systematically visits every element of a data structure. It is suitable only for small data volumes; with large data volumes it suffers from long computation times and low efficiency.
To meet the current demand for processing crime big data, the present invention combines improved KD-tree parallel computation with the nearest-neighbor distance point pattern analysis method (G-function) to provide a method that can rapidly analyze the spatial distribution pattern of crime: a crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation. The method groups the crime point events in space into clusters, builds a KD tree for each cluster, and computes the nearest-neighbor distances of the crime point events in each KD tree in parallel. Breaking the whole into parts and processing the blocks in parallel speeds up computation and improves the utilization of computing resources.
Referring to Fig. 1, the crime big data point pattern analysis method based on the G-function and improved KD-tree parallel computation provided by the present invention comprises the following steps:
Step 1: Data preprocessing;
Input the coordinates of all crime event points to be processed and divide the points into several clusters with a clustering algorithm. A threshold is set to judge whether a cluster is too large; if it is, that cluster is clustered again, until the number of points in each cluster is appropriate. A KD tree is then built for each cluster using a parallel computing strategy; the nearest neighbor of each point is expected to lie within its own cluster.
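The preprocessing loop in Step 1, clustering and then re-clustering any oversized cluster, might be sketched in Python as follows. The source names neither the clustering algorithm nor the threshold value, so the plain 2-means split and the names `two_means`, `split_until_small` and `max_size` are assumptions made only for illustration.

```python
import random

def two_means(points, iters=20, seed=0):
    """Split 2-D points into two groups with a basic 2-means (Lloyd) loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)          # two starting centers (assumption)
    groups = [list(points), []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            d0 = (p[0] - centers[0][0]) ** 2 + (p[1] - centers[0][1]) ** 2
            d1 = (p[0] - centers[1][0]) ** 2 + (p[1] - centers[1][1]) ** 2
            groups[0 if d0 <= d1 else 1].append(p)
        if not groups[0] or not groups[1]:
            break                            # degenerate split, handled by caller
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for g in groups
        ]
    return groups

def split_until_small(points, max_size):
    """Re-cluster recursively until no cluster holds more than max_size points."""
    assert max_size >= 1
    if len(points) <= max_size:
        return [points]
    a, b = two_means(points)
    if not a or not b:                       # e.g. many identical coordinates
        mid = len(points) // 2               # fall back to an arbitrary halving
        a, b = points[:mid], points[mid:]
    return split_until_small(a, max_size) + split_until_small(b, max_size)
```

Calling `split_until_small(points, max_size)` on a list of `(x, y)` tuples returns a list of clusters, each of which could then be handed to a separate KD-tree build.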
In this embodiment, the KD tree is built as follows: first compute the variance of all data in the cluster along each dimension; then, in the dimension with the largest variance, choose the median of all data as the splitting hyperplane, which becomes the root node; finally determine the left and right subtrees. The procedure is applied recursively until the leaf nodes are reached.
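The construction rule of this embodiment (split on the dimension of largest variance, at the median, then recurse) can be sketched in Python as below. The `Node` class and the function names `variance` and `build_kdtree` are hypothetical; this is an illustrative sketch, not the implementation used by the inventors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                      # median point stored at this node
    axis: int                         # dimension used for the split
    left: Optional["Node"] = None     # points at or below the median on axis
    right: Optional["Node"] = None    # points at or above the median on axis

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def build_kdtree(points: Sequence[Point]) -> Optional[Node]:
    """Recursively build a KD tree, splitting on the max-variance dimension."""
    if not points:
        return None
    dims = len(points[0])
    # Choose the dimension in which the data are most spread out.
    axis = max(range(dims), key=lambda k: variance([p[k] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2           # median position along the chosen axis
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid]),
                build_kdtree(ordered[mid + 1:]))
```

Sorting at every level keeps the sketch short; a production build would more likely use a linear-time median selection.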
Step 2: Searching for nearest neighbors;
For each point to be processed, look up the cluster it belongs to and determine the KD tree of that cluster. Then search the KD tree for the point's nearest neighbor and compute the nearest-neighbor distance $d_{\min}$. This is repeated until all input points have been processed, giving the nearest-neighbor distances of all points;
In this embodiment, the nearest neighbors are searched for in the KD trees using multi-threaded parallel search.
The improved parallel KD-tree algorithm clusters massive data points automatically according to their spatial density, and a KD tree is built for each cluster in parallel, which saves tree-construction time. Once the KD trees are built, the data are processed in parallel with multiple threads, which greatly reduces processing time. Instead of building one huge KD tree, the method builds several smaller ones, which shortens both tree construction and nearest-neighbor search and improves data-processing efficiency.
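As a rough illustration of Step 2, the Python sketch below builds one small KD tree per cluster and runs the nearest-neighbor searches of the clusters on a thread pool. All names (`build`, `nn_dist`, `cluster_nn_distances`, `all_nn_distances`, `workers`) are hypothetical, and the tuple-based tree layout is an assumption chosen for brevity.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def build(points, idx):
    """KD tree over point indices: (index, axis, left, right) or None."""
    if not idx:
        return None
    def spread(k):                     # unnormalised variance along dimension k
        vals = [points[i][k] for i in idx]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals)
    axis = max(range(len(points[0])), key=spread)
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return (idx[mid], axis, build(points, idx[:mid]), build(points, idx[mid + 1:]))

def nn_dist(node, points, qi, best=math.inf):
    """Distance from points[qi] to its nearest *other* point in the tree."""
    if node is None:
        return best
    i, axis, left, right = node
    q = points[qi]
    if i != qi:                        # skip the query point itself
        best = min(best, math.dist(q, points[i]))
    diff = q[axis] - points[i][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nn_dist(near, points, qi, best)
    if abs(diff) < best:               # far side can still hold a closer point
        best = nn_dist(far, points, qi, best)
    return best

def cluster_nn_distances(cluster):
    """Nearest-neighbor distance of every point within one cluster."""
    tree = build(cluster, list(range(len(cluster))))
    return [nn_dist(tree, cluster, qi) for qi in range(len(cluster))]

def all_nn_distances(clusters, workers=4):
    """Process the clusters in parallel and concatenate their distances."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_cluster = list(pool.map(cluster_nn_distances, clusters))
    return [d for dists in per_cluster for d in dists]
```

Two caveats apply to this sketch. In CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so `ProcessPoolExecutor` may be the more practical choice there. Also, searching only within a point's own cluster returns the within-cluster nearest neighbor, which can differ from the global one for points near a cluster boundary, and a cluster with a single point yields an infinite distance.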
Step 3: Computing the G-function;
Sort the nearest-neighbor distances of all points by magnitude and compute the range $R$ and the class interval $D$ of the nearest-neighbor distances, where $R = \max(d_{\min}) - \min(d_{\min})$. Count cumulatively the number of points up to the upper limit of each class interval and compute the cumulative frequency $G(d)$;
In this embodiment, the cumulative frequency $G(d)$ of nearest-neighbor distances is constructed from the nearest-neighbor distances of all events:

$$G(d) = \frac{\#\left(d_{\min}(s_i) \le d\right)}{n}$$

where $s_i$ is an event in the study region; $n$ is the number of events; $d$ is the distance; and $\#(d_{\min}(s_i) \le d)$ is the number of events whose nearest-neighbor distance does not exceed $d$.
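Step 3 amounts to building an empirical cumulative distribution of the nearest-neighbor distances. A minimal Python sketch follows; the function name `g_function`, the parameter `n_bins`, and the choice of equal-width class intervals are assumptions, since the source does not say how the class interval $D$ is chosen.

```python
from bisect import bisect_right

def g_function(nn_distances, n_bins=10):
    """Return (R, upper_limits, G) for a list of nearest-neighbor distances."""
    d = sorted(nn_distances)
    n = len(d)
    r = d[-1] - d[0]                       # range R = max(d_min) - min(d_min)
    width = r / n_bins                     # class interval D (equal width, assumed)
    # Upper limit of each class interval; the last one is pinned to max(d_min)
    # so floating-point rounding cannot leave the largest distance uncounted.
    uppers = [d[0] + width * (k + 1) for k in range(n_bins - 1)] + [d[-1]]
    # G(d) = #(d_min(s_i) <= d) / n, evaluated at every upper limit.
    g = [bisect_right(d, u) / n for u in uppers]
    return r, uppers, g
```

Plotting `g` against `uppers` gives the curve of $G(d)$ against distance $d$ that Step 4 interprets.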
Step 4: Performing a significance test and obtaining the analysis result;
The Monte Carlo random simulation method is used. If the probability $\Pr(\hat{G}(d) > U(d))$ that the distribution function $\hat{G}(d)$ exceeds the upper bound $U(d)$ of the random simulations and the probability $\Pr(\hat{G}(d) < L(d))$ that it falls below their lower bound $L(d)$ satisfy the significance condition, the result meets the significance test index. The curve of $G(d)$ against distance $d$ is then output and used to judge the spatial distribution pattern of the point data. The cumulative frequency of crime events changes with distance $d$: if the point events tend to be clustered in space, the G-function value rises rapidly at short distances; if the events in the point pattern tend to be dispersed, the G-function value increases more slowly;
In this embodiment, the significance test uses the Monte Carlo random simulation method:
1. Generate $m$ CSR (complete spatial randomness) point patterns and estimate the theoretical distribution $\bar{G}(d)$, where $\hat{G}_i(d)$, $i = 1, \dots, m$, are the distribution functions of $m$ independent random simulations, each of $n$ CSR events;
2. Compute the upper bound $U(d)$ and the lower bound $L(d)$ of the simulated distribution functions $\hat{G}_i(d)$;
3. Compute the probability $\Pr(\hat{G}(d) > U(d))$ that $\hat{G}(d)$ exceeds the upper bound $U(d)$ of the simulated distribution functions and the probability $\Pr(\hat{G}(d) < L(d))$ that $\hat{G}(d)$ falls below their lower bound $L(d)$;
4. If these probabilities satisfy the significance condition, the result meets the significance test index, and the G-function result curve is output.
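The four simulation steps can be illustrated with the Python sketch below. It follows a common textbook construction, which is an assumption here because the formulas themselves are not legible in this text: the theoretical curve is the mean of the $m$ simulated curves, and $U(d)$ and $L(d)$ are their pointwise maximum and minimum. Under that construction, an observed curve from a CSR process exceeds $U(d)$, or falls below $L(d)$, at a given $d$ with probability $1/(m+1)$. The function names, the rectangular study region `bounds`, and the brute-force distance helper are all hypothetical.

```python
import math
import random
from bisect import bisect_right

def nn_distances(points):
    """Brute-force nearest-neighbor distances (stand-in for the KD-tree search)."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def g_at(points, ds):
    """Empirical G evaluated at each distance in ds."""
    nn = sorted(nn_distances(points))
    return [bisect_right(nn, d) / len(nn) for d in ds]

def csr_envelope(n, bounds, ds, m=99, seed=0):
    """Simulate m CSR patterns of n points; return (mean, upper, lower) curves."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = bounds
    sims = []
    for _ in range(m):
        pts = [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n)]
        sims.append(g_at(pts, ds))
    cols = list(zip(*sims))                 # one column of m values per distance
    mean = [sum(c) / m for c in cols]       # estimate of the theoretical curve
    upper = [max(c) for c in cols]          # U(d)
    lower = [min(c) for c in cols]          # L(d)
    return mean, upper, lower

def classify(observed, upper, lower):
    """Label each distance by where the observed G lies relative to the envelope."""
    return ["clustered" if g > u else "dispersed" if g < l else "random"
            for g, u, l in zip(observed, upper, lower)]
```

An observed curve lying above the envelope at short distances corresponds to the rapid early rise that the text associates with clustered events, while a curve below the envelope corresponds to dispersed events.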
Taking crime data for California, USA, as an example, the computed G-function curve is shown in Fig. 2. It indicates that the crime point events in this area follow a spatially clustered distribution pattern. Crime "hard-hit areas" showing a clustered distribution can be identified as high-frequency crime areas, where police presence and patrol frequency should be increased and the public should be advised to attend to personal and property safety.
By abstracting crime events as points in space, the method can identify three crime distribution patterns: clustered, uniform, and random. Mining the spatial distribution pattern of crime is valuable for the following reasons:
1) It helps professional police analyze crime data comprehensively, identify crime hot spots (hard-hit areas), and allocate police resources in a planned and rational way to reduce the occurrence of certain crimes;
2) In areas where crime shows a clustered distribution pattern, legal-awareness campaigns can be strengthened and the public can be advised how to avoid unnecessary personal and property losses, raising the level of urban public safety;
3) Based on the point pattern analysis results, the future number of crimes in a city can be further analyzed and predicted and the temporal distribution of crime can be mined, helping the government take an overall view of urban public safety and make decisions.
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the present invention. Under the inspiration of the present invention, and without departing from the scope protected by its claims, those skilled in the art may make substitutions or modifications, all of which fall within the protection scope of the present invention. The claimed scope of the present invention is determined by the appended claims.

Claims (5)

1. A crime big data point pattern analysis method based on the G-function and an improved KD tree, characterized by comprising the following steps:
Step 1: Data preprocessing;
Input the coordinates of all crime event points to be processed and divide the points into several clusters with a clustering algorithm; a threshold is set to judge whether a cluster is too large, and if it is, that cluster is clustered again, until the number of points in each cluster is appropriate; a KD tree is then built for each cluster using a parallel computing strategy;
Step 2: Searching for nearest neighbors;
For each point to be processed, look up the cluster it belongs to and determine the KD tree of that cluster; then search the KD tree for the point's nearest neighbor and compute the nearest-neighbor distance $d_{\min}$, until all input points have been processed, giving the nearest-neighbor distances of all points;
Step 3: Computing the G-function;
Sort the nearest-neighbor distances of all points by magnitude and compute the range $R$ and the class interval $D$ of the nearest-neighbor distances, where $R = \max(d_{\min}) - \min(d_{\min})$; count cumulatively the number of points up to the upper limit of each class interval and compute the cumulative frequency $G(d)$;
Step 4: Performing a significance test and obtaining the analysis result;
If the result meets the significance test index, output the curve of $G(d)$ against distance $d$ and judge the spatial distribution pattern of the point data; the cumulative frequency of crime events changes with distance $d$: if the point events tend to be clustered in space, the G-function value rises rapidly at short distances; if the events in the point pattern tend to be dispersed, the G-function value increases more slowly.
2. The crime big data point pattern analysis method based on the G-function and an improved KD tree according to claim 1, characterized in that: the building of the KD tree in step 1 first computes the variance of all data in each cluster along each dimension, then, in the dimension with the largest variance, chooses the median of all data as the splitting hyperplane, which becomes the root node, and finally determines the left and right subtrees, the procedure being applied recursively until the leaf nodes are reached.
3. The crime big data point pattern analysis method based on the G-function and an improved KD tree according to claim 1, characterized in that: in step 2, the nearest neighbors are searched for in the KD trees using multi-threaded parallel search.
4. The crime big data point pattern analysis method based on the G-function and an improved KD tree according to claim 1, characterized in that: in step 3, the cumulative frequency $G(d)$ of nearest-neighbor distances is constructed from the nearest-neighbor distances of all events:

$$G(d) = \frac{\#\left(d_{\min}(s_i) \le d\right)}{n}$$

where $s_i$ is an event in the study region; $n$ is the number of events; $d$ is the distance; and $\#(d_{\min}(s_i) \le d)$ is the number of events whose nearest-neighbor distance does not exceed $d$.
5. The crime big data point pattern analysis method based on the G-function and an improved KD tree according to any one of claims 1-4, characterized in that: in step 4, the Monte Carlo random simulation method is used; if the probability $\Pr(\hat{G}(d) > U(d))$ that the distribution function $\hat{G}(d)$ exceeds the upper bound $U(d)$ of the random simulations and the probability $\Pr(\hat{G}(d) < L(d))$ that it falls below their lower bound $L(d)$ satisfy the significance condition, the result meets the significance test index.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910204662.3A CN109977149A (en) 2019-03-18 2019-03-18 Crime big data point pattern analysis method based on G-function and improvement KD tree


Publications (1)

Publication Number Publication Date
CN109977149A true CN109977149A (en) 2019-07-05

Family

ID=67079348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910204662.3A Pending CN109977149A (en) 2019-03-18 2019-03-18 Crime big data point pattern analysis method based on G-function and improvement KD tree

Country Status (1)

Country Link
CN (1) CN109977149A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7475071B1 (en) * 2005-11-12 2009-01-06 Google Inc. Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree
CN106445960A (en) * 2015-08-10 2017-02-22 阿里巴巴集团控股有限公司 Data clustering method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LINJIA HU等: ""Massively Parallel KD-tree Construction and Nearest Neighbor Search Algorithms"", 《2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS》 *
SHWU-HUEY YEN等: ""A KD-Tree-Based Nearest Neighbor Search for Large Quantities of Data"", 《KSII TRANSACTIONS ON INTERNET & INFORMATION SYSTEMS》 *
朱海燕等: ""基于ArcGIS平台的空间点模式的集成与应用"", 《全国地图学与GIS学术会议论文集》 *
湛东升等: ""北京市公共服务设施空间集聚特征分析"", 《经济地理》 *
陈晓康: ""基于Spark云计算平台的改进K近邻算法研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190705