CN110097126B - Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm - Google Patents

Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm

Info

Publication number
CN110097126B
Authority
CN
China
Prior art keywords
house
personnel
feature
samples
houses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910374115.XA
Other languages
Chinese (zh)
Other versions
CN110097126A (en)
Inventor
许正
朱哲辰
黄泷
闫子为
高子康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Ugs Information Technology Co ltd
Original Assignee
Jiangsu Ugs Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Ugs Information Technology Co ltd
Priority to CN201910374115.XA
Publication of CN110097126A
Application granted
Publication of CN110097126B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for checking missed registrations of key personnel and houses based on the DBSCAN clustering algorithm. Population and house data sets collected by the police are preprocessed, including missing-value filling, discretization of categorical variables, and standardization of numerical variables. On the data set labeled "key personnel and houses", the samples that are not core points are classified with the DBSCAN clustering algorithm and the clustering result is analyzed. All data labeled "key personnel and houses" are then fixed as core points, and the non-core samples of the population and house data sets are classified by a DBSCAN clustering algorithm with adaptive feature weights to obtain a clustering result, from which a check list of suspected unregistered "key personnel and houses" is finally generated. In this way, the persons and houses already marked for key attention serve as cores, persons and houses similar to them are screened out by density clustering, and the range to be checked for suspected key personnel and houses is narrowed, which can effectively improve the accuracy and efficiency of police checking.

Description

Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm
Technical Field
The invention relates to a method for checking missed registrations of key personnel and houses, and in particular to such a method based on the DBSCAN clustering algorithm.
Background
In the era of big data, data mining plays an important role in many fields. Using big data and algorithms to optimize the traditional way community police carry out checks can address a series of problems in the quality of grass-roots public security work. Public security is one of the first fields of application: its data are massive and varied, comprising not only traditional structured data but also a large amount of unstructured data. Population and house management is a heavy workload, and with such a large base of persons and houses the traditional checking approach struggles to meet the needs of public security population and house management.
For the investigation of key personnel and houses in public security data, partitioning and hierarchical methods are basic clustering methods that were proposed early and are relatively effective, for example K-means and K-modes. These methods, however, are designed to find spherical clusters and have difficulty finding clusters of arbitrary shape, and they require the value of K to be set in advance. Density-based clustering such as DBSCAN treats clusters as dense regions of the data space separated by sparse regions; it can find clusters of arbitrary shape, automatically excludes noise points, and does not require the number of clusters to be specified.
However, the core points produced by the classical density clustering algorithm include every point that satisfies the neighborhood condition. In the population and house clustering problem, this lets non-key persons and houses act as core points, and the clustering result is unsatisfactory. In addition, giving equal weight to all features when computing distance-based similarity weakens the feature attributes that are strongly related to key personnel and houses, so samples may be clustered incorrectly.
In view of these drawbacks, the designers have, through active research and innovation, created a method for checking missed registrations of key personnel and houses based on the DBSCAN clustering algorithm, giving it greater industrial value.
Disclosure of Invention
In order to solve the technical problems, the invention aims to provide a method for checking key personnel and house missing registration based on a DBSCAN clustering algorithm.
The method of the invention for checking missed registrations of key personnel and houses based on the DBSCAN clustering algorithm comprises the following steps:
Step one: preprocess the population and house data sets collected by the police, the preprocessing comprising missing-value filling, discretization of categorical variables, and standardization of numerical variables.
Step two: divide the data labeled "key personnel and houses" into known "key personnel and houses" and unknown "key personnel and houses", fix the known data samples as core points of the density clustering, and separate out the non-core points.
Step three: set the neighborhood parameters (ε, MinPts), where ε is the neighborhood distance threshold of a sample and MinPts is the threshold on the number of samples within distance ε of that sample; classify the non-core samples of the "key personnel and houses" data set with the DBSCAN clustering algorithm and analyze the clustering result.
Step four: fix all data labeled "key personnel and houses" as core points, and classify the non-core samples of the population and house data sets with the DBSCAN clustering algorithm with adaptive feature weights to obtain a clustering result.
Step five: count and evaluate the clustering result of step four, and finally generate a check list of suspected unregistered "key personnel and houses".
Further, in step one, the data preprocessing operates on the population- and house-related data features in the public security population and house databases. It comprises one-hot encoding of the categorical features and dimensionless (standardization) processing of the numerical feature variables. Missing values are filled with the mode for categorical features and with the mean for numerical feature variables.
Further, in the method, the categorical features include gender and marital status, and the numerical feature variables include age and the longitude and latitude of the address.
Further, in step one, the categorical variables are discretized as follows: if a feature has N qualitative values, it is expanded into N features; when the original feature value is the i-th qualitative value, the i-th expanded feature is assigned 1 and the other expanded features are assigned 0. Standardizing a numerical variable requires the mean $\bar{x}$ and the standard deviation $S$ of each feature dimension, and the calculation formula is
$x' = \dfrac{x - \bar{x}}{S}, \qquad \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i, \qquad S = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$
Further, in step two, the core points of the density clustering are fixed by assigning a weight to each "key personnel and houses" data sample when the distances are computed: a larger positive weight makes it easier for a sample to become a core point, and a smaller (negative) weight prevents a sample from becoming a core point.
Further, in step three, the DBSCAN algorithm measures the similarity between samples by the Euclidean distance; the smaller the distance, the more similar the samples. Suppose the n samples are divided into K clusters containing $n_1, n_2, \ldots, n_K$ samples respectively. The sum $d_p$ of the intra-class distances of all K clusters on the j-th feature dimension is
$d_p = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left(x_{ij} - m_{kj}\right)^2$
where $x_{ij}$ is the value of the j-th feature of the i-th sample and $m_{kj}$ is the mean of cluster k on the j-th feature. The sum $d_q$ of the inter-class distances of all K clusters on the j-th feature dimension is
$d_q = \sum_{k=1}^{K} \left(m_{kj} - m_j\right)^2$
where $m_j$ is the mean of the data set on the j-th feature. The contribution $c_j$ of feature j to the clustering is then
$c_j = \dfrac{d_q}{d_p}$
Finally, the feature weight $w_j$ of the j-th feature is
$w_j = \dfrac{c_j}{\sum_{l=1}^{m} c_l}$
where m is the number of feature dimensions of a sample.
This gives the weighted Euclidean distance, and hence the similarity d(m, n) between two samples,
$d(m, n) = \sqrt{\sum_{j} w_j \left(x_{mj} - x_{nj}\right)^2}$
with the sum taken over all feature dimensions j.
Further, in step four, the core points are fixed using the Scikit-learn machine learning framework, and all core points are found according to the given neighborhood parameters.
Further, in step four, the feature weights are optimized as follows: the data labeled "key personnel and houses" are divided into known and unknown "key personnel and houses"; the known data samples are fixed as core points of the density clustering according to the core point fixing step; suitable neighborhood parameters are set; the non-core samples of the "key personnel and houses" data set are then classified with the DBSCAN clustering algorithm; the contribution of each attribute to the clustering is computed from the classification result; and the feature weights are updated.
Further, in step five, let N be the number of labeled "key personnel and houses" contained in each class of the clustering result, and determine whether N is greater than or equal to a preset threshold T. If N ≥ T, the unlabeled persons and houses in that class are highly likely to be unregistered "key personnel and houses", and a check list of suspected unregistered "key personnel and houses" is finally generated; otherwise, the class is unlikely to contain unregistered "key personnel and houses" and manual judgment is required.
By means of the scheme, the invention has at least the following advantages:
1. Persons and houses already marked for key attention serve as cores, and persons and houses similar to them are screened out by a density clustering algorithm; this narrows the range to be checked for suspected key personnel and houses and can effectively improve the accuracy and efficiency of police checking.
2. The adaptive feature weighting mechanism assigns different weights to different attributes, so the similarity between samples is reflected more accurately and clustering performance improves.
3. With an intelligently recommended checking mode, the objects to be checked can be predicted and assessed in advance, making the checking work more scientific and accurate and policing more proactive and secure.
4. It promotes the transformation and upgrading of the policing mode and has real practical significance for improving the public security management of houses.
The foregoing description is only an overview of the present invention, and is intended to provide a better understanding of the present invention, as it is embodied in the following description, with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is a flow chart of feature weight optimization.
FIG. 2 is a flow chart of a DBSCAN clustering algorithm that adapts feature weights.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
As shown in Figs. 1 and 2, the method for checking missed registrations of key personnel and houses based on the DBSCAN clustering algorithm comprises the following steps:
First, the population and house data sets collected by the police are preprocessed, including missing-value filling, discretization of categorical variables, and standardization of numerical variables. Specifically, the population- and house-related data features in the public security population and house databases are preprocessed by one-hot encoding the categorical features and applying dimensionless processing to the numerical feature variables. For convenience of classification, the categorical features include gender and marital status, and the numerical feature variables include age and the longitude and latitude of the address.
The purpose of the dimensionless processing is to remove differences in scale: otherwise the features differ too much in magnitude, and the distance calculation is dominated by the features with larger numerical values. Missing values are filled with the mode for categorical features and with the mean for numerical feature variables. The reason is that directly discarding key features that contain missing values could have a large impact on the clustering result.
The categorical variables are discretized as follows: if a feature has N qualitative values, it is expanded into N features; when the original feature value is the i-th qualitative value, the i-th expanded feature is assigned 1 and the other expanded features are assigned 0.
Standardizing a numerical variable requires the mean $\bar{x}$ and the standard deviation $S$ of each feature dimension, and the calculation formula is
$x' = \dfrac{x - \bar{x}}{S}, \qquad \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i, \qquad S = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$
Next, the data labeled "key personnel and houses" are divided into known "key personnel and houses" and unknown "key personnel and houses"; the known data samples are fixed as core points of the density clustering, and the non-core points are separated out.
The core points of the density clustering are fixed by assigning a weight to each "key personnel and houses" data sample when the distances are computed: a larger positive weight makes it easier for a sample to become a core point, and a smaller (negative) weight prevents a sample from becoming a core point.
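The weighting step can be illustrated with a small sketch. It assumes one possible reading of the text: a sample counts as a core point when the summed weights of all samples within ε of it (itself included) reach MinPts. This reading resembles the `sample_weight` argument of scikit-learn's `DBSCAN.fit`, but the function name `weighted_core_mask`, the toy coordinates, and the weight values below are hypothetical and are not taken from the patent.

```python
import math

def weighted_core_mask(points, weights, eps, min_pts):
    """Return True for every point whose eps-neighborhood carries enough weight.

    Assumed rule (not stated explicitly in the source): a point is core when
    the sum of the weights of all samples within eps of it, itself included,
    is at least min_pts.
    """
    mask = []
    for p in points:
        total = sum(w for q, w in zip(points, weights) if math.dist(p, q) <= eps)
        mask.append(total >= min_pts)
    return mask

# Hypothetical toy data: sample 0 is a labeled key record with a large weight,
# sample 3 carries a negative weight so that it cannot become a core point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 9.0)]
weights = [3.0, 1.0, 1.0, -1.0]
mask = weighted_core_mask(points, weights, eps=0.5, min_pts=3)
print(mask)
```

Under this rule a heavily weighted sample also lifts its close neighbors over the threshold (sample 1 above), so weights would need to be chosen with the neighborhood size in mind.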
Next, the neighborhood parameters (ε, MinPts) are set, where ε is the neighborhood distance threshold of a sample and MinPts is the threshold on the number of samples within distance ε of that sample.
The non-core samples of the "key personnel and houses" data set are then classified with the DBSCAN clustering algorithm, and the clustering result is analyzed.
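For orientation, the role of the two neighborhood parameters can be seen in a minimal sketch of classical DBSCAN using only the Python standard library. This is the textbook algorithm with unweighted Euclidean distance, not the modified procedure of the invention; the function names and the sample coordinates are illustrative assumptions.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Classical DBSCAN: one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later be relabeled as a border point
            continue
        cluster += 1                     # points[i] is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels

# Hypothetical 2-D samples: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The number of clusters is not specified anywhere; it follows from ε and MinPts, and the isolated sample is reported as noise.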
Specifically, the DBSCAN algorithm measures the similarity between samples by the Euclidean distance; the smaller the distance, the more similar the samples. Suppose the n samples are divided into K clusters containing $n_1, n_2, \ldots, n_K$ samples respectively. The sum $d_p$ of the intra-class distances of all K clusters on the j-th feature dimension is
$d_p = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left(x_{ij} - m_{kj}\right)^2$
where $x_{ij}$ is the value of the j-th feature of the i-th sample and $m_{kj}$ is the mean of cluster k on the j-th feature.
Likewise, the sum $d_q$ of the inter-class distances of all K clusters on the j-th feature dimension is
$d_q = \sum_{k=1}^{K} \left(m_{kj} - m_j\right)^2$
where $m_j$ is the mean of the data set on the j-th feature. The contribution $c_j$ of feature j to the clustering is then
$c_j = \dfrac{d_q}{d_p}$
Finally, the feature weight $w_j$ of the j-th feature is
$w_j = \dfrac{c_j}{\sum_{l=1}^{m} c_l}$
where m is the number of feature dimensions of a sample.
This gives the weighted Euclidean distance, and hence the similarity d(m, n) between two samples,
$d(m, n) = \sqrt{\sum_{j} w_j \left(x_{mj} - x_{nj}\right)^2}$
with the sum taken over all feature dimensions j.
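A short sketch may help make the weight calculation concrete. The formula images of the publication are not available in this text, so the code assumes the following forms, which are consistent with the symbol definitions above: squared deviations for the intra-class sum d_p and the inter-class sum d_q, contribution c_j = d_q / d_p, and weights normalized so that they sum to 1. The function names and the toy data are hypothetical.

```python
import math

def feature_weights(samples, labels):
    """Compute one weight per feature from a clustering result.

    Assumed forms (the original formula images are not reproduced):
      d_p[j] = sum over clusters k and samples i in k of (x_ij - m_kj)^2
      d_q[j] = sum over clusters k of (m_kj - m_j)^2
      c_j    = d_q[j] / d_p[j]
      w_j    = c_j / sum of all c
    Noise samples should be filtered out before calling this function.
    """
    dims = len(samples[0])
    clusters = {}
    for x, lab in zip(samples, labels):
        clusters.setdefault(lab, []).append(x)
    overall_mean = [sum(x[j] for x in samples) / len(samples) for j in range(dims)]
    contrib = []
    for j in range(dims):
        d_p = d_q = 0.0
        for members in clusters.values():
            m_kj = sum(x[j] for x in members) / len(members)
            d_p += sum((x[j] - m_kj) ** 2 for x in members)
            d_q += (m_kj - overall_mean[j]) ** 2
        contrib.append(d_q / d_p if d_p > 0 else 0.0)
    total = sum(contrib)
    # Fall back to equal weights when no feature separates the clusters.
    return [c / total for c in contrib] if total > 0 else [1.0 / dims] * dims

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Hypothetical data: feature 0 separates the two clusters, feature 1 does not.
samples = [(0.0, 0.0), (0.2, 1.0), (4.0, 0.1), (4.2, 0.9)]
labels = [0, 0, 1, 1]
w = feature_weights(samples, labels)
print(w)
print(weighted_distance(samples[0], samples[1], w))
```

In this toy case almost all of the weight goes to the first feature, so two samples that differ mainly in the second feature end up close together under the weighted distance.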
Then, all data labeled "key personnel and houses" are fixed as core points, and the non-core samples of the population and house data sets are classified by the DBSCAN clustering algorithm with adaptive feature weights to obtain a clustering result.
Specifically, the core points are fixed using the Scikit-learn machine learning framework, and all core points are found according to the given neighborhood parameters (ε, MinPts). To optimize the feature weights, the data labeled "key personnel and houses" are divided into known and unknown "key personnel and houses", the known data samples are fixed as core points of the density clustering according to the core point fixing step, and suitable neighborhood parameters are set.
The non-core samples of the "key personnel and houses" data set are then classified with the DBSCAN clustering algorithm, the contribution of each attribute to the clustering is computed from the classification result, and the feature weights are updated.
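The text does not detail how Scikit-learn is used to fix the core points, so the following is only a stand-in written with the standard library. It assumes that the indices of the core points are given in advance (`core_idx`), that core points within ε of each other, directly or through a chain of cores, share a cluster, and that every other sample joins the cluster of its nearest core within ε or is left as noise. The function names, parameters, and data are hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def cluster_with_fixed_cores(samples, core_idx, eps, weights):
    """Label samples when the set of core points is fixed in advance.

    core_idx must be non-empty. Returns one label per sample, -1 for noise.
    """
    labels = [-1] * len(samples)
    cluster = -1
    # 1) Group the fixed core points into density-connected components.
    for start in core_idx:
        if labels[start] != -1:
            continue
        cluster += 1
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in core_idx:
                if labels[j] == -1 and weighted_distance(samples[i], samples[j], weights) <= eps:
                    labels[j] = cluster
                    stack.append(j)
    # 2) Attach each non-core sample to its nearest core point within eps.
    core_set = set(core_idx)
    for i, x in enumerate(samples):
        if i in core_set:
            continue
        dist, nearest = min((weighted_distance(x, samples[c], weights), c) for c in core_idx)
        if dist <= eps:
            labels[i] = labels[nearest]
    return labels

# Hypothetical data: samples 0-2 are labeled key records used as fixed cores.
samples = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (0.2, 0.1), (5.3, 5.1), (9.0, 9.0)]
result = cluster_with_fixed_cores(samples, core_idx=[0, 1, 2], eps=1.0, weights=[1.0, 1.0])
print(result)
```

In an adaptive scheme, the labels returned here would be used to recompute the feature weights, and the clustering would be repeated with the updated weights; the stopping rule for that loop is not specified in the text.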
Finally, the clustering result is counted and evaluated, and a check list of suspected unregistered "key personnel and houses" is generated. Specifically, let N be the number of labeled "key personnel and houses" contained in each class of the clustering result, and determine whether N is greater than or equal to a preset threshold T.
If N ≥ T, the unlabeled persons and houses in that class are highly likely to be unregistered "key personnel and houses", and a check list of suspected unregistered "key personnel and houses" is generated. Otherwise, the class is unlikely to contain unregistered "key personnel and houses"; manual judgment is required, with an expert re-assessing the class from experience so as to reduce misjudgment. In addition, the checking tasks can be pushed to police officers.
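The threshold rule can be sketched as follows. The representation is an assumption made for illustration: `labels` holds one cluster label per sample with -1 for noise, `is_labeled` marks the samples already tagged as key records, and `threshold` stands for the preset value T. None of these names come from the patent.

```python
from collections import defaultdict

def build_check_list(labels, is_labeled, threshold):
    """Indices of unlabeled samples in clusters holding >= threshold labeled samples.

    labels     : cluster label per sample (-1 = noise, ignored)
    is_labeled : True where the sample is already tagged as a key record
    threshold  : the preset value T
    """
    labeled_count = defaultdict(int)
    for lab, tagged in zip(labels, is_labeled):
        if lab != -1 and tagged:
            labeled_count[lab] += 1          # this is N for cluster `lab`
    return [
        i
        for i, (lab, tagged) in enumerate(zip(labels, is_labeled))
        if lab != -1 and not tagged and labeled_count[lab] >= threshold
    ]

# Hypothetical clustering result: cluster 0 holds two labeled records, cluster 1 one.
labels = [0, 0, 0, 1, 1, -1]
is_labeled = [True, True, False, True, False, False]
check_list = build_check_list(labels, is_labeled, threshold=2)
print(check_list)
```

Clusters that fall below the threshold are simply left out of the list here; in the described method they would be passed on for manual judgment.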
The method thus takes into account the complexity of persons and houses and the shortage of police officers in public security population management. Through data screening and cluster analysis, and with full regard to the characteristics of key personnel and houses, suitable clustering features are selected so that key personnel and houses can be identified accurately. A check list of suspected missed registrations is also generated, which makes it convenient for the public security authorities to push checking tasks. The density clustering algorithm narrows the range to be checked for unregistered persons and houses of key attention, ultimately improves the accuracy of population and house checks, and provides decision support for other areas of public security management and control.
The working principle of the invention is as follows:
Table 1 shows the population (and house) data before preprocessing:
No.   Age   Sex   Education level   Latitude    Longitude
1     73    2     Null              31.323771   120.666739
2     53    1     Null              31.315803   120.665558
3     46    2     Null              31.317036   120.747582
4     29    2     70                34.646452   116.912783
5     21    1     40                32.066899   118.193343
6     46    1     70                27.221181   111.248449
7     44    1     20                31.319655   120.731328
8     62    2     Null              31.320779   120.665973
9     31    1     60                35.828924   116.013732
10    32    1     60                34.357221   115.363676
Where:
Sex: 1 = male, 2 = female. Education level: 20 = bachelor's degree, 40 = secondary vocational school, 60 = senior high school, 70 = junior high school, Null = missing.
After processing by the method of the invention, Table 2 is obtained:
[Table 2: preprocessed data, reproduced only as images (BDA0002051064700000091, BDA0002051064700000101) in the original publication]
Taking record No. 1 as an example, the data preprocessing proceeds as follows:
Categorical variable discretization: the sex feature has 2 qualitative values, male and female, so it is expanded into 2 features. The sex value of record No. 1 is the 2nd qualitative value, so the 2nd expanded feature is assigned 1 and the other expanded feature is assigned 0; after discretization the sex feature is represented as (0, 1).
Numerical variable standardization: first the mean and standard deviation of the age feature are computed, giving 43.7 and 15.23187447 respectively; the standardized age is then obtained from the standardization formula:
(73 - 43.7) / 15.23187447 = 1.923598.
The standardized longitude and latitude values are obtained in the same way.
Missing-value filling: the education level contains the missing value Null; it is filled with the mode of the existing categories, 60, and then discretized in the same way as the sex feature.
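The three preprocessing operations can be reproduced on the Table 1 columns with a short standard-library sketch. The helper names are hypothetical. Two points are assumptions rather than statements from the text: the standard deviation is taken as the population standard deviation (which reproduces the value 15.23187447 for the age column), and ties for the mode are broken toward the smaller code. In Table 1 the codes 60 and 70 each occur twice, and this tie-break yields the value 60 used above.

```python
import statistics
from collections import Counter

# Columns copied from Table 1 (None stands for Null).
ages = [73, 53, 46, 29, 21, 46, 44, 62, 31, 32]
sexes = [2, 1, 2, 2, 1, 1, 1, 2, 1, 1]
education = [None, None, None, 70, 40, 70, 20, None, 60, 60]

def one_hot(value, categories):
    """Expand one categorical value into len(categories) 0/1 features."""
    return tuple(1 if value == c else 0 for c in categories)

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)   # assumption: population form (divide by n)
    return [(v - mean) / std for v in values]

def fill_with_mode(values):
    """Replace None by the most frequent value; ties go to the smallest code (assumption)."""
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    mode = min(v for v, c in counts.items() if c == top)
    return [mode if v is None else v for v in values]

sex_no1 = one_hot(sexes[0], categories=(1, 2))   # record No. 1 -> (0, 1)
age_z = standardize(ages)                        # age_z[0] is about 1.9236
education_filled = fill_with_mode(education)     # Null entries become 60

print(sex_no1, round(age_z[0], 4), education_filled)
```

After filling, the education codes could be passed through `one_hot` with the categories (20, 40, 60, 70), in the same way as the sex feature.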
As can be seen from the above description and the accompanying drawings, the invention has the following advantages:
1. Persons and houses already marked for key attention serve as cores, and persons and houses similar to them are screened out by a density clustering algorithm; this narrows the range to be checked for suspected key personnel and houses and can effectively improve the accuracy and efficiency of police checking.
2. The adaptive feature weighting mechanism assigns different weights to different attributes, so the similarity between samples is reflected more accurately and clustering performance improves.
3. With an intelligently recommended checking mode, the objects to be checked can be predicted and assessed in advance, making the checking work more scientific and accurate and policing more proactive and secure.
4. It promotes the transformation and upgrading of the policing mode and has real practical significance for improving the public security management of houses.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and it should be noted that it is possible for those skilled in the art to make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as the protection scope of the present invention.

Claims (6)

1. The method for checking key personnel and house missing registration based on the DBSCAN clustering algorithm is characterized by comprising the following steps:
step one, preprocessing the population and house data sets collected by the police, wherein the preprocessing comprises missing-value filling, discretization of categorical variables and standardization of numerical variables;
step two, dividing the data labeled "key personnel and houses" into known "key personnel and houses" and unknown "key personnel and houses", fixing the known "key personnel and houses" data samples as core points of the density clustering, and separating out the non-core points;
step three, setting the neighborhood parameters (ε, MinPts), wherein ε is the neighborhood distance threshold of a sample and MinPts is the threshold on the number of samples within distance ε of that sample, classifying the non-core samples of the "key personnel and houses" data set with the DBSCAN clustering algorithm, and analyzing the clustering result,
in step three, the DBSCAN algorithm measures the similarity between samples by the Euclidean distance, the smaller the distance, the more similar the samples; the n samples are divided into K clusters containing $n_1, n_2, \ldots, n_K$ samples respectively, and the sum $d_p$ of the intra-class distances of all K clusters on the j-th feature dimension is
$d_p = \sum_{k=1}^{K} \sum_{i=1}^{n_k} \left(x_{ij} - m_{kj}\right)^2$
where $x_{ij}$ is the value of the j-th feature of the i-th sample and $m_{kj}$ is the mean of cluster k on the j-th feature; the sum $d_q$ of the inter-class distances of all K clusters on the j-th feature dimension is
$d_q = \sum_{k=1}^{K} \left(m_{kj} - m_j\right)^2$
where $m_j$ is the mean of the data set on the j-th feature; the contribution $c_j$ of feature j to the clustering is then calculated as
$c_j = \dfrac{d_q}{d_p}$
finally, the feature weight $w_j$ of the j-th feature is
$w_j = \dfrac{c_j}{\sum_{l=1}^{m} c_l}$
where m is the number of feature dimensions of a sample,
thereby giving the weighted Euclidean distance, and hence the similarity d(m, n) between two samples,
$d(m, n) = \sqrt{\sum_{j} w_j \left(x_{mj} - x_{nj}\right)^2}$
with the sum taken over all feature dimensions j;
step four, fixing all data labeled "key personnel and houses" as core points, and classifying the non-core samples of the population and house data sets by the DBSCAN clustering algorithm with adaptive feature weights to obtain a clustering result,
in step four, the core points are fixed using the Scikit-learn machine learning framework, and all core points are found according to the given neighborhood parameters;
in step four, the feature weights are optimized: the data labeled "key personnel and houses" are divided into known "key personnel and houses" and unknown "key personnel and houses", the known data samples are fixed as core points of the density clustering according to the core point fixing step, suitable neighborhood parameters are set, the non-core samples of the "key personnel and houses" data set are then classified with the DBSCAN clustering algorithm, the contribution of each attribute to the clustering is computed from the classification result, and the feature weights are updated;
step five, counting and evaluating the clustering result of step four, and finally generating a check list of suspected unregistered "key personnel and houses".
2. The method for checking key personnel and house missing registration based on the DBSCAN clustering algorithm according to claim 1, characterized in that: in step one, the data preprocessing operates on the population- and house-related data features in the public security population and house databases, and comprises one-hot encoding of the categorical features and dimensionless processing of the numerical feature variables, wherein missing values are filled with the mode for categorical features and with the mean for numerical feature variables.
3. The method for checking key personnel and house missing registration based on the DBSCAN clustering algorithm according to claim 1, characterized in that: the categorical features include gender and marital status, and the numerical feature variables include age and the longitude and latitude of the address.
4. The method for checking key personnel and house omission registration based on the DBSCAN clustering algorithm as recited in claim 1, wherein: in step one, the categorical variables are discretized: if a feature has N qualitative values, it is expanded into N features; when the original feature takes the i-th qualitative value, the i-th expanded feature is assigned 1 and the other expanded features are assigned 0; the standardization of the numerical variables requires calculating the mean $\bar{x}$ and the standard deviation $S$ of each feature dimension, and the calculation formula is
$$x' = \frac{x - \bar{x}}{S}$$
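The one-hot expansion and the standardization of claim 4 can be illustrated with a short sketch. The helper names `one_hot` and `standardize`, the category list, and the sample ages are hypothetical. The use of the population standard deviation is also an assumption, since the claim text does not say whether S is the population or the sample standard deviation.

```python
from statistics import mean, pstdev


def one_hot(value, categories):
    """Expand one categorical value into len(categories) binary features:
    the position of the matching category gets 1, all others get 0."""
    return [1 if value == c else 0 for c in categories]


def standardize(values):
    """Map each value x to (x - mean) / S for one feature dimension.
    Assumption: S is the population standard deviation."""
    mu = mean(values)
    s = pstdev(values)
    if s == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / s for v in values]


# Hypothetical category list and ages.
marital_categories = ["single", "married", "divorced"]
encoded = one_hot("married", marital_categories)
ages = [20, 30, 40]
scaled = standardize(ages)
```

After standardization each feature dimension has zero mean and unit standard deviation, so features such as age and longitude contribute on a comparable scale when distances are computed.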
5. The method for checking key personnel and house omission registration based on the DBSCAN clustering algorithm as recited in claim 1, wherein: in the second step, fixing samples as core points of the density clustering consists of assigning weights to the key personnel and house data samples when the distances are calculated, wherein a larger positive weight makes a sample more likely to become a core point and a smaller negative weight prevents a sample from becoming a core point.
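One way to read the weighting of claim 5 is sketched below. It assumes a rule that the claim itself does not spell out: a sample is a core point when the total weight of the samples inside its eps-neighborhood (itself included) reaches a minimum weight. This is the convention used by the `sample_weight` argument of scikit-learn's `DBSCAN.fit`, but whether the patent uses the same rule is an assumption. The function name, coordinates, and weights are hypothetical.

```python
import math


def core_points(points, weights, eps, min_weight):
    """Return the indices of samples whose eps-neighborhood (self included)
    has a total weight of at least min_weight."""
    cores = []
    for i, p in enumerate(points):
        total = sum(
            w for q, w in zip(points, weights) if math.dist(p, q) <= eps
        )
        if total >= min_weight:
            cores.append(i)
    return cores


# Hypothetical standardized coordinates.
points = [
    (0.0, 0.0),    # isolated known key sample
    (10.0, 10.0),  # isolated ordinary sample
    (20.0, 20.0),  # three samples close together
    (20.1, 20.0),
    (20.0, 20.1),
]

# A large positive weight fixes sample 0 as a core point on its own;
# the negative weight on sample 4 keeps its group below the threshold.
weighted = core_points(points, [3.0, 1.0, 1.0, 1.0, -1.0], eps=0.5, min_weight=3.0)

# With unit weights the same parameters give plain DBSCAN behavior.
unweighted = core_points(points, [1.0] * 5, eps=0.5, min_weight=3.0)
```

With the weights, only the isolated known sample qualifies as a core point. With unit weights, only the dense group of three does, which shows how the weights override density alone.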
6. The method for checking key personnel and house omission registration based on the DBSCAN clustering algorithm as recited in claim 1, wherein: in the fifth step, the number N of marked key personnel and houses in each class of the clustering result is counted, and it is judged whether N is greater than or equal to a preset threshold T; if N is greater than or equal to T, the unmarked personnel and houses in that class are highly likely to be unregistered key personnel and houses, and a checklist of suspected unregistered key personnel and houses is finally generated; otherwise, the class is less likely to contain unregistered key personnel and houses, and manual judgment is required.
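The threshold judgment of claim 6 can be sketched as follows. The sketch assumes that the candidates placed on the checklist are the unmarked samples of a qualifying class, since marked samples are already registered, and that noise samples (label -1, the usual DBSCAN convention) are skipped. The function name and the sample labels are hypothetical.

```python
from collections import Counter


def build_checklist(labels, is_marked, threshold):
    """Split unmarked samples into a checklist (class has N >= T marked
    members) and a manual-review list (class has N < T marked members)."""
    # N per class: number of marked key samples, ignoring noise (-1).
    marked_per_class = Counter(
        label for label, marked in zip(labels, is_marked)
        if marked and label != -1
    )
    checklist, manual_review = [], []
    for idx, (label, marked) in enumerate(zip(labels, is_marked)):
        if marked or label == -1:
            continue  # already registered, or not assigned to any class
        if marked_per_class[label] >= threshold:
            checklist.append(idx)
        else:
            manual_review.append(idx)
    return checklist, manual_review


# Hypothetical clustering output: class label per sample and marked flag.
labels = [0, 0, 0, 0, 1, 1, -1]
is_marked = [True, True, False, False, True, False, False]
checklist, manual_review = build_checklist(labels, is_marked, threshold=2)
```

In this example class 0 contains two marked samples, so its two unmarked members go on the checklist, while the unmarked member of class 1 is left for manual judgment.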
CN201910374115.XA 2019-05-07 2019-05-07 Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm Active CN110097126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910374115.XA CN110097126B (en) 2019-05-07 2019-05-07 Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910374115.XA CN110097126B (en) 2019-05-07 2019-05-07 Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm

Publications (2)

Publication Number Publication Date
CN110097126A CN110097126A (en) 2019-08-06
CN110097126B true CN110097126B (en) 2023-04-21

Family

ID=67447081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910374115.XA Active CN110097126B (en) 2019-05-07 2019-05-07 Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm

Country Status (1)

Country Link
CN (1) CN110097126B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110650145A (en) * 2019-09-26 2020-01-03 湖南大学 Low-rate denial of service attack detection method based on SA-DBSCAN algorithm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902655A (en) * 2014-02-28 2014-07-02 小米科技有限责任公司 Clustering method and device and terminal device
CN103902654A (en) * 2014-02-28 2014-07-02 小米科技有限责任公司 Clustering method and device and terminal device
CN106600059A (en) * 2016-12-13 2017-04-26 北京邮电大学 Intelligent power grid short-term load predication method based on improved RBF neural network
CN107993179A (en) * 2018-01-04 2018-05-04 南京市公安局栖霞分局 A kind of police service platform population house data examination register method
CN108280479A (en) * 2018-01-25 2018-07-13 重庆大学 A kind of power grid user sorting technique based on Load characteristics index weighted cluster algorithm
CN108875806A (en) * 2018-05-31 2018-11-23 中南林业科技大学 False forest fires hot spot method for digging based on space-time data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
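Title
"A weighted centroid localization algorithm based on DBSCAN clustering point density"; Li Yi et al.; Journal of Henan University of Science and Technology (Natural Science Edition); 2018-04-30; Vol. 39, No. 2; pp. 36-39 *
"A data stream clustering algorithm based on density and constraints"; Fu Jiaqi et al.; Technology Innovation and Application; 2018-12-31; No. 12; pp. 1-5 *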

Also Published As

Publication number Publication date
CN110097126A (en) 2019-08-06

Similar Documents

Publication Publication Date Title
CN106570178B (en) High-dimensional text data feature selection method based on graph clustering
Amini et al. On density-based data streams clustering algorithms: A survey
CN108733976B (en) Key protein identification method based on fusion biology and topological characteristics
CN104636449A (en) Distributed type big data system risk recognition method based on LSA-GCC
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
CN107291895B (en) Quick hierarchical document query method
CN111046930A (en) Power supply service satisfaction influence factor identification method based on decision tree algorithm
CN111326236A (en) Medical image automatic processing system
CN110990718A (en) Social network model building module of company image improving system
CN112926045A (en) Group control equipment identification method based on logistic regression model
CN112417152A (en) Topic detection method and device for case-related public sentiment
Wilkins et al. Comparison of five clustering algorithms to classify phytoplankton from flow cytometry data
CN110097126B (en) Method for checking important personnel and house missing registration based on DBSCAN clustering algorithm
CN113469288A (en) High-risk personnel early warning method integrating multiple machine learning algorithms
CN111325235B (en) Multilingual-oriented universal place name semantic similarity calculation method and application thereof
CN112732690B (en) Stabilizing system and method for chronic disease detection and risk assessment
CN116013084A (en) Traffic management and control scene determining method and device, electronic equipment and storage medium
Wang et al. A Novel Multi‐Input AlexNet Prediction Model for Oil and Gas Production
Mishra et al. Improving the efficacy of clustering by using far enhanced clustering algorithm
Sharma et al. Analysis of clustering algorithms in biological networks
CN112506930B (en) Data insight system based on machine learning technology
CN114943290A (en) Biological invasion identification method based on multi-source data fusion analysis
Liu et al. A Clustering Algorithm via Density Perception and Hierarchical Aggregation Based on Urban Multimodal Big Data for Identifying and Analyzing Categories of Poverty‐Stricken Households in China
CN116244426A (en) Geographic function area identification method, device, equipment and storage medium
CN117952658B (en) Urban resource allocation and industry characteristic analysis method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant