CN114398934A - High-risk area identification method based on clustering algorithm - Google Patents

Publication number
CN114398934A
Authority
CN
China
Prior art keywords
risk
address
data set
feature vector
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111229509.XA
Other languages
Chinese (zh)
Inventor
程涛
廖毅
李英
罗龑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinaccs Information Industry Co ltd
Original Assignee
Chinaccs Information Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-10-21
Filing date
2021-10-21
Publication date
2022-04-26
Application filed by Chinaccs Information Industry Co ltd
Priority to CN202111229509.XA
Publication of CN114398934A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a high-risk area identification method based on a clustering algorithm and belongs to the technical field of high-risk area identification. The technical scheme is as follows: the method comprises interfacing with the information systems of relevant departments, obtaining historical case data, and generating a training data set; extracting address information and high-risk features and generating a high-risk feature vector set; applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library; and extracting residence information from the identity information of a target user and judging whether the target user comes from an area with high-risk features. The beneficial effects of the invention are as follows: the historical case data of relevant departments are processed, and regions and high-risk features are clustered by means of automatic feature extraction and an unsupervised clustering machine learning algorithm, so that high-risk areas are identified automatically.

Description

High-risk area identification method based on clustering algorithm
Technical Field
The invention relates to the technical field of high-risk area identification, in particular to a high-risk area identification method based on a clustering algorithm.
Background
A high-risk area is defined as follows: when persons with certain high-risk features (which should be defined according to the identification requirements and relevant regulations) frequently appear within a certain address or area, that address or area can be designated a high-risk area for those features. In the daily management work of relevant departments, when a person's place of origin or residence has the features of a certain high-risk area, the countermeasures for that high-risk area are applied so that prevention and control can focus on that person.
At present, high-risk areas are mainly identified in the following two ways:
Experience: identification relies on business experience accumulated over long-term work, and errors and omissions are relatively likely.
Rule engine: once experience is digitized, it can be converted into rules, and a rule engine performs the matching automatically. A rule engine is easy to compute and efficient; however, the rules must still be maintained manually, and if they are not updated in time they cannot reflect changes in objective conditions.
Disclosure of Invention
In view of the above problems in the prior art, the object of the present invention is to provide a high-risk area identification method based on a clustering algorithm, which generates a training data set by processing historical information from relevant systems and clusters regions and high-risk features by means of automatic feature extraction and an unsupervised clustering machine learning algorithm, thereby identifying high-risk areas automatically.
The invention is realized by the following technical scheme: a high-risk area identification method based on a clustering algorithm comprises the following steps:
interfacing with the information systems of relevant departments, obtaining historical case data, and generating from the case data an associated data set of case information, address information and high-risk features, which serves as the training data set; the case text is converted into features by Chinese word segmentation, and each case is characterized by high-risk feature words; the wording of the high-risk feature words follows the conventions of the relevant laws and regulations, such as theft and robbery, and the address information corresponds to the residential address and household registration address of the persons involved in the case;
extracting the address information in the training data set, encoding it, generating an address vector for each address, and finally forming an address vector set;
merging the address vectors in the address vector set whose similarity exceeds a set threshold;
extracting the high-risk features in the training data set and encoding them to form a high-risk feature vector set; the high-risk features in all samples are extracted and the text is indexed to form the final high-risk feature codes, for example theft -> 1 and robbery -> 2;
associating the address vector set with the high-risk feature vector set to obtain a high-risk area feature vector set; for example, {xx province, xx city, xx county, theft} may be converted into the high-risk area feature vector {1, 2, 5, 6, 9};
applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library;
extracting residence information from the identity data of a target user and encoding it to generate an address code to be identified;
and matching the address code to be identified against the model library, and judging from the model prediction whether the target user comes from an area with high-risk features.
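The feature-indexing and association steps can be illustrated with a minimal Python sketch. The record layout, the `build_index` helper and the sample place names are assumptions made for illustration. The source does not specify the exact encoding (its own example maps four fields to a five-element vector), so the codes produced here will not match that example.

```python
def build_index(tokens, start=1):
    """Assign each distinct token an integer id in first-seen order."""
    index = {}
    for token in tokens:
        if token not in index:
            index[token] = start + len(index)
    return index

# Hypothetical records: (province, city, county, high-risk feature word).
records = [
    ("Province A", "City B", "County C", "theft"),
    ("Province A", "City B", "County D", "robbery"),
    ("Province A", "City B", "County C", "theft"),
]

# Index the feature words (theft -> 1, robbery -> 2) and the address parts.
feature_index = build_index(r[3] for r in records)
address_index = build_index(part for r in records for part in r[:3])

def region_feature_vector(record):
    """Concatenate the address codes with the high-risk feature code."""
    *address, feature = record
    return [address_index[p] for p in address] + [feature_index[feature]]

vectors = [region_feature_vector(r) for r in records]
```

Separate indices for address parts and feature words are used here only to keep the example readable; a shared index would work equally well under the same assumptions.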
Further, an update period is set; newly added case data are acquired periodically and an incremental data set in the same format as the training data set is generated; the address vector set and high-risk feature vector set corresponding to the incremental data set are extracted and associated and then added to the current high-risk area feature vector set; the clustering model is trained again and the model library is updated.
Further, encoding the address specifically comprises: first segmenting the address into words using the national standard geographic information base, and then assigning each word a numeric index, thereby vectorizing the address. To improve generalization, addresses are resolved only to the city or county level.
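As a rough illustration of this encoding, the Python sketch below segments an address by greedy longest match against a small lookup table and truncates it at a chosen administrative level. The `GAZETTEER` table, its ids and the `encode_address` helper are hypothetical stand-ins for the national standard geographic information base, whose actual structure the source does not describe.

```python
# Hypothetical gazetteer: place names and ids are invented for illustration.
GAZETTEER = {
    "Province A": 1,
    "City B": 2,
    "County C": 3,
    "County D": 4,
}

def encode_address(address, gazetteer=GAZETTEER, max_levels=3):
    """Segment by greedy longest match and keep at most max_levels units."""
    names = sorted(gazetteer, key=len, reverse=True)
    codes, pos = [], 0
    while pos < len(address) and len(codes) < max_levels:
        match = next((n for n in names if address.startswith(n, pos)), None)
        if match is None:
            pos += 1  # skip characters that are not in the gazetteer
            continue
        codes.append(gazetteer[match])
        pos += len(match)
    return codes

county_level = encode_address("Province A City B County C Road 12 No. 5")
city_level = encode_address("Province A City B County C Road 12 No. 5", max_levels=2)
```

Stopping after `max_levels` units is one way to drop street-level detail so that addresses in the same county or city share a code.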
Furthermore, the similarity of address vectors is calculated with the Euclidean distance, and address vectors whose similarity exceeds the threshold are merged over multiple rounds of iteration.
The similarity of the address vectors is calculated as follows: the distance $\rho(A, B)$ between $A = (a[1], a[2], \dots, a[n])$ and $B = (b[1], b[2], \dots, b[n])$ is defined by the following formula:
$$\rho(A, B) = \sqrt{\sum_{i=1}^{n} \left(a[i] - b[i]\right)^2}$$
where a smaller value of $\rho(A, B)$ indicates a higher degree of similarity between the two address vectors A and B.
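The distance defined above can be computed with a few lines of Python. The function name `euclidean_distance` is chosen for illustration, and the equal-length check is an added safeguard that the source does not mention.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length address vectors."""
    if len(a) != len(b):
        raise ValueError("address vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

same = euclidean_distance([1, 2, 3], [1, 2, 3])  # identical vectors
apart = euclidean_distance([0, 0], [3, 4])       # a 3-4-5 right triangle
```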
Further, the clustering model training specifically comprises: the clustering algorithm is a K-Means algorithm implemented on Spark; a K value is calculated; the calculated K value and the feature vectors are input; and the results are stored in the model library.
Further, the clustering (K-Means) algorithm process is as follows:
1. Given an initial data set $D = \{x_1, x_2, \dots, x_m\}$, K-Means divides the data into K clusters, each cluster representing a different category.
2. From the training set $D$, K samples are randomly selected as the initial centroids $\{\mu_1, \mu_2, \dots, \mu_K\}$, and the clusters are initialized as $C_j = \varnothing$, $j = 1, 2, \dots, K$.
3. For each sample $x_i$, the distance $d_{ij}$ to each centroid vector $\mu_j$ is calculated, the cluster $C_m$ for which $d_{ij}$ is smallest is taken as the class of $x_i$, and the cluster is updated as $C_m = C_m \cup \{x_i\}$:
$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2$$
4. The centroid of each cluster $C_j$ is recalculated:
$$\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$
5. Steps 3 and 4 are repeated until the K centroid vectors no longer change or the maximum number of iterations is reached.
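The five steps above can be sketched in plain Python as follows. This is a single-machine illustration only, not the Spark-based implementation the source refers to. The `kmeans` function, its `seed` and `max_iter` parameters, the handling of empty clusters and the sample points are all assumptions.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # step 2
    labels = []
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if not members:  # assumption: an empty cluster keeps its centroid
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)))
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical two-dimensional data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Comparing plain or squared distances gives the same nearest centroid, so `math.dist` is used for the assignment step.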
When the K-Means clustering algorithm is used, the K value must be obtained manually or by calculation, and the accuracy of the K value directly affects the final clustering result. In general, the K value is chosen by combining manual estimation with calculation: the K value is first estimated manually and then verified with the Elbow method. The Elbow method calculates the value of the loss function for different K values; the K value at which the rate of change of the loss function changes sharply is a suitable K value. After the K value is calculated, the K-Means algorithm is implemented on Spark, the calculated K value and the feature vectors are input, and the results are stored in the result library.
During clustering, if the clustering result is poor, the K value must be adjusted and the feature encoding algorithm modified.
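The Elbow check described above can be sketched in Python as follows. The loss is taken to be the within-cluster sum of squared errors, and the bend is measured by the second difference of the loss curve; both are common choices but are assumptions here, since the source names neither. The helper names `kmeans_sse` and `elbow_k` and the sample data are hypothetical.

```python
import math
import random

def kmeans_sse(points, k, max_iter=50, seed=0):
    """Run K-Means and return the within-cluster sum of squared errors."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[j]
                   for j, m in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

def elbow_k(points, k_values):
    """Return the K at the sharpest bend of the loss curve, plus the losses."""
    losses = [kmeans_sse(points, k) for k in k_values]
    # Second difference: large where the loss stops dropping quickly.
    bends = [losses[i - 1] - 2 * losses[i] + losses[i + 1]
             for i in range(1, len(losses) - 1)]
    return k_values[1 + bends.index(max(bends))], losses

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best_k, losses = elbow_k(points, [1, 2, 3, 4])
```

In practice the loss curve would usually also be inspected by eye, which matches the manual estimate followed by verification that the text describes.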
A high-risk area identification system based on a clustering algorithm comprises:
a first acquisition unit, used for interfacing with the information systems of relevant departments, acquiring historical case data, and generating from the case data an associated data set comprising case information, address information and high-risk features, which serves as a training data set;
an address vector generating unit, used for extracting the address information in the training data set, encoding the addresses, generating an address vector for each address, and finally forming an address vector set;
an address vector merging unit, used for merging the address vectors in the address vector set whose similarity exceeds a set threshold;
a second acquisition unit, used for interfacing with the information systems of relevant departments and acquiring an incremental data set by means of real-time stream processing, wherein the incremental data set is new data continuously generated over time;
a high-risk feature vector generating unit, used for extracting the high-risk features in the training data set and the incremental data set and encoding them to form a high-risk feature vector set;
a vector merging unit, used for associating the address vector set with the high-risk feature vector set to obtain a high-risk area feature vector set;
a model library generation unit, used for applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library;
an identification unit, used for extracting the residence information of a target user and encoding it to generate an address code to be identified;
and a model prediction unit, used for matching the address code to be identified against the model library and predicting whether the target user comes from an area with high-risk features.
Further, the system further comprises an updating unit, configured to update the model library, specifically: setting an updating period, periodically acquiring newly added case data, generating an incremental data set with the same format as the training data set, extracting and associating an address vector set and a high-risk feature vector set corresponding to the incremental data set, updating to the current high-risk area feature vector set, performing clustering model training again, and updating the model library.
The beneficial effects of the invention are as follows: the method adopts unsupervised learning, does not require a large amount of labeled data, and has a low training cost; it also uses Spark distributed computing for training, so training is fast, larger data sets can be used, and the model can be verified quickly. Once the model is trained, it can support the daily administrative and social management work of relevant departments: the high-risk area features of a target person's residence and household registration location can be judged quickly without relying on experience, which enables rapid response, improves the work efficiency of relevant departments, and reduces the cost of analysis and judgment.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of a clustering process;
FIG. 3 is a table of associated data sets.
Detailed Description
In order to clearly illustrate the technical features of the present solution, the present solution is explained below by way of specific embodiments.
In the first embodiment, referring to FIGS. 1-3, the invention is realized by the following technical scheme: a high-risk area identification method based on a clustering algorithm comprises the following steps:
interfacing with the information systems of relevant departments, obtaining historical case data, and generating from the case data an associated data set comprising case information, address information and high-risk features, which serves as the training data set; the case text is converted into features by Chinese word segmentation, and each case is characterized by high-risk feature words; the wording of the high-risk feature words follows the conventions of the relevant laws and regulations, such as theft and robbery, and the address information corresponds to the residential address and household registration address of the persons involved in the case;
extracting the address information in the training data set, encoding it, generating an address vector for each address, and finally forming an address vector set;
merging the address vectors in the address vector set whose similarity exceeds a set threshold;
extracting the high-risk features in the training data set and encoding them to form a high-risk feature vector set; the high-risk features in all samples are extracted and the text is indexed to form the final high-risk feature codes, for example theft -> 1 and robbery -> 2;
associating the address vector set with the high-risk feature vector set to obtain a high-risk area feature vector set; for example, {xx province, xx city, xx county, theft} may be converted into the high-risk area feature vector {1, 2, 5, 6, 9};
applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library;
extracting residence information from the identity data of a target user and encoding it to generate an address code to be identified;
and matching the address code to be identified against the model library, and judging from the model prediction whether the target user comes from an area with high-risk features.
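The final matching step might look like the Python sketch below. The source does not describe the layout of the model library or the matching rule, so everything here is an assumption: the library is taken to hold one address centroid and one dominant high-risk feature per cluster, and an address counts as matched only if it lies within `max_distance` of the nearest centroid.

```python
import math

# Hypothetical model library: one entry per cluster.
MODEL_LIBRARY = [
    {"address_centroid": (1.0, 2.0, 3.0), "feature": "theft"},
    {"address_centroid": (1.0, 2.0, 4.0), "feature": "robbery"},
]

def predict_risk(address_code, library=MODEL_LIBRARY, max_distance=0.5):
    """Return the high-risk feature of the nearest cluster, or None."""
    nearest = min(
        library,
        key=lambda entry: math.dist(address_code, entry["address_centroid"]),
    )
    if math.dist(address_code, nearest["address_centroid"]) <= max_distance:
        return nearest["feature"]
    return None  # no cluster is close enough to count as a match

known = predict_risk((1, 2, 3))
unknown = predict_risk((7, 8, 9))
```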
An update period is set; newly added case data are acquired periodically and an incremental data set in the same format as the training data set is generated; the address vector set and high-risk feature vector set corresponding to the incremental data set are extracted and associated and then added to the current high-risk area feature vector set; the clustering model is trained again and the model library is updated.
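A minimal Python sketch of such a periodic update is given below. The `state` dictionary, the `update_if_due` helper, the seven-day default period and the `train` callable (a placeholder for the clustering step) are all assumptions; the source fixes neither the period nor the storage format.

```python
from datetime import datetime, timedelta

def update_if_due(state, new_vectors, train, now, period=timedelta(days=7)):
    """Retrain when the update period has elapsed; otherwise return state."""
    if now - state["last_update"] < period:
        return state
    # Add the incremental vectors to the current set, then retrain on all of it.
    vectors = state["vectors"] + list(new_vectors)
    return {"vectors": vectors, "model": train(vectors), "last_update": now}

state = {
    "vectors": [(1, 2, 3, 1)],
    "model": None,
    "last_update": datetime(2021, 10, 1),
}
# `len` stands in for a real training function in this illustration.
early = update_if_due(state, [(1, 2, 4, 2)], len, datetime(2021, 10, 3))
later = update_if_due(state, [(1, 2, 4, 2)], len, datetime(2021, 10, 9))
```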
Encoding the address specifically comprises: first segmenting the address into words using the national standard geographic information base, and then assigning each word a numeric index, thereby vectorizing the address. To improve generalization, addresses are resolved only to the city or county level.
The similarity of address vectors is calculated with the Euclidean distance, and address vectors whose similarity exceeds the threshold are merged over multiple rounds of iteration.
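One possible reading of the multi-round merge is sketched below in Python. The `merge_similar` helper, the choice to keep the first vector of a merged pair as the representative, and the `max_distance` parameter (a small distance standing for a similarity above the threshold) are assumptions; the source does not describe how merged vectors are represented.

```python
import math

def merge_similar(vectors, max_distance):
    """Merge pairs closer than max_distance until no such pair remains."""
    reps = [tuple(v) for v in vectors]
    merged = True
    while merged:  # each pass of this loop is one round of iteration
        merged = False
        for i in range(len(reps)):
            for j in range(i + 1, len(reps)):
                if math.dist(reps[i], reps[j]) <= max_distance:
                    del reps[j]  # fold vector j into representative i
                    merged = True
                    break
            if merged:
                break
    return reps

representatives = merge_similar([[0, 0], [0, 1], [5, 5]], max_distance=1.0)
```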
The similarity of the address vectors is calculated as follows: the distance $\rho(A, B)$ between $A = (a[1], a[2], \dots, a[n])$ and $B = (b[1], b[2], \dots, b[n])$ is defined by the following formula:
$$\rho(A, B) = \sqrt{\sum_{i=1}^{n} \left(a[i] - b[i]\right)^2}$$
where a smaller value of $\rho(A, B)$ indicates a higher degree of similarity between the two address vectors A and B.
The clustering model training specifically comprises the following: the clustering algorithm is a K-Means algorithm implemented on Spark; a K value is calculated; the calculated K value and the feature vectors are input; and the results are stored in the model library.
The clustering (K-Means) algorithm procedure is as follows:
1. Given an initial data set $D = \{x_1, x_2, \dots, x_m\}$, K-Means divides the data into K clusters, each cluster representing a different category.
2. From the training set $D$, K samples are randomly selected as the initial centroids $\{\mu_1, \mu_2, \dots, \mu_K\}$, and the clusters are initialized as $C_j = \varnothing$, $j = 1, 2, \dots, K$.
3. For each sample $x_i$, the distance $d_{ij}$ to each centroid vector $\mu_j$ is calculated, the cluster $C_m$ for which $d_{ij}$ is smallest is taken as the class of $x_i$, and the cluster is updated as $C_m = C_m \cup \{x_i\}$:
$$d_{ij} = \lVert x_i - \mu_j \rVert_2^2$$
4. The centroid of each cluster $C_j$ is recalculated:
$$\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$$
5. Steps 3 and 4 are repeated until the K centroid vectors no longer change or the maximum number of iterations is reached.
When the K-Means clustering algorithm is used, the K value must be obtained manually or by calculation, and the accuracy of the K value directly affects the final clustering result. In general, the K value is chosen by combining manual estimation with calculation: the K value is first estimated manually and then verified with the Elbow method. The Elbow method calculates the value of the loss function for different K values; the K value at which the rate of change of the loss function changes sharply is a suitable K value. After the K value is calculated, the K-Means algorithm is implemented on Spark, the calculated K value and the feature vectors are input, and the results are stored in the result library.
During clustering, if the clustering result is poor, the K value must be adjusted and the feature encoding algorithm modified.
In a second embodiment, a high-risk area identification system based on a clustering algorithm comprises:
a first acquisition unit, used for interfacing with the information systems of relevant departments, acquiring historical case data, and generating from the case data an associated data set comprising case information, address information and high-risk features, which serves as a training data set;
an address vector generating unit, used for extracting the address information in the training data set, encoding the addresses, generating an address vector for each address, and finally forming an address vector set;
an address vector merging unit, used for merging the address vectors in the address vector set whose similarity exceeds a set threshold;
a second acquisition unit, used for interfacing with the information systems of relevant departments and acquiring an incremental data set by means of real-time stream processing, wherein the incremental data set is new data continuously generated over time;
a high-risk feature vector generating unit, used for extracting the high-risk features in the training data set and the incremental data set and encoding them to form a high-risk feature vector set;
a vector merging unit, used for associating the address vector set with the high-risk feature vector set to obtain a high-risk area feature vector set;
a model library generation unit, used for applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library;
an identification unit, used for extracting the residence information of a target user and encoding it to generate an address code to be identified;
and a model prediction unit, used for matching the address code to be identified against the model library and predicting whether the target user comes from an area with high-risk features.
The system further comprises an updating unit for updating the model base, specifically: setting an updating period, periodically acquiring newly added case data, generating an incremental data set with the same format as the training data set, extracting and associating an address vector set and a high-risk feature vector set corresponding to the incremental data set, updating to the current high-risk area feature vector set, performing clustering model training again, and updating the model library.
In the description of the present invention, the foregoing detailed description has set forth various embodiments of the devices and/or processes via the use of block diagrams, flowcharts, and/or examples. To the extent that such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, those skilled in the art will appreciate that each function and/or operation within such block diagrams, flowcharts, or examples can be implemented, individually and/or collectively, by a wide range of different hardware, software, firmware, or virtually any combination thereof.
There is little difference between hardware and software implementations of aspects of the system; the use of hardware or software is typically (but not always, since in some scenarios the choice between hardware and software may become important) a design choice representing a cost versus efficiency tradeoff. There are various means (e.g., hardware, software, and/or firmware) by which the processes and/or systems and/or other techniques described herein can be implemented, and the preferred means will vary with the scenario in which they are deployed. For example, if the implementer determines that speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware approach; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
Technical features of the present invention that are not described in the above embodiments may be implemented by or using the prior art and are not described again here. The above description is not intended to limit the present invention, and the present invention is not limited to the above examples; variations, modifications, additions or substitutions made by those skilled in the art within the spirit and scope of the present invention also fall within its protection scope.

Claims (8)

1. A high-risk area identification method based on a clustering algorithm is characterized by comprising the following steps:
interfacing with the information systems of relevant departments, obtaining historical case data, and generating from the case data an associated data set comprising case information, address information and high-risk features, which serves as a training data set;
extracting address information in the training data set, coding the address information, generating an address vector corresponding to each address, and finally forming an address vector set;
merging the address vectors with similarity exceeding a set threshold in the address vector set;
extracting high-risk features in the training data set, and coding the high-risk features to form a high-risk feature vector set;
associating the address vector set with the high-risk feature vector set to obtain a high-risk region feature vector set;
calculating the feature vector set of the high-risk region by using a clustering algorithm, performing clustering model training, and generating a model library;
extracting residence information from the identity data of a target user and encoding it to generate an address code to be identified;
and matching the address code to be identified against the model library, and judging from the model prediction whether the target user comes from an area with high-risk features.
2. The method for identifying high-risk regions based on clustering algorithm as claimed in claim 1, wherein an update cycle is set, newly added case data is periodically obtained, an incremental data set with the same format as the training data set is generated, an address vector set and a high-risk feature vector set corresponding to the incremental data set are extracted and associated, and updated to the current high-risk region feature vector set, clustering model training is performed again, and the model base is updated.
3. The high-risk region identification method based on clustering algorithm according to claim 1, wherein the encoding of the address specifically comprises: firstly, the national standard geographic information base is adopted for word segmentation, and each word is subjected to digital indexing, so that address vectorization is realized.
4. The method for identifying high-risk regions based on clustering algorithm as claimed in claim 3, wherein the similarity is calculated for the address vectors by Euclidean distance algorithm, and the address combinations with similarity greater than a threshold are combined by multiple iterations.
5. The high-risk region identification method based on the clustering algorithm as claimed in claim 4, wherein the similarity of the address vectors is calculated as follows: the distance $\rho(A, B)$ between $A = (a[1], a[2], \dots, a[n])$ and $B = (b[1], b[2], \dots, b[n])$ is defined by the following formula:
$$\rho(A, B) = \sqrt{\sum_{i=1}^{n} \left(a[i] - b[i]\right)^2}$$
where a smaller value of $\rho(A, B)$ indicates a higher degree of similarity between the two address vectors A and B.
6. The high-risk region identification method based on the clustering algorithm as claimed in claim 4, wherein the clustering model training is specifically: the clustering algorithm is a K-means algorithm realized based on Spark; calculating a K value; inputting the calculated K value and the feature vector; the calculated results are stored in a model library.
7. A high-risk area identification system based on a clustering algorithm, characterized by comprising:
a first acquisition unit, used for interfacing with the information systems of relevant departments, acquiring historical case data, and generating from the case data an associated data set comprising case information, address information and high-risk features, which serves as a training data set;
an address vector generating unit, used for extracting the address information in the training data set, encoding the addresses, generating an address vector for each address, and finally forming an address vector set;
an address vector merging unit, used for merging the address vectors in the address vector set whose similarity exceeds a set threshold;
a second acquisition unit, used for interfacing with the information systems of relevant departments and acquiring an incremental data set by means of real-time stream processing, wherein the incremental data set is new data continuously generated over time;
a high-risk feature vector generating unit, used for extracting the high-risk features in the training data set and the incremental data set and encoding them to form a high-risk feature vector set;
a vector merging unit, used for associating the address vector set with the high-risk feature vector set to obtain a high-risk area feature vector set;
a model library generation unit, used for applying a clustering algorithm to the high-risk area feature vector set, training a clustering model, and generating a model library;
an identification unit, used for extracting the residence information of a target user and encoding it to generate an address code to be identified;
and a model prediction unit, used for matching the address code to be identified against the model library and predicting whether the target user comes from an area with high-risk features.
8. The high-risk region identification system based on clustering algorithm according to claim 7, further comprising an updating unit for updating the model base, specifically: setting an updating period, periodically acquiring newly added case data, generating an incremental data set with the same format as the training data set, extracting and associating an address vector set and a high-risk feature vector set corresponding to the incremental data set, updating to the current high-risk area feature vector set, performing clustering model training again, and updating the model library.
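The flow of units recited in claims 7 and 8 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims do not specify the address encoding, the similarity measure, the threshold value, the clustering algorithm, or how a cluster is labelled high-risk, so the following are assumptions made only for illustration: character-bigram counts as the address code, cosine similarity with a 0.8 threshold, plain k-means with k = 2, and the rule that the cluster with the largest centroid sum is the high-risk one. All function names and the sample addresses are hypothetical.

```python
import math
from collections import Counter

def encode_address(address, n=2):
    """Hypothetical address code: counts of character n-grams."""
    text = address.lower().replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_area(vec, areas, threshold):
    """Most similar known area whose similarity exceeds the threshold, else None."""
    best, best_sim = None, threshold
    for name, area_vec in areas.items():
        sim = cosine(vec, area_vec)
        if sim > best_sim:
            best, best_sim = name, sim
    return best

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start (the first k points).
    Assumes there are at least k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def build_model_library(cases, threshold=0.8, k=2):
    """cases: list of (address, high-risk feature vector) pairs."""
    areas, totals = {}, {}
    for address, features in cases:
        vec = encode_address(address)
        area = nearest_area(vec, areas, threshold)
        if area is None:  # no sufficiently similar address yet: start a new area
            area = address
            areas[area] = vec
            totals[area] = [0.0] * len(features)
        # merged addresses accumulate their high-risk features in one area vector
        totals[area] = [t + f for t, f in zip(totals[area], features)]
    names = list(totals)
    centroids, labels = kmeans([totals[name] for name in names], k)
    # Assumption: the cluster with the largest centroid sum is the high-risk one.
    risky = max(range(k), key=lambda c: sum(centroids[c]))
    return {"areas": areas, "cluster": dict(zip(names, labels)),
            "risky": risky, "threshold": threshold}

def predict(library, address):
    """True if the address matches a known area assigned to the high-risk cluster."""
    area = nearest_area(encode_address(address), library["areas"], library["threshold"])
    return area is not None and library["cluster"][area] == library["risky"]

cases = [
    ("12 Harbor Road, East District", [5, 3]),
    ("12 Harbor Rd, East District", [4, 2]),
    ("88 Garden Street, West District", [0, 1]),
    ("3 Lake Avenue, North District", [1, 0]),
]
library = build_model_library(cases)
print(predict(library, "12 Harbor Rd, East District"))
```

Under these assumptions, the periodic update of claim 8 amounts to calling `build_model_library` again on the original cases plus the newly collected incremental cases, which re-merges addresses and retrains the clusters from scratch. A production system would more likely update the existing vectors incrementally rather than rebuild them.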
CN202111229509.XA 2021-10-21 2021-10-21 High-risk area identification method based on clustering algorithm Pending CN114398934A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111229509.XA CN114398934A (en) 2021-10-21 2021-10-21 High-risk area identification method based on clustering algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111229509.XA CN114398934A (en) 2021-10-21 2021-10-21 High-risk area identification method based on clustering algorithm

Publications (1)

Publication Number Publication Date
CN114398934A true CN114398934A (en) 2022-04-26

Family

ID=81225266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111229509.XA Pending CN114398934A (en) 2021-10-21 2021-10-21 High-risk area identification method based on clustering algorithm

Country Status (1)

Country Link
CN (1) CN114398934A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170053A (en) * 2022-05-24 2022-10-11 中睿信数字技术有限公司 Event distribution processing system based on cluster fusion

Similar Documents

Publication Publication Date Title
CN108536851B (en) User identity recognition method based on moving track similarity comparison
CN111783875A (en) Abnormal user detection method, device, equipment and medium based on cluster analysis
CN104765768A (en) Mass face database rapid and accurate retrieval method
CN111159387B (en) Recommendation method based on multi-dimensional alarm information text similarity analysis
CN112818690B (en) Semantic recognition method and device combined with knowledge graph entity information and related equipment
CN111209457B (en) Target typical activity pattern deviation warning method
Nikfalazar et al. A new iterative fuzzy clustering algorithm for multiple imputation of missing data
CN112199670B (en) Log monitoring method for improving IFOREST to perform anomaly detection based on deep learning
CN110909106A (en) Trajectory prediction method and system
CN111931077B (en) Data processing method, device, electronic equipment and storage medium
CN106441316A (en) Single-point road network matching method based on historical data
CN114398934A (en) High-risk area identification method based on clustering algorithm
CN109445844A (en) Code Clones detection method based on cryptographic Hash, electronic equipment, storage medium
CN113839926A (en) Intrusion detection system modeling method, system and device based on gray wolf algorithm feature selection
CN109121133B (en) Location privacy protection method and device
CN109886206B (en) Three-dimensional object identification method and equipment
CN116561327B (en) Government affair data management method based on clustering algorithm
CN112100617B (en) Abnormal SQL detection method and device
CN115687429A (en) Social media user behavior pattern mining method
CN110807546A (en) Community grid population change early warning method and system
CN117093849A (en) Digital matrix feature analysis method based on automatic generation model
US11914956B1 (en) Unusual score generators for a neuro-linguistic behavioral recognition system
CN115048682B (en) Safe storage method for land circulation information
CN111144424A (en) Personnel feature detection and analysis method based on clustering algorithm
CN116451032A (en) AIS data restoration method based on DE-LSSVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination