CN112308117A - Homogeneous crowd identification method based on double-index particle swarm algorithm - Google Patents

Homogeneous crowd identification method based on double-index particle swarm algorithm

Info

Publication number
CN112308117A
Authority
CN
China
Prior art keywords
index
data set
adaptive
particle
particle swarm
Prior art date
Legal status
Pending
Application number
CN202011075681.XA
Other languages
Chinese (zh)
Inventor
胡晓敏
李瑞珠
李敏
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202011075681.XA
Publication of CN112308117A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a homogeneous crowd identification method based on a double-index particle swarm algorithm, addressing the defect that a single-index clustering algorithm cannot comprehensively analyze the users of a public health service platform. The method comprises the following steps: collecting information on the population using a public health service platform as a user information data set; obtaining two initial adaptive values of the user information data set through a clustering algorithm; and iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data. By evaluating adaptive values with two indexes, the method guides the optimization of particle swarm clustering on the basis of both index results. This eliminates the evaluation bias of a single index, widens the otherwise narrow applicability of a single internal index, saves labor and time, and allows complex and diverse crowd information to be analyzed comprehensively.

Description

Homogeneous crowd identification method based on double-index particle swarm algorithm
Technical Field
The invention relates to the field of swarm intelligence and evolutionary computation, in particular to a homogeneous population identification method based on a double-index particle swarm algorithm.
Background
At present, many clustering algorithms are applied at home and abroad to the identification of homogeneous populations, and each has drawbacks: the K-means algorithm is sensitive to the initial centers and to the value of K; grid-based methods are deficient in precision; the multiple regression approach is overly sensitive to the data; and density-based methods have low noise immunity and depend on the value of the neighborhood radius.
In addition, because every internal index is designed with a particular tendency, the expressive power of a single index is limited. Approaches that combine a genetic algorithm or a differential algorithm with a clustering algorithm likewise all use a single index as the adaptive value evaluation method for the optimization.
For workers engaged in public health services, it is extremely important to optimize the service platform according to the various groups of people who use it, so that the public health service meets people's needs to the greatest extent, and this optimization must continue for as long as the platform is in use. The user groups of a public health service platform are diverse and differ by age group, living environment, and so on. In the past, platform optimization mostly relied on learning about users through online questionnaire surveys and offline visits, or on analyzing user data with a single-index clustering algorithm. These approaches consume a great deal of manpower and time, and their analysis is not comprehensive enough given the complexity and diversity of crowd information.
Disclosure of Invention
Addressing the defect that a single-index clustering algorithm cannot comprehensively analyze the users of a public health service platform, the invention provides a homogeneous population identification method based on a double-index particle swarm algorithm. Clustering analysis of the homogeneous population is carried out using double-index adaptive value evaluation, which improves both the efficiency and the comprehensiveness of platform optimization.
The technical scheme adopted by the invention to solve the technical problem is as follows: a homogeneous population identification method based on a double-index particle swarm algorithm, comprising the following steps:
collecting information on the population using a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data.
Collecting the information on the population using the public health service platform as the user information data set comprises the following steps:
converting the information on the users of the public health service platform into numbers to obtain a data set;
converting the data set into a readable file format;
removing useless attribute data columns from the data set to obtain a processed data set;
and standardizing the processed data set to obtain the user information data set.
The readable file format comprises: csv format and/or bat format.
The information on the population using the public health service platform comprises: nationality, place of residence, and age information.
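The collection steps above can be sketched as follows. This is a minimal illustration, not the patent's program: the function names `encode_column` and `preprocess`, the field names in the usage note, the integer coding of categories, and the choice of z-score standardization are all assumptions, since the text only says that the data are "converted into numbers" and "standardized".

```python
from statistics import mean, pstdev


def encode_column(values):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def preprocess(records, useless=()):
    """Convert user records (a list of dicts) into a standardized numeric data set.

    Returns the kept attribute names and one row of z-scores per user.
    """
    # Remove the useless attribute columns.
    names = [name for name in records[0] if name not in useless]
    columns = []
    for name in names:
        raw = [record[name] for record in records]
        # Numeric attributes are kept; categorical ones are converted into numbers.
        if all(isinstance(v, (int, float)) for v in raw):
            numeric = [float(v) for v in raw]
        else:
            numeric = [float(code) for code in encode_column(raw)]
        # Assumed standardization: z-score per column (a constant column becomes zeros).
        mu, sigma = mean(numeric), pstdev(numeric)
        columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in numeric])
    # Transpose the columns back into one row per user.
    return names, [list(row) for row in zip(*columns)]
```

For example, calling `preprocess(users, useless=("user_id",))` on records with the hypothetical fields `user_id`, `nationality`, `residence`, and `age` drops the identifier column and returns three standardized attributes per user.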
The two "initial adaptive values" are: Fitness1(CH) and Fitness2(S_Dbw).
Fitness1(CH) is obtained as follows:
index formula:
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}$$
$$\operatorname{tr}(W) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2$$
represents the trace of the within-class dispersion matrix;
$$\operatorname{tr}(B) = \sum_{k=1}^{K} n_k \lVert m_k - m \rVert^2$$
represents the trace of the inter-class dispersion matrix, where m is the mean vector of the whole data set, m_k and n_k are the mean vector and the number of samples of class C_k, N is the number of samples, and K is the number of classes.
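As a hedged illustration of how Fitness1 could be evaluated, the sketch below computes the standard Calinski-Harabasz index from a data set and a list of class labels. The function names and the handling of degenerate cases are assumptions made for this example and are not taken from the patent.

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(data, labels):
    """CH = (tr(B) / (K - 1)) / (tr(W) / (N - K)); higher means better separation."""
    n = len(data)
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CH needs at least 2 classes and fewer classes than samples")
    m = centroid(data)  # mean vector of the whole data set
    tr_w = 0.0  # trace of the within-class dispersion matrix
    tr_b = 0.0  # trace of the inter-class dispersion matrix
    for members in clusters.values():
        m_k = centroid(members)
        tr_w += sum(sq_dist(x, m_k) for x in members)
        tr_b += len(members) * sq_dist(m_k, m)
    if tr_w == 0.0:
        return float("inf")  # every point coincides with its class centroid
    return (tr_b / (k - 1)) / (tr_w / (n - k))
```

On four points forming two tight pairs far apart, a labeling that follows the pairs scores much higher than one that mixes them, which is the behavior a maximized adaptive value needs.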
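Fitness2(S_Dbw) is obtained as follows:
index formula: $$S\_Dbw(c) = Scat(c) + Dens\_bw(c)$$
Wherein:
$$Dens\_bw(c) = \frac{1}{c(c-1)} \sum_{i=1}^{c} \sum_{j=1, j \neq i}^{c} \frac{density(u_{ij})}{\max\{density(v_i), density(v_j)\}}$$
Dens_bw(c) evaluates the relationship between the density of two classes taken together and the density of each individual class, where c is the number of classes, v_i is the center of class i, and u_ij is the midpoint of the line segment joining v_i and v_j;
$$density(u) = \sum_{l=1}^{n_{ij}} f(x_l, u), \qquad f(x, u) = \begin{cases} 0, & d(x, u) > stdev \\ 1, & \text{otherwise} \end{cases}$$
density(u) characterizes the number of points around u among the n_ij points of classes i and j, and the threshold used in the comparison is stdev;
$$stdev = \frac{1}{c} \sqrt{\sum_{i=1}^{c} \lVert \sigma(v_i) \rVert}$$
stdev represents the average deviation of the classes of a data set, where σ(v_i) is the variance vector of class i;
$$Scat(c) = \frac{1}{c} \sum_{i=1}^{c} \frac{\lVert \sigma(v_i) \rVert}{\lVert \sigma(S) \rVert}$$
Scat(c) represents the average dispersion of the classes, where σ(S) is the variance vector of the whole data set S.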
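The S_Dbw definitions above can be turned into a short computation. The sketch below follows the commonly published form of the index (Scat plus Dens_bw with stdev as the neighborhood threshold); the helper names are hypothetical, the pair terms whose denominator is zero are simply skipped, and the data set is assumed not to be constant in every attribute.

```python
import math


def _mean(points):
    """Mean vector of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def _variance_norm(points):
    """Euclidean norm of the per-attribute variance vector of the points."""
    center = _mean(points)
    var = [sum((p[d] - center[d]) ** 2 for p in points) / len(points)
           for d in range(len(center))]
    return math.sqrt(sum(v * v for v in var))


def _density(points, u, stdev):
    """Number of points whose distance to u does not exceed stdev."""
    return sum(1 for p in points if math.dist(p, u) <= stdev)


def s_dbw(data, labels):
    """S_Dbw(c) = Scat(c) + Dens_bw(c); lower values indicate better clustering."""
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    c = len(groups)
    if c < 2:
        raise ValueError("S_Dbw needs at least 2 classes")
    centers = [_mean(g) for g in groups]
    norms = [_variance_norm(g) for g in groups]
    scat = sum(norms) / (c * _variance_norm(data))
    stdev = math.sqrt(sum(norms)) / c
    total = 0.0
    for i in range(c):
        for j in range(c):
            if i == j:
                continue
            union = groups[i] + groups[j]
            midpoint = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
            denom = max(_density(union, centers[i], stdev),
                        _density(union, centers[j], stdev))
            if denom > 0:  # assumption: skip pairs with no points near either center
                total += _density(union, midpoint, stdev) / denom
    return scat + total / (c * (c - 1))
```

Because a smaller S_Dbw is better while the update rule compares for higher values, an implementation would presumably negate or otherwise transform this value before using it as Fitness2; that transformation is not specified in the text.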
The process of iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data is as follows:
a. initializing the parameters of the particle swarm algorithm;
b. calculating the adaptive values Fitness1 and Fitness2 of each particle according to the CH and S_Dbw index formulas, respectively;
c. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to its historical optimal position; if both adaptive values of the current position are higher, updating the historical optimal position with the current position, otherwise not updating;
d. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to the global optimal position; if both adaptive values of the current position are higher, updating the global optimal position with the current position, otherwise not updating;
e. updating the position and velocity of the particles:
particle velocity update formula:
$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1 \left(p_{id}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(g_{d}^{k} - x_{id}^{k}\right)$$
particle position update formula:
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$
wherein v_id^k represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; x_id^k represents the d-th dimensional component of the position vector of particle i at the k-th iteration; p_id^k and g_d^k represent the d-th dimensional components of the historical optimal position of particle i and of the global optimal position; c1 and c2 represent acceleration constants; r1 and r2 represent two random parameters with values in the range [0,1]; and w represents the inertia weight;
f. if the end condition is not met, returning to step b; if the end condition is met, ending the algorithm and obtaining the global optimal position, namely the global optimal solution;
and finally, outputting the clustering result to obtain accurate homogeneous crowd information data.
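Step e can be illustrated with a small function that applies the standard particle swarm velocity and position updates to one particle. The function name `update_particle` and the default values of w, c1, and c2 are assumptions chosen for the example; the text does not state the parameter values used.

```python
import random


def update_particle(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Return the new (position, velocity) of one particle without modifying the inputs.

    x, v      : current position and velocity vectors
    pbest     : historical optimal position of this particle
    gbest     : global optimal position of the swarm
    w, c1, c2 : inertia weight and acceleration constants (assumed defaults)
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # random parameters in [0, 1)
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])   # pull toward the particle's own best
              + c2 * r2 * (gbest[d] - x[d]))  # pull toward the swarm's best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

When a particle already sits on both its own best and the global best, the two attraction terms vanish and only the inertia term `w * v[d]` moves it, which is a quick sanity check for an implementation.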
The beneficial effects of the invention are as follows. Through double-index adaptive value evaluation, the method guides the optimization of particle swarm clustering on the basis of two index results, which eliminates the evaluation bias of a single index and widens the otherwise narrow applicability of a single internal index. Compared with a single index, double-index adaptive value evaluation retains the crowd data information more completely during clustering, so that the clustering result based on the particle swarm algorithm achieves, as far as possible, the best inter-class dispersion and intra-class compactness. When the crowd data has a large number of attributes, the two indexes together yield a more inclusive division of all samples into their most similar classes, so that labor and time are saved and complex and diverse crowd information can be analyzed comprehensively.
Drawings
FIG. 1 is a flowchart of user population information extraction according to the present invention.
FIG. 2 is a diagram of the encoding used by the algorithm of the present invention.
FIG. 3 is a flowchart of the double-index particle swarm algorithm of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, the homogeneous population identification method based on the double-index particle swarm algorithm of the invention comprises the following steps:
collecting information on the population using a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data.
The specific steps are as follows:
First, the information on the population using the public health service platform, such as nationality, place of residence, age, body mass index (BMI), and activities of daily living (ADL) score, is converted into numbers. The data set is then converted into a file format readable by the program of this patent, such as csv or bat, and read in together with all attribute values of the crowd information. The attributes are screened and useless attribute data columns are removed, leaving a data set of N samples with D attributes each. Finally, all data are standardized to obtain the user information data set. In the actual implementation, obj[i].dataX[j] denotes the j-th attribute of the i-th sample.
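To make the obj[i].dataX[j] notation concrete, the following sketch reads an already numeric csv text into a list of sample objects. Only the names `obj` and `dataX` come from the text; the `Sample` class, the `load_samples` function, and the column names in the usage note are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One user; dataX holds the D numeric attribute values."""
    dataX: list = field(default_factory=list)


def load_samples(csv_text):
    """Parse csv text with a header row into attribute names and a list of samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = list(reader.fieldnames)
    obj = [Sample([float(row[name]) for name in names]) for row in reader]
    return names, obj
```

With the text `"age,bmi,adl\n30,22.5,95\n64,27.1,80\n"`, `len(obj)` is N = 2, `len(obj[0].dataX)` is D = 3, and `obj[1].dataX[0]` is the age of the second sample.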
After the above steps are completed, the clustering algorithm program encodes the individuals of the population and initializes the population. Initial clusters are obtained using the nearest-neighbor clustering rule NMP (nearest multiple prototypes), the initial class center of each cluster is obtained using the Euclidean distance formula, and the initial adaptive values Fitness1(CH) and Fitness2(S_Dbw) are then computed:
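One plausible reading of this initialization is that each particle carries a set of candidate class prototypes, every sample is assigned to its nearest prototype by Euclidean distance, and the class centers are then recomputed as the means of the assigned samples. The sketch below shows that reading only; the function names and the rule for empty classes are assumptions.

```python
import math


def assign_nearest(data, prototypes):
    """Label each sample with the index of its nearest prototype (Euclidean distance)."""
    return [min(range(len(prototypes)), key=lambda k: math.dist(x, prototypes[k]))
            for x in data]


def class_centers(data, labels, prototypes):
    """Mean vector of each class; an empty class keeps its prototype (assumed rule)."""
    centers = []
    for k, prototype in enumerate(prototypes):
        members = [x for x, label in zip(data, labels) if label == k]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(list(prototype))
    return centers
```

The resulting labels are what the CH and S_Dbw indexes take as input when the two initial adaptive values of a particle are evaluated.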
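① CH (Calinski-Harabasz):
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}$$
$$\operatorname{tr}(W) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2$$
tr(W) is the trace of the within-class dispersion matrix, that is, the sum of the squared distances from the individual data points to the centroid of the class in which they are located.
Wherein,
$$\operatorname{tr}(B) = \sum_{k=1}^{K} n_k \lVert m_k - m \rVert^2$$
tr(B) is the trace of the inter-class dispersion matrix (a matrix consisting of vector distance values), m represents the mean vector of the entire data set, m_k and n_k are the mean vector and the number of samples of class C_k, N is the number of samples, and K is the number of classes.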
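② S_Dbw:
$$S\_Dbw(c) = Scat(c) + Dens\_bw(c)$$
$$Dens\_bw(c) = \frac{1}{c(c-1)} \sum_{i=1}^{c} \sum_{j=1, j \neq i}^{c} \frac{density(u_{ij})}{\max\{density(v_i), density(v_j)\}}$$
Dens_bw(c) compares the density of two classes taken together with the density of each individual class, where c is the number of classes, v_i is the center of class i, and u_ij is the midpoint of the line segment joining v_i and v_j. A combined density that is significantly lower than the density of each individual class indicates better clustering.
$$density(u) = \sum_{l=1}^{n_{ij}} f(x_l, u)$$
density(u) characterizes the number of points around u among the n_ij points of classes i and j, and the threshold used in the comparison is stdev.
$$f(x, u) = \begin{cases} 0, & d(x, u) > stdev \\ 1, & \text{otherwise} \end{cases}$$
$$stdev = \frac{1}{c} \sqrt{\sum_{i=1}^{c} \lVert \sigma(v_i) \rVert}$$
stdev represents the average deviation of the classes of a data set, where σ(v_i) is the variance vector of class i.
$$Scat(c) = \frac{1}{c} \sum_{i=1}^{c} \frac{\lVert \sigma(v_i) \rVert}{\lVert \sigma(S) \rVert}$$
Scat(c) represents the average dispersion of the classes, where σ(S) is the variance vector of the whole data set S.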
The two internal indexes are used as adaptive value functions, and the following iterative process is then entered:
while (number of cycles less than set value) do
a. Initializing the parameters of the particle swarm algorithm:
position of particle i: x_i = (x_i1, x_i2, ..., x_iD);
velocity of particle i: v_i = (v_i1, v_i2, ..., v_iD);
optimal position that particle i has experienced: pbest_i = (p_i1, p_i2, ..., p_iD), which corresponds to bfittness1 and bfittness2;
optimal position that the particle swarm has experienced: gbest = (g_1, g_2, ..., g_D), which corresponds to gfittness1 and gfittness2;
the position in each dimension is limited to the interval [X_min,d, X_max,d];
the velocity in each dimension is limited to the interval [-V_max,d, V_max,d];
b. calculating the adaptive values Fitness1 and Fitness2 of each particle according to the CH and S_Dbw index formulas;
c. for each particle, comparing the two adaptive values of its current position with the two adaptive values corresponding to its historical optimal position (pbest); if both adaptive values of the current position are higher, updating the historical optimal position with the current position, otherwise not updating;
d. for each particle, comparing the two adaptive values of its current position with the two adaptive values corresponding to the global optimal position (gbest); if both adaptive values of the current position are higher, updating the global optimal position with the current position, otherwise not updating;
e. updating the position and velocity of the particles according to the formulas:
particle velocity update formula:
$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1 \left(p_{id}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(g_{d}^{k} - x_{id}^{k}\right)$$
particle position update formula:
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$
wherein v_id^k represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; x_id^k represents the d-th dimensional component of the position vector of particle i at the k-th iteration; p_id^k and g_d^k represent the d-th dimensional components of pbest_i and gbest; c1 and c2 are acceleration constants that adjust the maximum learning step size; r1 and r2 are two random parameters with values in the range [0,1], used to increase the randomness of the search; and w is the inertia weight, a non-negative number used to adjust the search range in the solution space;
f. if the end condition is not met, returning to step b; if the end condition is met, ending the algorithm and obtaining the global optimal position (gbest), namely the global optimal solution.
end while
Finally, the clustering result is output to obtain accurate homogeneous crowd information data.
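The loop above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the patented program: the two adaptive value functions are passed in as callables `fit1` and `fit2` that are both to be maximized (so an S_Dbw-based value would have to be transformed beforehand), a fixed iteration count stands in for the end condition, the same position and velocity limits are used in every dimension, and the function name and default parameters are hypothetical.

```python
import random


def dual_index_pso(fit1, fit2, dim, x_min, x_max, v_max,
                   n_particles=20, max_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm search in which a best position is replaced only when
    BOTH adaptive values of the current position are higher."""
    rng = random.Random(seed)
    # Step a: initialize positions, velocities, and best records.
    xs = [[rng.uniform(x_min, x_max) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_particles)]
    worst = (float("-inf"), float("-inf"))
    pbest, pfit = [x[:] for x in xs], [worst] * n_particles
    gbest, gfit = xs[0][:], worst
    for _ in range(max_iter):
        for i, x in enumerate(xs):
            cur = (fit1(x), fit2(x))                          # step b
            if cur[0] > pfit[i][0] and cur[1] > pfit[i][1]:   # step c
                pbest[i], pfit[i] = x[:], cur
            if cur[0] > gfit[0] and cur[1] > gfit[1]:         # step d
                gbest, gfit = x[:], cur
        for i in range(n_particles):                          # step e
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][d]
                     + c1 * r1 * (pbest[i][d] - xs[i][d])
                     + c2 * r2 * (gbest[d] - xs[i][d]))
                # Keep velocity and position inside their allowed intervals.
                vs[i][d] = max(-v_max, min(v_max, v))
                xs[i][d] = max(x_min, min(x_max, xs[i][d] + vs[i][d]))
    # Step f: the iteration budget is used up; return the global optimal position.
    return gbest, gfit
```

In a clustering setting, `fit1` and `fit2` would decode a particle position into class prototypes, assign the samples, and return the two index values. Requiring both values to improve means that a position which is better on only one index never replaces a stored best, which is the behavior steps c and d describe.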
It is to be understood that, besides the CH and S_Dbw indexes used herein, other internal indexes may be selected as replacement adaptive values according to the characteristics of the indexes and the purpose of the problem analysis. Likewise, clustering based on a genetic algorithm or clustering combined with a differential algorithm may be used in place of the clustering scheme based on the particle swarm algorithm to perform the clustering analysis.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of the present invention. Therefore, the protection scope of the present invention is subject to the protection scope of the claims.

Claims (8)

1. A homogeneous population identification method based on a double-index particle swarm algorithm, characterized by comprising the following steps:
collecting information on the population using a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data.
2. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 1, wherein collecting the information on the population using the public health service platform as the user information data set comprises:
converting the information on the users of the public health service platform into numbers to obtain a data set;
converting the data set into a readable file format;
removing useless attribute data columns from the data set to obtain a processed data set;
and standardizing the processed data set to obtain the user information data set.
3. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 2, wherein the "readable file format" comprises: csv format or bat format.
4. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 2, wherein the information on the population using the public health service platform comprises: nationality, place of residence, and age information.
5. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 1, wherein the two "initial adaptive values" are: Fitness1(CH) and Fitness2(S_Dbw).
6. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 5, wherein Fitness1(CH) is obtained as follows:
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}$$
$$\operatorname{tr}(W) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^2$$
represents the trace of the within-class dispersion matrix;
$$\operatorname{tr}(B) = \sum_{k=1}^{K} n_k \lVert m_k - m \rVert^2$$
represents the trace of the inter-class dispersion matrix, where m is the mean vector of the whole data set, m_k and n_k are the mean vector and the number of samples of class C_k, N is the number of samples, and K is the number of classes.
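7. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 5, wherein Fitness2(S_Dbw) is obtained as follows:
$$S\_Dbw(c) = Scat(c) + Dens\_bw(c)$$
wherein:
$$Dens\_bw(c) = \frac{1}{c(c-1)} \sum_{i=1}^{c} \sum_{j=1, j \neq i}^{c} \frac{density(u_{ij})}{\max\{density(v_i), density(v_j)\}}$$
Dens_bw(c) evaluates the relationship between the density of two classes taken together and the density of each individual class, where c is the number of classes, v_i is the center of class i, and u_ij is the midpoint of the line segment joining v_i and v_j;
$$density(u) = \sum_{l=1}^{n_{ij}} f(x_l, u), \qquad f(x, u) = \begin{cases} 0, & d(x, u) > stdev \\ 1, & \text{otherwise} \end{cases}$$
density(u) characterizes the number of points around u among the n_ij points of classes i and j, and the threshold used in the comparison is stdev;
$$stdev = \frac{1}{c} \sqrt{\sum_{i=1}^{c} \lVert \sigma(v_i) \rVert}$$
stdev represents the average deviation of the classes of a data set, where σ(v_i) is the variance vector of class i;
$$Scat(c) = \frac{1}{c} \sum_{i=1}^{c} \frac{\lVert \sigma(v_i) \rVert}{\lVert \sigma(S) \rVert}$$
Scat(c) represents the average dispersion of the classes, where σ(S) is the variance vector of the whole data set S.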
8. The homogeneous population identification method based on the double-index particle swarm algorithm according to claim 5, wherein the process of iterating with the two initial adaptive values as adaptive functions to obtain a clustering result and thereby the homogeneous crowd information data is as follows:
a. initializing the parameters of the particle swarm algorithm;
b. calculating the adaptive values Fitness1 and Fitness2 of each particle according to the CH and S_Dbw index formulas, respectively;
c. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to its historical optimal position; if both adaptive values of the current position are higher, updating the historical optimal position with the current position, otherwise not updating;
d. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to the global optimal position; if both adaptive values of the current position are higher, updating the global optimal position with the current position, otherwise not updating;
e. updating the position and velocity of the particles:
particle velocity update formula:
$$v_{id}^{k+1} = w\,v_{id}^{k} + c_1 r_1 \left(p_{id}^{k} - x_{id}^{k}\right) + c_2 r_2 \left(g_{d}^{k} - x_{id}^{k}\right)$$
particle position update formula:
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}$$
wherein v_id^k represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; x_id^k represents the d-th dimensional component of the position vector of particle i at the k-th iteration; p_id^k and g_d^k represent the d-th dimensional components of the historical optimal position of particle i and of the global optimal position; c1 and c2 represent acceleration constants; r1 and r2 represent two random parameters with values in the range [0,1]; and w represents the inertia weight;
f. if the end condition is not met, returning to step b; if the end condition is met, ending the algorithm and obtaining the global optimal position, namely the global optimal solution;
and finally, outputting the clustering result to obtain accurate homogeneous crowd information data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011075681.XA CN112308117A (en) 2020-10-09 2020-10-09 Homogeneous crowd identification method based on double-index particle swarm algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011075681.XA CN112308117A (en) 2020-10-09 2020-10-09 Homogeneous crowd identification method based on double-index particle swarm algorithm

Publications (1)

Publication Number Publication Date
CN112308117A (en) 2021-02-02

Family

ID=74489394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011075681.XA Pending CN112308117A (en) 2020-10-09 2020-10-09 Homogeneous crowd identification method based on double-index particle swarm algorithm

Country Status (1)

Country Link
CN (1) CN112308117A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663100A (en) * 2012-04-13 2012-09-12 西安电子科技大学 Two-stage hybrid particle swarm optimization clustering method
US20140257767A1 (en) * 2013-03-09 2014-09-11 Bigwood Technology, Inc. PSO-Guided Trust-Tech Methods for Global Unconstrained Optimization
CN107220841A (en) * 2016-03-22 2017-09-29 上海市玻森数据科技有限公司 A kind of clustering system based on business data
CN111723842A (en) * 2020-05-14 2020-09-29 广东工业大学 Particle swarm adaptive clustering optimization method based on multi-index evaluation mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
巧不巧克力: "Python商品数据预处理与K-Means聚类可视化分析", 《简书-HTTPS://WWW.JIANSHU.COM/P/5FC09877ECC1》 *
悄悄不加糖: "聚类分析实验(一)数据预处理", 《CSDN博客-HTTPS://BLOG.CSDN.NET/WEIXIN_42927719/ARTICLE/DETAILS/88829548》 *

Similar Documents

Publication Publication Date Title
CN107247737B (en) The analysis of platform area default electricity use and method for digging based on electricity consumption
CN111324642A (en) Model algorithm type selection and evaluation method for power grid big data analysis
CN111553127A (en) Multi-label text data feature selection method and device
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN108345908A (en) Sorting technique, sorting device and the storage medium of electric network data
CN110880369A (en) Gas marker detection method based on radial basis function neural network and application
CN108762503A (en) A kind of man-machine interactive system based on multi-modal data acquisition
CN104850868A (en) Customer segmentation method based on k-means and neural network cluster
WO2024131524A1 (en) Depression diet management method based on food image segmentation
CN111931616A (en) Emotion recognition method and system based on mobile intelligent terminal sensor equipment
CN110334777A (en) A kind of unsupervised attribute selection method of weighting multi-angle of view
CN112070126A (en) Internet of things data mining method
CN117314593B (en) Insurance item pushing method and system based on user behavior analysis
CN110503145A (en) A kind of typical load curve acquisition methods based on k-shape cluster
CN112418987B (en) Method and system for rating credit of transportation unit, electronic device and storage medium
CN112347162A (en) Multivariate time sequence data rule mining method based on online learning
CN112308117A (en) Homogeneous crowd identification method based on double-index particle swarm algorithm
CN111488520A (en) Crop planting species recommendation information processing device and method and storage medium
CN116361629A (en) Method and system for reducing dimension of vibration signal characteristics of mill cylinder
CN113570349B (en) EM-FCM algorithm-based image writing method and system for electric power staff
CN109063735A (en) A kind of classification of insect Design Method based on insect biology parameter
CN112201355B (en) Construction method of health evaluation iterative classifier model
CN110348323B (en) Wearable device gesture recognition method based on neural network optimization
CN113962509B (en) Temperature-sensitive load extraction and sensitivity calculation method based on load clustering
CN107944463A (en) A kind of micro- condition detection method of dynamic network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210202