CN112308117A - Homogeneous crowd identification method based on double-index particle swarm algorithm - Google Patents
- Publication number
- CN112308117A
- Authority
- CN
- China
- Prior art keywords
- index
- data set
- adaptive
- particle
- particle swarm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
The invention provides a homogeneous crowd identification method based on a double-index particle swarm algorithm, addressing the inability of a single-index clustering algorithm to comprehensively analyze the users of a public health service platform. The method comprises the following steps: collecting information on the people who use a public health service platform as a user information data set; obtaining two initial adaptive values of the user information data set through a clustering algorithm; and iterating with the two adaptive values as adaptive (fitness) functions to obtain a clustering result and thereby obtain homogeneous crowd information data. By evaluating adaptive values with two indexes, the method guides the optimization of particle swarm clustering using the results of both indexes, which eliminates the evaluation bias of a single index and broadens the otherwise narrow applicability of a single internal index. Labor and time are saved, and complex and diverse crowd information can be analyzed comprehensively.
Description
Technical Field
The invention relates to the field of swarm intelligence and evolutionary computation, and in particular to a homogeneous population identification method based on a double-index particle swarm algorithm.
Background
At present, many clustering algorithms are applied at home and abroad to identify homogeneous populations, and each has drawbacks. For example: the K-means algorithm is sensitive to the initial centers and to the value of K; grid-based methods are limited in precision; multiple regression approaches are overly sensitive to the data; and density-based methods have low noise immunity and depend on the value of the neighborhood radius.
In addition, because every internal index is designed with a particular tendency, the expressive power of a single index is limited. Approaches that combine a genetic algorithm with a clustering algorithm, or a differential algorithm with a clustering algorithm, and so on, likewise all use a single index as the adaptive value evaluation method for optimization.
For workers engaged in public health service, it is extremely important to optimize the service platform according to the various kinds of people who use it, so that the public health service meets people's needs to the greatest possible extent; this optimization must continue for as long as the platform is in use. The users of a public health service platform are diverse and differ in age group, living environment, and so on. In the past, platform optimization mostly relied on learning about users through online questionnaires and offline visits, or on analyzing user data with a single-index clustering algorithm. However, these approaches consume a great deal of manpower and time, and the resulting analysis is not comprehensive enough given the complexity and diversity of crowd information.
Disclosure of Invention
To address the inability of a single-index clustering algorithm to comprehensively analyze the users of a public health service platform, the invention provides a homogeneous population identification method based on a double-index particle swarm algorithm. Cluster analysis of homogeneous populations is carried out using double-index adaptive value evaluation, which improves the efficiency and comprehensiveness of platform optimization.
The technical scheme adopted by the invention to solve the technical problem is a homogeneous population identification method based on a double-index particle swarm algorithm, comprising the following steps:
collecting the use crowd information of a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating the two initial adaptive values as adaptive functions to obtain a clustering result and obtain homogeneous crowd information data.
Collecting the use crowd information of the public health service platform as the user information data set comprises the following steps:
converting the information of the users of the public health service platform into numbers to obtain a data set;
converting the data set into a readable file format;
removing useless attribute data columns in the data set to obtain a processed data set;
and carrying out standardization processing on the processed data set to obtain the user information data set.
The readable file format comprises: csv format and/or bat format.
The information of the using population of the public health service platform comprises the following steps: nationality, place of residence, age information.
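The two initial adaptive values are Fitness1 (CH) and Fitness2 (S_Dbw).
Fitness1 (CH) is obtained as follows:
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}, \qquad \operatorname{tr}(B) = \sum_{j=1}^{K} n_j \lVert z_j - m \rVert^2, \qquad \operatorname{tr}(W) = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - z_j \rVert^2$$
where tr(B) is the trace of the inter-class dispersion matrix; tr(W) is the trace of the intra-class dispersion matrix; z_j and n_j are the center and the size of class C_j; m is the mean vector of the whole data set; N is the number of samples; and K is the number of clusters.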
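As a minimal sketch of how Fitness1 might be computed, the following assumes the standard Calinski-Harabasz definition (between-class trace over within-class trace, each scaled by its degrees of freedom). The function names and the list-of-tuples data layout are illustrative assumptions, not part of the patent.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def mean_vector(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """CH = [tr(B) / (K - 1)] / [tr(W) / (N - K)]; larger means better separation."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if k < 2 or n <= k:
        raise ValueError("CH needs at least 2 clusters and more samples than clusters")
    m = mean_vector(points)  # mean vector of the whole data set
    tr_b = 0.0  # trace of the inter-class dispersion matrix
    tr_w = 0.0  # trace of the intra-class dispersion matrix
    for members in clusters.values():
        z = mean_vector(members)
        tr_b += len(members) * sq_dist(z, m)
        tr_w += sum(sq_dist(x, z) for x in members)
    if tr_w == 0.0:
        return float("inf")  # every point coincides with its class center
    return (tr_b / (k - 1)) / (tr_w / (n - k))
```

For two tight, well-separated groups the value is large; it shrinks as the classes overlap or spread out.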
The step of acquiring the Fitness2(S _ Dbw) is as follows:
Index formula: S_Dbw(c) = Scat(c) + Dens_bw(c)
where:
Dens_bw(c) evaluates the relationship between the density of two classes taken together and the density of each individual class;
density(u) characterizes the number of points around u: a point is counted when its distance to u does not exceed the threshold stdev;
stdev represents the average deviation of the clusters of the data set;
Scat(c) represents the average dispersion of the classes.
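The S_Dbw index can be sketched as follows. This is a hedged illustration that assumes the commonly published definition of the index (per-class variance norms for Scat, and point counts within radius stdev around class centers and their midpoints for Dens_bw). The helper names are hypothetical, and treating a zero-density denominator as a zero term is an assumption of this sketch.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _variance_norm(points: List[Point]) -> float:
    """Euclidean norm of the per-dimension (population) variance vector."""
    c = _mean(points)
    n = len(points)
    var = [sum((p[d] - c[d]) ** 2 for p in points) / n for d in range(len(c))]
    return math.sqrt(sum(v * v for v in var))


def _dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def s_dbw(points: List[Point], labels: List[int]) -> float:
    """S_Dbw(c) = Scat(c) + Dens_bw(c); smaller values indicate better clustering."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = sorted(clusters)
    c = len(ids)
    sigma_all = _variance_norm(points)
    if c < 2 or sigma_all == 0.0:
        raise ValueError("S_Dbw needs at least 2 clusters and non-constant data")
    centers = {i: _mean(clusters[i]) for i in ids}
    sigma = {i: _variance_norm(clusters[i]) for i in ids}
    scat = sum(sigma.values()) / (c * sigma_all)   # average scattering of the classes
    stdev = math.sqrt(sum(sigma.values())) / c     # neighborhood radius (threshold)

    def density(u: Point, members: List[Point]) -> int:
        # Number of points within distance stdev of u.
        return sum(1 for x in members if _dist(x, u) <= stdev)

    total = 0.0
    for i in ids:
        for j in ids:
            if i == j:
                continue
            both = list(clusters[i]) + list(clusters[j])
            mid = [(a + b) / 2 for a, b in zip(centers[i], centers[j])]
            denom = max(density(centers[i], both), density(centers[j], both))
            if denom > 0:  # assumption: an empty neighborhood contributes 0
                total += density(mid, both) / denom
    dens_bw = total / (c * (c - 1))
    return scat + dens_bw
```

Note that, unlike CH, this index is conventionally minimized, so a caller that wants "higher is better" semantics for both adaptive values would need to negate or otherwise invert it.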
The process of iterating with the two adaptive values as adaptive functions to obtain the clustering result and the homogeneous crowd information data is as follows:
a. initializing various parameters of a particle swarm algorithm:
b. respectively calculating an adaptive value Fitness1 and an adaptive value Fitness2 of each particle according to CH and S _ Dbw index formulas;
c. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to its historical optimal position; if both adaptive values of the current position are higher, updating the historical optimal position with the current position, and otherwise not updating;
d. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to the global optimal position; if both adaptive values of the current position are higher, updating the global optimal position with the current position, and otherwise not updating;
e. update of position and velocity of particles:
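particle velocity update formula:
$$V_{id}^{k+1} = w V_{id}^{k} + c_1 r_1 \left( \mathit{pbest}_{id}^{k} - X_{id}^{k} \right) + c_2 r_2 \left( \mathit{gbest}_{d}^{k} - X_{id}^{k} \right)$$
particle position update formula:
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}$$
where $V_{id}^{k}$ represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; $X_{id}^{k}$ represents the d-th dimensional component of the position vector of particle i at the k-th iteration; $\mathit{pbest}_{id}^{k}$ and $\mathit{gbest}_{d}^{k}$ are the d-th dimensional components of the historical optimal position of particle i and of the global optimal position; $c_1$ and $c_2$ are acceleration constants; $r_1$ and $r_2$ are two random parameters with value range [0,1]; and w is the inertia weight;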
f. if the end condition is not met, returning to the step b, if the end condition is met, ending the algorithm, and obtaining a global optimal position, namely a global optimal solution;
and finally, outputting a clustering result to obtain accurate homogeneous crowd information data.
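The double-index acceptance rule in steps c and d can be illustrated with a short sketch. It assumes that both adaptive values are oriented so that larger is better (for example, by passing a negated S_Dbw value); that orientation, the `Fitness` type, and the function names are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Fitness(NamedTuple):
    fitness1: float  # e.g. the CH value
    fitness2: float  # e.g. an S_Dbw-based value oriented so that larger is better


def both_higher(current: Fitness, best: Fitness) -> bool:
    """True only when the current position beats the stored one on BOTH indexes."""
    return current.fitness1 > best.fitness1 and current.fitness2 > best.fitness2


def update_best(
    current_pos: Sequence[float],
    current_fit: Fitness,
    best_pos: List[float],
    best_fit: Fitness,
) -> Tuple[List[float], Fitness]:
    """Replace the stored best position only if both adaptive values improve."""
    if both_higher(current_fit, best_fit):
        return list(current_pos), current_fit
    return best_pos, best_fit
```

The same function serves for the historical optimum of a particle (step c) and for the global optimum (step d); a position that improves only one of the two indexes is rejected under this rule.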
The beneficial effects of the invention are as follows. Through double-index adaptive value evaluation, the method guides the optimization of particle swarm clustering using the results of two indexes, which eliminates the evaluation bias of a single index and broadens the otherwise narrow applicability of a single internal index. Compared with a single index, double-index adaptive value evaluation retains the crowd data information more completely during clustering, and ensures as far as possible that the clustering result based on the particle swarm algorithm achieves both high inter-class dispersion and high intra-class compactness. When the crowd data has a large number of attributes, the two indexes yield a more inclusive division of all samples into their most similar categories, so that labor and time are saved and complex and diverse crowd information can be analyzed comprehensively.
Drawings
Fig. 1 is a flowchart of user population information extraction according to the present invention.
FIG. 2 is a code map used by the algorithm of the present invention.
FIG. 3 is a flow chart of the dual-index particle swarm algorithm of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, the homogeneous population identification method based on the double-index particle swarm algorithm of the invention comprises the following steps:
collecting the use crowd information of a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating the two initial adaptive values as adaptive functions to obtain a clustering result and obtain homogeneous crowd information data.
The method comprises the following specific steps:
First, the information of the people using the public health service platform is converted into numbers, for example nationality, place of residence, age information, Body Mass Index (BMI), and activities of daily living (ADL) score. The data set is then converted into a file format readable by the program of this patent, such as csv or bat, and read in, including all attribute values of the crowd information. The attributes are screened and useless attribute data columns are eliminated, leaving N samples with D attributes each. Finally, all data are standardized to obtain the user information data set. In the actual implementation, obj[i].dataX[j] represents the j-th attribute of the i-th sample.
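A minimal sketch of this preprocessing is given below. The source does not say which standardization is used, so z-score standardization (zero mean, unit population standard deviation per attribute) is assumed here; the column names and the `preprocess` function are hypothetical.

```python
import statistics
from typing import Dict, List


def preprocess(rows: List[Dict[str, float]], drop: List[str]) -> List[List[float]]:
    """Drop useless attribute columns, then z-score standardize the rest.

    Returns N samples with D attributes each, so data[i][j] plays the role
    of the j-th attribute of the i-th sample.
    """
    keep = [name for name in rows[0] if name not in drop]
    columns = []
    for name in keep:
        values = [float(r[name]) for r in rows]
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)
        # A constant column carries no information; map it to zeros.
        columns.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [list(sample) for sample in zip(*columns)]


# Example with already-numeric records; "id" stands in for a useless column.
records = [
    {"id": 1, "age": 20, "bmi": 18.0},
    {"id": 2, "age": 40, "bmi": 22.0},
    {"id": 3, "age": 60, "bmi": 26.0},
]
data = preprocess(records, drop=["id"])
```

Standardizing before clustering keeps attributes with large numeric ranges (such as age) from dominating the Euclidean distances used later.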
After the above steps are completed, the clustering program encodes the population individuals and initializes the population. Initial clusters are obtained using the nearest-neighbor clustering rule NMP (nearest multiple protocols), the initial class center of each cluster is obtained using the Euclidean distance formula, and the initial adaptive values Fitness1 (CH) and Fitness2 (S_Dbw) are then computed:
①CH(Calinski-Harabasz):
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}$$
where tr(W) is the trace of the intra-class dispersion matrix, i.e. the sum of the distances from the individual data points to the centroid of the class in which they are located; tr(B) is the trace of the inter-class dispersion matrix (a matrix consisting of vector distance values); m represents the mean vector of the entire data set; N is the number of samples; and K is the number of clusters.
②S_Dbw:
S_Dbw(c)=Scat(c)+Dens_bw(c)
Dens_bw(c) evaluates the density of two classes taken together relative to the density of each individual class. A combined density that is significantly lower than the density of each individual class indicates better clustering.
density(u) characterizes the number of points around u: a point is counted when its distance to u does not exceed the threshold stdev.
stdev represents the average deviation of the classes of the data set.
Scat(c) represents the average dispersion of the classes.
The two internal indexes are used as adaptive value functions, and the algorithm then enters the following iterative process:
while (number of cycles less than set value) do
a. Initializing various parameters of a particle swarm algorithm:
position of particle i: x_i = (x_i1, x_i2, ..., x_iD);
velocity of particle i: v_i = (v_i1, v_i2, ..., v_iD);
optimal position that particle i has experienced: pbest_i = (p_i1, p_i2, ..., p_iD), which corresponds to bfittness1 and bfittness2;
optimal position that the particle population has experienced: gbest = (g_1, g_2, ..., g_D), which corresponds to gfittness1 and gfittness2;
the position in each dimension d is limited to the interval [X_min,d, X_max,d];
the velocity in each dimension d is limited to the interval [-V_max,d, V_max,d];
b. calculating the adaptive values Fitness1 and Fitness2 of each particle according to CH and S _ Dbw index formulas;
c. for each particle, comparing the two adaptive values of the current position of the particle with the two adaptive values corresponding to the historical best position (pbest), if the two adaptive values of the current position are higher, updating the historical best position by using the current position, otherwise, not updating;
d. for each particle, comparing the two adaptive values of the current position of the particle with the two adaptive values corresponding to the global optimal position (gbest), if the two adaptive values of the current position are higher, updating the global optimal position by using the current position, otherwise, not updating;
e. the position and velocity of the particles are updated according to the formula:
particle velocity update formula:
$$V_{id}^{k+1} = w V_{id}^{k} + c_1 r_1 \left( \mathit{pbest}_{id}^{k} - X_{id}^{k} \right) + c_2 r_2 \left( \mathit{gbest}_{d}^{k} - X_{id}^{k} \right)$$
particle position update formula:
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}$$
where $V_{id}^{k}$ represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; $X_{id}^{k}$ represents the d-th dimensional component of the position vector of particle i at the k-th iteration; $c_1$ and $c_2$ are acceleration constants, which adjust the maximum learning step size; $r_1$ and $r_2$ are two random parameters with value range [0,1], used to increase the randomness of the search; and w is the inertia weight, a non-negative number used to adjust the search range of the solution space;
f. and if the ending condition is not met, returning to the step b, if the ending condition is met, ending the algorithm, and obtaining a global optimal position (gbest), namely a global optimal solution.
end while
And finally, outputting a clustering result to obtain accurate homogeneous crowd information data.
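Step e of the loop above can be sketched as a single-particle update. This assumes the standard inertia-weight form of particle swarm optimization and clamps velocity and position to the per-dimension limits described in step a; the function name `pso_step` and its parameter layout are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def pso_step(
    x: Sequence[float],
    v: Sequence[float],
    pbest: Sequence[float],
    gbest: Sequence[float],
    w: float,
    c1: float,
    c2: float,
    x_bounds: Sequence[Tuple[float, float]],
    v_max: Sequence[float],
    rng: random.Random,
) -> Tuple[List[float], List[float]]:
    """One velocity and position update for a single particle."""
    new_x: List[float] = []
    new_v: List[float] = []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # random parameters in [0, 1)
        vd = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        vd = max(-v_max[d], min(v_max[d], vd))   # keep velocity in [-Vmax,d, Vmax,d]
        lo, hi = x_bounds[d]
        xd = max(lo, min(hi, x[d] + vd))         # keep position in [Xmin,d, Xmax,d]
        new_v.append(vd)
        new_x.append(xd)
    return new_x, new_v


# Usage: one step for a 2-dimensional particle pulled toward (1, 1).
rng = random.Random(0)
x1, v1 = pso_step([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0],
                  w=0.7, c1=1.5, c2=1.5,
                  x_bounds=[(-5.0, 5.0)] * 2, v_max=[0.5, 0.5], rng=rng)
```

In a clustering setting each position vector would typically encode the candidate class centers, and steps b to d would be evaluated on the clustering that those centers induce; that encoding is not spelled out in the text and is mentioned here only as one plausible reading.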
It is to be understood that, besides the CH and S_Dbw indexes used herein, other internal indexes may be substituted as adaptive values according to the characteristics of the indexes and the purpose of the analysis. Likewise, clustering based on a genetic algorithm or clustering combined with a differential algorithm may be used in place of the clustering scheme based on the particle swarm algorithm.
The above description is only one embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention is subject to the protection scope of the claims.
Claims (8)
1. A homogeneous population identification method based on a double-index particle swarm algorithm is characterized by comprising the following steps:
collecting the use crowd information of a public health service platform as a user information data set;
obtaining two initial adaptive values of the user information data set through a clustering algorithm;
and iterating the two initial adaptive values as adaptive functions to obtain a clustering result and obtain homogeneous crowd information data.
2. The homogeneous population recognition method based on the dual-index particle swarm algorithm according to claim 1, wherein the collecting of the use population information of the public health service platform as the user information data set comprises:
converting the information of the users of the public health service platform into numbers to obtain a data set;
converting the data set into a readable file format;
removing useless attribute data columns in the data set to obtain a processed data set;
and carrying out standardization processing on the processed data set to obtain the user information data set.
3. The homogeneous population recognition method based on the dual-index particle swarm algorithm of claim 2, wherein the "readable file format" comprises: csv format or bat format.
4. The homogeneous population identification method based on the dual-index particle swarm algorithm according to claim 2, wherein the information of the people using the public health service platform comprises: nationality, place of residence, age information.
5. The homogeneous population recognition method based on the dual-index particle swarm algorithm according to claim 1, wherein the two initial adaptive values are: Fitness1 (CH) and Fitness2 (S_Dbw).
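6. The homogeneous population recognition method based on the dual-index particle swarm algorithm of claim 5, wherein the Fitness1 (CH) obtaining step is:
$$CH(K) = \frac{\operatorname{tr}(B)/(K-1)}{\operatorname{tr}(W)/(N-K)}$$
wherein tr(B) is the trace of the inter-class dispersion matrix, tr(W) is the trace of the intra-class dispersion matrix, N is the number of samples, and K is the number of clusters.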
7. The homogeneous population recognition method based on the dual-index particle swarm algorithm according to claim 5, wherein the Fitness2(S _ Dbw) obtaining step is:
S_Dbw(c)=Scat(c)+Dens_bw(c)
wherein:
Dens_bw(c) evaluates the relationship between the density of two classes taken together and the density of each individual class;
density(u) characterizes the number of points around u: a point is counted when its distance to u does not exceed the threshold stdev;
stdev represents the average deviation of the classes of the data set;
Scat(c) represents the average dispersion of the classes.
8. The homogeneous population identification method based on the dual-index particle swarm algorithm according to claim 5, wherein the process of obtaining the homogeneous population information data by iterating the two initial adaptive values as adaptive functions to obtain clustering results is as follows:
a. initializing various parameters of a particle swarm algorithm:
b. respectively calculating an adaptive value Fitness1 and an adaptive value Fitness2 of each particle according to CH and S _ Dbw index formulas;
c. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to its historical optimal position; if both adaptive values of the current position are higher, updating the historical optimal position with the current position, and otherwise not updating;
d. for each particle, comparing the adaptive values Fitness1 and Fitness2 of its current position with the two adaptive values corresponding to the global optimal position; if both adaptive values of the current position are higher, updating the global optimal position with the current position, and otherwise not updating;
e. update of position and velocity of particles:
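particle velocity update formula:
$$V_{id}^{k+1} = w V_{id}^{k} + c_1 r_1 \left( \mathit{pbest}_{id}^{k} - X_{id}^{k} \right) + c_2 r_2 \left( \mathit{gbest}_{d}^{k} - X_{id}^{k} \right)$$
particle position update formula:
$$X_{id}^{k+1} = X_{id}^{k} + V_{id}^{k+1}$$
wherein $V_{id}^{k}$ represents the d-th dimensional component of the velocity vector of particle i at the k-th iteration; $X_{id}^{k}$ represents the d-th dimensional component of the position vector of particle i at the k-th iteration; $\mathit{pbest}_{id}^{k}$ and $\mathit{gbest}_{d}^{k}$ are the d-th dimensional components of the historical optimal position of particle i and of the global optimal position; $c_1$ and $c_2$ represent acceleration constants; $r_1$ and $r_2$ represent two random parameters with value range [0,1]; and w represents the inertia weight;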
f. if the end condition is not met, returning to the step b, if the end condition is met, ending the algorithm, and obtaining a global optimal position, namely a global optimal solution;
and finally, outputting a clustering result to obtain accurate homogeneous crowd information data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011075681.XA CN112308117A (en) | 2020-10-09 | 2020-10-09 | Homogeneous crowd identification method based on double-index particle swarm algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112308117A (en) | 2021-02-02 |
Family
ID=74489394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011075681.XA Pending CN112308117A (en) | 2020-10-09 | 2020-10-09 | Homogeneous crowd identification method based on double-index particle swarm algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112308117A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102663100A (en) * | 2012-04-13 | 2012-09-12 | 西安电子科技大学 | Two-stage hybrid particle swarm optimization clustering method |
US20140257767A1 (en) * | 2013-03-09 | 2014-09-11 | Bigwood Technology, Inc. | PSO-Guided Trust-Tech Methods for Global Unconstrained Optimization |
CN107220841A (en) * | 2016-03-22 | 2017-09-29 | 上海市玻森数据科技有限公司 | A kind of clustering system based on business data |
CN111723842A (en) * | 2020-05-14 | 2020-09-29 | 广东工业大学 | Particle swarm adaptive clustering optimization method based on multi-index evaluation mechanism |
Non-Patent Citations (2)
Title |
---|
巧不巧克力: "Python商品数据预处理与K-Means聚类可视化分析", 《简书-HTTPS://WWW.JIANSHU.COM/P/5FC09877ECC1》 * |
悄悄不加糖: "聚类分析实验(一)数据预处理", 《CSDN博客-HTTPS://BLOG.CSDN.NET/WEIXIN_42927719/ARTICLE/DETAILS/88829548》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210202 |