CN113128535B - Cluster model selection method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113128535B CN201911414989.XA
- Authority
- CN
- China
- Prior art keywords
- model
- clustering
- parameter
- result
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a method and a device for selecting a clustering model, an electronic device and a storage medium. The method comprises the following steps: inputting a target sample into a clustering model for clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1; comparing the clustering result corresponding to each model parameter with the target sample, and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result; selecting the model parameters corresponding to the M highest evaluation scores, wherein M is greater than or equal to 1 and M is less than N; and determining a corresponding clustering model according to the model parameters corresponding to the M highest evaluation scores. In this way, the optimal clustering model corresponding to the optimal model parameters can be selected, so that data are clustered with the optimal clustering model, which effectively improves data clustering efficiency.
Description
Technical Field
The present invention relates to the field of data clustering technologies, and in particular, to a method and apparatus for selecting a clustering model, an electronic device, and a storage medium.
Background
On existing data clustering platforms, many kinds of data must be clustered to suit different application scenarios, and different scenarios require different clustering accuracy, so the parameters of the clustering model need to be adjusted for each scenario. At present, however, these parameters are usually adjusted manually by the user. Because the adjustment relies on the user's subjective judgment, it is imprecise and time-consuming. As a result, the optimal clustering model cannot be found quickly and accurately when data is clustered, and both the efficiency and the accuracy of data clustering are low.
Disclosure of Invention
The embodiment of the invention provides a method for selecting a clustering model, which can quickly and accurately select an optimal clustering model to cluster data when data clustering is performed, so that the data clustering efficiency and the accuracy are effectively improved.
In a first aspect, an embodiment of the present invention provides a method for selecting a cluster model, where the method includes the following steps:
inputting a target sample into a clustering model to perform clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1;
comparing the clustering result corresponding to each model parameter with the target sample, and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result;
selecting model parameters corresponding to the M highest evaluation scores, wherein M is greater than or equal to 1, and M is less than N;
and determining a corresponding clustering model according to the model parameters corresponding to the M highest evaluation scores.
Optionally, the clustering model includes: a first-layer cluster model and a second-layer cluster model, wherein the model parameters comprise a first model parameter corresponding to the first-layer cluster model and a second model parameter corresponding to the second-layer cluster model; the step of inputting the target sample into the clustering model for clustering, and obtaining a clustering result comprises the following steps:
Inputting the target sample into the first layer clustering model to perform clustering to obtain a first clustering result, wherein the first layer clustering model comprises N first model parameters, and N is an integer greater than or equal to 1;
inputting the first clustering result into the second-layer clustering model to perform clustering to obtain a second clustering result, wherein the second-layer clustering model comprises N second model parameters, and N is an integer greater than or equal to 1;
and taking the second clustering result as a clustering result of a clustering model.
Optionally, the step of calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result includes:
calculating the recall rate of the clustering result corresponding to each model parameter according to the comparison result;
calculating the accuracy of the clustering result corresponding to each model parameter;
and calculating the evaluation score of the clustering result corresponding to each model parameter according to the recall rate and the accuracy rate.
Optionally, the evaluation score is equal to twice the product of the recall and the accuracy divided by the sum of the recall and the accuracy.
Optionally, the step of selecting model parameters corresponding to the M highest evaluation scores includes:
sorting the N evaluation scores obtained by calculation;
And counting the number of the model parameters corresponding to the highest evaluation score from the N evaluation scores to obtain M model parameters corresponding to the highest evaluation score.
Optionally, the parameter range of the model parameter is between a preset first model parameter threshold value and a preset second model parameter threshold value.
Optionally, the acquiring process of the N model parameters is:
within the parameter range of the model parameters, starting from the first model parameter threshold and increasing step by step by a preset precision until the second model parameter threshold is reached, wherein each increase yields a new model parameter;
and counting the model parameters between the first model parameter threshold and the second model parameter threshold to obtain the N model parameters.
In a second aspect, an embodiment of the present invention further provides a device for selecting a cluster model, where the device includes:
The clustering module is used for inputting the target samples into a clustering model to perform clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1;
the evaluation score calculation module is used for comparing the clustering result corresponding to each model parameter with the target sample and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result;
The model parameter selection module is used for selecting model parameters corresponding to M highest evaluation scores, wherein M is greater than or equal to 1, and M is smaller than N;
And the cluster model determining module is used for determining a corresponding cluster model according to the model parameters corresponding to the M highest evaluation scores.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and capable of running on the processor, wherein the processor implements the steps of the cluster model selection method provided by the foregoing embodiment when executing the computer program.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program when executed by a processor implements the steps in the method for selecting a cluster model provided in the foregoing embodiment.
In the embodiment of the invention, a target sample is input into a clustering model for clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1; the clustering result corresponding to each model parameter is compared with the target sample, and the evaluation score of the clustering result corresponding to each model parameter is calculated according to the comparison result; the model parameters corresponding to the M highest evaluation scores are selected, wherein M is greater than or equal to 1 and M is less than N; and a corresponding clustering model is determined according to the model parameters corresponding to the M highest evaluation scores. In this way, the evaluation scores of the clustering results of the target sample under different model parameters are calculated and compared, the model parameter with the highest evaluation score is selected, and the corresponding clustering model is determined from that model parameter, so that the optimal clustering model is selected automatically and quickly. When data clustering is carried out, the optimal clustering model can be selected quickly and accurately to cluster the data, which effectively improves data clustering efficiency.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for selecting a cluster model according to an embodiment of the present invention;
FIG. 2 is a calculation formula of an evaluation score according to an embodiment of the present invention;
FIG. 3 is a flow chart of a method provided by step 101 in the embodiment of FIG. 1;
FIG. 4 is a device for selecting a cluster model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of one of the modules provided by the clustering module of the embodiment of FIG. 4;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flow chart of a method for selecting a cluster model according to an embodiment of the present invention, as shown in fig. 1, including the following steps:
Step 101, inputting a target sample into a clustering model to perform clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer larger than 1.
The target sample is a reference sample of data that has already been clustered, and its clustering result is 100% correct; for example, the target sample is a feature value reference sample. This embodiment mainly takes a feature value sample as the reference sample. Each feature value in the target sample carries a class identifier (also called a feature value ID) that indicates which class the feature value belongs to; alternatively, each feature value in the target sample has a correspondence with the class to which it belongs. Two files may be used to store the target sample: one stores the feature values, and the other stores the 100% correct class identifier of each feature value, or the correspondence between the feature values and their classes. For example, the target sample may be preset with 50 pieces of already clustered feature value data, where the clustering result of every piece is 100% accurate. The 50 pieces are picture feature values of 10 persons, 5 feature values per person, and each feature value carries a class identifier. The 50 feature values are therefore divided into 10 classes of 5 feature values each. If two files are used to store the target sample, one file stores the 50 feature values and the other stores the 100% correct person ID of each of the 50 feature values. Note that the target sample may contain any plural number of feature values.
The feature value type in the target sample may be a picture feature value (may be a face picture feature value), a speech feature value, a text feature value, a video feature value, a file feature value, and the like. The number and type of eigenvalues in the target sample are not limited here. The target samples are stored in a database.
The clustering model is a model for clustering target samples, for example, a model for clustering face pictures. The clustering model may be used to cluster, archive, etc. data in an archive system, and may also be used to classify data in a classification system.
The clustering result is obtained by inputting the target sample into a clustering model for clustering.
The model parameters are the parameters of the clustering model. A plurality of model parameters are provided so that the clustering model can be run with any of them. The model parameters influence the clustering result of the clustering model. Because the model parameters are adjustable, the clustering model can be adapted to usage scenarios that require different clustering precision, and a clustering model with different model parameters can be selected according to actual needs. The N model parameters correspond to N clustering models, and the user can choose the clustering model whose model parameter meets the requirements to cluster the data.
In this embodiment, the parameter range of the model parameter lies between a preset first model parameter threshold and a preset second model parameter threshold. For example, the preset first model parameter threshold may be 0.5 and the preset second model parameter threshold 0.9, so that the parameter range is [0.5, 0.9]; it may also be (0.5, 0.9), [0.5, 0.9) or (0.5, 0.9]. Alternatively, the preset first model parameter threshold may be 0.5 and the preset second model parameter threshold 1, so that the parameter range is [0.5, 1], or (0.5, 1), [0.5, 1) or (0.5, 1]. The precision of the model parameters can be set as needed; for example, the user can fix the model parameters to four decimal places, in which case the parameter range may be written as (0.5000, 0.9000, 4), and so on. If the model parameters represent similarity parameters, they are all set to 0.5 or more, because only a similarity of 0.5 or more is useful for deciding whether several feature values belong to the same person or object; otherwise the judgment is meaningless.
The N model parameters are obtained as follows: within the parameter range of the model parameters, the value is increased step by step by a preset precision, starting from the first model parameter threshold, until the second model parameter threshold is reached, and each increase yields a new model parameter. The model parameters between the first model parameter threshold and the second model parameter threshold are then counted, which finally gives the N model parameters. The preset precision may be set as required; when the model parameters are kept to four decimal places, the preset precision may be 0.0001, 0.0002, or the like. Specifically, within a given parameter range, the larger the preset precision (step), the smaller N is, and the smaller the preset precision, the larger N is.
Specifically, when an optimal clustering model needs to be selected, the model parameters are taken one by one from the parameter range according to the preset acquisition process, which gives N model parameters and, correspondingly, N clustering models. The preset target sample is then used in turn as the input data of each of the N clustering models, and N clustering results are obtained in sequence. For example, stepping through the parameter range [0.5, 0.9] with a preset precision of 0.1 gives N = 5 model parameters: 0.5, 0.6, 0.7, 0.8 and 0.9.
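The acquisition process described above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming a closed range [low, high]; the function name `parameter_grid` is hypothetical, and `Decimal` is used here simply to avoid floating-point drift when a small step is added many times.

```python
from decimal import Decimal

def parameter_grid(low, high, step):
    """Return every model parameter from low to high (inclusive), spaced by step.

    low/high stand for the first and second model parameter thresholds and
    step stands for the preset precision (names are illustrative assumptions).
    """
    low, high, step = Decimal(str(low)), Decimal(str(high)), Decimal(str(step))
    if step <= 0 or high < low:
        raise ValueError("need step > 0 and low <= high")
    values = []
    current = low
    while current <= high:
        values.append(float(current))  # each increase yields a new parameter
        current += step
    return values

grid = parameter_grid(0.5, 0.9, 0.1)
print(grid)       # [0.5, 0.6, 0.7, 0.8, 0.9]
print(len(grid))  # N = 5
```

With a preset precision of 0.0001 over the same range, the same function yields 4001 parameters, which illustrates how a smaller step gives a larger N.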
Step 102, comparing the clustering result corresponding to each model parameter with the target sample, and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result.
The evaluation score is an evaluation criterion for evaluating the quality of the cluster model. The higher the evaluation score, the higher the clustering model precision and the better the clustering effect. The lower the evaluation score, the lower the clustering model accuracy and the worse the clustering effect.
Specifically, after the N clustering results are obtained in sequence, each clustering result is compared with the target sample in turn to obtain N comparison results, and the evaluation score is then calculated from each comparison result. Note that the target sample is preset, so the target clustering result corresponding to the target sample is 100% correct. Each of the N clustering results therefore only needs to be compared with the target clustering result.
The recall rate of the clustering result corresponding to each model parameter is calculated according to the comparison result, and the accuracy of the clustering result corresponding to each model parameter is calculated. The evaluation score of the clustering result corresponding to each model parameter is then obtained from the recall rate and the accuracy rate.
In this embodiment, recall = number of correct feature values extracted / number of feature values of that class in the target sample; accuracy = number of correct feature values extracted / number of feature values extracted; both values lie between 0 and 1. Evaluation score = accuracy × recall × 2 / (accuracy + recall). The recall may be denoted by the letter R, the accuracy by the letter P, and the evaluation score by F-Measure; the calculation formula of the evaluation score is shown in FIG. 2.
For example, the target sample contains 2000 feature values, of which 1400 belong to a first person, 300 to a second person, and 300 to a third person. The 2000 feature values are input into the clustering model with the aim of extracting the feature values of the first person, and the class obtained for the first person contains 1000 feature values: 700 are feature values of the first person, 200 are feature values of the second person, and 100 are feature values of the third person. The accuracy of the first person's feature values in the clustering result is then P = 700/(700+200+100) = 70%; the recall is R = 700/1400 = 50%; and the evaluation score is F-Measure = 70% × 50% × 2/(70% + 50%) = 58.3%.
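The arithmetic of this example can be checked with a short Python sketch. The function name `f_measure` and its argument names are illustrative assumptions; the zero-division guards are an addition for robustness and are not described in the source.

```python
def f_measure(correct, extracted, total_in_target):
    """Return (accuracy P, recall R, evaluation score F-Measure).

    correct          -- number of correctly extracted feature values
    extracted        -- number of feature values extracted for the class
    total_in_target  -- number of feature values of that class in the target sample
    """
    precision = correct / extracted if extracted else 0.0
    recall = correct / total_in_target if total_in_target else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    score = 2 * precision * recall / (precision + recall)
    return precision, recall, score

# Figures from the example: 700 correct out of 1000 extracted, 1400 in the target sample.
p, r, f = f_measure(correct=700, extracted=1000, total_in_target=1400)
print(f"P={p:.1%} R={r:.1%} F={f:.1%}")  # P=70.0% R=50.0% F=58.3%
```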
The number of evaluation scores may be plural, and the number of evaluation scores is the same as the number of model parameters to be clustered. Since the clustering model includes N model parameters, it is necessary to calculate respective evaluation scores according to respective clustering results and evaluation score calculation formulas, respectively. For example, 10 model parameters exist, then after executing step 101, corresponding 10 clustering results are obtained, and after executing step 102, corresponding 10 evaluation scores are obtained.
For example, within the parameter range [0.5, 0.9], different model parameters are selected in turn according to the preset acquisition process and the corresponding evaluation score is calculated for each. If the preset precision is 0.0001, the model parameter starts at 0.5 written to four decimal places, that is 0.5000; the evaluation score of the clustering model with parameter 0.5000 is calculated and stored. The parameter is then increased by the preset precision of 0.0001 to 0.5001, 0.5002, …, 0.9000, and an evaluation score is calculated after every increase. The preset precision may instead be set to 0.1; stepping through [0.5, 0.9] by 0.1 gives N = 5 model parameters, and calculating an evaluation score for each gives 5 evaluation scores.
Step 103, selecting model parameters corresponding to the M highest evaluation scores, wherein M is greater than or equal to 1, and M is less than N.
Specifically, the N evaluation scores obtained through calculation are sequenced; and counting the number of the model parameters corresponding to the highest evaluation score from the N evaluation scores to obtain M model parameters corresponding to the highest evaluation score.
Specifically, once step 102 has produced as many evaluation scores as there are model parameters, all of the evaluation scores can be sorted, for example from high to low or from low to high.
For example, suppose N = 10 evaluation scores are obtained: 50%, 75%, 57.5%, 95.5%, 60%, 80%, 70%, 98%, 66% and 86%. Sorted from high to low they are: 98%, 95.5%, 86%, 80%, 75%, 70%, 66%, 60%, 57.5%, 50%. The top-ranked evaluation score is then selected from the sorted queue; the highest evaluation score is 98%, and M is equal to 1.
As another example, suppose the 10 evaluation scores are 50%, 75%, 57.5%, 95.5%, 98%, 80%, 70%, 98%, 66% and 86%. Sorted from high to low they are: 98%, 98%, 95.5%, 86%, 80%, 75%, 70%, 66%, 57.5%, 50%. The top-ranked evaluation scores are then selected from the sorted queue; here the highest evaluation score, 98%, occurs twice, so M is equal to 2.
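The sorting and counting in step 103 might look like the following Python sketch. The ten scores are those of the second example; the model parameters attached to them (0.50 to 0.95) are hypothetical, since the description does not say which parameter produced which score, and `select_best` is an illustrative name.

```python
def select_best(scores):
    """scores maps model parameter -> evaluation score (in percent).

    Returns the parameters that share the highest score, so M = len(result).
    """
    # Sort from high to low; Python's sort is stable, so tied parameters
    # keep their original relative order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_score = ranked[0][1]
    return [param for param, score in ranked if score == best_score]

# Hypothetical parameters paired with the evaluation scores of the example.
scores = {
    0.50: 50.0, 0.55: 75.0, 0.60: 57.5, 0.65: 95.5, 0.70: 98.0,
    0.75: 80.0, 0.80: 70.0, 0.85: 98.0, 0.90: 66.0, 0.95: 86.0,
}
best = select_best(scores)
print(best)       # [0.7, 0.85]
print(len(best))  # M = 2
```

Exact equality is used to detect ties here; for scores computed in floating point, comparing within a small tolerance may be a safer choice.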
Step 104, determining a corresponding clustering model according to the model parameters corresponding to the M highest evaluation scores.
Specifically, when M is equal to 1, there is only one highest evaluation score; the model parameter corresponding to it is the optimal model parameter among the N model parameters, and the clustering model corresponding to that model parameter is the optimal clustering model. If M is greater than 1, there are several optimal model parameters among the N model parameters; in that case only one of them needs to be selected, and correspondingly any one of the corresponding optimal clustering models is selected.
Note that the evaluation score F-Measure depends on two indexes, the recall rate R and the accuracy rate P. With the accuracy unchanged, the more face feature pictures in the target sample that are recalled, the better; likewise, with the recall rate unchanged, the higher the accuracy, the better. The model parameter to be selected is the one with the highest F-Measure, and the clustering model obtained from that model parameter is the clustering model to be selected.
In the embodiment of the invention, a target sample is input into a clustering model for clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1; the clustering result corresponding to each model parameter is compared with the target sample, and the evaluation score of the clustering result corresponding to each model parameter is calculated according to the comparison result; the model parameters corresponding to the M highest evaluation scores are selected, wherein M is greater than or equal to 1 and M is less than N; and a corresponding clustering model is determined according to the model parameters corresponding to the M highest evaluation scores. In this way, the evaluation scores of the clustering results of the target sample under different model parameters are calculated and compared, the model parameter with the highest evaluation score is selected, and the corresponding clustering model is determined from that model parameter, so that the optimal clustering model is selected automatically and quickly. When data clustering is performed, the optimal clustering model can be selected to cluster the data, which effectively improves data clustering efficiency.
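Steps 101 to 104 can be tied together in a small end-to-end Python sketch. Everything in it is an assumption made for illustration: the clustering model is a toy single-link clustering with a similarity threshold as its model parameter, the similarity is taken as 1 minus the absolute difference of one-dimensional feature values, and the evaluation score is a pairwise F-Measure over all sample pairs rather than the per-person calculation of the example above.

```python
from itertools import combinations

def cluster_by_threshold(features, threshold):
    """Toy clustering model: link two samples when similarity >= threshold."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        similarity = 1.0 - abs(features[i] - features[j])
        if similarity >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(features))]

def pairwise_f_measure(predicted, truth):
    """F-Measure over sample pairs placed in the same cluster."""
    pairs = list(combinations(range(len(truth)), 2))
    same_pred = {p for p in pairs if predicted[p[0]] == predicted[p[1]]}
    same_true = {p for p in pairs if truth[p[0]] == truth[p[1]]}
    correct = len(same_pred & same_true)
    precision = correct / len(same_pred) if same_pred else 0.0
    recall = correct / len(same_true) if same_true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def select_parameters(features, truth, parameters):
    """Steps 101-103: cluster with every parameter, score, keep the best ones."""
    scores = {p: pairwise_f_measure(cluster_by_threshold(features, p), truth)
              for p in parameters}
    best_score = max(scores.values())
    return [p for p in parameters if scores[p] == best_score], scores

# Target sample: feature values plus their 100% correct class identifiers.
features = [0.10, 0.14, 0.30, 0.62, 0.66, 0.82]
truth = [0, 0, 0, 1, 1, 1]
best, scores = select_parameters(features, truth, [0.5, 0.6, 0.7, 0.8, 0.9])
print(best)  # [0.7, 0.8]
```

On this toy data two parameters tie for the highest score (M = 2), so, as described for step 104, either of the two corresponding clustering models could be chosen.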
Referring to fig. 3, fig. 3 is a flow chart of a method provided by step 101 in the embodiment of fig. 1, where the clustering model includes a first-layer clustering model and a second-layer clustering model, and the model parameters comprise first model parameters corresponding to the first-layer clustering model and second model parameters corresponding to the second-layer clustering model. Step 101 comprises:
Step 201, inputting a target sample into the first-layer clustering model to perform clustering to obtain a first clustering result, wherein the first-layer clustering model comprises N first model parameters, and N is an integer greater than or equal to 1.
Step 202, inputting the first clustering result into a second-layer clustering model to perform clustering to obtain a second clustering result, wherein the second-layer clustering model comprises N second model parameters, and N is an integer greater than or equal to 1.
Step 203, taking the second clustering result as the clustering result of the clustering model.
The first-layer clustering model is K-means (the K-means clustering algorithm), an iterative clustering analysis algorithm. The second-layer clustering model is DBSCAN (Density-Based Spatial Clustering of Applications with Noise), a representative density-based clustering algorithm. The first clustering result is obtained by inputting the target sample into the first-layer clustering model for clustering. The second clustering result is obtained by inputting the first clustering result into the second-layer clustering model for fine clustering. The first-layer clustering model performs coarse clustering on the target sample, and the second-layer clustering model then performs fine clustering on the first clustering result, so that the most effective clustering result is obtained.
The first model parameters and the second model parameters are the parameters of the first-layer clustering model and the second-layer clustering model, respectively. A plurality of first model parameters and second model parameters may be provided so that the first-layer and second-layer clustering models can be run with any of them. The first model parameter influences the first clustering result and the second model parameter influences the second clustering result. Because both are adjustable, the method can be adapted to usage scenarios that require different clustering precision, and clustering models with different first and second model parameters can be selected according to actual needs. The N first model parameters and the N second model parameters correspond to N first-layer clustering models and N second-layer clustering models respectively, and the user can choose, as required, the first-layer clustering model corresponding to a suitable first model parameter and the second-layer clustering model corresponding to a suitable second model parameter to cluster the data.
It should be noted that the parameter ranges of the first model parameter and the second model parameter may be the same, for example both [0.5, 1], or different, for example [0.5, 1] and [0.5, 0.9] respectively. The first model parameter and the second model parameter may have the same value, for example both 0.6, or different values, for example 0.6 and 0.7. Pairing different first and second model parameters, so that the first-layer and second-layer clustering models work together, may yield a better evaluation score for the clustering result. The optimal first model parameter and the optimal second model parameter may also be selected to obtain an optimal first-layer clustering model and an optimal second-layer clustering model, so that the evaluation score of the overall clustering model is optimal.
The parameter range of the first model parameter may be between a preset third model parameter threshold and a preset fourth model parameter threshold. The parameter range of the second model parameter may be between a preset fifth model parameter threshold and a preset sixth model parameter threshold.
The process of acquiring the N first model parameters and the N second model parameters may be the same as the process of acquiring the N model parameters in step 101.
Specifically, the target sample is input into the first-layer clustering model (K-means) for clustering, and a corresponding first clustering result is obtained. The first clustering result is then used as the input data of the second-layer clustering model (DBSCAN), which clusters the target sample again to obtain a second clustering result. At this point, the second clustering result is the clustering result of the whole clustering model, and the method may proceed to step 102. For example, 50,000 face feature values from community A are used as the target sample, which is the same as the target sample in step 101. The 50,000 face feature values are input to K-means for coarse clustering. K-means has a K value that represents the number of partitions after clustering, so K classes exist after clustering; for example, the 50,000 face feature values are first clustered into 5 partitions of 10,000 faces each. DBSCAN second-layer fine clustering is then performed on the K cluster groups; for example, each of the 5 partitions is input into DBSCAN for a second, fine clustering, and the second clustering result is finally obtained.
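The coarse-then-fine flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `coarse_partition` is a single nearest-center assignment standing in for K-means, `fine_cluster` is a distance-threshold chaining standing in for DBSCAN, and the function names, the `eps` parameter, and the one-dimensional toy data are all hypothetical.

```python
from collections import defaultdict


def coarse_partition(points, centers):
    """First layer (stand-in for K-means): assign each point index to its nearest center."""
    partitions = defaultdict(list)
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
        partitions[nearest].append(idx)
    return partitions


def fine_cluster(points, indices, eps):
    """Second layer (stand-in for DBSCAN): chain neighbours whose gap is at most eps."""
    ordered = sorted(indices, key=lambda i: points[i])
    groups, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if points[cur] - points[prev] <= eps:
            current.append(cur)
        else:
            groups.append(current)
            current = [cur]
    groups.append(current)
    return groups


def two_layer_cluster(points, centers, eps):
    """Run the coarse layer, then the fine layer inside each partition."""
    labels = [None] * len(points)
    next_label = 0
    for _, indices in sorted(coarse_partition(points, centers).items()):
        for group in fine_cluster(points, indices, eps):
            for i in group:
                labels[i] = next_label
            next_label += 1
    return labels


# Toy 1-D "feature values"; two coarse partitions, each refined separately.
points = [0.0, 0.1, 0.9, 5.0, 5.1, 9.0]
print(two_layer_cluster(points, centers=[0.0, 5.0], eps=0.3))
```

Here the number of centers plays the role of the K value and `eps` plays the role of the second-layer parameter; in practice each layer would be replaced by a real K-means and DBSCAN implementation operating on face feature vectors.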
In the embodiment of the invention, the target sample is input into the first-layer clustering model for clustering, the first clustering result is input into the second-layer clustering model for fine clustering, and the second clustering result is used as the clustering result of the clustering model. Further, the parameters of the first-layer clustering model and the second-layer clustering model are adjusted by setting the first model parameters and the second model parameters, so that the model parameters of the whole clustering model are adjusted. Clustering is performed with the first-layer clustering model corresponding to each first model parameter and the second-layer clustering model corresponding to each second model parameter, and the evaluation score of the clustering model is calculated from the clustering result. The optimal first model parameter and the optimal second model parameter are selected, or the optimal combination obtained by pairing first and second model parameters is selected. Correspondingly, the optimal first-layer clustering model and the optimal second-layer clustering model are selected, or the optimal clustering model obtained by pairing a first-layer clustering model with a second-layer clustering model is selected. Clustering the data with the selected optimal clustering model improves the clustering effect.
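Selecting the best pairing of a first model parameter with a second model parameter amounts to an exhaustive search over all pairs. The sketch below assumes a caller-supplied `score_fn(first_param, second_param)` that runs the two-layer clustering and returns its evaluation score; that function, the name `best_parameter_pair`, and the toy scoring rule are hypothetical and only illustrate the search.

```python
from itertools import product


def best_parameter_pair(first_params, second_params, score_fn):
    """Return the (first_param, second_param) pair with the highest evaluation score."""
    return max(product(first_params, second_params), key=lambda pair: score_fn(*pair))


def toy_score(first_param, second_param):
    """Hypothetical stand-in for 'cluster, then evaluate'; peaks at (0.6, 0.7)."""
    return 1.0 - abs(first_param - 0.6) - abs(second_param - 0.7)


first_params = [0.5, 0.6, 0.7]             # e.g. drawn from the range [0.5, 1]
second_params = [0.5, 0.6, 0.7, 0.8, 0.9]  # e.g. drawn from the range [0.5, 0.9]
print(best_parameter_pair(first_params, second_params, toy_score))
```

The search costs one clustering run per pair, so the number of first and second model parameters directly controls the total cost.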
Referring to fig. 4, fig. 4 shows a device for selecting a cluster model according to an embodiment of the present invention, where the device 300 for selecting a cluster model includes:
The clustering module 301 is configured to input a target sample into a clustering model to perform clustering, so as to obtain a clustering result, where the clustering model includes N model parameters, and N is an integer greater than 1.
The evaluation score calculating module 302 is configured to compare the clustering result corresponding to each model parameter with the target sample, and calculate an evaluation score of the clustering result corresponding to each model parameter according to the comparison result.
The model parameter selection module 303 is configured to select model parameters corresponding to M highest evaluation scores, where M is greater than or equal to 1 and M is less than N.
The cluster model determining module 304 is configured to determine a corresponding cluster model according to model parameters corresponding to the M highest evaluation scores.
Referring to fig. 5, fig. 5 is a schematic structural diagram of the clustering module in the embodiment of fig. 4. The clustering module 301 includes:
The first clustering unit 3011 is configured to input a target sample into a first-layer clustering model to perform clustering, so as to obtain a first clustering result, where the first-layer clustering model includes N first model parameters, and N is an integer greater than or equal to 1.
The second clustering unit 3012 is configured to input the first clustering result into a second-layer clustering model to perform clustering, so as to obtain a second clustering result, where the second-layer clustering model includes N second model parameters, and N is an integer greater than or equal to 1.
The cluster result determining unit 3013 is configured to take the second cluster result as a cluster result of the cluster model.
Optionally, the evaluation score calculating module 302 includes:
The recall rate calculation unit is used for calculating the recall rate of the clustering result corresponding to each model parameter according to the comparison result.
The accuracy rate calculation unit is used for calculating the accuracy rate of the clustering result corresponding to each model parameter.
The evaluation score calculation unit is used for calculating the evaluation score of the clustering result corresponding to each model parameter according to the recall rate and the accuracy rate.
Optionally, the evaluation score is calculated according to the following formula: the evaluation score is equal to twice the product of recall and accuracy divided by the sum of recall and accuracy.
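The formula above has the form of the F1 score, with the accuracy rate playing the role of precision. The following sketch computes it from a clustering result; it assumes the target sample carries known identity labels and that recall and accuracy are measured over pairs of samples, neither of which is specified in the text. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_precision_recall(predicted, truth):
    """Compare every pair of samples: same predicted cluster vs. same true identity."""
    same_pred = same_true = same_both = 0
    for i, j in combinations(range(len(truth)), 2):
        p = predicted[i] == predicted[j]
        t = truth[i] == truth[j]
        same_pred += p
        same_true += t
        same_both += p and t
    precision = same_both / same_pred if same_pred else 0.0
    recall = same_both / same_true if same_true else 0.0
    return precision, recall


def evaluation_score(recall, precision):
    """Twice the product of recall and accuracy, divided by their sum."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


truth = [0, 0, 0, 1, 1]        # known identities (assumed available)
predicted = [0, 0, 1, 1, 1]    # cluster labels produced for one model parameter
precision, recall = pairwise_precision_recall(predicted, truth)
print(precision, recall, evaluation_score(recall, precision))  # 0.5 0.5 0.5
```

Because the score is a harmonic mean, it is high only when both the recall rate and the accuracy rate are high, which penalizes parameters that merge too many identities or split them too finely.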
Optionally, the model parameter selection module 303 includes:
The sorting unit is used for sorting the N calculated evaluation scores.
The model parameter selecting unit is used for taking, from the N sorted evaluation scores, the model parameters corresponding to the highest evaluation scores, so as to obtain the model parameters corresponding to the M highest evaluation scores.
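The sorting and selecting units can be illustrated with a short sketch. The function name `select_top_m` and the score values are made up for illustration; only the constraint that M is at least 1 and less than N comes from the text.

```python
def select_top_m(scores_by_param, m):
    """Sort the N evaluation scores and keep the parameters behind the M highest."""
    n = len(scores_by_param)
    if not 1 <= m < n:
        raise ValueError("M must satisfy 1 <= M < N")
    ranked = sorted(scores_by_param.items(), key=lambda item: item[1], reverse=True)
    return [param for param, _ in ranked[:m]]


# Illustrative scores only: model parameter -> evaluation score.
scores = {0.5: 0.71, 0.6: 0.88, 0.7: 0.93, 0.8: 0.85}
print(select_top_m(scores, 2))  # [0.7, 0.6]
```

Each returned parameter identifies one candidate clustering model, so M greater than 1 leaves the final choice among several well-scoring models to the user.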
Optionally, the parameter range of the model parameter is between a preset first model parameter threshold value and a preset second model parameter threshold value.
Optionally, the acquiring process of the N model parameters is:
Within the parameter range of the model parameters, starting from the first model parameter threshold value, the model parameter is increased step by step with a preset precision until it reaches the second model parameter threshold value, where a new model parameter is obtained at each increase.
The number of model parameters between the first model parameter threshold value and the second model parameter threshold value is then counted, finally obtaining the N model parameters.
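The acquisition process can be sketched as a stepped sweep. The sketch assumes that both threshold values are themselves included as model parameters, which the text does not state explicitly, and uses `Decimal` so that repeated additions of a step such as 0.1 do not drift; the function name is hypothetical.

```python
from decimal import Decimal


def generate_model_parameters(first_threshold, second_threshold, precision):
    """Step from the first threshold to the second threshold by the preset precision."""
    low, high, step = (Decimal(str(v)) for v in (first_threshold, second_threshold, precision))
    if step <= 0 or low > high:
        raise ValueError("precision must be positive and thresholds ordered")
    params = []
    value = low
    while value <= high:
        params.append(float(value))
        value += step
    return params


params = generate_model_parameters(0.5, 1, 0.1)
print(params, "N =", len(params))  # [0.5, 0.6, 0.7, 0.8, 0.9, 1.0] N = 6
```

A finer precision yields more candidate parameters and therefore more clustering runs to evaluate.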
The device for selecting the cluster model provided by the embodiment of the invention can implement each implementation of the method embodiment and achieve the corresponding beneficial effects; to avoid repetition, details are not repeated here.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 400 includes a memory 402, a processor 401, and a computer program stored in the memory 402 and executable on the processor 401. When the processor 401 executes the computer program, the steps of the cluster model selection method provided in the above embodiment are implemented, and the processor 401 performs the following steps:
Inputting the target sample into a clustering model to perform clustering to obtain a clustering result, wherein the clustering model comprises N model parameters, and N is an integer greater than 1.
Comparing the clustering result corresponding to each model parameter with the target sample, and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result.
Selecting model parameters corresponding to M highest evaluation scores, wherein M is greater than or equal to 1, and M is less than N.
Determining a corresponding clustering model according to the model parameters corresponding to the M highest evaluation scores.
Optionally, the clustering model includes a first-layer clustering model and a second-layer clustering model, and the model parameters comprise first model parameters corresponding to the first-layer clustering model and second model parameters corresponding to the second-layer clustering model. The step, performed by the processor 401, of inputting the target sample into the clustering model for clustering to obtain a clustering result includes:
Inputting the target sample into the first-layer clustering model to perform clustering to obtain a first clustering result, wherein the first-layer clustering model comprises N first model parameters, and N is an integer greater than or equal to 1.
Inputting the first clustering result into the second-layer clustering model to perform clustering to obtain a second clustering result, wherein the second-layer clustering model comprises N second model parameters, and N is an integer greater than or equal to 1.
Taking the second clustering result as the clustering result of the clustering model.
Optionally, the step performed by the processor 401 to calculate the evaluation score of the clustering result corresponding to each model parameter according to the comparison result includes:
Calculating the recall rate of the clustering result corresponding to each model parameter according to the comparison result.
Calculating the accuracy rate of the clustering result corresponding to each model parameter.
Calculating the evaluation score of the clustering result corresponding to each model parameter according to the recall rate and the accuracy rate.
Optionally, the evaluation score is equal to twice the product of recall and accuracy divided by the sum of recall and accuracy.
Optionally, the step of selecting model parameters corresponding to the M highest evaluation scores performed by the processor 401 includes:
Sorting the N calculated evaluation scores.
Taking, from the N sorted evaluation scores, the model parameters corresponding to the highest evaluation scores to obtain the model parameters corresponding to the M highest evaluation scores.
Optionally, the parameter range of the model parameter is between a preset first model parameter threshold value and a preset second model parameter threshold value.
Optionally, the acquiring process of the N model parameters is:
Within the parameter range of the model parameters, starting from the first model parameter threshold value, increasing step by step with a preset precision until the second model parameter threshold value is reached, wherein a new model parameter is obtained at each increase;
Counting the number of model parameters between the first model parameter threshold value and the second model parameter threshold value, finally obtaining the N model parameters.
The electronic device 400 provided in the embodiment of the present invention can implement each implementation manner and corresponding beneficial effects in the foregoing method embodiment, and in order to avoid repetition, details are not repeated here.
The embodiment of the invention also provides a computer readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the cluster model selection method provided by the embodiment of the invention is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer readable storage medium, and that the program, when executed, may comprise the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (7)
1. A method for selecting a clustering model, characterized by comprising the following steps:
Inputting a face picture characteristic value into a clustering model to perform clustering to obtain a clustering result of the face picture characteristic value, wherein the clustering model comprises N model parameters, N is an integer larger than 1, the clustering model is a model for clustering face pictures, the clustering model comprises a first layer of clustering model and a second layer of clustering model, the model parameters comprise a first model parameter corresponding to the first layer of clustering model and a second model parameter corresponding to the second layer of clustering model, and the parameter range of the model parameters is between a preset first model parameter threshold value and a preset second model parameter threshold value;
Comparing the clustering result corresponding to each model parameter with the face picture characteristic value, and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result;
selecting model parameters corresponding to M highest evaluation scores, wherein M is greater than or equal to 1, and M is less than N;
Determining a corresponding clustering model according to model parameters corresponding to the M highest evaluation scores;
Inputting the face picture feature values into a clustering model for clustering, wherein the step of obtaining the clustering result of the face picture feature values comprises the following steps:
inputting the face picture characteristic value into the first layer clustering model to perform clustering to obtain a first clustering result, wherein the first layer clustering model comprises N first model parameters, and N is an integer greater than or equal to 1;
inputting the first clustering result into the second-layer clustering model to perform clustering to obtain a second clustering result, wherein the second-layer clustering model comprises N second model parameters, and N is an integer greater than or equal to 1;
Taking the second clustering result as a clustering result of a clustering model;
The process for acquiring the N model parameters comprises the following steps:
sequentially increasing from the first model parameter threshold value in a preset precision in the parameter range of the model parameters until the increase to the second model parameter threshold value is finished, wherein a new model parameter is obtained after each increase;
And counting the number of the model parameters between the first model parameter threshold value and the second model parameter threshold value, and finally obtaining N model parameters.
2. The method for selecting a clustering model according to claim 1, wherein the step of calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result comprises:
calculating the recall rate of the clustering result corresponding to each model parameter according to the comparison result;
calculating the accuracy of the clustering result corresponding to each model parameter;
and calculating the evaluation score of the clustering result corresponding to each model parameter according to the recall rate and the accuracy rate.
3. The method for selecting a cluster model according to claim 2, wherein the evaluation score has a calculation formula: an evaluation score is equal to twice the product of the recall and the accuracy divided by the sum of the recall and the accuracy.
4. The method for selecting a cluster model according to claim 1, wherein the selecting of the model parameters corresponding to the M highest evaluation scores includes:
sorting the N evaluation scores obtained by calculation;
and taking, from the N sorted evaluation scores, the model parameters corresponding to the highest evaluation scores to obtain the model parameters corresponding to the M highest evaluation scores.
5. A device for selecting a cluster model, the device comprising:
the clustering module is used for inputting the characteristic values of the face pictures into a clustering model to perform clustering to obtain a clustering result of the characteristic values of the face pictures, the clustering model is a model for clustering the face pictures, the clustering model comprises N model parameters, N is an integer larger than 1, the clustering model comprises a first layer of clustering model and a second layer of clustering model, the model parameters comprise a first model parameter corresponding to the first layer of clustering model and a second model parameter corresponding to the second layer of clustering model, and the parameter range of the model parameters is between a preset first model parameter threshold value and a preset second model parameter threshold value;
the evaluation score calculation module is used for comparing the clustering result corresponding to each model parameter with the face picture characteristic value and calculating the evaluation score of the clustering result corresponding to each model parameter according to the comparison result;
The model parameter selection module is used for selecting model parameters corresponding to M highest evaluation scores, wherein M is greater than or equal to 1, and M is smaller than N;
the cluster model determining module is used for determining a corresponding cluster model according to model parameters corresponding to the M highest evaluation scores;
the first clustering unit is used for inputting the target sample into a first layer of clustering model to perform clustering to obtain a first clustering result, wherein the first layer of clustering model comprises N first model parameters, and N is an integer greater than or equal to 1;
The second clustering unit is used for inputting the first clustering result into a second-layer clustering model to perform clustering to obtain a second clustering result, wherein the second-layer clustering model comprises N second model parameters, and N is an integer greater than or equal to 1;
the clustering result determining unit is used for taking the second clustering result as a clustering result of the clustering model;
The parameter selection module is further configured to sequentially increment from the first model parameter threshold value with a preset precision in a parameter range of the model parameter until the increment is completed after reaching the second model parameter threshold value, where a new model parameter is obtained once per increment; and counting the number of the model parameters between the first model parameter threshold value and the second model parameter threshold value, and finally obtaining N model parameters.
6. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the method of selecting a cluster model according to any one of claims 1 to 4 when the computer program is executed.
7. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the method of selecting a cluster model according to any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911414989.XA CN113128535B (en) | 2019-12-31 | 2019-12-31 | Cluster model selection method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911414989.XA CN113128535B (en) | 2019-12-31 | 2019-12-31 | Cluster model selection method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113128535A CN113128535A (en) | 2021-07-16 |
CN113128535B true CN113128535B (en) | 2024-07-02 |
Family
ID=76770498
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911414989.XA Active CN113128535B (en) | 2019-12-31 | 2019-12-31 | Cluster model selection method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113128535B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330452A (en) * | 2017-06-16 | 2017-11-07 | 悦享趋势科技(北京)有限责任公司 | Clustering method and device |
CN110618082A (en) * | 2019-10-29 | 2019-12-27 | 中国石油大学(北京) | Reservoir micro-pore structure evaluation method and device based on neural network |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB0204474D0 (en) * | 2002-02-26 | 2002-04-10 | Canon Kk | Speech recognition system |
US20130097103A1 (en) * | 2011-10-14 | 2013-04-18 | International Business Machines Corporation | Techniques for Generating Balanced and Class-Independent Training Data From Unlabeled Data Set |
CN105868243A (en) * | 2015-12-14 | 2016-08-17 | 乐视网信息技术(北京)股份有限公司 | Information processing method and apparatus |
CN106572493B (en) * | 2016-10-28 | 2018-07-06 | 南京华苏科技有限公司 | Rejecting outliers method and system in LTE network |
CN107844865A (en) * | 2017-11-20 | 2018-03-27 | 天津科技大学 | Feature based parameter chooses the stock index prediction method with LSTM models |
US20190166024A1 (en) * | 2017-11-24 | 2019-05-30 | Institute For Information Industry | Network anomaly analysis apparatus, method, and non-transitory computer readable storage medium thereof |
CN108280477B (en) * | 2018-01-22 | 2021-12-10 | 百度在线网络技术(北京)有限公司 | Method and apparatus for clustering images |
CN109104731B (en) * | 2018-07-04 | 2021-10-29 | 广东海格怡创科技有限公司 | Method and device for building cell scene category division model and computer equipment |
CN109472453B (en) * | 2018-10-12 | 2021-09-21 | 山大地纬软件股份有限公司 | Power consumer credit evaluation method based on global optimal fuzzy kernel clustering model |
Also Published As
Publication number | Publication date |
---|---|
CN113128535A (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106021364B (en) | Foundation, image searching method and the device of picture searching dependency prediction model | |
US12167089B2 (en) | Method for pushing anchor information, computer device, and storage medium | |
WO2019015246A1 (en) | Image feature acquisition | |
CN108898479B (en) | Credit evaluation model construction method and device | |
CN109299344A (en) | Generation method of ranking model, and ranking method, device and equipment of search results | |
CN103473327A (en) | Image retrieval method and image retrieval system | |
CN116109195B (en) | Performance evaluation method and system based on graph convolution neural network | |
CN112396428B (en) | User portrait data-based customer group classification management method and device | |
CN110348516B (en) | Data processing method, data processing device, storage medium and electronic equipment | |
CN110990576A (en) | Intention classification method based on active learning, computer device and storage medium | |
CN112785566B (en) | Metaphase image scoring method, metaphase image scoring device, electronic equipment and storage medium | |
WO2018006631A1 (en) | User level automatic segmentation method and system | |
CN110992124A (en) | House resource recommendation method and system | |
CN114048148A (en) | Crowdsourcing test report recommendation method and device and electronic equipment | |
CN113536020A (en) | Method, storage medium and computer program product for data query | |
CN118673058A (en) | Data modeling system and method based on real world research data | |
CN115982144A (en) | Similar text duplicate removal method and device, storage medium and electronic device | |
CN116089639A (en) | Auxiliary three-dimensional modeling method, system, device and medium | |
CN111078859A (en) | Author recommendation method based on reference times | |
CN108073567B (en) | Feature word extraction processing method, system and server | |
CN116468102B (en) | Tool image classification model pruning method, device, and computer equipment | |
CN113128535B (en) | Cluster model selection method and device, electronic equipment and storage medium | |
CN113486202A (en) | Method for classifying small sample images | |
CN113379004A (en) | Data table classification method and device, electronic equipment and storage medium | |
CN111639712A (en) | Positioning method and system based on density peak clustering and gradient lifting algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |