CN106874367A - Sampling distributed clustering method based on a public opinion platform - Google Patents
- Publication number
- CN106874367A
- Authority
- CN
- China
- Prior art keywords
- data
- cluster
- sampling
- clustering method
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention provides a sampling distributed clustering method based on a public opinion platform, comprising the following steps: 1. obtain the data to be clustered and shard it to obtain multiple shards; 2. sample the data of each shard using the Map function of MapReduce; 3. collect the sampled data and cluster the collected samples in the Reduce phase of the MapReduce framework; 4. repeat steps 2 and 3 in sequence for a total of r rounds of sampling, record the clustering result of each round's sampled data as a base clustering, and obtain the vector Π = {π_1, π_2, ..., π_r}, where r is a positive integer greater than or equal to 2, π_i is the base clustering of the i-th round, and i is a positive integer with 1 ≤ i ≤ r; 5. use the MapReduce framework again to integrate the base clusterings into the final clustering result. By reducing the data scale while increasing data diversity, the method can effectively improve the efficiency of clustering massive data.
Description
Technical field
The invention belongs to the field of data mining and machine learning, and in particular relates to a sampling distributed clustering method based on a public opinion platform.
Background
Data clustering operates on the similarity between data sample points, so that highly similar sample points fall into the same cluster while sample points with low similarity are kept apart. Clustering has always been one of the important methods in data mining and machine learning. However, with the development of the Internet, and in particular the explosive growth of user-generated content brought by Web 2.0, data volume has become the bottleneck of traditional clustering methods. This is especially true of text data in application fields such as news recommendation, machine translation, literature retrieval, sentiment analysis and public opinion monitoring, which is high-dimensional and sparse. How to improve the efficiency of clustering algorithms, particularly for high-dimensional sparse data, has become an urgent problem in Internet big data mining.
Therefore, it is necessary to provide a sampling distributed clustering method based on a public opinion platform that can improve the efficiency of clustering high-dimensional sparse data.
Summary of the invention
The object of the invention is to provide a sampling distributed clustering method based on a public opinion platform that can improve the efficiency of clustering high-dimensional sparse data.
The technical solution is as follows. A sampling distributed clustering method based on a public opinion platform comprises the following steps: 1. obtain the data to be clustered and shard it to obtain multiple shards; 2. sample the data of each shard using the Map function of MapReduce; 3. collect the sampled data and cluster the collected samples in the Reduce phase of the MapReduce framework; 4. repeat steps 2 and 3 in sequence for a total of r rounds of sampling, record the clustering result of each round's sampled data as a base clustering, and obtain the vector Π = {π_1, π_2, ..., π_r}, where r is a positive integer greater than or equal to 2, π_i is the base clustering of the i-th round, and i is a positive integer with 1 ≤ i ≤ r; 5. use the MapReduce framework again to integrate the base clusterings into the final clustering result.
Preferably, in step 1, the data to be clustered is split horizontally, the integrity of every record is preserved during splitting, and the resulting shards are stored in a distributed file system.
Preferably, the sampling in step 2 meets at least the following requirements: the sampling technique itself is simple enough, sampling is carried out on local data, and the sampling results have a certain degree of randomness.
Preferably, in step 3, the sampling round is used as the key and the sampled data as the value; the shuffle function gathers them into a single Reduce function of MapReduce, and the sampled data is clustered in that Reduce function.
Preferably, step 5 comprises the following steps: randomly select a certain number of base clusterings as centroids; use the Map function to compute the distance between each of the other base clusterings and the centroids, and assign each base clustering to the cluster of its nearest centroid; update the cluster centroids in the Reduce function; and repeat this process until the cluster centroids no longer change.
Preferably, let z_k denote the centroid of the k-th cluster over the base clustering vector Π, described as an rk-dimensional vector:
[equation and symbol definitions missing from the source text]
Preferably, the vector Π is described as an rk-dimensional vector x_l, and the cosine distance between x_l and z_k is:
[equation missing from the source text]
where w_i denotes the weight of the i-th base clustering and takes the value 1/r when no prior knowledge is available.
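Preferably, the centroid z_k is updated using the following formula:
[equation missing from the source text]
where one term of the formula is a constant vector over Π and another denotes the number of instances in the k-th cluster of the i-th base clustering (the symbols themselves are missing from the source text).
For the norms that appear in the formula: given a d-dimensional real vector y, ||y||_p denotes the L_p norm of y, i.e., $\|y\|_p = \left(\sum_{j=1}^{d} |y_j|^p\right)^{1/p}$.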
The technical solution provided by the invention has the following beneficial effects:
The sampling distributed clustering method based on a public opinion platform reduces the data scale through sampling, increases the diversity of the base clustering results through multiple rounds of sampling, and then defines a cosine distance to integrate the base clustering results into the final clustering result; it can therefore effectively improve the efficiency of clustering massive data.
Moreover, introducing sampling increases data diversity while reducing the data scale, and a two-stage clustering process is then designed on a distributed computing framework, providing an effective way to improve the quality and efficiency of clustering for public opinion analysis on Internet big data.
Brief description of the drawings
Fig. 1 is a flow chart of the sampling distributed clustering method based on a public opinion platform provided by an embodiment of the present invention.
Detailed description
To make the purpose, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawing and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
Unless the context clearly indicates otherwise, the elements and components of the invention may be present in the singular or in the plural, and the invention imposes no limitation in this respect. Although the steps of the invention are labelled in order, the labels do not fix the order of the steps; unless the order is expressly stated, or the execution of one step depends on another, the relative order of the steps may be adjusted. The term "and/or" as used here covers any and all possible combinations of one or more of the associated listed items.
Referring to Fig. 1, the sampling distributed clustering method 100 based on a public opinion platform provided by an embodiment of the present invention comprises the following steps:
S1. Obtain the data to be clustered and shard it to obtain multiple shards.
In step S1, the data to be clustered is split horizontally into a number of shards (sharding), and the integrity of every record (for example, a news text) must be preserved during splitting. The resulting shards are stored in a distributed file system such as HDFS. The shard size is determined by the chosen distributed file system; in HDFS, for example, each shard is 64 MB. By accessing the distributed file system, compute nodes can share the shards, and localized computation effectively reduces I/O consumption.
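The horizontal split in step S1 can be illustrated with a minimal single-process sketch. It only shows the "never split a record" rule; the function name `shard_records` and the byte budget are hypothetical, and a real deployment would rely on the distributed file system rather than on this loop.

```python
from typing import Iterable, Iterator, List


def shard_records(records: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Horizontally split records into shards of at most max_bytes each.

    A record is never split: if it does not fit in the current shard,
    it starts a new one. A single record larger than max_bytes gets a
    shard of its own.
    """
    shard: List[str] = []
    size = 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        if shard and size + record_size > max_bytes:
            yield shard
            shard, size = [], 0
        shard.append(record)
        size += record_size
    if shard:
        yield shard


# Toy usage with a tiny byte budget instead of 64 MB.
docs = ["news item one", "a much longer news item two", "item three"]
shards = list(shard_records(docs, max_bytes=30))
```

Because records are appended whole, concatenating the shards in order reproduces the input exactly.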
S2. Sample the data of each shard using the Map function of MapReduce.
Specifically, in step S2, sampling is carried out on each shard. For the sake of divide-and-conquer and efficiency, the sampling technique should meet at least the following requirements: (1) the sampling technique itself must be simple enough, otherwise it becomes a new bottleneck; (2) sampling must be possible on local data, without relying on a global view; (3) the sampling results should have a certain degree of randomness.
Any sampling method that meets the above requirements can be applied in the present invention, and no specific limitation is imposed. In step S2, the sampling operation is implemented by the Map function of the MapReduce framework; this is denoted the first-stage Map process.
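One sampling scheme that satisfies the three requirements above is per-record Bernoulli sampling. The sketch below is an illustration under that assumption, not the method prescribed by the patent; `map_sample`, `rate` and the seeding scheme are hypothetical names and choices.

```python
import random
from typing import Iterator, List, Tuple


def map_sample(shard: List[str], round_id: int, rate: float,
               seed: int) -> Iterator[Tuple[int, str]]:
    """Map step: Bernoulli-sample one shard using only local data.

    Emits (round_id, record) pairs so that a later shuffle can route
    all samples of one round to the same reducer.
    """
    rng = random.Random(seed)  # hypothetical per-shard, per-round seed
    for record in shard:
        if rng.random() < rate:  # keep each record with probability `rate`
            yield round_id, record


shard = [f"doc{i}" for i in range(1000)]
pairs = list(map_sample(shard, round_id=0, rate=0.1, seed=42))
```

The function is a single pass with constant work per record, needs no global view, and produces a different sample for each seed.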
S3. Collect the sampled data and cluster the collected samples in the Reduce phase of the MapReduce framework.
Specifically, in step S3, for the sampling results of each round, the sampling round is used as the key and the sampled data as the value; the shuffle function gathers them into a single Reduce function of MapReduce, and the sampled data is clustered in that Reduce function. This is denoted the first-stage Reduce process.
The specific clustering method includes, but is not limited to, K-means, spectral clustering and hierarchical clustering; the present invention imposes no limitation in this respect.
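As a sketch of the first-stage Reduce process, the following single-process code clusters all samples that share one round key with plain K-means (Lloyd's algorithm), one of the methods the text lists. The helper names `kmeans` and `reduce_cluster`, the seeding by round number, and the toy data are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: List[Vector], k: int, seed: int = 0,
           max_iter: int = 100) -> List[int]:
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels


def reduce_cluster(round_id: int, samples: List[Vector],
                   k: int) -> Tuple[int, List[int]]:
    """Reduce step: samples sharing one round key are clustered together."""
    return round_id, kmeans(samples, k, seed=round_id)


samples = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9)]
round_id, labels = reduce_cluster(0, samples, k=2)
```

Keying by round means each round's samples meet in exactly one reducer, so any single-machine clustering routine can be plugged in at this point.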
S4. Repeat step S2 and step S3 in sequence for a total of r rounds of sampling, record the clustering result of each round's sampled data as a base clustering, and obtain the vector Π = {π_1, π_2, ..., π_r}, where r is a positive integer greater than or equal to 2, π_i is the base clustering of the i-th round, and i is a positive integer with 1 ≤ i ≤ r.
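The r-round loop of step S4 can be sketched as follows. This is a simplified single-process illustration: `run_rounds` and `sign_cluster` are hypothetical names, the data is one-dimensional, and the trivial sign-based clusterer merely stands in for K-means or any other base clustering method.

```python
import random
from typing import Callable, Dict, List

BaseClustering = Dict[int, int]  # record id -> cluster label


def run_rounds(data: Dict[int, float], r: int, rate: float,
               cluster: Callable[[Dict[int, float]], BaseClustering]
               ) -> List[BaseClustering]:
    """Run r rounds of sample-then-cluster and collect the base clusterings."""
    if r < 2:
        raise ValueError("the method requires r >= 2 rounds")
    ensemble: List[BaseClustering] = []
    for i in range(r):
        rng = random.Random(i)  # a different seed per round varies the sample
        sample = {rid: x for rid, x in data.items() if rng.random() < rate}
        ensemble.append(cluster(sample))
    return ensemble


def sign_cluster(sample: Dict[int, float]) -> BaseClustering:
    """Hypothetical stand-in clusterer: split values by sign."""
    return {rid: int(x >= 0) for rid, x in sample.items()}


data = {i: float(i - 50) for i in range(100)}
pi = run_rounds(data, r=3, rate=0.2, cluster=sign_cluster)
```

Each element of `pi` plays the role of one π_i, and the list as a whole corresponds to Π.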
S5. Use the MapReduce framework again to integrate the base clusterings into the final clustering result.
In step S5, cluster ensembling is performed on the vector Π, with each base clustering treated as a whole, so that the distance between base clusterings can be computed.
Specifically, step S5 comprises the following steps:
A certain number of base clusterings are randomly selected as centroids. The Map function computes the distance between each of the other base clusterings and the centroids, and each base clustering is assigned to the cluster of its nearest centroid; the cluster centroids are then updated in the Reduce function. These are denoted the second-stage Map process and the second-stage Reduce process.
This process is repeated until the cluster centroids no longer change.
In this embodiment, the second-stage Map process performs the distance calculation and the cluster assignment of the base clusterings, and the second-stage Reduce process performs the centroid update.
In the second-stage Map process, the sampling distributed clustering method 100 based on a public opinion platform defines a cosine distance for the calculation:
Let z_k denote the centroid of the k-th cluster over the base clustering vector Π, described as an rk-dimensional vector:
[equation and symbol definitions missing from the source text]
The vector Π is described as an rk-dimensional vector x_l, and the cosine distance between x_l and z_k is:
[equation missing from the source text]
where w_i denotes the weight of the i-th base clustering and takes the value 1/r when no prior knowledge is available.
In the second-stage Reduce process, after all base clusterings in step S5 have been assigned to a cluster, the centroids of all clusters are updated. The centroid z_k is updated using the following formula:
[equation missing from the source text]
where one term of the formula is a constant vector over Π and another denotes the number of instances in the k-th cluster of the i-th base clustering (the symbols themselves are missing from the source text).
For the norms that appear in the formula: given a d-dimensional real vector y, ||y||_p denotes the L_p norm of y, i.e., $\|y\|_p = \left(\sum_{j=1}^{d} |y_j|^p\right)^{1/p}$.
Once the second-stage Reduce process has updated the centroid of every cluster, the second-stage Map process is repeated: the distance between each base clustering and the new centroids is recomputed and the cluster assignment is carried out again, until the cluster centroids no longer change.
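The alternation between assignment and centroid update can be sketched in a single process as below. Because the distance and update formulas are not legible in this text, the sketch makes loudly stated assumptions: every instance has a label in each of the r base clusterings and is encoded as r one-hot blocks; the distance is a weighted sum over blocks of (1 - cosine similarity) with w_i = 1/r; and the centroid update is a plain block-wise mean. None of these should be read as the patent's exact formulas, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Blocks = List[List[float]]  # r blocks, one per base clustering


def one_hot_blocks(labels: Sequence[int], sizes: Sequence[int]) -> Blocks:
    """Encode one instance as r one-hot blocks (assumed representation)."""
    return [[1.0 if j == lab else 0.0 for j in range(k)]
            for lab, k in zip(labels, sizes)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def distance(x: Blocks, z: Blocks, weights: Sequence[float]) -> float:
    """Assumed form: weighted sum over blocks of (1 - cosine similarity)."""
    return sum(w * (1.0 - cosine(xb, zb)) for w, xb, zb in zip(weights, x, z))


def consensus(instances: List[Blocks], k: int, seed: int = 0,
              max_iter: int = 100) -> List[int]:
    r = len(instances[0])
    weights = [1.0 / r] * r  # w_i = 1/r when there is no prior knowledge
    rng = random.Random(seed)
    centroids = [[list(b) for b in x] for x in rng.sample(instances, k)]
    assign = [-1] * len(instances)
    for _ in range(max_iter):
        # "Map": assign every instance to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: distance(x, centroids[c], weights))
            for x in instances
        ]
        if new_assign == assign:  # nothing moved: converged
            break
        assign = new_assign
        # "Reduce": block-wise mean of the members (assumed update rule).
        for c in range(k):
            members = [x for x, a in zip(instances, assign) if a == c]
            if members:
                centroids[c] = [[sum(col) / len(members) for col in zip(*blocks)]
                                for blocks in zip(*members)]
    return assign


# Labels of 7 instances in r = 3 base clusterings with 2 clusters each.
rows = [(0, 0, 1), (0, 0, 1), (0, 0, 1), (0, 0, 0),
        (1, 1, 0), (1, 1, 0), (1, 1, 0)]
instances = [one_hot_blocks(row, sizes=(2, 2, 2)) for row in rows]
assign = consensus(instances, k=2)
```

In the toy data the third base clustering uses swapped label names relative to the first two; the block-wise encoding makes the result independent of how each base clustering happens to number its clusters.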
The specific embodiment and steps of the invention are described in detail below, taking community discovery among microblog users as an example. User data comprises the user's attributes, such as age, gender, interests, followees, followers and reposts, and can be defined as a vector. Community discovery clusters users according to their vectors, grouping highly similar users into one community. Because the number of microblog users is extremely large, the task is well suited to the distributed clustering method proposed by the invention.
The massive user data is first stored as shards on HDFS, 64 MB per shard. During storage, the data of each user is guaranteed not to be split; that is, the data of a single user is stored on the same shard.
Users are then randomly selected on each shard using the map function of Spark, a distributed in-memory computing framework (one implementation of the MapReduce model).
The users selected on all shards are gathered on the same node and clustered according to their user vectors, using the reduce function of Spark. The commonly used K-means method can be chosen for clustering; MLlib, the machine learning library of Spark, provides an implementation of the K-means algorithm on Spark. The above process is repeated n times, yielding one base clustering each time.
After the n base clusterings have been obtained, they are integrated. The integration process iterates between "cluster assignment" and "centroid update", and is likewise implemented with the distributed in-memory computing framework Spark. Data in Spark is abstracted and described by Resilient Distributed Datasets (RDDs), one of the most important core technologies of Spark; all operations are carried out on RDDs.
Here, an instance is x_l in formula (3), a high-dimensional sparse vector. To save storage space, it is modelled in the RDD as two arrays, which store the non-zero indices and the corresponding values respectively. Since the number of instances is very large, the matrix formed by the vectors is first sharded horizontally. Each shard is loaded as one RDD, and each RDD corresponds to one Map task, which computes the distance between the instances in the RDD and all centroids and performs the assignment (classify) of the instances.
It is worth noting that the cosine distance calculation does not need to visit every component of each vector; only the inner product over the indices that are non-zero in both vectors has to be computed, which further reduces the computational complexity. In the key-value pairs output by Map, the key is the cluster number and the value is the specific instance. The Reduce phase then updates (recenter) the centroid from all values that share the same key, that is, from the instances belonging to the same cluster. The new centroids (including the k centroids randomly selected at initialization) are sent to all RDDs using the broadcast mechanism of Spark, and the next iteration is carried out, until the centroids or the clusters no longer change.
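The two-array sparse representation and the shared-index inner product described above can be sketched in plain Python as follows. The sketch is independent of Spark; the names `to_sparse`, `sparse_dot` and `cosine_distance` are hypothetical, and the convention of returning distance 1.0 for an all-zero vector is an assumption.

```python
import math
from typing import List, Sequence, Tuple

# A sparse vector: sorted non-zero indices and the matching values.
SparseVector = Tuple[List[int], List[float]]


def to_sparse(dense: Sequence[float]) -> SparseVector:
    indices = [i for i, v in enumerate(dense) if v != 0]
    return indices, [float(dense[i]) for i in indices]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product that only touches indices present in both vectors."""
    (ia, va), (ib, vb) = a, b
    i = j = 0
    total = 0.0
    while i < len(ia) and j < len(ib):
        if ia[i] == ib[j]:
            total += va[i] * vb[j]
            i += 1
            j += 1
        elif ia[i] < ib[j]:
            i += 1
        else:
            j += 1
    return total


def cosine_distance(a: SparseVector, b: SparseVector) -> float:
    norm_a = math.sqrt(sum(v * v for v in a[1]))
    norm_b = math.sqrt(sum(v * v for v in b[1]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for an all-zero vector
    return 1.0 - sparse_dot(a, b) / (norm_a * norm_b)


x = to_sparse([0, 0, 3, 0, 4])
y = to_sparse([0, 2, 1, 0, 0])
z = to_sparse([1, 0, 0, 0, 0])
```

The merge-style loop runs in time proportional to the number of non-zero entries of the two vectors rather than to their full dimension, which is where the saving mentioned in the text comes from.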
Compared with the prior art, the embodiment of the present invention has the following beneficial effects:
The sampling distributed clustering method 100 based on a public opinion platform reduces the data scale through sampling, increases the diversity of the base clustering results through multiple rounds of sampling, and then defines a cosine distance to integrate the base clustering results into the final clustering result; it can therefore effectively improve the efficiency of clustering massive data.
Moreover, introducing sampling increases data diversity while reducing the data scale, and a two-stage clustering process is then designed on a distributed computing framework, providing an effective way to improve the quality and efficiency of clustering for public opinion analysis on Internet big data.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments and that it can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting that claim.
Moreover, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way only for the sake of clarity; a person skilled in the art should treat it as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments that a person skilled in the art can understand.
Claims (8)
1. A sampling distributed clustering method based on a public opinion platform, characterized by comprising the following steps:
1. obtain the data to be clustered and shard it to obtain multiple shards;
2. sample the data of each shard using the Map function of MapReduce;
3. collect the sampled data and cluster the collected samples in the Reduce phase of the MapReduce framework;
4. repeat steps 2 and 3 in sequence for a total of r rounds of sampling, record the clustering result of each round's sampled data as a base clustering, and obtain the vector Π = {π_1, π_2, ..., π_r}, where r is a positive integer greater than or equal to 2, π_i is the base clustering of the i-th round, and i is a positive integer with 1 ≤ i ≤ r;
5. use the MapReduce framework again to integrate the base clusterings into the final clustering result.
2. The sampling distributed clustering method based on a public opinion platform according to claim 1, characterized in that: in step 1, the data to be clustered is split horizontally, the integrity of every record is preserved during splitting, and the resulting shards are stored in a distributed file system.
3. The sampling distributed clustering method based on a public opinion platform according to claim 1, characterized in that: the sampling in step 2 meets at least the following requirements: the sampling technique itself is simple enough, sampling is carried out on local data, and the sampling results have a certain degree of randomness.
4. The sampling distributed clustering method based on a public opinion platform according to claim 1, characterized in that: in step 3, the sampling round is used as the key and the sampled data as the value; the shuffle function gathers them into a single Reduce function of MapReduce, and the sampled data is clustered in that Reduce function.
5. The sampling distributed clustering method based on a public opinion platform according to claim 1, characterized in that step 5 comprises the following steps:
randomly select a certain number of base clusterings as centroids, use the Map function to compute the distance between each of the other base clusterings and the centroids, assign each base clustering to the cluster of its nearest centroid, and update the cluster centroids in the Reduce function;
repeat this process until the cluster centroids no longer change.
6. The sampling distributed clustering method based on a public opinion platform according to claim 5, characterized in that: z_k denotes the centroid of the k-th cluster over the base clustering vector Π and is described as an rk-dimensional vector:
[equation and symbol definitions missing from the source text]
7. The sampling distributed clustering method based on a public opinion platform according to claim 6, characterized in that: the vector Π is described as an rk-dimensional vector x_l, and the cosine distance between x_l and z_k is:
[equation missing from the source text]
where w_i denotes the weight of the i-th base clustering and takes the value 1/r when no prior knowledge is available.
8. The sampling distributed clustering method based on a public opinion platform according to claim 5, characterized in that: the centroid z_k is updated using the following formula:
[equation missing from the source text]
where one term of the formula is a constant vector over Π and another denotes the number of instances in the k-th cluster of the i-th base clustering (the symbols themselves are missing from the source text).
For the norms in the formula, including ||T^(i)||_2: given a d-dimensional real vector y, ||y||_p denotes the L_p norm of y, i.e., $\|y\|_p = \left(\sum_{j=1}^{d} |y_j|^p\right)^{1/p}$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611260883.5A CN106874367A (en) | 2016-12-30 | 2016-12-30 | Sampling distributed clustering method based on a public opinion platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611260883.5A CN106874367A (en) | 2016-12-30 | 2016-12-30 | Sampling distributed clustering method based on a public opinion platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874367A (en) | 2017-06-20 |
Family
ID=59164125
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201611260883.5A CN106874367A (en) | Pending | 2016-12-30 | 2016-12-30 | Sampling distributed clustering method based on a public opinion platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874367A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516110A (en) * | 2017-08-22 | 2017-12-26 | 华南理工大学 | A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding |
CN110008336A (en) * | 2019-01-14 | 2019-07-12 | 阿里巴巴集团控股有限公司 | A kind of public sentiment method for early warning and system based on deep learning |
CN110704515A (en) * | 2019-12-11 | 2020-01-17 | 四川新网银行股份有限公司 | Two-stage online sampling method based on MapReduce model |
CN110909817A (en) * | 2019-11-29 | 2020-03-24 | 深圳市商汤科技有限公司 | Distributed clustering method and system, processor, electronic device and storage medium |
WO2021249502A1 (en) * | 2020-06-12 | 2021-12-16 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for clustering privacy data of multiple parties |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156463A (en) * | 2014-08-21 | 2014-11-19 | 南京信息工程大学 | Big-data clustering ensemble method based on MapReduce |
CN104809242A (en) * | 2015-05-15 | 2015-07-29 | 成都睿峰科技有限公司 | Distributed-structure-based big data clustering method and device |
CN104820708A (en) * | 2015-05-15 | 2015-08-05 | 成都睿峰科技有限公司 | Cloud computing platform based big data clustering method and device |
CN106095791A (en) * | 2016-01-31 | 2016-11-09 | 长源动力(山东)智能科技有限公司 | A kind of abstract sample information searching system based on context and abstract sample characteristics method for expressing thereof |
US20160350146A1 (en) * | 2015-05-29 | 2016-12-01 | Cisco Technology, Inc. | Optimized hadoop task scheduler in an optimally placed virtualized hadoop cluster using network cost optimizations |
Non-Patent Citations (1)
Title |
---|
蔡静颖等: "《模糊聚类算法及应用》", 31 August 2015 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107516110A (en) * | 2017-08-22 | 2017-12-26 | 华南理工大学 | A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding |
CN107516110B (en) * | 2017-08-22 | 2020-02-18 | 华南理工大学 | Medical question-answer semantic clustering method based on integrated convolutional coding |
CN110008336A (en) * | 2019-01-14 | 2019-07-12 | 阿里巴巴集团控股有限公司 | A kind of public sentiment method for early warning and system based on deep learning |
CN110008336B (en) * | 2019-01-14 | 2023-04-07 | 创新先进技术有限公司 | Public opinion early warning method and system based on deep learning |
CN110909817A (en) * | 2019-11-29 | 2020-03-24 | 深圳市商汤科技有限公司 | Distributed clustering method and system, processor, electronic device and storage medium |
CN110909817B (en) * | 2019-11-29 | 2022-11-11 | 深圳市商汤科技有限公司 | Distributed clustering method and system, processor, electronic device and storage medium |
CN110704515A (en) * | 2019-12-11 | 2020-01-17 | 四川新网银行股份有限公司 | Two-stage online sampling method based on MapReduce model |
WO2021249502A1 (en) * | 2020-06-12 | 2021-12-16 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for clustering privacy data of multiple parties |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106874367A (en) | Sampling distributed clustering method based on a public opinion platform | |
Lebedev et al. | Fast convnets using group-wise brain damage | |
CN102129451B (en) | Method for clustering data in image retrieval system | |
US20150242497A1 (en) | User interest recommending method and apparatus | |
CN104200369B (en) | Method and device for determining commodity distribution range | |
CN104615779B (en) | A kind of Web text individuations recommend method | |
CN103902704B (en) | Towards the multidimensional inverted index and quick retrieval of large-scale image visual signature | |
CN106210044B (en) | A kind of any active ues recognition methods based on access behavior | |
CN109885640B (en) | Multi-keyword ciphertext sorting and searching method based on alpha-fork index tree | |
Chen et al. | Coarsening the granularity: Towards structurally sparse lottery tickets | |
CN107180079B (en) | Image retrieval method based on convolutional neural network and tree and hash combined index | |
CN111125469B (en) | User clustering method and device of social network and computer equipment | |
CN106570173B (en) | Spark-based high-dimensional sparse text data clustering method | |
CN111177410A (en) | Knowledge graph storage and similarity retrieval method based on evolution R-tree | |
CN109840551B (en) | Method for optimizing random forest parameters for machine learning model training | |
CN109033453A (en) | A kind of film recommended method and system based on RBM Yu the cluster of difference secret protection | |
CN106503146B (en) | The feature selection approach of computer version | |
CN106897276A (en) | A kind of internet data clustering method and system | |
CN105354343B (en) | User characteristics method for digging based on remote dialogue | |
CN110825738A (en) | Data storage and query method and device based on distributed RDF | |
CN104978395B (en) | Visual dictionary building and application method and device | |
Adinugroho et al. | Optimizing K-means text document clustering using latent semantic indexing and pillar algorithm | |
CN103761298B (en) | Distributed-architecture-based entity matching method | |
US20150012563A1 (en) | Data mining using associative matrices | |
CN107704872A (en) | A kind of K means based on relatively most discrete dimension segmentation cluster initial center choosing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170620 |