CN104965869A - Mobile application sorting and clustering method based on heterogeneous information network - Google Patents
Mobile application sorting and clustering method based on heterogeneous information network
- Publication number
- CN104965869A
- Authority
- CN
- China
- Prior art keywords
- app
- network
- category
- sequence
- author
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a method for sorting (ranking) and clustering mobile applications based on a heterogeneous information network. A ranking result mainly reflects the importance of an object, so introducing it into the clustering process makes the clustering result more meaningful. The ranking result and the clustering result are adjusted iteratively and reinforce each other, which improves the overall clustering quality. Existing methods that can be used to cluster mobile applications typically use only one or two types of information; the present method is based on a heterogeneous information network built from four types of application information and draws on more information sources, so clustering accuracy can be substantially improved.
Description
Technical field
The invention belongs to the field of application recommendation, and in particular relates to a ranking-based clustering method that exploits the characteristics of a heterogeneous information network, providing a way to cluster and rank mobile applications effectively.
Background art
With the rapid development of the mobile Internet, a vast number of applications have emerged in mobile application markets, and these applications are gradually changing people's lives. Each mobile application is associated with its own related information, and thousands of mobile applications together form a huge heterogeneous information network that contains a large amount of valuable information. Research on mobile application information networks is therefore of great significance. On the one hand, in-depth analysis of how a large number of mobile applications are used helps to understand user behavior in detail and thus to provide more personalized services. Personalized application recommendation, for example, recommends applications to a target user more accurately by mining the latent structural relationships between users or between applications, which improves the user experience. On the other hand, analyzing mobile application data can also help companies find more effective advertising platforms.
Users usually obtain applications from an application market in one of three ways: searching directly with the market's search engine; browsing the market's category labels and rankings; or choosing from the list of applications recommended by the system. Application search mainly relies on keyword matching, and the only information type it uses is the application name. Category labels are usually fixed and set manually in advance, so as the number of applications grows, the shortcomings of the label scheme gradually become apparent. An effective information extraction technique is therefore needed to make up for these weaknesses.
Clustering is one of the important methods for understanding data and capturing useful information. Assigning large amounts of disordered data to different groups makes the data easier to analyze and study, and cluster analysis of mobile application data can serve as a preprocessing step before predictive modeling. At present, most methods that can be used for cluster analysis of application data target homogeneous information networks, that is, they rely on a single type of application information. Because a single information source ignores the other related information, clustering accuracy is severely limited. Extracting the different types of application information to build a mobile application heterogeneous network, and then clustering the applications and their related information on that network, has therefore become an urgent need in both academia and industry.
Summary of the invention
In view of the above technical problems, the present invention proposes a method for ranking and clustering mobile applications based on a heterogeneous information network.
To solve the above technical problems, the technical solution of the present invention is as follows:
A method for ranking and clustering mobile applications based on a heterogeneous information network, in which the system comprises a data preprocessing module, a ranking distribution calculation module and a probability generation module, and which specifically comprises the following steps:
11) the data preprocessing module obtains mobile application information documents from a mobile application market and preprocesses them, the preprocessing comprising information filtering, word segmentation and keyword extraction;
12) a star heterogeneous network composed of four types of information is built; the star heterogeneous network is randomly clustered and thereby divided into multiple sub-networks;
13) the ranking distribution calculation module receives the sub-networks, calculates the ranking distribution of the attribute nodes in each sub-network, and outputs the result;
14) the probability generation module receives the ranking distributions of the attribute nodes and uses them to calculate the posterior probability of each center node in each sub-network; the posterior probabilities of the attribute nodes are then calculated through neighbor relationships; finally, the clustering result is checked for convergence: if it has not converged, the sub-networks are re-partitioned according to the new probability distribution and fed back into the ranking distribution calculation module; if it has converged, it is output as the clustering result.
Further, the ranking procedure of the ranking distribution calculation module specifically comprises the following steps:
The input is the number of clusters K and K mobile application sub-networks. The ranking distributions of the three types of attribute nodes in each sub-network are then calculated separately. For objects of the AUTHOR and CATEGORY types, the transitive ranking method is used; it is an iterative process that terminates when the ranking distribution converges or the number of iterations exceeds a set maximum. For objects of the TERM type, the count-based ranking method is used. The procedure finally outputs the ranking distribution of each attribute type. The objects of the AUTHOR, CATEGORY and TERM types are the extracted keywords.
Further, the input of the probability generation module comprises the number of clusters K, the K mobile application sub-networks and the ranking distributions of their attribute types. After the probabilistic generative model is established, the EM method is used to obtain the optimal parameter values. The optimal parameter values and the ranking distributions of the attribute types are used to generate the posterior probability of each center-type node in each cluster; the neighbor relationships are then used to calculate the posterior probability of each attribute-type node; finally, each node is reassigned to a cluster according to the probability distribution, and the clustering result is output.
Further, building the star heterogeneous network composed of four types of information means establishing the star network $G = (V, E, W)$, where $V = \{\mathrm{APP}, \mathrm{AUTHOR}, \mathrm{CATEGORY}, \mathrm{TERM}\}$ contains the four types of information nodes of the applications. $\mathrm{APP} = \{ap_1, ap_2, \ldots, ap_n\}$ is the set of center nodes, and $\mathrm{AUTHOR} = \{au_1, au_2, \ldots, au_n\}$, $\mathrm{CATEGORY} = \{ca_1, ca_2, \ldots, ca_n\}$ and $\mathrm{TERM} = \{te_1, te_2, \ldots, te_m\}$ are the three sets of attribute nodes. $E$ is the set of edges connecting center nodes and attribute nodes, and $W$ is the set of edge weights. There are three cases for the weights: first, if edge $e_i$ connects an APP node and an AUTHOR or CATEGORY node, then $w_i = 1$; second, if edge $e_i$ connects an APP node and a TERM node, then $w_i$ can be any positive integer; third, if there is no edge between two nodes, then $w_i = 0$.
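The weight rules above can be illustrated with a minimal Python sketch. The input layout (a dictionary per application with the hypothetical keys `author`, `category` and `terms`) and the function names are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from collections import defaultdict


def build_star_network(apps):
    """Build edge weights between APP center nodes and attribute nodes.

    `apps` maps an app name to a dict with the hypothetical keys
    'author', 'category' and 'terms' (term -> occurrence count).
    """
    weights = {
        "AUTHOR": defaultdict(dict),
        "CATEGORY": defaultdict(dict),
        "TERM": defaultdict(dict),
    }
    for app, info in apps.items():
        # Case 1: APP-AUTHOR and APP-CATEGORY edges always have weight 1.
        weights["AUTHOR"][app][info["author"]] = 1
        weights["CATEGORY"][app][info["category"]] = 1
        # Case 2: APP-TERM edges carry a positive integer weight.
        for term, count in info["terms"].items():
            if count > 0:
                weights["TERM"][app][term] = count
    return weights


def weight(weights, attr_type, app, node):
    """Case 3: a missing edge has weight 0."""
    return weights[attr_type].get(app, {}).get(node, 0)
```

Storing only the existing edges keeps the structure sparse, and the lookup helper returns 0 for any pair of nodes that is not connected.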
Further, running the ranking distribution calculation on the star network yields the ranking distributions of the attribute-type information. Each of the three types of information nodes has its own ranking distribution, and these distributions are fed into the probabilistic generative model as conditional probabilities. The ranking distribution of AUTHOR is $R = \{r(au_1), r(au_2), \ldots, r(au_n)\}$, where $r(au_i) \ge 0$ and $\sum_{i=1}^{n} r(au_i) = 1$; the ranking distributions of the other two attribute types are expressed in the same way. The calculation is divided into two parts. The first part uses the transitive ranking method for the AUTHOR and CATEGORY types, which is an iterative computation:

$$P(\mathrm{AUTHOR} \mid G) = \left(W_{\mathrm{AUTHOR},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{AUTHOR},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{CATEGORY}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{CATEGORY}}\right) P(\mathrm{CATEGORY} \mid G) \tag{1}$$

$$P(\mathrm{CATEGORY} \mid G) = \left(W_{\mathrm{CATEGORY},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{CATEGORY},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{AUTHOR}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{AUTHOR}}\right) P(\mathrm{AUTHOR} \mid G) \tag{2}$$

where $\sigma_{\mathrm{AUTHOR},\mathrm{APP}}$, $\sigma_{\mathrm{APP},\mathrm{CATEGORY}}$, $\sigma_{\mathrm{CATEGORY},\mathrm{APP}}$ and $\sigma_{\mathrm{APP},\mathrm{AUTHOR}}$ are diagonal matrices whose entries equal the column sums of the weight matrices $W_{\mathrm{AUTHOR},\mathrm{APP}}$, $W_{\mathrm{APP},\mathrm{CATEGORY}}$, $W_{\mathrm{CATEGORY},\mathrm{APP}}$ and $W_{\mathrm{APP},\mathrm{AUTHOR}}$, respectively. The second part is the count-based ranking method for the TERM type, calculated as follows:

[Formula (3)]

where $N_G(te_i)$ denotes the neighbor nodes of $te_i$ in network $G$.
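The iteration in formulas (1) and (2) can be sketched in plain Python as follows. This is an illustrative sketch, not the patented implementation: the dense list-of-lists matrices, the uniform starting distribution, the tolerance and all function names are assumptions.

```python
def column_normalize(matrix):
    """Return W * sigma^{-1}: every column is divided by its column sum."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_sums = [sum(matrix[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [matrix[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


def mat_vec(matrix, vec):
    """Multiply a matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def transitive_rank(w_author_app, w_app_category, max_iter=100, tol=1e-9):
    """Alternate formulas (1) and (2) until convergence or max_iter is reached."""
    author_from_app = column_normalize(w_author_app)
    app_from_category = column_normalize(w_app_category)
    category_from_app = column_normalize(transpose(w_app_category))
    app_from_author = column_normalize(transpose(w_author_app))
    n_author, n_category = len(w_author_app), len(w_app_category[0])
    p_author = [1.0 / n_author] * n_author          # assumed uniform start
    p_category = [1.0 / n_category] * n_category    # assumed uniform start
    for _ in range(max_iter):
        # Formula (1): CATEGORY -> APP -> AUTHOR
        new_author = mat_vec(author_from_app, mat_vec(app_from_category, p_category))
        # Formula (2): AUTHOR -> APP -> CATEGORY
        new_category = mat_vec(category_from_app, mat_vec(app_from_author, new_author))
        change = max(
            max(abs(a - b) for a, b in zip(new_author, p_author)),
            max(abs(a - b) for a, b in zip(new_category, p_category)),
        )
        p_author, p_category = new_author, new_category
        if change < tol:
            break
    return p_author, p_category
```

Because each normalized matrix has columns that sum to 1 (when no column is empty), the vectors remain probability distributions throughout the iteration, matching the requirement that the ranking values be non-negative and sum to 1.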
Further, the probabilistic generative model takes the ranking distributions as one of its inputs and then uses the EM method to estimate the posterior probability distribution of the APP nodes over the clusters. The probability of visiting an attribute node $x$ in a sub-network $G_k$ is defined as:

$$p(x \mid G_k) = p(X \mid G_k) \times p(x \mid X, G_k) \tag{4}$$

where $p(X \mid G_k)$ is the probability of visiting type $X$ in network $G_k$, and $p(x \mid X, G_k)$ is the probability of visiting node $x$ among the nodes of type $X$ in network $G_k$. To prevent $p(x \mid X, G_k)$ from being zero, global information is added to smooth it:

$$p'(x \mid X, G_k) = (1 - \varepsilon)\, p(x \mid X, G_k) + \varepsilon\, p(x \mid X, G) \tag{5}$$

The probability of visiting a center node $ap_i$ in a sub-network $G_k$ is determined by its attribute nodes:

[Formula]

According to Bayes' rule, the posterior probability of center node $ap_i$ is $p(G_k \mid ap_i) \propto p(ap_i \mid G_k) \times p(G_k)$. To obtain a suitable $p(G_k)$, the posterior probability $p(G_k \mid ap_i)$ is maximized, and the EM method is used to obtain the optimal $p(G_k)$. The calculation steps are as follows:

[Formula]

where $K$ is the number of clusters specified by the user. After the probability distributions of all center-type nodes have been obtained, the posterior probability of each attribute node in each cluster is calculated by the following formula:

$$p(G_k \mid x) = \frac{1}{|N(x)|} \sum_{ap_i \in N(x)} p(G_k \mid ap_i)$$

where $x$ is an attribute node and $N(x)$ is the set of center nodes that are neighbors of $x$. For an attribute node, its posterior probability in a cluster equals the mean of the posterior probabilities of its neighbor nodes in that cluster.
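The smoothing step of formula (5) and the EM estimation of $p(G_k)$ can be sketched as below. Two parts are assumptions that do not come from this text: the likelihood $p(ap_i \mid G_k)$ is taken to be a weight-exponentiated product over the attribute neighbors of $ap_i$, and the EM update is the standard mixture-weight update. The function names and the value of `eps` are hypothetical.

```python
import math


def smooth(p_local, p_global, eps=0.1):
    """Formula (5): mix the sub-network distribution with the global one."""
    return {x: (1 - eps) * p_local.get(x, 0.0) + eps * p_global[x] for x in p_global}


def app_log_likelihood(neighbors, p_attr):
    """log p(ap_i | G_k), ASSUMED to be a weighted product over attribute nodes.

    `neighbors` maps each attribute node of the app to its edge weight;
    `p_attr` holds the smoothed (non-zero) p(x | G_k) values.
    """
    return sum(w * math.log(p_attr[x]) for x, w in neighbors.items())


def em_cluster_priors(log_lik, n_iter=50):
    """Estimate p(G_k) and p(G_k | ap_i) from log_lik[i][k] = log p(ap_i | G_k)."""
    n_apps, n_clusters = len(log_lik), len(log_lik[0])
    prior = [1.0 / n_clusters] * n_clusters
    posteriors = []
    for _ in range(n_iter):
        # E-step: Bayes' rule, p(G_k | ap_i) proportional to p(ap_i | G_k) * p(G_k)
        posteriors = []
        for row in log_lik:
            shift = max(row)  # subtract the max for numerical stability
            unnorm = [math.exp(l - shift) * p for l, p in zip(row, prior)]
            total = sum(unnorm)
            posteriors.append([u / total for u in unnorm])
        # M-step (assumed form): p(G_k) is the average posterior over all apps
        prior = [
            sum(posteriors[i][k] for i in range(n_apps)) / n_apps
            for k in range(n_clusters)
        ]
    return prior, posteriors
```

Working in log space avoids underflow when an application has many term neighbors, and the smoothing guarantees that the logarithm is never taken of zero as long as the global distribution is positive.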
The beneficial effects of the present invention are as follows. A ranking result mainly reflects the importance of an object; introducing it into the clustering process makes the clustering result more meaningful, and the iterative method continually adjusts the ranking result and the clustering result so that they reinforce each other, improving the overall clustering quality. Traditional methods that can be used to cluster mobile applications usually use only one or two types of information, whereas the present invention is based on a heterogeneous information network composed of four types of application information and uses more information sources, which can substantially improve clustering accuracy.
Brief description of the drawings
Fig. 1 is an overall structural diagram of the present invention;
Fig. 2 is an internal flow chart of the ranking distribution calculation module of the present invention;
Fig. 3 is an internal flow chart of the probability generation module of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments.
Traditional clustering methods, when analyzing mobile application data, often ignore the other types of data related to the applications, which limits clustering accuracy to some extent. The present invention adopts a ranking-based clustering method. The mobile application data are first preprocessed to extract four types of data: the application name, which is called the center type, and three attribute types, namely the application publisher, the application category and the application description. The application description is word-segmented and its key terms are extracted with the TF-IDF method. These pieces of information are then combined into a star heterogeneous information network, which is represented by weight matrices. The ranking-based clustering method is then applied: the ranking methods compute the ranking distribution of each type of information, which reflects the importance of the objects of that type. A probabilistic generative model is built on top of the ranking distributions to obtain the posterior probability of each application in each cluster. Once these posterior distributions have been calculated, the probability distributions of the attribute-type nodes in each cluster are obtained through neighbor relationships. Computing the ranking distributions and estimating the posterior probabilities alternate iteratively until the result converges.
The whole ranking and clustering method consists mainly of three modules: the data preprocessing module, the ranking distribution calculation module and the probability generation module.
As shown in Fig. 1, the whole process of ranking and clustering mobile applications consists of the data preprocessing module, the ranking distribution calculation module and the probability generation module, executed in that order. The data preprocessing module first obtains mobile application information documents from a mobile application market; preprocessing comprises information filtering, word segmentation and keyword extraction, after which a star heterogeneous network composed of four types of information is built. In the initialization step the star network is randomly clustered and thereby divided into multiple sub-networks. The ranking distribution calculation module receives the sub-networks, calculates the ranking distribution of the attribute nodes in each sub-network, and outputs the result. The probability generation module receives the ranking distributions of the attribute nodes and uses them to calculate the posterior probability of each center node in each sub-network; the posterior probabilities of the attribute nodes are then calculated through neighbor relationships. Finally, the clustering result is checked for convergence: if it has not converged, the sub-networks are re-partitioned according to the new probability distribution and fed back into the ranking distribution calculation module; if it has converged, it is output as the clustering result.
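The control flow described for Fig. 1 can be outlined with a short Python skeleton. It is only a sketch of the loop structure: `rank_fn` and `posterior_fn` are hypothetical callables standing in for the ranking distribution calculation module and the probability generation module, and testing convergence by checking whether the hard cluster assignment has stopped changing is an assumption, since the text does not state the convergence criterion.

```python
import random


def rank_based_clustering(apps, k, rank_fn, posterior_fn, max_iter=20, seed=0):
    """Alternate ranking and posterior estimation until assignments stabilize.

    rank_fn(subnet) -> ranking distributions for one sub-network.
    posterior_fn(apps, rankings) -> dict app -> list of K posterior values.
    """
    rng = random.Random(seed)
    # Initialization: random clustering of the center nodes.
    assignment = {app: rng.randrange(k) for app in apps}
    for _ in range(max_iter):
        # Split the network into K sub-networks by current assignment.
        subnets = [[a for a in apps if assignment[a] == c] for c in range(k)]
        # Ranking distribution calculation for every sub-network.
        rankings = [rank_fn(subnet) for subnet in subnets]
        # Probability generation: posterior of each app in each cluster.
        posteriors = posterior_fn(apps, rankings)
        new_assignment = {
            a: max(range(k), key=lambda c: posteriors[a][c]) for a in apps
        }
        if new_assignment == assignment:  # assumed convergence test
            break
        assignment = new_assignment
    return assignment
```

Passing the two modules in as functions keeps the loop independent of how the ranking and the posterior estimation are actually implemented.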
The data preprocessing module performs data extraction, information filtering, word segmentation and keyword extraction on the mobile application documents obtained from the mobile application market. It first extracts the four types of data corresponding to each application, then word-segments the extracted application descriptions and uses the TF-IDF method to extract the key descriptive terms of each application, and finally represents this information with weight matrices to form a heterogeneous information network.
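The keyword extraction step can be illustrated with a small TF-IDF sketch. The text names TF-IDF but does not specify the variant, so the formula used here (term frequency times the logarithm of the inverse document frequency), the `top_k` cutoff and the assumption that descriptions are already segmented into tokens are all illustrative choices.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Pick the top_k TF-IDF terms for each app description.

    `docs` maps an app name to a list of tokens (already word-segmented).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    keywords = {}
    for app, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[app] = ranked[:top_k]
    return keywords
```

Terms that occur in every description receive a score of zero and therefore fall to the bottom of the list, which is why very common words are unlikely to become TERM nodes.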
Fig. 2 shows the flow of the ranking distribution calculation module. The input is the number of clusters K and K mobile application sub-networks. The ranking distributions of the three types of attribute nodes in each sub-network are then calculated separately. For objects of the AUTHOR and CATEGORY types, the transitive ranking method is used; it is an iterative process that terminates when the ranking distribution converges or the number of iterations exceeds a set maximum. For objects of the TERM type, the count-based ranking method is used. The procedure finally outputs the ranking distribution of each attribute type.
The ranking distribution calculation module produces ranking distributions that reflect the importance of objects within the different clusters. It is subdivided into two parts according to data type: one part uses the transitive ranking method, mainly to calculate the ranking distributions of the application publisher and application category attribute types; the other part uses the count-based ranking method, mainly to calculate the ranking distribution of the application key terms.
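A count-based ranking for TERM nodes can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes the common form in which the score of a term is its total edge weight to neighboring applications, normalized over all terms in the sub-network; the function name and input layout are hypothetical.

```python
def count_rank(term_weights):
    """Rank TERM nodes by normalized weighted degree (assumed form).

    `term_weights` maps each app in the sub-network to {term: edge weight}.
    Returns a dict term -> score, with scores summing to 1.
    """
    totals = {}
    for terms in term_weights.values():
        for term, w in terms.items():
            totals[term] = totals.get(term, 0) + w
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {term: value / grand_total for term, value in totals.items()}
```

Unlike the transitive method, this computation needs a single pass over the edges and no iteration.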
Fig. 3 shows the internal workflow of the probability generation module. The input comprises the number of clusters K, the K mobile application sub-networks and the ranking distributions of their attribute types. After the probabilistic generative model is established, the EM method is used to obtain the optimal parameter values. The optimal parameter values and the ranking distributions of the attribute types are used to generate the posterior probability of each center-type node in each cluster; the neighbor relationships are then used to calculate the posterior probability of each attribute-type node; finally, each node is reassigned to a cluster according to the probability distribution, and the clustering result is output.
The probability generation module calculates the posterior probability of the center type, that is, of the applications themselves, in the different clusters. The EM method is used to estimate the posterior probability of the center type, the probability distributions of the other three attribute types are then obtained from the neighbor relationships, and finally the nodes are re-clustered according to the posterior probabilities and the clustering result is output.
Before ranking and clustering, the mobile application documents need to be converted into a heterogeneous information network composed of four types of information:
Star network: $G = (V, E, W)$, where $V = \{\mathrm{APP}, \mathrm{AUTHOR}, \mathrm{CATEGORY}, \mathrm{TERM}\}$ contains the four types of information nodes of the applications. $\mathrm{APP} = \{ap_1, ap_2, \ldots, ap_n\}$ is the set of center nodes, and $\mathrm{AUTHOR} = \{au_1, au_2, \ldots, au_n\}$, $\mathrm{CATEGORY} = \{ca_1, ca_2, \ldots, ca_n\}$ and $\mathrm{TERM} = \{te_1, te_2, \ldots, te_m\}$ are the three sets of attribute nodes. $E$ is the set of edges connecting center nodes and attribute nodes, and $W$ is the set of edge weights. There are three cases for the weights: first, if edge $e_i$ connects an APP node and an AUTHOR or CATEGORY node, then $w_i = 1$; second, if edge $e_i$ connects an APP node and a TERM node, then $w_i$ can be any positive integer; third, if there is no edge between two nodes, then $w_i = 0$.
Running the ranking distribution calculation on the star network yields the ranking distributions of the attribute-type information. Each of the three types of information nodes has its own ranking distribution, and these distributions are fed into the probabilistic generative model as conditional probabilities. The ranking distribution of AUTHOR is $R = \{r(au_1), r(au_2), \ldots, r(au_n)\}$, where $r(au_i) \ge 0$ and $\sum_{i=1}^{n} r(au_i) = 1$; the ranking distributions of the other two attribute types are expressed in the same way. The calculation is divided into two parts. The first part uses the transitive ranking method, mainly for the AUTHOR and CATEGORY types, which is an iterative computation:

$$P(\mathrm{AUTHOR} \mid G) = \left(W_{\mathrm{AUTHOR},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{AUTHOR},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{CATEGORY}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{CATEGORY}}\right) P(\mathrm{CATEGORY} \mid G) \tag{1}$$

$$P(\mathrm{CATEGORY} \mid G) = \left(W_{\mathrm{CATEGORY},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{CATEGORY},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{AUTHOR}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{AUTHOR}}\right) P(\mathrm{AUTHOR} \mid G) \tag{2}$$

where $\sigma_{\mathrm{AUTHOR},\mathrm{APP}}$, $\sigma_{\mathrm{APP},\mathrm{CATEGORY}}$, $\sigma_{\mathrm{CATEGORY},\mathrm{APP}}$ and $\sigma_{\mathrm{APP},\mathrm{AUTHOR}}$ are diagonal matrices whose entries equal the column sums of the weight matrices $W_{\mathrm{AUTHOR},\mathrm{APP}}$, $W_{\mathrm{APP},\mathrm{CATEGORY}}$, $W_{\mathrm{CATEGORY},\mathrm{APP}}$ and $W_{\mathrm{APP},\mathrm{AUTHOR}}$, respectively. The second part is the count-based ranking method for the TERM type, calculated as follows:

[Formula (3)]
where $N_G(te_i)$ denotes the neighbor nodes of $te_i$ in network $G$.
The probabilistic generative model takes the ranking distributions as one of its inputs and then uses the EM method to estimate the posterior probability distribution of the APP nodes over the clusters. The probability of visiting an attribute node $x$ in a sub-network $G_k$ is defined as:

$$p(x \mid G_k) = p(X \mid G_k) \times p(x \mid X, G_k) \tag{4}$$

where $p(X \mid G_k)$ is the probability of visiting type $X$ in network $G_k$, and $p(x \mid X, G_k)$ is the probability of visiting node $x$ among the nodes of type $X$ in network $G_k$. To prevent $p(x \mid X, G_k)$ from being zero, global information is added to smooth it:

$$p'(x \mid X, G_k) = (1 - \varepsilon)\, p(x \mid X, G_k) + \varepsilon\, p(x \mid X, G) \tag{5}$$

The probability of visiting a center node $ap_i$ in a sub-network $G_k$ is determined by its attribute nodes:

[Formula]

According to Bayes' rule, the posterior probability of center node $ap_i$ is $p(G_k \mid ap_i) \propto p(ap_i \mid G_k) \times p(G_k)$. To obtain a suitable $p(G_k)$, the posterior probability $p(G_k \mid ap_i)$ is maximized, and the EM method is used to obtain the optimal $p(G_k)$. The calculation steps are as follows:

[Formula]

where $K$ is the number of clusters specified by the user. After the probability distributions of all center-type nodes have been obtained, the posterior probability of each attribute node in each cluster can be calculated by the following formula:

$$p(G_k \mid x) = \frac{1}{|N(x)|} \sum_{ap_i \in N(x)} p(G_k \mid ap_i)$$

where $x$ is an attribute node and $N(x)$ is the set of center nodes that are neighbors of $x$. For an attribute node, its posterior probability in a cluster equals the mean of the posterior probabilities of its neighbor nodes in that cluster.
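The neighbor-averaging rule for attribute nodes can be written as a few lines of Python. The function and argument names are hypothetical; the computation follows the statement that an attribute node's posterior in a cluster is the mean of the posteriors of its neighboring center nodes.

```python
def attribute_posterior(app_posteriors, neighbors):
    """Average the cluster posteriors of the apps adjacent to an attribute node.

    `app_posteriors` maps an app to its list of K posteriors p(G_k | app);
    `neighbors` is the non-empty list N(x) of apps connected to node x.
    """
    n_clusters = len(app_posteriors[neighbors[0]])
    return [
        sum(app_posteriors[app][k] for app in neighbors) / len(neighbors)
        for k in range(n_clusters)
    ]
```

For example, a publisher with two applications whose posteriors are `[0.9, 0.1]` and `[0.5, 0.5]` gets the posterior `[0.7, 0.3]`, so it would be assigned to the first cluster.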
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the inventive concept, and such improvements and modifications shall also fall within the scope of protection of the present invention.
Claims (6)
1. A method for ranking and clustering mobile applications based on a heterogeneous information network, characterized in that the system comprises a data preprocessing module, a ranking distribution calculation module and a probability generation module, and the method specifically comprises the following steps:
11) the data preprocessing module obtains mobile application information documents from a mobile application market and preprocesses them, the preprocessing comprising information filtering, word segmentation and keyword extraction;
12) a star heterogeneous network composed of four types of information is built; the star heterogeneous network is randomly clustered and thereby divided into multiple sub-networks;
13) the ranking distribution calculation module receives the sub-networks, calculates the ranking distribution of the attribute nodes in each sub-network, and outputs the result;
14) the probability generation module receives the ranking distributions of the attribute nodes and uses them to calculate the posterior probability of each center node in each sub-network; the posterior probabilities of the attribute nodes are then calculated through neighbor relationships; finally, the clustering result is checked for convergence: if it has not converged, the sub-networks are re-partitioned according to the new probability distribution and fed back into the ranking distribution calculation module; if it has converged, it is output as the clustering result.
2. The method for ranking and clustering mobile applications based on a heterogeneous information network according to claim 1, characterized in that the ranking procedure of the ranking distribution calculation module specifically comprises the following steps:
the input is the number of clusters K and K mobile application sub-networks; the ranking distributions of the three types of attribute nodes in each sub-network are then calculated separately; for objects of the AUTHOR and CATEGORY types, the transitive ranking method is used, which is an iterative process that terminates when the ranking distribution converges or the number of iterations exceeds a set maximum; for objects of the TERM type, the count-based ranking method is used to calculate the ranking distribution; the procedure finally outputs the ranking distribution of each attribute type; the objects of the AUTHOR, CATEGORY and TERM types are the extracted keywords.
3. The method for ranking and clustering mobile applications based on a heterogeneous information network according to claim 2, characterized in that the input of the probability generation module comprises the number of clusters K, the K mobile application sub-networks and the ranking distributions of their attribute types; after the probabilistic generative model is established, the EM method is used to obtain the optimal parameter values; the optimal parameter values and the ranking distributions of the attribute types are used to generate the posterior probability of each center-type node in each cluster; the neighbor relationships are then used to calculate the posterior probability of each attribute-type node; finally, each node is reassigned to a cluster according to the probability distribution, and the clustering result is output.
4. The method for ranking and clustering mobile applications based on a heterogeneous information network according to claim 1, characterized in that building the star heterogeneous network composed of four types of information means establishing the star network $G = (V, E, W)$, where $V = \{\mathrm{APP}, \mathrm{AUTHOR}, \mathrm{CATEGORY}, \mathrm{TERM}\}$ contains the four types of information nodes of the applications; $\mathrm{APP} = \{ap_1, ap_2, \ldots, ap_n\}$ is the set of center nodes, and $\mathrm{AUTHOR} = \{au_1, au_2, \ldots, au_n\}$, $\mathrm{CATEGORY} = \{ca_1, ca_2, \ldots, ca_n\}$ and $\mathrm{TERM} = \{te_1, te_2, \ldots, te_m\}$ are the three sets of attribute nodes; $E$ is the set of edges connecting center nodes and attribute nodes, and $W$ is the set of edge weights; there are three cases for the weights: first, if edge $e_i$ connects an APP node and an AUTHOR or CATEGORY node, then $w_i = 1$; second, if edge $e_i$ connects an APP node and a TERM node, then $w_i$ can be any positive integer; third, if there is no edge between two nodes, then $w_i = 0$.
5. The method for ranking and clustering mobile applications based on a heterogeneous information network according to claim 4, characterized in that running the ranking distribution calculation on the star network yields the ranking distributions of the attribute-type information; each of the three types of information nodes has its own ranking distribution, and these distributions are fed into the probabilistic generative model as conditional probabilities; the ranking distribution of AUTHOR is $R = \{r(au_1), r(au_2), \ldots, r(au_n)\}$, where $r(au_i) \ge 0$ and $\sum_{i=1}^{n} r(au_i) = 1$; the ranking distributions of the other two attribute types are expressed in the same way; the calculation is divided into two parts; the first part uses the transitive ranking algorithm for the AUTHOR and CATEGORY types, which is an iterative computation:

$$P(\mathrm{AUTHOR} \mid G) = \left(W_{\mathrm{AUTHOR},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{AUTHOR},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{CATEGORY}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{CATEGORY}}\right) P(\mathrm{CATEGORY} \mid G) \tag{1}$$

$$P(\mathrm{CATEGORY} \mid G) = \left(W_{\mathrm{CATEGORY},\mathrm{APP}}\,\sigma^{-1}_{\mathrm{CATEGORY},\mathrm{APP}}\right)\left(W_{\mathrm{APP},\mathrm{AUTHOR}}\,\sigma^{-1}_{\mathrm{APP},\mathrm{AUTHOR}}\right) P(\mathrm{AUTHOR} \mid G) \tag{2}$$

where $\sigma_{\mathrm{AUTHOR},\mathrm{APP}}$, $\sigma_{\mathrm{APP},\mathrm{CATEGORY}}$, $\sigma_{\mathrm{CATEGORY},\mathrm{APP}}$ and $\sigma_{\mathrm{APP},\mathrm{AUTHOR}}$ are diagonal matrices whose entries equal the column sums of the weight matrices $W_{\mathrm{AUTHOR},\mathrm{APP}}$, $W_{\mathrm{APP},\mathrm{CATEGORY}}$, $W_{\mathrm{CATEGORY},\mathrm{APP}}$ and $W_{\mathrm{APP},\mathrm{AUTHOR}}$, respectively; the second part is the count-based ranking algorithm for the TERM type, calculated as follows:

[Formula (3)]

where $N_G(te_i)$ denotes the neighbor nodes of $te_i$ in network $G$.
6. The method for ranking and clustering mobile applications based on a heterogeneous information network according to claim 5, characterized in that the probabilistic generative model takes the ranking distributions as one of its inputs and then uses the EM method to estimate the posterior probability distribution of the APP nodes over the clusters; the probability of visiting an attribute node $x$ in a sub-network $G_k$ is defined as:

$$p(x \mid G_k) = p(X \mid G_k) \times p(x \mid X, G_k) \tag{4}$$

where $p(X \mid G_k)$ is the probability of visiting type $X$ in network $G_k$, and $p(x \mid X, G_k)$ is the probability of visiting node $x$ among the nodes of type $X$ in network $G_k$; to prevent $p(x \mid X, G_k)$ from being zero, global information is added to smooth it:

$$p'(x \mid X, G_k) = (1 - \varepsilon)\, p(x \mid X, G_k) + \varepsilon\, p(x \mid X, G) \tag{5}$$

the probability of visiting a center node $ap_i$ in a sub-network $G_k$ is determined by its attribute nodes:

[Formula]

according to Bayes' rule, the posterior probability of center node $ap_i$ is $p(G_k \mid ap_i) \propto p(ap_i \mid G_k) \times p(G_k)$; to obtain a suitable $p(G_k)$, the posterior probability $p(G_k \mid ap_i)$ is maximized, and the EM method is used to obtain the optimal $p(G_k)$; the calculation steps are as follows:

[Formula]

where $K$ is the number of clusters specified by the user; after the probability distributions of all center-type nodes have been obtained, the posterior probability of each attribute node in each cluster is calculated by the following formula:

$$p(G_k \mid x) = \frac{1}{|N(x)|} \sum_{ap_i \in N(x)} p(G_k \mid ap_i)$$

where $x$ is an attribute node and $N(x)$ is the set of center nodes that are neighbors of $x$; for an attribute node, its posterior probability in a cluster equals the mean of the posterior probabilities of its neighbor nodes in that cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510312733.3A CN104965869A (en) | 2015-06-09 | 2015-06-09 | Mobile application sorting and clustering method based on heterogeneous information network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510312733.3A CN104965869A (en) | 2015-06-09 | 2015-06-09 | Mobile application sorting and clustering method based on heterogeneous information network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104965869A (en) | 2015-10-07 |
Family
ID=54219906
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510312733.3A Pending CN104965869A (en) | 2015-06-09 | 2015-06-09 | Mobile application sorting and clustering method based on heterogeneous information network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104965869A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103425711A (en) * | 2012-05-25 | 2013-12-04 | 株式会社理光 | Object value aligning method based on multiple object instances |
CN103810288A (en) * | 2014-02-25 | 2014-05-21 | 西安电子科技大学 | Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm |
CN104778205A (en) * | 2015-03-09 | 2015-07-15 | 浙江大学 | Heterogeneous information network-based mobile application ordering and clustering method |
Non-Patent Citations (1)
Title |
---|
YIZHOU SUN et al.: "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema", 《PROCEEDINGS OF THE 15TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106934071A (en) * | 2017-04-27 | 2017-07-07 | 北京大学 | Recommendation method and device based on Heterogeneous Information network and Bayes's personalized ordering |
CN108776684A (en) * | 2018-05-25 | 2018-11-09 | 华东师范大学 | Optimization method, device, medium, equipment and the system of side right weight in knowledge mapping |
CN108776684B (en) * | 2018-05-25 | 2021-01-01 | 华东师范大学 | Optimization method, device, medium, equipment and system for edge weight in knowledge graph |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8442863B2 (en) | Real-time-ready behavioral targeting in a large-scale advertisement system | |
CN103984714B (en) | Ontology semantics-based supply and demand matching method for cloud manufacturing service | |
CN102591915B (en) | Recommending method based on label migration learning | |
JP2021166109A (en) | Fusion sorting model training method and device, search sorting method and device, electronic device, storage medium, and program | |
CN110674407A (en) | Hybrid recommendation method based on graph convolution neural network | |
CN103679462A (en) | Comment data processing method and device and searching method and system | |
CN105224959A (en) | The training method of order models and device | |
CN105308631A (en) | Predicting behavior using features derived from statistical information | |
CN102012936B (en) | Massive data aggregation method and system based on cloud computing platform | |
CN104391883B (en) | A kind of online advertisement audient's sort method based on transfer learning | |
CN103838756A (en) | Method and device for determining pushed information | |
CN105677857B (en) | method and device for accurately matching keywords with marketing landing pages | |
CN104111973A (en) | Scholar name duplication disambiguation method and system | |
CN103678418A (en) | Information processing method and equipment | |
CN104361102A (en) | Expert recommendation method and system based on group matching | |
CN101206674A (en) | Enhancement type related search system and method using commercial articles as medium | |
CN112950276B (en) | Seed population expansion method based on multi-order feature combination | |
CN106055661A (en) | Multi-interest resource recommendation method based on multi-Markov-chain model | |
CN104778205A (en) | Heterogeneous information network-based mobile application ordering and clustering method | |
CN103473128A (en) | Collaborative filtering method for mashup application recommendation | |
CN103678336A (en) | Method and device for identifying entity words | |
CN104268247A (en) | Master data imputation method based on fuzzy analytic hierarchy process | |
CN115098650A (en) | Comment information analysis method based on historical data model and related device | |
CN105159971A (en) | Cloud platform data retrieval method | |
CN110110213A (en) | Excavate method, apparatus, computer readable storage medium and the terminal device of user's occupation |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20151007 |