CN105447117A - User clustering method and apparatus - Google Patents

Info

Publication number
CN105447117A
Authority
CN
China
Prior art keywords
classification
attribute
attached
dimensional data
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510783263.9A
Other languages
Chinese (zh)
Other versions
CN105447117B (en)
Inventor
牛凯
杜帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201510783263.9A
Publication of CN105447117A
Application granted
Publication of CN105447117B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Abstract

Embodiments of the present invention disclose a user clustering method and apparatus. The method comprises: receiving a clustering request, wherein the clustering request carries a category of to-be-collected user data, and collecting user data according to the clustering request; processing the user data, to obtain main properties and affiliated properties of each user data, and obtaining multi-dimensional data corresponding to each main property according to all the affiliated properties; obtaining relevance between each main property and all the affiliated properties according to multi-dimensional data corresponding to each main property; and performing fuzzy clustering according to the relevance between each main property and all affiliated properties, to obtain a clustering result. According to the method, analysis may be performed on user data from multiple dimensions, so that a clustering result meeting an actual situation can be obtained.

Description

A method and apparatus for user clustering
Technical field
The present invention relates to data mining in the field of computer applications, and in particular to a method and apparatus for user clustering.
Background
At present, the volume of new data produced by human society each day is growing explosively. Analyzing and processing this mass of data in real time and mining its internal relationships is a problem of particular concern to analysts and decision makers. For example, information science in China is developing very rapidly, and the numbers of research projects, published papers, and patent applications are too large to count easily. Analyzing the relational network among the knowledge data of these research projects, papers, and patents, and predicting the future research hotspots or focal points of a technical field, can help research management departments manage and approve projects more effectively and can open up new research directions for researchers in the field. In the social media field, the number of new users and the volume of communication between users keep growing; analyzing the friend relationships and community structure among users is of great significance for targeted recommendation and delivery, for user behavior analysis, and for differentiated management of user groups. In the commodity transaction field, whatever the sales model, a large number of transactions occur every day; the numbers of users, merchants, and commodities can reach hundreds of millions or more, the commodities fall into many categories, and so do the users. Faced with data this numerous and this varied relating to all types of users, analyzing and clustering users using only a single kind of data clearly does not reflect the actual situation.
An existing method sets up a data mining workflow comprising multiple parallel data processing tasks, which are executed in parallel through a map/reduce mechanism to obtain the corresponding results. In the prior art, the mining and clustering of data is limited to a single dimension, and the data cannot be analyzed from multiple dimensions and angles. As a result, the analysis and understanding of the data is not comprehensive enough, which degrades the clustering result.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method and apparatus for user clustering, so that the user clustering result matches the actual situation.
In a first aspect, an embodiment of the present invention discloses a method for user clustering, applied to a cluster server, comprising the steps of:
Receiving a cluster request and collecting user data according to the cluster request, wherein the cluster request carries the category of the user data to be collected;
Processing the collected user data according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; determining all attached attributes from the primary attribute and attached attributes obtained for each piece of user data; and obtaining, according to all the attached attributes, the multi-dimensional data corresponding to each primary attribute; wherein the primary attribute comprises a user identifier, the attached attributes comprise information about that user obtained from each piece of user data, and the multi-dimensional data indicates, for each of the attached attributes, whether the primary attribute has it or not;
Obtaining, according to the multi-dimensional data corresponding to each primary attribute, the degree of correlation between each primary attribute and all attached attributes;
Performing fuzzy clustering according to the degree of correlation between each primary attribute and all attached attributes to obtain a clustering result,
comprising: classifying all primary attributes according to a preset classification rule to obtain a first distribution of each primary attribute over the categories; and determining, according to the degree of correlation between each primary attribute and all attached attributes and the first distribution, a second distribution of each attached attribute of the multi-dimensional data over the categories; wherein the classification ensures that each category contains at least one primary attribute;
Performing iterative computation with a preset fuzzy clustering algorithm according to the second distribution to obtain the user clustering result.
Preferably, processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data comprises:
Performing word segmentation, stop-word filtering, and illegal-character handling on the collected user data;
Obtaining one unique primary attribute and at least one attached attribute for each piece of user data.
Preferably, after determining the second distribution of each attached attribute of the multi-dimensional data over the categories, the method further comprises:
Determining, according to the second distribution of each attached attribute of the multi-dimensional data over the categories, the weight of each attached attribute of the multi-dimensional data in each category;
In that case, performing iterative computation with the preset fuzzy clustering algorithm according to the second distribution to obtain the user clustering result is:
Performing iterative computation with the preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each category, to obtain the user clustering result.
Preferably, performing iterative computation with the preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each category, to obtain the user clustering result, comprises:
S1: determining, according to the weight of each attached attribute of the multi-dimensional data in each category, the membership vector of each primary attribute of the multi-dimensional data with respect to the categories; wherein the membership vector of each primary attribute is determined by all the attached attributes;
S2: determining, according to the membership vector of each primary attribute of the multi-dimensional data with respect to the categories, the center vector of the current cluster center of each category; the center vector of the current cluster center of a category is the mean of the degrees of membership in that category of all primary attributes present in the category, and the membership vector comprises the degree of membership of each primary attribute of the multi-dimensional data in each category;
S3: comparing the norm of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of that category with a set threshold;
S4: if the norm is less than or equal to the set threshold, determining that the clustering result has converged and ending the clustering process;
S5: if the norm is greater than the set threshold, determining that the clustering result has not converged and continuing the clustering process: taking the membership vector of each primary attribute of the multi-dimensional data with respect to the categories as the new first distribution of each primary attribute over the categories for the new round of clustering; determining, according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution, the second distribution of each attached attribute of the multi-dimensional data over the categories; determining, according to that second distribution, the weight of each attached attribute of the multi-dimensional data in each category; and returning to step S1.
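The loop S1 to S5 can be sketched in Python as follows. This is a minimal illustration only: the text above does not give the formulas, so the concrete choices here are assumptions. The second distribution is taken as the matrix product of the transposed correlation matrix and the first distribution, the weight is each attribute's share within a category, "present in a category" is read as the category where a primary attribute has its largest membership, and the names `fuzzy_cluster`, `threshold`, and `max_iter` are hypothetical.

```python
import math

def fuzzy_cluster(W, init_labels, k, threshold=1e-6, max_iter=100):
    """Iterate S1-S5 on a binary correlation matrix W (users x attributes).

    All formulas below are illustrative assumptions, not the patent's own.
    """
    n, m = len(W), len(W[0])
    # First distribution: a hard (one-hot) initial classification.
    U = [[1.0 if init_labels[i] == c else 0.0 for c in range(k)] for i in range(n)]
    prev_centers = None
    for _ in range(max_iter):  # safety bound, an addition of this sketch
        # Second distribution: how each attached attribute spreads over categories.
        V = [[sum(W[i][j] * U[i][c] for i in range(n)) for c in range(k)]
             for j in range(m)]
        # Weight: share of each attached attribute within a category (assumed).
        totals = [sum(V[j][c] for j in range(m)) for c in range(k)]
        P = [[V[j][c] / totals[c] if totals[c] else 0.0 for c in range(k)]
             for j in range(m)]
        # S1: membership vector of each primary attribute, from its attributes.
        U_new = []
        for i in range(n):
            raw = [sum(W[i][j] * P[j][c] for j in range(m)) for c in range(k)]
            s = sum(raw)
            U_new.append([r / s for r in raw] if s else [1.0 / k] * k)
        # S2: center of a category = mean membership of the users assigned to it.
        centers = []
        for c in range(k):
            vals = [u[c] for u in U_new
                    if max(range(k), key=u.__getitem__) == c]
            centers.append(sum(vals) / len(vals) if vals else 0.0)
        # S3/S4: stop when the centers moved by no more than the threshold.
        if prev_centers is not None:
            moved = math.sqrt(sum((a - b) ** 2
                                  for a, b in zip(centers, prev_centers)))
            if moved <= threshold:
                return U_new, centers
        # S5: memberships become the new first distribution; go back to S1.
        prev_centers, U = centers, U_new
    return U, prev_centers
```

Called with a five-user correlation matrix and an initial labeling such as `[0, 0, 0, 1, 1]`, the function returns one membership vector per user (each summing to 1) together with the final center vector.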
Preferably, after determining that the clustering result has converged and ending the clustering process, the method further comprises:
Taking the membership vector of each primary attribute with respect to the categories in the current clustering process as the probability that the primary attribute belongs to each category, and sorting the primary attributes within each category according to that probability.
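The sorting step can be illustrated with a short Python sketch. It assumes one reading of the text: within every category, all primary attributes are ranked by their membership in that category, highest first. The function name `rank_by_category` and the sample memberships are hypothetical.

```python
def rank_by_category(membership, k):
    """Return, for each of the k categories, the user ids sorted by their
    probability of belonging to that category (highest first)."""
    return [
        sorted(membership, key=lambda uid: membership[uid][c], reverse=True)
        for c in range(k)
    ]

# Hypothetical converged membership vectors for three users and two categories.
membership = {"A": [0.9, 0.1], "B": [0.6, 0.4], "D": [0.2, 0.8]}
ranking = rank_by_category(membership, 2)
# ranking[0] is the order for category 0, ranking[1] for category 1.
```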
In a second aspect, an embodiment of the present invention further provides an apparatus for user clustering, applied to a cluster server, the apparatus comprising:
A cluster request receiving module, configured to receive a cluster request and collect user data according to the cluster request, wherein the cluster request carries the category of the user data to be collected;
A multi-dimensional data acquisition module, configured to process the collected user data according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; determine all attached attributes from the primary attribute and attached attributes obtained for each piece of user data; and obtain, according to all the attached attributes, the multi-dimensional data corresponding to each primary attribute; wherein the primary attribute comprises a user identifier, the attached attributes comprise information about that user obtained from each piece of user data, and the multi-dimensional data indicates, for each of the attached attributes, whether the primary attribute has it or not;
A correlation acquisition module, configured to obtain, according to the multi-dimensional data corresponding to each primary attribute, the degree of correlation between each primary attribute and all attached attributes;
A fuzzy clustering module, configured to perform fuzzy clustering according to the degree of correlation between each primary attribute and all attached attributes to obtain a clustering result,
The fuzzy clustering module comprises a distribution determination submodule and a clustering result acquisition submodule,
The distribution determination submodule is specifically configured to: classify all primary attributes according to a preset classification rule to obtain a first distribution of each primary attribute over the categories; and determine, according to the degree of correlation between each primary attribute and all attached attributes and the first distribution, a second distribution of each attached attribute of the multi-dimensional data over the categories; wherein the classification ensures that each category contains at least one primary attribute;
The clustering result acquisition submodule is specifically configured to: perform iterative computation with a preset fuzzy clustering algorithm according to the second distribution to obtain the user clustering result.
Preferably, when processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data, the multi-dimensional data acquisition module performs word segmentation, stop-word filtering, and illegal-character handling on the collected user data, and obtains one unique primary attribute and at least one attached attribute for each piece of user data.
Preferably, after determining the second distribution of each attached attribute of the multi-dimensional data over the categories, the distribution determination submodule is further configured to:
Determine, according to the second distribution of each attached attribute of the multi-dimensional data over the categories, the weight of each attached attribute of the multi-dimensional data in each category;
In that case, performing iterative computation with the preset fuzzy clustering algorithm according to the second distribution to obtain the user clustering result is:
Performing iterative computation with the preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each category, to obtain the user clustering result.
Preferably, the clustering result acquisition submodule comprises: a membership vector determination submodule, a center vector determination submodule, a comparison submodule, a first decision submodule, and a second decision submodule,
The membership vector determination submodule is configured to determine, according to the weight of each attached attribute of the multi-dimensional data in each category, the membership vector of each primary attribute of the multi-dimensional data with respect to the categories; wherein the membership vector of each primary attribute is determined by all the attached attributes;
The center vector determination submodule is configured to determine, according to the membership vector of each primary attribute of the multi-dimensional data with respect to the categories, the center vector of the current cluster center of each category; the center vector of the current cluster center of a category is the mean of the degrees of membership in that category of all primary attributes present in the category, and the membership vector comprises the degree of membership of each primary attribute of the multi-dimensional data in each category;
The comparison submodule is configured to compare the norm of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of that category with a set threshold; if the norm is less than or equal to the set threshold, the first decision submodule is triggered; if the norm is greater than the set threshold, the second decision submodule is triggered,
The first decision submodule is configured to determine that the clustering result has converged and end the clustering process;
The second decision submodule is configured to determine that the clustering result has not converged and continue the clustering process: take the membership vector of each primary attribute of the multi-dimensional data with respect to the categories as the new first distribution of each primary attribute over the categories for the new round of clustering; determine, according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution, the second distribution of each attached attribute of the multi-dimensional data over the categories; determine, according to that second distribution, the weight of each attached attribute of the multi-dimensional data in each category; and trigger the membership vector determination submodule.
Preferably, the apparatus further comprises a sorting module:
The sorting module is specifically configured to: take the membership vector of each primary attribute with respect to the categories in the current clustering process as the probability that the primary attribute belongs to each category, and sort the primary attributes within each category according to that probability.
As can be seen from the above technical solutions, the embodiments of the present invention disclose a method and apparatus for user clustering: a cluster request is received and user data is collected according to the cluster request, the cluster request carrying the category of the user data to be collected; the collected user data is processed according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; all attached attributes are determined from the primary attribute and attached attributes obtained for each piece of user data, and the multi-dimensional data corresponding to each primary attribute is obtained according to all the attached attributes, wherein the primary attribute comprises a user identifier, the attached attributes comprise information about that user obtained from each piece of user data, and the multi-dimensional data indicates, for each of the attached attributes, whether the primary attribute has it or not; the degree of correlation between each primary attribute and all attached attributes is obtained according to the multi-dimensional data corresponding to each primary attribute; and fuzzy clustering is performed according to that degree of correlation to obtain a clustering result. The fuzzy clustering comprises: classifying all primary attributes according to a preset classification rule to obtain a first distribution of each primary attribute over the categories; determining, according to the degree of correlation between each primary attribute and all attached attributes and the first distribution, a second distribution of each attached attribute of the multi-dimensional data over the categories, wherein the classification ensures that each category contains at least one primary attribute; and performing iterative computation with a preset fuzzy clustering algorithm according to the second distribution to obtain the user clustering result.
It can be seen that the embodiments of the present invention collect the primary attribute and attached attributes of each piece of user data, so that users can be analyzed and described from multiple dimensions and angles. The subsequent preset fuzzy clustering algorithm then clusters users iteratively across those dimensions, which avoids clustering users from a single dimension and yields a clustering result that matches the actual situation. Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a user clustering method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a user clustering apparatus provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The embodiments of the present invention provide a method and apparatus for user clustering that analyze user data from multiple dimensions and angles, so that the analysis of the user data is more comprehensive and the clustering result matches the actual situation.
The present invention is described in detail below through specific embodiments.
The user clustering method provided by an embodiment of the present invention, as shown in Fig. 1, is applied to a cluster server and may comprise the following steps:
S101: receive a cluster request and collect user data according to the cluster request, wherein the cluster request carries the category of the user data to be collected.
It should be noted that the present application does not limit the tool used for data collection; any tool capable of collecting the data can be used.
When collecting user data according to the cluster request, a crawler tool can be used, an open application programming interface (API) can be used to collect user data online, or a crawler tool can be combined with an open API.
As a concrete example, suppose the cluster server receives a cluster request in which the category of user data to be collected is social media users, specifically microblog users. According to this cluster request, the cluster server collects microblog user data with a crawler tool, or online through an open API, or with a crawler tool combined with an open API. The microblog user data may include: the interest tags the microblog user chose at registration, the microblogs the user has posted, the comments the user has taken part in, and information about the friends the user interacts with.
Alternatively, suppose the category of user data carried in the cluster request is commodity transaction users, specifically shopping users. According to this cluster request, the cluster server collects shopping user data with a crawler tool, or online through an open API, or with a crawler tool combined with an open API. The shopping user data may include: the commodities the user has purchased and their categories; the commodities the user has browsed, favorited, or followed and their categories; and information about the merchants the user has followed or favorited.
Alternatively, suppose the category of user data carried in the cluster request is expert users in a science and technology field. According to this cluster request, the cluster server collects the expert user data with a crawler tool, or online through an open API, or with a crawler tool combined with an open API. The expert user data may include: the papers the expert has published, the research conferences or research projects the expert has taken part in, and information about the experts the expert collaborates with.
It should be noted that the present application does not limit the user data; any suitable user data can be used.
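Step S101 can be pictured with a small Python sketch. Everything named here is hypothetical: `ClusterRequest`, `collect`, and the two stand-in collectors are illustrations of a request that carries a category and of combining a crawler with an open API. No real crawler or API is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class ClusterRequest:
    # Category of user data to collect, e.g. "microblog", "shopping".
    category: str

# A collector takes a category and returns raw user records.
Collector = Callable[[str], List[Dict[str, str]]]

def collect(request: ClusterRequest, collectors: List[Collector]) -> List[Dict[str, str]]:
    """Run every configured collector for the requested category and merge results."""
    records: List[Dict[str, str]] = []
    for collector in collectors:
        records.extend(collector(request.category))
    return records

# Stand-in collectors; a real system would crawl pages or call an open API here.
def fake_crawler(category: str) -> List[Dict[str, str]]:
    return [{"user": "A", "source": "crawler", "category": category}]

def fake_open_api(category: str) -> List[Dict[str, str]]:
    return [{"user": "B", "source": "api", "category": category}]

records = collect(ClusterRequest("microblog"), [fake_crawler, fake_open_api])
```

Passing one collector or both mirrors the three options in the text: crawler only, open API only, or the two combined.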
S102: process the collected user data according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; determine all attached attributes from the primary attribute and attached attributes obtained for each piece of user data; and obtain, according to all the attached attributes, the multi-dimensional data corresponding to each primary attribute; wherein the primary attribute comprises a user identifier, the attached attributes comprise information about that user obtained from each piece of user data, and the multi-dimensional data indicates, for each of the attached attributes, whether the primary attribute has it or not.
Specifically, processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data may comprise:
Performing word segmentation, stop-word filtering, and illegal-character handling on the collected user data;
Obtaining one unique primary attribute and at least one attached attribute for each piece of user data.
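The preprocessing described above might look like the following Python sketch. It is illustrative only: the stop-word list is made up, illegal characters are assumed to be anything that is neither a word character nor whitespace, and whitespace splitting stands in for a real word segmenter (Chinese text would need a dedicated one). The names `preprocess` and `to_attributes` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would load a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "is"}

def preprocess(text):
    """Clean one piece of text and return its useful tokens."""
    # Illegal-character handling: replace punctuation and symbols with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # Word segmentation: whitespace splitting, as a stand-in for a segmenter.
    tokens = cleaned.lower().split()
    # Stop-word filtering.
    return [t for t in tokens if t not in STOP_WORDS]

def to_attributes(user_id, texts):
    """Return one primary attribute (the user id) and its attached attributes."""
    attached = []
    for text in texts:
        for token in preprocess(text):
            if token not in attached:  # keep first-seen order, no duplicates
                attached.append(token)
    if not attached:
        raise ValueError("each user needs at least one attached attribute")
    return user_id, attached
```

The `ValueError` reflects the requirement that every piece of user data yields at least one attached attribute.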
Taking microblog users as a concrete example: when registering an account, microblog user A can choose interest tags according to his or her own interests. These interest tags, the microblogs user A has posted, the comments user A has taken part in, and information about the friends user A interacts with are collected. The name A of the microblog user can be taken as the primary attribute, and the interests determined from the collected data can be taken as attached attributes. After word segmentation, stop-word filtering, and illegal-character handling are performed on the collected data, suppose the primary attributes obtained are A, B, C, D, and E, and the attached attributes obtained are five interests labeled first, second, third, fourth, and fifth. The first piece of user data has primary attribute A and attached attributes first and second; the second piece has primary attribute B and attached attributes second and third; the third piece has primary attribute C and attached attributes first and third; the fourth piece has primary attribute D and attached attribute fourth; the fifth piece has primary attribute E and attached attribute fifth. From the processed user data, the multi-dimensional data corresponding to each primary attribute can then be determined. The multi-dimensional data indicates, for each attached attribute, whether the primary attribute has it, and can be expressed as follows:
The multi-dimensional data corresponding to primary attribute A covers attached attributes first, second, third, fourth, and fifth, where A has attached attributes first and second and does not have third, fourth, or fifth;
The multi-dimensional data corresponding to primary attribute B covers attached attributes first, second, third, fourth, and fifth, where B has attached attributes second and third and does not have first, fourth, or fifth;
The multi-dimensional data corresponding to primary attribute C covers attached attributes first, second, third, fourth, and fifth, where C has attached attributes first and third and does not have second, fourth, or fifth;
The multi-dimensional data corresponding to primary attribute D covers attached attributes first, second, third, fourth, and fifth, where D has attached attribute fourth and does not have first, second, third, or fifth;
The multi-dimensional data corresponding to primary attribute E covers attached attributes first, second, third, fourth, and fifth, where E has attached attribute fifth and does not have first, second, third, or fourth.
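The five-user example above can be reproduced with a short Python sketch. The function names are hypothetical, and the five attached attributes are written as the strings `first` to `fifth`; the data itself comes from the example.

```python
# Attached attributes obtained per primary attribute, as in the example.
records = {
    "A": ["first", "second"],
    "B": ["second", "third"],
    "C": ["first", "third"],
    "D": ["fourth"],
    "E": ["fifth"],
}

def all_attached(records):
    """Collect every attached attribute once, in first-seen order."""
    seen = []
    for attrs in records.values():
        for attr in attrs:
            if attr not in seen:
                seen.append(attr)
    return seen

def multidimensional(records):
    """For each primary attribute, flag which of all attached attributes it has."""
    attrs = all_attached(records)
    return {uid: {a: (a in own) for a in attrs} for uid, own in records.items()}

data = multidimensional(records)
```

Each entry of `data` covers all five attached attributes, with `True` for those the user has and `False` for the rest.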
S103: According to the multi-dimensional data corresponding to each primary attribute, obtain the degree of correlation between each primary attribute and all attached attributes.
Specifically, following step S102, the degree of correlation between each primary attribute and all attached attributes can be represented by an adjacency matrix $W$:
$$W = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$
Here, the degree of correlation between a primary attribute and an attached attribute it has is set to 1, and the degree of correlation between a primary attribute and an attached attribute it does not have is set to 0. Rows one to five represent, in order, the correlation of primary attributes A, B, C, D, E with the attached attributes first, second, third, fourth, penta, respectively.
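The construction of the adjacency matrix can be sketched in a few lines of Python. This is only an illustration: the dictionary `user_attrs`, the list `attached`, and the function name `adjacency_matrix` are hypothetical names chosen here and do not appear in the application.

```python
# Hypothetical input: each primary attribute (user name) mapped to the
# attached attributes extracted for that user in the example above.
user_attrs = {
    "A": {"first", "second"},
    "B": {"second", "third"},
    "C": {"first", "third"},
    "D": {"fourth"},
    "E": {"penta"},
}
# Column order of the matrix: all attached attributes.
attached = ["first", "second", "third", "fourth", "penta"]

def adjacency_matrix(user_attrs, attached):
    """Return W with W[i][j] = 1 if user i has attached attribute j, else 0."""
    return [[1 if name in attrs else 0 for name in attached]
            for attrs in user_attrs.values()]

W = adjacency_matrix(user_attrs, attached)
for user, row in zip(user_attrs, W):
    print(user, row)
```

Each row of `W` corresponds to one primary attribute and each column to one attached attribute, matching the matrix of the example.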
S104: According to the degree of correlation between each primary attribute and all attached attributes, perform fuzzy clustering to obtain a cluster result, comprising:
Classifying all primary attributes according to a preset classification rule to obtain the first distribution situation of each primary attribute in each category; according to the degree of correlation between each primary attribute and all attached attributes and the first distribution situation, determining the second distribution situation of each attached attribute of the multi-dimensional data in each category, where the classification ensures that every category contains at least one primary attribute;
According to the second distribution situation, performing iterative computation with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
Specifically, after the second distribution situation of each attached attribute of the multi-dimensional data in each category is determined, the method may further comprise:
According to the second distribution situation of each attached attribute of the multi-dimensional data in each category, determining the weight of each attached attribute of the multi-dimensional data in each category;
In this case, the step of performing iterative computation with a preset fuzzy clustering algorithm according to the second distribution situation to obtain the cluster result of the users is:
According to the weight of each attached attribute of the multi-dimensional data in each category, performing iterative computation with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
As a specific example: following step S103, all primary attributes are classified according to a preset classification rule, which may be random classification or even classification.
The primary attributes A, B, C, D, E are randomly divided into two categories, $d_1$ and $d_2$, where $d_1$ contains A and B, and $d_2$ contains C, D, E. This gives the first distribution situation of each primary attribute in each category, which can be represented by a first distribution matrix: $\vec{X}_A = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\vec{X}_B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\vec{X}_C = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_D = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_E = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Taking $\vec{X}_A = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as an example, it represents the first distribution situation of primary attribute A in $d_1$ and $d_2$: A has been randomly assigned to $d_1$, that is, A intersects $d_1$ and is disjoint from $d_2$, so A is determined to belong to $d_1$ and is expressed as $\vec{X}_A = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. The first distribution situations of the other primary attributes in $d_1$ and $d_2$ are expressed similarly and are not repeated here.
According to the degree of correlation between each primary attribute and all attached attributes and the first distribution situation, the second distribution situation of each attached attribute of the multi-dimensional data in each category is determined: $d_1$ contains the attached attributes first, second, third, and $d_2$ contains the attached attributes first, third, fourth, penta.
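According to the second distribution situation, the weight of each attached attribute of the multi-dimensional data in each category is determined, which can be represented by a second distribution matrix: $\vec{Y}_{\text{first}} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\vec{Y}_{\text{second}} = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$, $\vec{Y}_{\text{third}} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\vec{Y}_{\text{fourth}} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{Y}_{\text{penta}} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Taking $\vec{Y}_{\text{first}} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ as an example, it represents the weight of attached attribute first in $d_1$ and $d_2$: the upper "1" is the weight of first in $d_1$, and the lower "1" is the weight of first in $d_2$.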
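The weights of this example can be reproduced with a short sketch. It assumes, as the numbers of the example suggest but the text does not state as a formula, that the weight of an attached attribute in a category is the sum of the first-distribution entries of the primary attributes that have that attached attribute. The function name `attribute_weights` is hypothetical.

```python
# Adjacency matrix of the example: rows A..E, columns first..penta.
W = [[1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]
# First distribution: A and B in d1, C, D and E in d2.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]

def attribute_weights(W, X):
    """Assumed rule: Y[j][k] = sum over users i of W[i][j] * X[i][k]."""
    n_users, n_attrs, n_cats = len(W), len(W[0]), len(X[0])
    return [[sum(W[i][j] * X[i][k] for i in range(n_users))
             for k in range(n_cats)]
            for j in range(n_attrs)]

Y = attribute_weights(W, X)
for name, weights in zip(["first", "second", "third", "fourth", "penta"], Y):
    print(name, weights)
```

Under this assumption, attached attribute first receives weight 1 in both categories, which agrees with the explanation of the two "1" entries above.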
Then, according to the weight of each attached attribute of the multi-dimensional data in each category, iterative computation is performed with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
Specifically, performing iterative computation with a preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each category, to obtain the cluster result of the users, may comprise:
S1: According to the weight of each attached attribute of the multi-dimensional data in each category, determine the membership vector of each primary attribute of the multi-dimensional data to each category, where the membership vector of each primary attribute to each category is determined by all of its attached attributes;
S2: According to the membership vector of each primary attribute of the multi-dimensional data to each category, determine the center vector of the current cluster center of each category. The center vector of the current cluster center of each category is the mean of the membership degrees, to that category, of all primary attributes present in each category; the membership vector comprises the membership degree of each primary attribute of the multi-dimensional data to each category;
S3: Compare the modulus of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of each category with a set threshold;
S4: If the modulus is less than or equal to the set threshold, judge that the cluster result has converged and end the clustering process;
S5: If the modulus is greater than the set threshold, judge that the cluster result has not converged and continue the clustering process: take the membership vector of each primary attribute of the multi-dimensional data to each category as the new first distribution situation of each primary attribute in each category for the new round of clustering; according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution situation, determine the second distribution situation of each attached attribute of the multi-dimensional data in each category; according to that second distribution situation, determine the weight of each attached attribute of the multi-dimensional data in each category; and return to step S1.
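Steps S1 to S5 can be sketched as a loop. This is a minimal sketch under stated assumptions rather than the definitive algorithm of the application: the weight of an attached attribute is assumed to be the sum of the first-distribution vectors of the primary attributes that have it; the membership vector of a primary attribute is assumed to be the sum of the weights of its attached attributes, normalized to sum to 1; and the center vector is assumed to be the mean of the membership vectors of all primary attributes. The function name `fuzzy_cluster`, the parameter `max_iter`, and the threshold value 0.01 are hypothetical.

```python
import math

def fuzzy_cluster(W, X0, prev_center, threshold, max_iter=100):
    """Iterate S1-S5 until the cluster center moves by at most `threshold`."""
    n, m, c = len(W), len(W[0]), len(X0[0])
    X = [[float(v) for v in row] for row in X0]
    center = list(prev_center)
    for _ in range(max_iter):
        # Weight of each attached attribute in each category (assumed rule).
        Y = [[sum(W[i][j] * X[i][k] for i in range(n)) for k in range(c)]
             for j in range(m)]
        # S1: membership vector of each primary attribute, normalized.
        U = []
        for i in range(n):
            raw = [sum(W[i][j] * Y[j][k] for j in range(m)) for k in range(c)]
            total = sum(raw)  # > 0 when every user has an attached attribute
            U.append([v / total for v in raw])
        # S2: center vector as the mean of all membership vectors (assumed).
        center = [sum(U[i][k] for i in range(n)) / n for k in range(c)]
        # S3: modulus of the difference to the previous center vector.
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(center, prev_center)))
        # S5 (prepared in advance): memberships become the new first distribution.
        X, prev_center = U, center
        if diff <= threshold:  # S4: converged, stop.
            break
    return X, center

W = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [1, 0, 1, 0, 0],
     [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
X0 = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
memberships, center = fuzzy_cluster(W, X0, prev_center=[1.0, 0.0], threshold=0.01)
print(memberships, center)
```

The weights are recomputed at the top of each round, which is equivalent to computing them once before S1 and again in S5 as the steps describe.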
As a specific example, iterative computation with the preset fuzzy clustering algorithm proceeds as follows:
Following the steps above, suppose the primary attributes are $X_1, X_2, X_3, X_4, X_5$ and the attached attributes are $Y_1, Y_2, Y_3, Y_4, Y_5$, where $X_1$ has attached attributes $Y_1$ and $Y_2$, $X_2$ has attached attributes $Y_2$ and $Y_3$, $X_3$ has attached attributes $Y_1$ and $Y_3$, $X_4$ has attached attribute $Y_4$, and $X_5$ has attached attribute $Y_5$.
$X_1$ and $X_2$ are assigned to $d_1$, and $X_3$, $X_4$, $X_5$ are assigned to $d_2$, so that $\vec{X}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\vec{X}_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, $\vec{X}_3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_4 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_5 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. According to the degree of correlation between each primary attribute and all attached attributes and the first distribution situation, the second distribution situation of each attached attribute of the multi-dimensional data in each category is determined: $d_1$ contains attached attributes $Y_1, Y_2, Y_3$, and $d_2$ contains attached attributes $Y_1, Y_3, Y_4, Y_5$. According to this second distribution situation, the weight of each attached attribute in each category is determined: $\vec{Y}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\vec{Y}_2 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}$, $\vec{Y}_3 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, $\vec{Y}_4 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{Y}_5 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. The membership vector of each primary attribute to each category is determined from the weights of its attached attributes: $\vec{X}_1 = \vec{Y}_1 + \vec{Y}_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $\vec{X}_2 = \vec{Y}_2 + \vec{Y}_3 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$, $\vec{X}_3 = \vec{Y}_1 + \vec{Y}_3 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$, $\vec{X}_4 = \vec{Y}_4 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_5 = \vec{Y}_5 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
Normalizing the membership vector of each primary attribute to each category gives $\vec{X}_1 = \begin{pmatrix} 3/4 \\ 1/4 \end{pmatrix}$, $\vec{X}_2 = \begin{pmatrix} 3/4 \\ 1/4 \end{pmatrix}$, $\vec{X}_3 = \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}$, $\vec{X}_4 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, $\vec{X}_5 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.
According to the membership vectors obtained, the center vector of the current cluster center of each category is determined as their mean: $\vec{P}_1 = \begin{pmatrix} 2/5 \\ 3/5 \end{pmatrix}$. Because this is the first iteration, the center vector of the previous cluster center of each category is assumed to be $\vec{P}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$. The modulus of the difference between the center vector of the current cluster center and the center vector of the previous cluster center is: $|\vec{P}_1 - \vec{P}_0| = \sqrt{\left(\frac{2}{5} - 1\right)^2 + \left(\frac{3}{5} - 0\right)^2} = \frac{3\sqrt{2}}{5}$. Because this modulus is greater than the set threshold assumed in this example, the cluster result is judged not to have converged and the clustering process continues, so one more iteration is carried out;
In the second iteration, the membership vector of each primary attribute to each category is taken as the new first distribution situation of each primary attribute of the multi-dimensional data in each category for the new round of clustering. According to the degree of correlation between each primary attribute and all attached attributes and the new first distribution situation, the second distribution situation of each attached attribute of the multi-dimensional data in each category is determined, and from it the weight of each attached attribute of the multi-dimensional data in each category,
that is, the weight vectors $\vec{Y}_1, \dots, \vec{Y}_5$ are recomputed.
The membership vector of each primary attribute of the multi-dimensional data to each category is then determined again
and normalized.
According to the membership vectors obtained, the center vector $\vec{P}_2$ of the current cluster center of each category is determined; the center vector of the previous cluster center of each category is $\vec{P}_1$. The modulus $|\vec{P}_2 - \vec{P}_1|$ of their difference is computed and compared with the set threshold; since it is less than or equal to the set threshold, the cluster result is judged to have converged and the clustering process ends.
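The arithmetic of the two iterations can be checked with exact rational numbers. The sketch below is illustrative only and rests on assumptions: weights are sums of the first-distribution vectors of the primary attributes having the attached attribute, membership vectors are sums of the weights of the attached attributes normalized to sum to 1, and the center vector is the mean of all membership vectors. The name `one_round` is hypothetical, and the second-round values are what these assumed rules produce, not figures quoted from the application.

```python
from fractions import Fraction as F

# Adjacency matrix: rows X1..X5, columns Y1..Y5.
W = [[1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1]]

def one_round(W, X):
    """One clustering round: weights, normalized memberships, center vector."""
    n, m, c = len(W), len(W[0]), len(X[0])
    Y = [[sum(W[i][j] * X[i][k] for i in range(n)) for k in range(c)]
         for j in range(m)]
    U = []
    for i in range(n):
        raw = [sum(W[i][j] * Y[j][k] for j in range(m)) for k in range(c)]
        total = sum(raw)
        U.append([v / total for v in raw])
    center = [sum(U[i][k] for i in range(n)) / n for k in range(c)]
    return U, center

# Initial first distribution: X1, X2 in d1; X3, X4, X5 in d2.
X = [[F(1), F(0)], [F(1), F(0)], [F(0), F(1)], [F(0), F(1)], [F(0), F(1)]]
prev = [F(1), F(0)]  # assumed previous center for the first iteration
squared_moduli = []
for _ in range(2):
    X, center = one_round(W, X)
    squared_moduli.append(sum((a - b) ** 2 for a, b in zip(center, prev)))
    prev = center
print(center, squared_moduli)
```

Under these assumptions the first squared modulus is 18/25, the square of $\frac{3\sqrt{2}}{5}$, and the center vector does not move in the second round, which is consistent with the example converging at the second iteration.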
The above embodiment is only an example and does not limit the specific implementation of the user clustering method in this application. In actual applications, the smaller the set threshold, the more accurate the user clustering result. Likewise, the more clustering iterations are carried out, the more accurate the user clustering result. In practice, however, computing time and cost must also be considered. Once the modulus of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of each category falls below the set threshold, a preset number of further iterations, for example 3 to 5, can be carried out; if the result remains within the set threshold range, the clustering process can be stopped.
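The stopping rule with additional iterations can be sketched as a small check on the history of moduli. This is one possible reading of the paragraph above, not a rule stated in this exact form by the application; the function name `converged_stably` and the parameter `extra_rounds` are hypothetical.

```python
def converged_stably(moduli, threshold, extra_rounds=3):
    """True once the modulus met the threshold and stayed within it
    for `extra_rounds` further iterations."""
    needed = extra_rounds + 1
    if len(moduli) < needed:
        return False
    return all(value <= threshold for value in moduli[-needed:])

# Hypothetical history of moduli, one value per iteration.
history = [0.8, 0.005, 0.004, 0.003, 0.002]
print(converged_stably(history, threshold=0.01, extra_rounds=3))
```

A larger `extra_rounds` trades computing time for more confidence that the result stays within the set threshold range.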
It should be emphasized that the more user data a user clustering request asks for and the more complex the types of user data, the more evident the advantages of this scheme become.
Specifically, after the cluster result is judged to have converged and the clustering process ends, the method may further comprise:
Taking the membership vector of each primary attribute of the multi-dimensional data to each category in the current clustering process as the ownership probability of that primary attribute for each category, and sorting within each category according to the ownership probability of the primary attributes for each category.
Following the steps above, the membership vector of each primary attribute of the multi-dimensional data to each category can be determined.
For $X_1$, the first component of its membership vector is the ownership probability of $X_1$ belonging to $d_1$, and the second component is the ownership probability of $X_1$ belonging to $d_2$;
the same holds for $X_2$ with respect to $d_1$ and $d_2$;
the same holds for $X_3$ with respect to $d_1$ and $d_2$;
$X_4$ belongs to $d_1$ with a share of 0 and to $d_2$ with a share of 1, that is, the ownership probability of $X_4$ belonging to $d_1$ is 0 and the ownership probability of $X_4$ belonging to $d_2$ is 1;
$X_5$ belongs to $d_1$ with a share of 0 and to $d_2$ with a share of 1, that is, the ownership probability of $X_5$ belonging to $d_1$ is 0 and the ownership probability of $X_5$ belonging to $d_2$ is 1.
According to the ownership probability of the primary attributes for each category, sorting is carried out within each category: in $d_1$, the order from highest to lowest ownership probability is $X_1$, $X_2$; in $d_2$, the order from highest to lowest ownership probability is $X_4$, $X_5$, $X_3$.
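The sorting step can be sketched as follows. The ownership probabilities of $X_4$ and $X_5$ (0 and 1) come from the text; the values used for $X_1$ to $X_3$ are illustrative placeholders. Listing each primary attribute under the category of the initial classification is an assumption made here so that the sketch reproduces the ordering stated above. The names `membership`, `assigned`, and `rank_within_categories` are hypothetical.

```python
# Ownership probabilities as (d1, d2); X1..X3 values are placeholders.
membership = {
    "X1": (0.7, 0.3),
    "X2": (0.7, 0.3),
    "X3": (0.6, 0.4),
    "X4": (0.0, 1.0),
    "X5": (0.0, 1.0),
}
# Assumed category of each primary attribute (index 0 = d1, 1 = d2).
assigned = {"X1": 0, "X2": 0, "X3": 1, "X4": 1, "X5": 1}

def rank_within_categories(membership, assigned, names=("d1", "d2")):
    """Sort the members of each category by ownership probability, descending."""
    ranking = {}
    for index, name in enumerate(names):
        members = [u for u in membership if assigned[u] == index]
        # sorted() is stable, so ties keep their original order.
        ranking[name] = sorted(members, key=lambda u: -membership[u][index])
    return ranking

ranking = rank_within_categories(membership, assigned)
print(ranking)
```

Because the sort is stable, primary attributes with equal ownership probability keep their input order.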
After sorting, the degree of influence of each primary attribute on its category can also be estimated and set according to the ownership probability of each primary attribute in the category: the higher the ownership probability, the higher the degree of influence of the primary attribute on that category. Follow-up work can then be carried out according to the degree of influence. For example, in the social media field, friends can be recommended to other users more accurately and more comprehensively; in the commodity trading field, commodities can be recommended to shoppers more accurately and more comprehensively; in the scientific and technical field, experts in the relevant field can be recommended to users more accurately and more comprehensively.
By applying the embodiment of the present invention, user data can be analyzed from multiple dimensions and multiple angles, so that the analysis of user data is more comprehensive. By applying a fuzzy clustering algorithm, users can be clustered more accurately, and the influence of the initial classification on the result is reduced.
Corresponding to the above method embodiment, an embodiment of the present invention provides a user clustering device, as shown in Figure 2, which is applied to a cluster server. The device may comprise: a cluster request receiving module 201, a multi-dimensional data acquisition module 202, a degree of correlation acquisition module 203, and a fuzzy clustering module 204.
Cluster request receiving module 201: configured to receive a cluster request and collect user data according to the cluster request, where the cluster request carries the category of the user data to be collected.
Multi-dimensional data acquisition module 202: configured to process the collected user data according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; to determine all attached attributes according to the primary attribute and attached attributes obtained for each piece of user data; and to obtain, according to all attached attributes, the multi-dimensional data corresponding to each primary attribute. The primary attribute comprises a user ID, and the attached attributes comprise the relevant information of that user obtained from each piece of user data. The multi-dimensional data identifies whether or not the primary attribute has each of the attached attributes.
Specifically, when processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data, the multi-dimensional data acquisition module performs word segmentation, stop-word filtering, and illegal-character handling on the collected user data, and obtains one unique primary attribute and at least one attached attribute for each piece of user data.
Degree of correlation acquisition module 203: configured to obtain, according to the multi-dimensional data corresponding to each primary attribute, the degree of correlation between each primary attribute and all attached attributes.
Fuzzy clustering module 204: configured to perform fuzzy clustering according to the degree of correlation between each primary attribute and all attached attributes to obtain a cluster result.
The fuzzy clustering module 204 comprises a distribution situation determination submodule 2041 and a cluster result obtaining submodule 2042 (not shown in the figure).
The distribution situation determination submodule 2041 is specifically configured to: classify all primary attributes according to a preset classification rule to obtain the first distribution situation of each primary attribute in each category; and, according to the degree of correlation between each primary attribute and all attached attributes and the first distribution situation, determine the second distribution situation of each attached attribute of the multi-dimensional data in each category, where the classification ensures that every category contains at least one primary attribute.
The cluster result obtaining submodule 2042 is specifically configured to: perform iterative computation with a preset fuzzy clustering algorithm according to the second distribution situation to obtain the cluster result of the users.
Specifically, after determining the second distribution situation of each attached attribute of the multi-dimensional data in each category, the distribution situation determination submodule 2041 may further:
According to the second distribution situation of each attached attribute of the multi-dimensional data in each category, determine the weight of each attached attribute of the multi-dimensional data in each category;
In this case, performing iterative computation with a preset fuzzy clustering algorithm according to the second distribution situation to obtain the cluster result of the users may be:
According to the weight of each attached attribute of the multi-dimensional data in each category, performing iterative computation with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
Specifically, the cluster result obtaining submodule 2042 may comprise: a membership vector determination submodule, a center vector determination submodule, a comparison submodule, a first decision submodule, and a second decision submodule (not shown in the figure).
The membership vector determination submodule: configured to determine, according to the weight of each attached attribute of the multi-dimensional data in each category, the membership vector of each primary attribute of the multi-dimensional data to each category, where the membership vector of each primary attribute to each category is determined by all of its attached attributes;
The center vector determination submodule: configured to determine, according to the membership vector of each primary attribute of the multi-dimensional data to each category, the center vector of the current cluster center of each category. The center vector of the current cluster center of each category is the mean of the membership degrees, to that category, of all primary attributes present in each category; the membership vector comprises the membership degree of each primary attribute of the multi-dimensional data to each category;
The comparison submodule: configured to compare the modulus of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of each category with a set threshold; if the modulus is less than or equal to the set threshold, the first decision submodule is triggered; if the modulus is greater than the set threshold, the second decision submodule is triggered;
The first decision submodule: configured to judge that the cluster result has converged and end the clustering process;
The second decision submodule: configured to judge that the cluster result has not converged and continue the clustering process: take the membership vector of each primary attribute of the multi-dimensional data to each category as the new first distribution situation of each primary attribute in each category for the new round of clustering; according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution situation, determine the second distribution situation of each attached attribute of the multi-dimensional data in each category; according to that second distribution situation, determine the weight of each attached attribute of the multi-dimensional data in each category; and trigger the membership vector determination submodule.
Specifically, the device may further comprise a sorting module (not shown in the figure):
The sorting module is specifically configured to: take the membership vector of each primary attribute to each category in the current clustering process as the ownership probability of that primary attribute for each category, and sort within each category according to the ownership probability of the primary attributes for each category.
By applying the embodiment of the present invention, user data can be analyzed from multiple dimensions and multiple angles, so that the analysis of user data is more comprehensive. By applying a fuzzy clustering algorithm, users can be clustered more accurately, and the influence of the initial classification on the result is reduced.
Because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the description of the method embodiment.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements comprises not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that comprises the element.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention is included in the protection scope of the present invention.

Claims (10)

1. A user clustering method, characterized in that it is applied to a cluster server, the method comprising the steps of:
Receiving a cluster request and collecting user data according to the cluster request, where the cluster request carries the category of the user data to be collected;
Processing the collected user data according to a preset rule to obtain the primary attribute and attached attributes of each piece of user data; determining all attached attributes according to the primary attribute and attached attributes obtained for each piece of user data; and obtaining, according to all attached attributes, the multi-dimensional data corresponding to each primary attribute; wherein the primary attribute comprises a user ID, and the attached attributes comprise the relevant information of that user obtained from each piece of user data; the multi-dimensional data identifies whether or not the primary attribute has each of the attached attributes;
Obtaining, according to the multi-dimensional data corresponding to each primary attribute, the degree of correlation between each primary attribute and all attached attributes;
Performing fuzzy clustering according to the degree of correlation between each primary attribute and all attached attributes to obtain a cluster result,
comprising: classifying all primary attributes according to a preset classification rule to obtain the first distribution situation of each primary attribute in each category; according to the degree of correlation between each primary attribute and all attached attributes and the first distribution situation, determining the second distribution situation of each attached attribute of the multi-dimensional data in each category, where the classification ensures that every category contains at least one primary attribute;
According to the second distribution situation, performing iterative computation with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
2. The method according to claim 1, characterized in that processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data comprises:
Performing word segmentation, stop-word filtering, and illegal-character handling on the collected user data;
Obtaining one unique primary attribute and at least one attached attribute of each piece of user data.
3. The method according to claim 1, characterized in that, after determining the second distribution situation of each attached attribute of the multi-dimensional data in each category, the method further comprises:
According to the second distribution situation of each attached attribute of the multi-dimensional data in each category, determining the weight of each attached attribute of the multi-dimensional data in each category;
The step of performing iterative computation with a preset fuzzy clustering algorithm according to the second distribution situation to obtain the cluster result of the users is:
According to the weight of each attached attribute of the multi-dimensional data in each category, performing iterative computation with a preset fuzzy clustering algorithm to obtain the cluster result of the users.
4. The method according to claim 3, characterized in that performing iterative computation with a preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each category, to obtain the cluster result of the users, comprises:
S1: According to the weight of each attached attribute of the multi-dimensional data in each category, determining the membership vector of each primary attribute of the multi-dimensional data to each category, where the membership vector of each primary attribute to each category is determined by all of its attached attributes;
S2: According to the membership vector of each primary attribute of the multi-dimensional data to each category, determining the center vector of the current cluster center of each category, where the center vector of the current cluster center of each category is the mean of the membership degrees, to that category, of all primary attributes present in each category, and the membership vector comprises the membership degree of each primary attribute of the multi-dimensional data to each category;
S3: Comparing the modulus of the difference between the center vector of the current cluster center of each category and the center vector of the previous cluster center of each category with a set threshold;
S4: If the modulus is less than or equal to the set threshold, judging that the cluster result has converged and ending the clustering process;
S5: If the modulus is greater than the set threshold, judging that the cluster result has not converged and continuing the clustering process: taking the membership vector of each primary attribute of the multi-dimensional data to each category as the new first distribution situation of each primary attribute in each category for the new round of clustering; according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution situation, determining the second distribution situation of each attached attribute of the multi-dimensional data in each category; according to that second distribution situation, determining the weight of each attached attribute of the multi-dimensional data in each category; and returning to step S1.
5. The method according to claim 4, characterized in that, after judging that the cluster result has converged and ending the clustering process, the method further comprises:
Taking the membership vector of each primary attribute to each category in the current clustering process as the ownership probability of that primary attribute for each category, and sorting within each category according to the ownership probability of the primary attributes for each category.
6. A user clustering apparatus, characterized in that it is applied to a clustering server and comprises:
a clustering request receiving module, configured to receive a clustering request and collect user data according to the clustering request, wherein the clustering request carries the category of the user data to be collected;
a multi-dimensional data acquisition module, configured to process the collected user data according to a preset rule to obtain the primary attribute and the attached attributes of each piece of user data; determine all attached attributes from the primary attribute and attached attributes obtained for each piece of user data; and obtain, according to all attached attributes, the multi-dimensional data corresponding to each primary attribute; wherein the primary attribute comprises a user ID, the attached attributes comprise information about the user obtained from each piece of user data, and the multi-dimensional data indicates, for each of the attached attributes, whether or not the primary attribute has a relationship with it;
a degree-of-correlation acquisition module, configured to obtain, according to the multi-dimensional data corresponding to each primary attribute, the degree of correlation between each primary attribute and all attached attributes;
a fuzzy clustering module, configured to perform fuzzy clustering according to the degree of correlation between each primary attribute and all attached attributes to obtain a clustering result,
wherein the fuzzy clustering module comprises a distribution determination submodule and a clustering result acquisition submodule;
the distribution determination submodule is specifically configured to: classify all primary attributes according to a preset classification rule to obtain the first distribution of each primary attribute in each class; and determine, according to the degree of correlation between each primary attribute and all attached attributes and the first distribution, the second distribution of each attached attribute of the multi-dimensional data in each class; wherein the classification ensures that each class contains at least one primary attribute;
the clustering result acquisition submodule is specifically configured to: perform iterative computation with a preset fuzzy clustering algorithm according to the second distribution to obtain the clustering result of the users.
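As a rough illustration of the multi-dimensional data and the initial classification described in claim 6, the sketch below builds one has/has-not (1/0) vector per user over the union of all attached attributes, and then assigns users to classes round-robin so that no class is empty. The round-robin rule is only a stand-in for the unspecified "preset classification rule", and both function names are hypothetical.

```python
def build_multidimensional_data(records):
    """records: dict mapping user_id -> iterable of attached attributes.

    Returns (all_attached, vectors) where all_attached is the sorted union
    of attached attributes and vectors maps user_id -> list of 0/1 flags.
    """
    owned = {user: set(attrs) for user, attrs in records.items()}
    all_attached = sorted(set().union(*owned.values())) if owned else []
    vectors = {
        user: [1 if attr in attrs else 0 for attr in all_attached]
        for user, attrs in owned.items()
    }
    return all_attached, vectors


def initial_assignment(user_ids, n_classes):
    """Round-robin initial classification (assumed rule); every class is non-empty."""
    users = sorted(user_ids)
    if n_classes < 1 or n_classes > len(users):
        raise ValueError("need 1 <= n_classes <= number of users")
    return {user: index % n_classes for index, user in enumerate(users)}
```

Requiring `n_classes` to be no larger than the number of users is what guarantees the "at least one primary attribute in each class" condition in this sketch.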
7. The apparatus according to claim 6, characterized in that, when processing the collected user data according to the preset rule to obtain the primary attribute and attached attributes of each piece of user data, the multi-dimensional data acquisition module performs word segmentation on the collected user data and filters out useless words and illegal characters, so that a unique primary attribute and at least one attached attribute are obtained for each piece of user data.
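The preprocessing in claim 7 might look like the sketch below. It makes several assumptions that the claim does not state: whitespace splitting stands in for a real word segmenter, "illegal characters" are taken to be anything that is not a word character or whitespace, and the stop-word set is supplied by the caller. The function name `extract_attributes` is hypothetical.

```python
import re


def extract_attributes(user_id, text, stop_words):
    """Return (primary_attribute, attached_attributes) for one piece of user data.

    Assumptions: whitespace tokenization replaces word segmentation, and any
    character other than a word character or whitespace counts as illegal.
    """
    cleaned = re.sub(r"[^\w\s]", " ", text)  # strip illegal characters
    attached = []
    for token in cleaned.lower().split():
        # Drop useless (stop) words and repeated tokens, keeping first-seen order.
        if token not in stop_words and token not in attached:
            attached.append(token)
    if not attached:
        raise ValueError("user data must yield at least one attached attribute")
    return user_id, attached
```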
8. The apparatus according to claim 6, characterized in that the distribution determination submodule, after determining the second distribution of each attached attribute of the multi-dimensional data in each class, is further configured to:
determine, according to the second distribution of each attached attribute of the multi-dimensional data in each class, the weight of each attached attribute of the multi-dimensional data in each class;
and in that performing iterative computation with the preset fuzzy clustering algorithm according to the second distribution to obtain the clustering result of the users comprises:
performing iterative computation with the preset fuzzy clustering algorithm according to the weight of each attached attribute of the multi-dimensional data in each class to obtain the clustering result of the users.
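One possible reading of the second distribution and the per-class weights is sketched below. The formulas are assumptions, since this claim does not give them: the 0/1 multi-dimensional data is used as a stand-in for the degree of correlation, the second distribution of an attached attribute in a class is the sum of the first-distribution degrees of the users that have it, and the weight is that value normalized over all attached attributes within the class. Both function names are hypothetical.

```python
def second_distribution(vectors, first_distribution):
    """dist[j][k]: mass of attached attribute j in class k (assumed formula).

    vectors: user_id -> 0/1 list over attached attributes (stand-in for the
             degree of correlation).
    first_distribution: user_id -> list of degrees, one per class.
    """
    if not vectors:
        return []
    first_user = next(iter(vectors))
    n_attrs = len(vectors[first_user])
    n_classes = len(first_distribution[first_user])
    dist = [[0.0] * n_classes for _ in range(n_attrs)]
    for user, flags in vectors.items():
        for j, flag in enumerate(flags):
            for k, degree in enumerate(first_distribution[user]):
                dist[j][k] += flag * degree
    return dist


def attribute_weights(dist):
    """Normalize each class column so the weights in a class sum to 1 (assumption)."""
    if not dist:
        return []
    n_classes = len(dist[0])
    totals = [sum(row[k] for row in dist) for k in range(n_classes)]
    return [
        [row[k] / totals[k] if totals[k] else 0.0 for k in range(n_classes)]
        for row in dist
    ]
```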
9. The apparatus according to claim 8, characterized in that the clustering result acquisition submodule comprises a membership vector determination submodule, a center vector determination submodule, a comparison submodule, a first decision submodule and a second decision submodule;
the membership vector determination submodule is configured to determine, according to the weight of each attached attribute of the multi-dimensional data in each class, the membership vector of each primary attribute of the multi-dimensional data with respect to each class; wherein the membership vector of each primary attribute of the multi-dimensional data with respect to each class is determined by all attached attributes;
the center vector determination submodule is configured to determine, according to the membership vector of each primary attribute of the multi-dimensional data with respect to each class, the center vector of the current cluster center of each class, wherein the center vector of the current cluster center of each class is the mean of the membership degrees, with respect to that class, of all primary attributes present in the class, and the membership vector comprises the membership degree of each primary attribute of the multi-dimensional data with respect to each class;
the comparison submodule is configured to compare the modulus of the difference between the center vector of the current cluster center of each class and the center vector of the previous cluster center of that class with a set threshold, to trigger the first decision submodule if the comparison result is less than or equal to the set threshold, and to trigger the second decision submodule if the comparison result is greater than the set threshold;
the first decision submodule is configured to determine that the clustering result has converged and end the clustering process;
the second decision submodule is configured to determine that the clustering result has not converged and continue the clustering process: take the membership vector of each primary attribute of the multi-dimensional data with respect to each class as the new first distribution of each primary attribute in each class for the new round of clustering; determine, according to the degree of correlation between each primary attribute and all attached attributes and the new first distribution, the second distribution of each attached attribute of the multi-dimensional data in each class; determine, according to the second distribution of each attached attribute in each class, the weight of each attached attribute of the multi-dimensional data in each class; and trigger the membership vector determination submodule.
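The membership vector and center vector submodules of claim 9 could be sketched as follows. The scoring rule is an assumption: a user's score for a class is the sum of the class weights of the attached attributes the user has, normalized so the degrees sum to 1. A user is treated as "present" in the class where its degree is largest, which is also an assumption. The function names are hypothetical.

```python
def membership_vectors(vectors, weights):
    """Membership of each user in each class from attribute weights (assumed rule).

    vectors: user_id -> 0/1 list over attached attributes.
    weights: weights[j][k] is the weight of attached attribute j in class k.
    """
    n_classes = len(weights[0])
    result = {}
    for user, flags in vectors.items():
        scores = [
            sum(flag * weights[j][k] for j, flag in enumerate(flags))
            for k in range(n_classes)
        ]
        total = sum(scores)
        # Fall back to a uniform vector when the user matches no weighted attribute.
        result[user] = [s / total for s in scores] if total else [1.0 / n_classes] * n_classes
    return result


def center_vector(memberships):
    """Per class: mean membership degree of the users present in that class.

    Assumption: a user is present in the class with its largest degree.
    An empty class contributes 0.0.
    """
    n_classes = len(next(iter(memberships.values())))
    members = [[] for _ in range(n_classes)]
    for degrees in memberships.values():
        k = degrees.index(max(degrees))
        members[k].append(degrees[k])
    return [sum(m) / len(m) if m else 0.0 for m in members]
```

The resulting center vector would then be compared with the one from the previous round by the comparison submodule.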
10. The apparatus according to claim 6, characterized in that it further comprises a sorting module:
the sorting module is specifically configured to: take the membership vector of each primary attribute with respect to each class in the current clustering process as the probability that the primary attribute belongs to each class, and sort the primary attributes within each class according to those probabilities.
CN201510783263.9A 2015-11-16 2015-11-16 User clustering method and apparatus Active CN105447117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510783263.9A CN105447117B (en) 2015-11-16 2015-11-16 User clustering method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510783263.9A CN105447117B (en) 2015-11-16 2015-11-16 User clustering method and apparatus

Publications (2)

Publication Number Publication Date
CN105447117A (en) 2016-03-30
CN105447117B (en) 2019-03-26

Family

ID=55557295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510783263.9A Active CN105447117B (en) 2015-11-16 2015-11-16 User clustering method and apparatus

Country Status (1)

Country Link
CN (1) CN105447117B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750647A (en) * 2012-06-29 2012-10-24 南京大学 Merchant recommendation method based on transaction network
CN103810288A (en) * 2014-02-25 2014-05-21 西安电子科技大学 Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUAI DU et al.: "Community Detection Analysis of Heterogeneous Network", 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463564A (en) * 2016-06-02 2017-12-12 华为技术有限公司 The characteristic analysis method and device of data in server
CN110610200A (en) * 2019-08-27 2019-12-24 浙江大搜车软件技术有限公司 Vehicle and merchant classification method and device, computer equipment and storage medium
CN110648195A (en) * 2019-08-28 2020-01-03 苏宁云计算有限公司 User identification method and device and computer equipment
WO2021036453A1 (en) * 2019-08-28 2021-03-04 苏宁云计算有限公司 Method and device for user identification, and computer device
CN110648195B (en) * 2019-08-28 2022-02-25 苏宁云计算有限公司 User identification method and device and computer equipment
CN111444933A (en) * 2019-11-26 2020-07-24 北京邮电大学 Object classification method and device
CN111444933B (en) * 2019-11-26 2023-10-10 北京邮电大学 Object classification method and device

Also Published As

Publication number Publication date
CN105447117B (en) 2019-03-26

Similar Documents

Publication Publication Date Title
Matsunaga et al. Exploring graph neural networks for stock market predictions with rolling window analysis
Yan et al. Measuring technological distance for patent mapping
US20200192894A1 (en) System and method for using data incident based modeling and prediction
Yue et al. Bitextract: Interactive visualization for extracting bitcoin exchange intelligence
Sabau Survey of clustering based financial fraud detection research
Shen et al. A pricing model for big personal data
WO2021254027A1 (en) Method and apparatus for identifying suspicious community, and storage medium and computer device
Zhang et al. A system for tender price evaluation of construction project based on big data
CN106845846A (en) Big data asset evaluation method
CN105447117A (en) User clustering method and apparatus
Zhang et al. Reputationpro: The efficient approaches to contextual transaction trust computation in e-commerce environments
UnnisaBegum et al. Data mining techniques for big data
Hu Predicting and improving invoice-to-cash collection through machine learning
CN112419030B (en) Method, system and equipment for evaluating financial fraud risk
KR101259417B1 (en) Hybrid type method and system for extracting a emerging technologies using collective intelligence
Xia et al. PE‐EDD: An efficient peer‐effect‐based financial fraud detection approach in publicly traded China firms
Zhao et al. Detecting fake reviews via dynamic multimode network
Sharawi et al. Utilization of data visualization for knowledge discovery in modern logistic service companies
Gamidullaeva et al. Study of regional innovation ecosystem based on the big data intellectual analysis
Knyazeva et al. A graph-based data mining approach to preventing financial fraud: a case study
Hiziroglu et al. Customer portfolio analysis: Crisp classification versus fuzzy classification–Based on the supermarket industry
Kasinadh et al. Building fuzzy OLAP using multi-attribute summarization
Sun et al. Reconfiguring star inventors with commercialization: a case of the graphene sector
Yan et al. Research on the application of data mining technology in insurance informatization
DE LA TORRE From Econophysics to Networks: Structure of the Large-Scale Estonian Network of Payments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant