CN111597220A - Data mining method and device - Google Patents

Publication number
CN111597220A
Authority
CN
China
Prior art keywords
trust
users
user
category
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910128629.7A
Other languages
Chinese (zh)
Other versions
CN111597220B (en
Inventor
张文翔
郑瑞峰
于均均
蒋龙龙
刘珂
牛慧倩
曹逾
王成栋
刘蕾
袁蒙蒙
董哲
刘广鑫
周子雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201910128629.7A
Publication of CN111597220A
Application granted
Publication of CN111597220B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a data mining method and device, and relates to the field of computers. Users who form trust relationships based on features of objects are mined from user-generated content, and feature-level user trust networks are constructed. In this way, various hidden user trust relationships are mined, and the data sparsity problem of user trust networks is alleviated to a certain extent.

Description

Data mining method and device
Technical Field
The present disclosure relates to the field of computers, and in particular, to a data mining method and apparatus.
Background
A related technique uses a user-object evaluation matrix, formed from the users' evaluation values for objects, to mine trust relationships among users and construct a user trust network. For example, evaluation value vector 1, holding user 1's evaluation values for the objects, and evaluation value vector 2, holding user 2's evaluation values, are taken from the matrix. The degree of trust between user 1 and user 2 is determined by the similarity between the two vectors, and users whose degree of trust exceeds a certain threshold are considered to have a trust relationship. The other users having a trust relationship with a given user constitute that user's trust network.
Disclosure of Invention
The inventors found that the user trust network constructed by the related art is object-level and suffers from data sparsity. For example, the related art can mine the relatively explicit trust relationships between users who give similar evaluation values to many of the same objects. However, users who have no similar evaluation values for the same objects, or who have similar evaluation values for only a few of the same objects, may still have hidden trust relationships.
In the present disclosure, users who form trust relationships based on features of objects are mined from user-generated content, and feature-level user trust networks are constructed. In this way, various hidden user trust relationships are mined, and the data sparsity problem of user trust networks is alleviated to a certain extent.
Some embodiments of the present disclosure provide a data mining method, including:
acquiring user generated content of each user;
extracting a plurality of features of an object from user-generated content of respective users;
determining, according to the user-generated content of each user, all users who form trust relationships based on any one of the extracted features;
and forming, from all users who form trust relationships based on that feature, the feature-level user trust network corresponding to that feature.
In some embodiments, a tendency vector of each user on any one feature, over the objects in a preset object set, is mined from the user-generated content of each user;
a trust metric value of a direct trust relationship between any two users on the feature is determined according to the tendency vectors of the two users on the feature;
any two users whose trust metric value of the direct trust relationship on the feature is greater than a preset value are determined as users who form a direct trust relationship based on the feature;
and all users who form direct trust relationships based on the feature form the feature-level user trust network corresponding to the feature.
In some embodiments, determining the trust metric value of the direct trust relationship between any two users on any one feature according to the tendency vectors of the two users on that feature comprises:
calculating the inner product of the two users' tendency vectors on the feature;
calculating the product of the moduli of the two users' tendency vectors on the feature;
and determining the ratio of the inner product to the product of the moduli as the trust metric value of the direct trust relationship between the two users on the feature.
In some embodiments, for any two users who do not form a direct trust relationship based on a first feature but form a direct trust relationship based on at least one second feature:
the mean of the trust metric values of the direct trust relationships between the two users on each second feature is determined as the integrated trust metric value of the direct trust relationship between the two users;
the product of the integrated trust metric value and the similarity between the two users is determined as the trust metric value of the indirect trust relationship between the two users on the first feature;
two users whose trust metric value of the indirect trust relationship on the first feature is greater than a preset value are determined as forming an indirect trust relationship based on the first feature;
and all users forming indirect trust relationships based on the first feature are added to the feature-level user trust network corresponding to the first feature.
In some embodiments, the similarity between any two users is calculated by:
acquiring, for each of the two users, an evaluation vector formed by the user's evaluation values for a plurality of identical objects;
calculating the inner product of the two users' evaluation vectors;
calculating the product of the moduli of the two users' evaluation vectors;
and determining the ratio of the inner product to the product of the moduli as the similarity between the two users.
In some embodiments, the objects are categories,
the category-level user trust network corresponding to a category consists of all the feature-level user trust networks corresponding to the features of the category,
wherein the trust metric value on the category of any two users in the category-level user trust network corresponding to the category is obtained as a weighted sum of the two users' trust metric values on all features of the category, using the weight of each feature in the category.
In some embodiments, the category-level user trust network corresponding to the upper-level category to which a category belongs consists of the category-level user trust networks corresponding to all categories at the next level under that upper-level category,
wherein the trust metric value on the upper-level category of any two users in the category-level user trust network corresponding to the upper-level category is obtained by accumulating the two users' trust metric values on all categories under the upper-level category.
In some embodiments, the method further comprises: determining, based on a target feature, the feature-level user trust network in which a target user is located;
determining the evaluation value of the target user for an object according to the trust metric values between the target user and the other users in that feature-level user trust network and the evaluation values of the other users for the object;
and determining the objects to be pushed to the target user based on the evaluation values of the target user for the objects.
In some embodiments, the method further comprises: determining, based on a target category, the category-level user trust network in which a target user is located;
determining the evaluation value of the target user for an object according to the trust metric values between the target user and the other users in that category-level user trust network and the evaluation values of the other users for the object;
and determining the objects to be pushed to the target user based on the evaluation values of the target user for the objects.
Some embodiments of the present disclosure provide a data mining device, including:
a memory; and
a processor coupled to the memory, the processor configured to perform the data mining method of any of the foregoing embodiments based on instructions stored in the memory.
Some embodiments of the present disclosure propose a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data mining method of any of the preceding embodiments.
Drawings
The drawings used in the description of the embodiments or the related art are briefly described below. The present disclosure will be more clearly understood from the following detailed description, which proceeds with reference to the accompanying drawings. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from them by one of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow diagram of some embodiments of a data mining method for mining a user trust network according to the present disclosure.
Fig. 2 is a flow diagram illustrating some embodiments of the disclosed method for building a feature-level user trust network for step 130.
Fig. 3 is a flow diagram illustrating some embodiments of the method of the present disclosure for extending a feature level user trust network based on the embodiment illustrated in fig. 2.
Figure 4 shows a schematic diagram of one example of a feature level user trust network constructed by the embodiment shown in figures 1-3.
Fig. 5 is a flowchart illustrating some embodiments of a method for constructing a category-level user trust network according to the present disclosure on the basis of the feature-level user trust network constructed in the embodiments shown in fig. 1 to 3.
Fig. 6 is a diagram showing one example of correspondence of categories and their features.
FIG. 7 illustrates a schematic diagram of one example of user trust network aggregation implemented based on feature and based on category relationships.
Fig. 8 is a flow diagram illustrating some embodiments of mining push information based on the feature-level user trust network constructed by the embodiments of fig. 1-3.
Fig. 9 is a schematic flowchart of some embodiments of mining and pushing information based on the category-level user trust network constructed by the embodiment of fig. 5 in the present disclosure.
Fig. 10 is a schematic diagram of some embodiments of a data mining device of the present disclosure.
Fig. 11 is a flowchart illustrating an example of information pushing according to the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure.
Fig. 1 is a schematic flow diagram of some embodiments of a data mining method for mining a user trust network according to the present disclosure. As shown in fig. 1, the method of this embodiment includes:
in step 110, User Generated Content (UGC) of each User is obtained.
With the development of internet applications, users have become more interactive: a user is not only a browser of network content but also a producer of it. User-generated content refers to network content produced by users. It may be, for example, comment information or community question-and-answer information, but is not limited to these examples.
In step 120, a plurality of features of the object are extracted from the user generated content of each user.
In some embodiments, feature mining techniques are used to extract a plurality of features of an object from the user-generated content of each user. For example, the user-generated content of each user is segmented into words, stop words such as 'yes' are filtered out, statistics such as the part of speech and word frequency of each word are collected, and words whose part of speech and word frequency meet preset conditions are extracted as features according to business needs.
For example, for the object "mobile phone", words such as "resolution", "memory", and "appearance" are determined as features of "mobile phone" by mining the comment information on "mobile phone".
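As an illustrative, non-limiting sketch of the extraction in step 120, the following Python fragment counts word frequencies after stop-word filtering and keeps the words that meet a frequency threshold. The stop-word list, the `min_freq` threshold, the whitespace tokenizer, and the optional `allowed` set (standing in for the result of a part-of-speech filter) are all assumptions made for illustration; the disclosure does not specify them.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "is", "a", "and", "of", "but", "yes", "very", "this", "it"}

def extract_features(comments, min_freq=2, allowed=None):
    """Return candidate feature words, most frequent first.

    comments: iterable of comment strings about one object.
    min_freq: assumed word-frequency threshold (the "preset condition").
    allowed:  optional set of words that passed a part-of-speech filter
              (e.g. nouns); the tagger itself is outside this sketch.
    """
    counts = Counter()
    for text in comments:
        # Whitespace splitting stands in for real word segmentation.
        for token in text.lower().split():
            word = token.strip(".,!?;:\"'()")
            if not word or word in STOP_WORDS:
                continue
            if allowed is not None and word not in allowed:
                continue
            counts[word] += 1
    return [w for w, c in counts.most_common() if c >= min_freq]

comments = [
    "The resolution is great",
    "Resolution and memory",
    "memory is small but the appearance is nice",
    "great resolution",
]
nouns = {"resolution", "memory", "appearance"}
print(extract_features(comments, min_freq=2, allowed=nouns))
# prints ['resolution', 'memory']
```

Here "appearance" is dropped only because it occurs once in this toy data; with a lower threshold it would also be kept.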
In step 130, all users who form trust relationships based on any one of the extracted features are determined according to the user-generated content of each user, for example by using the trust metric value between any two users on that feature. These users form the feature-level user trust network corresponding to that feature.
For example, the trust metric value between any two users on a feature is determined by the similarity between the two users on that feature.
Wherein the trust relationship comprises a direct trust relationship and an indirect trust relationship. The indirect trust relationship is formed from the direct trust relationship and the trust transfer. The subsequent embodiments shown in fig. 2 and 3 will describe these two trust relationships, respectively.
In this embodiment, users who form trust relationships based on features of objects are mined from user-generated content, and feature-level user trust networks are constructed. In this way, various hidden user trust relationships are mined, and the data sparsity problem of the user trust network is alleviated to a certain extent.
Fig. 2 is a flow diagram illustrating some embodiments of the disclosed method for building a feature-level user trust network for step 130. This embodiment generally describes building a feature level user trust network based on direct trust relationships.
As shown in fig. 2, the method of this embodiment includes:
In step 210, the tendency vector of each user on the feature, over the objects in a preset object set, is mined from the user-generated content of each user.
In some embodiments, using sentiment tendency analysis from short-text mining, the sentiment tendency of each user toward each feature of an object is mined from the user's comments on the object, based on the sentiment of the sentence in which the feature appears. Sentiment tendencies are divided, for example, into positive, negative, and no opinion expressed, denoted by 1, -1, and 0, respectively. Because features are extracted from comments on objects, each feature f has an associated set of objects (objects with few comments may be filtered out of the set).
The tendency vector of user i on feature f, over the objects k in the preset object set Set, is denoted $L_i$:
$L_i = [score_{i,1,f}, score_{i,2,f}, \ldots, score_{i,k,f}, \ldots]$
where $score_{i,k,f}$ denotes the tendency value of user i toward object k on feature f.
The tendency vector of user j on feature f, over the objects k in the preset object set Set, is denoted $L_j$:
$L_j = [score_{j,1,f}, score_{j,2,f}, \ldots, score_{j,k,f}, \ldots]$
where $score_{j,k,f}$ denotes the tendency value of user j toward object k on feature f.
In step 220, the trust metric value of the direct trust relationship between any two users on the feature is determined according to the similarity of the two users' tendency vectors on that feature.
In some embodiments, the inner product of the two users' tendency vectors on the feature is calculated, the product of the moduli of the two tendency vectors is calculated, and the ratio of the inner product to the product of the moduli is determined as the trust metric value of the direct trust relationship between the two users on the feature. That is, the trust metric value of the direct trust relationship between different users is calculated as a cosine similarity.
For example, the trust metric value $trust_f(i,j)$ of the direct trust relationship between two users i and j on feature f can be expressed by the following formula:
$$trust_f(i,j) = \frac{L_i \cdot L_j}{|L_i| \, |L_j|}$$
where $L_i$ and $L_j$ are the tendency vectors defined above and $|\cdot|$ denotes the modulus of a vector.
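As an illustrative sketch of steps 210 and 220, the Python fragment below arranges one user's tendency values into a vector over a fixed object list and computes the cosine-based trust metric. The object identifiers, the default of 0 for objects a user has not commented on, and the handling of an all-zero vector are assumptions of this sketch.

```python
import math

def tendency_vector(scores, object_set):
    """Order one user's tendency values on a feature by a fixed object list.

    scores: {object_id: 1, -1 or 0}. Objects the user did not comment on
    default to 0 (an assumption of this sketch).
    """
    return [scores.get(k, 0) for k in object_set]

def direct_trust(l_i, l_j):
    """Cosine similarity of two tendency vectors: (L_i . L_j) / (|L_i| |L_j|)."""
    dot = sum(a * b for a, b in zip(l_i, l_j))
    norm = math.sqrt(sum(a * a for a in l_i)) * math.sqrt(sum(b * b for b in l_j))
    # Assumption: an all-zero vector yields trust 0 instead of a division error.
    return dot / norm if norm else 0.0

objects = ["phone_a", "phone_b", "phone_c", "phone_d"]  # hypothetical object set
l_i = tendency_vector({"phone_a": 1, "phone_b": -1, "phone_d": 1}, objects)
l_j = tendency_vector({"phone_a": 1, "phone_b": -1, "phone_c": 1}, objects)
print(direct_trust(l_i, l_j))  # about 0.667 (inner product 2, moduli sqrt(3) each)
```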
In step 230, any two users whose trust metric value of the direct trust relationship on the feature is greater than a preset value are determined as users who form a direct trust relationship based on the feature, and all users who form direct trust relationships based on the feature form the feature-level user trust network corresponding to the feature.
The description information of the feature-level user trust network corresponding to a feature includes, for example, the feature, the users who form trust relationships based on the feature, and the trust metric values of those trust relationships. For example, the feature-level user trust network $TrustNet_f$ corresponding to feature f can be expressed as:
$trust_f(i,j), \quad i, j \in SetUser_f$
where $SetUser_f$ denotes all users who form direct trust relationships based on feature f, and the other symbols are as defined above.
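A minimal sketch of step 230 follows, assuming the pairwise trust metric values on one feature have already been computed. The dictionary layout, the function name, and the example values are hypothetical.

```python
def build_feature_network(pair_trust, preset_value):
    """Keep user pairs whose direct trust on one feature exceeds the preset value.

    pair_trust: {(user_i, user_j): trust_f(i, j)} for a single feature f.
    Returns (edges, users); users plays the role of SetUser_f.
    """
    edges = {pair: t for pair, t in pair_trust.items() if t > preset_value}
    users = {u for pair in edges for u in pair}
    return edges, users

# Hypothetical trust metric values on one feature.
pair_trust = {("A", "B"): 0.82, ("A", "C"): -0.50, ("B", "C"): 0.10}
edges, users = build_feature_network(pair_trust, preset_value=0.5)
print(edges)          # prints {('A', 'B'): 0.82}
print(sorted(users))  # prints ['A', 'B']
```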
Fig. 3 is a flow diagram illustrating some embodiments of the method of the present disclosure for extending a feature-level user trust network based on the embodiment illustrated in fig. 2. This embodiment mainly describes forming an indirect trust relationship from direct trust relationships and trust transfer, and extending the feature-level user trust network based on the indirect trust relationship.
As shown in fig. 3, for any two users who do not form a direct trust relationship based on a first feature but form a direct trust relationship based on at least one second feature, the method of this embodiment comprises:
In step 310, the mean of the trust metric values of the direct trust relationships between the two users on each second feature is determined as the integrated trust metric value of the direct trust relationship between the two users.
Assuming that users u and i do not form a direct trust relationship based on the first feature F1 but form a direct trust relationship based on at least one second feature, the integrated trust metric value $trust(i,u)$ of the direct trust relationship between users u and i is:
$$trust(i,u) = \frac{1}{N} \sum_{f \in SetFeature_{u,i}} trust_f(i,u)$$
where $SetFeature_{u,i}$ denotes the set of features on which users u and i form direct trust relationships, N denotes the number of features in that set, and $trust_f(i,u)$ denotes the trust metric value of the direct trust relationship between users u and i on feature f.
In step 320, the product of the integrated trust metric value of the direct trust relationship between the two users and the similarity between the two users is determined as the trust metric value of the indirect trust relationship between the two users on the first feature. The formula is:
$$Trust_{F1}(i,u) = sim_{i,u} \times trust(i,u)$$
where $Trust_{F1}(i,u)$ denotes the trust metric value of the indirect trust relationship between users u and i on the first feature F1, $sim_{i,u}$ denotes the similarity between users u and i, and the other symbols are as defined above.
In some embodiments, the similarity between any two users is calculated using cosine similarity:
acquiring, for each of the two users, an evaluation vector formed by the user's evaluation values for a plurality of identical objects;
calculating the inner product of the two users' evaluation vectors;
calculating the product of the moduli of the two users' evaluation vectors;
and determining the ratio of the inner product to the product of the moduli as the similarity between the two users.
The similarity $sim_{i,j}$ between any two users i and j is calculated as follows:
$$sim_{i,j} = \frac{R_i \cdot R_j}{|R_i| \, |R_j|}$$
where $R_i$ and $R_j$ denote the evaluation vectors formed by the evaluation values of users i and j for a plurality of identical objects.
In step 330, any two users whose trust metric value of the indirect trust relationship on the first feature is greater than a preset value are determined as forming an indirect trust relationship based on the first feature, and all users forming indirect trust relationships based on the first feature are added to the feature-level user trust network corresponding to the first feature.
Thus, through trust transfer, users who originally had no trust relationship on the first feature can form one, which further alleviates the data sparsity problem of the user trust network.
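Steps 310 to 330 can be sketched as follows. This is an illustration only: the similarity is passed in as a precomputed number, and the feature names, trust values, and preset value are hypothetical.

```python
def integrated_trust(direct_by_feature):
    """Step 310: mean of trust_f(i, u) over the second features."""
    return sum(direct_by_feature.values()) / len(direct_by_feature)

def indirect_trust(direct_by_feature, sim):
    """Step 320: Trust_F1(i, u) = sim_{i,u} * trust(i, u)."""
    return sim * integrated_trust(direct_by_feature)

def extend_network(network, pair, value, preset_value):
    """Step 330: add the pair to the first feature's network if above the preset value."""
    if value > preset_value:
        network[pair] = value
    return network

# Users i and u have no direct trust on the first feature ("resolution", say),
# but do have direct trust on two second features.
direct = {"memory": 0.8, "appearance": 0.6}
value = indirect_trust(direct, sim=0.9)  # 0.9 times the mean 0.7, about 0.63
net_f1 = extend_network({}, ("i", "u"), value, preset_value=0.5)
print(("i", "u") in net_f1)  # prints True
```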
Figure 4 shows a schematic diagram of one example of a feature level user trust network constructed by the embodiment shown in figures 1-3. As shown in fig. 4, the objects include, for example, a mobile phone and a computer, the features of the objects include, for example, resolution and memory, and the user trust networks at the corresponding feature levels include a resolution-based user trust network and a memory-based user trust network.
Fig. 5 is a flowchart illustrating some embodiments of a method for constructing a category-level user trust network according to the present disclosure on the basis of the feature-level user trust network constructed in the embodiments shown in fig. 1 to 3. Wherein, the object in the embodiments of fig. 1-3 is, for example, a certain category.
As shown in fig. 5, the method of this embodiment includes:
In step 510, the category-level user trust network corresponding to a category is composed of all the feature-level user trust networks corresponding to the features of the category. User trust network aggregation is thus achieved based on features, which further alleviates the data sparsity problem of the user trust network.
The description information of the category-level user trust network includes, for example, the category, the users having trust relationships in the category, and the trust metric values of those users on the category.
The trust metric value $Trust_c(i,j)$ on category c of any two users i and j in the category-level user trust network corresponding to category c is obtained as a weighted sum of the two users' trust metric values $trust_f(i,j)$ on all features f of the category, using the weight $w_c(f)$ of each feature f in category c. The formula is:
$$Trust_c(i,j) = \sum_{f \in Set_c} w_c(f) \times trust_f(i,j)$$
In some embodiments, the weight of each feature of a category within the category is determined by the word frequency of the feature in the category. For example, the weight $w_c(f)$ of feature f in category c is calculated as:
$$w_c(f) = \frac{Freq_f}{\sum_{k \in Set_c} Freq_k}$$
where $Freq_f$ denotes the word frequency of feature f, $Freq_k$ denotes the word frequency of feature k, and $Set_c$ denotes all the features contained in category c.
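As an illustrative sketch of step 510, the fragment below derives feature weights from word frequencies and computes the weighted sum of feature-level trust metric values. The frequencies and trust values are hypothetical, and treating a feature on which the two users have no trust metric value as contributing 0 is an assumption of this sketch.

```python
def feature_weights(freq):
    """w_c(f) = Freq_f / sum of Freq_k over all features k in the category."""
    total = sum(freq.values())
    return {f: n / total for f, n in freq.items()}

def category_trust(weights, trust_by_feature):
    """Trust_c(i, j): weighted sum of trust_f(i, j) over the category's features."""
    # Assumption: a feature with no trust value for this pair contributes 0.
    return sum(w * trust_by_feature.get(f, 0.0) for f, w in weights.items())

freq = {"resolution": 50, "memory": 30, "appearance": 20}  # hypothetical counts
weights = feature_weights(freq)  # about 0.5, 0.3 and 0.2
trust_ij = {"resolution": 0.8, "memory": 0.6}
print(category_trust(weights, trust_ij))  # about 0.58
```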
Fig. 6 is a diagram showing one example of correspondence of categories and their features. As shown in fig. 6, categories include cell phones and computers, and features of these categories include resolution and memory. The weight of "resolution" in "cell phone" is, for example, 0.5, the weight of "resolution" in "computer" is, for example, 0.2, the weight of "memory" in "cell phone" is, for example, 0.4, and the weight of "memory" in "computer" is, for example, 0.3.
If the category in step 510 has an upper-level category, step 520 may also be performed to aggregate user trust networks based on the category relationship.
In step 520, the category-level user trust network corresponding to the upper-level category to which a category belongs is composed of the category-level user trust networks corresponding to all categories under that upper-level category. User trust network aggregation is thus achieved based on the category relationship, which further alleviates the data sparsity problem of the user trust network.
The trust metric value on the upper-level category of any two users in the category-level user trust network corresponding to the upper-level category is obtained by accumulating the two users' trust metric values on all categories under the upper-level category.
For example, the trust metric value $Trust_{cate2}(i,j)$ on a second-level category cate2 of any two users i and j in the category-level user trust network corresponding to cate2 is:
$$Trust_{cate2}(i,j) = \sum_{cate3 \in cate2} Trust_{cate3}(i,j)$$
where cate3 ranges over all third-level categories under the second-level category cate2, and $Trust_{cate3}(i,j)$ denotes the trust metric value of users i and j on a third-level category cate3.
FIG. 7 illustrates a schematic diagram of one example of user trust network aggregation based on features and on category relationships. The feature-level user trust network corresponding to feature 3 includes, for example, user pair A and B and user pair A and C. The feature-level user trust network corresponding to feature 4 includes, for example, user pair A and B and user pair A and D. The category-level user trust network corresponding to third-level category 1.2.2, formed by aggregating features 3 and 4, includes, for example, user pair A and B, user pair A and C, and user pair A and D. Assume that the category-level user trust network corresponding to third-level category 1.2.1 includes, for example, user pair A and B, user pair A and C, and user pair D and E. According to the category relationship, the category-level user trust network corresponding to second-level category 1.2, formed by aggregating third-level categories 1.2.1 and 1.2.2, includes, for example, user pair A and B, user pair A and C, user pair A and D, and user pair D and E. The category-level user trust network corresponding to a first-level category, formed by aggregating second-level categories, is obtained in the same way as the network for a second-level category formed by aggregating third-level categories, and the details are not repeated here.
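The aggregation of fig. 7 from third-level categories to a second-level category can be sketched as below. The user pairs follow the example above; the trust metric values are hypothetical, since the example gives none, and representing each network as a dictionary keyed by user pair is an assumption of this sketch.

```python
from collections import defaultdict

def aggregate(child_networks):
    """Sum each user pair's trust metric values over all child categories."""
    parent = defaultdict(float)
    for network in child_networks:
        for pair, value in network.items():
            parent[tuple(sorted(pair))] += value  # (A, B) and (B, A) are one pair
    return dict(parent)

# Third-level category networks with hypothetical trust metric values.
cate_1_2_1 = {("A", "B"): 0.5, ("A", "C"): 0.4, ("D", "E"): 0.3}
cate_1_2_2 = {("A", "B"): 0.6, ("A", "C"): 0.2, ("A", "D"): 0.7}

cate_1_2 = aggregate([cate_1_2_1, cate_1_2_2])
print(sorted(cate_1_2))
# prints [('A', 'B'), ('A', 'C'), ('A', 'D'), ('D', 'E')]
```

The resulting pair set matches the second-level category 1.2 network described for fig. 7, and the same function could be applied again to aggregate second-level networks into a first-level one.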
Fig. 8 is a flow diagram illustrating some embodiments of mining push information based on the feature-level user trust network constructed by the embodiments of fig. 1-3.
As shown in fig. 8, the method of this embodiment includes:
in step 810, since the description information of the user trust network at the feature level includes, for example, the feature, the user generating the trust relationship based on the feature, and the trust metric value of the trust relationship, the user trust network at the feature level where the target user is located may be determined based on the target feature.
In step 820, the evaluation value of the target user for an object is determined according to the trust metric values between the target user and the other users in the feature-level user trust network in which the target user is located and the evaluation values of the other users for the object.
For example, the evaluation value $r_{i,p}$ of the target user i for the object p is expressed as:
Figure BDA0001974457910000111
where $r_{i,p}$ denotes the evaluation value of the target user i for the object p, $r_{u,p}$ denotes the evaluation value of user u for the object p, U denotes the users contained in the user trust network, based on feature f, in which the target user i is located, and $trust_f(i,u)$ denotes the trust metric value of users u and i on feature f. This may be, for example, the trust metric value of the direct trust relationship described above or, when no trust metric value of a direct trust relationship exists, that of the indirect trust relationship.
Step 830, determining the objects to be pushed to the target user based on the evaluation value of the target user for the objects, so as to push the objects to the target user or push the showing links of the objects.
For example, the objects are sorted by r_{i,p}, and the top N objects are determined as the objects to be pushed to the target user i.
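The top-N selection can be sketched with the standard library as follows. The function name `top_n_objects` and the sample evaluation values are hypothetical; the sketch assumes the predicted evaluation values are already available per object.

```python
import heapq
from typing import Dict, List


def top_n_objects(predicted: Dict[str, float], n: int) -> List[str]:
    """Return the n objects with the highest predicted evaluation values,
    ordered from highest to lowest."""
    return heapq.nlargest(n, predicted, key=predicted.get)


# Hypothetical predicted evaluation values r_{i,p} for target user i.
predicted = {"p1": 3.8, "p2": 4.6, "p3": 2.1, "p4": 4.1}

# Objects to be pushed to the target user with N = 2.
to_push = top_n_objects(predicted, 2)
```

`heapq.nlargest` avoids sorting the full candidate list, which helps when N is small relative to the number of objects.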
Information push based on the feature-level user trust network thus gives different objects that share the same feature an opportunity to be pushed, which improves the diversity of the pushed objects.
Fig. 9 is a schematic flowchart of some embodiments of mining and pushing information based on the category-level user trust network constructed by the embodiment of fig. 5 in the present disclosure.
As shown in fig. 9, the method of this embodiment includes:
Step 910: determine, based on the target category, the category-level user trust network in which the target user is located. This is possible because the description information of a category-level user trust network includes, for example, the category, the users who have a trust relationship in that category, and the trust metric values of those users on the category.
The level of the target category is not limited; it may be, for example, a tertiary category, a secondary category, or a primary category.
Step 920: determine the evaluation value of the target user for an object according to the trust metric values between the target user and the other users in the category-level user trust network in which the target user is located, and according to the evaluation values of those other users for the object.
For example, the evaluation value r_{i,p} of the target user i for the object p is expressed as:
Figure BDA0001974457910000121
where trust_c(i, u) denotes the trust metric value between users u and i on the target category c, r_{i,p} denotes the evaluation value of the target user i for the object p, r_{u,p} denotes the evaluation value of user u for the object p, and U denotes the users contained in the user trust network in which the target user i is located based on the target category c.
Compared with the embodiment of fig. 8, the category-based user trust network used in the embodiment of fig. 9 generally has a larger scope than a feature-based user trust network, and the scope of candidate objects likewise extends from the feature level to the category level.
Step 930, determining the objects to be pushed to the target user based on the evaluation value of the target user for the objects, so as to push the objects or push the showing links of the objects to the target user.
For example, the objects are sorted by r_{i,p}, and the top N objects are determined as the objects to be pushed to the target user i.
Information push based on the category-level user trust network thus gives the various objects within a category an opportunity to be pushed, which further improves the diversity of the pushed objects.
Those skilled in the art will appreciate that the above description is only a few examples of information push based on user trust network, and that other information push methods may be adopted based on user trust network. For example, the object browsed or purchased by other users in the user trust network where the target user is located is used as the push object of the target user. For another example, the user trust network is combined with a random walk algorithm to determine the push object of the target user.
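The text does not specify how the random walk is combined with the trust network. One possible reading is a random walk with restart over the trust graph, where transition probabilities are proportional to trust metric values and the walk restarts at the target user. The sketch below illustrates that reading only; the function name, the restart probability, the iteration count, and the sample graph are assumptions.

```python
from typing import Dict

# trust[u][v] is the trust metric value between users u and v.
# Assumption: every user that appears as a neighbour is also a key.
Graph = Dict[str, Dict[str, float]]


def random_walk_with_restart(trust: Graph, target: str,
                             restart: float = 0.15,
                             iterations: int = 50) -> Dict[str, float]:
    """Score users by how often a trust-weighted walk from `target` visits them."""
    users = list(trust)
    score = {u: 1.0 if u == target else 0.0 for u in users}
    for _ in range(iterations):
        # With probability `restart` the walk jumps back to the target user.
        nxt = {u: (restart if u == target else 0.0) for u in users}
        for u in users:
            total = sum(trust[u].values())
            if total == 0.0:
                # A user without trusted neighbours returns its mass to the target.
                nxt[target] += (1.0 - restart) * score[u]
                continue
            for v, weight in trust[u].items():
                nxt[v] += (1.0 - restart) * score[u] * weight / total
        score = nxt
    return score


# Hypothetical symmetric trust network around target user A.
trust = {
    "A": {"B": 0.9, "C": 0.6},
    "B": {"A": 0.9, "D": 0.8},
    "C": {"A": 0.6},
    "D": {"B": 0.8},
}
scores = random_walk_with_restart(trust, "A")
# Other users ranked by proximity to A in the trust network.
ranked = sorted((u for u in scores if u != "A"), key=scores.get, reverse=True)
```

Objects browsed or purchased by the highest-ranked users could then serve as push candidates for the target user, which also covers the first alternative mentioned above.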
Fig. 10 is a schematic diagram of some embodiments of a data mining device of the present disclosure.
As shown in fig. 10, the data mining apparatus of this embodiment includes a memory 1010 and a processor 1020 coupled to the memory. The processor 1020 is configured to perform, based on instructions stored in the memory, the data mining method of any of the foregoing embodiments, for example, a method for mining data such as a user trust network or push information. The data mining device may be, for example, a server.
Memory 1010 may include, for example, system memory, fixed non-volatile storage media, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
In some application examples of information push, as shown in fig. 11, each user submits its own user-generated content through its client, the data mining device collects the user-generated content of each user, then constructs a user trust network based on the user-generated content of each user by using the method of any one of fig. 1 to 3 and fig. 5, then determines push information of a target user based on the user trust network by using the method of any one of fig. 8 to 9, and finally sends the determined push information to the client of the target user, so that the client presents the push information to the target user.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-readable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only exemplary of the present disclosure and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. A method of data mining, comprising:
acquiring user-generated content of each user;
extracting a plurality of features of an object from the user-generated content of each user;
determining, according to the user-generated content of each user, all users who generate a trust relationship based on any one of the extracted plurality of features; and
forming, from all users who generate a trust relationship based on the any one feature, a feature-level user trust network corresponding to the any one feature.
2. The method of claim 1, wherein:
a tendency vector of each user on any one feature, over the objects in a preset object set, is mined from the user-generated content of each user;
a trust metric value of a direct trust relationship between any two users on the any one feature is determined according to the similarity of the tendency vectors of the two users on the any one feature;
any two users whose trust metric value of the direct trust relationship on the any one feature is greater than a preset value are determined as users who generate a direct trust relationship based on the any one feature; and
all users who generate a direct trust relationship based on the any one feature form the feature-level user trust network corresponding to the any one feature.
3. The method of claim 2, wherein determining the trust metric value for the direct trust relationship between any two users on any one feature based on their tendency vectors on the any one feature comprises:
calculating the product of the tendency vectors of any two users on any one feature;
calculating the product of the moduli of the tendency vectors of any two users on the any one feature;
and determining the ratio of the product of the tendency vectors of any two users on any one feature and the product of the moduli of the tendency vectors as the trust metric value of the direct trust relationship between any two users on any one feature.
4. The method of claim 2, wherein,
for any two users who do not generate a direct trust relationship based on a first feature but generate a direct trust relationship based on at least one second feature:
the mean of the trust metric values of the direct trust relationships between the two users on each second feature is determined as a comprehensive trust metric value of the direct trust relationship between the two users;
the product of the comprehensive trust metric value of the direct trust relationship between the two users and the similarity between the two users is determined as the trust metric value of an indirect trust relationship between the two users on the first feature;
any two users whose trust metric value of the indirect trust relationship on the first feature is greater than a preset value are determined as generating an indirect trust relationship based on the first feature; and
all users who generate an indirect trust relationship based on the first feature are added to the feature-level user trust network corresponding to the first feature.
5. The method of claim 4, wherein the method of calculating the similarity between any two users comprises:
acquiring an evaluation vector formed by the evaluation values of the two arbitrary users on a plurality of identical objects;
calculating the product of the evaluation vectors of any two users;
calculating the product of the moduli of the evaluation vectors of the arbitrary two users;
and determining the ratio of the product of the evaluation vectors of the any two users and the product of the moduli of the evaluation vectors as the similarity between the any two users.
6. The method of claim 2, wherein the object is a category,
the user trust network of the category level corresponding to the category consists of the user trust networks of all the characteristic levels corresponding to the characteristics of the category,
wherein the trust metric value, on the category, of any two users in the category-level user trust network corresponding to the category is obtained as a weighted sum of the two users' trust metric values on all features of the category, each weighted by that feature's weight in the category.
7. The method of claim 6,
the category-level user trust network corresponding to the upper-level category to which the category belongs consists of all the category-level user trust networks corresponding to the lower-level categories belonging to that upper-level category,
wherein the trust metric value, on the upper-level category, of any two users in the category-level user trust network corresponding to the upper-level category is obtained by accumulating the two users' trust metric values over all categories, under the upper-level category, to which the two users belong.
8. The method of any one of claims 1-5, further comprising:
determining, based on a target feature, the feature-level user trust network in which a target user is located;
determining the evaluation value of the target user for an object according to the trust metric values between the target user and other users in the feature-level user trust network in which the target user is located and the evaluation values of the other users for the object;
and determining the object to be pushed to the target user based on the evaluation value of the target user on the object.
9. The method of any one of claims 6-7, further comprising:
determining a user trust network of a category level where a target user is located based on the target category;
determining the evaluation value of the target user to the object according to the trust metric values of the target user and other users in the user trust network of the category level where the target user is located and the evaluation values of the other users to the object;
and determining the object to be pushed to the target user based on the evaluation value of the target user on the object.
10. A data mining device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the data mining method of any of claims 1-9 based on instructions stored in the memory.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the data mining method of any one of claims 1 to 9.
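The trust computations recited in claims 3 to 6 can be illustrated with a short sketch. It follows the claim wording: a ratio of the vector product to the product of moduli (cosine similarity) for direct trust and for user similarity, the mean of direct trust values multiplied by user similarity for indirect trust, and a weighted sum over features for category-level trust. All function names, vectors, weights, and the preset value of 0.5 are hypothetical.

```python
import math
from typing import Dict, Sequence


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    """Ratio of the dot product to the product of the moduli (claims 3 and 5)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def indirect_trust(direct_on_second_features: Sequence[float],
                   user_similarity: float) -> float:
    """Mean direct trust over the second features times user similarity (claim 4)."""
    if not direct_on_second_features:
        raise ValueError("at least one second feature is required")
    mean = sum(direct_on_second_features) / len(direct_on_second_features)
    return mean * user_similarity


def category_trust(feature_trust: Dict[str, float],
                   feature_weight: Dict[str, float]) -> float:
    """Weighted sum of feature-level trust values within a category (claim 6)."""
    return sum(feature_weight[f] * t for f, t in feature_trust.items())


PRESET_VALUE = 0.5  # hypothetical threshold for keeping a trust relationship

# Hypothetical tendency vectors of two users on one feature.
direct = cosine([1.0, 0.0, 1.0], [1.0, 1.0, 1.0])
has_direct_trust = direct > PRESET_VALUE

# Hypothetical evaluation vectors of the same two users on identical objects.
similarity = cosine([5.0, 3.0], [4.0, 3.0])
indirect = indirect_trust([0.8, 0.6], similarity)

# Hypothetical feature-level trust values and feature weights in a category.
on_category = category_trust({"f3": 0.8, "f4": 0.6}, {"f3": 0.5, "f4": 0.5})
```

Because tendency and evaluation vectors in this sketch are non-negative, the resulting trust values fall between 0 and 1, which makes a single preset value usable as the threshold for both direct and indirect trust.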
CN201910128629.7A 2019-02-21 2019-02-21 Data mining method and device Active CN111597220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910128629.7A CN111597220B (en) 2019-02-21 2019-02-21 Data mining method and device

Publications (2)

Publication Number Publication Date
CN111597220A (en) 2020-08-28
CN111597220B (en) 2024-03-05

Family

ID=72184854

Country Status (1)

Country Link
CN (1) CN111597220B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant