CN109242520A

CN109242520A - Method and device for positioning a potential user group

Info

Publication number: CN109242520A
Application number: CN201710556934.7A
Authority: CN
Inventors: 柴俊滔; 柯于皇; 王兴武; 马启华
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2017-07-10
Filing date: 2017-07-10
Publication date: 2019-01-18

Abstract

The present invention relates to the field of data mining technology, and in particular to a method and device for positioning a potential user group, intended to improve the accuracy of existing potential user group positioning methods. In the method, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to the target content to be positioned; the user content scores whose values meet a preset threshold are then filtered out from these scores, and the users corresponding to the filtered-out scores are determined as the potential user group. The potential user group location model is established through repeated training on behavior data describing different users performing different behavior types on different content. By combining the behavior data of different users on different content, the correlation between users and content is scored, and the scores are used to assist in positioning the potential user group, which improves positioning accuracy and further improves user experience.

Description

Method and device for positioning a potential user group

Technical field

The present invention relates to the field of data mining technology, and in particular to a method and device for positioning a potential user group.

Background

Positioning the potential user group is the basis of product operation and promotion. In the prior art, potential user group positioning is commonly carried out using one of the following two schemes.

Scheme one: manual positioning. First, the basic attribute features that the potential user group is likely to have are assumed in advance; then, by means of database retrieval, users who exactly match the assumed attribute features are searched for, and all users satisfying the matching requirement are filtered out to form the potential user group.

However, in scheme one, the basic attribute features of users cannot fully represent the hobbies and interests of the group. Take real estate sales as an example: the sales target is set as successful people between 25 and 45 years old, but if some of these people have already purchased property, their interest may be low. The accuracy of this scheme is therefore not high.

Scheme two: tag-based positioning. A tag library is established in advance, and the tags in the tag library are used to annotate the content library and the user library respectively. Taking one content item as an example, all users whose tags are identical to the tags of that content item are selected to form the potential user group of that content item.

However, in scheme two, the interest of each user is not fixed, so the accuracy of the positioning result may suffer when the tags set according to user interest are not updated in time. For example, a user liked watching American TV series half a year ago, and the system tagged this user as an American TV series fan. If the user's interest changes and the tag is not updated in time, then when potential user group positioning is performed for the content "American TV series", the finally determined potential user group still includes this user; clearly, this positioning result is inaccurate.

In view of this, a new potential user group positioning method needs to be designed to overcome the above drawbacks.

Summary of the invention

The embodiments of the present invention provide a method and device for positioning a potential user group, to improve the accuracy of existing potential user group positioning methods.

The specific technical solutions provided by the embodiments of the present invention are as follows:

In a first aspect, a method for establishing a potential user group location model comprises:

obtaining several pieces of user content behavior data, wherein the user content behavior data includes behavior data of different users performing different behavior types on different content;

performing a quantization operation on the user content behavior data using a preset quantization rule, to obtain a corresponding user content rating matrix;

dividing the user content rating matrix into a training sample set and a test sample set according to a preset ratio;

performing model training with the training sample set based on a preset collaborative filtering algorithm, until it is determined that the evaluation index obtained by performing model evaluation with the test sample set meets a specified condition, and then determining the potential user group location model.

Optionally, performing a quantization operation on the user content behavior data using a preset quantization rule to obtain a corresponding user content rating matrix comprises:

based on the user content behavior data, performing the following operations for each content item corresponding to each user:

determining, for each behavior type of the current content, its value under each preset weight factor, wherein the preset weight factors include but are not limited to any one or a combination of the following: behavior count weight, behavior type weight, and time decay weight;

performing the following operation for each behavior type: using a preset user content scoring rule, determining the corresponding user content score based on the value of the behavior count weight, the value of the behavior type weight, and the value of the time decay weight;

summing the user content scores obtained for the individual behavior types to obtain the user content score of the current content;

normalizing the user content scores obtained for each content item of each user, to form the user content rating matrix.

Optionally, the preset user content scoring rule is expressed as:

user content score = behavior count weight * behavior type weight * time decay weight;

and the time decay weight is expressed as:

time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset attenuation amplitude.
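As a minimal sketch of the two expressions above (an illustration, not the patented implementation), the following Python computes the time decay weight and the score contributed by one behavior type. The function names are assumptions, as is the choice of passing the elapsed time t1 - t0 as a single number whose unit is left to the caller; a = 0.6 is used only as an example value.

```python
import math


def time_decay_weight(elapsed: float, a: float) -> float:
    """Return exp(-a * (t1 - t0)), where `elapsed` stands for t1 - t0."""
    if elapsed < 0:
        raise ValueError("the behavior time t0 must not be later than t1")
    return math.exp(-a * elapsed)


def behavior_score(count_weight: float, type_weight: float,
                   decay_weight: float) -> float:
    """Score of one behavior type: count weight * type weight * decay weight."""
    return count_weight * type_weight * decay_weight


# A behavior that has just occurred is not attenuated.
fresh = time_decay_weight(0.0, a=0.6)
# The same behavior one time unit later (the unit itself is an assumption).
older = time_decay_weight(1.0, a=0.6)
```

Because the weight is an exponential in the elapsed time, older behavior always contributes less than recent behavior for any positive a.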

Optionally, performing model training with the training sample set based on a preset collaborative filtering algorithm comprises:

performing matrix decomposition on the training sample set based on the preset collaborative filtering algorithm, to obtain a user feature vector matrix and a content feature vector matrix;

determining a target user content rating matrix based on the user feature vector matrix and the content feature vector matrix.

Optionally, performing model evaluation with the test sample set comprises:

selecting at least one test user from the test sample set, and determining the user content score corresponding to the at least one test user as the actual user content score;

determining, from the target user content rating matrix, the user content score corresponding to the at least one test user as the test user content score;

calculating the standard error between the actual user content score and the corresponding test user content score as the evaluation index.
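The evaluation step can be sketched as follows. The description later names root-mean-square error (RMSE) as one possible standard error, so this sketch uses RMSE; the function names and the upper bound `max_error` used as the "specified condition" are assumptions for illustration.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between actual and predicted scores."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def meets_condition(error: float, max_error: float) -> bool:
    """Assumed form of the specified condition: the error is within a bound."""
    return error <= max_error
```

Here `actual` would hold the scores taken from the test sample set and `predicted` the corresponding entries of the target user content rating matrix.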

In a second aspect, a potential user group positioning method, which uses the potential user group location model obtained by the method of any one of the first aspect, comprises:

using the potential user group location model to determine the user content score of at least one user corresponding to the target content to be positioned;

filtering out, from the at least one user content score, the user content scores whose values meet a preset threshold, and determining the users corresponding to the filtered-out user content scores as the potential user group.

Optionally, the method further comprises:

when it is determined that the number of target users included in the potential user group does not reach an expectation threshold, extracting at least one candidate user from a designated user pool;

using a preset user similarity algorithm to calculate the similarity between each target user in the potential user group and each candidate user of the at least one candidate user;

sorting the at least one candidate user by similarity value, filtering out the top-ranked candidate users needed to meet the expectation threshold, and merging them with the initial potential user group to form a new target user group.

Optionally, calculating, based on the preset user similarity algorithm, the similarity between any one target user in the potential user group and any one candidate user of the at least one candidate user comprises:

determining the weight value of each tag carried by the target user, and determining the weight value of each tag carried by the candidate user, wherein the tag categories of the target user and of the candidate user are identical;

calculating the similarity between the target user and the candidate user using the following formula:

where i denotes the number (index) of a predefined tag; A denotes the target user; B denotes the candidate user; weight A_i denotes the weight value of the i-th tag of the target user; weight B_i denotes the weight value of the i-th tag of the candidate user; and the two mean terms of the formula denote, respectively, the mean of the weight values of the tags carried by the target user and the mean of the weight values of the tags carried by the candidate user.
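The formula itself is not reproduced in this text. The quantities it is described with (per-tag weights of A and B and the mean of each user's weights) are consistent with a Pearson correlation coefficient, so the sketch below assumes that form; this is an assumption, and the function name and the handling of constant weight vectors are likewise illustrative choices rather than part of the source.

```python
import math
from typing import Sequence


def tag_similarity(weights_a: Sequence[float],
                   weights_b: Sequence[float]) -> float:
    """Pearson correlation of two users' tag weights (assumed formula)."""
    if len(weights_a) != len(weights_b) or len(weights_a) == 0:
        raise ValueError("both users must carry the same, non-empty tag set")
    mean_a = sum(weights_a) / len(weights_a)
    mean_b = sum(weights_b) / len(weights_b)
    cov = sum((x - mean_a) * (y - mean_b)
              for x, y in zip(weights_a, weights_b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in weights_a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in weights_b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: with constant weights the correlation is undefined,
        # so the users are treated as unrelated.
        return 0.0
    return cov / (norm_a * norm_b)
```

Under this assumed form the similarity lies between -1 and 1, and candidates whose tag weights rise and fall together with those of a target user rank highest.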

In a third aspect, a device for establishing a potential user group location model comprises:

an acquiring unit, configured to obtain several pieces of user content behavior data, wherein the user content behavior data includes behavior data of different users performing different behavior types on different content;

a quantization unit, configured to perform a quantization operation on the user content behavior data using a preset quantization rule, to obtain a corresponding user content rating matrix;

a dividing unit, configured to divide the user content rating matrix into a training sample set and a test sample set according to a preset ratio;

a training unit, configured to perform model training with the training sample set based on a preset collaborative filtering algorithm, until it is determined that the evaluation index obtained by performing model evaluation with the test sample set meets a specified condition, and then to determine the target user group location model.

In a fourth aspect, a potential user group positioning device, which uses the potential user group location model obtained by the method of any one of the above first aspect, comprises:

a determining unit, configured to use the potential user group location model to determine the user content score of at least one user corresponding to the target content to be positioned;

a screening unit, configured to filter out, from the at least one user content score, the user content scores whose values meet a preset threshold, and to determine the users corresponding to the filtered-out user content scores as the potential user group.

In a fifth aspect, an electronic device comprises: one or more processors; and

one or more computer-readable media storing a program for establishing a potential user group location model, wherein the program, when executed by the one or more processors, implements the steps of the method of any one of the above first aspect.

In a sixth aspect, one or more computer-readable media store a program for establishing a potential user group location model, wherein the program, when executed by one or more processors, causes a communication device to perform the method of any one of the above first aspect.

In a seventh aspect, an electronic device comprises: one or more processors; and

one or more computer-readable media storing a program for potential user group positioning, wherein the program, when executed by the one or more processors, implements the steps of the method of any one of the above second aspect.

In an eighth aspect, one or more computer-readable media store a program for potential user group positioning, wherein the program, when executed by one or more processors, causes a communication device to perform the method of any one of the above second aspect.

In the embodiments of the present invention, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to the target content to be positioned; the user content scores whose values meet a preset threshold are then filtered out from these scores, and the users corresponding to the filtered-out scores are determined as the potential user group. The potential user group location model is established through repeated training on behavior data describing different users performing different behavior types on different content. In this way, by combining the behavior data of different users on different content, the correlation between users and content is scored, and the scores are used to assist in positioning the potential user group, which improves positioning accuracy and further improves user experience.

Brief description of the drawings

Fig. 1 is a flowchart of the training process of the potential user group location model in an embodiment of the present invention;

Fig. 2 is a schematic diagram of how the time decay weight changes in an embodiment of the present invention;

Fig. 3 is a flowchart of the first potential user group positioning method in an embodiment of the present invention;

Fig. 4 is a flowchart of the second potential user group positioning method in an embodiment of the present invention;

Fig. 5 is a schematic diagram of the second potential user group positioning in an embodiment of the present invention;

Fig. 6 is a schematic diagram of the potential user group positioning process in an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of the device for establishing the potential user group location model in an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the target user positioning device in an embodiment of the present invention.

Detailed description of the embodiments

To improve the accuracy of existing potential user group positioning methods, the embodiments of the present invention provide a newly designed potential user group positioning method. In this method, an established potential user group location model is used to determine the user content score of at least one user corresponding to the target content to be positioned; the user content scores whose values meet a preset threshold are then filtered out from the at least one user content score, and the users corresponding to the filtered-out scores are determined as the target user group.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The solutions of the present invention are described in detail below through specific embodiments; the present invention is of course not limited to the following embodiments.

Embodiment one

Referring to Fig. 1, in the embodiments of the present invention, before potential user group positioning is carried out, an initial target user group location model must first be established and trained; only when the training result meets a specified condition is the target user group location model finally determined. The detailed model training process is as follows:

Step 100: obtain several pieces of user content behavior data, wherein the user content behavior data includes behavior data of different users performing different behavior types on different content.

Specifically, several pieces of user content behavior data are obtained first. This embodiment analyzes which behavior types a user performed on which content and when, in order to determine which content the user is interested in and thus position the potential user group for the target content to be positioned. Because user behavior logs usually record the behavior data of users performing different behavior types on different content, in the embodiments of the present invention the user content behavior data is extracted from user behavior logs.

For example, assume the content is e-books, and the predefined behavior types that can be performed on an e-book fall into four classes: "online reading", "viewing the introduction", "collecting" and "downloading". Assume the statistical period is one month; see Table 1.

Table 1

Step 110: using a preset quantization rule, perform a quantization operation on the obtained user content behavior data to obtain a corresponding user content rating matrix.

Specifically, once the user content behavior data has been determined, it is not yet in the form of concrete numerical values; therefore, a preset quantization rule is used to perform a quantization operation on the obtained user content behavior data, to obtain the corresponding user content rating matrix.

Further, first, based on the obtained user content behavior data, the following operation is performed for each content item corresponding to each user: the value of each behavior type of the current content under each preset weight factor is determined, wherein the preset weight factors include but are not limited to any one or a combination of the following: behavior count weight, behavior type weight, and time decay weight.

In the embodiments of the present invention, the behavior count weight characterizes the number of times a user performed one behavior type on a content item;

the behavior type weight is a coefficient set in advance for a behavior type. For example, assume the behavior types set for "e-book" content are divided into the four classes "online reading", "viewing the introduction", "collecting" and "downloading", and the behavior type weights of one content item sum to 10. In general, the interest characterized by "downloading" is higher than that of "collecting", which is higher than that of "online reading", which is higher than that of "viewing the introduction"; accordingly, the behavior type weight of "online reading" may be set to "2", that of "viewing the introduction" to "1", that of "collecting" to "3", and that of "downloading" to "4";

the time decay weight characterizes the influence of historical behavior data on the user's current interest; that is, the older the behavior data, the less likely it is to characterize the user's current interest. In this embodiment, the time decay weight is calculated with the following formula: time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset attenuation amplitude, which is an empirical value known in advance.

For example, assume a is 0.6 and the time decay weight decays once every 10 days; as shown in Fig. 2, the time decay weight becomes smaller and smaller as the number of days increases.

Second, after the value of each behavior type under each preset weight factor has been determined for each content item of each user, the following operation is performed for each behavior type: using the preset user content scoring rule, the corresponding user content score is determined based on the value of the behavior count weight, the value of the behavior type weight, and the value of the time decay weight, wherein the user content scoring rule is expressed as: user content score = behavior count weight * behavior type weight * time decay weight.

Then, after the user content score of each behavior type has been determined for each content item of each user, the following operation is performed for each content item of each user: the user content scores obtained for the individual behavior types are summed to obtain the user content score of the current content.

Finally, after the user content score of each content item of each user has been determined, the user content scores obtained for the content items of each user are normalized, and the normalized user content scores of each user are used as elements to determine the user content rating matrix.

Further, the normalization of user content scores is performed with one user as one execution unit. For example, if user "UID1" corresponds to content "CID1", "CID2" and "CID3", assume the user content score of user "UID1" for content "CID1" is "a", for content "CID2" is "b", and for content "CID3" is "c". After the normalization operation, the user content score for content "CID1" is a / sqrt(a^2 + b^2 + c^2), the user content score for content "CID2" is b / sqrt(a^2 + b^2 + c^2), and the user content score for content "CID3" is c / sqrt(a^2 + b^2 + c^2).

Further, if the users are assumed to be UID1 to UIDm and the content items CID1 to CIDn, the finally obtained user content rating matrix is as shown in Table 2.

Table 2

	CID1	CID2	CID3	……	CIDn
UID1
UID2
UID3
……
UIDm

For example, continuing with the example in Table 1, assume the current time point is 20161230. First, user "13452304628" and user "13628526456" are identified, where user "13452304628" is associated with content "88641" and content "88642", and user "13628526456" is associated with content "88642".

Based on the user content scoring rule, user content score = behavior count weight * behavior type weight * time decay weight, the user content scores of content "88641" and content "88642" for user "13452304628" are calculated as follows:

Score (cid=88641)=2*2*0.3+1*1*0.3=1.5, where, for content "88641" of user "13452304628", the behavior type weight of "online reading" is "2" (the example value given above), the corresponding behavior count weight is "2", and the corresponding behavior time point is "20161215", which is 15 days before the current time point; from the time decay weight chart (also called the time decay coefficient chart) provided in Fig. 2, the time decay weight is determined to be "0.3". The behavior type weight of "viewing the introduction" is "1" (the example value given above), the corresponding behavior count weight is "1", and the corresponding behavior time point is "20161215", which is 15 days before the current time point; from the chart provided in Fig. 2, the time decay weight is again determined to be "0.3".

Score (cid=88642)=2*1*0.5=1.5, where, for content "88642" of user "13452304628", the behavior type weight of "viewing the introduction" is "1", and the corresponding behavior time point is "20161220", which is 10 days before the current time point; from the time decay weight chart provided in Fig. 2, the time decay weight is determined to be "0.5".

The user content score of content "88642" for user "13628526456" is calculated as follows:

Score (cid=88642)=1*3*0.5=1.5, where, for content "88642" of user "13628526456", the behavior type weight of "collecting" is "3", and the corresponding behavior time point is "20161220", which is 10 days before the current time point; from the time decay weight chart provided in Fig. 2, the time decay weight is determined to be "0.5".
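The summation over behavior types can be sketched in a few lines of Python. The tuple layout and the function name are assumptions made for illustration; the numbers are the count, type and decay weights stated above for content "88641" of user "13452304628" and for content "88642" of user "13628526456".

```python
from typing import Iterable, Tuple

# (behavior count weight, behavior type weight, time decay weight)
Behavior = Tuple[float, float, float]


def content_score(behaviors: Iterable[Behavior]) -> float:
    """Sum the per-behavior-type scores of one user for one content item."""
    return sum(count * type_w * decay for count, type_w, decay in behaviors)


# "online reading" twice and "viewing the introduction" once, both 15 days ago
score_88641 = content_score([(2, 2, 0.3), (1, 1, 0.3)])
# "collecting" once, 10 days ago
score_88642 = content_score([(1, 3, 0.5)])
```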

Further, after the normalization operation is performed on the above user content scores, the normalized user content scores are used as matrix elements, giving the user content rating matrix shown in Table 3:

Table 3

	CID1	CID2
UID1	0.7	0.7
UID2	0	1
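The per-user normalization can be sketched as below. The sketch assumes that each user's scores are divided by the square root of the sum of their squares; that choice is an assumption made here because it reproduces the values in Table 3 from the raw scores of 1.5, and the function name is illustrative.

```python
import math
from typing import Dict


def normalize_user_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one user's scores so that their squares sum to 1 (assumed rule)."""
    norm = math.sqrt(sum(value * value for value in scores.values()))
    if norm == 0:
        # A user with no scored behavior keeps all-zero entries.
        return {content: 0.0 for content in scores}
    return {content: value / norm for content, value in scores.items()}


uid1 = normalize_user_scores({"CID1": 1.5, "CID2": 1.5})
uid2 = normalize_user_scores({"CID1": 0.0, "CID2": 1.5})
```

Rounded to one decimal place, `uid1` gives 0.7 for both content items and `uid2` gives 0 and 1, matching the rows of Table 3.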

Step 120: according to a preset ratio, divide the user content rating matrix into a training sample set and a test sample set.

Specifically, the obtained user content rating matrix is sorted out and, according to a preset ratio, divided into a training sample set and a test sample set. For example, with a test-to-training ratio of "1:9", 90% of the user rating matrix is determined as the training sample set and the remaining 10% as the test sample set.
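One way to realize such a split is sketched below. The text does not say how the matrix is partitioned, so splitting the individual (user, content, score) entries at random with a fixed seed is an assumption, as are the function and parameter names.

```python
import random
from typing import List, Tuple

Rating = Tuple[str, str, float]  # (user id, content id, normalized score)


def split_ratings(ratings: List[Rating], test_ratio: float = 0.1,
                  seed: int = 0) -> Tuple[List[Rating], List[Rating]]:
    """Randomly split rating entries into (training set, test set)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must lie strictly between 0 and 1")
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


ratings = [(f"UID{u}", f"CID{c}", 0.5)
           for u in range(1, 6) for c in range(1, 5)]
train, test = split_ratings(ratings)
```

With 20 entries and the default ratio, 18 entries go to the training set and 2 to the test set.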

Step 130: based on a preset collaborative filtering algorithm, perform model training with the training sample set.

Specifically, based on the preset collaborative filtering algorithm, matrix decomposition is performed on the training sample set to obtain a user feature vector matrix and a content feature vector matrix, and the target user content rating matrix is determined based on the obtained user feature vector matrix and content feature vector matrix, wherein the user feature vector matrix is the matrix of users' preferences for the hidden features of content, and the content feature vector matrix is the matrix of the hidden features that the content contains.

Preferably, in the embodiments of the present invention, the collaborative filtering algorithm may be alternating least squares with weighted-λ regularization (Alternating-Least-Squares with Weighted-λ-Regularization, ALS-WR).

In the embodiments of the present invention, ALS-WR is used to decompose the user content rating matrix of the training sample set for the following reason: some users give no explicit preference feedback on certain content and therefore have no corresponding score, so the initial user content rating matrix has a large number of missing score entries and suffers from matrix sparsity. Matrix decomposition yields two low-dimensional matrices (the user feature vector matrix and the content feature vector matrix); multiplying the user feature vector matrix by the content feature vector matrix in the low-dimensional space then yields the target user content rating matrix. In this process, the missing score entries of the initial user content rating matrix are filled in, which resolves the matrix sparsity problem.
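The decomposition can be illustrated with a small pure-Python sketch of ALS-WR. It alternately solves a regularized least-squares problem for each user vector and each content vector, weighting λ by the number of observed scores, and then multiplies the two factor matrices to fill in the missing entries. The data, the rank k, λ, the iteration count and all names are assumptions chosen for illustration; a production system would normally rely on an existing library.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b for a small square system (Gauss-Jordan, pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed: Matrix, observed: Dict[int, float], k: int,
           lam: float) -> Vector:
    """Least-squares update of one vector with lambda * n_observed weighting."""
    a = [[lam * len(observed) if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    b = [0.0] * k
    for idx, rating in observed.items():
        f = fixed[idx]
        for i in range(k):
            b[i] += rating * f[i]
            for j in range(k):
                a[i][j] += f[i] * f[j]
    return solve(a, b)


def als_wr(ratings: Dict[Tuple[int, int], float], n_users: int, n_items: int,
           k: int = 2, lam: float = 0.01, iters: int = 20,
           seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Return (user feature vectors, content feature vectors)."""
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    items = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user: List[Dict[int, float]] = [{} for _ in range(n_users)]
    by_item: List[Dict[int, float]] = [{} for _ in range(n_items)]
    for (u, c), r in ratings.items():
        by_user[u][c] = r
        by_item[c][u] = r
    for _ in range(iters):
        for u in range(n_users):          # fix content vectors, update users
            if by_user[u]:
                users[u] = update(items, by_user[u], k, lam)
        for c in range(n_items):          # fix user vectors, update content
            if by_item[c]:
                items[c] = update(users, by_item[c], k, lam)
    return users, items


def predict(users: Matrix, items: Matrix) -> Matrix:
    """Multiply the factor matrices to obtain the filled-in rating matrix."""
    return [[sum(p * q for p, q in zip(u, c)) for c in items] for u in users]


# Sparse example: user 1 has scored only content 1.
observed = {(0, 0): 0.7, (0, 1): 0.7, (0, 2): 0.3,
            (1, 1): 1.0,
            (2, 0): 0.6, (2, 1): 0.8, (2, 2): 0.4}
user_vecs, item_vecs = als_wr(observed, n_users=3, n_items=3)
filled = predict(user_vecs, item_vecs)
```

In `filled`, the entries for user 1 and content 0 and 2, which were missing from `observed`, now carry predicted scores; this is the filling-in effect described above.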

At this point, the first round of training of the potential user group location model is complete. Next, at least one test user is selected from the test sample set, and the user content score corresponding to the test user is determined as the actual user content score. Then, the user content score corresponding to the test user is determined from the target user content rating matrix as the test user content score. Finally, the standard error between the actual user content score and the corresponding test user content score is calculated as the evaluation index, and it is judged whether this evaluation index meets the specified condition, wherein the standard error may be the root-mean-square error (Root Mean Square Error, RMSE) and the specified condition may be that the RMSE lies within a certain range.

If the evaluation index of the first round of training does not meet the specified condition, model training continues with the training sample set. In the embodiments of the present invention, after every round of training, the trained potential user group location model is tested with the corresponding test sample set, until it is determined that the evaluation index obtained by performing model evaluation with the test sample set meets the specified condition; only then is the potential user group location model finally determined. In this way, preferred content can subsequently be recommended to users based on the target user content rating matrix in the target user group location model, or potential user groups can be mined based on content.

Embodiment two

As shown in fig.3, in the embodiment of the present invention, after final determining potential user group location model, using above-mentioned model The method flow for carrying out potential user group positioning is as follows:

Step 300: determining object content to be positioned.

Specifically, being based on the potential user group of content mining, therefore, it is first determined to be positioned in the embodiment of the present invention Object content, for example, by taking music as an example, e.g., " American-European rock and roll style of song song ".

Step 310: use potential user group location model, obtain above-mentioned object content to be positioned it is corresponding at least one The user content of user scores.

Specifically, using the target in established potential user group location model after determining object content to be positioned User content rating matrix determines in target user's content scores matrix, the user of the corresponding each user of the object content Content scores.

For example, it is assumed that " American-European rock and roll style of song song " corresponding each user has " user 1, user 2, user 3,4 and of user User 5 ", wherein " user 1 " is " 0.6 " to the user content scoring of " American-European rock and roll style of song song ", and " user 2 " are to " America and Europe shakes The user content scoring of rolling style of song song " is " 0.1 ", and " user 3 " are to the user content scoring of " American-European rock and roll style of song song " " 0.3 ", " user 4 " are " 0.8 " to the user content scoring of " American-European rock and roll style of song song ", and " user 5 " are to " American-European rock and roll style of song The user content scoring of song " is " 0.5 ".

Step 320: From the at least one user content score, screen out the user content scores whose values meet the preset threshold, and determine the users corresponding to the screened-out user content scores as the potential user group.

Specifically, the user content scores that meet the preset threshold are screened out from the determined user content scores, and the users corresponding to the screened-out scores are determined as the potential user group.

Continuing the above example, if the preset threshold is 0.5, the users whose user content scores meet the threshold are user 1, user 4 and user 5; that is, user 1, user 4 and user 5 form the potential user group.
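Steps 310 and 320 can be sketched with the example values above. The function name `select_potential_users` is hypothetical, and "meets the preset threshold" is read as "greater than or equal to", which is the reading under which user 5 (score 0.5) is selected in the example.

```python
def select_potential_users(scores, threshold):
    """Return the users whose user content score is at least `threshold`,
    in the order in which they appear in `scores`."""
    return [user for user, score in scores.items() if score >= threshold]

# User content scores for the target content, taken from the example above.
scores = {"user 1": 0.6, "user 2": 0.1, "user 3": 0.3, "user 4": 0.8, "user 5": 0.5}
group = select_potential_users(scores, 0.5)
print(group)  # ['user 1', 'user 4', 'user 5']
```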

With the above method of potential user group positioning based on the potential user group location model, a target user is included in the potential user group only if that user's content score meets the preset threshold, so the accuracy is relatively high. In practice, however, the number of high-scoring target users is limited, so more potentially valuable users need to be mined in order to widen the marketing scope.

Embodiment three

Referring to Fig. 4, when it is determined, based on the first potential user group positioning method, that the number of target users in the selected potential user group does not reach the expected threshold, the embodiment of the present invention further provides a potential user group positioning method. Building on the first positioning method, this method mines users with high similarity to the target users in the initial potential user group and thereby expands the potential user group. The detailed process is as follows:

Step 400: Extract at least one candidate user from a designated user pool.

Specifically, at least one candidate user is extracted from the designated user pool. The designated user pool stores a large amount of user information, which may come from user behavior logs.

The embodiment of the present invention aims to mine users with high similarity to the target users in the potential user group. Therefore, to reduce the amount of computation, the candidate users extracted in this step may be users who are not target users.

Step 410: Using a preset user similarity algorithm, calculate the similarity between each target user in the potential user group and each of the extracted at least one candidate user.

Specifically, the preset user similarity algorithm is used to calculate the similarity between each target user in the potential user group and each extracted candidate user.

Taking one target user in the potential user group and one candidate user as an example, first determine the weight of each label carried by the target user and the weight of each label carried by the candidate user, where the target user and the candidate user have the same label types. In the embodiment of the present invention, labels are set in advance for target users and candidate users alike.

For example, assume that the content is music and that the target user and the candidate user each have 71 labels, with the weight of each label shown in Table 4 and Table 5, respectively.

Table 4

	Label 1 (Chinese)	Label 2 (English)	Label 3 (Japanese)	…	Label 71 (born after 2000)
Weight	0.16	0.23	0.34	…	0.1

Table 5

	Label 1 (Chinese)	Label 2 (English)	Label 3 (Japanese)	…	Label 71 (born after 2000)
Weight	0.15	0.13	0.24	…	0.34

Preferably, in the embodiment of the present invention, the similarity between the target user and the candidate user is calculated by the following formula:

Where i denotes the number (index) of a predefined label; A denotes any one target user; B denotes any one candidate user; $weight_{A_i}$ denotes the weight of the i-th label of target user A; $weight_{B_i}$ denotes the weight of the i-th label of candidate user B; $\overline{weight_A}$ denotes the mean of the weights of the labels carried by target user A; and $\overline{weight_B}$ denotes the mean of the weights of the labels carried by candidate user B.
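The formula itself is not reproduced in this text. The quantities defined for it (per-label weights of A and B and the means of those weights) are exactly the ingredients of a Pearson correlation coefficient, so one plausible reading, offered as an assumption rather than as the confirmed formula, is the following, where $n$ (a symbol introduced here) is the number of predefined labels, 71 in the example, and $sim(A, B)$ is a name chosen here for the similarity:

```latex
sim(A, B) =
\frac{\sum_{i=1}^{n} \left(weight_{A_i} - \overline{weight_A}\right)\left(weight_{B_i} - \overline{weight_B}\right)}
     {\sqrt{\sum_{i=1}^{n} \left(weight_{A_i} - \overline{weight_A}\right)^{2}}\;
      \sqrt{\sum_{i=1}^{n} \left(weight_{B_i} - \overline{weight_B}\right)^{2}}}
```

Under this reading the similarity lies in the range from -1 to 1, and a larger value indicates that the two users distribute their label weights in a more similar way.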

Step 420: Sort the at least one candidate user by similarity value, screen out the candidate users that rank highest by similarity value in the number required to meet the expected threshold, and merge them with the initial potential user group to form a new potential user group.

Specifically, after the similarity between each extracted candidate user and each target user in the initial potential user group is determined, the candidate users can be sorted by similarity value. The candidate users that rank highest by similarity value, in the number required to meet the expected threshold, are then screened out and merged with the initial potential user group to form a new potential user group.

For example, assume that the initial potential user group contains 1,000 target users and that the expected threshold is 10,000. Using the above method, the 9,000 candidate users that rank highest by similarity value are screened out and combined with the determined initial potential user group to form a new potential user group.
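Steps 400 to 420 can be sketched as follows. Several points are assumptions made for illustration: the similarity is computed as a Pearson correlation over label weights, a candidate's rank is based on its highest similarity to any target user (the text does not say how per-target similarities are combined), and the names `pearson` and `expand_group` are hypothetical. Only the four weights visible in Table 4 and Table 5 are used for "t1" and "c1"; "c2" and "c3" are made-up candidates.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length label-weight vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return cov / (norm_a * norm_b)

def expand_group(targets, candidates, desired_size):
    """Append the most similar candidates until the group reaches `desired_size`.
    `targets` and `candidates` map user names to label-weight vectors and are
    assumed to be disjoint (candidates are non-target users)."""
    needed = max(0, desired_size - len(targets))
    ranked = sorted(
        candidates,
        key=lambda name: max(pearson(candidates[name], t) for t in targets.values()),
        reverse=True,
    )
    return list(targets) + ranked[:needed]

targets = {"t1": [0.16, 0.23, 0.34, 0.10]}   # visible weights from Table 4
candidates = {
    "c1": [0.15, 0.13, 0.24, 0.34],          # visible weights from Table 5
    "c2": [0.32, 0.46, 0.68, 0.20],          # hypothetical: proportional to t1
    "c3": [0.34, 0.24, 0.13, 0.15],          # hypothetical
}
new_group = expand_group(targets, candidates, desired_size=3)
print(new_group)  # ['t1', 'c2', 'c1']
```

With 1,000 target users and an expected threshold of 10,000, as in the example above, `needed` would be 9,000.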

Referring to Fig. 5, in this embodiment, a new potential user group is obtained after user expansion is performed on the initial potential user group.

The above three embodiments of the present invention are described in further detail below with reference to a specific implementation scenario. Referring to Fig. 6, in the embodiment of the present invention, the potential user group location model is trained on the basis of user behavior logs. After the potential user group location model is determined, the target user content score matrix determined by the model is used to recommend preferred content based on users and to recommend potential user groups based on content. To address the shortfall in the number of potential users in a content-based potential user group, users similar to the target users are further mined to perform user expansion.

Based on Embodiment 1, and referring to Fig. 7, in the embodiment of the present invention the apparatus for establishing a potential user group location model includes at least an acquiring unit 71, a quantifying unit 72, a dividing unit 73 and a training unit 74, wherein:

The acquiring unit 71 is configured to obtain several pieces of user content behavior data, where the user content behavior data include behavior data of different users performing different behavior types on different content;

The quantifying unit 72 is configured to perform a quantization operation on the several pieces of user content behavior data using a preset quantization rule, to obtain a corresponding user content score matrix;

The dividing unit 73 is configured to divide the user content score matrix into a training sample set and a test sample set according to a preset ratio;

The training unit 74 is configured to perform model training on the training sample set based on a preset collaborative filtering algorithm until the evaluation index obtained from model evaluation on the test sample set is determined to meet the specified condition, and to determine the potential user group location model.

Optionally, when performing the quantization operation on the several pieces of user content behavior data using the preset quantization rule to obtain the corresponding user content score matrix, the quantifying unit is configured to:

Optionally, the preset user content scoring rule is expressed as:

The time decay weight is expressed as: time decay weight = exp(-a * (t1 - t0)), where t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude.
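As a small illustration of the time decay weight, the expression given in claim 3, exp(-a * (t1 - t0)), can be evaluated as below. The function name `time_decay_weight`, the unit of time (days) and the value a = 0.1 are assumptions made for the example; the source only states that a is a preset decay amplitude.

```python
import math

def time_decay_weight(t1, t0, a):
    """Time decay weight exp(-a * (t1 - t0)).
    t1: current time point; t0: time point at which the behavior occurred;
    a: preset decay amplitude. Both times must use the same unit."""
    return math.exp(-a * (t1 - t0))

# Hypothetical values: times in days, a = 0.1.
w_today = time_decay_weight(30, 30, 0.1)   # behavior happened just now
w_week = time_decay_weight(30, 23, 0.1)    # behavior happened 7 days ago
w_month = time_decay_weight(30, 0, 0.1)    # behavior happened 30 days ago
print(w_today, w_week, w_month)
```

A behavior that has just occurred receives a weight of 1, and the weight shrinks toward 0 as the behavior ages, so recent behavior contributes more to the user content score.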

Optionally, when performing model training on the training sample set based on the preset collaborative filtering algorithm, the training unit 74 is configured to:

Optionally, when performing model evaluation on the test sample set, the training unit is configured to:

Based on Embodiment 2 and Embodiment 3, and referring to Fig. 8, in the embodiment of the present invention the potential user group positioning apparatus includes at least a determining unit 81 and a screening unit 82, wherein:

The determining unit 81 is configured to determine, using the potential user group location model, the user content score of at least one user corresponding to the target content to be positioned;

The screening unit 82 is configured to screen out, from the at least one user content score, the user content scores whose values meet the preset threshold, and to determine the users corresponding to the screened-out user content scores as the potential user group.

Optionally, the apparatus further includes an expanding unit 83, and the expanding unit 83 is configured to:

Optionally, when calculating, based on the preset user similarity algorithm, the similarity between any one target user in the potential user group and any one candidate user of the at least one candidate user, the expanding unit 83 is configured to:

In summary, in the embodiment of the present invention, a pre-established potential user group location model is used to obtain the user content score of at least one user corresponding to the target content to be positioned. The user content scores whose values meet the preset threshold are then screened out, and the users corresponding to the screened-out user content scores are determined as the potential user group. The potential user group location model is established through multiple rounds of training on behavior data of different users performing different behavior types on different content. In this way, the behavior data of different users on different content are combined to score the correlation between users and content, and the scores are used to assist potential user group positioning, which improves the accuracy of positioning and further improves the user experience.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the embodiments of the present invention without departing from the spirit and scope of the embodiments of the present invention. Thus, if these modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A method for establishing a potential user group location model, characterized by comprising:

Obtaining several pieces of user content behavior data, wherein the several pieces of user content behavior data include behavior data of different users performing different behavior types on different content;

Performing a quantization operation on the several pieces of user content behavior data using a preset quantization rule, to obtain a corresponding user content score matrix;

Based on a preset collaborative filtering algorithm, performing model training on the training sample set until the evaluation index obtained from model evaluation on the test sample set is determined to meet a specified condition, and determining the potential user group location model.

2. The method according to claim 1, characterized in that performing a quantization operation on the several pieces of user content behavior data using a preset quantization rule, to obtain a corresponding user content score matrix, comprises:

Based on the several pieces of user content behavior data, performing the following operations for each content item corresponding to each user:

Determining, for each behavior type on the current content, its value for each preset weight factor, wherein the preset weight factors include but are not limited to any one or a combination of the following: a behavior count weight, a behavior type weight and a time decay weight;

Performing the following operation for each behavior type: using a preset user content scoring rule, determining the corresponding user content score based on the value of the behavior count weight, the value of the behavior type weight and the value of the time decay weight;

Summing the user content scores obtained for the behavior types, to obtain the user content score corresponding to the current content;

Normalizing the obtained user content scores corresponding to each content item of each user, to form the user content score matrix.

3. The method according to claim 2, characterized in that the preset user content scoring rule is expressed as:

The time decay weight is expressed as:

Time decay weight = exp(-a * (t1 - t0)), wherein t1 is the current time point, t0 is the time point at which the behavior occurred, and a is a preset decay amplitude.

4. The method according to claim 1, 2 or 3, characterized in that performing model training on the training sample set based on a preset collaborative filtering algorithm comprises:

Based on the preset collaborative filtering algorithm, performing matrix decomposition on the training sample set, to obtain a user feature vector matrix and a content feature vector matrix;

Determining a target user content score matrix based on the user feature vector matrix and the content feature vector matrix.

5. The method according to claim 4, characterized in that performing model evaluation on the test sample set comprises:

Selecting at least one test user from the test sample set, and determining the user content scores corresponding to the at least one test user as actual user content scores;

Determining, from the target user content score matrix, the user content scores corresponding to the at least one test user as test user content scores;

Calculating the standard error between the actual user content scores and the corresponding test user content scores as the evaluation index.

6. A potential user group positioning method, characterized by using a potential user group location model obtained by the method according to any one of claims 1 to 5, and comprising:

Using the potential user group location model, determining the user content score of at least one user corresponding to the target content to be positioned;

Screening out, from the at least one user content score, the user content scores whose values meet a preset threshold, and determining the users corresponding to the screened-out user content scores as the potential user group.

7. The method according to claim 6, characterized by further comprising:

When it is determined that the number of target users in the potential user group does not reach an expected threshold, extracting at least one candidate user from a designated user pool;

Using a preset user similarity algorithm, calculating the similarity between each target user in the potential user group and each candidate user of the at least one candidate user;

Sorting the at least one candidate user by similarity value, screening out the candidate users that rank highest by similarity value in the number required to meet the expected threshold, and merging them with the initial potential user group to form a new potential user group.

8. The method according to claim 7, characterized in that calculating, based on a preset user similarity algorithm, the similarity between any one target user in the potential user group and any one candidate user of the at least one candidate user comprises:

Determining the weight of each label carried by the target user and the weight of each label carried by the candidate user, wherein the target user and the candidate user have the same label types;

Calculating the similarity between the target user and the candidate user using the following formula:

Wherein i denotes the number (index) of a predefined label; A denotes any one target user; B denotes any one candidate user; $weight_{A_i}$ denotes the weight of the i-th label of target user A; $weight_{B_i}$ denotes the weight of the i-th label of candidate user B; $\overline{weight_A}$ denotes the mean of the weights of the labels carried by target user A; and $\overline{weight_B}$ denotes the mean of the weights of the labels carried by candidate user B.

9. An apparatus for establishing a potential user group location model, characterized by comprising:

An acquiring unit, configured to obtain several pieces of user content behavior data, wherein the several pieces of user content behavior data include behavior data of different users performing different behavior types on different content;

A quantifying unit, configured to perform a quantization operation on the several pieces of user content behavior data using a preset quantization rule, to obtain a corresponding user content score matrix;

A dividing unit, configured to divide the user content score matrix into a training sample set and a test sample set according to a preset ratio;

A training unit, configured to perform model training on the training sample set based on a preset collaborative filtering algorithm until the evaluation index obtained from model evaluation on the test sample set is determined to meet a specified condition, and to determine the potential user group location model.

10. A potential user group positioning apparatus, characterized by using a potential user group location model obtained by the method according to any one of claims 1 to 5, and comprising:

A determining unit, configured to determine, using the potential user group location model, the user content score of at least one user corresponding to the target content to be positioned;

A screening unit, configured to screen out, from the at least one user content score, the user content scores whose values meet a preset threshold, and to determine the users corresponding to the screened-out user content scores as the potential user group.

11. An electronic device, characterized by comprising: one or more processors; and

One or more computer-readable media storing a program for establishing a potential user group location model, wherein the program, when executed by the one or more processors, implements the steps of the method according to any one of claims 1 to 5.

12. One or more computer-readable media, characterized in that the readable media store a program for establishing a potential user group location model, wherein the program, when executed by one or more processors, causes a communication device to perform the method according to any one of claims 1 to 5.

13. An electronic device, characterized by comprising: one or more processors; and

One or more computer-readable media storing a program for potential user group positioning, wherein the program, when executed by the one or more processors, implements the steps of the method according to any one of claims 6 to 8.

14. One or more computer-readable media, characterized in that the readable media store a program for potential user group positioning, wherein the program, when executed by one or more processors, causes a communication device to perform the method according to any one of claims 6 to 8.