CN110377821A - Method, apparatus, computer device, and storage medium for generating interest tags - Google Patents

Method, apparatus, computer device, and storage medium for generating interest tags

Info

Publication number
CN110377821A
Authority
CN
China
Prior art keywords
user
application
sample
preference value
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910525807.XA
Other languages
Chinese (zh)
Inventor
苏显政
蔡健
郭凌峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910525807.XA
Publication of CN110377821A
Priority to PCT/CN2020/086369 (WO2020253369A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the technical field of user profiling and provides a method, an apparatus, a computer device, and a storage medium for generating interest tags. The method includes: obtaining a user usage record set of application programs within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier; determining a classification threshold for each application type according to the preference values of the application identifiers under that application type; screening, according to the classification threshold, the user usage data set of each application type determined from the user usage record set, so as to select user identifiers; and determining the interest tag of each selected user identifier according to the application type of the user usage data set in which that user identifier is located. The method can improve the accuracy of the interest tags generated for each behavior type.

Description

Method, apparatus, computer device, and storage medium for generating interest tags
Technical field
This application relates to the technical field of information processing, and in particular to a method, an apparatus, a computer device, and a storage medium for generating interest tags.
Background
With the development and adoption of the Internet, differentiated services such as personalized recommendation and diversified marketing are widely used in daily life, and these services depend on user profiles. The core task of user profiling is generating tags for users. Tagging users makes it possible to analyze and predict user behavior from a macro perspective, which helps an enterprise target its marketing at specific users more precisely.
At present, most tag generation methods for user profiles generate user tags by keyword extraction. However, the tags generated this way have low accuracy.
Summary of the invention
In view of the above technical problem, it is necessary to provide a method, an apparatus, a computer device, and a storage medium for generating interest tags.
A method for generating interest tags, the method comprising:
Obtaining a user usage record set of application programs within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier, where each user usage record in the user usage record set includes a user identifier and an application identifier;
Determining an application type based on each application identifier, and determining a classification threshold for each application type according to the preference values of the application identifiers under the same application type, where each application type has a corresponding preset interest tag;
Screening, according to the classification threshold, the user usage data set of each application type determined from the user usage record set, so as to select user identifiers;
Determining the interest tag of each selected user identifier according to the preset interest tag of the application type of the user usage data set in which that user identifier is located.
In one embodiment, each user usage record in the user usage record set further includes a usage weight, and calculating the preference value of each application identifier with respect to each user identifier according to the user usage record set of application programs within the specified time period includes:
Obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set;
Obtaining the usage weight corresponding to the user identifier and the application identifier;
Calculating the preference value of each application identifier with respect to each user identifier according to the ratio of the total number of users to the number of users, and according to the usage weight.
In one embodiment, determining the classification threshold of each application type according to the preference values of the application identifiers under the same application type includes:
Sorting the preference values under the same application type in ascending order to obtain a sorting result of the preference values;
Calculating, according to the sorting result, the quantile of each preference value under the same application type;
Determining the classification threshold of each application type according to the quantiles.
In one embodiment, calculating the quantile of each preference value under each application type according to the sorting result of the preference values includes:
Determining, according to the sorting result, the occurrence probability of each preference value under each application type in the corresponding sorting result; and determining the cumulative probability of each preference value under each application type from the occurrence probabilities, which gives the quantile of each preference value under each application type. Or,
In one embodiment, calculating the quantile of each preference value under each application type according to the sorting result of the preference values includes:
Obtaining the position of each preference value under each application type in its sorting result, and the number of sorted users of the application type to which each application identifier belongs; and dividing the position of each preference value in its sorting result by the number of sorted users, which gives the quantile of each preference value under each application type.
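The rank-based variant described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `quantiles_by_rank`, the use of 1-based positions, and the handling of ties (each tied entry simply keeps its own position) are not specified in the source.

```python
def quantiles_by_rank(preferences):
    """Map each user to (position in ascending sort) / (number of sorted users).

    `preferences` is a hypothetical dict of user identifier -> preference value
    for ONE application type. Positions are 1-based (an assumption), so the
    largest preference value always receives quantile 1.0.
    """
    ordered = sorted(preferences.items(), key=lambda item: item[1])
    sorted_user_count = len(ordered)
    return {
        user: position / sorted_user_count
        for position, (user, _value) in enumerate(ordered, start=1)
    }
```

With four users, the user holding the smallest preference value receives quantile 1/4 and the user holding the largest receives 1.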
In one embodiment, determining the classification threshold of each application type according to the quantiles includes:
Selecting, for each application type, the quantiles that are greater than or equal to a corresponding preset threshold;
Calculating, for each application type, the differences between adjacent quantiles among the selected quantiles;
Taking, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
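The three steps above (select quantiles at or above a preset threshold, take differences of adjacent quantiles, pick the quantile at the largest difference) might look like the following Python sketch. Several details are assumptions, because the source does not fix them: the name `classification_threshold`, the default preset value of 0.5, the choice of the upper quantile of the widest gap, and the fallback when fewer than two quantiles survive the selection.

```python
def classification_threshold(quantiles, preset=0.5):
    """Return a classification threshold for one application type.

    `quantiles` is an iterable of quantile values in (0, 1].
    `preset` is a hypothetical preset threshold; the source gives no value.
    """
    kept = sorted(q for q in quantiles if q >= preset)
    if not kept:
        return None          # nothing to base a threshold on (assumed fallback)
    if len(kept) == 1:
        return kept[0]       # no adjacent pair exists (assumed fallback)
    # Pair each gap between adjacent quantiles with the upper quantile of the pair.
    gaps = [(kept[i + 1] - kept[i], kept[i + 1]) for i in range(len(kept) - 1)]
    # Assumption: the "quantile corresponding to the largest difference" is the upper one.
    _largest_gap, threshold = max(gaps, key=lambda gap: gap[0])
    return threshold
```

Choosing the upper quantile makes the threshold sit just above the widest jump in the distribution; taking the lower quantile instead would be an equally plausible reading of the text.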
In one embodiment, screening, according to the classification threshold, the user usage data set of each application type determined from the user usage record set, so as to select user identifiers, includes:
Obtaining a user usage record sample set with known interest tags;
Adjusting the classification threshold according to the user usage record sample set;
Screening the user usage data set according to the adjusted classification threshold, so as to select user identifiers.
In one embodiment, each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight;
Adjusting the classification threshold according to the user usage record sample set includes:
Determining, from the user usage record sample set and the known interest tags, a sample user usage data set for each sample application type, where each sample user usage data set includes the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values;
Calculating the quantile of each sample preference value of each sample application type based on the sample user usage data sets with known tags;
Screening the sample user usage data sets with known tags according to the classification threshold, so as to select sample user identifiers;
Determining the predicted interest tag of each selected sample user identifier according to the preset interest tag of the sample application type of the sample user usage data set in which that sample user identifier is located;
Adjusting the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags and the known interest tags of the sample user usage data sets.
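The recall used in the last step above can be computed per interest tag as the fraction of users with a known tag who were also predicted to have it. The sketch below shows only that computation; the function name `recall_per_tag` and the dict-of-sets data layout are assumptions, and the rule for how the threshold is adjusted from the recall is not given in the source, so it is not shown.

```python
def recall_per_tag(known, predicted):
    """Compute recall for each interest tag.

    `known` and `predicted` are hypothetical dicts mapping a sample user
    identifier to a set of interest tags. Recall for a tag is
    (users correctly predicted to have the tag) / (users known to have it).
    """
    hits = {}
    totals = {}
    for user, true_tags in known.items():
        predicted_tags = predicted.get(user, set())
        for tag in true_tags:
            totals[tag] = totals.get(tag, 0) + 1
            if tag in predicted_tags:
                hits[tag] = hits.get(tag, 0) + 1
    return {tag: hits.get(tag, 0) / total for tag, total in totals.items()}
```

A low recall for a sample application type suggests that the current threshold screens out too many users who are known to hold the tag.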
An apparatus for generating interest tags, the apparatus comprising:
A usage record obtaining module, configured to obtain a user usage record set of application programs within a specified time period and to calculate the preference value of each application identifier with respect to each user identifier, where each user usage record in the user usage record set includes a user identifier and an application identifier;
A classification threshold determining module, configured to determine an application type based on each application identifier and to determine a classification threshold for each application type according to the preference values of the application identifiers under the same application type, where each application type has a corresponding preset interest tag;
A user identifier screening module, configured to screen, according to the classification threshold, the user usage data set of each application type determined from the user usage record set, so as to select user identifiers;
An interest tag generating module, configured to determine the interest tag of each selected user identifier according to the preset interest tag of the application type of the user usage data set in which that user identifier is located.
A computer device, including a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method for generating interest tags when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for generating interest tags.
With the above method, apparatus, computer device, and storage medium for generating interest tags, the preference value of each application identifier with respect to each user identifier is determined from the user usage record set of application programs obtained within a specified time period, which better characterizes how strongly a user prefers each application program. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type; this takes the overall distribution of preference values under each application type fully into account and provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application type is screened according to the corresponding classification threshold to select the user identifiers that meet the condition, which improves the accuracy of the interest tags generated for each behavior type.
Brief description of the drawings
Fig. 1 is a diagram of the application scenario of the method for generating interest tags in one embodiment;
Fig. 2 is a flow diagram of the method for generating interest tags in one embodiment;
Fig. 3 is a structural block diagram of the apparatus for generating interest tags in one embodiment;
Fig. 4 is a diagram of the internal structure of the computer device in one embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it.
The method for generating interest tags provided by this application can be applied in the application environment shown in Fig. 1, in which terminal 102 communicates with server 104 over a network. Server 104 obtains the user usage record set of application programs within a specified time period and calculates the preference value of each application identifier with respect to each user identifier; the user usage record set may be generated by actions triggered on terminal 102. Server 104 then determines the classification threshold of each application type according to the preference values of the application identifiers under the same application type. Further, server 104 screens the user usage data set of the corresponding application type according to the obtained classification threshold, so as to select user identifiers, and takes the application type corresponding to each selected user identifier as the interest tag of that user identifier. Terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet, or a portable wearable device. Server 104 may be implemented as a standalone server or as a cluster of multiple servers.
In one embodiment, as shown in Fig. 2, a method for generating interest tags is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
Step S202: obtain a user usage record set of application programs within a specified time period, and calculate the preference value of each application identifier with respect to each user identifier; each user usage record in the user usage record set includes a user identifier and an application identifier.
The user usage record set consists of user usage records, each of which includes a user identifier, an application identifier, and a usage weight. User usage records carry rich information, such as the similarity between users, the similarity between application programs, and each user's preference for each application program. The user identifier is a unique identifier that distinguishes each user and may be a user ID (identification). The application identifier is a unique identifier that distinguishes each application program.
The preference value characterizes how strongly the user corresponding to the user identifier prefers the application program corresponding to the application identifier. It depends on the number of users corresponding to the application identifier, the total number of users corresponding to the user usage record set, and the usage weight.
Specifically, user actions on a terminal generate the user usage record set of each application program; the terminal may send the generated set to the server over the network or store it locally. The server may obtain the user usage record set for the specified time period from each terminal, or from its own storage. After obtaining the user usage record set of application programs within the specified time period, the server calculates from it the preference value of each application identifier with respect to each user identifier.
In one embodiment, the server obtains, from the user usage records in the user usage record set, the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; it also obtains the usage weight corresponding to the user identifier and the application identifier, and then calculates the preference value of each application identifier with respect to each user identifier from the ratio between the number of users and the total number of users and from the usage weight.
Step S204: determine an application type based on each application identifier, and determine a classification threshold for each application type according to the preference values of the application identifiers under the same application type; each application type has a corresponding preset interest tag.
The application type is the category used to distinguish application programs, for example the video type. The classification threshold is the condition for judging whether a preference value belongs to its application type: according to the threshold, it can be determined whether the user identifier corresponding to a preference value belongs to the application type of that preference value. Under a given application type, the classification threshold characterizes the share of each user's usage of an application program in the overall usage under that application type.
Specifically, the server calculates, from the user usage record set, the preference value of each application identifier with respect to each user identifier, and determines the corresponding application type from each application identifier. Each application type has a corresponding preset interest tag; the preset interest tag may be identical to the application type, or may be an identifier that corresponds to the application type. For each application type, the server determines the classification threshold from the calculated preference values. The classification threshold makes it possible to judge whether the user identifier corresponding to a preference value belongs to the application type of that preference value.
Step S206: screen, according to the classification threshold, the user usage data set of each application type determined from the user usage record set, so as to select user identifiers.
There is one user usage data set per application type; each user usage data set includes mutually corresponding user identifiers, application identifiers, and preference values.
Specifically, for the user usage data set of each application type, the server screens that data set according to the classification threshold of the application type to which it belongs, and thereby selects the user identifiers that meet the condition.
Step S208: determine the interest tag of each selected user identifier according to the application type of the user usage data set in which that user identifier is located.
An interest tag is a label that marks a user's tendency toward a certain behavior type. For example, if a user frequently uses video applications, that user's interest tag is video.
Specifically, for each user identifier selected from a user usage data set, the server obtains from the database the application type of the user usage data set in which the user identifier is located; the interest tag of the user identifier is the corresponding application type.
In the above embodiment, the preference value of each application identifier with respect to each user identifier is determined from the user usage record set of application programs obtained within a specified time period, which better characterizes how strongly a user prefers each application program. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type; this takes the overall distribution of preference values under each application type fully into account and provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application type is screened according to the corresponding classification threshold to select the user identifiers that meet the condition, which improves the accuracy of the interest tags generated for each behavior type.
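Steps S206 and S208 can be illustrated with a short Python sketch. It assumes, which the source does not state explicitly, that the screening condition is "the user's quantile is greater than or equal to the classification threshold of the application type". The function name `assign_interest_tags` and the data layout are likewise hypothetical.

```python
def assign_interest_tags(quantiles_by_type, thresholds, preset_tags):
    """Screen each per-type data set and attach the type's preset interest tag.

    quantiles_by_type: application type -> {user identifier: quantile}
    thresholds:        application type -> classification threshold
    preset_tags:       application type -> preset interest tag
    Assumed screening condition: quantile >= threshold.
    """
    tags = {}
    for app_type, user_quantiles in quantiles_by_type.items():
        threshold = thresholds[app_type]
        for user, quantile in user_quantiles.items():
            if quantile >= threshold:
                tags.setdefault(user, set()).add(preset_tags[app_type])
    return tags
```

Because each application type is screened independently, a user can end up with several interest tags, or with none.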
In one embodiment, each user usage record in the user usage record set further includes a usage weight, and calculating the preference value of each application identifier with respect to each user identifier according to the user usage record set of application programs within the specified time period includes the following steps: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating the preference value of each application identifier with respect to each user identifier according to the ratio between the number of users and the total number of users and according to the usage weight.
The usage weight characterizes the share of a specific application program in the user's usage across all the application programs the user uses. It can be determined from the application program's installation information, number of uses, usage duration, and power consumption.
Specifically, the server obtains, from the user usage record set, the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set, and obtains the corresponding usage weight from the database according to the user identifier and the application identifier. The server then calculates the preference value of each application identifier with respect to each user identifier from the total number of users, the number of users, and the usage weight. That is, the server calculates the preference value from the ratio of the total number of users to the number of users corresponding to the application identifier, together with the usage weight corresponding to the application identifier.
In one embodiment, the preference value is positively correlated with the usage weight corresponding to the application identifier and with the user-count ratio corresponding to the application identifier. The user-count ratio increases as the total number of users corresponding to the user usage record set grows, and decreases as the number of users corresponding to the application identifier grows. Optionally, the preference value may be the product of the user-count ratio and the usage weight corresponding to the application identifier, and the user-count ratio may be the logarithm of the ratio of the total number of users to the number of users corresponding to the application identifier.
For example, suppose the user usage record set of Xiao Ming and Xiao Hong for the last month is obtained, giving their usage records for Tencent Video, Baidu Video, and Tudou Video, expressed as {(A1, A2), (B2, B3)}. Here A1 is the weight with which Xiao Ming watches Tencent Video, A2 and B2 are the weights with which Xiao Ming and Xiao Hong respectively watch Tudou Video, and B3 is the weight with which Xiao Hong watches Baidu Video. The preference value of Xiao Ming for Tencent Video is then calculated as follows:
(1) Obtain the number of users of Tencent Video and the total number of users of the user usage record set:
The number of users of Tencent Video is 1, and the total number of users of the user usage record set is 2. The ratio of the total number of users to the number of Tencent Video users therefore gives IDF = log(2/1). To avoid a zero denominator in the argument x of log(x), 1 may also be added to the denominator of x.
(2) Obtain the weight TF with which Xiao Ming watches Tencent Video: TF = A1.
(3) Calculate the preference value TF*IDF of Xiao Ming for Tencent Video: TF*IDF = A1*log(2/1).
In this embodiment, the preference value of each application identifier with respect to each user identifier is calculated from the number of users corresponding to each application identifier, the total number of users corresponding to the user usage record set, and the usage weight of each application identifier with respect to each user identifier. Introducing the usage weight and the application program's share of the whole user base better characterizes how strongly a user prefers each application program.
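The TF*IDF calculation in the example above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `preference_values`, the record layout, the concrete usage weights, and the use of the natural logarithm are all hypothetical, since the source gives only symbolic weights (A1, A2, B2, B3) and does not name the base of the logarithm.

```python
import math
from collections import defaultdict

def preference_values(records):
    """Compute preference = usage_weight * log(total_users / users_of_app).

    `records` is an iterable of (user_id, app_id, usage_weight) tuples,
    a hypothetical layout for the user usage record set.
    """
    records = list(records)
    total_users = len({user for user, _app, _weight in records})
    users_per_app = defaultdict(set)
    for user, app, _weight in records:
        users_per_app[app].add(user)
    return {
        (user, app): weight * math.log(total_users / len(users_per_app[app]))
        for user, app, weight in records
    }

# Hypothetical weights standing in for A1, A2, B2, B3 from the example.
records = [
    ("xiaoming", "tencent_video", 0.6),  # A1
    ("xiaoming", "tudou_video", 0.4),    # A2
    ("xiaohong", "tudou_video", 0.3),    # B2
    ("xiaohong", "baidu_video", 0.7),    # B3
]
prefs = preference_values(records)
```

In this sketch, an application used by every user gets a ratio of log(1) = 0, so its preference value is 0 regardless of the usage weight; that is the same behaviour as inverse document frequency in text retrieval.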
In one embodiment, determining the classification threshold of each application type according to the preference values of the application identifiers under the same application type includes the following steps: sorting the preference values under the same application type in ascending order to obtain a sorting result of the preference values; calculating, according to the sorting result, the quantile of each preference value under the same application type; and determining the classification threshold of each application type according to the quantiles.
Here, the quantile is defined as follows: in a discrete data set, the quantile of a data value a is the sum of the probabilities of all data values satisfying X ≤ a, that is, the cumulative probability P(X ≤ a) at a. A quantile is greater than 0 and less than or equal to 1.
Specifically, based on the obtained user usage record set and the calculated preference values, the server sorts the preference values under each application type in ascending order and obtains the sorting result of the preference values of each application type. From the sorting result, the server calculates the quantile of each preference value under the same application type, and determines the classification threshold of each application type according to the quantiles. The classification threshold therefore lies between 0 and 1 and may equal 1.
In this embodiment, the preference values of each application type are sorted in ascending order to obtain the sorting result of each application type; the quantile of each preference value of each application type is then calculated from the sorting result, and the classification threshold of each behavior type is determined from the calculated quantiles. Determining the classification threshold from the overall distribution of quantiles of each application type takes the overall distribution fully into account and provides a basis for the subsequent generation of interest tags.
In one embodiment, it according to the ranking results of preference value, calculates corresponding every under each Application Type Each of the quantile of a preference value, comprising the following steps: according to the ranking results of preference value, determine under each Application Type Probability of occurrence of the preference value in corresponding ranking results;Each preference value under each Application Type is determined according to probability of occurrence Cumulative probability, obtain the quantile of each preference value under each Application Type.
Here, the occurrence probability is the probability with which each preference value occurs in the user usage data set of a given application type. The cumulative probability of a preference value is the sum of the occurrence probabilities of all preference values in that data set that are not greater than it.
Specifically, from the ranking result of the preference values of each application type, the server calculates the occurrence probability of each preference value in that ranking result. The server then determines the cumulative probability of each preference value from the occurrence probabilities; this cumulative probability is the quantile of the corresponding preference value.
For example, consider the data set of one application type, which contains the preference value of each application identifier for each user identifier. The preference values are sorted in ascending order to obtain the ranking result. If the ranking result is 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, then the occurrence probability of preference value 1 is P(1) = 2/10, that of preference value 2 is P(2) = 2/10, and that of preference value 3 is P(3) = 1/10. The cumulative probability of preference value 3 is therefore P(1) + P(2) + P(3) = 5/10, so the quantile of preference value 3 is 50%.
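The cumulative-probability calculation in this example can be sketched as follows. This is only an illustrative sketch: the function name `quantiles_by_cumulative_probability` is hypothetical, and exact fractions are used merely to avoid floating-point rounding.

```python
from collections import Counter
from fractions import Fraction


def quantiles_by_cumulative_probability(preference_values):
    """Map each distinct preference value to its cumulative probability."""
    total = len(preference_values)
    counts = Counter(preference_values)
    cumulative = Fraction(0)
    quantiles = {}
    # Walk the distinct values in ascending order, accumulating P(value).
    for value in sorted(counts):
        cumulative += Fraction(counts[value], total)
        quantiles[value] = cumulative
    return quantiles


# Ranking result taken from the example in the text.
quantiles = quantiles_by_cumulative_probability([1, 1, 2, 2, 3, 4, 5, 6, 7, 8])
print(float(quantiles[3]))  # 0.5, i.e. the 50% quantile
```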
In this embodiment, the occurrence probability of each preference value under each application type in the corresponding ranking result is determined from the ranking result of the preference values, and the cumulative probability of each preference value is then obtained from the occurrence probabilities, giving the quantile of each preference value under each application type. Calculating quantiles from cumulative probabilities reflects the share each individual value holds within its application type, takes the relationships among the data into account as a whole, and provides a more accurate basis for subsequently screening user identifiers.
In one embodiment, calculating the quantile of each preference value under each application type according to the ranking result of the preference values includes the following steps: obtaining the rank position of each preference value under each application type in its ranking result, and the number of ranked users corresponding to the application type to which each application identifier belongs; and dividing the rank position of each preference value under each application type by the number of ranked users, to obtain the quantile of each preference value of each application type.
Here, the rank position is the position that an element occupies in a data set after the elements of the data set have been sorted according to a given rule. The number of ranked users is the total number of elements in the data set.
Specifically, from the calculated ranking result of the preference values of each application type, the server obtains the rank position of each preference value in its ranking result and the number of ranked users of each application type. The server then divides the rank position of each preference value by the number of ranked users of the corresponding application type; the result is the quantile of that preference value.
For example, consider the data set of one application type, which contains the preference value of each application identifier for each user identifier. The preference values are sorted in ascending order to obtain the ranking result. If preference value A has rank position 5 in the ranking result and the number of ranked users of its application type is 10, then the quantile of that preference value is 5/10 × 100%, i.e. 50%. As a further example, if the ranking result is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, then preference value 6 has rank position 7 and its quantile is 70%.
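The rank-based calculation can be sketched in a few lines. The function name `quantiles_by_rank` is hypothetical, and the text does not say how tied preference values are ranked; in this sketch each element simply keeps its own position in the sorted list.

```python
def quantiles_by_rank(preference_values):
    """Return (preference value, quantile) pairs, with quantile = rank position / count."""
    ordered = sorted(preference_values)
    count = len(ordered)
    # Rank positions start at 1, so the largest value gets quantile 1.0.
    return [(value, position / count)
            for position, value in enumerate(ordered, start=1)]


# Second example from the text: values 0..9, given here in arbitrary order.
ranked = quantiles_by_rank([9, 3, 0, 6, 1, 8, 2, 7, 4, 5])
print(dict(ranked)[6])  # 0.7, i.e. rank position 7 out of 10
```

Compared with the cumulative-probability variant, this needs only one sort and one division per element.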
In this embodiment, the quantile of each preference value of each application type is determined from the rank position of the preference value in its ranking result and the number of ranked users of the application type. Determining quantiles from rank positions and the number of ranked users can further reduce the amount of computation, which speeds up the calculation and therefore the generation of interest tags.
In one embodiment, determining the classification threshold of each application type according to the quantiles includes the following steps: for each application type, screening out the quantiles that are greater than or equal to the corresponding first preset threshold; for each application type, calculating the differences between adjacent quantiles among the screened quantiles; and, for each application type, obtaining the quantile corresponding to the maximum calculated difference as the classification threshold of that application type.
Here, the preset threshold is a boundary value for quantiles that is set in advance for each application type; it may be stored in a database. The difference is the result of subtracting one value from another; here it may be the result of subtracting two adjacent quantiles.
Specifically, given the calculated quantiles of the preference values of each application type, the server obtains the preset threshold of the corresponding application type from the database and screens out the quantiles that are greater than or equal to that preset threshold. For each application type, the server calculates the difference between each pair of adjacent quantiles among those screened out. From the differences calculated for each application type, the server takes the two quantiles that produce the maximum difference and uses the one ranked later as the classification threshold of that application type.
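The three steps above (screen by preset threshold, difference adjacent quantiles, take the later quantile of the largest gap) might be sketched as follows. The function name and the sample quantiles are hypothetical. The text does not say what happens when fewer than two quantiles pass the screen or when two gaps tie; the fallback and the first-gap-wins rule below are assumptions.

```python
def classification_threshold(quantiles, preset_threshold):
    """Pick the later quantile of the widest adjacent gap at or above the preset threshold."""
    candidates = sorted(q for q in quantiles if q >= preset_threshold)
    if not candidates:
        return None           # assumption: no threshold can be derived
    if len(candidates) == 1:
        return candidates[0]  # assumption: a single candidate is used as-is
    best_gap, threshold = -1.0, None
    for lower, upper in zip(candidates, candidates[1:]):
        gap = upper - lower
        if gap > best_gap:    # strict comparison: the first widest gap wins on ties
            best_gap, threshold = gap, upper
    return threshold


# Hypothetical quantiles; the widest gap at or above 0.5 lies between 0.6 and 0.9.
threshold = classification_threshold([0.1, 0.3, 0.5, 0.55, 0.6, 0.9, 0.95], 0.5)
print(threshold)  # 0.9
```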
In this embodiment, the classification threshold for the preference values of each application type is determined from the quantiles: the quantile at which the distribution within an application type shows the most pronounced break is selected as the classification threshold of that application type. This makes full use of the overall distribution of the data of each application type and helps ensure the accuracy of the interest tags.
In one embodiment, performing conditional screening according to the classification thresholds on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers, includes the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification thresholds according to the user usage record sample set; and performing conditional screening on the user usage data sets according to the adjusted classification thresholds, so as to screen out user identifiers.
Here, the user usage record sample set includes the individual user usage record samples; the user usage record set includes the user usage data set of each application type; and each user usage data set includes user identifiers, application identifiers and preference values that correspond to one another.
Specifically, the server obtains the user usage record sample set with known interest tags from a database or a terminal, and adjusts the classification threshold of each application type according to the obtained sample set. Then, based on the user usage data sets, the server performs conditional screening on each preference value of each application type according to the adjusted classification thresholds, so as to screen out the user identifiers whose preference values satisfy the screening condition.
In this embodiment, the classification threshold of each application type is adjusted on the basis of a user usage record sample set with known interest tags, yielding the adjusted classification thresholds. Checking the classification thresholds against the user usage record sample set improves the accuracy of the interest tags.
In one embodiment, each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier and a sample usage weight. Adjusting the classification thresholds according to the user usage record sample set includes the following steps: determining, according to the user usage record sample set and by known interest tag, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags and sample preference values; calculating, based on the sample user usage data set with known tags of each sample application type, the quantile of each sample preference value of each sample application type; performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds, so as to screen out sample user identifiers; determining, according to the application type corresponding to the user usage data set in which each screened sample user identifier is located, the predicted interest tag corresponding to the screened user identifier; and adjusting the classification thresholds according to the recall of each sample application type, calculated from the predicted interest tags of the sample user usage data sets and the corresponding known interest tags.
Here, the user usage record sample set includes the individual user usage record samples, and each user usage record sample includes a sample user identifier, an interest tag, a sample application type, a sample application identifier and a sample usage weight. The sample user identifier is a unique identifier that distinguishes each sample user. The sample application type is the type of each application of a sample user; sample application types correspond to application types, and the application types include all sample application types. The sample application identifier is a unique identifier that distinguishes each application. The sample preference value characterizes the usage preference of the sample user identified by the sample user identifier for the sample application identified by the sample application identifier.
The user usage record sample set includes the sample user usage data set of each sample application type; each sample user usage data set includes corresponding sample user identifiers, sample application identifiers, interest tags and sample preference values.
An interest tag is a label that distinguishes a user's inclination toward a certain application type; for example, if a user frequently uses video applications, that user's interest tag may be "video". A predicted interest tag is an interest tag predicted by the interest tag generation model. The recall is, for each sample application type, the ratio of the number of users whose predicted interest tag is consistent with their known interest tag to the total number of users of that sample application type. The closer the recall is to 1, the more consistent the predicted interest tags of that sample application type are with the known interest tags, and the more appropriate the chosen classification threshold of that sample application type.
Specifically, the server obtains the user usage record sample set with known interest tags from a database or a terminal and classifies the obtained sample set by known interest tag, obtaining the sample user usage data set of each sample application type. Based on these sample user usage data sets, the server calculates the quantile of each sample preference value of each sample application type.
Based on the sample user usage data sets with known tags, the server looks up the classification threshold of each sample application type in the database and screens the sample user usage data sets according to the thresholds found. When a sample preference value in a sample user usage data set satisfies the screening condition, the corresponding sample user identifier is screened out. The screening condition is that, within each sample user usage data set, the sample preference value is greater than or equal to the corresponding classification threshold.
For each screened sample user identifier, the server looks up in the database the sample application type corresponding to the sample user usage data set in which the identifier is located; the predicted interest tag of the sample user identifier may be the tag corresponding to the sample application type found. Based on the predicted interest tags of the sample user usage data sets and the corresponding known interest tags, the server judges, for each sample application type, whether the predicted interest tag of each sample user identifier is consistent with its known interest tag, records the result as a flag and stores it on the server. A consistent result may be recorded as 1 and an inconsistent one as 0. For example, within a given sample application type, if the known interest tag of a sample user identifier is "film" and the predicted interest tag is also "film", the record is 1; if the predicted interest tag is "dining", the record is 0.
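The screening and consistency check described above might look like the following sketch. All names and data are hypothetical. It assumes that preference values and classification thresholds are on a comparable scale, and that each application type maps to one preset interest tag.

```python
def predict_interest_tags(sample_usage, thresholds, preset_tags):
    """Collect, per user, the preset tags of every application type whose threshold the user meets."""
    predictions = {}
    for app_type, rows in sample_usage.items():
        for user_id, _app_id, preference in rows:
            # Screening condition from the text: preference value >= classification threshold.
            if preference >= thresholds[app_type]:
                predictions.setdefault(user_id, set()).add(preset_tags[app_type])
    return predictions


def consistency_marks(predictions, known_tags):
    """Record 1 when the known interest tag is among the predicted tags, else 0."""
    return {user_id: int(known in predictions.get(user_id, set()))
            for user_id, known in known_tags.items()}


# Hypothetical sample data: (user identifier, application identifier, preference value).
sample_usage = {
    "video": [("u1", "app_a", 0.9), ("u2", "app_a", 0.3)],
    "food": [("u2", "app_b", 0.8), ("u3", "app_b", 0.7)],
}
thresholds = {"video": 0.5, "food": 0.6}
preset_tags = {"video": "film", "food": "dining"}
known_tags = {"u1": "film", "u2": "dining", "u3": "film"}

marks = consistency_marks(predict_interest_tags(sample_usage, thresholds, preset_tags), known_tags)
print(marks)  # {'u1': 1, 'u2': 1, 'u3': 0}
```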
From the recorded results, the server calculates the recall of each sample application type and adjusts the corresponding classification threshold according to that recall. If the recall does not fall within the adjustment threshold, the classification threshold need not be adjusted; if it does, the classification threshold is adjusted. The predicted tags of the sample user usage data sets are then determined again with the adjusted classification threshold and the recall of each sample application type is recalculated, until the recall for the user usage record sample set no longer falls within the range of the adjustment threshold, at which point adjustment of the corresponding classification threshold stops. The adjustment threshold may be set as: recall below 95%.
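A minimal sketch of this adjust-and-recheck loop for a single sample application type follows. The text does not state in which direction or by how much the classification threshold is changed; lowering it by a fixed `step` is an assumption made here, as are the function names, the `floor` guard and the sample data. The default `target` of 0.95 follows the 95% figure above.

```python
def recall_for_type(rows, threshold, tag, known_tags):
    """Share of the type's users that pass the threshold and whose known tag equals `tag`."""
    users = {user_id for user_id, _preference in rows}
    if not users:
        return 0.0
    matched = {user_id for user_id, preference in rows
               if preference >= threshold and known_tags.get(user_id) == tag}
    return len(matched) / len(users)


def adjust_threshold(rows, threshold, tag, known_tags, target=0.95, step=0.05, floor=0.0):
    """Lower the threshold (assumed direction) until recall is no longer below `target`."""
    while recall_for_type(rows, threshold, tag, known_tags) < target and threshold > floor:
        # Rounding keeps repeated subtraction free of floating-point drift.
        threshold = max(floor, round(threshold - step, 10))
    return threshold


# Hypothetical sample data for one application type whose preset tag is "film".
rows = [("u1", 0.9), ("u2", 0.7), ("u3", 0.6), ("u4", 0.2)]
known_tags = {"u1": "film", "u2": "film", "u3": "film", "u4": "film"}

# A looser target than 95% is used only to keep the example short.
adjusted = adjust_threshold(rows, 0.9, "film", known_tags, target=0.75)
print(adjusted)  # 0.6
```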
In this embodiment, the classification thresholds are adjusted on the basis of a user usage record sample set with known interest tags: each classification threshold is adjusted according to the calculated recall of the corresponding application type until the recall of every application type no longer falls within the adjustment threshold. Checking the classification thresholds against the user usage record sample set and verifying the interest tags through recall further improves the accuracy of the interest tags.
It should be understood that although the steps in the flowchart of Fig. 2 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless expressly stated otherwise herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 3, an apparatus 300 for generating interest tags is provided, including a usage record obtaining module 302, a classification threshold determination module 304, a user identifier screening module 306 and an interest tag generation module 308, wherein:
The usage record obtaining module 302 is used for obtaining the user usage record set of applications within a specified time period and calculating the preference value of each application identifier for each user identifier; each user usage record in the user usage record set includes a user identifier and an application identifier.
The classification threshold determination module 304 is used for determining the application type based on the application identifier and, according to the preference values of the application identifiers under the same application type for the user identifiers, determining the classification threshold of each application type; each application type has a corresponding preset interest tag.
The user identifier screening module 306 is used for performing conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers.
The interest tag generation module 308 is used for determining the interest tag corresponding to each screened user identifier according to the application type corresponding to the user usage data set in which the screened user identifier is located.
In one embodiment, the usage record obtaining module includes a data obtaining module and a preference value calculation module. The data obtaining module is used for obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set, and for obtaining the usage weight corresponding to each user identifier and application identifier. The preference value calculation module is used for calculating the preference value of each application identifier for each user identifier according to the ratio of the total number of users to the number of users and according to the usage weight.
In one embodiment, the classification threshold determination module includes a sorting module, a quantile obtaining module and a classification threshold calculation module. The sorting module is used for sorting, in ascending order, the preference values of the application identifiers under the same application type for the user identifiers, to obtain the ranking result of the preference values. The quantile obtaining module is used for calculating, according to the ranking result of the preference values, the quantile of each preference value under the same application type. The classification threshold calculation module is used for determining the classification threshold of each application type according to the quantiles.
In one embodiment, the quantile obtaining module includes a probability calculation module and a cumulative probability calculation module. The probability calculation module is used for determining, according to the ranking result of the preference values, the occurrence probability of each preference value under each application type in the corresponding ranking result. The cumulative probability calculation module is used for determining, from the occurrence probabilities, the cumulative probability of each preference value under each application type, to obtain the quantile of each preference value under each application type.
In one embodiment, the quantile obtaining module includes a ranking data obtaining module and a quantile calculation module. The ranking data obtaining module is used for obtaining the rank position of each preference value under each application type in its ranking result, and the number of ranked users corresponding to the application type to which each application identifier belongs. The quantile calculation module is used for dividing the rank position of each preference value under each application type by the number of ranked users, to obtain the quantile of each preference value of each application type.
In one embodiment, the classification threshold calculation module includes a first screening module, a difference calculation module and a second screening module. The first screening module is used for screening out, for each application type, the quantiles that are greater than or equal to the corresponding preset threshold. The difference calculation module is used for calculating, for each application type, the differences between adjacent quantiles among the screened quantiles. The second screening module is used for obtaining, for each application type, the quantile corresponding to the maximum calculated difference as the classification threshold of that application type.
In one embodiment, the user identifier screening module includes a usage record sample obtaining module, a classification threshold adjustment module and a conditional screening module. The usage record sample obtaining module is used for obtaining a user usage record sample set with known interest tags. The classification threshold adjustment module is used for adjusting the classification thresholds according to the user usage record sample set. The conditional screening module is used for performing conditional screening on the user usage data sets according to the adjusted classification thresholds, so as to screen out user identifiers.
In one embodiment, the classification threshold adjustment module includes a sample user usage record set obtaining module, a sample user data set determination module, a sample quantile calculation module, a sample user identifier screening module, a predicted interest tag generation module and a recall calculation module. The sample user usage record set obtaining module is used for obtaining the user usage record sample set according to which the classification thresholds are adjusted. The sample user data set determination module is used for determining, according to the user usage record sample set and by known interest tag, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags and sample preference values. The sample quantile calculation module is used for calculating, based on the sample user usage data set with known tags of each sample application type, the quantile of each sample preference value of each sample application type. The sample user identifier screening module is used for performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds, so as to screen out sample user identifiers. The predicted interest tag generation module is used for determining the predicted interest tag corresponding to each screened user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set in which the screened sample user identifier is located. The recall calculation module is used for adjusting the classification thresholds according to the recall of each sample application type, calculated from the predicted interest tags of the sample user usage data sets and the corresponding known interest tags.
In this embodiment, the preference value of each application identifier for each user identifier is determined from the user usage record set of applications obtained within a specified time period, which better characterizes how strongly a user prefers each application. Further, the classification threshold of each application type is determined by analysing the overall distribution of the preference values of the application identifiers under the same application type; this takes the overall distribution of preference values under the same application type fully into account and provides a more accurate basis for subsequently screening user identifiers. In addition, the user usage data set of each application type is screened according to the corresponding classification threshold to screen out the qualifying user identifiers, which improves the accuracy of the interest tags generated for each application type.
For specific limitations on the apparatus for generating interest tags, refer to the limitations on the method for generating interest tags above, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, a processor of the computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and the database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the user usage record set, the user usage data sets and the classification threshold data. The network interface of the computer device is used for communicating with an external terminal over a network connection. When executed by the processor, the computer program implements a method for generating interest tags.
Those skilled in the art will understand that the structure shown in Fig. 4 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the following steps: obtaining the user usage record set of applications within a specified time period, and calculating the preference value of each application identifier for each user identifier, each user usage record in the user usage record set including a user identifier and an application identifier; determining the application type based on the application identifier and, according to the preference values of the application identifiers under the same application type for the user identifiers, determining the classification threshold of each application type, each application type having a corresponding preset interest tag; performing conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers; and determining the interest tag corresponding to each screened user identifier according to the preset interest tag corresponding to the application type of the user usage data set in which the screened user identifier is located.
In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to each user identifier and application identifier; and calculating the preference value of each application identifier for each user identifier according to the ratio of the total number of users to the number of users and according to the usage weight.
In one embodiment, the processor, when executing the computer program, further implements the following steps: sorting, in ascending order, the preference values of the application identifiers under the same application type for the user identifiers, to obtain the ranking result of the preference values; calculating, according to the ranking result of the preference values, the quantile of each preference value under the same application type; and determining the classification threshold of each application type according to the quantiles.
In one embodiment, the processor, when executing the computer program, further implements the following steps: determining, according to the ranking result of the preference values, the occurrence probability of each preference value under each application type in the corresponding ranking result; and determining, from the occurrence probabilities, the cumulative probability of each preference value under each application type, to obtain the quantile of each preference value under each application type.
In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining the rank position of each preference value under each application type in its ranking result, and the number of ranked users corresponding to the application type to which each application identifier belongs; and dividing the rank position of each preference value under each application type by the number of ranked users, to obtain the quantile of each preference value of each application type.
In one embodiment, the processor, when executing the computer program, further performs the following steps: according to the quantiles, screening out, for each application type, the quantiles greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent quantiles among the quantiles screened out; and obtaining, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
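A minimal sketch of the threshold selection above, for one application type. The text does not say which member of the adjacent pair with the largest difference becomes the threshold; taking the upper quantile is an assumption made here, as is the function name `classification_threshold`.

```python
def classification_threshold(quantiles, preset_threshold):
    """Pick a classification threshold from the largest quantile gap.

    quantiles: quantiles of the preference values of one application type.
    preset_threshold: lower bound; smaller quantiles are ignored.
    Returns None when no quantile reaches the preset threshold.
    """
    # Keep distinct quantiles at or above the preset threshold, ascending.
    kept = sorted({q for q in quantiles if q >= preset_threshold})
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]

    # Differences between adjacent quantiles, paired with the upper quantile.
    gaps = [(upper - lower, upper) for lower, upper in zip(kept, kept[1:])]

    # ASSUMPTION: the upper quantile of the widest gap is the threshold.
    _, threshold = max(gaps)
    return threshold
```

For example, with quantiles `[0.1, 0.5, 0.6, 0.9, 1.0]` and a preset threshold of `0.5`, the widest gap lies between `0.6` and `0.9`, so the sketch returns `0.9`.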
In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification thresholds according to the user usage record sample set; and performing conditional screening on the user usage data set according to the adjusted classification thresholds, to screen out user identifiers.
In one embodiment, the processor, when executing the computer program, further performs the following steps: determining, from the user usage record sample set and by the known interest tags, a sample user usage data set for each sample application type, the sample usage data set including the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating, based on the sample user usage data sets with known tags for each sample application type, the quantile of each sample preference value of each sample application type; performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds, to screen out sample user identifiers; determining the predicted interest tag of each screened-out user identifier according to the preset interest tag of the sample application type to which the sample user usage data set containing that sample user identifier belongs; and adjusting the classification thresholds according to the recall of each sample application type, calculated from the predicted interest tags of the sample user data sets and the corresponding known interest tags.
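The recall-based adjustment above can be illustrated for one sample application type as follows. The text only states that the threshold is adjusted according to recall; the rule used here (lower the threshold in fixed steps until a target recall is reached) and the parameters `target_recall` and `step` are assumptions, not part of the described method.

```python
def recall(known_users, predicted_users):
    """Fraction of users with the known tag that were also predicted."""
    known = set(known_users)
    if not known:
        return 0.0
    return len(known & set(predicted_users)) / len(known)


def adjust_threshold(sample_quantiles, known_users, threshold,
                     target_recall=0.8, step=0.05):
    """Lower the threshold until recall on the labelled sample is reached.

    sample_quantiles: {sample_user_id: quantile of sample preference value}
    known_users: sample users whose known interest tag matches this type.
    """
    if not known_users:
        return threshold  # nothing to calibrate against

    while threshold > 0:
        # Screening condition (assumed): quantile at or above the threshold.
        predicted = {u for u, q in sample_quantiles.items() if q >= threshold}
        if recall(known_users, predicted) >= target_recall:
            break
        # Rounding keeps repeated subtraction from drifting.
        threshold = round(threshold - step, 10)
    return max(threshold, 0.0)
```

A production variant would presumably also track precision, since lowering the threshold alone always increases recall.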
In this embodiment, the preference value of each application identifier for the corresponding user identifier is determined from the user usage record set of applications obtained within the specified time period, which better characterizes how strongly a user prefers each application. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type for the corresponding user identifiers; because this takes full account of the overall distribution of preference values under the same application type, it provides a more accurate basis for the subsequent screening of user identifiers. In addition, the user usage data set of each application type is screened according to the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program performs the following steps: obtaining a user usage record set of applications within a specified time period, and calculating the preference value of each application identifier for the corresponding user identifier, where each user usage record in the user usage record set includes a user identifier and an application identifier; determining application types based on the application identifiers, and determining the classification threshold of each application type according to the preference values of the application identifiers under the same application type for the corresponding user identifiers, where each application type has a corresponding preset interest tag; performing conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, to screen out user identifiers; and determining the interest tag of each screened-out user identifier according to the preset interest tag of the application type to which the user usage data set containing that user identifier belongs.
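The final screening and tagging steps can be sketched as below. The data layout (`{app_type: {user_id: quantile}}`), the function name `assign_interest_tags`, and the screening condition "quantile greater than or equal to the classification threshold" are assumptions for illustration; the text only says that screening is conditional on the classification threshold.

```python
def assign_interest_tags(user_quantiles, thresholds, preset_tags):
    """Assign preset interest tags to users that pass the screening.

    user_quantiles: {app_type: {user_id: quantile of preference value}}
    thresholds:     {app_type: classification threshold}
    preset_tags:    {app_type: preset interest tag}
    Returns {user_id: set of interest tags}.
    """
    tags = {}
    for app_type, per_user in user_quantiles.items():
        threshold = thresholds[app_type]
        for user, quantile in per_user.items():
            # ASSUMED screening condition: quantile >= threshold.
            if quantile >= threshold:
                tags.setdefault(user, set()).add(preset_tags[app_type])
    return tags


# Usage with made-up application types and tags.
example = assign_interest_tags(
    {"game": {"u1": 0.95, "u2": 0.40}, "finance": {"u2": 0.90}},
    {"game": 0.9, "finance": 0.8},
    {"game": "gaming", "finance": "investing"},
)
```

A user may pass the screening for several application types, so the sketch collects a set of tags per user rather than a single tag.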
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating, according to the ratio between the total number of users and the number of users and according to the usage weight, the preference value of each application identifier for the corresponding user identifier.
In one embodiment, the computer program, when executed by the processor, performs the following steps: based on the preference values of the application identifiers under the same application type for the corresponding user identifiers, sorting the preference values of that application type in ascending order to obtain a ranking result of the preference values; calculating, according to the ranking result, the quantile of each preference value under the same application type; and determining the classification threshold of each application type according to the quantiles.
In one embodiment, the computer program, when executed by the processor, performs the following steps: determining, according to the ranking result of the preference values, the probability of occurrence of each preference value under each application type within the corresponding ranking result; and determining, according to the probabilities of occurrence, the cumulative probability of each preference value under each application type, to obtain the quantile of each preference value under each application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining the rank position of each preference value under each application type within its ranking result, and the number of ranked users corresponding to the application type to which each application identifier belongs; and dividing the rank position of each preference value under each application type by the number of ranked users, to obtain the quantile of each preference value for each application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: according to the quantiles, screening out, for each application type, the quantiles greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent quantiles among the quantiles screened out; and obtaining, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification thresholds according to the user usage record sample set; and performing conditional screening on the user usage data set according to the adjusted classification thresholds, to screen out user identifiers.
In one embodiment, the computer program, when executed by the processor, performs the following steps: determining, from the user usage record sample set and by the known interest tags, a sample user usage data set for each sample application type, the sample usage data set including the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating, based on the sample user usage data sets with known tags for each sample application type, the quantile of each sample preference value of each sample application type; performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds, to screen out sample user identifiers; determining the predicted interest tag of each screened-out user identifier according to the preset interest tag of the sample application type to which the sample user usage data set containing that sample user identifier belongs; and adjusting the classification thresholds according to the recall of each sample application type, calculated from the predicted interest tags of the sample user data sets and the corresponding known interest tags.
In this embodiment, the preference value of each application identifier for the corresponding user identifier is determined from the user usage record set of applications obtained within the specified time period, which better characterizes how strongly a user prefers each application. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type for the corresponding user identifiers; because this takes full account of the overall distribution of preference values under the same application type, it provides a more accurate basis for the subsequent screening of user identifiers. In addition, the user usage data set of each application type is screened according to the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments has been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the application, and although their description is relatively specific and detailed, they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the application, and these all fall within the protection scope of the application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (10)

1. A method for generating interest tags, the method comprising:
obtaining a user usage record set of applications within a specified time period, and calculating the preference value of each application identifier for the corresponding user identifier, wherein each user usage record in the user usage record set includes a user identifier and an application identifier;
determining application types based on the application identifiers, and determining the classification threshold of each application type according to the preference values of the application identifiers under the same application type for the corresponding user identifiers, wherein each application type has a corresponding preset interest tag;
performing conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, to screen out user identifiers; and
determining the interest tag of each screened-out user identifier according to the preset interest tag of the application type to which the user usage data set containing that user identifier belongs.
2. The method according to claim 1, characterized in that each user usage record in the user usage record set further includes a usage weight, and calculating, from the user usage record set of applications within the specified time period, the preference value of each application identifier for the corresponding user identifier comprises:
obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set;
obtaining the usage weight corresponding to the user identifier and the application identifier; and
calculating, according to the ratio between the total number of users and the number of users and according to the usage weight, the preference value of each application identifier for the corresponding user identifier.
3. The method according to claim 1, characterized in that determining the classification threshold of each application type according to the preference values of the application identifiers under the same application type for the corresponding user identifiers comprises:
based on the preference values of the application identifiers under the same application type for the corresponding user identifiers, sorting the preference values of that application type in ascending order to obtain a ranking result of the preference values;
calculating, according to the ranking result of the preference values, the quantile of each preference value under the same application type; and
determining the classification threshold of each application type according to the quantiles.
4. The method according to claim 3, characterized in that calculating, according to the ranking result of the preference values, the quantile of each preference value under each application type comprises:
determining, according to the ranking result of the preference values, the probability of occurrence of each preference value under each application type within the corresponding ranking result, and determining, according to the probabilities of occurrence, the cumulative probability of each preference value under each application type, to obtain the quantile of each preference value under each application type; or
obtaining the rank position of each preference value under each application type within its ranking result and the number of ranked users corresponding to the application type to which each application identifier belongs, and dividing the rank position of each preference value under each application type by the number of ranked users, to obtain the quantile of each preference value for each application type.
5. The method according to claim 3, characterized in that determining the classification threshold of each application type according to the quantiles comprises:
according to the quantiles, screening out, for each application type, the quantiles greater than or equal to the corresponding preset threshold;
for each application type, calculating the differences between adjacent quantiles among the quantiles screened out; and
obtaining, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
6. The method according to claim 1, characterized in that performing conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, to screen out user identifiers, comprises:
obtaining a user usage record sample set with known interest tags;
adjusting the classification thresholds according to the user usage record sample set; and
performing conditional screening on the user usage data set according to the adjusted classification thresholds, to screen out user identifiers.
7. The method according to claim 6, characterized in that each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight;
and adjusting the classification thresholds according to the user usage record sample set comprises:
determining, from the user usage record sample set and by the known interest tags, a sample user usage data set for each sample application type, the sample usage data set including the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values;
calculating, based on the sample user usage data sets with known tags for each sample application type, the quantile of each sample preference value of each sample application type;
performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds, to screen out sample user identifiers;
determining the predicted interest tag of each screened-out user identifier according to the preset interest tag of the sample application type to which the sample user usage data set containing that sample user identifier belongs; and
adjusting the classification thresholds according to the recall of each sample application type, calculated from the predicted interest tags of the sample user data sets and the corresponding known interest tags.
8. A device for generating interest tags, characterized in that the device comprises:
a usage record obtaining module, configured to obtain a user usage record set of applications within a specified time period and to calculate the preference value of each application identifier for the corresponding user identifier, wherein each user usage record in the user usage record set includes a user identifier and an application identifier;
a classification threshold determining module, configured to determine application types based on the application identifiers and to determine the classification threshold of each application type according to the preference values of the application identifiers under the same application type for the corresponding user identifiers, wherein each application type has a corresponding preset interest tag;
a user identifier screening module, configured to perform conditional screening, according to the classification thresholds, on the user usage data set of each application type determined from the user usage record set, to screen out user identifiers; and
an interest tag generation module, configured to determine the interest tag of each screened-out user identifier according to the preset interest tag of the application type to which the user usage data set containing that user identifier belongs.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201910525807.XA 2019-06-18 2019-06-18 Generate method, apparatus, computer equipment and the storage medium of interest tags Pending CN110377821A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910525807.XA CN110377821A (en) 2019-06-18 2019-06-18 Generate method, apparatus, computer equipment and the storage medium of interest tags
PCT/CN2020/086369 WO2020253369A1 (en) 2019-06-18 2020-04-23 Method and device for generating interest tag, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910525807.XA CN110377821A (en) 2019-06-18 2019-06-18 Generate method, apparatus, computer equipment and the storage medium of interest tags

Publications (1)

Publication Number Publication Date
CN110377821A true CN110377821A (en) 2019-10-25

Family

ID=68249072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910525807.XA Pending CN110377821A (en) 2019-06-18 2019-06-18 Generate method, apparatus, computer equipment and the storage medium of interest tags

Country Status (2)

Country Link
CN (1) CN110377821A (en)
WO (1) WO2020253369A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079023A (en) * 2019-12-30 2020-04-28 Oppo广东移动通信有限公司 Target account identification method, device, terminal and storage medium
WO2020253369A1 (en) * 2019-06-18 2020-12-24 深圳壹账通智能科技有限公司 Method and device for generating interest tag, computer equipment and storage medium
WO2021159276A1 (en) * 2020-02-11 2021-08-19 Citrix Systems, Inc. Systems and methods for expedited access to applications

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110066949A1 (en) * 2009-09-15 2011-03-17 International Business Machines Corporation Visualization of real-time social data informatics
CN104700289A (en) * 2015-03-17 2015-06-10 中国联合网络通信集团有限公司 Advertising method and device
CN106503269A (en) * 2016-12-08 2017-03-15 广州优视网络科技有限公司 Method, device and server that application is recommended
CN107908686A (en) * 2017-10-31 2018-04-13 广东欧珀移动通信有限公司 Information-pushing method, device, server and readable storage medium storing program for executing

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946798B2 (en) * 2015-06-18 2018-04-17 International Business Machines Corporation Identification of target audience for content delivery in social networks by quantifying semantic relations and crowdsourcing
US10636053B2 (en) * 2017-05-31 2020-04-28 Facebook, Inc. Evaluating content publisher options against benchmark publisher
RU2757546C2 (en) * 2017-07-25 2021-10-18 Общество С Ограниченной Ответственностью "Яндекс" Method and system for creating personalized user parameter of interest for identifying personalized target content element
CN110377821A (en) * 2019-06-18 2019-10-25 深圳壹账通智能科技有限公司 Generate method, apparatus, computer equipment and the storage medium of interest tags

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨晶;成卫青;郭常忠;: "基于标准标签的用户兴趣模型研究", 计算机技术与发展, no. 10 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253369A1 (en) * 2019-06-18 2020-12-24 深圳壹账通智能科技有限公司 Method and device for generating interest tag, computer equipment and storage medium
CN111079023A (en) * 2019-12-30 2020-04-28 Oppo广东移动通信有限公司 Target account identification method, device, terminal and storage medium
WO2021159276A1 (en) * 2020-02-11 2021-08-19 Citrix Systems, Inc. Systems and methods for expedited access to applications
US11455227B2 (en) 2020-02-11 2022-09-27 Citrix Systems, Inc. Systems and methods for expedited access to applications
US11748082B2 (en) 2020-02-11 2023-09-05 Citrix Systems, Inc. Systems and methods for expedited access to applications

Also Published As

Publication number Publication date
WO2020253369A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
US9092549B2 (en) Recommendation of search keywords based on indication of user intention
CN107451199B (en) Question recommendation method, device and equipment
CN109408724A (en) Multimedia resource estimates the determination method, apparatus and server of clicking rate
CN108090208A (en) Fused data processing method and processing device
CN108510402A (en) Insurance kind information recommendation method, device, computer equipment and storage medium
JP2020507135A (en) Exclusive agent pool distribution method, electronic device, and computer-readable storage medium
CN110377821A (en) Generate method, apparatus, computer equipment and the storage medium of interest tags
CN108563680A (en) Resource recommendation method and device
CN111538901A (en) Article recommendation method and device, server and storage medium
CN112104505B (en) Application recommendation method, device, server and computer readable storage medium
CN111814759B (en) Method and device for acquiring face quality label value, server and storage medium
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
CN107977445A (en) Application program recommends method and device
CN113076416A (en) Information heat evaluation method and device and electronic equipment
CN112330055A (en) User complaint prediction method and device
CN111061948A (en) User label recommendation method and device, computer equipment and storage medium
CN111177500A (en) Data object classification method and device, computer equipment and storage medium
CN102930016B (en) A kind of method and apparatus for providing Search Results on mobile terminals
Kuzovkin et al. Image selection in photo albums
CN108228869A (en) The method for building up and device of a kind of textual classification model
CN111680236A (en) Menu display method and device, terminal equipment and storage medium
CN108460475A (en) Poor student's prediction technique and device based on network playing by students behavior
CN108763242A (en) Label generating method and device
CN113297406A (en) Picture searching method and system and electronic equipment
CN108595513B (en) Video search cheating processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination