CN110377821A - Method, apparatus, computer device and storage medium for generating interest tags - Google Patents
Method, apparatus, computer device and storage medium for generating interest tags
- Publication number
- CN110377821A (CN201910525807.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application relates to the technical field of user profiling and provides a method, apparatus, computer device, and storage medium for generating interest tags. The method includes: obtaining a user usage record set of application programs within a specified time period, and calculating, for each application program identifier, the preference value corresponding to the user identifier; determining the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type; performing conditional screening, according to the classification thresholds, on the user usage data set of each application program type determined from the user usage record set, so as to screen out user identifiers; and determining the interest tag corresponding to each screened-out user identifier according to the application program type corresponding to the user usage data set in which that user identifier is located. The method can improve the accuracy of the interest tags generated for each behavior type.
Description
Technical field
This application relates to the technical field of information processing, and in particular to a method, apparatus, computer device, and storage medium for generating interest tags.
Background
With the development and application of the Internet, differentiated services such as personalized recommendation and diversified marketing are widely used in people's lives, and these differentiated services depend on user profiles. The core task of user profiling is to generate tags for users. By tagging users, user behavior can be analyzed and predicted from a macroscopic perspective, which helps enterprises improve the precision of marketing aimed at specific users.
At present, most tag generation methods for user profiles generate user tags by keyword extraction; however, tags generated in this way have low accuracy.
Summary of the invention
In view of the above technical problem, it is necessary to provide a method, apparatus, computer device, and storage medium for generating interest tags.
A method for generating interest tags, the method comprising:
obtaining a user usage record set of application programs within a specified time period, and calculating, for each application program identifier, the preference value corresponding to the user identifier, where each user usage record in the user usage record set includes a user identifier and an application program identifier;
determining application program types based on the application program identifiers, and determining the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type, where each application program type has a corresponding preset interest tag;
performing conditional screening, according to the classification thresholds, on the user usage data set of each application program type determined from the user usage record set, so as to screen out user identifiers; and
determining the interest tag corresponding to each screened-out user identifier according to the preset interest tag corresponding to the application program type of the user usage data set in which the screened-out user identifier is located.
In one of the embodiments, each user usage record in the user usage record set further includes a usage weight, and calculating, according to the user usage record set of application programs within the specified time period, the preference value corresponding to each application program identifier and user identifier includes:
obtaining the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set;
obtaining the usage weight corresponding to the user identifier and the application program identifier; and
calculating the preference value corresponding to each application program identifier and user identifier according to the usage weight and the ratio between the total number of users and the number of users.
In one of the embodiments, determining the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type includes:
sorting the preference values corresponding to the same application program type in ascending order, based on the preference values corresponding to the application program identifiers and user identifiers under that application program type, to obtain a sorting result of the preference values;
calculating the quantile of each preference value under the same application program type according to the sorting result of the preference values; and
determining the classification threshold of each application program type according to the quantiles.
In one of the embodiments, calculating the quantile of each preference value under each application program type according to the sorting result of the preference values includes:
determining, according to the sorting result of the preference values, the probability of occurrence of each preference value under each application program type in the corresponding sorting result; and determining the cumulative probability of each preference value under each application program type according to the probabilities of occurrence, to obtain the quantile of each preference value under each application program type.
Alternatively, in one of the embodiments, calculating the quantile of each preference value under each application program type according to the sorting result of the preference values includes:
obtaining the rank of each preference value under each application program type in its sorting result, and the number of sorted users corresponding to the application program type to which each application program identifier belongs; and dividing the rank of each preference value in its sorting result by the number of sorted users, to obtain the quantile of each preference value corresponding to each application program type.
In one of the embodiments, determining the classification threshold of each application program type according to the quantiles includes:
screening out, for each application program type, the quantiles that are greater than or equal to a corresponding preset threshold;
calculating, for each application program type, the differences between adjacent quantiles among the screened-out quantiles; and
obtaining, for each application program type, the quantile corresponding to the maximum calculated difference, to obtain the classification threshold of each application program type.
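The threshold determination in this embodiment can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `classification_threshold` and the default preset value of 0.5 are hypothetical, and because the text does not say which of the two adjacent quantiles "corresponds to" the maximum difference, the sketch assumes the upper one.

```python
def classification_threshold(quantiles, preset=0.5):
    """Pick a threshold for one application program type.

    quantiles: quantiles of the preference values of that type.
    preset:    hypothetical preset threshold (assumed value).
    """
    # Keep only the distinct quantiles at or above the preset threshold.
    kept = sorted({q for q in quantiles if q >= preset})
    if not kept:
        return None          # nothing passed the preset threshold
    if len(kept) == 1:
        return kept[0]       # no adjacent pair to compare
    # Index of the largest gap between adjacent kept quantiles.
    i = max(range(len(kept) - 1), key=lambda k: kept[k + 1] - kept[k])
    # Assumption: the upper quantile of that pair is the threshold.
    return kept[i + 1]


threshold = classification_threshold([0.2, 0.5, 0.6, 0.9, 1.0], preset=0.5)
```

Here the kept quantiles are 0.5, 0.6, 0.9, and 1.0, and the largest gap lies between 0.6 and 0.9.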
In one of the embodiments, performing conditional screening, according to the classification thresholds, on the user usage data set of each application program type determined from the user usage record set, so as to screen out user identifiers, includes:
obtaining a user usage record sample set with known interest tags;
adjusting the classification thresholds according to the user usage record sample set; and
performing conditional screening on the user usage data sets according to the adjusted classification thresholds, so as to screen out user identifiers.
In one of the embodiments, each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application program type, a sample application program identifier, and a sample usage weight, and adjusting the classification thresholds according to the user usage record sample set includes:
determining, according to the user usage record sample set and by means of the known interest tags, the sample user usage data set of each sample application program type, where each sample user usage data set includes the corresponding sample user identifiers, sample application program identifiers, interest tags, and sample preference values;
calculating the quantile of each sample preference value of each sample application program type, based on the known-tag sample user usage data set of each sample application program type;
performing conditional screening on the known-tag sample user usage data sets according to the classification thresholds, so as to screen out sample user identifiers;
determining the predicted interest tag corresponding to each screened-out sample user identifier according to the preset interest tag corresponding to the sample application program type of the sample user usage data set in which that sample user identifier is located; and
adjusting the classification thresholds according to the recall of each sample application program type, calculated from the predicted interest tags of the sample user usage data sets and the corresponding known interest tags.
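The per-type recall used in this embodiment can be sketched as follows. The sketch assumes the standard definition of recall (correctly predicted tags divided by known tags) and a hypothetical data layout of dictionaries mapping user identifiers to sets of tags; the text does not specify how the thresholds are adjusted once the recall is known, so that step is not shown.

```python
from collections import defaultdict


def recall_per_type(known, predicted):
    """Recall of each interest tag.

    known, predicted: {user_id: set of interest tags} (assumed layout).
    """
    hits = defaultdict(int)    # known tags that were also predicted
    totals = defaultdict(int)  # all known tags
    for user_id, tags in known.items():
        for tag in tags:
            totals[tag] += 1
            if tag in predicted.get(user_id, set()):
                hits[tag] += 1
    return {tag: hits[tag] / totals[tag] for tag in totals}


known = {"u1": {"video"}, "u2": {"video"}, "u3": {"music"}}
predicted = {"u1": {"video"}, "u3": {"music"}, "u4": {"video"}}
recall = recall_per_type(known, predicted)
```

In this toy sample, one of the two users known to have the video tag is recovered, and the only user known to have the music tag is recovered.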
An apparatus for generating interest tags, the apparatus comprising:
a usage record obtaining module, configured to obtain a user usage record set of application programs within a specified time period and to calculate, for each application program identifier, the preference value corresponding to the user identifier, where each user usage record in the user usage record set includes a user identifier and an application program identifier;
a classification threshold determining module, configured to determine application program types based on the application program identifiers and to determine the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type, where each application program type has a corresponding preset interest tag;
a user identifier screening module, configured to perform conditional screening, according to the classification thresholds, on the user usage data set of each application program type determined from the user usage record set, so as to screen out user identifiers; and
an interest tag generating module, configured to determine the interest tag corresponding to each screened-out user identifier according to the preset interest tag corresponding to the application program type of the user usage data set in which the screened-out user identifier is located.
A computer device, including a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method for generating interest tags when executing the computer program.
A computer-readable storage medium storing a computer program, where the computer program implements the steps of the above method for generating interest tags when executed by a processor.
The above method, apparatus, computer device, and storage medium for generating interest tags determine the preference value corresponding to each application program identifier and user identifier based on the user usage record set of application programs obtained within a specified time period, which better characterizes how strongly a user prefers each application program. Further, the classification threshold of each application program type is determined by analyzing the overall distribution of the preference values corresponding to the application program identifiers and user identifiers under the same application program type; because the overall distribution of preference values under the same type is fully considered, this provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application program type is screened according to the corresponding classification threshold to screen out qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
Brief description of the drawings
Fig. 1 is an application scenario diagram of the method for generating interest tags in one embodiment;
Fig. 2 is a flow diagram of the method for generating interest tags in one embodiment;
Fig. 3 is a structural block diagram of the apparatus for generating interest tags in one embodiment;
Fig. 4 is an internal structure diagram of the computer device in one embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application, not to limit it.
The method for generating interest tags provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 via a network. The server 104 obtains the user usage record set of application programs within a specified time period and calculates, for each application program identifier, the preference value corresponding to the user identifier; the user usage record set may be generated when triggered by the terminal 102. The server 104 then determines the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type. Further, the server 104 performs conditional screening on the user usage data set of the corresponding application program type according to the obtained classification threshold, so as to screen out user identifiers, and takes the application program type corresponding to each screened-out user identifier as the interest tag of that user identifier. The terminal 102 may be, but is not limited to, a personal computer, a laptop, a smartphone, a tablet computer, or a portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a method for generating interest tags is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
Step S202: obtain a user usage record set of application programs within a specified time period, and calculate, for each application program identifier, the preference value corresponding to the user identifier; each user usage record in the user usage record set includes a user identifier and an application program identifier.
Here, the user usage record set includes individual user usage records, each of which includes a user identifier, an application program identifier, and a usage weight. User usage records contain rich information, such as the similarity between users, the similarity between application programs, and each user's preference for each application program. The user identifier is a unique identifier that distinguishes each user and may be a user ID (identification). The application program identifier is a unique identifier that distinguishes each application program.
The preference value characterizes how strongly the user corresponding to the user identifier prefers to use the application program corresponding to the application program identifier. The preference value depends on the number of users corresponding to the application program identifier, the total number of users corresponding to the user usage record set, and the usage weight.
Specifically, a user triggers a terminal to generate the user usage record set of each application program; the generated user usage record set is transmitted to the server over the network, or may be stored directly in the terminal. The server may obtain the user usage record set for the specified time period from each terminal, or from the server itself. After obtaining the user usage record set of application programs within the specified time period, the server calculates from it the preference value corresponding to each application program identifier and user identifier.
In one of the embodiments, based on the user usage records in the user usage record set, the server obtains the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set, obtains the usage weight corresponding to the user identifier and the application program identifier, and then calculates the preference value corresponding to each application program identifier and user identifier from the usage weight and the proportion of the number of users in the total number of users.
Step S204: determine application program types based on the application program identifiers, and determine the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type; each application program type has a corresponding preset interest tag.
Here, an application program type is a category used to distinguish application programs, such as a video type. A classification threshold is the condition for judging whether a preference value belongs to an application program type: according to the classification threshold, it can be determined whether the user identifier corresponding to a preference value belongs to the application program type to which that preference value belongs. The classification threshold characterizes, under the same application program type, the proportion of the overall usage behavior under that type that each user's usage of the application programs accounts for.
Specifically, based on the user usage record set, the server calculates the preference value corresponding to each application program identifier and user identifier, and determines the corresponding application program type from each application program identifier. Each application program type has a corresponding preset interest tag; the preset interest tag may be identical to the application program type, or may be an identifier that denotes the application program type. Under the same application program type, the server determines the classification threshold of each application program type from the calculated preference values. The classification threshold makes it possible to determine whether the user identifier corresponding to a preference value belongs to the application program type to which that preference value belongs.
Step S206: perform conditional screening, according to the classification thresholds, on the user usage data set of each application program type determined from the user usage record set, so as to screen out user identifiers.
Here, there is one user usage data set for each application program type, and each user usage data set includes mutually corresponding user identifiers, application program identifiers, and preference values.
Specifically, for the user usage data set corresponding to each application program type, the server performs conditional screening according to the classification threshold of the application program type to which that data set belongs, thereby screening out the qualifying user identifiers in that data set.
Step S208: determine the interest tag corresponding to each screened-out user identifier according to the application program type corresponding to the user usage data set in which the screened-out user identifier is located.
Here, an interest tag is a label that distinguishes a user's tendency toward a certain class of behavior type. For example, if a user frequently uses video application programs, the interest tag of that user is video.
Specifically, for each qualifying user identifier screened out of a user usage data set, the server obtains from the database the application program type corresponding to the user usage data set in which that user identifier is located; that is, the interest tag of the user identifier is the corresponding application program type.
In the above embodiment, the preference value corresponding to each application program identifier and user identifier is determined based on the user usage record set of application programs obtained within a specified time period, which better characterizes how strongly a user prefers each application program. Further, the classification threshold of each application program type is determined by analyzing the overall distribution of the preference values corresponding to the application program identifiers and user identifiers under the same application program type; because the overall distribution of preference values under the same type is fully considered, this provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application program type is screened according to the corresponding classification threshold to screen out qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
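Steps S206 and S208 can be sketched in Python as follows. This is a rough sketch under assumptions that the text does not confirm: each row of a user usage data set is assumed to already carry the quantile of the user's preference value, a user is assumed to pass the screening when that quantile is greater than or equal to the classification threshold of the type, and the names `screen_and_tag`, `usage_by_type`, `thresholds`, and `preset_tags` are hypothetical.

```python
def screen_and_tag(usage_by_type, thresholds, preset_tags):
    """Screen user identifiers per application program type and tag them.

    usage_by_type: {app_type: [(user_id, app_id, quantile), ...]}
    thresholds:    {app_type: classification threshold}
    preset_tags:   {app_type: preset interest tag}
    """
    tags = {}
    for app_type, rows in usage_by_type.items():
        for user_id, _app_id, quantile in rows:
            # Assumed screening condition: quantile at or above threshold.
            if quantile >= thresholds[app_type]:
                tags.setdefault(user_id, set()).add(preset_tags[app_type])
    return tags


usage_by_type = {
    "video": [("u1", "app_a", 0.95), ("u2", "app_a", 0.40)],
    "music": [("u1", "app_m", 0.30), ("u2", "app_m", 0.80)],
}
thresholds = {"video": 0.9, "music": 0.75}
preset_tags = {"video": "video", "music": "music"}
tags = screen_and_tag(usage_by_type, thresholds, preset_tags)
```

With these illustrative values, user u1 passes only the video screening and user u2 passes only the music screening.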
In one embodiment, each user usage record in the user usage record set further includes a usage weight, and calculating, according to the user usage record set of application programs within the specified time period, the preference value corresponding to each application program identifier and user identifier includes the following steps: obtaining the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application program identifier; and calculating the preference value corresponding to each application program identifier and user identifier according to the usage weight and the proportion of the number of users in the total number of users.
Here, the usage weight characterizes the share of a specific application program in the user's usage of all the application programs the user uses. The usage weight can be determined from the installation information, number of uses, usage duration, and power consumption of the application program.
Specifically, based on the obtained user usage record set, the server obtains the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set, and obtains the corresponding usage weight from the database according to the user identifier and the application program identifier. The server then calculates the preference value corresponding to each application program identifier and user identifier from the total number of users, the number of users, and the usage weight. That is, the server calculates the preference value from the ratio between the total number of users and the number of users corresponding to the application program identifier, together with the usage weight corresponding to the application program identifier.
In one of the embodiments, the preference value is positively correlated with the usage weight corresponding to the application program identifier, and positively correlated with the user-count ratio corresponding to the application program. The user-count ratio increases as the total number of users corresponding to the user usage record set increases, and decreases as the number of users corresponding to the application program identifier grows. Optionally, the preference value may be the product of the user-count ratio corresponding to the application program identifier and the usage weight corresponding to the application program identifier, and the user-count ratio may be the logarithm of the ratio of the total number of users to the number of users corresponding to the application program identifier.
For example, the user usage record set of the application programs of Xiao Ming and Xiao Hong over the most recent month is obtained, yielding the usage records of Xiao Ming and Xiao Hong for Tencent Video, Baidu Video, and Tudou Video, expressed as {(A1, A2), (B2, B3)}. Here, A1 denotes the weight with which Xiao Ming watches Tencent Video, A2 and B2 denote the weights with which Xiao Ming and Xiao Hong, respectively, watch Tudou Video, and B3 denotes the weight with which Xiao Hong watches Baidu Video. The preference value of Xiao Ming for Tencent Video is then calculated as follows:
(1) Obtain the number of users corresponding to Tencent Video and the total number of users of the user usage record set: the number of users corresponding to Tencent Video is 1, and the total number of users of the user usage record set is 2. The ratio of the total number of users of the user usage record set to the number of Tencent Video users therefore gives IDF = log(2/1). To avoid a zero denominator in the argument x of the function log(x), 1 may also be added to the denominator of x.
(2) Obtain the weight TF with which Xiao Ming watches Tencent Video: TF = A1.
(3) Calculate the preference value TF*IDF of Xiao Ming for Tencent Video: TF*IDF = A1*log(2/1).
In this embodiment, the preference value corresponding to each application program identifier and user identifier is calculated from the number of users corresponding to each application program identifier, the total number of users corresponding to the user usage record set, and the usage weight corresponding to each application program identifier and user identifier. Introducing the usage weight and the share of each application program among all users better characterizes how strongly a user prefers each application program.
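The TF*IDF-style calculation in this embodiment can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `preference_values`, the record layout, and the numeric weights standing in for A1, A2, B2, and B3 are all assumptions made for illustration, and the optional "add 1 to the denominator" variant is not applied.

```python
import math
from collections import defaultdict


def preference_values(records):
    """records: iterable of (user_id, app_id, usage_weight) tuples.

    Returns {(user_id, app_id): usage_weight * log(total_users / app_users)}.
    """
    users = set()
    app_users = defaultdict(set)
    for user_id, app_id, _weight in records:
        users.add(user_id)
        app_users[app_id].add(user_id)
    total_users = len(users)

    prefs = {}
    for user_id, app_id, weight in records:
        # User-count ratio (IDF): log of total users over users of this app.
        idf = math.log(total_users / len(app_users[app_id]))
        prefs[(user_id, app_id)] = weight * idf   # TF * IDF
    return prefs


# Illustrative weights only; the text leaves A1, A2, B2, B3 symbolic.
records = [
    ("xiao_ming", "tencent_video", 0.6),  # A1
    ("xiao_ming", "tudou_video", 0.4),    # A2
    ("xiao_hong", "tudou_video", 0.7),    # B2
    ("xiao_hong", "baidu_video", 0.3),    # B3
]
prefs = preference_values(records)
```

With these records, the preference value of Xiao Ming for Tencent Video equals A1*log(2/1), matching step (3) above, while the preference values for Tudou Video are zero because both users use it and log(2/2) = 0.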
In one embodiment, determining the classification threshold of each application program type according to the preference values corresponding to the application program identifiers and user identifiers under the same application program type includes the following steps: sorting the preference values corresponding to the same application program type in ascending order, based on the preference values corresponding to the application program identifiers and user identifiers under that application program type, to obtain a sorting result of the preference values; calculating the quantile of each preference value under the same application program type according to the sorting result of the preference values; and determining the classification threshold of each application program type according to the quantiles.
Here, the quantile is defined as follows: in a discrete data set, the quantile of a data value a is the sum of the probabilities of all data satisfying X ≤ a, that is, P(X ≤ a); in other words, the quantile of a is the cumulative probability corresponding to a. The quantile is greater than 0 and less than or equal to 1.
Specifically, based on the obtained user usage record set and the calculated preference values, the server sorts the preference values under each application program type from smallest to largest, obtaining a sorting result of the preference values for each application program type. According to these sorting results, the server calculates the quantile of each preference value under the same application program type, and determines the classification threshold corresponding to each application program type according to the quantiles; that is, the classification threshold may take a value between 0 and 1, and may equal 1.
In this embodiment, the preference values of each application type are sorted in ascending order to obtain a ranking result for each application type; the quantile of each preference value is then calculated for each application type from its ranking result, and the classification threshold of each behavior type is determined from the calculated quantiles. Determining the classification threshold from the overall distribution of the quantiles of each application type takes that overall distribution fully into account and provides a basis for the subsequent generation of interest tags.
In one embodiment, calculating the quantile of each preference value under each application type according to the ranking result of the preference values comprises the following steps: determining, according to the ranking result, the occurrence probability of each preference value under each application type in the corresponding ranking result; and determining, according to the occurrence probabilities, the cumulative probability of each preference value under each application type, thereby obtaining the quantile of each preference value under each application type.
Here, the occurrence probability is the probability with which each preference value occurs in the user usage data set corresponding to a given behavior type. The cumulative probability is obtained, within the user usage data set corresponding to a given behavior type, by summing the occurrence probabilities of all preference values not greater than the preference value in question.
Specifically, from the obtained ranking results of the preference values of each application type, the server calculates the occurrence probability of each preference value of each application type in its ranking result. The server then determines the cumulative probability of each preference value of each application type from the occurrence probabilities; this cumulative probability is the quantile of the corresponding preference value.
For example, consider the data set of a given application type, which contains the preference value of each application identifier with respect to a user identifier. The preference values are sorted in ascending order to obtain the ranking result. If the ranking result is 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, then the occurrence probability of preference value 1 is P(1) = 2/10, that of preference value 2 is P(2) = 2/10, and that of preference value 3 is P(3) = 1/10. The cumulative probability of preference value 3 is therefore P(1) + P(2) + P(3) = 5/10, so the quantile of preference value 3 is 50%.
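The cumulative-probability calculation in this example can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the application; the function name `quantiles_by_cumulative_probability` is hypothetical, and exact fractions are used so that the result can be compared directly with the figures in the text.

```python
from collections import Counter
from fractions import Fraction


def quantiles_by_cumulative_probability(preference_values):
    """Map each distinct preference value to its cumulative probability.

    Hypothetical helper: the occurrence probability of a value is its
    count divided by the size of the data set, and the quantile is the
    sum of the occurrence probabilities of all values not greater than it.
    """
    n = len(preference_values)
    counts = Counter(preference_values)
    quantiles = {}
    cumulative = 0
    for value in sorted(counts):        # ascending order of preference value
        cumulative += counts[value]     # number of entries <= value
        quantiles[value] = Fraction(cumulative, n)
    return quantiles


q = quantiles_by_cumulative_probability([1, 1, 2, 2, 3, 4, 5, 6, 7, 8])
print(q[3])  # 1/2, i.e. 50%
```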
In this embodiment, the occurrence probability of each preference value under each application type in the corresponding ranking result is determined from the ranking result of the preference values, and the cumulative probability of each preference value is then obtained from the occurrence probabilities, yielding the quantile of each preference value under each application type. Calculating the quantile from the cumulative probability reflects each individual's share of the whole within each application type, takes the relationships among the data fully into account, and provides a more accurate basis for the subsequent screening of user identifiers.
In one embodiment, calculating the quantile of each preference value under each application type according to the ranking result of the preference values comprises the following steps: obtaining the rank position of each preference value under each application type in its ranking result, and the ranked user count corresponding to the application type to which each application identifier belongs; and dividing the rank position of each preference value under each application type by the ranked user count to obtain the quantile of each preference value of each application type.
Here, the rank position is the position that each element occupies in a data set after the elements have been sorted according to some rule. The ranked user count is the total number of elements in a data set.
Specifically, based on the calculated ranking results of the preference values of each application type, the server obtains the rank position of each preference value of each application type in its ranking result, together with the ranked user count of each application type. The server then divides the rank position of each preference value by the ranked user count of the corresponding application type; the result is the quantile of that preference value.
For example, consider the data set of a given application type, which contains the preference value of each application identifier with respect to a user identifier. The preference values are sorted in ascending order to obtain the ranking result. If preference value A has rank position 5 in the ranking result and the ranked user count of its application type is 10, then the quantile of A is 5/10 × 100% = 50%. As a further example, if the ranking result is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, then preference value 6 has rank position 7 of 10, so its quantile is 70%.
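Dividing a preference value's position in the ascending ranking by the number of ranked entries can be sketched as below. The function name `quantile_by_rank` is hypothetical, and the handling of tied values (taking the position of the last occurrence) is an assumption that the text does not specify; it is chosen here because it agrees with the cumulative-probability definition of the quantile.

```python
import bisect
from fractions import Fraction


def quantile_by_rank(sorted_values, value):
    """Return the 1-based rank position of `value` divided by the count.

    `sorted_values` must already be in ascending order. For tied values
    the position of the last occurrence is used (an assumption).
    """
    n = len(sorted_values)
    # bisect_right gives the number of entries <= value, which equals the
    # 1-based position of the last occurrence of value.
    position = bisect.bisect_right(sorted_values, value)
    if position == 0 or sorted_values[position - 1] != value:
        raise ValueError("value is not in the ranking result")
    return Fraction(position, n)


ranking = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quantile_by_rank(ranking, 6))  # 7/10, i.e. 70%
```

Because only a position lookup and one division are needed per value, no per-value probabilities have to be accumulated.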
In this embodiment, the quantile of each preference value of each application type is determined from the rank position of the preference value in its ranking result and the ranked user count of the application type. Determining the quantile from the rank position and the ranked user count reduces the amount of computation, which speeds up the calculation and thus the generation of interest tags.
In one embodiment, determining the classification threshold of each application type according to the quantiles comprises the following steps: for each application type, screening out the quantiles greater than or equal to the corresponding first preset threshold; for each application type, calculating the differences between adjacent quantiles among the screened quantiles; and obtaining, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
Here, the preset threshold is a predefined boundary value for evaluating quantiles and may be stored in a database; each application type has its own preset threshold as the boundary value of its quantiles. The difference is the result of subtracting one value from another; here it may be the result of subtracting two adjacent quantiles.
Specifically, given the calculated quantiles of the preference values of each application type, the server obtains the preset threshold of each application type from the database and screens out the quantiles greater than or equal to that preset threshold. For each application type, the server calculates the difference between every two adjacent quantiles among the screened quantiles. From the differences calculated for each application type, the server obtains the two quantiles corresponding to the largest difference and takes the one with the later rank position as the classification threshold of that application type.
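The threshold selection described above (keep the quantiles at or above the preset threshold, find the largest gap between adjacent kept quantiles, take the later quantile of that pair) can be sketched as follows. The function name `classification_threshold` is hypothetical, and the behavior when fewer than two quantiles survive the screening, as well as the choice of the first gap when several gaps are equally large, are assumptions not covered by the text.

```python
def classification_threshold(quantiles, first_preset_threshold):
    """Pick the classification threshold for one application type.

    Returns the later quantile of the adjacent pair with the largest
    difference among the quantiles >= first_preset_threshold.
    Assumptions: returns None if no quantile passes the screening, and
    the single surviving quantile if only one passes.
    """
    kept = sorted({q for q in quantiles if q >= first_preset_threshold})
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]

    best_gap = None
    threshold = None
    for earlier, later in zip(kept, kept[1:]):
        gap = later - earlier
        if best_gap is None or gap > best_gap:  # first largest gap wins
            best_gap = gap
            threshold = later
    return threshold


# Quantiles of one application type; values are illustrative only.
print(classification_threshold([0.1, 0.3, 0.55, 0.6, 0.9, 1.0], 0.5))  # 0.9
```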
In this embodiment, the classification threshold for the preference values of each application type is determined from the quantiles: the quantile at which the distribution within each application type is most distinct is selected as the classification threshold of that application type. This makes full use of the overall distribution of the data of each application type and helps ensure the accuracy of the interest tags.
In one embodiment, performing conditional screening according to the classification thresholds on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers, comprises the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification thresholds according to the user usage record sample set; and performing conditional screening on the user usage data sets according to the adjusted classification thresholds to screen out user identifiers.
Here, the user usage record sample set contains user usage record samples; the user usage record set contains the user usage data set corresponding to each application type; and each user usage data set contains mutually corresponding user identifiers, application identifiers, and preference values.
Specifically, the server obtains the user usage record sample set with known interest tags from a database or a terminal and adjusts the classification threshold of each application type according to the obtained sample set. Then, based on the user usage data sets, the server performs conditional screening on each preference value of each application type according to the adjusted classification thresholds, so as to screen out the user identifiers that satisfy the screening condition on the preference value.
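The screening step, followed by the lookup of the preset interest tag of the application type, might be sketched as below. All names (`generate_interest_tags`, `usage_data_sets`, `thresholds`, `default_tags`) are hypothetical, and the sketch assumes that each classification threshold has already been expressed on the same scale as the preference values, so that the screening condition is simply "preference value greater than or equal to the threshold".

```python
def generate_interest_tags(usage_data_sets, thresholds, default_tags):
    """Screen user identifiers per application type and assign tags.

    usage_data_sets: {app_type: [(user_id, app_id, preference_value), ...]}
    thresholds:      {app_type: classification threshold (assumed to be on
                      the same scale as the preference values)}
    default_tags:    {app_type: preset interest tag of that type}
    Returns {user_id: set of interest tags}.
    """
    tags = {}
    for app_type, records in usage_data_sets.items():
        threshold = thresholds[app_type]
        for user_id, _app_id, preference in records:
            if preference >= threshold:  # screening condition
                tags.setdefault(user_id, set()).add(default_tags[app_type])
    return tags


# Illustrative data only.
data_sets = {
    "video": [("u1", "appA", 0.9), ("u2", "appA", 0.2)],
    "food": [("u2", "appB", 0.8)],
}
result = generate_interest_tags(
    data_sets,
    thresholds={"video": 0.5, "food": 0.6},
    default_tags={"video": "video", "food": "dining"},
)
print(result)  # {'u1': {'video'}, 'u2': {'dining'}}
```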
In this embodiment, the classification threshold of each application type is adjusted based on the user usage record sample set with known interest tags, yielding the adjusted classification thresholds. Checking the classification thresholds against the user usage record sample set improves the accuracy of the interest tags.
In one embodiment, each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight. Adjusting the classification thresholds according to the user usage record sample set comprises the following steps: determining, according to the user usage record sample set and by known interest tag, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating, based on the sample user usage data sets with known tags of each sample application type, the quantile of each sample preference value of each sample application type; performing conditional screening on the sample user usage data sets with known tags according to the classification thresholds to screen out sample user identifiers; determining, according to the application type corresponding to the user usage data set in which each screened sample user identifier is located, the predicted interest tag corresponding to the screened user identifier; and adjusting the classification thresholds according to the recall of each class of sample application type, calculated from the predicted interest tags of the sample user data sets and the corresponding known interest tags.
Here, the user usage record sample set contains user usage record samples, each of which includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight. The sample user identifier is a unique identifier that distinguishes each sample user. The sample application type is the type of each application of the sample user; sample application types correspond to application types, and the application types include all sample application types. The sample application identifier is a unique identifier that distinguishes each application. The sample preference value characterizes the preference of the sample user corresponding to the sample user identifier for using the sample application corresponding to the sample application identifier.
The user usage record sample set contains the sample user usage data set corresponding to each sample application type; each sample user usage data set includes corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values.
An interest tag is a label that distinguishes a user's inclination toward a certain class of application types; for example, if a user often watches content in video applications, the interest tag of that user may be "video". A predicted interest tag is an interest tag predicted by the interest tag generation model. For each class of sample application type, the recall is the ratio of the number of sample user identifiers whose predicted interest tag agrees with the known interest tag to the total number of users of that sample application type. The closer the recall is to 1, the higher the agreement between the predicted interest tags and the known interest tags for that sample application type, which in turn indicates that the classification threshold of that sample application type is more appropriately chosen.
Specifically, the server obtains the user usage record sample set with known interest tags from a database or a terminal and classifies the obtained sample set by known interest tag, obtaining the sample user usage data set corresponding to each sample application type. Based on these sample user usage data sets, the server calculates the quantile of each sample preference value of each sample application type.
Based on the sample user usage data sets with known tags, the server looks up the classification threshold corresponding to each sample application type in the database and screens the sample user usage data sets according to the thresholds found. When a sample preference value in a sample user usage data set satisfies the screening condition, the corresponding sample user identifier is screened out. The screening condition is that, for each sample user usage data set, the sample preference value is greater than or equal to the corresponding classification threshold.
According to the screened sample user identifiers, the server looks up in the database the sample application type corresponding to the sample user usage data set in which each sample user identifier is located; the predicted interest tag of the sample user identifier may be the sample application type found. Based on the predicted interest tags of the sample user usage data sets and the corresponding known interest tags, the server determines, for each class of sample application type, whether the predicted interest tag of each sample user identifier agrees with the known interest tag, and records the result with a marker stored in the server. A matching result may be marked as 1 and any other result as 0. For example, within a given sample application type, if the known interest tag of a sample user identifier is "movie" and the predicted interest tag is also "movie", 1 is recorded; if the predicted interest tag of the sample user identifier is "dining", 0 is recorded.
From the recorded results, the server calculates the recall of each class of sample application type and adjusts the corresponding classification threshold according to that recall. If the recall does not meet the adjustment threshold, the classification threshold does not need to be adjusted; if the recall meets the adjustment threshold, the classification threshold is adjusted. The predicted tags of the sample user usage data sets are then determined again according to the adjusted classification threshold, and the recall of each class of sample application type is recalculated; the adjustment of the corresponding classification threshold stops once the recall of the user usage record sample set no longer falls within the range of the adjustment threshold. The adjustment threshold may be set as: recall lower than 95%.
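The recall check and the repeated threshold adjustment for a single sample application type can be sketched as follows. The text does not state in which direction or by how much the classification threshold is adjusted; lowering it by a fixed step until the recall reaches the target is purely an assumption of this sketch, as are the names `recall`, `adjust_threshold`, `step`, and `floor`.

```python
def recall(samples, threshold, type_tag):
    """Recall for one sample application type.

    samples: [(known_interest_tag, sample_preference_value), ...]
    A sample is screened out (and predicted as `type_tag`) when its
    preference value is >= threshold; it counts as a match (marked 1)
    when the predicted tag equals the known tag.
    """
    if not samples:
        raise ValueError("sample data set is empty")
    matches = sum(
        1 for known_tag, preference in samples
        if preference >= threshold and known_tag == type_tag
    )
    return matches / len(samples)


def adjust_threshold(samples, threshold, type_tag,
                     target=0.95, step=1, floor=0):
    """Lower the threshold while the recall is below `target`.

    Assumption: adjustment means decreasing the threshold by `step`,
    never going below `floor`.
    """
    while recall(samples, threshold, type_tag) < target and threshold > floor:
        threshold = max(floor, threshold - step)
    return threshold


# Ten sample users of a "video" type with preference values 2..11.
samples = [("video", p) for p in range(2, 12)]
print(recall(samples, 6, "video"))            # 0.6
print(adjust_threshold(samples, 6, "video"))  # 2
```

The `floor` guard guarantees that the loop terminates even when the target recall cannot be reached, for instance when some known tags differ from the tag of the type.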
In this embodiment, the classification thresholds are adjusted based on the user usage record sample set with known interest tags: each classification threshold is adjusted according to the calculated recall of its application type until the recall of every application type no longer meets the adjustment threshold. Checking the classification thresholds against the user usage record sample set and verifying the accuracy of the interest tags by recall further improves the accuracy of the interest tags.
It should be understood that although the steps in the flowchart of Fig. 2 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless expressly stated otherwise herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 2 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are these sub-steps or stages necessarily executed sequentially, and they may instead be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 3, an apparatus 300 for generating interest tags is provided, comprising a usage record obtaining module 302, a classification threshold determining module 304, a user identifier screening module 306, and an interest tag generating module 308, wherein:
The usage record obtaining module 302 is configured to obtain the user usage record set of applications within a specified time period and to calculate the preference value of each application identifier with respect to a user identifier; each user usage record in the user usage record set includes a user identifier and an application identifier.
The classification threshold determining module 304 is configured to determine the application type based on the application identifier and to determine the classification threshold of each application type according to the preference values, with respect to user identifiers, of the application identifiers under the same application type; each application type has a corresponding preset interest tag.
The user identifier screening module 306 is configured to perform conditional screening according to the classification thresholds on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers.
The interest tag generating module 308 is configured to determine the interest tag corresponding to each screened user identifier according to the application type corresponding to the user usage data set in which the screened user identifier is located.
In one embodiment, the usage record obtaining module includes a data obtaining module and a preference value calculating module. The data obtaining module is configured to obtain the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set, and to obtain the usage weight corresponding to a user identifier and an application identifier. The preference value calculating module is configured to calculate the preference value of each application identifier with respect to a user identifier according to the ratio of the total number of users to the number of users and according to the usage weight.
In one embodiment, the classification threshold determining module includes a sorting module, a quantile obtaining module, and a classification threshold calculating module. The sorting module is configured to sort, based on the preference values, with respect to user identifiers, of the application identifiers under the same application type, the preference values of that application type in ascending order to obtain a ranking result of the preference values. The quantile obtaining module is configured to calculate, according to the ranking result, the quantile of each preference value under the same application type. The classification threshold calculating module is configured to determine the classification threshold of each application type according to the quantiles.
In one embodiment, the quantile calculating module includes a probability calculating module and a cumulative probability calculating module. The probability calculating module is configured to determine, according to the ranking result of the preference values, the occurrence probability of each preference value under each application type in the corresponding ranking result. The cumulative probability calculating module is configured to determine, according to the occurrence probabilities, the cumulative probability of each preference value under each application type, thereby obtaining the quantile of each preference value under each application type.
In one embodiment, the quantile obtaining module includes a ranking data obtaining module and a quantile calculating module. The ranking data obtaining module is configured to obtain the rank position of each preference value under each application type in its ranking result, and the ranked user count corresponding to the application type to which each application identifier belongs. The quantile calculating module is configured to divide the rank position of each preference value under each application type by the ranked user count to obtain the quantile of each preference value of each application type.
In one embodiment, the classification threshold calculating module includes a first screening module, a difference calculating module, and a second screening module. The first screening module is configured to screen out, for each application type, the quantiles greater than or equal to the corresponding preset threshold. The difference calculating module is configured to calculate, for each application type, the differences between adjacent quantiles among the screened quantiles. The second screening module is configured to obtain, for each application type, the quantile corresponding to the largest calculated difference as the classification threshold of that application type.
In one embodiment, the user identifier screening module includes a usage record sample obtaining module, a classification threshold adjusting module, and a conditional screening module. The usage record sample obtaining module is configured to obtain a user usage record sample set with known interest tags. The classification threshold adjusting module is configured to adjust the classification thresholds according to the user usage record sample set. The conditional screening module is configured to perform conditional screening on the user usage data sets according to the adjusted classification thresholds to screen out user identifiers.
In one embodiment, the classification threshold adjusting module includes a sample user usage record set obtaining module, a sample user data set determining module, a sample quantile calculating module, a sample user identifier screening module, a predicted interest tag generating module, and a recall calculating module. The sample user usage record set obtaining module is used in adjusting the classification thresholds according to the user usage record sample set, where: the sample user data set determining module is configured to determine, according to the user usage record sample set and by known interest tag, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; the sample quantile calculating module is configured to calculate, based on the sample user usage data sets with known tags of each sample application type, the quantile of each sample preference value of each sample application type; the sample user identifier screening module is configured to perform conditional screening on the sample user usage data sets with known tags according to the classification thresholds to screen out sample user identifiers; the predicted interest tag generating module is configured to determine the predicted interest tag corresponding to each screened user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set in which the screened sample user identifier is located; and the recall calculating module is configured to adjust the classification thresholds according to the recall of each class of sample application type, calculated from the predicted interest tags of the sample user data sets and the corresponding known interest tags.
In this embodiment, the preference value of each application identifier with respect to a user identifier is determined from the user usage record set of applications obtained within a specified time period, which better characterizes the degree of the user's preference for each application. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values, with respect to user identifiers, of the application identifiers under the same application type; this takes the overall distribution of preference values under the same application type fully into account and provides a more accurate basis for the subsequent screening of user identifiers. In addition, the user usage data set of each application type is screened according to the corresponding classification threshold to screen out the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
For specific limitations on the apparatus for generating interest tags, reference may be made to the limitations on the method for generating interest tags above, which are not repeated here. Each module in the above apparatus for generating interest tags may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or be independent of, a processor in a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores the user usage record sets, the user usage data sets, and the classification threshold data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for generating interest tags.
Those skilled in the art will understand that the structure shown in Fig. 4 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the following steps: obtaining the user usage record set of applications within a specified time period, and calculating the preference value of each application identifier with respect to a user identifier, each user usage record in the user usage record set including a user identifier and an application identifier; determining the application type based on the application identifier, and determining the classification threshold of each application type according to the preference values, with respect to user identifiers, of the application identifiers under the same application type, each application type having a corresponding preset interest tag; performing conditional screening according to the classification thresholds on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers; and determining the interest tag corresponding to each screened user identifier according to the preset interest tag corresponding to the application type of the user usage data set in which the screened user identifier is located.
In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application program identifier; and calculating the preference value of each application program identifier with respect to the user identifier according to the ratio of the total number of users to the number of users, together with the usage weight.
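As a hedged example, the preference value might be computed along the following lines. The embodiment states only that the ratio of the total number of users to the per-application number of users is combined with the usage weight; multiplying the weight by the logarithm of that ratio, as done here, is an assumption of this sketch, as are the function name and the record layout.

```python
import math
from collections import defaultdict


def preference_values(records):
    """records: iterable of (user_id, app_id, usage_weight) tuples (hypothetical layout)."""
    records = list(records)
    users_per_app = defaultdict(set)
    all_users = set()
    for user_id, app_id, _ in records:
        users_per_app[app_id].add(user_id)
        all_users.add(user_id)
    total_users = len(all_users)

    prefs = {}
    for user_id, app_id, weight in records:
        ratio = total_users / len(users_per_app[app_id])
        # Assumed combination: usage weight scaled by the log of the user ratio.
        prefs[(user_id, app_id)] = weight * math.log(ratio)
    return prefs


records = [
    ("u1", "a", 2.0),
    ("u2", "a", 1.0),
    ("u3", "a", 1.0),
    ("u1", "b", 3.0),
    ("u4", "b", 1.0),
]
prefs = preference_values(records)
```

Under this assumption, an application used by few users contributes more to a user's preference value than one used by almost everyone, for the same usage weight.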
In one embodiment, the processor, when executing the computer program, further performs the following steps: based on the preference values of the application program identifiers under the same application type with respect to the user identifiers, sorting the preference values of that application type in ascending order to obtain a ranking result of the preference values; calculating, from the ranking result, the quantile of each preference value under the same application type; and determining the classification threshold of each application type according to the quantiles.
In one embodiment, the processor, when executing the computer program, further performs the following steps: determining, from the ranking result of the preference values, the occurrence probability of each preference value under each application type within the corresponding ranking result; and determining, from the occurrence probabilities, the cumulative probability of each preference value under each application type, thereby obtaining the quantile of each preference value under each application type.
In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining the rank position of each preference value under each application type within its ranking result, and the number of ranked users corresponding to the application type to which each application program identifier belongs; and dividing the rank position of each preference value under each application type by the number of ranked users to obtain the quantile of each preference value for each application type.
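The two ways of obtaining quantiles described above can be sketched, under stated assumptions, as follows. The function names are hypothetical, and the use of 1-based rank positions in the second function is an assumption; the embodiments do not say whether ranks start at 0 or 1.

```python
from collections import Counter
from itertools import accumulate


def quantiles_by_cumulative_probability(values):
    """Map each distinct preference value to its cumulative probability."""
    counts = Counter(values)
    ordered = sorted(counts)  # ascending order of distinct values
    n = len(values)
    cumulative = accumulate(counts[v] / n for v in ordered)
    return dict(zip(ordered, cumulative))


def quantiles_by_rank(values):
    """Pair each preference value with (rank position / number of ranked users)."""
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    # Assumed convention: rank positions start at 1.
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


values = [0.5, 0.2, 0.9, 0.5]
by_probability = quantiles_by_cumulative_probability(values)
by_rank = quantiles_by_rank(values)
```

The two approaches differ for tied values: the cumulative-probability version gives every occurrence of a value the same quantile, while the rank-based version assigns consecutive quantiles to the tied entries.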
In one embodiment, the processor, when executing the computer program, further performs the following steps: for each application type, selecting from the quantiles those that are greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent quantiles among the selected quantiles; and, for each application type, taking the quantile corresponding to the maximum calculated difference as the classification threshold of that application type.
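A minimal sketch of this threshold selection for a single application type is given below. The function name and the `floor` parameter (standing in for the preset threshold) are hypothetical. The embodiment does not state which of the two adjacent quantiles "corresponds" to the maximum difference; taking the upper one is an assumption of this example.

```python
def classification_threshold(quantiles, floor):
    """Pick a threshold at the largest gap among quantiles >= floor."""
    kept = sorted(q for q in quantiles if q >= floor)
    if len(kept) < 2:
        # Not enough quantiles to form a gap; fall back to what is available.
        return kept[0] if kept else None
    # Pair each gap with the upper quantile of the adjacent pair (assumed choice).
    gaps = [(upper - lower, upper) for lower, upper in zip(kept, kept[1:])]
    _, threshold = max(gaps, key=lambda gap: gap[0])
    return threshold


quantiles = [0.10, 0.40, 0.55, 0.60, 0.90, 1.00]
threshold = classification_threshold(quantiles, floor=0.5)
```

Here the quantiles kept are 0.55, 0.60, 0.90 and 1.00; the largest gap lies between 0.60 and 0.90, so the sketch returns 0.90.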
In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification threshold according to the user usage record sample set; and screening the user usage data set against the adjusted classification threshold to select user identifiers.
In one embodiment, the processor, when executing the computer program, further performs the following steps: determining, from the user usage record sample set and by the known interest tags, the sample user usage data set of each sample application type, where the sample usage data set includes the corresponding sample user identifier, sample application program identifier, interest tag, and sample preference value; calculating, based on the sample user usage data set with known tags of each sample application type, the quantile of each sample preference value of each sample application type; screening the sample user usage data set with known tags against the classification threshold to select sample user identifiers; determining the predicted interest tag of each selected user identifier according to the preset interest tag of the sample application type of the user usage data set to which the selected sample user identifier belongs; and adjusting the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user data set.
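The recall-based adjustment could, for example, be sketched as follows for one sample application type. The embodiment says only that the classification threshold is adjusted according to the recall; lowering the threshold in fixed steps until a target recall is reached is an assumption of this sketch, and the names `target` and `step` and their default values are hypothetical.

```python
def recall(predicted, actual):
    """Fraction of users with the known tag that were also predicted."""
    if not actual:
        return 0.0
    return len(predicted & actual) / len(actual)


def adjust_threshold(sample_quantiles, known_users, threshold, target=0.8, step=0.05):
    """sample_quantiles: {user_id: quantile of the user's sample preference value}.

    known_users: set of sample users known to carry the interest tag.
    """
    if not known_users:
        return threshold  # nothing to calibrate against
    while threshold > 0:
        predicted = {u for u, q in sample_quantiles.items() if q >= threshold}
        if recall(predicted, known_users) >= target:
            break
        # Assumed adjustment rule: lower the threshold by a fixed step.
        threshold = round(threshold - step, 10)
    return max(threshold, 0.0)


sample_quantiles = {"u1": 0.95, "u2": 0.85, "u3": 0.70, "u4": 0.30}
known_users = {"u1", "u2", "u3"}
adjusted = adjust_threshold(sample_quantiles, known_users, threshold=0.9)
```

A practical adjustment would probably also track precision, since lowering the threshold to raise recall admits more users who do not carry the tag.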
In this embodiment, the preference value of each application program identifier with respect to each user identifier is determined from the user usage record set of application programs obtained within the specified time period, which better characterizes the degree of the user's preference for each application program. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application program identifiers under the same application type with respect to the user identifiers. This takes full account of the overall distribution of preference values under the same application type and provides a more accurate basis for the subsequent screening of user identifiers. Moreover, the user usage data set of each application type is screened against the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program, when executed by a processor, performs the following steps: obtaining a user usage record set of application programs within a specified time period, and calculating the preference value of each application program identifier with respect to each user identifier, where each user usage record in the set includes a user identifier and an application program identifier; determining the application type based on the application program identifier and, according to the preference values of the application program identifiers under the same application type with respect to the user identifiers, determining the classification threshold of each application type, where each application type has a corresponding preset interest tag; screening the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and determining the interest tag of each selected user identifier according to the preset interest tag of the application type of the user usage data set to which that user identifier belongs.
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application program identifier; and calculating the preference value of each application program identifier with respect to the user identifier according to the ratio of the total number of users to the number of users, together with the usage weight.
In one embodiment, the computer program, when executed by the processor, performs the following steps: based on the preference values of the application program identifiers under the same application type with respect to the user identifiers, sorting the preference values of that application type in ascending order to obtain a ranking result of the preference values; calculating, from the ranking result, the quantile of each preference value under the same application type; and determining the classification threshold of each application type according to the quantiles.
In one embodiment, the computer program, when executed by the processor, performs the following steps: determining, from the ranking result of the preference values, the occurrence probability of each preference value under each application type within the corresponding ranking result; and determining, from the occurrence probabilities, the cumulative probability of each preference value under each application type, thereby obtaining the quantile of each preference value under each application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining the rank position of each preference value under each application type within its ranking result, and the number of ranked users corresponding to the application type to which each application program identifier belongs; and dividing the rank position of each preference value under each application type by the number of ranked users to obtain the quantile of each preference value for each application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: for each application type, selecting from the quantiles those that are greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent quantiles among the selected quantiles; and, for each application type, taking the quantile corresponding to the maximum calculated difference as the classification threshold of that application type.
In one embodiment, the computer program, when executed by the processor, performs the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification threshold according to the user usage record sample set; and screening the user usage data set against the adjusted classification threshold to select user identifiers.
In one embodiment, the computer program, when executed by the processor, performs the following steps: determining, from the user usage record sample set and by the known interest tags, the sample user usage data set of each sample application type, where the sample usage data set includes the corresponding sample user identifier, sample application program identifier, interest tag, and sample preference value; calculating, based on the sample user usage data set with known tags of each sample application type, the quantile of each sample preference value of each sample application type; screening the sample user usage data set with known tags against the classification threshold to select sample user identifiers; determining the predicted interest tag of each selected user identifier according to the preset interest tag of the sample application type of the user usage data set to which the selected sample user identifier belongs; and adjusting the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user data set.
In this embodiment, the preference value of each application program identifier with respect to each user identifier is determined from the user usage record set of application programs obtained within the specified time period, which better characterizes the degree of the user's preference for each application program. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application program identifiers under the same application type with respect to the user identifiers. This takes full account of the overall distribution of preference values under the same application type and provides a more accurate basis for the subsequent screening of user identifiers. Moreover, the user usage data set of each application type is screened against the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.
Claims (10)
1. A method for generating interest tags, the method comprising:
obtaining a user usage record set of application programs within a specified time period, and calculating the preference value of each application program identifier with respect to each user identifier, wherein each user usage record in the user usage record set includes a user identifier and an application program identifier;
determining the application type based on the application program identifier, and determining the classification threshold of each application type according to the preference values of the application program identifiers under the same application type with respect to the user identifiers, wherein each application type has a corresponding preset interest tag;
screening the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and
determining the interest tag of each selected user identifier according to the preset interest tag of the application type of the user usage data set to which the selected user identifier belongs.
2. The method according to claim 1, characterized in that each user usage record in the user usage record set further includes a usage weight, and the calculating, according to the user usage record set of application programs within the specified time period, of the preference value of each application program identifier with respect to each user identifier comprises:
obtaining the number of users corresponding to each application program identifier and the total number of users corresponding to the user usage record set;
obtaining the usage weight corresponding to the user identifier and the application program identifier; and
calculating the preference value of each application program identifier with respect to the user identifier according to the ratio of the total number of users to the number of users, together with the usage weight.
3. The method according to claim 1, characterized in that the determining of the classification threshold of each application type according to the preference values of the application program identifiers under the same application type with respect to the user identifiers comprises:
based on the preference values of the application program identifiers under the same application type with respect to the user identifiers, sorting the preference values of the same application type in ascending order to obtain a ranking result of the preference values;
calculating, from the ranking result of the preference values, the quantile of each preference value under the same application type; and
determining the classification threshold of each application type according to the quantiles.
4. The method according to claim 3, characterized in that the calculating, from the ranking result of the preference values, of the quantile of each preference value under each application type comprises:
determining, from the ranking result of the preference values, the occurrence probability of each preference value under each application type within the corresponding ranking result, and determining, from the occurrence probabilities, the cumulative probability of each preference value under each application type to obtain the quantile of each preference value under each application type; or
obtaining the rank position of each preference value under each application type within its ranking result and the number of ranked users corresponding to the application type to which each application program identifier belongs, and dividing the rank position of each preference value under each application type by the number of ranked users to obtain the quantile of each preference value for each application type.
5. The method according to claim 3, characterized in that the determining of the classification threshold of each application type according to the quantiles comprises:
for each application type, selecting from the quantiles those that are greater than or equal to the corresponding preset threshold;
for each application type, calculating the differences between adjacent quantiles among the selected quantiles; and
for each application type, taking the quantile corresponding to the maximum calculated difference as the classification threshold of that application type.
6. The method according to claim 1, characterized in that the screening of the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers comprises:
obtaining a user usage record sample set with known interest tags;
adjusting the classification threshold according to the user usage record sample set; and
screening the user usage data set against the adjusted classification threshold to select user identifiers.
7. The method according to claim 6, characterized in that each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application program identifier, and a sample usage weight;
and the adjusting of the classification threshold according to the user usage record sample set comprises:
determining, from the user usage record sample set and by the known interest tags, the sample user usage data set of each sample application type, wherein the sample usage data set includes the corresponding sample user identifier, sample application program identifier, interest tag, and sample preference value;
calculating, based on the sample user usage data set with known tags of each sample application type, the quantile of each sample preference value of each sample application type;
screening the sample user usage data set with known tags against the classification threshold to select sample user identifiers;
determining the predicted interest tag of each selected user identifier according to the preset interest tag of the sample application type of the sample user usage data set to which the selected sample user identifier belongs; and
adjusting the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user data set.
8. A device for generating interest tags, characterized in that the device comprises:
a usage record acquisition module, configured to obtain a user usage record set of application programs within a specified time period and to calculate the preference value of each application program identifier with respect to each user identifier, wherein each user usage record in the user usage record set includes a user identifier and an application program identifier;
a classification threshold determination module, configured to determine the application type based on the application program identifier and to determine the classification threshold of each application type according to the preference values of the application program identifiers under the same application type with respect to the user identifiers, wherein each application type has a corresponding preset interest tag;
a user identifier screening module, configured to screen the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and
an interest tag generation module, configured to determine the interest tag of each selected user identifier according to the preset interest tag of the application type of the user usage data set to which the selected user identifier belongs.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910525807.XA CN110377821A (en) | 2019-06-18 | 2019-06-18 | Generate method, apparatus, computer equipment and the storage medium of interest tags |
PCT/CN2020/086369 WO2020253369A1 (en) | 2019-06-18 | 2020-04-23 | Method and device for generating interest tag, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910525807.XA CN110377821A (en) | 2019-06-18 | 2019-06-18 | Generate method, apparatus, computer equipment and the storage medium of interest tags |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110377821A true CN110377821A (en) | 2019-10-25 |
Family
ID=68249072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910525807.XA Pending CN110377821A (en) | 2019-06-18 | 2019-06-18 | Generate method, apparatus, computer equipment and the storage medium of interest tags |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110377821A (en) |
WO (1) | WO2020253369A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079023A (en) * | 2019-12-30 | 2020-04-28 | Oppo广东移动通信有限公司 | Target account identification method, device, terminal and storage medium |
WO2020253369A1 (en) * | 2019-06-18 | 2020-12-24 | 深圳壹账通智能科技有限公司 | Method and device for generating interest tag, computer equipment and storage medium |
WO2021159276A1 (en) * | 2020-02-11 | 2021-08-19 | Citrix Systems, Inc. | Systems and methods for expedited access to applications |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110066949A1 (en) * | 2009-09-15 | 2011-03-17 | International Business Machines Corporation | Visualization of real-time social data informatics |
CN104700289A (en) * | 2015-03-17 | 2015-06-10 | 中国联合网络通信集团有限公司 | Advertising method and device |
CN106503269A (en) * | 2016-12-08 | 2017-03-15 | 广州优视网络科技有限公司 | Method, device and server that application is recommended |
CN107908686A (en) * | 2017-10-31 | 2018-04-13 | 广东欧珀移动通信有限公司 | Information-pushing method, device, server and readable storage medium storing program for executing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9946798B2 (en) * | 2015-06-18 | 2018-04-17 | International Business Machines Corporation | Identification of target audience for content delivery in social networks by quantifying semantic relations and crowdsourcing |
US10636053B2 (en) * | 2017-05-31 | 2020-04-28 | Facebook, Inc. | Evaluating content publisher options against benchmark publisher |
RU2757546C2 (en) * | 2017-07-25 | 2021-10-18 | Общество С Ограниченной Ответственностью "Яндекс" | Method and system for creating personalized user parameter of interest for identifying personalized target content element |
CN110377821A (en) * | 2019-06-18 | 2019-10-25 | 深圳壹账通智能科技有限公司 | Generate method, apparatus, computer equipment and the storage medium of interest tags |
- 2019-06-18 CN CN201910525807.XA patent/CN110377821A/en active Pending
- 2020-04-23 WO PCT/CN2020/086369 patent/WO2020253369A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
杨晶; 成卫青; 郭常忠: "基于标准标签的用户兴趣模型研究" (Research on a user interest model based on standard tags), 计算机技术与发展 (Computer Technology and Development), no. 10 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020253369A1 (en) * | 2019-06-18 | 2020-12-24 | 深圳壹账通智能科技有限公司 | Method and device for generating interest tag, computer equipment and storage medium |
CN111079023A (en) * | 2019-12-30 | 2020-04-28 | Oppo广东移动通信有限公司 | Target account identification method, device, terminal and storage medium |
WO2021159276A1 (en) * | 2020-02-11 | 2021-08-19 | Citrix Systems, Inc. | Systems and methods for expedited access to applications |
US11455227B2 (en) | 2020-02-11 | 2022-09-27 | Citrix Systems, Inc. | Systems and methods for expedited access to applications |
US11748082B2 (en) | 2020-02-11 | 2023-09-05 | Citrix Systems, Inc. | Systems and methods for expedited access to applications |
Also Published As
Publication number | Publication date |
---|---|
WO2020253369A1 (en) | 2020-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||