WO2020253369A1 - Method, device, computer equipment and storage medium for generating interest tags - Google Patents


Info

Publication number
WO2020253369A1
WO2020253369A1 PCT/CN2020/086369 CN2020086369W WO2020253369A1 WO 2020253369 A1 WO2020253369 A1 WO 2020253369A1 CN 2020086369 W CN2020086369 W CN 2020086369W WO 2020253369 A1 WO2020253369 A1 WO 2020253369A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
application
application type
sample
preference value
Prior art date
Application number
PCT/CN2020/086369
Other languages
English (en)
French (fr)
Inventor
苏显政
蔡健
郭凌峰
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2020253369A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • This application relates to the field of information processing technology, in particular to a method, device, computer equipment and storage medium for generating interest tags.
  • differentiated services such as personalized recommendation and diversified marketing have been widely used in people's lives, and these differentiated services are inseparable from user portraits.
  • the core job of user portrait is to generate labels for users.
  • user behavior can be analyzed and predicted from a macro perspective, which helps to improve the accuracy of the company's marketing behavior for specific users.
  • a method for generating interest tags comprising:
  • the user usage record in the user usage record set includes the user ID and the application ID;
  • the interest tag corresponding to the filtered user identification is determined.
  • a device for generating interest tags comprising:
  • the usage record acquisition module is used to acquire the user usage record set of the application within a specified time period, and calculate the preference value corresponding to each application ID corresponding to the user ID; the user usage record in the user usage record set includes the user ID and the application ID;
  • the classification threshold determining module is configured to determine the application type based on the application identifier, and determine the classification threshold of each application type according to the preference values corresponding to the user identifiers for the application identifiers under the same application type; each application type has a corresponding preset interest tag;
  • the user identification screening module is configured to perform conditional screening, according to the classification threshold, on the user usage data set of each application type determined based on the user usage record set, so as to filter out user identifications;
  • the interest tag generation module is used to determine the interest tag corresponding to the screened user identification according to the preset interest tag corresponding to the application type of the user usage data set where the screened user identification is located.
  • a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method for generating interest tags when the computer program is executed.
  • a computer-readable storage medium has a computer program stored thereon, and when the computer program is executed by a processor, the steps of the above method for generating interest tags are realized.
  • the above method, device, computer equipment, and storage medium for generating interest tags determine the preference value of each application identifier corresponding to the user identifier based on the user usage record set of the application acquired within a specified time period, so as to better characterize the user's degree of preference for each application. Furthermore, by analyzing the overall distribution of the preference values corresponding to the user IDs under the same application type, the classification threshold of each application type is determined; because the overall distribution of the preference values under the same application type is fully considered, this provides a more accurate basis for the subsequent screening of user identifications. Furthermore, the user usage data set of each application type is filtered according to the corresponding classification threshold, so as to filter out qualified user IDs, which improves the accuracy of generating interest tags for each behavior type.
  • FIG. 1 is an application scenario diagram of a method for generating interest tags in an embodiment;
  • FIG. 2 is a schematic flowchart of a method for generating interest tags in an embodiment;
  • FIG. 3 is a structural block diagram of a device for generating interest tags in an embodiment;
  • FIG. 4 is an internal structure diagram of a computer device in an embodiment.
  • the method for generating interest tags provided in this application can be applied to the application environment as shown in FIG. 1.
  • the terminal 102 communicates with the server 104 through the network.
  • the server 104 obtains the user usage record set of the application within a specified time period, and calculates the preference value corresponding to each application ID corresponding to the user ID; the generation of the user usage record set can be triggered through the terminal 102. According to the preference values corresponding to the user identifiers for the application identifiers under the same application type, the server 104 determines the classification threshold of each application type respectively.
  • the server 104 conditionally filters the user usage data set of the corresponding application type according to the obtained classification threshold to filter out user identifications; the server 104 then uses the application type corresponding to each filtered user identification as the interest tag of that user identification.
  • the terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 104 may be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for generating interest tags is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step S202: Obtain the user usage record set of the application within a specified time period, and calculate the preference value corresponding to each application ID corresponding to the user ID; the user usage record in the user usage record set includes the user ID and the application ID.
  • the user usage record set includes each user usage record, and each user usage record includes a user ID, an application program ID, and a usage weight.
  • User usage records contain a wealth of information, such as the similarity between users, the similarity between applications, and the degree of user preference for each application.
  • the user identifier is a unique identifier that distinguishes each user, and may be a user ID (Identification).
  • the application identifier is a unique identifier that distinguishes each application.
  • the preference value represents the user's degree of preference for using the application corresponding to the application identifier; the preference value is related to the number of users corresponding to the application identifier, the total number of users corresponding to the user usage record set, and the usage weight.
  • the user triggers the terminal to generate a user usage record set for each application; the terminal transmits the generated user usage record set to the server through the network, or the user usage record set can be stored directly in the terminal.
  • the server may obtain the user usage record set for a specified time period from each terminal, or may obtain it from the server itself. After the server obtains the user usage record set of the application within the specified time period, it calculates the preference value of each application ID corresponding to the user ID according to the user usage record set.
  • the server obtains, based on each user usage record in the user usage record set, the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; it also obtains the usage weight corresponding to each user identifier and application identifier, and then calculates the preference value of the user ID corresponding to each application ID according to the proportion of the number of users in the total number of users and the usage weight.
  • Step S204: Determine the application type based on the application identifier, and determine the classification threshold of each application type according to the preference values corresponding to the user identifiers for the application identifiers under the same application type; each application type has a corresponding preset interest tag.
  • the application type refers to the category that distinguishes each application, such as the video type.
  • the classification threshold refers to the classification judgment condition of the preference value in the application type to which it belongs. According to the classification threshold, it can be determined whether the user identifier corresponding to the preference value belongs to the application type to which the preference value belongs.
  • the classification threshold characterizes the proportion of each user's usage behavior of the application in the overall usage behavior of the application type under the same application type.
  • the server calculates the preference value corresponding to each application identifier corresponding to the user identifier based on the user usage record set, and determines the corresponding application type according to each application identifier; each application type has a corresponding preset interest tag, which can be identical to the application type or can be an identifier that corresponds to the application type.
  • the server determines the classification threshold of each application type according to the calculated preference value. The classification threshold can be used to determine whether the user identifier corresponding to the preference value belongs to the application type to which the preference value belongs.
  • Step S206: According to the classification threshold, perform conditional filtering on the user usage data set of each application type determined based on the user usage record set, so as to filter out user identifications.
  • the user usage record set includes a user usage data set corresponding to each application type, and each user usage data set includes user IDs, application IDs, and preference values that correspond to each other.
  • for each user usage data set, the server performs conditional filtering according to the classification threshold corresponding to the application type of that data set, so as to filter out the qualified user identifications.
  • Step S208: Determine the interest tag corresponding to the filtered user identification according to the application type corresponding to the user usage data set where the filtered user identification is located.
  • the interest tag refers to a tag that distinguishes a user's tendency toward a certain type of behavior; for example, if a user often uses video applications, the corresponding interest tag of the user is video.
  • the server obtains from the database the application type corresponding to the user usage data set where the user ID is located; that is, the interest tag of the user ID is the corresponding application type.
  • the preference value of each application identifier corresponding to the user identifier is determined based on the user usage record set of the application acquired within a specified time period, which better characterizes the user's degree of preference for each application. Furthermore, by analyzing the overall distribution of the preference values corresponding to the user IDs under the same application type, the classification threshold of each application type is determined; because the overall distribution of the preference values under the same application type is fully considered, this provides a more accurate basis for the subsequent screening of user identifications. Furthermore, the user usage data set of each application type is filtered according to the corresponding classification threshold, so as to filter out qualified user IDs, which improves the accuracy of generating interest tags for each behavior type.
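  • as an illustration only, steps S206 and S208 can be sketched as follows. The function name `generate_interest_tags`, the dictionary layout, and the tuple layout `(user_id, app_id, preference)` are hypothetical and do not come from the application; the sketch also assumes the filtering condition is "preference value greater than or equal to the classification threshold" of the application type.

```python
def generate_interest_tags(usage_data_by_type, thresholds, preset_tags):
    """Assign preset interest tags to users that pass the per-type threshold.

    usage_data_by_type: {app_type: [(user_id, app_id, preference), ...]}
    thresholds:         {app_type: classification_threshold}
    preset_tags:        {app_type: preset_interest_tag}
    All three layouts are assumptions made for this sketch.
    """
    tags = {}
    for app_type, rows in usage_data_by_type.items():
        threshold = thresholds[app_type]
        for user_id, _app_id, preference in rows:
            # Assumed condition: keep the user when the preference value
            # reaches the classification threshold of this application type.
            if preference >= threshold:
                tags.setdefault(user_id, set()).add(preset_tags[app_type])
    return tags
```

  • a user can receive several tags in this sketch, one per application type whose threshold the user reaches; whether the application intends one tag or several per user is not stated.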
  • the user usage records in the user usage record set also include usage weights; calculating the preference value corresponding to each application identifier corresponding to the user identifier according to the user usage record set of the application within a specified time period includes the following steps: obtain the number of users corresponding to each application ID and the total number of users corresponding to the user usage record set; obtain the usage weight corresponding to the user ID and application ID; calculate the preference value corresponding to each application ID corresponding to the user ID according to the proportion of the number of users in the total number of users and the usage weight.
  • the usage weight characterizes the proportion of the usage degree of a specific application among various applications used by the user.
  • the usage weight can be determined according to the installation information, usage times, usage duration, and power consumption of the application.
  • the server obtains the number of users corresponding to each application ID and the total number of users corresponding to the user usage record set based on the obtained user usage record set; and obtains the corresponding usage weight from the database according to the user ID and application ID.
  • the server calculates the preference value corresponding to each application identifier corresponding to the user identifier according to the acquired total number of users, number of users, and usage weight. That is, the preference value is calculated from the usage weight corresponding to the application identifier and the proportion between the total number of users and the number of users corresponding to the application identifier.
  • the preference value is positively correlated with the usage weight corresponding to the application program identifier, and is positively correlated with the proportion of the number of users corresponding to the application program.
  • the proportion of the number of users increases as the total number of users corresponding to the user usage record set increases, and decreases as the number of users corresponding to the application identifier increases.
  • the preference value may be the product of the proportion of the number of users corresponding to the application identifier and the use weight corresponding to the application identifier; the proportion of the number of users may be the logarithmic value of the ratio of the total number of users to the number of users corresponding to the application identifier.
  • to avoid the case where the denominator of the variable parameter x in the function is 0, 1 can also be added to the denominator of x.
  • the preference value corresponding to each application identifier corresponding to the user identifier is calculated based on the number of users corresponding to each application identifier, the total number of users corresponding to the user usage record set, and the usage weight corresponding to each application identifier corresponding to the user identifier.
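  • a minimal sketch of the preference value calculation described above, assuming the optional form in which the preference value is the usage weight multiplied by the logarithm of the ratio of the total number of users to the number of users of the application, with 1 added to the denominator. The function name `preference_values` and the tuple layout `(user_id, app_id, usage_weight)` are hypothetical, and the natural logarithm is an assumption because the text does not name a base.

```python
import math
from collections import defaultdict


def preference_values(records):
    """records: iterable of (user_id, app_id, usage_weight) tuples (assumed layout)."""
    records = list(records)
    users_per_app = defaultdict(set)
    all_users = set()
    for user_id, app_id, _weight in records:
        users_per_app[app_id].add(user_id)
        all_users.add(user_id)
    total_users = len(all_users)

    prefs = {}
    for user_id, app_id, weight in records:
        app_users = len(users_per_app[app_id])
        # Proportion term: log of total users over users of this application,
        # with 1 added to the denominator so it can never be zero.
        proportion = math.log(total_users / (app_users + 1))
        prefs[(user_id, app_id)] = weight * proportion
    return prefs
```

  • with the added 1, an application used by every user yields a slightly negative proportion term; this is a property of the sketch, not a statement from the application.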
  • determining the classification threshold of each application type according to the preference values corresponding to the user identifiers for the application identifiers under the same application type includes the following steps: based on those preference values, sort the preference values of the same application type in ascending order to obtain the sorting result of the preference values; according to the sorting result, calculate the quantile of each preference value corresponding to the same application type; determine the classification threshold of each application type according to the quantiles.
  • the value range of the quantile is greater than 0 and less than or equal to 1.
  • based on the obtained user usage record set and the calculated preference values, the server sorts the preference values of each application type in ascending order to obtain the sorting result of the preference values of each application type. According to the obtained sorting results, the server calculates the quantile of each preference value corresponding to the same application type, and determines the classification threshold corresponding to each application type according to the quantiles; that is, the value of the classification threshold can lie between 0 and 1, and can be 1.
  • the ranking result corresponding to each application type is obtained; further, the quantile of each preference value corresponding to each application type is calculated according to the ranking result, and the classification threshold of each behavior type is determined according to the calculated quantiles. The classification threshold is determined from the overall distribution of the quantiles of each application type, so the overall distribution is fully considered, which provides a basis for the subsequent generation of interest tags.
  • calculating the quantile of each preference value corresponding to each application type according to the sorting result of the preference value includes the following steps: according to the sorting result of the preference value, determine the occurrence probability of each preference value under each application type in the corresponding ranking result; determine the cumulative probability of each preference value under each application type according to the occurrence probability, and thereby obtain the quantile of each preference value under each application type.
  • the occurrence probability refers to the probability of each preference value in the user usage data set corresponding to a certain behavior type.
  • Cumulative probability refers to adding up the occurrence probabilities of all preference values that do not exceed the preference value in the user usage data set corresponding to a certain behavior type, and the result is the cumulative probability.
  • the server separately calculates the occurrence probability of each preference value corresponding to each application type in the sorting result according to the obtained sorting result of the preference value corresponding to each application type. Based on the calculated occurrence probability, the server determines the cumulative probability of each preference value corresponding to each application type according to the occurrence probability, that is, the cumulative probability is the quantile of the corresponding preference value.
  • the occurrence probability of each preference value under each application type in the corresponding ranking result is determined based on the ranking result of the preference value, and the cumulative probability of each preference value under each application type is further obtained according to the occurrence probability, so as to get the quantile of each preference value under each application type.
  • the cumulative probability is used to calculate the quantile, which reflects the overall proportion of each application type in the overall situation, fully considers the relationship between the data, and provides a more accurate screening basis for subsequent screening of user identification.
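  • the cumulative-probability approach above can be sketched as follows, assuming the occurrence probability of a preference value is its relative frequency within one application type. The function name `quantiles_by_cumulative_probability` is hypothetical.

```python
from collections import Counter


def quantiles_by_cumulative_probability(pref_values):
    """Map each distinct preference value of one application type to its quantile.

    The quantile is the cumulative probability: the sum of the occurrence
    probabilities of all preference values that do not exceed the given value.
    """
    counts = Counter(pref_values)
    total = len(pref_values)
    running = 0
    quantiles = {}
    for value in sorted(counts):  # ascending order, as in the sorting result
        running += counts[value]  # accumulate counts, divide once per value
        quantiles[value] = running / total
    return quantiles
```

  • for example, the preference values 1, 1, 2, 4 give occurrence probabilities 0.5, 0.25 and 0.25, so the quantiles are 0.5, 0.75 and 1.0 respectively.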
  • calculating the quantile of each preference value corresponding to each application type according to the ranking result of the preference value includes the following steps: obtaining the ranking of each preference value under each application type The ranking position in the result and the number of ranking users corresponding to the application type to which each application identifier belongs; the ranking position of each preference value under each application type in the ranking result is divided by the number of ranking users to obtain each application The quantile of each preference value corresponding to each type.
  • the sort position refers to the position of each element in a data set after the elements are sorted according to a certain logic.
  • the number of sorted users refers to the total number of all corresponding elements in a data set.
  • the server respectively obtains the ranking position of each preference value corresponding to each application type in the preference value ranking result, and the number of ranking users corresponding to each application type.
  • after the server obtains the corresponding data, it divides the ranking position of each preference value corresponding to each application type by the number of ranking users corresponding to that application type; the calculated result is the quantile of the preference value corresponding to each application type.
  • the data set includes preference values corresponding to each application identifier corresponding to the user identifier; each preference value is sorted in ascending order to obtain the preference value sorting result. If the preference value A in the data set is ranked 5 in the corresponding ranking result, and the number of ranked users of the preference value A in the application type is 10, then the quantile of the preference value is 5/10 × 100%, that is, the quantile is 50%.
  • the ranking result of the preference value is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; when the preference value is 6, the corresponding quantile is 70%.
  • the quantile of each preference value corresponding to each application type is determined based on the ranking position of each preference value in the ranking result and the number of ranking users corresponding to each application type. By determining the quantile from the ranking position and the number of ranking users, the amount of calculation can be further reduced at the computer level, thereby increasing the calculation speed and the rate of generating interest tags.
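  • the rank-based calculation above can be sketched as follows, assuming ranking positions are counted from 1 in ascending order. The function name `quantiles_by_rank` is hypothetical.

```python
def quantiles_by_rank(pref_values):
    """Return (preference_value, quantile) pairs for one application type.

    The quantile is the 1-based ranking position in ascending order divided
    by the number of ranked entries.
    """
    ordered = sorted(pref_values)
    count = len(ordered)
    return [(value, position / count)
            for position, value in enumerate(ordered, start=1)]
```

  • applied to the ranking result 0, 1, 2, ..., 9 from the example above, the preference value 6 sits at position 7 of 10, which gives the quantile 0.7 (70%).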
  • determining the classification threshold of each application type according to the quantile includes the following steps: according to the quantiles, for each application type, filter out the quantiles greater than or equal to the corresponding first preset threshold; for each application type, calculate the difference between adjacent quantiles based on the filtered quantiles; obtain the quantile corresponding to the maximum difference calculated for each application type, and thereby get the classification threshold of each application type.
  • the preset threshold is a threshold value for judging the quantile set in advance, and the threshold can be stored in a database; the preset threshold is a threshold value for the quantile corresponding to each application type.
  • the difference is the calculation result obtained by subtracting two data; it can be the result obtained by subtracting two adjacent quantiles.
  • the server obtains the preset threshold value of the corresponding application type from the database for the quantile corresponding to each application type. According to the preset threshold, the quantiles greater than or equal to the preset threshold are filtered out. Corresponding to each application type, the server calculates the difference between two adjacent quantiles according to the filtered quantiles. The server obtains the two quantiles corresponding to the largest difference according to the calculated difference value for each application type, and uses the quantile with the lower rank as the classification threshold corresponding to the application type.
  • the classification threshold of the preference value corresponding to each application type is determined based on the quantile, and the quantile with obvious distribution among the application types is selected as the classification threshold of the application type. Furthermore, making full use of the overall distribution characteristics of data of each application type provides a guarantee for the accuracy of interest tags.
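  • a minimal sketch of the largest-gap selection described above. The function name `classification_threshold` is hypothetical. The text says the quantile "with the lower rank" of the two adjacent quantiles is used, which can be read either way; this sketch assumes the smaller of the two, and that choice is an assumption rather than the stated method.

```python
def classification_threshold(quantiles, first_preset_threshold):
    """Pick a threshold for one application type from its quantiles.

    Keeps the distinct quantiles at or above the preset threshold, finds the
    adjacent pair with the largest difference, and returns the smaller member
    of that pair (an assumed reading of "lower rank").
    Returns None when no quantile reaches the preset threshold.
    """
    kept = sorted(q for q in set(quantiles) if q >= first_preset_threshold)
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    best_gap = -1.0
    best_low = kept[0]
    for low, high in zip(kept, kept[1:]):
        gap = high - low
        if gap > best_gap:  # ties keep the first (lowest) pair
            best_gap = gap
            best_low = low
    return best_low
```

  • with rank-based quantiles all adjacent differences are equal, so the gaps only differ when quantiles are computed from cumulative probabilities over repeated preference values.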
  • the method includes the following steps: obtain a user usage record sample set with known interest tags; adjust the classification threshold according to the user usage record sample set; perform conditional screening on the user usage data set according to the adjusted classification threshold to filter out user identifications.
  • the user usage record sample set includes various user usage record samples;
  • the user usage record set includes user usage data sets corresponding to each application type;
  • the user usage data sets include user IDs, application IDs and preference values corresponding to each other.
  • the server obtains a user usage record sample set with known interest tags from a database or a terminal, and adjusts the classification threshold corresponding to each application type according to the obtained sample set. Further, based on the user usage data set, the server conditionally filters each preference value corresponding to each application type according to the adjusted classification threshold, so as to filter out user IDs that meet the above-mentioned preference value filtering conditions.
  • the classification threshold corresponding to each application type is adjusted based on the user usage record sample set with known interest tags, so as to obtain the adjusted classification threshold.
  • the classification threshold is tested by using the user usage record sample set to improve the accuracy of interest tags.
  • the user usage record sample in the user usage record sample set includes the sample user ID, interest tag, sample application type, sample application ID and sample usage weight; adjusting the classification threshold according to the user usage record sample set includes the following steps: according to the user usage record sample set, determine the sample user usage data set of each sample application type by the known interest tags, where the sample user usage data set includes the corresponding sample user IDs, sample application IDs, interest tags and sample preference values; based on the sample user usage data set with known tags of each sample application type, calculate the quantile of each sample preference value of each sample application type; perform conditional screening on the sample user usage data set with known tags according to the classification threshold to filter out sample user IDs; determine the predicted interest tag corresponding to each selected sample user ID according to the application type corresponding to the sample user usage data set where the sample user ID is located; calculate the recall rate of each sample application type from the predicted interest tags and the known interest tags of the sample user usage data set, and adjust the classification threshold according to the recall rate.
  • the user usage record sample set includes various user usage record samples, and each user usage record sample includes sample user identification, interest tag, sample application type, sample application identification, and sample usage weight.
  • the sample user ID is a unique ID that distinguishes each sample user.
  • the sample application type is a type corresponding to each application of the sample user, the sample application type and the application type are corresponding, and the application type includes all sample application types.
  • the sample application ID is a unique ID that distinguishes each application.
  • the sample preference value represents the degree of preference of the sample user corresponding to the sample user identifier to use the sample application corresponding to the sample application identifier.
  • the user usage record sample set includes sample user usage data sets corresponding to each sample application type; the sample user usage data set includes corresponding sample user IDs, sample application IDs, interest tags, and sample preference values.
  • the interest tag refers to a tag that distinguishes a user's tendency toward a certain application type.
  • for example, if a user often uses video applications, the corresponding interest tag of the user may be video.
  • the predicted interest tag is the predicted interest tag generated by the interest tag generation model.
  • the recall rate is, for each sample application type, the ratio of the number of users whose predicted interest label is consistent with the known interest label to the total number of users of that sample application type. The closer the recall rate is to 1, the higher the consistency between the predicted interest label and the known interest label for the corresponding sample application type, and the more appropriate the selection of the classification threshold for that sample application type.
  • the server obtains a user usage record sample set with known interest tags from a database or terminal, and classifies the obtained sample set according to the known interest tags to obtain the sample user usage data set corresponding to each sample application type. Based on the sample user usage data set corresponding to each sample application type obtained by classification, the server calculates the quantile of each sample preference value corresponding to each sample application type.
  • the server searches the database for the corresponding classification threshold according to each sample application type, and filters the sample user usage data set according to the found classification threshold.
  • the selected sample user identification is obtained.
  • the filtering condition is: corresponding to each sample user's usage data set, the sample preference value is greater than or equal to the corresponding classification threshold.
  • the server searches the database for the sample application type corresponding to the sample user usage data set where the sample user identifier is located according to the selected sample user identifier, that is, the predicted interest tag of the sample user identifier may correspond to the found sample application type. Based on the predicted interest label of the sample user’s use of the data set and the known corresponding interest label, corresponding to each type of sample application, the server determines whether the predicted interest label identified by each sample user is consistent with the known interest label, and uses the label Record the judgment result and store it in the server. When the judgment result is consistent, it can be marked as 1; otherwise, it can be marked as 0. For example, in a sample application type, the known interest tag identified by a sample user is a movie. If the predicted interest tag is also a movie, it is recorded as 1; if the predicted interest tag identified by the sample user is eating, then Recorded as 0.
  • the server calculates the recall rate of each sample application type and then adjusts the corresponding classification threshold according to that recall rate. If the recall rate does not meet the adjustment threshold, the classification threshold is left unchanged; if it does, the classification threshold is adjusted. The predicted tags of the sample user usage data set are then determined again with the adjusted classification threshold and the recall rate of each sample application type is recalculated. This repeats until the recall rate of the user usage record sample set no longer meets the adjustment threshold, at which point the adjustment stops. The adjustment threshold can be set as: the recall rate is lower than 95%.
  • the classification threshold is adjusted based on the user usage record sample set with known interest tags: it is adjusted according to the calculated recall rate of each application type until the recall rate of each application type no longer meets the adjustment threshold.
  • the classification threshold is tested with the user usage record sample set and the accuracy of the interest tags is verified through the recall rate, which further improves the accuracy of the interest tags.
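The recall-driven adjustment described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the text does not say in which direction or by how much the threshold is moved, so the sketch assumes it is lowered by a fixed step, and the names `recall`, `adjust_threshold`, `target_recall`, and `step` are hypothetical. Each sample is a pair of the value compared with the threshold and a flag saying whether the user's known interest tag matches the application type.

```python
def recall(samples, threshold):
    """Fraction of users with the known tag that the threshold selects.

    samples: list of (score, has_known_tag) pairs for one sample
    application type (hypothetical layout).
    """
    positives = [score for score, has_tag in samples if has_tag]
    if not positives:
        return 1.0  # nothing to recall, so no adjustment is triggered
    hits = sum(1 for score in positives if score >= threshold)
    return hits / len(positives)


def adjust_threshold(samples, threshold, target_recall=0.95, step=0.05):
    """Lower the threshold (assumed direction) while recall is below target."""
    while threshold > 0 and recall(samples, threshold) < target_recall:
        # round() keeps repeated subtraction from drifting (0.85 -> 0.8 -> ...)
        threshold = max(0.0, round(threshold - step, 10))
    return threshold


samples = [(0.9, True), (0.8, True), (0.6, True), (0.4, False)]
adjusted = adjust_threshold(samples, 0.85)
```

Lowering the threshold can only select more users, so recall never decreases from one iteration to the next and the loop terminates at the latest when the threshold reaches 0.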
  • although the steps in the flowchart of FIG. 2 are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same moment but can be executed at different times, and they are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
  • an apparatus 300 for generating interest tags is provided, including: a usage record acquisition module 302, a classification threshold determination module 304, a user identification screening module 306, and an interest tag generation module 308, wherein:
  • the usage record acquisition module 302 is used to obtain the user usage record set of applications within a specified time period and to calculate the preference value of each application identifier with respect to each user identifier; each user usage record in the user usage record set includes a user identifier and an application identifier.
  • the classification threshold determination module 304 is used to determine the application type based on the application identifier, and to determine the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type; each application type has a corresponding preset interest tag.
  • the user identification screening module 306 is used to filter the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers.
  • the interest tag generation module 308 is used to determine the interest tag of each selected user identifier according to the application type of the user usage data set that contains it.
  • the aforementioned usage record acquisition module includes: a data acquisition module and a preference value calculation module.
  • the data acquisition module is used to obtain the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set, and to obtain the usage weight corresponding to the user identifier and the application identifier; the preference value calculation module is used to calculate the preference value of each application identifier with respect to the user identifier according to the usage weight and the ratio between the total number of users and the number of users of the application.
  • the above-mentioned classification threshold determination module includes: a sorting module, a quantile acquisition module, and a classification threshold calculation module.
  • the sorting module is used to sort, in ascending order, the preference values under each application type, based on the preference values of the application identifiers of that type with respect to the user identifiers, to obtain the sorting result of the preference values; the quantile acquisition module is used to calculate the quantile of each preference value under the same application type according to the sorting result; the classification threshold calculation module is used to determine the classification threshold of each application type according to the quantiles.
  • the aforementioned quantile calculation module includes: a probability calculation module and a cumulative probability calculation module.
  • the probability calculation module is used to determine, according to the sorting result of the preference values, the occurrence probability of each preference value under each application type in the corresponding sorting result; the cumulative probability calculation module is used to determine, according to the occurrence probabilities, the cumulative probability of each preference value under each application type, which is the quantile of that preference value.
  • the above-mentioned quantile acquisition module includes: a ranking data acquisition module and a quantile calculation module.
  • the ranking data acquisition module is used to obtain the ranking position of each preference value under each application type in its sorting result and the number of ranked users corresponding to the application type to which each application identifier belongs; the quantile calculation module is used to divide the ranking position of each preference value by the number of ranked users to obtain the quantile of each preference value under each application type.
  • the aforementioned classification threshold calculation module includes: a first screening module, a difference calculation module, and a second screening module.
  • the first screening module is used to select, for each application type, the quantiles that are greater than or equal to the corresponding preset threshold; the difference calculation module is used to calculate, for each application type, the differences between adjacent selected quantiles; the second screening module is used to take, for each application type, the quantile corresponding to the largest difference as the classification threshold of that application type.
  • the aforementioned user identification screening module includes: a usage record sample acquisition module, a classification threshold adjustment module, and a condition screening module.
  • the usage record sample acquisition module is used to obtain a user usage record sample set with known interest tags; the classification threshold adjustment module is used to adjust the classification threshold according to the user usage record sample set; the condition screening module is used to filter the user usage data set against the adjusted classification threshold to select user identifiers.
  • the aforementioned classification threshold adjustment module includes: a sample user usage record set acquisition module, a sample user data set determination module, a sample quantile calculation module, a sample user identification screening module, a predicted interest tag generation module, and a recall rate calculation module.
  • the sample user usage record set acquisition module is used to adjust the classification threshold according to the user usage record sample set.
  • the sample user data set determination module is used to determine, from the user usage record sample set and according to the known interest tags, the sample user usage data set of each sample application type; each sample user usage data set includes the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values.
  • the sample quantile calculation module is used to calculate the quantile of each sample preference value of each sample application type based on the sample user usage data set with known tags.
  • the sample user identification screening module is used to filter the sample user usage data set with known tags against the classification threshold to select sample user identifiers.
  • the predicted interest tag generation module is used to determine the predicted interest tag of each selected sample user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set that contains it; the recall rate calculation module is used to calculate the recall rate of each sample application type from the predicted interest tags and the known interest tags of the sample user usage data set, and to adjust the classification threshold accordingly.
  • the preference value of each application identifier corresponding to the user identifier is determined based on the user usage record set of the application acquired within a specified time period, which better characterizes the user's preference for using each application. Furthermore, by analyzing the overall distribution of the preference values corresponding to the user IDs under the same application type, the classification threshold of each application type is determined, and the overall preference value under the same application type is fully considered. The distribution situation provides a more accurate screening basis for subsequent screening of user identification. Furthermore, the user usage data set of each application type is filtered according to the corresponding classification threshold, so as to filter out qualified user IDs, which improves the accuracy of generating interest tags for each behavior type.
  • the various modules in the above apparatus for generating interest tags may be implemented in whole or in part by software, hardware, and combinations thereof.
  • the foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 4.
  • the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store user usage record sets, user usage data sets, and classification threshold data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a method of generating interest tags.
  • the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • a computer device is provided, including a memory and a processor; the memory stores a computer program, and the processor implements the following steps when executing the computer program: obtaining a set of user usage records of applications within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier, where each user usage record in the set includes a user identifier and an application identifier; determining the application type based on the application identifier, and determining the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type, where each application type has a corresponding preset interest tag; filtering the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and determining the interest tag of each selected user identifier according to the preset interest tag corresponding to the application type of the user usage data set that contains it.
  • the processor further implements the following steps when executing the computer program: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating the preference value of each application identifier with respect to the user identifier according to the usage weight and the ratio between the total number of users and the number of users of the application.
  • the processor further implements the following steps when executing the computer program: sorting, in ascending order, the preference values under each application type, based on the preference values of the application identifiers of that type with respect to the user identifiers, to obtain the sorting result of the preference values; calculating the quantile of each preference value under the same application type according to the sorting result; and determining the classification threshold of each application type according to the quantiles.
  • the processor further implements the following steps when executing the computer program: determining, according to the sorting result of the preference values, the occurrence probability of each preference value under each application type in the corresponding sorting result; and determining, according to the occurrence probabilities, the cumulative probability of each preference value under each application type, which is the quantile of that preference value.
  • the processor further implements the following steps when executing the computer program: obtaining the ranking position of each preference value under each application type in its sorting result and the number of ranked users corresponding to the application type to which each application identifier belongs; and dividing the ranking position of each preference value by the number of ranked users to obtain the quantile of each preference value under each application type.
  • the processor further implements the following steps when executing the computer program: for each application type, selecting the quantiles that are greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent selected quantiles; and taking, for each application type, the quantile corresponding to the largest difference as the classification threshold of that application type.
  • the processor further implements the following steps when executing the computer program: obtaining a user usage record sample set with known interest tags; adjusting the classification threshold according to the user usage record sample set; and filtering the user usage data set against the adjusted classification threshold to select user identifiers.
  • the processor further implements the following steps when executing the computer program: determining, from the user usage record sample set and according to the known interest tags, the sample user usage data set of each sample application type, where each sample user usage data set includes the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating the quantile of each sample preference value of each sample application type based on the sample user usage data set with known tags; filtering the sample user usage data set with known tags against the classification threshold to select sample user identifiers; determining the predicted interest tag of each selected sample user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set that contains it; and calculating the recall rate of each sample application type from the predicted interest tags and the known interest tags of the sample user usage data set, and adjusting the classification threshold accordingly.
  • the preference value of each application identifier corresponding to the user identifier is determined based on the user usage record set of the application acquired within a specified time period, which better characterizes the user's preference for using each application. Furthermore, by analyzing the overall distribution of the preference values corresponding to the user IDs under the same application type, the classification threshold of each application type is determined, and the overall preference value under the same application type is fully considered. The distribution situation provides a more accurate screening basis for subsequent screening of user identification. Furthermore, the user usage data set of each application type is filtered according to the corresponding classification threshold, so as to filter out qualified user IDs, which improves the accuracy of generating interest tags for each behavior type.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented: obtaining a set of user usage records of applications within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier, where each user usage record in the set includes a user identifier and an application identifier; determining the application type based on the application identifier, and determining the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type, where each application type has a corresponding preset interest tag; filtering the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and determining the interest tag of each selected user identifier according to the preset interest tag corresponding to the application type of the user usage data set that contains it.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating the preference value of each application identifier with respect to the user identifier according to the usage weight and the ratio between the total number of users and the number of users of the application.
  • when the computer program is executed by the processor, the following steps are further implemented: sorting, in ascending order, the preference values under each application type, based on the preference values of the application identifiers of that type with respect to the user identifiers, to obtain the sorting result of the preference values; calculating the quantile of each preference value under the same application type according to the sorting result; and determining the classification threshold of each application type according to the quantiles.
  • when the computer program is executed by the processor, the following steps are further implemented: determining, according to the sorting result of the preference values, the occurrence probability of each preference value under each application type in the corresponding sorting result; and determining, according to the occurrence probabilities, the cumulative probability of each preference value under each application type, which is the quantile of that preference value.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining the ranking position of each preference value under each application type in its sorting result and the number of ranked users corresponding to the application type to which each application identifier belongs; and dividing the ranking position of each preference value by the number of ranked users to obtain the quantile of each preference value under each application type.
  • when the computer program is executed by the processor, the following steps are further implemented: for each application type, selecting the quantiles that are greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent selected quantiles; and taking, for each application type, the quantile corresponding to the largest difference as the classification threshold of that application type.
  • when the computer program is executed by the processor, the following steps are further implemented: obtaining a user usage record sample set with known interest tags; adjusting the classification threshold according to the user usage record sample set; and filtering the user usage data set against the adjusted classification threshold to select user identifiers.
  • when the computer program is executed by the processor, the following steps are further implemented: determining, from the user usage record sample set and according to the known interest tags, the sample user usage data set of each sample application type, where each sample user usage data set includes the corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating the quantile of each sample preference value of each sample application type based on the sample user usage data set with known tags; filtering the sample user usage data set with known tags against the classification threshold to select sample user identifiers; determining the predicted interest tag of each selected sample user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set that contains it; and calculating the recall rate of each sample application type from the predicted interest tags and the known interest tags of the sample user usage data set, and adjusting the classification threshold accordingly.
  • the preference value of each application identifier corresponding to the user identifier is determined based on the user usage record set of the application acquired within a specified time period, which better characterizes the user's preference for using each application. Furthermore, by analyzing the overall distribution of the preference values corresponding to the user IDs under the same application type, the classification threshold of each application type is determined, and the overall preference value under the same application type is fully considered. The distribution situation provides a more accurate screening basis for subsequent screening of user identification. Furthermore, the user usage data set of each application type is filtered according to the corresponding classification threshold, so as to filter out qualified user IDs, which improves the accuracy of generating interest tags for each behavior type.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

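This application relates to the technical field of user profiling and provides a method, apparatus, computer device, and storage medium for generating interest tags. The method includes: obtaining a set of user usage records of applications within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier (S202); determining the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type (S204); filtering the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers (S206); and determining the interest tag of each selected user identifier according to the application type of the user usage data set that contains it (S208). The method can improve the accuracy of the interest tags generated for each behavior type.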

Description

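Method, apparatus, computer device, and storage medium for generating interest tags
This application claims priority to Chinese patent application No. 201910525807.X, entitled "Method, apparatus, computer device, and storage medium for generating interest tags", filed with the Chinese Patent Office on June 18, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information processing technology, and in particular to a method, apparatus, computer device, and storage medium for generating interest tags.
Background
With the development and application of the Internet, differentiated services such as personalized recommendation and diversified marketing have come into wide use in people's lives, and these services depend on user profiles. The core task of user profiling is to generate tags for users. Tagging users makes it possible to analyze and predict user behavior from a macro perspective, which helps improve the precision of a company's marketing toward specific users.
The inventors found that most tag generation methods for user profiles rely on keyword extraction to generate user tags, and that the tags generated this way have low accuracy.
Summary
In view of this, it is necessary to provide a method, apparatus, computer device, and storage medium for generating interest tags that address the above technical problem.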
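A method for generating interest tags, the method including:
obtaining a set of user usage records of applications within a specified time period, and calculating the preference value of each application identifier with respect to each user identifier, where each user usage record in the set includes a user identifier and an application identifier;
determining the application type based on the application identifier, and determining the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type, where each application type has a corresponding preset interest tag;
filtering the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and
determining the interest tag of each selected user identifier according to the preset interest tag corresponding to the application type of the user usage data set that contains it.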
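An apparatus for generating interest tags, the apparatus including:
a usage record acquisition module, used to obtain a set of user usage records of applications within a specified time period and to calculate the preference value of each application identifier with respect to each user identifier, where each user usage record in the set includes a user identifier and an application identifier;
a classification threshold determination module, used to determine the application type based on the application identifier and to determine the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type, where each application type has a corresponding preset interest tag;
a user identification screening module, used to filter the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers; and
an interest tag generation module, used to determine the interest tag of each selected user identifier according to the preset interest tag corresponding to the application type of the user usage data set that contains it.
A computer device, including a memory and a processor, where the memory stores a computer program and the processor implements the steps of the above method for generating interest tags when executing the computer program.
A computer-readable storage medium on which a computer program is stored, where the computer program implements the steps of the above method for generating interest tags when executed by a processor.
With the above method, apparatus, computer device, and storage medium for generating interest tags, the preference value of each application identifier with respect to each user identifier is determined from the user usage record set of applications obtained within a specified time period, which better characterizes how much the user prefers each application. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type, which takes that overall distribution fully into account and provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application type is filtered against the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.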
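Brief Description of the Drawings
FIG. 1 is a diagram of an application scenario of the method for generating interest tags in one embodiment;
FIG. 2 is a schematic flowchart of the method for generating interest tags in one embodiment;
FIG. 3 is a structural block diagram of the apparatus for generating interest tags in one embodiment;
FIG. 4 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.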
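The method for generating interest tags provided in this application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. The server 104 obtains the set of user usage records of applications within a specified time period, which may be triggered and generated by the terminal 102, and calculates the preference value of each application identifier with respect to each user identifier; it then determines the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type. Further, the server 104 filters the user usage data set of each application type against the corresponding classification threshold to select user identifiers, and takes the application type corresponding to each selected user identifier as that user identifier's interest tag. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.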
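In one embodiment, as shown in FIG. 2, a method for generating interest tags is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
Step S202: obtain a set of user usage records of applications within a specified time period, and calculate the preference value of each application identifier with respect to each user identifier; each user usage record in the set includes a user identifier and an application identifier.
The user usage record set consists of individual user usage records, each of which includes a user identifier, an application identifier, and a usage weight. User usage records carry rich information, such as the similarity between users, the similarity between applications, and how much a user prefers each application. The user identifier is the unique identifier that distinguishes users and may be a user ID (Identification). The application identifier is the unique identifier that distinguishes applications.
The preference value characterizes how much the user corresponding to a user identifier prefers to use the application corresponding to an application identifier. It depends on the number of users corresponding to the application identifier, the total number of users corresponding to the user usage record set, and the usage weight.
Specifically, the user triggers the terminal to generate the user usage record set of each application; the terminal transmits the generated set to the server through the network, or stores it directly on the terminal. The server can obtain the user usage record set for the specified time period from the terminals or from its own storage. After obtaining the user usage record set of applications within the specified time period, the server calculates from it the preference value of each application identifier with respect to each user identifier.
In one of the embodiments, based on the user usage records in the set, the server obtains the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set, obtains the usage weight corresponding to the user identifier and the application identifier, and then calculates the preference value of each application identifier with respect to the user identifier from the usage weight and the proportion of the number of users in the total number of users.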
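Step S204: determine the application type based on the application identifier, and determine the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type; each application type has a corresponding preset interest tag.
The application type is the category that distinguishes applications, such as the video type. The classification threshold is the condition for judging whether a preference value belongs to its application type: according to the threshold, it can be determined whether the user identifier corresponding to a preference value belongs to the application type of that preference value. The classification threshold characterizes, within one application type, the proportion of each user's usage of the applications in the overall usage under that type.
Specifically, the server calculates the preference value of each application identifier with respect to each user identifier from the user usage record set and determines the application type of each application identifier. Each application type has a corresponding preset interest tag, which may be identical to the application type or may be a marker that matches the application type. Within each application type, the server determines the classification threshold from the calculated preference values. The classification threshold makes it possible to judge whether the user identifier corresponding to a preference value belongs to the application type of that preference value.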
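Step S206: filter the user usage data set of each application type, determined from the user usage record set, against the classification threshold to select user identifiers.
There is one user usage data set per application type, and each user usage data set includes mutually corresponding user identifiers, application identifiers, and preference values.
Specifically, for the user usage data set of each application type, the server filters the data set against the classification threshold of that application type and thereby selects the qualifying user identifiers from the data set.
Step S208: determine the interest tag of each selected user identifier according to the application type of the user usage data set that contains it.
An interest tag is a marker that indicates a user's tendency toward a certain behavior type. For example, if a user frequently uses video applications, the user's interest tag is "video".
Specifically, for each qualifying user identifier selected from a user usage data set, the server obtains from the database the application type of the user usage data set that contains the user identifier; the interest tag of the user identifier is that application type.
In the above embodiment, the preference value of each application identifier with respect to each user identifier is determined from the user usage record set of applications obtained within a specified time period, which better characterizes how much the user prefers each application. Further, the classification threshold of each application type is determined by analyzing the overall distribution of the preference values of the application identifiers under the same application type, which takes that overall distribution fully into account and provides a more accurate basis for the subsequent screening of user identifiers. Finally, the user usage data set of each application type is filtered against the corresponding classification threshold to select the qualifying user identifiers, which improves the accuracy of the interest tags generated for each behavior type.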
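Steps S206 and S208 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `assign_interest_tags`, the dictionary layouts, and the name `score` (whatever quantity is compared with the classification threshold, for example the preference value or its quantile) are hypothetical and not fixed by the text.

```python
def assign_interest_tags(usage_data_sets, thresholds, preset_tags):
    """Select users per application type and attach the preset interest tag.

    usage_data_sets: {app_type: [(user_id, score), ...]}  (assumed layout)
    thresholds:      {app_type: classification threshold}
    preset_tags:     {app_type: preset interest tag}
    """
    tags = {}
    for app_type, rows in usage_data_sets.items():
        threshold = thresholds[app_type]
        for user_id, score in rows:
            # S206: keep the user if the score reaches the threshold
            if score >= threshold:
                # S208: the tag comes from the data set's application type
                tags.setdefault(user_id, set()).add(preset_tags[app_type])
    return tags


usage_data_sets = {
    "video": [("u1", 0.9), ("u2", 0.3)],
    "food": [("u2", 0.8)],
}
thresholds = {"video": 0.7, "food": 0.7}
preset_tags = {"video": "video", "food": "food"}
result = assign_interest_tags(usage_data_sets, thresholds, preset_tags)
```

A user can pass the threshold in several application types, which is why the sketch collects a set of tags per user rather than a single tag.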
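In one embodiment, each user usage record in the user usage record set also includes a usage weight, and calculating the preference value of each application identifier with respect to each user identifier from the user usage record set of applications within the specified time period includes the following steps: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating the preference value of each application identifier with respect to the user identifier from the usage weight and the proportion of the number of users in the total number of users.
The usage weight characterizes the share of a specific application among all the applications the user uses. It can be determined from the application's installation information, number of uses, usage duration, and power consumption.
Specifically, based on the obtained user usage record set, the server obtains the number of users corresponding to each application identifier and the total number of users corresponding to the set, and obtains the corresponding usage weight from the database according to the user identifier and the application identifier. The server then calculates the preference value of each application identifier with respect to the user identifier from the total number of users, the number of users, and the usage weight. That is, the preference value is calculated from the ratio between the total number of users and the number of users corresponding to the application identifier, together with the usage weight corresponding to the application identifier.
In one of the embodiments, the preference value is positively correlated with the usage weight corresponding to the application identifier and positively correlated with the user-count ratio corresponding to the application. The user-count ratio increases as the total number of users of the user usage record set grows and decreases as the number of users corresponding to the application identifier grows. Optionally, the preference value may be the product of the user-count ratio and the usage weight corresponding to the application identifier, and the user-count ratio may be the logarithm of the total number of users divided by the number of users corresponding to the application identifier.
For example, suppose the user usage record sets of Xiao Ming and Xiao Hong for the last month are obtained, giving their usage records for Tencent Video, Baidu Video, and Tudou Video, expressed as {(A_1, A_2), (B_2, B_3)}. Here A_1 is the weight of Xiao Ming watching Tencent Video, A_2 and B_2 are the weights of Xiao Ming and Xiao Hong watching Tudou Video respectively, and B_3 is the weight of Xiao Hong watching Baidu Video. The preference value of Xiao Ming for Tencent Video is then calculated as follows:
(1) Obtain the number of users of Tencent Video and the total number of users of the user usage record set:
The number of users of Tencent Video is 1 and the total number of users of the user usage record set is 2, so the user-count ratio between the total number of users and the number of Tencent Video users is IDF = log(2/1). To keep the denominator of the argument x of log(x) from being 0, 1 can also be added to the denominator of x.
(2) Obtain the weight TF of Xiao Ming watching Tencent Video: TF = A_1.
(3) Calculate Xiao Ming's preference value TF*IDF for Tencent Video: TF*IDF = A_1 * log(2/1).
In this embodiment, the preference value of each application identifier with respect to each user identifier is calculated from the number of users corresponding to each application identifier, the total number of users corresponding to the user usage record set, and the usage weight of each application identifier with respect to the user identifier. Introducing the usage weight and the share of the application in the whole better characterizes how much a user prefers each application.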
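The TF*IDF calculation in the example above can be sketched in Python. The sketch makes several assumptions that the text leaves open: it uses the natural logarithm (the base is not specified), it omits the optional "+1" in the denominator, and the function name `preference_values`, the record layout, and the numeric weights are hypothetical.

```python
import math
from collections import defaultdict


def preference_values(records):
    """Compute weight * log(total_users / users_of_app) per (user, app).

    records: iterable of (user_id, app_id, usage_weight) tuples
    (an assumed layout for the user usage record set).
    """
    records = list(records)
    users_per_app = defaultdict(set)
    all_users = set()
    for user_id, app_id, _ in records:
        users_per_app[app_id].add(user_id)
        all_users.add(user_id)
    total_users = len(all_users)

    prefs = {}
    for user_id, app_id, weight in records:
        idf = math.log(total_users / len(users_per_app[app_id]))
        prefs[(user_id, app_id)] = weight * idf  # TF * IDF
    return prefs


# Hypothetical weights standing in for A_1, A_2, B_2, B_3.
records = [
    ("xiaoming", "tencent_video", 0.6),  # A_1
    ("xiaoming", "tudou_video", 0.4),    # A_2
    ("xiaohong", "tudou_video", 0.3),    # B_2
    ("xiaohong", "baidu_video", 0.7),    # B_3
]
prefs = preference_values(records)
```

With these records, Tudou Video is used by both users, so its ratio is log(2/2) = 0 and its preference values are 0, while Tencent Video keeps the factor log(2/1) as in step (1) of the example.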
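In one embodiment, determining the classification threshold of each application type according to the preference values, with respect to the user identifiers, of the application identifiers under the same application type includes the following steps: sorting, in ascending order, the preference values under each application type to obtain the sorting result of the preference values; calculating the quantile of each preference value under the same application type according to the sorting result; and determining the classification threshold of each application type according to the quantiles.
The quantile is defined as follows: in a discrete data set, the quantile of a value a is the sum of the probabilities of all data satisfying X <= a, namely P(X <= a); that is, the quantile of a is the cumulative probability at a. A quantile is greater than 0 and less than or equal to 1.
Specifically, based on the obtained user usage record set and the calculated preference values, the server sorts the preference values under each application type from smallest to largest to obtain the sorting result of the preference values for each application type. From these sorting results, the server calculates the quantile of each preference value under the same application type and determines the classification threshold of each application type from the quantiles; the classification threshold therefore lies between 0 and 1 and may equal 1.
In this embodiment, the preference values of each application type are sorted in ascending order to obtain a sorting result per application type, the quantile of each preference value of each application type is calculated from the sorting result, and the classification threshold of each behavior type is determined from the calculated quantiles. Determining the classification threshold from the overall distribution of the quantiles of each application type takes the overall distribution fully into account and provides a basis for the subsequent generation of interest tags.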
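In one embodiment, calculating the quantile of each preference value under each application type from the sorted result includes the following steps: determining, from the sorted result, the occurrence probability of each preference value under each application type within the corresponding sorted result; and determining the cumulative probability of each preference value from the occurrence probabilities, thereby obtaining the quantile of each preference value under each application type.
The occurrence probability is the probability with which each preference value occurs in the user usage data set of a given application type. The cumulative probability of a preference value is the sum of the occurrence probabilities of all preference values in that data set that do not exceed it.
Specifically, from the sorted result of the preference values of each application type, the server calculates the occurrence probability of each preference value of each application type within the sorted result. From the occurrence probabilities, the server determines the cumulative probability of each preference value of each application type; this cumulative probability is the quantile of that preference value.
For example, consider the data set of one application type, which contains the preference values of the application identifiers for the user identifiers; the preference values are sorted in ascending order to obtain the sorted result. If the sorted result is 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, then the occurrence probability of preference value 1 is P(1) = 2/10, that of preference value 2 is P(2) = 2/10, and that of preference value 3 is P(3) = 1/10. The cumulative probability of preference value 3 is therefore P(1) + P(2) + P(3) = 5/10, so the quantile of preference value 3 is 50%.
In this embodiment, the occurrence probability of each preference value under each application type is determined from the sorted result, the cumulative probability of each preference value is obtained from the occurrence probabilities, and the quantile of each preference value under each application type follows. Calculating quantiles from cumulative probabilities reflects the share of each individual value within the whole for each application type, takes the relationships among the data fully into account, and provides a more accurate basis for the subsequent screening of user identifiers.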
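The cumulative-probability variant can be sketched as below, reproducing the worked example 1, 1, 2, 2, 3, 4, 5, 6, 7, 8. The function name `quantiles_by_cumulative_probability` is hypothetical; the sketch accumulates integer counts and divides once per value, which gives the same result as summing the occurrence probabilities while avoiding floating-point drift.

```python
from collections import Counter


def quantiles_by_cumulative_probability(values):
    """Map each distinct preference value a to its quantile P(X <= a)."""
    n = len(values)
    counts = Counter(values)          # occurrence count of each value
    quantiles = {}
    cumulative = 0
    for value in sorted(counts):      # ascending order, as in the text
        cumulative += counts[value]   # count of all values <= current value
        quantiles[value] = cumulative / n
    return quantiles


q = quantiles_by_cumulative_probability([1, 1, 2, 2, 3, 4, 5, 6, 7, 8])
```

Here `q[3]` is 0.5, matching the 50% quantile in the example, and tied values (the two 1s and the two 2s) share a single quantile.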
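In one embodiment, calculating the quantile of each preference value under each application type from the sorted result includes the following steps: obtaining the rank position of each preference value under each application type within its sorted result, and the ranked user count of the application type to which each application identifier belongs; and dividing the rank position of each preference value by the ranked user count to obtain the quantile of each preference value of each application type.
The rank position is the position of each element in a data set once the elements have been sorted according to some rule. The ranked user count is the total number of elements in the data set.
Specifically, from the sorted result of the preference values of each application type, the server obtains the rank position of each preference value within its sorted result and the ranked user count of each application type. The server then divides the rank position of each preference value of each application type by the ranked user count of that application type; the result is the quantile of that preference value.
For example, consider the data set of one application type, which contains the preference values of the application identifiers for the user identifiers; the preference values are sorted in ascending order to obtain the sorted result. If preference value A has rank position 5 in the sorted result and the ranked user count of its application type is 10, the quantile of that preference value is 5/10 × 100%, i.e. 50%. As a further example, if the sorted result is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, then preference value 6 has rank position 7 and its quantile is 70%.
In this embodiment, the quantile of each preference value of each application type is determined from its rank position within the sorted result and the ranked user count of the application type. Determining quantiles from the rank position and the ranked user count reduces the amount of computation, which speeds up the calculation and increases the rate at which interest tags are generated.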
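The rank-based variant reduces to one division per element. The sketch below assumes 1-based rank positions, which is what the 0 to 9 example implies (value 6 is the seventh element); the function name `quantiles_by_rank` is hypothetical.

```python
def quantiles_by_rank(values):
    """Return (value, quantile) pairs, with quantile = rank position / count.

    Rank positions are 1-based positions in the ascending sort.
    """
    ordered = sorted(values)
    n = len(ordered)  # the "ranked user count" of this application type
    return [(value, rank / n) for rank, value in enumerate(ordered, start=1)]


ranked = quantiles_by_rank([3, 0, 9, 6, 1, 2, 4, 5, 7, 8])
```

Unlike the cumulative-probability variant, tied values receive different quantiles here because each occupies its own rank position; the two variants agree only when all preference values are distinct.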
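In one embodiment, determining the classification threshold of each application type from the quantiles includes the following steps: for each application type, screening out the quantiles that are greater than or equal to the corresponding preset threshold; for each application type, calculating the differences between adjacent quantiles among those screened out; and obtaining, for each application type, the quantile corresponding to the largest difference, thereby obtaining the classification threshold of each application type.
The preset threshold is a boundary value, set in advance, for judging quantiles, and may be stored in the database; each application type has its own preset threshold. A difference is the result of subtracting one value from another; here it may be the result of subtracting one of two adjacent quantiles from the other.
Specifically, given the calculated quantile of each preference value of each application type, the server obtains from the database the preset threshold of the application type and screens out the quantiles that are greater than or equal to that preset threshold. For each application type, the server calculates the difference between each pair of adjacent quantiles among those screened out. From the differences calculated for each application type, the server obtains the two quantiles corresponding to the largest difference and takes the one with the later rank position as the classification threshold of that application type.
In this embodiment, the classification threshold for the preference values of each application type is determined from the quantiles, by selecting the quantile at which the distribution of that application type shows its most pronounced gap. This makes full use of the overall distribution of the data of each application type and safeguards the accuracy of the interest tags.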
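The largest-gap selection can be sketched as follows. The function name `classification_threshold`, the de-duplication of equal quantiles, the tie-breaking rule (the first largest gap wins), and the handling of fewer than two surviving quantiles are all assumptions; the source does not specify them. The input quantiles are illustrative.

```python
def classification_threshold(quantiles, preset_threshold):
    """Pick the later quantile of the adjacent pair with the largest gap.

    Only quantiles >= preset_threshold are considered. Returns None when no
    quantile survives the screening, and the single survivor when only one does.
    """
    kept = sorted(q for q in set(quantiles) if q >= preset_threshold)
    if len(kept) < 2:
        return kept[0] if kept else None

    best_gap, best_quantile = -1.0, None
    for lower, upper in zip(kept, kept[1:]):
        gap = upper - lower
        if gap > best_gap:            # strict: the first largest gap wins ties
            best_gap, best_quantile = gap, upper
    return best_quantile


threshold = classification_threshold(
    [0.1, 0.2, 0.5, 0.55, 0.6, 0.9, 1.0], preset_threshold=0.5
)
```

In this example the surviving quantiles are 0.5, 0.55, 0.6, 0.9, and 1.0; the largest gap lies between 0.6 and 0.9, so the later quantile, 0.9, becomes the threshold. Note that quantiles computed purely by rank position are evenly spaced, so the gap criterion is only informative when tied values share a quantile, as in the cumulative-probability variant.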
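In one embodiment, performing conditional screening according to the classification threshold on the user usage data set of each application type determined from the user usage record set, so as to screen out user identifiers, includes the following steps: obtaining a user usage record sample set with known interest tags; adjusting the classification threshold according to the user usage record sample set; and performing conditional screening on the user usage data set according to the adjusted classification threshold to screen out user identifiers.
The user usage record sample set comprises individual user usage record samples; the user usage record set comprises one user usage data set per application type; and each user usage data set includes mutually corresponding user identifiers, application identifiers, and preference values.
Specifically, the server obtains the user usage record sample set with known interest tags from the database or a terminal, and adjusts the classification threshold of each application type according to that sample set. Then, based on the user usage data sets, the server performs conditional screening on each preference value of each application type according to the adjusted classification threshold, screening out the user identifiers whose preference values satisfy the screening condition.
In this embodiment, the classification threshold of each application type is adjusted on the basis of a user usage record sample set with known interest tags, yielding the adjusted classification threshold. Testing the classification threshold against the user usage record sample set improves the accuracy of the interest tags.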
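In one embodiment, each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight. Adjusting the classification threshold according to the user usage record sample set includes the following steps: determining, from the user usage record sample set and according to the known interest tags, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values; calculating, from the sample user usage data sets with known tags, the quantile of each sample preference value of each sample application type; performing conditional screening on the sample user usage data sets with known tags according to the classification threshold to screen out sample user identifiers; determining the predicted interest tag of each screened-out sample user identifier according to the application type of the sample user usage data set in which it is located; and adjusting the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags of the sample user usage data sets and the corresponding known interest tags.
The user usage record sample set comprises individual user usage record samples, each of which includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight. A sample user identifier is the unique identifier distinguishing one sample user from another. A sample application type is the type corresponding to an application of a sample user; sample application types correspond to application types, and the application types include all sample application types. A sample application identifier is the unique identifier distinguishing one application from another. A sample preference value characterizes the degree to which the sample user corresponding to the sample user identifier prefers the sample application corresponding to the sample application identifier.
The user usage record sample set comprises one sample user usage data set per sample application type; each sample user usage data set includes corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values.
An interest tag is a marker indicating that a user tends toward a certain application type; for example, if a user frequently uses video applications, that user's interest tag may be video. A predicted interest tag is an interest tag predicted by the interest tag generation model. The recall is, for each sample application type, the ratio of the number of users whose predicted interest tag is consistent with their known interest tag to the total number of users of that sample application type. The closer the recall is to 1, the more consistent the predicted interest tags of that sample application type are with the known interest tags, which in turn indicates that the classification threshold of that sample application type has been chosen appropriately.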
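Specifically, the server obtains the user usage record sample set with known interest tags from the database or a terminal and classifies it according to the known interest tags to obtain the sample user usage data set of each sample application type. From these sample user usage data sets, the server calculates the quantile of each sample preference value of each sample application type.
Based on the sample user usage data sets with known tags, the server looks up the classification threshold of each sample application type in the database and screens the sample user usage data sets according to the thresholds found. When a sample preference value in a sample user usage data set satisfies the screening condition, the corresponding sample user identifier is screened out. The screening condition is that, for each sample user usage data set, the sample preference value is greater than or equal to the corresponding classification threshold.
For each screened-out sample user identifier, the server looks up in the database the sample application type of the sample user usage data set in which that identifier is located; the predicted interest tag of the sample user identifier may then be the sample application type found. Based on the predicted interest tags of the sample user usage data sets and the corresponding known interest tags, the server determines, for each sample application type, whether the predicted interest tag of each sample user identifier is consistent with the known interest tag, and records the result as a flag stored on the server: 1 if the tags are consistent and 0 otherwise. For example, within one sample application type, if the known interest tag of a sample user identifier is movies and the predicted interest tag is also movies, 1 is recorded; if the predicted interest tag is dining, 0 is recorded.
From the recorded results, the server calculates the recall of each sample application type and then adjusts the corresponding classification threshold according to that recall. If the recall does not meet the adjustment threshold, the classification threshold need not be adjusted; if the recall meets the adjustment threshold, the classification threshold is adjusted. The predicted tags of the sample user usage data sets are then determined again with the adjusted classification threshold and the recall of each sample application type is recalculated, until the recall of the user usage record sample set no longer falls within the range of the adjustment threshold, at which point adjustment of the corresponding classification threshold stops. The adjustment threshold may be set as: recall below 95%.
In this embodiment, the classification threshold is adjusted on the basis of a user usage record sample set with known interest tags, according to the calculated recall of each application type, until the recall of each application type no longer meets the adjustment threshold. Testing the classification threshold against the sample set and verifying the interest tags through the recall further improves the accuracy of the interest tags.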
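The recall-driven adjustment loop can be sketched as below for a single sample application type. The source states when to adjust (recall below 95%) but not how, so the direction of adjustment (lowering the threshold), the step size, the lower bound, and the function names `recall` and `adjust_threshold` are all assumptions. The `scores` list stands for the values compared with the threshold for sample users whose known interest tag is this application type.

```python
def recall(scores, threshold):
    """Share of known-tag sample users whose predicted tag matches.

    A user is predicted to carry the tag when score >= threshold.
    """
    if not scores:
        return 0.0
    hits = sum(1 for score in scores if score >= threshold)
    return hits / len(scores)


def adjust_threshold(scores, threshold, target=0.95, step=0.05, floor=0.0):
    """Lower the threshold stepwise while recall stays below the target.

    Lowering by a fixed step is an assumed strategy, not taken from the source.
    """
    while recall(scores, threshold) < target and threshold - step >= floor:
        # Round to suppress floating-point drift from repeated subtraction.
        threshold = round(threshold - step, 10)
    return threshold


# 20 sample users known to carry the tag: one weak signal, 19 clear ones.
scores = [0.3] + [0.7] * 19
adjusted = adjust_threshold(scores, threshold=0.9)
```

Starting from 0.9, the recall is 0 until the threshold reaches 0.7, where 19 of 20 sample users pass and the recall of 0.95 is no longer below the target, so adjustment stops. Optimizing recall alone always favors a lower threshold; a production system would presumably also track precision, which the source does not discuss.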
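It should be understood that, although the steps in the flowchart of FIG. 2 are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are these sub-steps or stages necessarily executed one after another, as they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.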
在一个实施例中,如图3所示,提供了一种生成兴趣标签的装置300,包括:使用记录获取模块302、分类阈值确定模块304、筛选用户标识模块306以及兴趣标签生成模块308,其中:
使用记录获取模块302,用于获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;用户使用记录集中的用户使用记录包括用户标识和应用程序标识。
分类阈值确定模块304,用于基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,分别确定各应用程序类型的分类阈值;应用程序类型存在对应的预设兴趣标签。
筛选用户标识模块306,用于根据基于用户使用记录集确定的各应用程序类型的用户使用数据集,并按照分类阈值进行条件筛选,以筛选出用户标识。
兴趣标签生成模块308,用于依照筛选出的用户标识所在用户使用数据集所对应的应用程序类型,确定筛选出的用户标识所对应的兴趣标签。
在一个实施例中,上述使用记录获取模块包括:数据获取模块和偏好值计算模块。数据获取模块,用于获取每个应用程序标识对应的用户数以及用户使用记录集对应的总用户数;获取与用户标识和应用程序标识对应的使用权重;偏好值计算模块,用于根据总用户数与用户数的比重以及使用权重,计算每个应用程序标识对应于用户标识所对应的偏好值。
在一个实施例中,上述分类阈值确定模块包括:排序模块、分位数获取模块以及分类阈值计算模块。排序模块,用于基于相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;分位数获取模块,用于根据偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;分类阈值计算模块,用于依据分位数确定各应用程序类型的分类阈值。
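In one embodiment, the above quantile acquisition module includes a probability calculation module and a cumulative probability calculation module. The probability calculation module is configured to determine, from the sorted result of the preference values, the occurrence probability of each preference value under each application type within the corresponding sorted result. The cumulative probability calculation module is configured to determine the cumulative probability of each preference value under each application type from the occurrence probabilities, thereby obtaining the quantile of each preference value under each application type.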
在一个实施例中,上述分位数获取模块包括:排序数据获取模块和分位数计算模块。排序数据获取模块,用于获取各应用程序类型下的每个偏好值在所处排序结果中的排序位以及各应用程序标识所属应用程序类型对应的排序用户数;分位数计算模块,用于将各应用程序类型下的每个偏好值在所处排序结果中的排序位除以排序用户数,获得各应用程序类型各自对应的每个偏好值的分位数。
在一个实施例中,上述分类阈值计算模块包括:第一筛选模块、差值计算模块以及第二筛选模块。第一筛选模块,用于依据分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;差值计算模块,用于对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;第二筛选模块,用于获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
在一个实施例中,上述筛选用户标识模块包括:使用记录样本获取模块、分类阈值调整模块和条件筛选模块。使用记录样本获取模块,用于获取已知兴趣标签的用户使用记录样本集;分类阈值调整模块,用于根据用户使用记录样本集,对分类阈值进行调整;条件筛选模块,用于根据用户使用数据集,并按照调整后的分类阈值进行条件筛选,以筛选出用户标识。
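In one embodiment, the above classification threshold adjustment module includes a sample user usage record set acquisition module, a sample user data set determination module, a sample quantile calculation module, a sample user identifier screening module, a predicted interest tag generation module, and a recall calculation module. The sample user usage record set acquisition module is configured to obtain the user usage record sample set on the basis of which the classification threshold is adjusted. The sample user data set determination module is configured to determine, from the user usage record sample set and according to the known interest tags, the sample user usage data set of each sample application type, each sample user usage data set including corresponding sample user identifiers, sample application identifiers, interest tags, and sample preference values. The sample quantile calculation module is configured to calculate, from the sample user usage data sets with known tags, the quantile of each sample preference value of each sample application type. The sample user identifier screening module is configured to perform conditional screening on the sample user usage data sets with known tags according to the classification threshold to screen out sample user identifiers. The predicted interest tag generation module is configured to determine the predicted interest tag of each screened-out sample user identifier according to the preset interest tag corresponding to the sample application type of the sample user usage data set in which it is located. The recall calculation module is configured to adjust the classification threshold according to the recall of each sample application type, calculated from the predicted interest tags of the sample user usage data sets and the corresponding known interest tags.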
在本实施例中,基于在指定时间段内获取的应用程序的用户使用记录集,确定各个应用程序标识对应于用户标识的偏好值,更好的表征用户使用各个应用程序的偏好程度。进一步,通过分析相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值的整体分布情况,以此确定各应用程序类型的分类阈值,充分考虑了相同应用程序类型下偏好值的整体分布情况,为后续筛选用户标识提供更为准确的筛选依据。再者,将各应用程序类型的用户使用数据集按照对应的分类阈值进行筛选,从而筛选出符合条件的用户标识,提高了生成各行为类型的兴趣标签的准确率。
关于生成兴趣标签的装置的具体限定可以参见上文中对于生成兴趣标签的方法的限定,在此不再赘述。上述生成兴趣标签的装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,提供了一种计算机设备,该计算机设备可以是服务器,其内部结构图可以如图4所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机程序和数据库。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的数据库用于存储用户使用记录集、用户使用数据集、分类阈值数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种生成兴趣标签的方法。
本领域技术人员可以理解,图4中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
在一个实施例中,提供了一种计算机设备,包括存储器和处理器,该存储器存储有计算机程序,该处理器执行计算机程序时实现以下步骤:获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;用户使用记录集中的用户使用记录包括用户标识和应用程序标识;基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,分别确定各应用程序类型的分类阈值;应用程序类型存在对应的预设兴趣标签;根据基于用户使用记录集确定的各应用程序类型的用户使用数据集,并按照分类阈值进行条件筛选,以筛选出用户标识;依照筛选出的用户标识所在用户使用数据集的应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的兴趣标签。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:获取每个应用程序标识对应的用户数以及用户使用记录集对应的总用户数;获取与用户标识和应用程序标识对应的使用权重;根据总用户数与用户数的比重以及使用权重,计算每个应用程序标识对应于用户标识所对应的偏好值。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:基于相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;根据偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;依据分位数确定各应用程序类型的分类阈值。
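In one embodiment, the processor further implements the following steps when executing the computer program: determining, from the sorted result of the preference values, the occurrence probability of each preference value under each application type within the corresponding sorted result; and determining the cumulative probability of each preference value under each application type from the occurrence probabilities, thereby obtaining the quantile of each preference value under each application type.
In one embodiment, the processor further implements the following steps when executing the computer program: obtaining the rank position of each preference value under each application type within its sorted result, and the ranked user count of the application type to which each application identifier belongs; and dividing the rank position of each preference value under each application type by the ranked user count to obtain the quantile of each preference value of each application type.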
在一个实施例中,处理器执行计算机程序时还实现以下步骤:依据分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:获取已知兴趣标签的用户使用记录样本集;根据用户使用记录样本集,对分类阈值进行调整;根据用户使用数据集,并按照调整后的分类阈值进行条件筛选,以筛选出用户标识。
在一个实施例中,处理器执行计算机程序时还实现以下步骤:根据用户使用记录样本集,按已知兴趣标签确定各样本应用程序类型的样本用户使用数据集,样本使用数据集包括对应的样本用户标识、样本应用程序标识、兴趣标签和样本偏好值;基于各样本应用程序类型的已知标签的样本用户使用数据集,计算各样本应用程序类型的每个样本偏好值的分位数;根据已知标签的样本用户使用数据集,按照分类阈值进行条件筛选,以筛选出样本用户标识;依照筛选出的样本用户标识所在用户使用数据集的样本应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的预测兴趣标签;根据样本用户数据集的预测兴趣标签和已知的相应兴趣标签计算出的每类样本应用程序类型的查全率,调整分类阈值。
在本实施例中,基于在指定时间段内获取的应用程序的用户使用记录集,确定各个应用程序标识对应于用户标识的偏好值,更好的表征用户使用各个应用程序的偏好程度。进一步,通过分析相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值的整体分布情况,以此确定各应用程序类型的分类阈值,充分考虑了相同应用程序类型下偏好值的整体分布情况,为后续筛选用户标识提供更为准确的筛选依据。再者,将各应用程序类型的用户使用数据集按照对应的分类阈值进行筛选,从而筛选出符合条件的用户标识,提高了生成各行为类型的兴趣标签的准确率。
在一个实施例中,提供了一种计算机可读存储介质,其上存储有计算机程序,计算机程序被处理器执行时实现以下步骤:获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;用户使用记录集中的用户使用记录包括用户标识和应用程序标识;基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,分别确定各应用程序类型的分类阈值;应用程序类型存在对应的预设兴趣标签;根据基于用户使用记录集确定的各应用程序类型的用户使用数据集,并按照分类阈值进行条件筛选,以筛选出用户标识;依照筛选出的用户标识所在用户使用数据集的应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的兴趣标签。
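In one embodiment, the computer program, when executed by the processor, implements the following steps: obtaining the number of users corresponding to each application identifier and the total number of users corresponding to the user usage record set; obtaining the usage weight corresponding to the user identifier and the application identifier; and calculating the preference value of each application identifier for each user identifier according to the ratio of the total number of users to the number of users, together with the usage weight.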
在一个实施例中,计算机程序被处理器执行时实现以下步骤:基于相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;根据偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;依据分位数确定各应用程序类型的分类阈值。
在一个实施例中,计算机程序被处理器执行时实现以下步骤:根据偏好值的排序结果,确定各应用程序类型下的每个偏好值在相应排序结果中的出现概率;根据出现概率确定各应用程序类型下的每个偏好值的累积概率,得到各应用程序类型下的每个偏好值的分位数。
在一个实施例中,计算机程序被处理器执行时实现以下步骤:获取各应用程序类型下的每个偏好值在所处排序结果中的排序位以及各应用程序标识所属应用程序类型对应的排序用户数;将各应用程序类型下的每个偏好值在所处排序结果中的排序位除以排序用户数,获得各应用程序类型各自对应的每个偏好值的分位数。
在一个实施例中,计算机程序被处理器执行时实现以下步骤:依据分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
在一个实施例中,计算机程序被处理器执行时实现以下步骤:获取已知兴趣标签的用户使用记录样本集;根据用户使用记录样本集,对分类阈值进行调整;根据用户使用数据集,并按照调整后的分类阈值进行条件筛选,以筛选出用户标识。
在一个实施例中,计算机程序被处理器执行时实现以下步骤:根据用户使用记录样本集,按已知兴趣标签确定各样本应用程序类型的样本用户使用数据集,样本使用数据集包括对应的样本用户标识、样本应用程序标识、兴趣标签和样本偏好值;基于各样本应用程序类型的已知标签的样本用户使用数据集,计算各样本应用程序类型的每个样本偏好值的分位数;根据已知标签的样本用户使用数据集,按照分类阈值进行条件筛选,以筛选出样本用户标识;依照筛选出的样本用户标识所在用户使用数据集的样本应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的预测兴趣标签;根据样本用户数据集的预测兴趣标签和已知的相应兴趣标签计算出的每类样本应用程序类型的查全率,调整分类阈值。
在本实施例中,基于在指定时间段内获取的应用程序的用户使用记录集,确定各个应用程序标识对应于用户标识的偏好值,更好的表征用户使用各个应用程序的偏好程度。进一步,通过分析相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值的整体分布情况,以此确定各应用程序类型的分类阈值,充分考虑了相同应用程序类型下偏好值的整体分布情况,为后续筛选用户标识提供更为准确的筛选依据。再者,将各应用程序类型的用户使用数据集按照对应的分类阈值进行筛选,从而筛选出符合条件的用户标识,提高了生成各行为类型的兴趣标签的准确率。
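A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).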
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对申请专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种生成兴趣标签的方法,其中,所述方法包括:
    获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;所述用户使用记录集中的用户使用记录包括用户标识和应用程序标识;
    基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值;所述应用程序类型存在对应的预设兴趣标签;
    根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识;
    依照筛选出的用户标识所在用户使用数据集的应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的兴趣标签。
  2. 根据权利要求1所述的方法,其中,所述用户使用记录集中的用户使用记录还包括使用权重;所述根据在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值包括:
    获取每个应用程序标识对应的用户数以及所述用户使用记录集对应的总用户数;
    获取与所述用户标识和所述应用程序标识对应的使用权重;
    根据所述总用户数与所述用户数的比重以及所述使用权重,计算每个应用程序标识对应于用户标识所对应的偏好值。
  3. 根据权利要求1所述的方法,其中,所述根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值包括:
    基于所述相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;
    根据所述偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;
    依据所述分位数确定各应用程序类型的分类阈值。
  4. 根据权利要求3所述的方法,其中,所述根据所述偏好值的排序结果,计算各应用程序类型下各自对应的每个偏好值的分位数包括:
    根据所述偏好值的排序结果,确定各应用程序类型下的每个偏好值在相应排序结果中的出现概率;根据所述出现概率确定各应用程序类型下的每个偏好值的累积概率,得到各应用程序类型下的每个偏好值的分位数;或,
    获取各应用程序类型下的每个偏好值在所处排序结果中的排序位以及各应用程序标识所属应用程序类型对应的排序用户数;将各应用程序类型下的每个偏好值在所处排序结果中的排序位除以所述排序用户数,获得各应用程序类型各自对应的每个偏好值的分位数。
  5. 根据权利要求3所述的方法,其中,所述依据所述分位数确定各应用程序类型的分类阈值包括:
    依据所述分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;
    对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;
    获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
  6. 根据权利要求1所述的方法,其中,所述根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识包括:
    获取已知兴趣标签的用户使用记录样本集;
    根据所述用户使用记录样本集,对所述分类阈值进行调整;
    根据所述用户使用数据集,并按照所述调整后的分类阈值进行条件筛选,以筛选出用户标识。
  7. 根据权利要求6所述的方法,其中,所述用户使用记录样本集中用户使用记录样本包括样本用户标识、兴趣标签、样本应用程序类型、样本应用程序标识和样本使用权重;
    所述根据所述用户使用记录样本集,对所述分类阈值进行调整包括:
    根据所述用户使用记录样本集,按所述已知兴趣标签确定各样本应用程序类型的样本用户使用数据集,所述样本使用数据集包括对应的样本用户标识、样本应用程序标识、兴趣标签和样本偏好值;
    基于各样本应用程序类型的已知标签的样本用户使用数据集,计算各样本应用程序类型的每个样本偏好值的分位数;
    根据所述已知标签的样本用户使用数据集,按照所述分类阈值进行条件筛选,以筛选出样本用户标识;
    依照筛选出的样本用户标识所在样本用户使用数据集的样本应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的预测兴趣标签;
    根据所述样本用户数据集的预测兴趣标签和已知的相应兴趣标签计算出的每类样本应用程序类型的查全率,调整所述分类阈值。
  8. 一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,其中,所述处理器执行所述计算机程序时实现如下步骤:
    获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;所述用户使用记录集中的用户使用记录包括用户标识和应用程序标识;
    基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值;所述应用程序类型存在对应的预设兴趣标签;
    根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识;
    依照筛选出的用户标识所在用户使用数据集的应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的兴趣标签。
  9. 根据权利要求8所述的计算机设备,其中,所述用户使用记录集中的用户使用记录还包括使用权重;所述处理器执行所述计算机程序实现所述根据在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值,包括:
    获取每个应用程序标识对应的用户数以及所述用户使用记录集对应的总用户数;
    获取与所述用户标识和所述应用程序标识对应的使用权重;
    根据所述总用户数与所述用户数的比重以及所述使用权重,计算每个应用程序标识对应于用户标识所对应的偏好值。
  10. 根据权利要求8所述的计算机设备,其中,所述处理器执行所述计算机程序实现所述根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值,包括:
    基于所述相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;
    根据所述偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;
    依据所述分位数确定各应用程序类型的分类阈值。
  11. 根据权利要求10所述的计算机设备,其中,所述处理器执行所述计算机程序实现所述根据所述偏好值的排序结果,计算各应用程序类型下各自对应的每个偏好值的分位数,包括:
    根据所述偏好值的排序结果,确定各应用程序类型下的每个偏好值在相应排序结果中的出现概率;根据所述出现概率确定各应用程序类型下的每个偏好值的累积概率,得到各应用程序类型下的每个偏好值的分位数;或,
    获取各应用程序类型下的每个偏好值在所处排序结果中的排序位以及各应用程序标识所属应用程序类型对应的排序用户数;将各应用程序类型下的每个偏好值在所处排序结果中的排序位除以所述排序用户数,获得各应用程序类型各自对应的每个偏好值的分位数。
  12. 根据权利要求10所述的计算机设备,其中,所述处理器执行所述计算机程序实现所述依据所述分位数确定各应用程序类型的分类阈值,包括:
    依据所述分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;
    对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;
    获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
  13. 根据权利要求8所述的计算机设备,其中,所述处理器执行所述计算机程序实现所述根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识,包括:
    获取已知兴趣标签的用户使用记录样本集;
    根据所述用户使用记录样本集,对所述分类阈值进行调整;
    根据所述用户使用数据集,并按照所述调整后的分类阈值进行条件筛选,以筛选出用户标识。
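  14. The computer device according to claim 13, wherein each user usage record sample in the user usage record sample set includes a sample user identifier, an interest tag, a sample application type, a sample application identifier, and a sample usage weight; and the processor, in executing the computer program to implement the adjusting of the classification threshold according to the user usage record sample set, performs: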
    根据所述用户使用记录样本集,按所述已知兴趣标签确定各样本应用程序类型的样本用户使用数据集,所述样本使用数据集包括对应的样本用户标识、样本应用程序标识、兴趣标签和样本偏好值;
    基于各样本应用程序类型的已知标签的样本用户使用数据集,计算各样本应用程序类型的每个样本偏好值的分位数;
    根据所述已知标签的样本用户使用数据集,按照所述分类阈值进行条件筛选,以筛选出样本用户标识;
    依照筛选出的样本用户标识所在样本用户使用数据集的样本应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的预测兴趣标签;
    根据所述样本用户数据集的预测兴趣标签和已知的相应兴趣标签计算出的每类样本应用程序类型的查全率,调整所述分类阈值。
  15. 一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序被处理器执行时实现如下步骤:
    获取在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值;所述用户使用记录集中的用户使用记录包括用户标识和应用程序标识;
    基于应用程序标识确定应用程序类型,根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值;所述应用程序类型存在对应的预设兴趣标签;
    根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识;
    依照筛选出的用户标识所在用户使用数据集的应用程序类型所对应的预设兴趣标签,确定筛选出的用户标识所对应的兴趣标签。
  16. 根据权利要求15所述的计算机可读存储介质,其中,所述用户使用记录集中的用户使用记录还包括使用权重;所述计算机程序被处理器执行实现所述根据在指定时间段内应用程序的用户使用记录集,计算每个应用程序标识对应于用户标识所对应的偏好值,包括:
    获取每个应用程序标识对应的用户数以及所述用户使用记录集对应的总用户数;
    获取与所述用户标识和所述应用程序标识对应的使用权重;
    根据所述总用户数与所述用户数的比重以及所述使用权重,计算每个应用程序标识对应于用户标识所对应的偏好值。
  17. 根据权利要求15所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时实现所述根据相同应用程序类型下的应用程序标识对应于用户标识所对应的所述偏好值,分别确定各应用程序类型的分类阈值,包括:
    基于所述相同应用程序类型下的应用程序标识对应于用户标识所对应的偏好值,将相同应用程序类型各自对应的偏好值按升序进行排序,得到偏好值的排序结果;
    根据所述偏好值的排序结果,计算相同应用程序类型下各自对应的每个偏好值的分位数;
    依据所述分位数确定各应用程序类型的分类阈值。
  18. 根据权利要求17所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时实现所述根据所述偏好值的排序结果,计算各应用程序类型下各自对应的每个偏好值的分位数,包括:
    根据所述偏好值的排序结果,确定各应用程序类型下的每个偏好值在相应排序结果中的出现概率;根据所述出现概率确定各应用程序类型下的每个偏好值的累积概率,得到各应用程序类型下的每个偏好值的分位数;或,
    获取各应用程序类型下的每个偏好值在所处排序结果中的排序位以及各应用程序标识所属应用程序类型对应的排序用户数;将各应用程序类型下的每个偏好值在所处排序结果中的排序位除以所述排序用户数,获得各应用程序类型各自对应的每个偏好值的分位数。
  19. 根据权利要求17所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时实现所述依据所述分位数确定各应用程序类型的分类阈值,包括:
    依据所述分位数,对应于每个应用程序类型,分别筛选出大于或等于相应预设阈值的分位数;
    对应于每个应用程序类型,根据筛选出的分位数计算相邻的分位数的差值;
    获取对应各应用程序类型计算出的每个最大的差值所对应的分位数,得到各应用程序类型的分类阈值。
  20. 根据权利要求15所述的计算机可读存储介质,其中,所述计算机程序被处理器执行时实现所述根据基于所述用户使用记录集确定的各应用程序类型的用户使用数据集,并按照所述分类阈值进行条件筛选,以筛选出用户标识,包括:
    获取已知兴趣标签的用户使用记录样本集;
    根据所述用户使用记录样本集,对所述分类阈值进行调整;
    根据所述用户使用数据集,并按照所述调整后的分类阈值进行条件筛选,以筛选出用户标识。
PCT/CN2020/086369 2019-06-18 2020-04-23 生成兴趣标签的方法、装置、计算机设备和存储介质 WO2020253369A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910525807.XA CN110377821A (zh) 2019-06-18 2019-06-18 生成兴趣标签的方法、装置、计算机设备和存储介质
CN201910525807.X 2019-06-18

Publications (1)

Publication Number Publication Date
WO2020253369A1 true WO2020253369A1 (zh) 2020-12-24

Family

ID=68249072

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086369 WO2020253369A1 (zh) 2019-06-18 2020-04-23 生成兴趣标签的方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN110377821A (zh)
WO (1) WO2020253369A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377821A (zh) * 2019-06-18 2019-10-25 深圳壹账通智能科技有限公司 生成兴趣标签的方法、装置、计算机设备和存储介质
CN111079023B (zh) * 2019-12-30 2023-06-16 Oppo广东移动通信有限公司 目标帐户的识别方法、装置、终端及存储介质
BR112022015834A2 (pt) 2020-02-11 2022-09-27 Citrix Systems Inc Sistemas e métodos para acesso agilizado a aplicativos

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700289A (zh) * 2015-03-17 2015-06-10 中国联合网络通信集团有限公司 广告投放方法和装置
CN106503269A (zh) * 2016-12-08 2017-03-15 广州优视网络科技有限公司 应用推荐的方法、装置及服务器
US20180189410A1 (en) * 2015-06-18 2018-07-05 International Business Machines Corporation Identification of Target Audience for Content Delivery in Social Networks by Quantifying Semantic Relations and Crowdsourcing
US20180349944A1 (en) * 2017-05-31 2018-12-06 Facebook, Inc. Evaluating content publisher options against benchmark publisher
US20190034535A1 (en) * 2017-07-25 2019-01-31 Yandex Europe Ag Method and system for generating a user-personalization interest parameter for identifying personalized targeted content item
CN110377821A (zh) * 2019-06-18 2019-10-25 深圳壹账通智能科技有限公司 生成兴趣标签的方法、装置、计算机设备和存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8479103B2 (en) * 2009-09-15 2013-07-02 International Business Machines Corporation Visualization of real-time social data informatics
CN107908686B (zh) * 2017-10-31 2020-01-14 Oppo广东移动通信有限公司 信息推送方法、装置、服务器以及可读存储介质

Also Published As

Publication number Publication date
CN110377821A (zh) 2019-10-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20827747

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20827747

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 29/03/2022)
