CN110598090B - Interest tag generation method and device, computer equipment and storage medium

Interest tag generation method and device, computer equipment and storage medium

Info

Publication number
CN110598090B
Authority
CN
China
Prior art keywords
behavior
user
sample
type
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910667166.1A
Other languages
Chinese (zh)
Other versions
CN110598090A (en
Inventor
苏显政
张超亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910667166.1A
Publication of CN110598090A
Application granted
Publication of CN110598090B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of user profiling and provides a method and an apparatus for generating an interest tag, a computer device, and a storage medium. The method comprises: acquiring a user behavior record set in a specified time period; determining, according to the user behavior record set, a user behavior data set corresponding to each behavior type, wherein the user behavior data set comprises mutually corresponding user identifiers, behavior times, and average attribute values; determining, based on the user behavior data set corresponding to each behavior type, a first classification threshold of the behavior times and a second classification threshold of the average attribute value for each behavior type; screening a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type, and an attribute value of a target behavior action object; and determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set in which the screened target user identifier is located, thereby reducing the amount of computation needed to generate the interest tag.

Description

Interest tag generation method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method and an apparatus for generating an interest tag, a computer device, and a storage medium.
Background
With the development and application of the internet, differentiated services such as personalized recommendation and diversified marketing have become a widespread part of daily life, and these services depend on user profiles (user portraits). The core task of user profiling is to generate tags for users. Tagging users makes it possible to analyze and predict user behavior from a macroscopic perspective and improves the accuracy of an enterprise's marketing toward specific users.
Currently, to ensure the accuracy of the tags in a user profile, most tag generation methods need to acquire a large amount of user profile data. These methods therefore suffer from both the large volume of data required and the heavy computation that this volume entails.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method, an apparatus, a computer device and a storage medium for generating an interest tag.
A method of generating an interest tag, the method comprising:
acquiring a user behavior record set in a specified time period, wherein user behavior records in the user behavior record set comprise user identifications, behavior types and attribute values of behavior action objects;
determining a user behavior data set corresponding to each behavior type based on the user behavior record set, wherein data in the user behavior data set is used for describing the corresponding relation among user identification, behavior times and average attribute values;
respectively determining a first classification threshold value of behavior times corresponding to each behavior type and a second classification threshold value of an average attribute value based on the user behavior data set;
screening a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type and an attribute value of a target behavior action object;
and determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set of the screened target user identifier.
In one embodiment, the determining, based on the user behavior data set, the first classification threshold of the behavior times and the second classification threshold of the average attribute value corresponding to each behavior type respectively includes:
respectively sorting the behavior times and the average attribute value corresponding to each behavior type in ascending order based on the user behavior data sets corresponding to each behavior type to obtain a sorting result of the behavior times and a sorting result of the average attribute value;
respectively calculating a first quantile of each behavior times value and a second quantile of each average attribute value, which respectively correspond to each behavior type, according to the sorting result of the behavior times and the sorting result of the average attribute value;
and respectively determining a first classification threshold value of behavior times and a second classification threshold value of an average attribute value corresponding to each behavior type according to the first quantile and the second quantile.
In one embodiment, the calculating, according to the sorting result of the behavior times and the sorting result of the average attribute value, a first quantile of each behavior times value and a second quantile of each average attribute value respectively corresponding to each behavior type includes:
determining, according to the sorting result of the behavior times and the sorting result of the average attribute value, a first occurrence probability of each behavior times value corresponding to each behavior type in the corresponding sorting result, and a second occurrence probability of each average attribute value corresponding to each behavior type in the corresponding sorting result;
determining a first cumulative probability of each behavior times value corresponding to each behavior type according to the first occurrence probability, to obtain the first quantile of each behavior times value corresponding to each behavior type;
and determining a second cumulative probability of each average attribute value corresponding to each behavior type according to the second occurrence probability, to obtain the second quantile of each average attribute value corresponding to each behavior type.
In one embodiment, the calculating, according to the sorting result of the behavior times and the sorting result of the average attribute value, a first quantile of each behavior times value and a second quantile of each average attribute value respectively corresponding to each behavior type includes:
acquiring the rank position of each behavior times value corresponding to each behavior type in the sorting result, the rank position of each average attribute value corresponding to each behavior type in the sorting result, and the number of ranked users corresponding to each behavior type;
dividing the rank position of each behavior times value corresponding to each behavior type by the number of ranked users to obtain the first quantile of each behavior times value corresponding to each behavior type;
and dividing the rank position of each average attribute value corresponding to each behavior type by the number of ranked users to obtain the second quantile of each average attribute value corresponding to each behavior type.
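The rank-based embodiment above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `rank_quantiles` is hypothetical, and the handling of tied values (a tied value keeps its highest rank position) is an assumption, since the text does not say how ties are treated.

```python
def rank_quantiles(values):
    """Map each distinct value to (rank position) / (number of ranked users).

    Assumption: values are sorted in ascending order and rank positions
    start at 1; a tied value keeps its highest rank position.
    """
    ordered = sorted(values)  # ascending order, as in the embodiment
    n = len(ordered)          # number of ranked users
    quantiles = {}
    for position, value in enumerate(ordered, start=1):
        quantiles[value] = position / n  # later ties overwrite earlier ones
    return quantiles


# Hypothetical behavior times of four users for one behavior type.
counts = [1, 2, 2, 5]
print(rank_quantiles(counts))  # {1: 0.25, 2: 0.75, 5: 1.0}
```

The same function applies unchanged to the average attribute values, which yields the second quantile.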
In one embodiment, the determining, according to the first quantile and the second quantile, a first classification threshold of the behavior times and a second classification threshold of the average attribute value respectively corresponding to each behavior type includes:
for each behavior type, respectively screening out, from the first quantiles and the second quantiles, the first quantiles that are greater than or equal to a corresponding first preset threshold and the second quantiles that are greater than or equal to a corresponding second preset threshold;
for each behavior type, respectively calculating a first difference between adjacent first quantiles and a second difference between adjacent second quantiles according to the screened first quantiles and second quantiles;
acquiring, for each behavior type, the first quantile corresponding to the maximum first difference, to obtain the first classification threshold of the behavior times corresponding to each behavior type;
and acquiring, for each behavior type, the second quantile corresponding to the maximum second difference, to obtain the second classification threshold of the average attribute value corresponding to each behavior type.
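The threshold selection described above can be sketched as follows. This is an illustration under stated assumptions: the function name `gap_threshold` is hypothetical, and the text does not say which of the two adjacent quantiles "corresponds to" the maximum difference, so the sketch assumes the upper one.

```python
def gap_threshold(quantiles, preset_threshold):
    """Choose a classification threshold at the largest gap between
    adjacent quantiles that are >= preset_threshold.

    Assumption: the upper quantile of the widest adjacent pair is used.
    """
    kept = sorted({q for q in quantiles if q >= preset_threshold})
    if not kept:
        return None       # nothing passes the preset threshold
    if len(kept) == 1:
        return kept[0]    # no adjacent pair to compare
    # One (difference, upper quantile) pair per adjacent pair of quantiles.
    gaps = [(kept[i + 1] - kept[i], kept[i + 1]) for i in range(len(kept) - 1)]
    _, threshold = max(gaps, key=lambda gap: gap[0])
    return threshold


# Hypothetical first quantiles for one behavior type.
first_quantiles = [0.1, 0.3, 0.5, 0.55, 0.6, 0.9, 1.0]
print(gap_threshold(first_quantiles, preset_threshold=0.5))  # 0.9
```

In this example the widest gap among the retained quantiles lies between 0.6 and 0.9, so 0.9 is returned as the first classification threshold. The second classification threshold is obtained the same way from the second quantiles.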
In one embodiment, the filtering out the target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold includes:
acquiring a user behavior record sample set of a known interest tag;
respectively adjusting the first classification threshold value and the second classification threshold value according to the user behavior record sample set;
and performing condition screening according to the user behavior data set and the adjusted first classification threshold and the adjusted second classification threshold to screen out a target user behavior data set.
In one embodiment, the user behavior record sample in the user behavior record sample set comprises a sample user identifier, an interest tag, a sample behavior type and a sample attribute value of a sample behavior action object;
the respectively adjusting the first classification threshold and the second classification threshold according to the user behavior record sample set includes:
determining a sample user behavior data set corresponding to each sample behavior type according to the user behavior record sample set and the known interest label, wherein the sample user behavior data set comprises corresponding sample user identification, interest labels, sample behavior times and average sample attribute values;
calculating a first quantile of each sample behavior times value and a second quantile of each average sample attribute value corresponding to each sample behavior type, based on the sample user behavior data set of known interest tags corresponding to each sample behavior type;
screening out a target sample user behavior data set from the sample user behavior data set of known interest tags according to the first classification threshold and the second classification threshold, wherein the target sample user behavior data set comprises a target sample user identifier, a target sample behavior type, and an attribute value of a target sample behavior action object;
determining a predicted interest tag corresponding to the screened target sample user identifier according to the sample behavior type corresponding to the sample user behavior data set in which the screened target sample user identifier is located;
and calculating a recall ratio for each sample behavior type from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set, and adjusting the first classification threshold and the second classification threshold according to the recall ratio.
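The recall-based adjustment can be sketched as follows. The recall computation follows the standard definition (correctly predicted tags divided by known tags, per behavior type). The adjustment rule is purely an assumption for illustration: the text does not state how the thresholds are changed, so the hypothetical `adjust_threshold` lowers a threshold by a fixed step when recall falls below a target, which presumes that screening keeps users at or above the threshold.

```python
def recall_per_type(known, predicted):
    """Recall for each interest tag (behavior type).

    known / predicted: dict mapping user_id -> set of interest tags.
    """
    hits, totals = {}, {}
    for user_id, tags in known.items():
        for tag in tags:
            totals[tag] = totals.get(tag, 0) + 1
            if tag in predicted.get(user_id, set()):
                hits[tag] = hits.get(tag, 0) + 1
    return {tag: hits.get(tag, 0) / totals[tag] for tag in totals}


def adjust_threshold(threshold, recall, target_recall=0.8, step=0.05):
    """Hypothetical rule: relax the threshold when recall is too low."""
    if recall < target_recall:
        return max(0.0, round(threshold - step, 10))
    return threshold


# Hypothetical sample users with known and predicted interest tags.
known = {"u1": {"air_ticket"}, "u2": {"air_ticket"}, "u3": {"train_ticket"}}
predicted = {"u1": {"air_ticket"}, "u3": {"train_ticket"}}

recall = recall_per_type(known, predicted)
print(recall)  # {'air_ticket': 0.5, 'train_ticket': 1.0}
print(adjust_threshold(0.9, recall["air_ticket"]))    # 0.85
print(adjust_threshold(0.9, recall["train_ticket"]))  # 0.9
```

The target recall of 0.8 and the step of 0.05 are placeholder values, not figures from the text.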
An apparatus for generating an interest tag, the apparatus comprising:
the behavior record acquisition module is used for acquiring a user behavior record set in a specified time period, wherein the user behavior record in the user behavior record set comprises a user identifier, a behavior type and an attribute value of a behavior action object;
a behavior data set determining module, configured to determine, based on the user behavior record set, a user behavior data set corresponding to each behavior type, where data in the user behavior data set is used to describe a correspondence between a user identifier, behavior times, and an average attribute value;
a classification threshold determination module, configured to determine, based on the user behavior data set, a first classification threshold of the behavior times and a second classification threshold of the average attribute value corresponding to each behavior type;
a target user identification screening module, configured to screen a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, where the target user behavior data set includes a target user identification, a target behavior type, and an attribute value of a target behavior action object;
and the interest tag generation module is used for determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set of the screened target user identifier.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the method for generating an interest tag when the computer program is executed.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method of generating an interest tag.
According to the above method, apparatus, computer device, and storage medium for generating an interest tag, the user behavior data set corresponding to each behavior type is determined from the user behavior record set acquired in the specified time period, so that user behavior is represented with less data and fewer data dimensions and a smaller data volume is used for determining users' interest tags. Further, a first classification threshold of the behavior times and a second classification threshold of the average attribute value are determined for each behavior type, and the user behavior data set is conditionally screened according to the first classification threshold and the second classification threshold to screen out the target user identifier. The interest tag of the target user identifier is then determined according to the behavior type corresponding to the user behavior data set in which the screened target user identifier is located. This further reduces the amount of computation needed to generate interest tags while ensuring the accuracy of the interest tags generated for each behavior type.
Drawings
FIG. 1 is a diagram illustrating an application environment of a method for generating an interest tag in one embodiment;
FIG. 2 is a flowchart illustrating a method for generating an interest tag in an embodiment;
FIG. 3 is a block diagram of an apparatus for generating an interest tag in one embodiment;
FIG. 4 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The interest tag generation method provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 and the server 104 communicate over a network.
The server 104 acquires a user behavior record set in a specified time period, where the user behavior record set may be generated when triggered by the terminal 102. The server 104 determines the user behavior data set corresponding to each behavior type according to the user behavior record set and, based on those data sets, determines a first classification threshold of the behavior times and a second classification threshold of the average attribute value corresponding to each behavior type. The server 104 then performs conditional screening on the user behavior data set according to the first classification threshold and the second classification threshold to screen out a target user behavior data set, which comprises a target user identifier, a target behavior type, and an attribute value of a target behavior action object, and determines the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set in which that identifier is located. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a method for generating an interest tag is provided. The method is described by taking its application to the server in fig. 1 as an example, and includes the following steps:
Step S202, a user behavior record set in a specified time period is obtained, and user behavior records in the user behavior record set comprise user identifications, behavior types and attribute values of behavior action objects.
The user behavior record set is composed of user behavior records, each of which comprises a user identifier, a behavior type, and an attribute value of a behavior action object. The user identifier is a unique identifier that distinguishes each user and may be a user ID (identification). The behavior type distinguishes the various behaviors of a user within the specified time period, for example the behavior type of purchasing air tickets. The attribute value of the behavior action object refers to the resource attribute of the object on which the user behavior acts; for example, when a user purchases an air ticket priced at 1000 yuan, the behavior action object is the purchased air ticket and its attribute value is 1000 yuan.
Specifically, the terminal is triggered to generate a user behavior record set, and the generated user behavior record set is transmitted to the server through the network and stored in the database. The server can directly acquire the user behavior record set in the appointed time period from the terminal, and also can acquire the user behavior record set in the appointed time period from the database.
For example, Xiaoming purchased an air ticket on January 15, 2018, at a fare of 1000 RMB; the user behavior record can be expressed as an array: (Xiaoming, air ticket purchase, 1000 RMB, January 15, 2018).
Step S204, based on the user behavior record set, determining a user behavior data set corresponding to each behavior type, wherein data in the user behavior data set is used for describing the corresponding relation among the user identification, the behavior times and the average attribute value.
A user behavior data set is derived from the user behavior record set for each behavior type, and each user behavior data set comprises mutually corresponding user identifiers, behavior times, and average attribute values. The behavior times is the total number of times a user acts on the same behavior action object within the specified time period. The average attribute value is the sum of all attribute values of the same behavior action object divided by the total number of occurrences of that object within the specified time period.
Specifically, the server classifies the acquired user behavior records by behavior type to obtain the user behavior data set corresponding to each behavior type. The classification may employ a Support Vector Machine (SVM) method, a neural network method, or a deep learning method.
To illustrate, suppose a user's behavior records within one year are: an air ticket purchased on May 1 for 500 yuan; an air ticket purchased on October 1 for 900 yuan; a train ticket purchased on December 1 for 200 yuan; and a train ticket purchased on September 1 for 300 yuan. Then, for the behavior type of purchasing air tickets, the user's average attribute value is (500 + 900) / 2 = 700 yuan and the behavior times is 2.
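The aggregation in step S204 can be sketched as follows, using the four ticket purchases from the example. This is a minimal sketch under assumptions: the record layout `(user_id, behavior_type, attribute_value)`, the identifier `"u1"`, and the function name `build_behavior_data_sets` are hypothetical, and the behavior type is assumed to be already present on each record rather than produced by a classifier.

```python
from collections import defaultdict

# Hypothetical record layout: (user_id, behavior_type, attribute_value).
records = [
    ("u1", "air_ticket", 500.0),
    ("u1", "air_ticket", 900.0),
    ("u1", "train_ticket", 200.0),
    ("u1", "train_ticket", 300.0),
]


def build_behavior_data_sets(records):
    """Group records by behavior type and user into (behavior times, average)."""
    # totals[behavior_type][user_id] = [count, sum of attribute values]
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for user_id, behavior_type, value in records:
        entry = totals[behavior_type][user_id]
        entry[0] += 1
        entry[1] += value
    return {
        behavior_type: {
            user_id: (count, total / count)
            for user_id, (count, total) in users.items()
        }
        for behavior_type, users in totals.items()
    }


data_sets = build_behavior_data_sets(records)
print(data_sets["air_ticket"]["u1"])    # (2, 700.0)
print(data_sets["train_ticket"]["u1"])  # (2, 250.0)
```

Each user is reduced to two numbers per behavior type, which is the reduction in data volume that the method relies on.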
Step S206, respectively determining a first classification threshold value of behavior times corresponding to each behavior type and a second classification threshold value of an average attribute value based on the user behavior data set.
The first classification threshold is a condition for distinguishing the degree of popularity ("heat") of the behavior times within the behavior type to which it belongs; whether a behavior times value conforms to that behavior type can be judged against the first classification threshold. Similarly, the second classification threshold is a condition for distinguishing the degree of popularity of the average attribute value within the behavior type to which it belongs; whether an average attribute value conforms to that behavior type can be judged against the second classification threshold. Both classification thresholds represent the proportion that an individual user's behavior occupies within the overall user behavior under the same behavior type.
Specifically, from the user behavior data sets obtained by classification, the server determines, for each behavior type, a first classification threshold of the behavior times and a second classification threshold of the average attribute value. The first classification threshold is then used to judge whether a behavior times value conforms to its behavior type, and the second classification threshold is used in the same way for an average attribute value.
Step S208, screening a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type and an attribute value of a target behavior action object.
Specifically, for a user behavior data set corresponding to each behavior type, the server performs conditional screening according to a first classification threshold and a second classification threshold corresponding to the behavior type where the user behavior data set is located on the basis of the user behavior data set, so as to screen a target user behavior data set from the belonging user behavior data set, wherein the target user behavior data set includes a target user identifier, a target behavior type, and an attribute value of a target behavior action object.
Step S210, determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set where the screened target user identifier is located.
An interest tag is a marker indicating that a user tends toward a certain type of behavior; for example, if a user often purchases air tickets, the user's corresponding interest tag may be "purchases air tickets".
Specifically, for each eligible target user identifier screened out of a user behavior data set, the server obtains the behavior type corresponding to the data set in which that identifier is located; that is, the interest tag of the target user identifier is the corresponding behavior type.
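Steps S208 and S210 can be sketched together as follows. This is an illustration under assumptions: the text does not spell out the screening condition, so the sketch assumes a user is kept when both the quantile of the user's behavior times and the quantile of the user's average attribute value are greater than or equal to the respective classification thresholds. The function name `assign_interest_tags` and all data values are hypothetical.

```python
def assign_interest_tags(quantiles, thresholds):
    """Screen users per behavior type and tag them with that type.

    quantiles:  {behavior_type: {user_id: (first_quantile, second_quantile)}}
    thresholds: {behavior_type: (first_threshold, second_threshold)}
    Assumption: a user passes when both quantiles reach their thresholds.
    """
    tags = {}
    for behavior_type, users in quantiles.items():
        first_threshold, second_threshold = thresholds[behavior_type]
        for user_id, (first_q, second_q) in users.items():
            if first_q >= first_threshold and second_q >= second_threshold:
                # The interest tag is the behavior type of the data set.
                tags.setdefault(user_id, []).append(behavior_type)
    return tags


# Hypothetical quantiles and thresholds for two behavior types.
quantiles = {
    "air_ticket": {"u1": (0.95, 0.90), "u2": (0.40, 0.97)},
    "train_ticket": {"u1": (0.30, 0.50), "u3": (1.00, 0.85)},
}
thresholds = {"air_ticket": (0.9, 0.8), "train_ticket": (0.9, 0.8)}

print(assign_interest_tags(quantiles, thresholds))
# {'u1': ['air_ticket'], 'u3': ['train_ticket']}
```

Because screening is a pair of comparisons per user and behavior type, tagging needs no model inference once the thresholds are known.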
In the above embodiment, the user behavior data set corresponding to each behavior type is determined from the user behavior record set acquired in the specified time period, so that user behavior is represented with less data and fewer data dimensions and a smaller data volume is used for determining users' interest tags. Further, a first classification threshold of the behavior times and a second classification threshold of the average attribute value are determined for each behavior type, and the user behavior data set is conditionally screened according to the first classification threshold and the second classification threshold to screen out the target user identifier. The interest tag of the target user identifier is then determined according to the behavior type corresponding to the user behavior data set in which the screened target user identifier is located. This further reduces the amount of computation needed to generate interest tags while ensuring the accuracy of the interest tags generated for each behavior type.
In one embodiment, determining the first classification threshold of the behavior times and the second classification threshold of the average attribute value for each behavior type, based on the user behavior data sets respectively corresponding to the behavior types, comprises the following steps: respectively sorting the behavior times and the average attribute values corresponding to each behavior type in ascending order, based on the user behavior data set corresponding to each behavior type, to obtain a sorting result of the behavior times and a sorting result of the average attribute values; respectively calculating a first quantile of each behavior times value and a second quantile of each average attribute value corresponding to each behavior type according to the sorting result of the behavior times and the sorting result of the average attribute values; and respectively determining the first classification threshold of the behavior times and the second classification threshold of the average attribute value corresponding to each behavior type according to the first quantile and the second quantile.
Here, quantile means the following: in a discrete data set, the quantile of a data value a is the sum of the probabilities of all data satisfying X <= a, that is, the cumulative probability P(X <= a). The quantile is greater than 0 and less than or equal to 1.
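Written out as a formula, the definition reads as follows. The symbol q(a) for the quantile of a, and the four-value data set in the worked example, are introduced here for illustration only.

```latex
q(a) \;=\; P(X \le a) \;=\; \sum_{x \le a} P(X = x), \qquad 0 < q(a) \le 1 .

% Worked example for the data set \{1, 2, 2, 5\}:
% P(X = 1) = 1/4, \quad P(X = 2) = 2/4, \quad P(X = 5) = 1/4
q(2) \;=\; P(X = 1) + P(X = 2) \;=\; \tfrac{1}{4} + \tfrac{2}{4} \;=\; \tfrac{3}{4}
```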
Specifically, based on the acquired user behavior data sets corresponding to the behavior types, the server sorts the behavior times and the average attribute values corresponding to each behavior type in ascending order, obtaining a sorting result of the behavior times and a sorting result of the average attribute values. From the sorting result of the behavior times, the server calculates the first quantile of each behavior times value corresponding to each behavior type and determines the first classification threshold of the behavior times for each behavior type according to the first quantile. Similarly, from the sorting result of the average attribute values, the server calculates the second quantile of each average attribute value corresponding to each behavior type and determines the second classification threshold of the average attribute value for each behavior type according to the second quantile. The first classification threshold and the second classification threshold may take values between 0 and 1, and may equal 1.
In this embodiment, the behavior times and the average attribute values corresponding to each behavior type are sorted in ascending order to obtain their respective sorting results; the first quantile of each behavior times value and the second quantile of each average attribute value corresponding to each behavior type are then calculated from the sorting results, and the classification thresholds of each behavior type are determined from the first quantile and the second quantile. Because the classification thresholds are determined from the overall quantile distribution of each behavior type, the overall distribution is fully taken into account, which provides a basis for the subsequent generation of interest tags.
In one embodiment, the calculating the first score of each behavior time and the second score of each average attribute value respectively corresponding to each behavior type according to the ranking result of the behavior times and the ranking result of the average attribute value includes: determining a first occurrence probability of each action time corresponding to each action type in the corresponding sequencing result according to the sequencing result of the action times and the sequencing result of the average attribute value, and determining a second occurrence probability of each average attribute value corresponding to each action type in the corresponding sequencing result; determining a first cumulative probability of each behavior frequency corresponding to each behavior type according to the first occurrence probability to obtain a first fraction of each behavior frequency corresponding to each behavior type; and determining a second cumulative probability of each average attribute value corresponding to each behavior type according to the second occurrence probability to obtain a second score of each average attribute value corresponding to each behavior type.
The first occurrence probability is the probability that a given behavior time appears in the user behavior data set corresponding to a certain behavior type. Similarly, the second occurrence probability is the probability that a given average attribute value appears in the user behavior data set corresponding to a certain behavior type. The first cumulative probability of a given behavior time is the sum, within the user behavior data set corresponding to a certain behavior type, of the first occurrence probabilities of all behavior times that do not exceed that behavior time; similarly, the second cumulative probability of a given average attribute value is the sum, within the user behavior data set corresponding to a certain behavior type, of the second occurrence probabilities of all average attribute values that do not exceed that average attribute value.
Specifically, according to the sorting result of the behavior times and the sorting result of the average attribute values corresponding to each behavior type, the server calculates the first occurrence probability with which each behavior time corresponding to each behavior type appears in the corresponding sorting result, and the second occurrence probability with which each average attribute value corresponding to each behavior type appears in the corresponding sorting result. From the first occurrence probability, the server determines the first cumulative probability of each behavior time corresponding to each behavior type; this first cumulative probability is the first quantile of the corresponding behavior time. Similarly, from the second occurrence probability, the server determines the second cumulative probability of each average attribute value corresponding to each behavior type; this second cumulative probability is the second quantile of the corresponding average attribute value.
For example, the user behavior data set of a certain behavior type includes the behavior times and the average attribute values, which are each sorted in ascending order to obtain a sorting result of the behavior times and a sorting result of the average attribute values. Suppose the sorting result of the behavior times is: 1, 1, 2, 2, 3, 4, 5, 6, 7, 8. The first occurrence probability of the behavior time 1 is P(1) = 2/10, the first occurrence probability of the behavior time 2 is P(2) = 2/10, and the first occurrence probability of the behavior time 3 is P(3) = 1/10. The cumulative probability of the behavior time 3 is therefore P(1) + P(2) + P(3) = 5/10, that is, the quantile of the behavior time 3 is 50%.
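The cumulative-probability calculation in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment; the function name `cumulative_quantiles` and the use of exact fractions are assumptions made for the illustration.

```python
from collections import Counter
from fractions import Fraction


def cumulative_quantiles(values):
    """Map each distinct value to its cumulative probability P(X <= value).

    Hypothetical helper: the occurrence probability of a value is its count
    divided by the number of entries, and the quantile is the running sum of
    the occurrence probabilities taken in ascending order.
    """
    total = len(values)
    counts = Counter(values)
    quantiles = {}
    cumulative = Fraction(0)
    for value in sorted(counts):  # ascending order
        cumulative += Fraction(counts[value], total)  # add occurrence probability
        quantiles[value] = cumulative
    return quantiles


behavior_times = [1, 1, 2, 2, 3, 4, 5, 6, 7, 8]
q = cumulative_quantiles(behavior_times)
print(float(q[3]))  # 0.5
```

With the sorting result above, the behavior time 3 receives the quantile 0.5, matching the 50% stated in the example.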
In this embodiment, the first occurrence probability of each behavior time corresponding to each behavior type is determined from the sorting result of the behavior times, and the first cumulative probability of each behavior time is then obtained from the first occurrence probability, which yields the first quantile of each behavior time corresponding to each behavior type. The second quantile of each average attribute value corresponding to each behavior type is obtained in the same way. Because the quantile is calculated from the cumulative probability, it reflects the proportion of the whole that each individual of a behavior type accounts for, takes the relationship among the data into account, and further reduces the amount of calculation.
In one embodiment, calculating the first quantile of each behavior time and the second quantile of each average attribute value corresponding to each behavior type according to the sorting result of the behavior times and the sorting result of the average attribute values includes the following steps: acquiring the ranking position of each behavior time corresponding to each behavior type in its sorting result, the ranking position of each average attribute value corresponding to each behavior type in its sorting result, and the number of ranked users corresponding to each behavior type; dividing the ranking position of each behavior time corresponding to each behavior type by the number of ranked users to obtain the first quantile of each behavior time corresponding to each behavior type; and dividing the ranking position of each average attribute value corresponding to each behavior type by the number of ranked users to obtain the second quantile of each average attribute value corresponding to each behavior type.
The ranking position is the position that an element occupies in a data set after the elements of the data set have been sorted according to a certain logic. The number of ranked users is the total number of elements in the data set.
Specifically, based on the sorting result of the behavior times and the sorting result of the average attribute values corresponding to each behavior type, the server obtains the ranking position of each behavior time in the sorting result of the behavior times, the ranking position of each average attribute value in the sorting result of the average attribute values, and the number of ranked users corresponding to each behavior type. The server then divides the ranking position of each behavior time corresponding to each behavior type by the number of ranked users of that behavior type; the result is the first quantile of that behavior time. Similarly, the server divides the ranking position of each average attribute value corresponding to each behavior type by the number of ranked users of that behavior type; the result is the second quantile of that average attribute value.
For example, the user behavior data set of a certain behavior type includes the behavior times and the average attribute values, which are each sorted in ascending order to obtain a sorting result of the behavior times and a sorting result of the average attribute values. If the ranking position of a behavior time a in the corresponding sorting result is 5, and the number of ranked users of the behavior type to which the behavior time a belongs is 10, the first quantile of the behavior time a is 5/10 × 100%, that is, 50%. As a further example, if the sorting result of the behavior times is 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, the behavior time 6 occupies ranking position 7 among 10 users, so its first quantile is 70%.
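The position-based calculation can likewise be sketched in Python. The sketch below is illustrative only; the name `rank_quantiles` is hypothetical, ranking positions are assumed to start at 1, and tied values are assumed to occupy consecutive positions, which the embodiment does not specify.

```python
from fractions import Fraction


def rank_quantiles(values):
    """Return (value, quantile) pairs with quantile = ranking position / number of ranked users.

    Assumptions for illustration: positions are 1-based after an ascending
    sort, and equal values simply take consecutive positions.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(value, Fraction(position, n))
            for position, value in enumerate(ordered, start=1)]


pairs = rank_quantiles([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
quantile_of = dict(pairs)
print(quantile_of[6])  # 7/10
```

Only one sort and one division per element are needed, which is consistent with the reduced amount of calculation attributed to this variant.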
In this embodiment, the first quantile of each behavior time corresponding to each behavior type is determined from the ranking position of that behavior time in its sorting result and the number of ranked users corresponding to the behavior type; the second quantile of each average attribute value corresponding to each behavior type is determined in the same way. Determining the quantile from the ranking position and the number of ranked users further reduces the amount of calculation on the computer, which increases the calculation speed and the rate at which interest tags are generated.
In one embodiment, determining the first classification threshold of the behavior times and the second classification threshold of the average attribute value corresponding to each behavior type according to the first quantile and the second quantile includes the following steps: for each behavior type, screening out the first quantiles that are greater than or equal to a corresponding first preset threshold and the second quantiles that are greater than or equal to a corresponding second preset threshold; for each behavior type, calculating a first difference between adjacent first quantiles and a second difference between adjacent second quantiles among the screened first quantiles and second quantiles; acquiring the first quantile corresponding to the maximum first difference calculated for each behavior type, to obtain the first classification threshold of the behavior times corresponding to each behavior type; and acquiring the second quantile corresponding to the maximum second difference calculated for each behavior type, to obtain the second classification threshold of the average attribute value corresponding to each behavior type.
The preset threshold is a threshold, set in advance, against which the quantiles are judged, and it can be stored in a database; the first preset threshold is the threshold for the first quantiles corresponding to each behavior type, and the second preset threshold is the threshold for the second quantiles corresponding to each behavior type. The difference is the result of subtracting one value from another; the first difference is the result of subtracting two adjacent first quantiles, and the second difference is the result of subtracting two adjacent second quantiles.
Specifically, given the calculated first quantile of each behavior time and second quantile of each average attribute value corresponding to each behavior type, the server obtains, for each behavior type, the corresponding first preset threshold from the database and screens out the first quantiles that are greater than or equal to that first preset threshold; similarly, the server obtains the corresponding second preset threshold from the database and screens out the second quantiles that are greater than or equal to that second preset threshold. For each behavior type, the server then calculates the first difference between every two adjacent first quantiles and the second difference between every two adjacent second quantiles among the screened quantiles. From the first differences calculated for each behavior type, the server obtains the two first quantiles that produce the maximum first difference and takes the one with the later ranking position as the first classification threshold of that behavior type. Similarly, from the second differences calculated for each behavior type, the server obtains the two second quantiles that produce the maximum second difference and takes the one with the later ranking position as the second classification threshold of that behavior type.
In this embodiment, the first classification threshold of the behavior times and the second classification threshold of the average attribute value corresponding to each behavior type are determined from the first quantile and the second quantile: the first quantile at which the distribution within a behavior type shows its largest gap is selected as the first classification threshold of that behavior type, and the second classification threshold is obtained in the same way. The overall distribution characteristics of the data of each behavior type are thereby fully used, which helps ensure the accuracy of the interest tag.
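The threshold selection described in this embodiment can be sketched as follows. The sketch is illustrative only: the name `classification_threshold`, the handling of fewer than two screened quantiles, and the choice of the first maximum when several gaps are equal are assumptions that the embodiment does not spell out.

```python
from fractions import Fraction


def classification_threshold(quantiles, preset_threshold):
    """Pick the quantile that follows the largest gap among screened quantiles.

    Assumptions for illustration: duplicate quantiles are collapsed, None is
    returned when nothing passes the preset threshold, a single surviving
    quantile is returned as is, and ties keep the earliest largest gap.
    """
    kept = sorted({q for q in quantiles if q >= preset_threshold})
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    best_gap, threshold = None, None
    for earlier, later in zip(kept, kept[1:]):
        gap = later - earlier  # difference of adjacent quantiles
        if best_gap is None or gap > best_gap:
            best_gap, threshold = gap, later  # keep the later of the two
    return threshold


first_quantiles = [Fraction(n, 10) for n in (1, 3, 5, 6, 9, 10)]
threshold = classification_threshold(first_quantiles, Fraction(1, 2))
print(threshold)  # 9/10
```

In this hypothetical input, the quantiles 5/10, 6/10, 9/10 and 1 pass the preset threshold of 1/2; the largest gap lies between 6/10 and 9/10, so 9/10 is taken as the classification threshold.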
In one embodiment, screening out a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold includes the following steps: acquiring a user behavior record sample set of known interest tags; adjusting the first classification threshold and the second classification threshold respectively according to the user behavior record sample set; and screening out the target user behavior data set from the user behavior data set according to the adjusted first classification threshold and the adjusted second classification threshold.
The user behavior record sample set comprises a plurality of user behavior record samples. The user behavior data set includes user identifiers, behavior times, and average attribute values that correspond to one another.
Specifically, the server obtains a user behavior record sample set of known interest tags from a database or a terminal, and adjusts the first classification threshold and the second classification threshold corresponding to each behavior type according to the obtained user behavior record sample set. Then, based on the user behavior data set, the server screens the behavior times corresponding to each behavior type against the adjusted first classification threshold and screens the average attribute values corresponding to each behavior type against the adjusted second classification threshold, so as to screen out the target user behavior data set that satisfies both the screening condition on the behavior times and the screening condition on the average attribute values.
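The two-condition screening can be sketched as below. This is a minimal sketch under stated assumptions: the record type `UserBehavior` and the function `screen_target_users` are hypothetical, and the comparison is assumed to be made between each user's quantiles and the thresholds, since the thresholds are described as values between 0 and 1.

```python
from typing import NamedTuple


class UserBehavior(NamedTuple):
    """One row of a user behavior data set for a single behavior type (hypothetical layout)."""
    user_id: str
    times_quantile: float  # first quantile of the user's behavior times
    value_quantile: float  # second quantile of the user's average attribute value


def screen_target_users(data_set, first_threshold, second_threshold):
    """Keep the users that satisfy both screening conditions at once."""
    return [row.user_id for row in data_set
            if row.times_quantile >= first_threshold
            and row.value_quantile >= second_threshold]


data_set = [
    UserBehavior("u1", 0.90, 0.95),
    UserBehavior("u2", 0.90, 0.40),  # fails the average attribute value condition
    UserBehavior("u3", 0.30, 0.99),  # fails the behavior times condition
]
targets = screen_target_users(data_set, first_threshold=0.8, second_threshold=0.8)
print(targets)  # ['u1']
```

Each screened user would then receive the behavior type of the data set as the interest tag, as described for the interest tag generation step.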
In this embodiment, the first classification threshold and the second classification threshold are adjusted based on the user behavior record sample set of known interest tags, so as to obtain the adjusted first classification threshold and the adjusted second classification threshold. Testing the classification thresholds against the user behavior record sample set improves the accuracy of the interest tag.
In one embodiment, each user behavior record sample in the user behavior record sample set comprises a sample user identifier, an interest tag, a sample behavior type, and a sample attribute value of a sample behavior action object. Adjusting the first classification threshold and the second classification threshold respectively according to the user behavior record sample set includes: determining, from the user behavior record sample set and according to the known interest tags, a sample user behavior data set corresponding to each sample behavior type, wherein the sample user behavior data set comprises the corresponding sample user identifier, interest tag, sample behavior times, and average sample attribute value; calculating, based on the sample user behavior data set of known interest tags corresponding to each sample behavior type, a first quantile of each sample behavior time and a second quantile of each average sample attribute value corresponding to each sample behavior type; screening out a target sample user behavior data set from the sample user behavior data set of known tags according to the first classification threshold and the second classification threshold, wherein the target sample user behavior data set comprises a target sample user identifier, a target sample behavior type, and an attribute value of a target sample behavior action object; determining a predicted interest tag corresponding to each screened target sample user identifier according to the sample behavior type corresponding to the sample user behavior data set in which that target sample user identifier is located; and calculating the recall ratio of each sample behavior type from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set, and adjusting the first classification threshold and the second classification threshold according to the recall ratio.
The user behavior record sample set comprises a plurality of user behavior record samples, and each user behavior record sample comprises a sample user identifier, an interest tag, a sample behavior type, and a sample attribute value of a sample behavior action object. The sample user identifier is a unique identifier that distinguishes individual sample users. The sample behavior type is the type to which a behavior of the sample user belongs; the sample behavior types correspond to the behavior types, and the behavior types include all of the sample behavior types. The sample attribute value of the sample behavior action object is the resource attribute that the sample user behavior produces on the action object; for example, if a sample user purchases an airline ticket priced at 1000 yuan, the behavior action object is the purchased airline ticket, and the sample attribute value of the sample behavior action object is 1000 yuan. The sample behavior times are the total number of times the sample user has acted on the same behavior action object. The average sample attribute value is the ratio of the sum of all attribute values of a sample user for the same behavior action object to that total number of times.
The user behavior record sample set comprises sample user behavior data sets corresponding to all behavior types; the sample user behavior data set includes corresponding sample user identifications, interest tags, sample behavior times, and average sample attribute values.
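The derivation of the sample behavior times and the average sample attribute value from individual records can be sketched as follows. The sketch is illustrative only: the function `aggregate_records`, the tuple layout of a record, and the behavior type name `air_ticket` are hypothetical, records are grouped by behavior type and user identifier as a simplification, and the known interest tag is left out for brevity.

```python
from collections import defaultdict


def aggregate_records(records):
    """Aggregate (user_id, behavior_type, attribute_value) records.

    Returns a dict mapping (behavior_type, user_id) to a pair
    (behavior times, average attribute value).
    """
    sums = defaultdict(lambda: [0, 0.0])  # [count, running total]
    for user_id, behavior_type, value in records:
        entry = sums[(behavior_type, user_id)]
        entry[0] += 1
        entry[1] += value
    return {key: (count, total / count)
            for key, (count, total) in sums.items()}


records = [
    ("u1", "air_ticket", 1000),
    ("u1", "air_ticket", 1400),
    ("u2", "air_ticket", 800),
]
data_sets = aggregate_records(records)
print(data_sets[("air_ticket", "u1")])  # (2, 1200.0)
```

Here the sample user u1 purchased two airline tickets for a total of 2400 yuan, giving sample behavior times of 2 and an average sample attribute value of 1200 yuan.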
The interest tag is a mark indicating that a user tends toward a certain type of behavior; for example, if a user often purchases air tickets, the corresponding interest tag of that user may be "purchase air ticket". The predicted interest tag is the interest tag generated by the interest tag generation model. The recall ratio is, for each sample behavior type, the ratio of the number of users whose predicted interest tag is consistent with their known interest tag to the total number of users of that sample behavior type. The closer the recall ratio is to 1, the higher the consistency between the predicted interest tags and the known interest tags of that sample behavior type, and hence the more appropriate the selected first classification threshold and second classification threshold of that sample behavior type.
Specifically, the server obtains a user behavior record sample set of known interest tags from a database or a terminal, and classifies the obtained user behavior record sample set according to the known interest tags to obtain a sample user behavior data set corresponding to each sample behavior type. Based on the sample user behavior data sets obtained through this classification, the server calculates the first quantile of each sample behavior time corresponding to each sample behavior type and the second quantile of each average sample attribute value corresponding to each sample behavior type.
Based on the sample user behavior data set of known tags, the server looks up the first classification threshold and the second classification threshold corresponding to each sample behavior type in the database, and screens the sample user behavior data set according to the thresholds found. When both the sample behavior times and the average sample attribute value in a sample user behavior data set meet the screening condition, a screened target sample user behavior data set is obtained, which comprises a target sample user identifier, a target sample behavior type, and an attribute value of a target sample behavior action object. The screening condition is as follows: for each sample user behavior data set, the sample behavior times are greater than or equal to the first classification threshold, and at the same time the average sample attribute value is greater than or equal to the second classification threshold. That is, a sample user identifier in a given sample behavior type meets the screening condition only when its sample behavior times are greater than or equal to the corresponding first classification threshold and its average sample attribute value is also greater than or equal to the corresponding second classification threshold.
According to each screened target sample user identifier, the server looks up in the database the sample behavior type corresponding to the sample user behavior data set in which that target sample user identifier is located; the sample behavior type found is the predicted interest tag of that target sample user identifier. Then, for each sample behavior type, based on the predicted interest tags and the corresponding known interest tags of the sample user behavior data set, the server judges whether the predicted interest tag of each target sample user identifier is consistent with the known interest tag, records the judgment result with a flag, and stores it in the server. When the tags are consistent, the flag may be 1; otherwise, the flag is 0. For example, in a certain sample behavior type, if the known interest tag of a target sample user identifier is movie and the predicted interest tag is also movie, the record is 1; if the predicted interest tag of that target sample user identifier is meal, the record is 0.
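The 1/0 record and the resulting recall ratio can be sketched as follows. This is an illustrative sketch only; the function name `recall_ratio` and the dictionary layout are assumptions, and a sample user who was not screened out as a target (and so has no predicted interest tag) is assumed to count as 0.

```python
def recall_ratio(known_tags, predicted_tags):
    """Share of sample users of one behavior type whose predicted tag matches the known tag.

    known_tags: dict of sample user id -> known interest tag.
    predicted_tags: dict of sample user id -> predicted interest tag
    (only screened target users have an entry; this layout is hypothetical).
    """
    if not known_tags:
        raise ValueError("at least one sample user is required")
    flags = [1 if predicted_tags.get(user) == tag else 0
             for user, tag in known_tags.items()]
    return sum(flags) / len(flags)


known = {"u1": "movie", "u2": "movie", "u3": "movie", "u4": "movie"}
predicted = {"u1": "movie", "u2": "meal", "u3": "movie"}
print(recall_ratio(known, predicted))  # 0.5
```

In this hypothetical sample, two of the four users have matching tags, so the recall ratio is 0.5.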
According to the recorded results, the server calculates the recall ratio of each sample behavior type, and adjusts the corresponding first classification threshold and second classification threshold according to that recall ratio. If the recall ratio does not fall within the adjustment threshold, the first classification threshold and the second classification threshold need not be adjusted; if the recall ratio falls within the adjustment threshold, the first classification threshold and the second classification threshold are adjusted, the predicted tags of the sample user behavior data set are determined again according to the adjusted thresholds, and the recall ratio of each sample behavior type is recalculated. Adjustment of the corresponding classification thresholds stops once the recall ratio of the user behavior record sample set no longer falls within the adjustment threshold. The adjustment threshold may be set as: the recall ratio is lower than 95%. Optionally, the adjustment may modify at least one of the first classification threshold and the second classification threshold.
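The adjustment loop can be sketched as below. The embodiment does not state how the thresholds are changed on each round, so this sketch makes several assumptions: both thresholds are lowered by a fixed, hypothetical `step`, the comparison is made on quantiles, every sample user is known to carry the tag of the behavior type under test (so the recall ratio equals the share of screened users), and a `max_rounds` guard is added.

```python
def adjust_thresholds(samples, first, second,
                      target_recall=0.95, step=0.05, max_rounds=100):
    """Lower both thresholds until the recall ratio is no longer below the target.

    samples: list of (times_quantile, value_quantile) pairs for sample users
    whose known interest tag is the behavior type being tested.
    Returns (first threshold, second threshold, recall ratio).
    """
    if not samples:
        raise ValueError("at least one sample user is required")

    def recall(t1, t2):
        hits = sum(1 for tq, vq in samples if tq >= t1 and vq >= t2)
        return hits / len(samples)

    for _ in range(max_rounds):
        if recall(first, second) >= target_recall:
            break  # recall no longer falls within the adjustment threshold
        first = max(0.0, first - step)    # assumed adjustment rule
        second = max(0.0, second - step)  # assumed adjustment rule
    return first, second, recall(first, second)


samples = [(0.9, 0.9), (0.8, 0.85), (0.7, 0.75), (0.95, 0.6)]
t1, t2, final_recall = adjust_thresholds(samples, first=0.9, second=0.9)
print(final_recall)
```

Lowering only one of the two thresholds per round would be an equally valid reading of the optional variant in which at least one threshold is adjusted.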
In this embodiment, the first classification threshold and the second classification threshold are adjusted based on a user behavior record sample set of known interest tags: the classification thresholds are adjusted according to the calculated recall ratio of each behavior type until that recall ratio no longer falls within the adjustment threshold. The classification thresholds are thus tested against the user behavior record sample set, and the accuracy of the interest tags is verified by checking the recall ratio, which further improves the accuracy of the interest tags.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 3, there is provided an interest tag generating apparatus 300, including: a behavior record obtaining module 302, a behavior data set determining module 304, a classification threshold determining module 306, a target user screening module 308, and an interest tag generating module 310, wherein:
The behavior record obtaining module 302 is configured to obtain a user behavior record set in a specified time period, where each user behavior record in the user behavior record set includes a user identifier, a behavior type, and an attribute value of a behavior action object.
The behavior data set determining module 304 is configured to determine, based on the user behavior record set, a user behavior data set corresponding to each behavior type, where the data in the user behavior data set describe the correspondence among the user identifier, the behavior times, and the average attribute value.
The classification threshold determining module 306 is configured to determine, based on the user behavior data set, a first classification threshold of the behavior times and a second classification threshold of the average attribute value corresponding to each behavior type.
The target user screening module 308 is configured to screen out a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, where the target user behavior data set includes a target user identifier, a target behavior type, and an attribute value of a target behavior action object.
The interest tag generating module 310 is configured to determine an interest tag corresponding to each screened target user identifier according to the behavior type corresponding to the user behavior data set in which that target user identifier is located.
In one embodiment, the classification threshold determining module includes a sorting module, a quantile calculation module, and a classification threshold calculation module. The sorting module is used for sorting, in ascending order, the behavior times and the average attribute values corresponding to each behavior type based on the user behavior data set corresponding to each behavior type, to obtain a sorting result of the behavior times and a sorting result of the average attribute values; the quantile calculation module is used for calculating a first quantile of each behavior time and a second quantile of each average attribute value corresponding to each behavior type according to the sorting result of the behavior times and the sorting result of the average attribute values; and the classification threshold calculation module is used for determining a first classification threshold of the behavior times and a second classification threshold of the average attribute value corresponding to each behavior type according to the first quantile and the second quantile.
In one embodiment, the quantile calculation module includes a probability calculation module and a cumulative probability calculation module. The probability calculation module is used for determining, according to the sorting result of the behavior times and the sorting result of the average attribute values, a first occurrence probability of each behavior time corresponding to each behavior type in the corresponding sorting result and a second occurrence probability of each average attribute value corresponding to each behavior type in the corresponding sorting result; the cumulative probability calculation module is used for determining a first cumulative probability of each behavior time corresponding to each behavior type according to the first occurrence probability, to obtain the first quantile of each behavior time corresponding to each behavior type, and for determining a second cumulative probability of each average attribute value corresponding to each behavior type according to the second occurrence probability, to obtain the second quantile of each average attribute value corresponding to each behavior type.
In one embodiment, the quantile calculation module includes a data acquisition module and a quantile obtaining module. The data acquisition module is used for acquiring the ranking position of each behavior time corresponding to each behavior type in its sorting result, the ranking position of each average attribute value corresponding to each behavior type in its sorting result, and the number of ranked users corresponding to each behavior type; the quantile obtaining module is used for dividing the ranking position of each behavior time corresponding to each behavior type by the number of ranked users to obtain the first quantile of each behavior time corresponding to each behavior type, and for dividing the ranking position of each average attribute value corresponding to each behavior type by the number of ranked users to obtain the second quantile of each average attribute value corresponding to each behavior type.
In one embodiment, the classification threshold determining module includes a first screening module, a difference calculation module, and a second screening module. The first screening module is used for screening out, for each behavior type, the first quantiles that are greater than or equal to a corresponding first preset threshold and the second quantiles that are greater than or equal to a corresponding second preset threshold; the difference calculation module is used for calculating, for each behavior type, a first difference between adjacent first quantiles and a second difference between adjacent second quantiles among the screened first quantiles and second quantiles; the second screening module is used for acquiring the first quantile corresponding to the maximum first difference calculated for each behavior type, to obtain the first classification threshold of the behavior times corresponding to each behavior type, and for acquiring the second quantile corresponding to the maximum second difference calculated for each behavior type, to obtain the second classification threshold of the average attribute value corresponding to each behavior type.
In one embodiment, the target user screening module includes a behavior record sample acquisition module, a classification threshold adjusting module, and a condition screening module. The behavior record sample acquisition module is used for acquiring a user behavior record sample set of known interest tags; the classification threshold adjusting module is used for adjusting the first classification threshold and the second classification threshold respectively according to the user behavior record sample set; and the condition screening module is used for screening out a target user behavior data set from the user behavior data set according to the adjusted first classification threshold and the adjusted second classification threshold.
In one embodiment, the user behavior record sample in the user behavior record sample set comprises a sample user identification, an interest tag, a sample behavior type and a sample attribute value of a sample behavior action object; the classification threshold adjusting module includes a sample user behavior acquisition module, a sample quantile calculation module, a target sample user identification screening module, a predicted interest tag generation module and a recall ratio calculation module. The sample user behavior acquisition module is used for determining, according to the user behavior record sample set and the known interest tags, a sample user behavior data set corresponding to each sample behavior type, wherein the sample user behavior data set comprises a corresponding sample user identification, an interest tag, sample behavior times and an average sample attribute value; the sample quantile calculation module is used for calculating, based on the sample user behavior data set of known interest tags corresponding to each sample behavior type, a first quantile of each sample behavior times value and a second quantile of each average sample attribute value for each sample behavior type; the target sample user identification screening module is used for screening out a target sample user behavior data set from the sample user behavior data set of known interest tags according to the first classification threshold and the second classification threshold, wherein the target sample user behavior data set comprises a target sample user identification, a target sample behavior type and an attribute value of a target sample behavior action object; the predicted interest tag generation module is used for determining a predicted interest tag corresponding to each screened target sample user identification according to the sample behavior type corresponding to the sample user behavior data set in which that target sample user identification is located; and the recall ratio calculation module is used for adjusting the first classification threshold and the second classification threshold according to the recall ratio of each sample behavior type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set.
For specific limitations of the apparatus for generating the interest tag, reference may be made to the above limitations on the method for generating the interest tag, which are not repeated here. The modules in the apparatus for generating the interest tag may be implemented in whole or in part by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor of the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the user behavior record set, the user behavior data set, the first classification threshold, and the second classification threshold. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of interest tag generation.
Those skilled in the art will appreciate that the architecture shown in Fig. 4 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring a user behavior record set in a specified time period, wherein the user behavior record in the user behavior record set comprises a user identifier, a behavior type and an attribute value of a behavior action object; determining a user behavior data set corresponding to each behavior type based on the user behavior record set, wherein data in the user behavior data set is used for describing the corresponding relation among the user identification, the behavior times and the average attribute value; respectively determining a first classification threshold value of behavior times corresponding to each behavior type and a second classification threshold value of an average attribute value based on the user behavior data set; screening a target user behavior data set from the user behavior data set according to a first classification threshold and a second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type and an attribute value of a target behavior action object; and determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set of the screened target user identifier.
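The second step above (reducing the record set to one data set per behavior type) can be sketched as follows. This is a minimal illustration, not the patented implementation: the tuple layout `(user_id, behavior_type, attribute_value)`, the function name `build_behavior_datasets` and the sample records are all assumptions introduced here.

```python
from collections import defaultdict


def build_behavior_datasets(records):
    """Group records by behavior type, then reduce each user to
    (behavior times, average attribute value).

    `records` is an iterable of (user_id, behavior_type, attribute_value)
    tuples; this layout is an assumption made for the sketch.
    """
    # sums[behavior_type][user_id] = [number of records, attribute total]
    sums = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for user_id, behavior_type, attribute_value in records:
        entry = sums[behavior_type][user_id]
        entry[0] += 1
        entry[1] += attribute_value
    return {
        btype: {uid: (n, total / n) for uid, (n, total) in users.items()}
        for btype, users in sums.items()
    }


# Hypothetical records within the specified time period.
records = [
    ("u1", "purchase", 100.0),
    ("u1", "purchase", 300.0),
    ("u2", "purchase", 50.0),
    ("u1", "browse", 10.0),
]
datasets = build_behavior_datasets(records)
```

Each per-type data set then holds only two numbers per user, which is the reduced representation that the later threshold and screening steps operate on.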
In one embodiment, the processor, when executing the computer program, further performs the steps of: based on the user behavior data set corresponding to each behavior type, sorting the behavior times and the average attribute values corresponding to each behavior type in ascending order, respectively, to obtain a sorting result of the behavior times and a sorting result of the average attribute values; calculating, according to the sorting result of the behavior times and the sorting result of the average attribute values, a first quantile of each behavior times value and a second quantile of each average attribute value for each behavior type; and determining, according to the first quantiles and the second quantiles, a first classification threshold of the behavior times and a second classification threshold of the average attribute value for each behavior type.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining, according to the sorting result of the behavior times and the sorting result of the average attribute values, a first occurrence probability of each behavior times value of each behavior type in the corresponding sorting result, and a second occurrence probability of each average attribute value of each behavior type in the corresponding sorting result; determining, according to the first occurrence probabilities, a first cumulative probability of each behavior times value of each behavior type, to obtain the first quantile of each behavior times value; and determining, according to the second occurrence probabilities, a second cumulative probability of each average attribute value of each behavior type, to obtain the second quantile of each average attribute value.
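As a sketch of the cumulative-probability approach for one behavior type, the quantile of a value can be taken as the running sum of occurrence probabilities over the ascending sorting result. The function name and the sample values are hypothetical, and treating the occurrence probability as "share of users with that value" is an assumption about the intended reading.

```python
from collections import Counter


def quantiles_by_cumulative_probability(values):
    """Map each distinct value to the cumulative probability of all
    values less than or equal to it."""
    counts = Counter(values)
    total = len(values)
    cumulative = 0.0
    quantiles = {}
    for value in sorted(counts):            # ascending order
        cumulative += counts[value] / total  # add occurrence probability
        quantiles[value] = cumulative
    return quantiles


# Hypothetical behavior times of four users for one behavior type.
behavior_times = [1, 1, 2, 5]
first_quantiles = quantiles_by_cumulative_probability(behavior_times)
```

The same function applies unchanged to the average attribute values to obtain the second quantiles.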
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring, for each behavior type, the ranking position of each behavior times value in its sorting result, the ranking position of each average attribute value in its sorting result, and the number of sorted users; dividing the ranking position of each behavior times value by the number of sorted users to obtain the first quantile of each behavior times value for each behavior type; and dividing the ranking position of each average attribute value by the number of sorted users to obtain the second quantile of each average attribute value for each behavior type.
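The rank-based alternative can be illustrated in a few lines. This is a sketch under assumptions: ranking positions are taken to start at 1, ties keep their input order, and the function name and sample data are invented for the example.

```python
def quantiles_by_rank(values_by_user):
    """Quantile of each user = ranking position in the ascending
    sorting result divided by the number of sorted users."""
    ordered = sorted(values_by_user.items(), key=lambda item: item[1])
    n = len(ordered)
    return {
        uid: position / n
        for position, (uid, _) in enumerate(ordered, start=1)
    }


# Hypothetical behavior times per user for one behavior type.
times_by_user = {"u1": 5, "u2": 1, "u3": 2, "u4": 9}
rank_quantiles = quantiles_by_rank(times_by_user)
```

Unlike the cumulative-probability variant, equal values receive different quantiles here, so a tie-breaking rule would need to be chosen in practice.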
In one embodiment, the processor, when executing the computer program, further performs the steps of: screening out, for each behavior type, the first quantiles that are greater than or equal to a corresponding first preset threshold and the second quantiles that are greater than or equal to a corresponding second preset threshold; calculating, for each behavior type, a first difference value between adjacent first quantiles and a second difference value between adjacent second quantiles among the screened first quantiles and second quantiles; acquiring, for each behavior type, the first quantile corresponding to the maximum first difference value to obtain the first classification threshold of the behavior times for that behavior type; and acquiring, for each behavior type, the second quantile corresponding to the maximum second difference value to obtain the second classification threshold of the average attribute value for that behavior type.
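The largest-gap selection described above might look like the following sketch. The text does not say which of the two adjacent quantiles around the maximum difference becomes the threshold; taking the upper one is an assumption made here, as are the function name and the sample quantiles.

```python
def threshold_from_largest_gap(quantiles, preset_threshold):
    """Keep quantiles >= preset_threshold, find the largest difference
    between adjacent kept quantiles, and return the upper quantile of
    that pair (assumption: the upper one is the threshold)."""
    kept = sorted(q for q in set(quantiles) if q >= preset_threshold)
    if len(kept) < 2:
        # Not enough points to form a difference; fall back gracefully.
        return kept[0] if kept else None
    gaps = [(kept[i + 1] - kept[i], kept[i + 1]) for i in range(len(kept) - 1)]
    return max(gaps, key=lambda pair: pair[0])[1]


# Hypothetical first quantiles for one behavior type, preset threshold 0.5.
first_threshold = threshold_from_largest_gap(
    [0.2, 0.5, 0.55, 0.6, 0.9, 1.0], 0.5
)
```

In this example the largest jump is between 0.6 and 0.9, so the sketch returns 0.9; the idea is that a large jump in the quantile sequence marks a natural boundary between ordinary and unusually active users.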
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a user behavior record sample set of a known interest tag; respectively adjusting the first classification threshold value and the second classification threshold value according to the user behavior record sample set; and screening out a target user behavior data set from the user behavior data set according to the adjusted first classification threshold and the adjusted second classification threshold.
In one embodiment, the user behavior record sample in the user behavior record sample set comprises a sample user identification, an interest tag, a sample behavior type and a sample attribute value of a sample behavior action object; respectively adjusting the first classification threshold and the second classification threshold according to the user behavior record sample set comprises: determining, according to the user behavior record sample set and the known interest tags, a sample user behavior data set corresponding to each sample behavior type, wherein the sample user behavior data set comprises a corresponding sample user identification, an interest tag, sample behavior times and an average sample attribute value; calculating, based on the sample user behavior data set of known interest tags corresponding to each sample behavior type, a first quantile of each sample behavior times value and a second quantile of each average sample attribute value for each sample behavior type; screening out a target sample user behavior data set from the sample user behavior data set of known interest tags according to the first classification threshold and the second classification threshold, wherein the target sample user behavior data set comprises a target sample user identification, a target sample behavior type and an attribute value of a target sample behavior action object; determining a predicted interest tag corresponding to each screened target sample user identification according to the sample behavior type corresponding to the sample user behavior data set in which that target sample user identification is located; and adjusting the first classification threshold and the second classification threshold according to the recall ratio of each sample behavior type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set.
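The recall-driven adjustment can be sketched for a single sample behavior type as below. The text only states that the thresholds are adjusted according to the recall ratio; the specific rule used here (multiplying both thresholds by a fixed `step` until a `target_recall` is reached), the greater-than-or-equal comparison, and every name in the code are assumptions chosen purely for illustration.

```python
def recall(predicted_users, labeled_users):
    """Share of users with the known interest tag that were also predicted."""
    if not labeled_users:
        return 0.0
    return len(predicted_users & labeled_users) / len(labeled_users)


def adjust_thresholds(samples, labeled_users, times_thr, value_thr,
                      target_recall=0.8, step=0.9, max_rounds=20):
    """Relax both thresholds until recall on the labeled samples reaches
    target_recall. `samples` maps user id -> (behavior times, average
    attribute value). The relaxation rule is hypothetical."""
    for _ in range(max_rounds):
        predicted = {
            uid for uid, (times, avg) in samples.items()
            if times >= times_thr and avg >= value_thr
        }
        if recall(predicted, labeled_users) >= target_recall:
            break
        times_thr *= step
        value_thr *= step
    return times_thr, value_thr


# Hypothetical labeled samples for one sample behavior type.
samples = {"a": (10, 100.0), "b": (6, 80.0), "c": (1, 5.0)}
labeled = {"a", "b"}  # users known to carry the interest tag
new_times_thr, new_value_thr = adjust_thresholds(samples, labeled, 8, 90.0)
```

A production version would likely also track precision, since lowering thresholds to raise recall admits more users who do not actually hold the interest.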
In this embodiment, the user behavior data set corresponding to each behavior type is determined from the user behavior record set acquired within the specified time period, so that user behavior is characterized with less data and fewer data dimensions, which reduces the data volume needed to determine the user's interest tags. Further, a first classification threshold of the behavior times and a second classification threshold of the average attribute value are determined for each behavior type, and the user behavior data set is conditionally screened according to the first classification threshold and the second classification threshold to screen out the target user identifiers; the interest tag of each target user identifier is then determined according to the behavior type corresponding to the user behavior data set in which the screened target user identifier is located. This further reduces the amount of calculation needed to generate the interest tags while ensuring the accuracy of the interest tags generated for each behavior type.
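The final screening and tagging steps summarized above could be sketched as follows. The direction of the comparison (greater than or equal to both thresholds), the mapping from behavior type to tag name, and all identifiers and data are assumptions for illustration only.

```python
def assign_interest_tags(datasets, thresholds, tag_for_type):
    """Screen each per-type data set with its two thresholds and give
    every surviving user the tag associated with that behavior type.

    datasets:     {behavior_type: {user_id: (behavior times, avg value)}}
    thresholds:   {behavior_type: (first threshold, second threshold)}
    tag_for_type: {behavior_type: tag name}  (hypothetical mapping)
    """
    tags = {}
    for btype, users in datasets.items():
        times_thr, value_thr = thresholds[btype]
        for uid, (times, avg) in users.items():
            if times >= times_thr and avg >= value_thr:
                tags.setdefault(uid, set()).add(tag_for_type[btype])
    return tags


# Hypothetical inputs.
datasets = {
    "purchase": {"u1": (2, 200.0), "u2": (1, 50.0)},
    "browse": {"u1": (1, 10.0), "u3": (7, 30.0)},
}
thresholds = {"purchase": (2, 100.0), "browse": (5, 20.0)}
tag_for_type = {"purchase": "shopper", "browse": "browser"}
tags = assign_interest_tags(datasets, thresholds, tag_for_type)
```

Because a user can pass the screening for several behavior types, the sketch stores a set of tags per user rather than a single tag.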
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, performs the steps of: acquiring a user behavior record set in a specified time period, wherein the user behavior record in the user behavior record set comprises a user identifier, a behavior type and an attribute value of a behavior action object; determining a user behavior data set corresponding to each behavior type based on the user behavior record set, wherein data in the user behavior data set is used for describing the corresponding relation among the user identification, the behavior times and the average attribute value; respectively determining a first classification threshold value of behavior times corresponding to each behavior type and a second classification threshold value of an average attribute value based on the user behavior data set; screening a target user behavior data set from the user behavior data set according to a first classification threshold and a second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type and an attribute value of a target behavior action object; and determining the interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set of the screened target user identifier.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: based on the user behavior data set corresponding to each behavior type, sorting the behavior times and the average attribute values corresponding to each behavior type in ascending order, respectively, to obtain a sorting result of the behavior times and a sorting result of the average attribute values; calculating, according to the sorting result of the behavior times and the sorting result of the average attribute values, a first quantile of each behavior times value and a second quantile of each average attribute value for each behavior type; and determining, according to the first quantiles and the second quantiles, a first classification threshold of the behavior times and a second classification threshold of the average attribute value for each behavior type.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: determining, according to the sorting result of the behavior times and the sorting result of the average attribute values, a first occurrence probability of each behavior times value of each behavior type in the corresponding sorting result, and a second occurrence probability of each average attribute value of each behavior type in the corresponding sorting result; determining, according to the first occurrence probabilities, a first cumulative probability of each behavior times value of each behavior type, to obtain the first quantile of each behavior times value; and determining, according to the second occurrence probabilities, a second cumulative probability of each average attribute value of each behavior type, to obtain the second quantile of each average attribute value.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: acquiring, for each behavior type, the ranking position of each behavior times value in its sorting result, the ranking position of each average attribute value in its sorting result, and the number of sorted users; dividing the ranking position of each behavior times value by the number of sorted users to obtain the first quantile of each behavior times value for each behavior type; and dividing the ranking position of each average attribute value by the number of sorted users to obtain the second quantile of each average attribute value for each behavior type.
In one embodiment, the computer program, when executed by the processor, further performs the steps of: screening out, for each behavior type, the first quantiles that are greater than or equal to a corresponding first preset threshold and the second quantiles that are greater than or equal to a corresponding second preset threshold; calculating, for each behavior type, a first difference value between adjacent first quantiles and a second difference value between adjacent second quantiles among the screened first quantiles and second quantiles; acquiring, for each behavior type, the first quantile corresponding to the maximum first difference value to obtain the first classification threshold of the behavior times for that behavior type; and acquiring, for each behavior type, the second quantile corresponding to the maximum second difference value to obtain the second classification threshold of the average attribute value for that behavior type.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a user behavior record sample set of a known interest tag; respectively adjusting the first classification threshold value and the second classification threshold value according to the user behavior record sample set; and screening out a target user behavior data set from the user behavior data set according to the adjusted first classification threshold value and the adjusted second classification threshold value.
In one embodiment, the user behavior record sample in the user behavior record sample set comprises a sample user identification, an interest tag, a sample behavior type and a sample attribute value of a sample behavior action object; respectively adjusting the first classification threshold and the second classification threshold according to the user behavior record sample set comprises: determining, according to the user behavior record sample set and the known interest tags, a sample user behavior data set corresponding to each sample behavior type, wherein the sample user behavior data set comprises a corresponding sample user identification, an interest tag, sample behavior times and an average sample attribute value; calculating, based on the sample user behavior data set of known interest tags corresponding to each sample behavior type, a first quantile of each sample behavior times value and a second quantile of each average sample attribute value for each sample behavior type; screening out a target sample user behavior data set from the sample user behavior data set of known interest tags according to the first classification threshold and the second classification threshold, wherein the target sample user behavior data set comprises a target sample user identification, a target sample behavior type and an attribute value of a target sample behavior action object; determining a predicted interest tag corresponding to each screened target sample user identification according to the sample behavior type corresponding to the sample user behavior data set in which that target sample user identification is located; and adjusting the first classification threshold and the second classification threshold according to the recall ratio of each sample behavior type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set.
In this embodiment, the user behavior data set corresponding to each behavior type is determined from the user behavior record set acquired within the specified time period, so that user behavior is characterized with less data and fewer data dimensions, which reduces the data volume needed to determine the user's interest tags. Further, a first classification threshold of the behavior times and a second classification threshold of the average attribute value are determined for each behavior type, and the user behavior data set is conditionally screened according to the first classification threshold and the second classification threshold to screen out the target user identifiers; the interest tag of each target user identifier is then determined according to the behavior type corresponding to the user behavior data set in which the screened target user identifier is located. This further reduces the amount of calculation needed to generate the interest tags while ensuring the accuracy of the interest tags generated for each behavior type.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, a database or another medium used in the embodiments provided herein can include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of the present disclosure.
The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A method of generating an interest tag, the method comprising:
acquiring a user behavior record set in a specified time period, wherein the user behavior record in the user behavior record set comprises a user identifier, a behavior type and an attribute value of a behavior action object;
determining a user behavior data set corresponding to each behavior type based on the user behavior record set, wherein data in the user behavior data set is used for describing the corresponding relation among user identification, behavior times and average attribute values;
respectively determining a first classification threshold value of behavior times corresponding to each behavior type and a second classification threshold value of an average attribute value based on the user behavior data set;
screening a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, wherein the target user behavior data set comprises a target user identifier, a target behavior type and an attribute value of a target behavior action object;
determining an interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set where the screened target user identifier is located;
the determining, based on the user behavior data set, a first classification threshold of behavior times and a second classification threshold of an average attribute value corresponding to each behavior type respectively includes:
based on the user behavior data sets corresponding to the behavior types, respectively sorting the behavior times and the average attribute values corresponding to each behavior type in ascending order to obtain a sorting result of the behavior times and a sorting result of the average attribute values;
respectively calculating a first quantile of each behavior times value and a second quantile of each average attribute value, which respectively correspond to each behavior type, according to the sorting result of the behavior times and the sorting result of the average attribute values;
and respectively determining a first classification threshold value of the behavior times and a second classification threshold value of the average attribute value corresponding to each behavior type according to the first quantile and the second quantile.
2. The method according to claim 1, wherein the calculating a first quantile of each behavior times value and a second quantile of each average attribute value respectively corresponding to each behavior type according to the sorting result of the behavior times and the sorting result of the average attribute values comprises:
determining a first occurrence probability of each behavior times value corresponding to each behavior type in the corresponding sorting result according to the sorting result of the behavior times and the sorting result of the average attribute values, and determining a second occurrence probability of each average attribute value corresponding to each behavior type in the corresponding sorting result;
determining a first cumulative probability of each behavior times value corresponding to each behavior type according to the first occurrence probability to obtain the first quantile of each behavior times value corresponding to each behavior type;
and determining a second cumulative probability of each average attribute value corresponding to each behavior type according to the second occurrence probability to obtain the second quantile of each average attribute value corresponding to each behavior type.
3. The method according to claim 1, wherein the calculating a first quantile of each behavior times value and a second quantile of each average attribute value respectively corresponding to each behavior type according to the sorting result of the behavior times and the sorting result of the average attribute values comprises:
acquiring a ranking position of each behavior times value corresponding to each behavior type in the sorting result, a ranking position of each average attribute value corresponding to each behavior type in the sorting result, and the number of sorted users corresponding to each behavior type;
dividing the ranking position of each behavior times value corresponding to each behavior type by the number of sorted users to obtain the first quantile of each behavior times value corresponding to each behavior type;
and dividing the ranking position of each average attribute value corresponding to each behavior type by the number of sorted users to obtain the second quantile of each average attribute value corresponding to each behavior type.
4. The method according to claim 1, wherein the determining a first classification threshold of the behavior times and a second classification threshold of the average attribute value respectively corresponding to each behavior type according to the first quantile and the second quantile comprises:
according to the first quantile and the second quantile, corresponding to each behavior type, respectively screening out the first quantiles which are greater than or equal to a corresponding first preset threshold and the second quantiles which are greater than or equal to a corresponding second preset threshold;
corresponding to each behavior type, respectively calculating a first difference value of adjacent first quantiles and a second difference value of adjacent second quantiles according to the screened first quantiles and second quantiles;
acquiring the first quantile corresponding to the maximum first difference value calculated for each behavior type to obtain the first classification threshold of the behavior times corresponding to each behavior type;
and acquiring the second quantile corresponding to the maximum second difference value calculated for each behavior type to obtain the second classification threshold of the average attribute value corresponding to each behavior type.
5. The method of claim 1, wherein the filtering out the target set of user behavior data from the set of user behavior data according to the first classification threshold and the second classification threshold comprises:
acquiring a user behavior record sample set of a known interest tag;
respectively adjusting the first classification threshold value and the second classification threshold value according to the user behavior record sample set;
and screening out a target user behavior data set from the user behavior data set according to the adjusted first classification threshold and the adjusted second classification threshold.
6. The method of claim 5, wherein the user behavior record samples in the set of user behavior record samples comprise sample user identifications, interest tags, sample behavior types, and sample attribute values of sample behavior action objects;
the adjusting the first classification threshold and the second classification threshold respectively according to the user behavior record sample set includes:
determining a sample user behavior data set corresponding to each sample behavior type according to the user behavior record sample set and the known interest tag, wherein the sample user behavior data set comprises the corresponding sample user identification, interest tag, sample behavior frequency and average sample attribute value;
calculating a first quantile of each sample behavior frequency and a second quantile of each average sample attribute value for each sample behavior type, based on the sample user behavior data set of the known interest tag corresponding to that sample behavior type;
screening out a target sample user behavior data set from the sample user behavior data set of the known interest tag according to the first classification threshold and the second classification threshold; the target sample user behavior data set comprises a target sample user identification, a target sample behavior type and an attribute value of a target sample behavior action object;
determining a predicted interest tag corresponding to each screened target sample user identification according to the sample behavior type of the sample user behavior data set in which that target sample user identification is located;
and adjusting the first classification threshold value and the second classification threshold value according to the recall ratio of each sample behavior type, calculated from the predicted interest tags and the corresponding known interest tags of the sample user behavior data set.
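The recall-driven adjustment in claims 5 and 6 could look roughly like the Python sketch below for a single behavior type. The claims do not say how the thresholds are changed, so the multiplicative relaxation, the target recall, and the rule that a user is selected when both values reach their thresholds are all assumptions; every name and number here is hypothetical.

```python
def recall(predicted_users, known_users):
    """Share of users with a known interest tag that were also predicted."""
    if not known_users:
        return 0.0
    return len(predicted_users & known_users) / len(known_users)


def select_users(samples, count_thr, value_thr):
    """samples maps user id -> (behavior count, average attribute value).

    Assumption: a user is selected when both values reach their thresholds.
    """
    return {
        uid
        for uid, (count, avg_value) in samples.items()
        if count >= count_thr and avg_value >= value_thr
    }


def adjust_thresholds(samples, known_users, count_thr, value_thr,
                      target_recall=0.8, step=0.9, max_rounds=20):
    """Relax both thresholds until recall on the labelled sample is acceptable.

    The shrink factor `step` and `target_recall` are illustrative choices only.
    """
    for _ in range(max_rounds):
        predicted = select_users(samples, count_thr, value_thr)
        if recall(predicted, known_users) >= target_recall:
            break
        count_thr *= step
        value_thr *= step
    return count_thr, value_thr


# Hypothetical labelled sample for one behavior type.
samples = {"u1": (10, 100.0), "u2": (8, 90.0), "u3": (1, 5.0)}
known = {"u1", "u2"}  # users whose known interest tag matches this behavior type
new_count_thr, new_value_thr = adjust_thresholds(samples, known, 9, 95.0)
```

In this example the initial thresholds recover only `u1`; two relaxation rounds bring `u2` in while `u3` stays excluded. A production version would also need to guard against precision collapsing as the thresholds are lowered.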
7. An apparatus for generating an interest tag, the apparatus comprising:
the behavior record acquisition module is used for acquiring a user behavior record set in a specified time period, wherein the user behavior record in the user behavior record set comprises a user identifier, a behavior type and an attribute value of a behavior action object;
a behavior data set determining module, configured to determine, based on the user behavior record set, a user behavior data set corresponding to each behavior type, where data in the user behavior data set is used to describe a correspondence between a user identifier, a behavior frequency, and an average attribute value;
a classification threshold determination module, configured to determine, based on the user behavior data set, a first classification threshold of behavior times and a second classification threshold of an average attribute value, which correspond to each behavior type, respectively;
a screening target user identification module, configured to screen out a target user behavior data set from the user behavior data set according to the first classification threshold and the second classification threshold, where the target user behavior data set includes a target user identification, a target behavior type, and an attribute value of a target behavior action object;
the interest tag generation module is used for determining an interest tag corresponding to the screened target user identifier according to the behavior type corresponding to the user behavior data set of the screened target user identifier;
the classification threshold determination module is specifically configured to sort the behavior times and the average attribute values corresponding to the behavior types respectively in ascending order based on the user behavior data sets corresponding to the behavior types respectively, so as to obtain a sorting result of the behavior times and a sorting result of the average attribute values;
respectively calculating a first quantile of the behavior times and a second quantile of the average attribute value, corresponding to each behavior type, according to the sorting result of the behavior times and the sorting result of the average attribute values;
and respectively determining a first classification threshold value of behavior times and a second classification threshold value of an average attribute value corresponding to each behavior type according to the first quantile and the second quantile.
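The data preparation performed by the modules of claim 7 can be sketched in Python as follows: records are grouped by behavior type, each user's behavior count and average attribute value are computed, and the values are sorted in ascending order before quantiles are taken. The record layout, the function names, and the use of quartiles from `statistics.quantiles` (Python 3.8+) are assumptions; the claim does not state which quantile levels are used.

```python
from collections import defaultdict
from statistics import mean, quantiles


def build_behavior_datasets(records):
    """Group (user_id, behavior_type, attribute_value) records by behavior type.

    Returns {behavior_type: {user_id: (behavior_count, average_attribute_value)}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for user_id, behavior_type, value in records:
        grouped[behavior_type][user_id].append(value)
    return {
        behavior_type: {uid: (len(vals), mean(vals)) for uid, vals in users.items()}
        for behavior_type, users in grouped.items()
    }


def sorted_quantiles(values, n=4):
    """Sort values in ascending order and return (sorted values, n-quantile cut points).

    Assumption: quartiles (n=4) with the 'inclusive' method; needs >= 2 values.
    """
    ordered = sorted(values)
    return ordered, quantiles(ordered, n=n, method="inclusive")


# Hypothetical behavior records within the specified time period.
records = [
    ("u1", "purchase", 100.0),
    ("u1", "purchase", 300.0),
    ("u2", "purchase", 50.0),
    ("u2", "browse", 10.0),
]
datasets = build_behavior_datasets(records)

# Behavior counts for one behavior type, ready for the quantile step.
purchase_counts = [count for count, _ in datasets["purchase"].values()]
```

The cut points returned by `sorted_quantiles` would then feed the threshold determination step for each behavior type.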
8. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 6.
CN201910667166.1A 2019-07-23 2019-07-23 Interest tag generation method and device, computer equipment and storage medium Active CN110598090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910667166.1A CN110598090B (en) 2019-07-23 2019-07-23 Interest tag generation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910667166.1A CN110598090B (en) 2019-07-23 2019-07-23 Interest tag generation method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110598090A (en) 2019-12-20
CN110598090B (en) 2023-04-11

Family

ID=68852890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910667166.1A Active CN110598090B (en) 2019-07-23 2019-07-23 Interest tag generation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110598090B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191091A (en) * 2019-12-30 2020-05-22 成都数联铭品科技有限公司 Data classification method and system
CN113487225B (en) * 2021-07-23 2024-05-24 北京云从科技有限公司 Risk control method, system, equipment and medium
CN115033565A (en) * 2022-04-20 2022-09-09 厦门市美亚柏科信息股份有限公司 User portrait method and terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104700289A (en) * 2015-03-17 2015-06-10 中国联合网络通信集团有限公司 Advertising method and device
CN105677925A (en) * 2016-03-30 2016-06-15 北京京东尚科信息技术有限公司 Method and device for processing user data in database
CN106503269A (en) * 2016-12-08 2017-03-15 广州优视网络科技有限公司 Method, device and server that application is recommended
CN109034935A (en) * 2018-06-06 2018-12-18 平安科技(深圳)有限公司 Products Show method, apparatus, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110598090A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
CN110598845B (en) Data processing method, data processing device, computer equipment and storage medium
CN108711110B (en) Insurance product recommendation method, apparatus, computer device and storage medium
CN109345374B (en) Risk control method and device, computer equipment and storage medium
CN111401609B (en) Prediction method and prediction device for traffic flow time series
CN109858737B (en) Grading model adjustment method and device based on model deployment and computer equipment
CN109165983A (en) Insurance products recommended method, device, computer equipment and storage medium
CN108876133A (en) Risk assessment processing method, device, server and medium based on business information
CN110598090B (en) Interest tag generation method and device, computer equipment and storage medium
CN110781379A (en) Information recommendation method and device, computer equipment and storage medium
CN109245996B (en) Mail pushing method and device, computer equipment and storage medium
CN109034583A (en) Abnormal transaction identification method, apparatus and electronic equipment
CN110674144A (en) User portrait generation method and device, computer equipment and storage medium
CN110888911A (en) Sample data processing method and device, computer equipment and storage medium
CN109063984B (en) Method, apparatus, computer device and storage medium for risky travelers
CN109242539A (en) Based on potential user's prediction technique, device and the computer equipment for being lost user
CN112784168B (en) Information push model training method and device, information push method and device
CN112508638B (en) Data processing method and device and computer equipment
CN107622326A (en) User's classification, available resources Forecasting Methodology, device and equipment
CN115311042A (en) Commodity recommendation method and device, computer equipment and storage medium
CN112417315A (en) User portrait generation method, device, equipment and medium based on website registration
CN110991538B (en) Sample classification method and device, storage medium and computer equipment
CN111061948A (en) User label recommendation method and device, computer equipment and storage medium
WO2020253369A1 (en) Method and device for generating interest tag, computer equipment and storage medium
CN112685639A (en) Activity recommendation method and device, computer equipment and storage medium
CN111209929A (en) Access data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant