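WO2021012482A1 - Method and apparatus for generating group interest tags, computer device and storage medium - Google Patents
Method and apparatus for generating group interest tags, computer device and storage medium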
- Publication number
- WO2021012482A1 (international application PCT/CN2019/116494)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- user object
- interest
- group
- objects
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Definitions
- This application relates to a method, apparatus, computer device and storage medium for generating group interest tags.
- Differentiated services such as personalized recommendation and diversified marketing are now widely used in daily life, and these services are inseparable from user portraits.
- The core task of building a user portrait is to generate interest tags for users.
- From a macro (group-level) perspective, user behavior can be analyzed and predicted, which helps a company target its marketing to specific users more accurately.
- However, conventional methods for user portraits generate interest tags only for specific individual users and have difficulty providing accurate interest tags for groups of users.
- A method, apparatus, computer device and storage medium for generating group interest tags are therefore provided.
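- A method for generating group interest tags comprises:
- acquiring a user object set, the user object set including user objects with interest tags and user objects without interest tags;
- clustering according to the user attributes of the user objects in the user object set to obtain user object groups;
- determining the target group index corresponding to each interest tag of each user object group according to the quantity features of the user objects having each interest tag in each user object group;
- screening the interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
- using the interest tags obtained by the screening as the group interest tags of the corresponding user object group.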
- An apparatus for generating group interest tags comprises:
- a user object acquisition module, configured to acquire a user object set, the user object set including user objects with interest tags and user objects without interest tags;
- a user object clustering module, configured to cluster according to the user attributes of the user objects in the user object set to obtain user object groups;
- a target group index determination module, configured to determine the target group index corresponding to each interest tag of each user object group according to the quantity features of the user objects having each interest tag in each user object group;
- an interest tag screening module, configured to screen the interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
- a group interest tag determination module, configured to use the screened interest tags as the group interest tags of the corresponding user object group.
- A computer device includes a memory and one or more processors.
- The memory stores computer-readable instructions.
- When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the steps of the above method for generating group interest tags, including:
- using the interest tags obtained by the screening as the group interest tags of the corresponding user object group.
- One or more non-volatile computer-readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above method for generating group interest tags, including:
- using the interest tags obtained by the screening as the group interest tags of the corresponding user object group.
- Fig. 1 is an application scenario diagram of a method for generating a group interest tag according to one or more embodiments.
- Fig. 2 is a schematic flowchart of a method for generating a group interest tag according to one or more embodiments.
- Fig. 3 is a schematic flowchart of a method for generating a group interest tag in another embodiment.
- Fig. 4 is a structural block diagram of an apparatus for generating a group interest tag according to one or more embodiments.
- Fig. 5 is a block diagram of a computer device according to one or more embodiments.
- The method for generating group interest tags provided in this application can be applied in the application environment shown in Fig. 1.
- The terminal 102 communicates with the server 104 over a network.
- The server 104 obtains a user object set, whose acquisition may be triggered by the terminal 102, and clusters according to the user attributes of the user objects in the set to obtain user object groups composed of similar user objects. The server 104 then determines a target group index according to the quantity features of the user objects having each interest tag in each user object group, and determines the group interest tags of each user object group based on the target group index.
- The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device.
- The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
- A method for generating group interest tags is provided. Taking its application to the server in Fig. 1 as an example, the method includes the following steps:
- Step S202: obtain a user object set.
- The user object set includes every user object, covering both user objects with interest tags and user objects without interest tags.
- The user object set contains rich information, such as the similarity between user objects and the relationships between user objects and interest tags.
- An interest tag is a tag that identifies a user object's tendency toward a certain type of behavior; for example, if a user object often uses video applications, its interest tag may be "video".
- Triggered by the user, the terminal generates a user object set of the user objects and either transmits it to the server over the network or stores it locally on the terminal.
- The server can obtain user object sets from each terminal, or obtain a user object set stored on the server itself.
- Step S204: cluster according to the user attributes of the user objects in the user object set to obtain user object groups.
- User attributes are the basic information of a user object, including gender, education, birth city tier, current residence city tier, whether the user owns a car, and disposable wealth level.
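Clustering needs these user attributes in numeric form. The sketch below shows one possible encoding; the `UserAttributes` record, the education scale and the integer city tiers and wealth brackets are hypothetical, since the source lists the attributes but not how they are encoded.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale; the source lists the attributes but not their encoding.
EDUCATION_LEVELS = {"high_school": 0, "bachelor": 1, "master": 2, "doctorate": 3}


@dataclass
class UserAttributes:
    gender: str                # "male" or "female"
    education: str             # a key of EDUCATION_LEVELS
    birth_city_level: int      # city tier, e.g. 1 (largest) to 5
    residence_city_level: int  # city tier of the current residence
    owns_car: bool
    wealth_level: int          # disposable-wealth bracket, e.g. 1 to 5


def encode(attrs: UserAttributes) -> list:
    """Map one user object's attributes to a numeric feature vector."""
    return [
        1.0 if attrs.gender == "female" else 0.0,
        float(EDUCATION_LEVELS[attrs.education]),
        float(attrs.birth_city_level),
        float(attrs.residence_city_level),
        1.0 if attrs.owns_car else 0.0,
        float(attrs.wealth_level),
    ]


print(encode(UserAttributes("female", "master", 2, 1, True, 4)))
# [1.0, 2.0, 2.0, 1.0, 1.0, 4.0]
```

In practice the columns would usually be standardized before distance-based clustering so that no single attribute dominates; that step is omitted here.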
- A user object group is a set composed of similar user objects.
- The server looks up the user attributes of each user object in the acquired user object set.
- The user attributes may be stored in a database or in the terminal corresponding to the user object. Based on the user attributes found,
- the server clusters the user objects in the set to obtain the user object groups.
- Alternatively, each user object in the acquired user object set already carries its user attributes and interest tags, and the server clusters directly according to the user attributes of each user object in the set to obtain the user object groups.
- Step S206: determine the target group index corresponding to each interest tag of each user object group according to the quantity features of the user objects having each interest tag in each user object group.
- The target group index (TGI) reflects how strong or weak a particular characteristic of the user objects in a user object group is within a specific scope (such as a geographic area, a demographic segment, a media audience or the consumers of a product). A TGI of 100 represents the average level; a TGI above 100 means that the user object group's tendency toward a certain type of behavior is higher than the overall level.
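As a worked illustration with hypothetical numbers (not taken from the source): suppose a user object group has 200 members, 30 of whom carry the interest tag "video", while 1,000 of the 10,000 user objects in the whole set carry that tag. Writing $n_{g,t}$ for the number of members of group $g$ with tag $t$, $N_g$ for the group size, $n_t$ for the number of user objects in the set with tag $t$ and $N$ for the set size, and scaling by 100 so that 100 is the average level:

$$\mathrm{TGI}_{g,t}=\frac{n_{g,t}/N_g}{n_t/N}\times 100=\frac{30/200}{1000/10000}\times 100=\frac{0.15}{0.10}\times 100=150$$

A value of 150 would mean this group's tendency toward video is 50% above the level of the whole user object set.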
- Each user object group includes user objects with interest tags and user objects without interest tags.
- The server calculates the target group index corresponding to each interest tag of each user object group according to the quantity proportion features of the user objects with interest tags in each user object group.
- The quantity proportion features are the first user object number proportion of each interest tag within each user object group and the second user object number proportion of each interest tag within the user object set.
- The server calculates the target group index corresponding to each interest tag of each user object group from the calculated first and second user object number proportions.
- Step S208: based on the user objects with interest tags in each user object group, screen the interest tags according to the target group index corresponding to each interest tag of each user object group.
- The server screens the interest tags according to the user objects with interest tags in each user object group and the target group index of each interest tag in that group, so as to keep the interest tags that meet the conditions.
- Step S210: use the interest tags obtained by the screening as the group interest tags of the corresponding user object group.
- A group interest tag is a tag indicating that a user object group has a tendency toward a certain type of behavior.
- The server uses the screened interest tags as the group interest tags of the corresponding user object group, indicating that the group has the behavior tendency corresponding to each group interest tag.
- In the above method, clustering is performed based on the user attributes of the user objects in the user object set, yielding user object groups composed of similar user objects.
- The target group index, determined from the quantity features of the user objects having each interest tag in each user object group, reflects the relationship between the proportion of user objects with each interest tag within a group and the proportion of user objects with that tag in the whole user object set. Based on the target group index, each user object group can therefore be assigned accurate group interest tags.
- Before clustering according to the user attributes of the user objects in the user object set, the method further includes: determining the user objects with interest tags in the user object set; screening out the user objects with wrong interest tags according to the interest tags and user attributes of the user objects with interest tags; and removing the wrong interest tags from the screened user objects, so that they become user objects without interest tags.
- A wrong interest tag is a tag that does not conform to the user object's own behavior tendency or to the user object's user attributes.
- The server therefore processes the user objects with interest tags to delete their wrong interest tags, turning user objects with wrong interest tags into user objects without interest tags.
- Specifically, the server determines the user objects with interest tags in the acquired user object set, and then screens out the user objects with wrong interest tags according to the interest tag and user attributes of each of these user objects.
- The server removes the wrong interest tags carried by the screened user objects, turning them into user objects without interest tags.
- Screening the user objects with wrong interest tags includes: determining the mutual information of each user object with an interest tag according to its interest tag and user attributes; and, based on the user objects with interest tags in each user object group, screening out the user objects with wrong interest tags according to the mutual information corresponding to each user object.
- Mutual information measures the mutual dependence between variables, that is, the correlation between two variables.
- The mutual information of two discrete random variables X and Y can be defined as formula (1):

$$I(X;Y)=\sum_{y\in Y}\sum_{x\in X}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\tag{1}$$

- where p(x, y) is the joint probability distribution of X and Y,
- and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
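Formula (1) can be estimated from paired samples by replacing the probabilities with observed frequencies. The sketch below is a minimal plug-in estimator; the function name `mutual_information` and the base-2 logarithm (giving the result in bits) are choices of this example, not of the source.

```python
import math
from collections import Counter


def mutual_information(xs, ys, base=2.0):
    """Empirical I(X;Y) of two paired discrete samples, following formula (1)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts for p(x, y)
    px = Counter(xs)              # counts for the marginal p(x)
    py = Counter(ys)              # counts for the marginal p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Only observed pairs are summed, so p_xy > 0 and the log is defined.
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)), base)
    return mi


tags = ["video", "video", "sport", "sport"]
owns_car = [False, False, True, True]
print(mutual_information(tags, owns_car))                # 1.0 (fully dependent)
print(mutual_information(["a", "a", "b", "b"], [0, 1, 0, 1]))  # 0.0 (independent)
```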
- Based on the interest tag and user attributes of each user object with an interest tag, the server obtains the marginal probability distributions and the joint probability distribution of the interest tags and user attributes.
- From these probability distributions, the server calculates the mutual information of each user object with an interest tag, establishes a correspondence between each user object in each user object group and its mutual information, and stores the correspondence on the server.
- Based on the user objects with interest tags in each user object group, the server retrieves the stored mutual information of those user objects and screens out the user objects with wrong interest tags in each group.
- Calculating the mutual information of each interest tag and screening out the user objects with wrong interest tags in the user object set accordingly improves the accuracy of clustering.
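The source does not spell out how a per-user mutual information value is derived from formula (1). One plausible reading, used here purely as an assumption, is the pointwise mutual information (PMI) between a user object's tag and one of its attributes: a (tag, attribute) pair that co-occurs less often than independence would predict (negative PMI) is treated as a wrong tag. The helper names, the single `attr_key` attribute and the threshold of 0 are hypothetical.

```python
import math
from collections import Counter


def pointwise_mi(tags, attrs):
    """PMI log2(p(t, a) / (p(t) p(a))) for every observed (tag, attribute) pair."""
    n = len(tags)
    joint = Counter(zip(tags, attrs))
    p_tag, p_attr = Counter(tags), Counter(attrs)
    return {
        (t, a): math.log2((c / n) / ((p_tag[t] / n) * (p_attr[a] / n)))
        for (t, a), c in joint.items()
    }


def remove_wrong_tags(users, attr_key, threshold=0.0):
    """Clear the tag of every tagged user whose (tag, attribute) PMI is below threshold.

    users: list of dicts with an "id", a "tag" (str or None) and attribute fields.
    Modifies `users` in place and returns the ids whose tag was removed.
    """
    tagged = [u for u in users if u["tag"] is not None]
    if not tagged:
        return []
    pmi = pointwise_mi([u["tag"] for u in tagged], [u[attr_key] for u in tagged])
    removed = []
    for u in tagged:
        if pmi[(u["tag"], u[attr_key])] < threshold:
            u["tag"] = None  # the user object now counts as having no interest tag
            removed.append(u["id"])
    return removed


users = [
    {"id": 1, "tag": "video", "owns_car": False},
    {"id": 2, "tag": "video", "owns_car": False},
    {"id": 3, "tag": "video", "owns_car": False},
    {"id": 4, "tag": "cars", "owns_car": True},
    {"id": 5, "tag": "cars", "owns_car": True},
    {"id": 6, "tag": "cars", "owns_car": False},
    {"id": 7, "tag": None, "owns_car": True},
]
print(remove_wrong_tags(users, "owns_car"))  # [6]
```

Running it once over the whole set matches the step performed before clustering; calling it separately on each user object group would follow the per-group wording of the screening step.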
- Each user object in the user object set includes a user identifier, and clustering according to the user attributes of the user objects to obtain the user object groups includes: selecting multiple user objects in the user object set as initial cluster centers according to their user attributes; for each user object to be clustered (every user object in the set other than the cluster centers), calculating its similarity to each cluster center according to the corresponding user attributes; assigning each user object to be clustered, according to that similarity, to the cluster whose center is closest; and recalculating the center of each cluster. While the clustering stop condition is not met, the method returns to the step of calculating the similarity between each user object to be clustered and each cluster center; once the stop condition is met, the user object groups are obtained.
- Similar user objects are aggregated into a cluster, and the center of each cluster is its cluster center.
- The initial cluster centers can be randomly selected user objects.
- The clustering stop condition can be that no user object, or fewer than a preset number of user objects, is reassigned to a different cluster; that no cluster center, or fewer than a preset number of cluster centers, changes;
- or that the sum of the distances from each user object in each cluster to its cluster center falls within a preset threshold range.
- Similarity measures how alike two user objects are. It can be computed from the distance, the correlation coefficient or the cosine of the angle between the two objects; the greater the similarity, the closer the two user objects.
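The three similarity measures named above can be sketched as follows. Converting Euclidean distance into a similarity via 1 / (1 + d) is one common choice assumed here, not something the source prescribes. Cosine similarity is undefined for an all-zero vector and Pearson similarity for a constant vector; this sketch does not guard against either.

```python
import math


def euclidean_similarity(a, b):
    """Map Euclidean distance to a similarity in (0, 1]; 1 means identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    """Pearson correlation coefficient between two attribute vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    return num / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))


print(euclidean_similarity([0, 0], [3, 4]))           # 0.16666666666666666
print(cosine_similarity([1, 0], [0, 1]))              # 0.0
print(pearson_similarity([1, 2, 3], [3, 2, 1]))       # -1.0
```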
- The server selects multiple user objects as initial cluster centers according to the user attributes of the user objects.
- The user object set thus consists of the user objects serving as cluster centers and the user objects to be clustered.
- For each user object to be clustered, the server calculates the similarity between its user attributes and the user attributes of each cluster center. Based on the calculated similarities, the server finds, for each user object to be clustered, the cluster center with the highest similarity (that is, the smallest distance), and assigns the user object to that center's cluster.
- After the user objects to be clustered have been assigned to their clusters, the server recalculates the center of each cluster. If the clustering stop condition is not met, the server returns to the step of calculating, for each user object to be clustered, its similarity to each cluster center according to the corresponding user attributes; if the stop condition is met, clustering stops and the user object groups are obtained.
- For example, after the server recalculates the cluster centers, if no cluster center, or fewer than a preset number of cluster centers, has changed compared with the centers before the update, the server stops clustering and obtains the user object groups.
- Alternatively, the server compares the cluster assignments of the user objects to be clustered with those of the previous iteration; if no user object, or fewer than a preset number of user objects, has been reassigned to a different cluster, the server stops clustering and obtains the user object groups.
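The clustering loop described above is essentially k-means. A minimal pure-Python sketch follows, assuming user attributes already encoded as numeric vectors, Euclidean distance as the inverse of similarity, and the "few enough reassignments" stop condition; the parameter names `init`, `max_iter` and `tolerance` are hypothetical.

```python
import math


def kmeans(points, k, init=None, max_iter=100, tolerance=0):
    """Cluster numeric attribute vectors with k-means (Euclidean distance).

    init:      indices of the user objects used as initial centres (default: first k).
    tolerance: stop once at most this many points change cluster in an iteration.
    Returns (labels, centres), where labels[i] is the cluster of points[i].
    """
    init = list(range(k)) if init is None else list(init)
    centres = [list(points[i]) for i in init]
    labels = [None] * len(points)
    for _ in range(max_iter):
        moves = 0
        for i, p in enumerate(points):
            # Highest similarity corresponds to the smallest distance to a centre.
            best = min(range(k), key=lambda c: math.dist(p, centres[c]))
            if best != labels[i]:
                labels[i] = best
                moves += 1
        # Recompute each centre as the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        if moves <= tolerance:
            break
    return labels, centres


points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 8.5], [8.5, 9]]
labels, centres = kmeans(points, 2, init=[0, 3])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Unlike the step-by-step wording, this sketch also reassigns the user objects chosen as initial centres in every round, which is the standard k-means formulation and leaves the result unchanged here.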
- Dividing the user objects into clusters according to the similarity between their user attributes and each cluster center yields the user object groups, on which the subsequent generation of group interest tags is based.
- Determining the target group index corresponding to each interest tag of each user object group according to the quantity features of the user objects having each interest tag in each user object group includes: calculating the first user object number proportion of each interest tag in each user object group; calculating the second user object number proportion of each interest tag in the user object set; and, for each interest tag of each user object group, calculating the target group index from the corresponding first and second user object number proportions.
- The first user object number proportion is the number of users in a user object group who have a given interest tag, divided by the number of users in that group.
- The second user object number proportion is the total number of users in the user object set who have a given interest tag, divided by the total number of users in the user object set.
- The server calculates the first user object number proportion of each interest tag in each user object group.
- Specifically, the server obtains the number of users with a given interest tag in a group and the number of users in that group, and from these calculates the first user object number proportion of each interest tag for each user object group.
- The server obtains the total number of users with a given interest tag and the total number of users in the user object set, and from these calculates the second user object number proportion of each interest tag in the user object set.
- For each interest tag, the server retrieves the calculated first and second user object number proportions and from them calculates the target group index corresponding to each interest tag of each user object group.
- The first user object number proportion is positively correlated with the number of users having the interest tag in the group and negatively correlated with the number of users in the corresponding user object group.
- The second user object number proportion is positively correlated with the total number of users having the interest tag and negatively correlated with the total number of users in the user object set.
- For each interest tag of each user object group, the corresponding first user object number proportion is divided by the corresponding second user object number proportion to obtain the target group index of that interest tag.
- The target group index of an interest tag is therefore positively correlated with the first user object number proportion and negatively correlated with the second user object number proportion.
- Because the target group index of each interest tag of each user object group is calculated from the corresponding first and second user object number proportions, it reflects the relationship between the share of user objects with each interest tag in each group and their share in the whole user object set, so accurate group interest tags can be assigned to each user object group based on the target group index.
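The proportion and index calculations above can be sketched as follows. Assumptions: each user object carries a set of zero or more interest tags (an empty set means untagged), group membership is already known, and `scale=100` applies the usual TGI convention under which 100 is the average level; `scale=1` gives the plain ratio of the two proportions described in the division step.

```python
from collections import Counter


def target_group_index(groups, scale=100.0):
    """TGI of every interest tag in every user object group.

    groups: {group_id: [set_of_tags_for_each_user_object]}; untagged users have
    an empty set but still count toward the group size and the set size.
    Returns {group_id: {tag: tgi}}.
    """
    total_users = sum(len(members) for members in groups.values())
    # Number of users in the whole set carrying each tag.
    overall = Counter(tag for members in groups.values() for tags in members for tag in tags)
    result = {}
    for gid, members in groups.items():
        in_group = Counter(tag for tags in members for tag in tags)
        result[gid] = {}
        for tag, count in in_group.items():
            first = count / len(members)         # first user object number proportion
            second = overall[tag] / total_users  # second user object number proportion
            result[gid][tag] = first / second * scale
    return result


groups = {
    "A": [{"video"}, {"video"}, set(), {"sport"}],
    "B": [{"sport"}, {"sport"}, set(), set()],
}
tgi = target_group_index(groups)
print(tgi["A"]["video"])  # 200.0
```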
- Screening the interest tags according to the target group index corresponding to each interest tag of each user object group includes: for each user object group determined from the user objects with interest tags, screening out the target group indexes that are greater than or equal to the corresponding preset threshold; taking the interest tags corresponding to the screened target group indexes as candidate tags; and determining the interest tags of each user object group from the candidate tags.
- The preset threshold is a value set in advance for judging the target group index of each user object group, and it can be stored in a database.
- The server compares the target group index of each interest tag in each user object group with the preset threshold, so as to screen out the target group indexes in each group that are greater than or equal to the corresponding threshold.
- The server then takes the interest tags corresponding to the screened target group indexes as candidate tags (there may be several) and selects the interest tags of each user object group from the candidates.
- Screening the interest tags by the target group index of each interest tag of each user object group allows the interest tags of all user object groups to be screened quickly in batches.
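A sketch of the threshold screening, taking per-group TGI scores as input. The per-group `thresholds` mapping, the default of 100 and the optional `top_n` rule for choosing the final tags from the candidates are assumptions; the source leaves the final selection rule open.

```python
def screen_group_tags(tgi, thresholds=None, default_threshold=100.0, top_n=None):
    """Return {group_id: [tags]} keeping tags whose TGI >= the group's threshold.

    Candidate tags are ordered by descending TGI; if top_n is given, only the
    first top_n candidates become group interest tags.
    """
    thresholds = thresholds or {}
    result = {}
    for gid, scores in tgi.items():
        limit = thresholds.get(gid, default_threshold)
        candidates = sorted(
            (tag for tag, score in scores.items() if score >= limit),
            key=lambda tag: scores[tag],
            reverse=True,
        )
        result[gid] = candidates if top_n is None else candidates[:top_n]
    return result


tgi = {"A": {"video": 200.0, "sport": 66.7}, "B": {"sport": 133.3, "music": 120.0}}
print(screen_group_tags(tgi, thresholds={"A": 150.0}))
# {'A': ['video'], 'B': ['sport', 'music']}
print(screen_group_tags(tgi, top_n=1))
# {'A': ['video'], 'B': ['sport']}
```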
- Although the steps in the flowchart of Fig. 2 are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict order restriction on these steps, and they can be performed in other orders. Moreover, at least some of the steps in Fig. 2 may include multiple sub-steps or stages, which are not necessarily completed at the same moment and may be executed at different times; nor are they necessarily executed sequentially, but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
- A device 300 for generating group interest tags is provided, including a user object acquisition module 302, a user object clustering module 304, a target group index determination module 306, an interest tag screening module 308 and a group interest tag determination module 310, where:
- The user object acquisition module 302 is configured to acquire a user object set that includes user objects with interest tags and user objects without interest tags.
- The user object clustering module 304 is configured to cluster according to the user attributes of the user objects in the user object set to obtain user object groups.
- The target group index determination module 306 is configured to determine the target group index corresponding to each interest tag of each user object group according to the quantity features of the user objects having each interest tag in each user object group.
- The interest tag screening module 308 is configured to screen the interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group.
- The group interest tag determination module 310 is configured to use the screened interest tags as the group interest tags of the corresponding user object group.
- The above device for generating group interest tags further includes a tagged user object determination module 312, a wrong tag screening module 314 and a wrong tag removal module 316, where:
- The tagged user object determination module 312 is configured to determine the user objects with interest tags in the user object set.
- The wrong tag screening module 314 is configured to screen out user objects with wrong interest tags according to the interest tag and user attributes of each user object with an interest tag.
- The wrong tag removal module 316 is configured to remove the wrong interest tags carried by the screened user objects, obtaining the corresponding user objects without interest tags.
- The wrong tag screening module includes a mutual information calculation module and a mutual information screening module.
- The mutual information calculation module is configured to determine the mutual information of each user object with an interest tag according to its interest tag and user attributes; the mutual information screening module is configured to screen out user objects with wrong interest tags, based on the user objects with interest tags in each user object group, according to the mutual information corresponding to each user object.
- The user object clustering module includes a cluster center selection module, a similarity calculation module, a to-be-clustered user division module and a user object group acquisition module.
- The cluster center selection module is configured to select multiple user objects in the user object set as initial cluster centers according to their user attributes; the similarity calculation module is configured to calculate, for each user object to be clustered other than the cluster centers, its similarity to each cluster center according to the corresponding user attributes; the to-be-clustered user division module is configured to assign each user object to be clustered, according to that similarity, to the cluster whose center is closest; and the user object group acquisition module is configured to recalculate the center of each cluster and, while the clustering stop condition is not met, return to calculating the similarity between each user object to be clustered and each cluster center, obtaining the user object groups once the stop condition is met.
- The target group index determination module includes a first user object number proportion calculation module, a second user object number proportion calculation module and a target group index calculation module.
- The first user object number proportion calculation module is configured to calculate the first user object number proportion of each interest tag in each user object group; the second user object number proportion calculation module is configured to calculate the second user object number proportion of each interest tag in the user object set; and the target group index calculation module is configured to calculate the target group index corresponding to each interest tag of each user object group from the first and second user object number proportions of that interest tag.
- The target group index calculation module includes a target group index calculation unit.
- The target group index calculation unit is configured, for each interest tag of each user object group, to divide the corresponding first user object number proportion by the corresponding second user object number proportion to obtain the target group index of that interest tag.
- The interest tag screening module includes a target group index screening module, a candidate tag determination module and an interest tag determination module.
- The target group index screening module is configured to screen out, for each user object group determined from the user objects with interest tags in the user object set, the target group indexes greater than or equal to the corresponding preset threshold; the candidate tag determination module is configured to take the interest tags corresponding to the screened target group indexes as candidate tags; and the interest tag determination module is configured to determine the interest tags of each user object group from the candidate tags.
- The above device achieves the same effects as the method: clustering based on the user attributes yields user object groups composed of similar user objects, and the target group index, which relates the proportion of user objects with each interest tag in a group to the corresponding proportion in the whole user object set, allows accurate group interest tags to be assigned to each user object group.
- Each module of the above device for generating group interest tags can be implemented in whole or in part by software, hardware or a combination thereof.
- The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and perform the corresponding operations.
- a computer device is provided.
- the computer device may be a server, and its internal structure diagram may be as shown in FIG. 5.
- the computer equipment includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
- the memory of the computer device includes a non-volatile storage medium and an internal memory.
- the non-volatile storage medium stores an operating system, computer readable instructions, and a database.
- the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
- the database of the computer equipment is used to store user object collection data.
- the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer readable instructions are executed by the processor, a method for generating group interest tags is realized.
- FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
- a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
- a computer device including a memory and one or more processors.
- the memory stores computer-readable instructions.
- when the computer-readable instructions are executed by the one or more processors, the one or more processors perform the steps provided in the foregoing embodiments.
- one or more non-volatile computer-readable storage media storing computer-readable instructions are provided.
- when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps provided in each of the above embodiments.
- Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory.
- RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
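A method for generating group interest tags, comprising: obtaining a user object set, the user object set including user objects with interest tags and user objects without interest tags (S202); clustering according to the user attributes of the user objects in the user object set to obtain user object groups (S204); determining, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group (S206); screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group (S208); and using the interest tags obtained by screening as the group interest tags of the corresponding user object groups (S210).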
Description
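This application claims priority to Chinese patent application No. 2019106660760, entitled "Method and apparatus for generating group interest tags, computer device and storage medium", filed with the China Patent Office on July 23, 2019, the entire contents of which are incorporated herein by reference.
This application relates to a method and apparatus for generating group interest tags, a computer device and a storage medium.
With the development and application of the Internet, differentiated services such as personalized recommendation and diversified marketing have become widely used in people's lives, and these services depend on user portraits (user profiles). The core task of user profiling is to generate interest tags for users. Tagging users allows user behavior to be analyzed and predicted from a macro perspective, which helps an enterprise target its marketing toward specific users more precisely.
At present, interest tags for user profiles are generated only for a specific individual user, which makes it difficult to provide accurate interest tags for groups of users.
Summary
According to various embodiments disclosed in this application, a method and apparatus for generating interest tags, a computer device, and a storage medium are provided.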
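A method for generating group interest tags, the method comprising:
obtaining a user object set, the user object set including user objects with interest tags and user objects without interest tags;
clustering according to the user attributes of the user objects in the user object set to obtain user object groups;
determining, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group;
screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
using the interest tags obtained by screening as the group interest tags of the corresponding user object groups.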
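An apparatus for generating group interest tags, the apparatus comprising:
a user object acquisition module, configured to obtain a user object set, the user object set including user objects with interest tags and user objects without interest tags;
a user object clustering module, configured to cluster according to the user attributes of the user objects in the user object set to obtain user object groups;
a target group index determination module, configured to determine, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group;
an interest tag screening module, configured to screen interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
a group interest tag determination module, configured to use the interest tags obtained by screening as the group interest tags of the corresponding user object groups.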
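A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
obtaining a user object set, the user object set including user objects with interest tags and user objects without interest tags;
clustering according to the user attributes of the user objects in the user object set to obtain user object groups;
determining, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group;
screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
using the interest tags obtained by screening as the group interest tags of the corresponding user object groups.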
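One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a user object set, the user object set including user objects with interest tags and user objects without interest tags;
clustering according to the user attributes of the user objects in the user object set to obtain user object groups;
determining, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group;
screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and
using the interest tags obtained by screening as the group interest tags of the corresponding user object groups.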
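Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features and advantages of this application will become apparent from the description, the drawings and the claims.
To describe the technical solutions in the embodiments of this application more clearly, the drawings required for the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the method for generating group interest tags according to one or more embodiments.
FIG. 2 is a schematic flowchart of the method for generating group interest tags according to one or more embodiments.
FIG. 3 is a schematic flowchart of the method for generating group interest tags in another embodiment.
FIG. 4 is a structural block diagram of the apparatus for generating group interest tags according to one or more embodiments.
FIG. 5 is a block diagram of a computer device according to one or more embodiments.
To make the objectives, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are merely intended to explain this application, not to limit it.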
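The method for generating group interest tags provided in this application may be applied in the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 over a network. The server 104 obtains a user object set, which may be generated when triggered by the terminal 102, and clusters the user objects in the set according to their user attributes to obtain user object groups made up of similar user objects. The server 104 further determines target group indexes according to the quantity-proportion features of user objects carrying each interest tag in each user object group, and determines the group interest tags of each user object group based on those indexes. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a method for generating group interest tags is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps: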
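Step S202: obtain a user object set, the user object set including user objects with interest tags and user objects without interest tags.
The user object set contains individual user objects, some with interest tags and some without. It carries rich information, such as the similarity between user objects and the relationship between user objects and interest tags. An interest tag is a mark indicating that a user object tends toward a certain type of behavior; for example, if a user object frequently uses video applications, its interest tag may be "video".
Specifically, when triggered by a user, a terminal generates the user object set and transmits it to the server over the network, or stores it locally on the terminal. The server may obtain the user object set from the terminals or from its own storage.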
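Step S204: cluster according to the user attributes of the user objects in the user object set to obtain user object groups.
User attributes are the basic information of a user object, including gender, education level, tier of birth city, tier of current city of residence, whether the user owns a car, and disposable-wealth level. A user object group is a collection of similar user objects.
Specifically, based on the obtained user object set, the server looks up the user attributes of each user object (the attributes may be stored in a database or on the terminal corresponding to each user object) and clusters the user objects in the set according to the retrieved user attributes to obtain user object groups.
Optionally, the server obtains a user object set in which each user object already includes its user attributes and interest tag, and clusters directly on the user attributes of the user objects in that set to obtain user object groups.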
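The attributes listed above are categorical or ordinal, while clustering operates on similarities between attribute values. The following minimal Python sketch shows one way to turn a user object into a numeric vector. The field names, the education scale and the city-tier and wealth-tier conventions are hypothetical: the source names the attributes but does not say how they are encoded.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ordinal scale; the source lists "education" but not its levels.
EDUCATION_LEVELS = ["high_school", "bachelor", "master", "doctorate"]


@dataclass
class UserObject:
    user_id: str
    gender: str               # assumed values: "male" / "female"
    education: str            # one of EDUCATION_LEVELS
    birth_city_tier: int      # assumed: 1 = first-tier city, larger = lower tier
    current_city_tier: int
    owns_car: bool
    wealth_tier: int          # assumed disposable-wealth level, e.g. 1..5
    interest_tag: Optional[str] = None  # None for a user object without an interest tag


def encode_attributes(user: UserObject) -> List[float]:
    """Map a user object's attributes to a numeric feature vector for clustering."""
    return [
        1.0 if user.gender == "female" else 0.0,
        float(EDUCATION_LEVELS.index(user.education)),
        float(user.birth_city_tier),
        float(user.current_city_tier),
        1.0 if user.owns_car else 0.0,
        float(user.wealth_tier),
    ]


u = UserObject("u1", "female", "master", 1, 2, True, 4, interest_tag="video")
print(encode_attributes(u))  # [1.0, 2.0, 1.0, 2.0, 1.0, 4.0]
```

In practice the features would usually be scaled to comparable ranges before clustering. The source does not specify that step.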
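Step S206: determine, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group.
The Target Group Index (TGI) reflects how strongly or weakly a user attribute of the user objects in a user object group is represented within a particular scope, such as a geographic area, a demographic segment, a media audience or the consumers of a product. For example, a TGI of 100 indicates the average level; a TGI above 100 indicates that the user object group tends toward a certain type of behavior more strongly than the population as a whole.
Specifically, each obtained user object group contains user objects with interest tags and user objects without interest tags. The server calculates the target group index for each interest tag of each user object group according to the quantity-proportion features of the user objects carrying each interest tag in that group.
Optionally, the quantity-proportion features are the first user object proportion of each interest tag within each user object group and the second user object proportion of each interest tag within the user object set. The server calculates the target group index for each interest tag of each user object group from the calculated first and second user object proportions.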
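Step S208: screen interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group.
Specifically, for each user object group, the server screens the interest tags carried by the tagged user objects in that group according to the target group index of each interest tag, so as to retain the interest tags that meet the condition.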
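Step S210: use the interest tags obtained by screening as the group interest tags of the corresponding user object groups.
A group interest tag is a mark indicating that a user object group tends toward a certain type of behavior.
Specifically, the server uses the interest tags screened out for each user object group as the group interest tags of that group, indicating that the group has the behavioral tendency corresponding to those tags.
In the above embodiment, clustering on the user attributes of the user objects in the user object set yields user object groups made up of similar user objects. The target group index, determined from the quantity-proportion features of the user objects carrying each interest tag in each user object group, reflects the relationship between the proportion of user objects with each interest tag in each group and the proportion of user objects with that tag in the whole user object set. Each user object group can therefore be assigned accurate group interest tags based on the index.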
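In one embodiment, before clustering according to the user attributes of the user objects in the user object set, the method further includes: determining the user objects with interest tags in the user object set; screening out user objects with erroneous interest tags according to the interest tag and the user attributes of each tagged user object; and removing the erroneous interest tags from the screened-out user objects to obtain corresponding user objects without interest tags.
An erroneous interest tag is a mark that does not match the behavioral tendency the user object actually has, or that is inconsistent with the user object's user attributes.
Specifically, after obtaining the user object set and before clustering on the user attributes, the server processes the tagged user objects and deletes their erroneous interest tags, so that user objects with erroneous tags become user objects without interest tags. The server determines the tagged user objects in the obtained set, screens out those with erroneous interest tags according to each tagged user object's interest tag and user attributes, and removes the erroneous tags so that these users become user objects without interest tags.
In this embodiment, user objects with erroneous interest tags are screened out before clustering, based on each tagged user object's interest tag and user attributes. This provides more accurate samples for the subsequent clustering, reduces the number of clustering samples, strengthens the generalization ability of the clustering model and reduces overfitting, thereby improving clustering accuracy.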
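In one embodiment, screening out user objects with erroneous interest tags according to the interest tag and the user attributes of each tagged user object includes: determining the mutual information of each tagged user object according to its interest tag and user attributes; and screening out user objects with erroneous interest tags, based on the tagged user objects in each user object group, according to the mutual information corresponding to each user object.
Mutual information measures the mutual dependence between variables and can quantify the correlation between two variables. For example, the mutual information of two discrete random variables X and Y can be defined as formula (1):
$$I(X;Y)=\sum_{y\in Y}\sum_{x\in X}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}\qquad(1)$$
where p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively (probability mass functions, since X and Y are discrete).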
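The mutual information of two discrete variables is I(X;Y) = Σ_x Σ_y p(x,y) log[p(x,y) / (p(x) p(y))]. It can be estimated from paired observations by replacing each probability with a relative frequency. The Python sketch below does this for two discrete variables, for example an interest tag and one user attribute. The plug-in frequency estimate and the natural logarithm (so the result is in nats) are choices made here; the source does not specify them.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats from paired samples (x_i, y_i)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts for p(x, y)
    px = Counter(xs)              # counts for p(x)
    py = Counter(ys)              # counts for p(y)
    mi = 0.0
    # Only observed pairs contribute: terms with p(x, y) = 0 are taken as 0.
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


tags = ["car", "car", "video", "video"]
cars = ["yes", "yes", "no", "no"]
print(mutual_information(tags, cars))  # ln 2, about 0.693
```

A tag that perfectly predicts an attribute gives a large value. Independent variables give zero.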
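Specifically, from the interest tag and user attributes of each tagged user object, the server obtains the marginal probability distribution of each interest tag and each user attribute, as well as their joint probability distribution, and calculates the mutual information of each tagged user object from these distributions. The server then associates each user object in each user object group with its mutual information and stores the association. Based on the tagged user objects in each user object group, the server retrieves each user object's stored mutual information and uses it to screen out the user objects with erroneous interest tags in each group.
In this embodiment, the mutual information of each interest tag is calculated from the interest tags and user attributes of the tagged user objects, and user objects with erroneous interest tags are screened out of the user object set according to that mutual information, which improves clustering accuracy.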
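The source does not say how a single "mutual information of each tagged user object" is derived, or which cut-off marks a tag as erroneous. The Python sketch below implements one plausible reading, which is an assumption. Each tagged user is scored by the pointwise mutual information (PMI) log[p(tag, value) / (p(tag) p(value))], averaged over that user's attribute values, and the tag is stripped when the average falls below a hypothetical threshold. PMI is the per-observation term whose average over all pairs gives the mutual information I(X;Y). The record layout, the threshold of 0 and the choice to estimate frequencies over tagged users only are all illustrative.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical record layout: (interest tag or None, tuple of attribute values).
User = Tuple[Optional[str], Tuple[str, ...]]


def strip_suspicious_tags(users: List[User], threshold: float = 0.0) -> List[User]:
    """Remove tags whose average PMI with the user's own attributes is below threshold."""
    tagged = [(tag, attrs) for tag, attrs in users if tag is not None]
    n = len(tagged)
    if n == 0:
        return list(users)
    # Frequencies are estimated over tagged users only (an assumption).
    tag_counts = Counter(tag for tag, _ in tagged)
    value_counts = Counter((j, v) for _, attrs in tagged for j, v in enumerate(attrs))
    joint_counts = Counter((tag, j, v) for tag, attrs in tagged for j, v in enumerate(attrs))

    cleaned: List[User] = []
    for tag, attrs in users:
        if tag is not None and attrs:
            pmis = [
                math.log(joint_counts[(tag, j, v)] * n
                         / (tag_counts[tag] * value_counts[(j, v)]))
                for j, v in enumerate(attrs)
            ]
            if sum(pmis) / len(pmis) < threshold:
                tag = None  # the user object becomes one without an interest tag
        cleaned.append((tag, attrs))
    return cleaned


users = [("car", ("owns_car",))] * 3 + [("video", ("no_car",))] * 3 + [("car", ("no_car",))]
print([tag for tag, _ in strip_suspicious_tags(users)])
# ['car', 'car', 'car', 'video', 'video', 'video', None]
```

The last user's "car" tag co-occurs with "no_car" less often than chance would predict, so its PMI is negative and the tag is removed.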
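In one embodiment, the user object set includes individual user objects, each including a user identifier. Clustering according to the user attributes of the user objects in the user object set to obtain user object groups includes the following steps. First, select multiple user objects from the user object set, according to the user attributes, as initial cluster centers. Next, for each user object to be clustered other than the cluster centers, calculate its similarity to each cluster center according to the corresponding user attributes, and assign it to the cluster of the closest cluster center. Then recalculate the cluster center of each cluster. If a clustering stop condition is not met, return to the step of calculating each remaining user object's similarity to each cluster center. Repeat until the stop condition is met, at which point the user object groups are obtained.
Similar user objects are aggregated into a cluster, and the geometric center of that cluster is its cluster center. The initial cluster centers may be randomly selected user objects. The clustering stop condition may be any of the following:

- no user objects, or at most a preset number of them, are reassigned to a different cluster;
- no cluster centers, or at most a preset number of them, change;
- the sum of the distances from each user object to its cluster center, over all clusters, falls within a preset threshold range.

Similarity measures how close two user objects are and may be computed from the distance between them, their correlation coefficient, or the cosine of the angle between them. The greater the similarity, the closer the two user objects.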
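The three similarity measures mentioned above can be sketched in Python as follows, applied to numeric attribute vectors. Converting a Euclidean distance d into the similarity 1 / (1 + d) is one common convention, chosen here for illustration. The source states only that a larger similarity means the user objects are closer.

```python
import math
from typing import Sequence


def distance_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity in (0, 1] derived from Euclidean distance; 1.0 for identical vectors."""
    return 1.0 / (1.0 + math.dist(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def correlation_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Norms of the mean-centered vectors.
    centered_norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    centered_norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if centered_norm_a == 0.0 or centered_norm_b == 0.0:
        return 0.0
    return cov / (centered_norm_a * centered_norm_b)
```

Cosine and correlation similarity ignore vector length, so they compare attribute profiles. The distance-based measure is sensitive to absolute differences in attribute values.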
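Specifically, the server selects multiple user objects from the obtained user object set, according to their user attributes, as initial cluster centers. The set thus contains the cluster-center user objects and the remaining user objects to be clustered. The server calculates the similarity between each user object to be clustered and each cluster center, that is, between the user attributes of the user object and those of each cluster center. It then assigns each user object to be clustered to the cluster whose center has the highest similarity to it (equivalently, the smallest distance). Once all user objects to be clustered have been assigned, the server recalculates the center of each cluster. If the clustering stop condition is not met, it returns to the step of calculating the similarity between each user object to be clustered and each cluster center. Once the stop condition is met, clustering stops and the user object groups are obtained.
Optionally, after recalculating the center of each cluster, the server compares the updated centers with the centers before the update. If no cluster centers, or at most a preset number of them, have changed, it stops clustering and obtains the user object groups.
Optionally, once all user objects to be clustered have been assigned, the server compares the assignment with that of the previous iteration. If no user objects, or at most a preset number of them, have been reassigned to a different cluster, it stops clustering and obtains the user object groups.
In this embodiment, the user objects in the user object set are divided into clusters according to the similarity between their user attributes and each cluster center. This yields the user object groups on which the subsequent generation of group interest tags is based.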
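The iterative procedure above corresponds to the standard k-means algorithm. The following minimal Python sketch makes these assumptions:

- attributes are numeric vectors;
- Euclidean distance is used, so the nearest center is the most similar one;
- initial centers are randomly chosen user objects, as the source permits;
- clustering stops when at most `max_reassigned` user objects change cluster in an iteration.

The parameter names and the `max_iter` safeguard are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(
    points: Sequence[Sequence[float]],
    k: int,
    max_reassigned: int = 0,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Cluster attribute vectors; returns (cluster label per point, cluster centers)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initial cluster centers: k randomly chosen user objects.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)

    def nearest(p: Sequence[float]) -> int:
        # Smallest distance, i.e. highest similarity.
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(max_iter):
        new_labels = [nearest(p) for p in points]
        reassigned = sum(old != new for old, new in zip(labels, new_labels))
        labels = new_labels
        # Recompute each center as the mean (geometric center) of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Stop condition: few enough user objects changed cluster.
        if reassigned <= max_reassigned:
            break
    return labels, centers


labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

With these four points, the first two end up in one user object group and the last two in the other. Which label number each group receives depends on the random initial centers.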
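In one embodiment, determining the target group index corresponding to each interest tag of each user object group according to the quantity-proportion features includes three steps. First, calculate the first user object proportion of each interest tag in each user object group. Second, calculate the second user object proportion of each interest tag in the user object set. Third, for each interest tag of each user object group, calculate the corresponding target group index from the corresponding first and second user object proportions.
The first user object proportion is the ratio of the number of users in a user object group who have the same interest tag to the total number of users in that group. The second user object proportion is the ratio of the total number of users in the user object set who have the same interest tag to the total number of users in the set.
Specifically, the server calculates the first user object proportion of each interest tag in each user object group from the interest tags of the user objects in that group. Within each group, it obtains the number of users with the same interest tag and the number of users in the group, and divides the former by the latter. For the user object set, the server obtains the total number of users with the same interest tag and the total number of users in the set, and from these calculates the second user object proportion of each interest tag. For each interest tag of each user object group, the server then calculates the corresponding target group index from the matching first and second user object proportions.
In one embodiment, the first user object proportion is positively correlated with the number of users with the same interest tag and negatively correlated with the number of users in the corresponding user object group. The second user object proportion is positively correlated with the total number of users with the same interest tag and negatively correlated with the total number of users in the user object set.
In one embodiment, for each interest tag of each user object group, the corresponding first user object proportion is divided by the corresponding second user object proportion to obtain the target group index of that interest tag. The target group index is positively correlated with the first user object proportion and negatively correlated with the second user object proportion. In common TGI practice the quotient is multiplied by 100, so that 100 marks the average level.
In this embodiment, the target group index of each interest tag of each user object group is calculated from the corresponding first and second user object proportions. The index reflects the relationship between the proportion of user objects with each interest tag in each group and the proportion of user objects with that tag in the user object set, so each user object group can be assigned accurate group interest tags based on it.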
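The following minimal Python sketch implements the computation described above. It assumes each user object is given as a (group, tag) pair, with tag None for an untagged user. Two further points are assumptions. First, both denominators count all user objects, tagged and untagged, in the group or set. Second, the quotient is multiplied by a scale of 100 so that 100 marks the average level mentioned in step S206; pass `scale=1.0` to obtain the plain quotient of the first and second proportions.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def target_group_index(
    records: List[Tuple[Hashable, Optional[str]]], scale: float = 100.0
) -> Dict[Hashable, Dict[str, float]]:
    """Return {group: {tag: TGI}} from (group, tag) pairs, one pair per user object."""
    total_users = len(records)
    group_sizes = Counter(group for group, _ in records)
    tag_totals = Counter(tag for _, tag in records if tag is not None)
    group_tag_counts = Counter((group, tag) for group, tag in records if tag is not None)

    tgi: Dict[Hashable, Dict[str, float]] = defaultdict(dict)
    for (group, tag), count in group_tag_counts.items():
        first = count / group_sizes[group]        # first user object proportion
        second = tag_totals[tag] / total_users    # second user object proportion
        tgi[group][tag] = first / second * scale
    return dict(tgi)


records = ([("A", "video")] * 3 + [("A", None)]
           + [("B", "video")] + [("B", "travel")] * 3)
print(target_group_index(records))
# {'A': {'video': 150.0}, 'B': {'video': 50.0, 'travel': 200.0}}
```

Group A carries "video" at 75% against 50% overall, giving an index of 150, well above average. Group B carries it at only 25%, giving 50.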
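In one embodiment, screening interest tags according to the target group index includes three steps. First, for each user object group determined from the tagged user objects in the user object set, screen out the target group indexes that are greater than or equal to the corresponding preset threshold. Second, take the interest tags corresponding to the screened-out target group indexes as candidate tags. Third, determine the interest tags of each user object group based on the candidate tags.
The preset threshold is a boundary value set in advance for judging target group indexes and may be stored in a database. Each user object group has its own preset threshold for its target group indexes.
Specifically, for each user object group determined from the tagged user objects in the user object set, the server compares the target group index of each interest tag in the group with the preset threshold, so as to screen out the target group indexes greater than or equal to the corresponding threshold. The server takes the interest tags corresponding to the screened-out indexes as candidate tags, of which there may be several, and then selects the interest tags of each user object group from the candidates.
In this embodiment, interest tags are screened according to the target group index of each interest tag of each user object group, based on the tagged user objects in each group. This allows the interest tags of all user object groups to be screened quickly and in batches.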
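The screening step can be sketched in Python as below, taking a {group: {tag: TGI}} mapping as input. The per-group thresholds follow the source. Two details are hypothetical: the default threshold of 100, and the rule that keeps the `top_n` candidates with the highest index. The source does not say how the final tags are chosen from the candidates.

```python
from typing import Dict, Hashable, List, Mapping


def select_group_tags(
    tgi: Mapping[Hashable, Mapping[str, float]],
    thresholds: Mapping[Hashable, float],
    default_threshold: float = 100.0,
    top_n: int = 3,
) -> Dict[Hashable, List[str]]:
    """Keep tags whose TGI >= the group's threshold, then the top_n of those."""
    result: Dict[Hashable, List[str]] = {}
    for group, tag_scores in tgi.items():
        threshold = thresholds.get(group, default_threshold)
        # Candidate tags: target group index at or above the preset threshold.
        candidates = [(tag, score) for tag, score in tag_scores.items() if score >= threshold]
        # Highest index first; ties broken alphabetically for a stable result.
        candidates.sort(key=lambda item: (-item[1], item[0]))
        result[group] = [tag for tag, _ in candidates[:top_n]]
    return result


tgi = {"A": {"video": 150.0, "travel": 90.0}, "B": {"video": 50.0, "travel": 200.0}}
print(select_group_tags(tgi, thresholds={"B": 120.0}))
# {'A': ['video'], 'B': ['travel']}
```

Group A falls back to the default threshold of 100, while group B uses its own threshold of 120.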
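It should be understood that although the steps in the flowchart of FIG. 2 are displayed in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages. These are not necessarily completed at the same moment and may be performed at different times. Nor are they necessarily performed sequentially: they may be performed in turn or alternately with other steps, or with at least some sub-steps or stages of other steps.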
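In one embodiment, as shown in FIG. 3, an apparatus 300 for generating group interest tags is provided. It includes a user object acquisition module 302, a user object clustering module 304, a target group index determination module 306, an interest tag screening module 308 and a group interest tag determination module 310, where:
User object acquisition module 302 is configured to obtain a user object set, the user object set including user objects with interest tags and user objects without interest tags.
User object clustering module 304 is configured to cluster according to the user attributes of the user objects in the user object set to obtain user object groups.
Target group index determination module 306 is configured to determine, according to the quantity-proportion features of user objects carrying each interest tag in each user object group, the target group index corresponding to each interest tag of each user object group.
Interest tag screening module 308 is configured to screen interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group.
Group interest tag determination module 310 is configured to use the interest tags obtained by screening as the group interest tags of the corresponding user object groups.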
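In one embodiment, as shown in FIG. 4, the apparatus for generating group interest tags further includes a tagged user object determination module 312, an erroneous tag screening module 314 and an erroneous tag removal module 316, where:
Tagged user object determination module 312 is configured to determine the user objects with interest tags in the user object set.
Erroneous tag screening module 314 is configured to screen out user objects with erroneous interest tags according to the interest tag and the user attributes corresponding to each tagged user object.
Erroneous tag removal module 316 is configured to remove the erroneous interest tags from the screened-out user objects to obtain corresponding user objects without interest tags.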
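In one embodiment, the erroneous tag screening module includes a mutual information calculation module and a mutual information screening module. The mutual information calculation module is configured to determine the mutual information of each tagged user object according to its interest tag and user attributes. The mutual information screening module is configured to screen out user objects with erroneous interest tags, based on the tagged user objects in each user object group, according to the mutual information corresponding to each user object.
In one embodiment, the user object clustering module includes a cluster center selection module, a similarity calculation module, a to-be-clustered user assignment module and a user object group acquisition module. These modules are configured as follows:

- the cluster center selection module selects, from the user object set and according to the user attributes, multiple user objects as initial cluster centers;
- the similarity calculation module calculates, for each user object to be clustered other than the cluster centers, its similarity to each cluster center according to the corresponding user attributes;
- the to-be-clustered user assignment module assigns each user object to be clustered, according to the similarities, to the cluster of the closest cluster center;
- the user object group acquisition module recalculates the cluster center of each cluster. If the clustering stop condition is not met, it returns to the step of calculating each remaining user object's similarity to each cluster center, and repeats until the stop condition is met and the user object groups are obtained.

In one embodiment, the target group index determination module includes a first user object proportion calculation module, a second user object proportion calculation module and a target group index calculation module. The first user object proportion calculation module is configured to calculate the first user object proportion of each interest tag in each user object group. The second user object proportion calculation module is configured to calculate the second user object proportion of each interest tag in the user object set. The target group index calculation module is configured to calculate, for each interest tag of each user object group, the corresponding target group index from the corresponding first and second user object proportions.
In one embodiment, the target group index calculation module includes a target group index calculation unit. For each interest tag of each user object group, this unit divides the corresponding first user object proportion by the corresponding second user object proportion to obtain the target group index of that interest tag.
In one embodiment, the interest tag screening module includes a target group index screening module, a candidate tag determination module and an interest tag determination module. The target group index screening module is configured to screen out, for each user object group determined from the tagged user objects in the user object set, the target group indexes greater than or equal to the corresponding preset threshold. The candidate tag determination module is configured to take the interest tags corresponding to the screened-out target group indexes as candidate tags. The interest tag determination module is configured to determine the interest tags of each user object group based on the candidate tags.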
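In the above embodiments, clustering on the user attributes of the user objects in the user object set yields user object groups made up of similar user objects. The target group index, determined from the quantity-proportion features of the user objects carrying each interest tag in each user object group, reflects the relationship between the proportion of user objects with each interest tag in each group and the proportion of user objects with that tag in the whole user object set. Each user object group can therefore be assigned accurate group interest tags based on the index.
For specific limitations on the apparatus for generating group interest tags, refer to the limitations on the method for generating group interest tags above; details are not repeated here. Each module in the apparatus may be implemented wholly or partly by software, hardware or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form. Alternatively, they may be stored in the memory of the computer device in software form, so that the processor can invoke them to perform the corresponding operations.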
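In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 5. It includes a processor, a memory, a network interface and a database connected through a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database stores user object set data. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a method for generating group interest tags.
A person skilled in the art will understand that the structure shown in FIG. 5 is merely a block diagram of part of the structure relevant to the solution of this application. It does not limit the computer devices to which the solution is applied. A specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps provided in the above embodiments.
In one embodiment, one or more non-volatile computer-readable storage media storing computer-readable instructions are provided. When executed by one or more processors, the instructions cause the one or more processors to perform the steps provided in the above embodiments.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments may be completed by computer-readable instructions instructing relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).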
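The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these features are described. However, as long as a combination of these technical features is not contradictory, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application and are described in relative detail, but they should not be construed as limiting the scope of the patent. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within its protection scope. Therefore, the protection scope of this patent shall be subject to the appended claims.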
Claims (20)
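- A method for generating group interest tags, comprising: obtaining a user object set, the user object set comprising user objects with interest tags and user objects without interest tags; clustering according to user attributes of the user objects in the user object set to obtain user object groups; determining, according to quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group; screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and using the interest tags obtained by screening as group interest tags of the corresponding user object groups.
- The method according to claim 1, wherein before the clustering according to user attributes of the user objects in the user object set, the method further comprises: obtaining the user objects with interest tags in the user object set; screening out user objects with erroneous interest tags according to the interest tag and the user attributes corresponding to each user object with an interest tag; and removing the erroneous interest tags carried by the screened-out user objects to obtain corresponding user objects without interest tags.
- The method according to claim 2, wherein the screening out user objects with erroneous interest tags according to the interest tag and the user attributes corresponding to each user object with an interest tag comprises: obtaining, according to the interest tag and the user attributes corresponding to each user object with an interest tag, a marginal probability density function corresponding to each interest tag and each user attribute, and a joint probability density function; calculating mutual information of each user object with an interest tag based on the marginal probability density functions and the joint probability density function; and screening out user objects with erroneous interest tags, based on the user objects with interest tags in each user object group, according to the mutual information corresponding to each user object.
- The method according to claim 1, wherein the user object set comprises individual user objects, each user object comprising a user identifier; and the clustering according to user attributes of the user objects in the user object set to obtain user object groups comprises: selecting, in the user object set and according to the user attributes, multiple user objects as initial cluster centers; for each user object to be clustered in the user object set other than the cluster centers, calculating a similarity to each cluster center according to the corresponding user attributes; assigning each user object to be clustered, according to the corresponding similarity, to the cluster of the closest cluster center; and recalculating the cluster center of each cluster and, when a clustering stop condition is not met, returning to the step of calculating, for each user object to be clustered in the user object set other than the cluster centers, the similarity to each cluster center according to the corresponding user attributes, until the clustering stop condition is met and the user object groups are obtained.
- The method according to claim 4, wherein after the assigning each user object to be clustered, according to the corresponding similarity, to the cluster of the closest cluster center, the method further comprises: when all user objects to be clustered have been assigned to their clusters, obtaining the number of user objects to be clustered; and if, compared with the previous clustering iteration, no user objects to be clustered, or at most a preset number of them, have been reassigned to a different cluster, stopping clustering to obtain the user object groups.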
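- The method according to claim 1, wherein the determining, according to quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group comprises: obtaining the number of users corresponding to each same interest tag, the number of users in the corresponding user object group, and the total number of users in the user object set; calculating a first user object proportion of each interest tag in each user object group based on the number of users and the number of users in the group; calculating a second user object proportion of each interest tag in the user object set based on the total number of users with that interest tag and the total number of users in the user object set; and for each interest tag of each user object group, calculating the target group index corresponding to each interest tag of each user object group according to the corresponding first user object proportion and second user object proportion.
- The method according to claim 6, wherein the calculating, for each interest tag of each user object group, the target group index corresponding to each interest tag of each user object group according to the corresponding first user object proportion and the corresponding second user object proportion comprises: obtaining each interest tag of each user object group, the target group index corresponding to the interest tag being positively correlated with the first user object proportion and negatively correlated with the second user object proportion; and dividing the corresponding first user object proportion by the corresponding second user object proportion to obtain the target group index corresponding to each interest tag of each user object group.
- The method according to any one of claims 1 to 7, wherein the screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group comprises: for each user object group determined from the user objects with interest tags in the user object set, screening out target group indexes greater than or equal to the corresponding preset threshold; taking the interest tags corresponding to the screened-out target group indexes as candidate tags; and determining the interest tags of each user object group based on the candidate tags.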
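- An apparatus for generating group interest tags, comprising: a user object acquisition module, configured to obtain a user object set, the user object set comprising user objects with interest tags and user objects without interest tags; a user object clustering module, configured to cluster according to user attributes of the user objects in the user object set to obtain user object groups; a target group index determination module, configured to determine, according to quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group; an interest tag screening module, configured to screen interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and a group interest tag determination module, configured to use the interest tags obtained by screening as group interest tags of the corresponding user object groups.
- A computer device, comprising a memory and one or more processors, the memory storing computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps: obtaining a user object set, the user object set comprising user objects with interest tags and user objects without interest tags; clustering according to user attributes of the user objects in the user object set to obtain user object groups; determining, according to quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group; screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and using the interest tags obtained by screening as group interest tags of the corresponding user object groups.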
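- The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps: obtaining, according to the interest tag and the user attributes corresponding to each user object with an interest tag, a marginal probability density function corresponding to each interest tag and each user attribute, and a joint probability density function; calculating mutual information of each user object with an interest tag based on the marginal probability density functions and the joint probability density function; and screening out user objects with erroneous interest tags, based on the user objects with interest tags in each user object group, according to the mutual information corresponding to each user object.
- The computer device according to claim 10, wherein the user object set comprises individual user objects, each user object comprising a user identifier; and the processor, when executing the computer-readable instructions, further performs the following steps: selecting, in the user object set and according to the user attributes, multiple user objects as initial cluster centers; for each user object to be clustered in the user object set other than the cluster centers, calculating a similarity to each cluster center according to the corresponding user attributes; assigning each user object to be clustered, according to the corresponding similarity, to the cluster of the closest cluster center; and recalculating the cluster center of each cluster and, when a clustering stop condition is not met, returning to the step of calculating, for each user object to be clustered in the user object set other than the cluster centers, the similarity to each cluster center according to the corresponding user attributes, until the clustering stop condition is met and the user object groups are obtained.
- The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further performs the following steps: when all user objects to be clustered have been assigned to their clusters, obtaining the number of user objects to be clustered; and if, compared with the previous clustering iteration, no user objects to be clustered, or at most a preset number of them, have been reassigned to a different cluster, stopping clustering to obtain the user object groups.
- The computer device according to claim 10, wherein the processor, when executing the computer-readable instructions, further performs the following steps: obtaining the number of users corresponding to each same interest tag, the number of users in the corresponding user object group, and the total number of users in the user object set; calculating a first user object proportion of each interest tag in each user object group based on the number of users and the number of users in the group; calculating a second user object proportion of each interest tag in the user object set based on the total number of users with that interest tag and the total number of users in the user object set; and for each interest tag of each user object group, calculating the target group index corresponding to each interest tag of each user object group according to the corresponding first user object proportion and second user object proportion.
- The computer device according to any one of claims 10 to 14, wherein the processor, when executing the computer-readable instructions, further performs the following steps: obtaining each interest tag of each user object group, the target group index corresponding to the interest tag being positively correlated with the first user object proportion and negatively correlated with the second user object proportion; and dividing the corresponding first user object proportion by the corresponding second user object proportion to obtain the target group index corresponding to each interest tag of each user object group.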
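- One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps: obtaining a user object set, the user object set comprising user objects with interest tags and user objects without interest tags; clustering according to user attributes of the user objects in the user object set to obtain user object groups; determining, according to quantity-proportion features of user objects carrying each interest tag in each user object group, a target group index corresponding to each interest tag of each user object group; screening interest tags, based on the user objects with interest tags in each user object group, according to the target group index corresponding to each interest tag of each user object group; and using the interest tags obtained by screening as group interest tags of the corresponding user object groups.
- The storage media according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining, according to the interest tag and the user attributes corresponding to each user object with an interest tag, a marginal probability density function corresponding to each interest tag and each user attribute, and a joint probability density function; calculating mutual information of each user object with an interest tag based on the marginal probability density functions and the joint probability density function; and screening out user objects with erroneous interest tags, based on the user objects with interest tags in each user object group, according to the mutual information corresponding to each user object.
- The storage media according to claim 16, wherein the user object set comprises individual user objects, each user object comprising a user identifier; and the computer-readable instructions, when executed by the processor, further perform the following steps: selecting, in the user object set and according to the user attributes, multiple user objects as initial cluster centers; for each user object to be clustered in the user object set other than the cluster centers, calculating a similarity to each cluster center according to the corresponding user attributes; assigning each user object to be clustered, according to the corresponding similarity, to the cluster of the closest cluster center; and recalculating the cluster center of each cluster and, when a clustering stop condition is not met, returning to the step of calculating, for each user object to be clustered in the user object set other than the cluster centers, the similarity to each cluster center according to the corresponding user attributes, until the clustering stop condition is met and the user object groups are obtained.
- The storage media according to claim 16, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining the number of users corresponding to each same interest tag, the number of users in the corresponding user object group, and the total number of users in the user object set; calculating a first user object proportion of each interest tag in each user object group based on the number of users and the number of users in the group; calculating a second user object proportion of each interest tag in the user object set based on the total number of users with that interest tag and the total number of users in the user object set; and for each interest tag of each user object group, calculating the target group index corresponding to each interest tag of each user object group according to the corresponding first user object proportion and second user object proportion.
- The storage media according to any one of claims 16 to 19, wherein the computer-readable instructions, when executed by the processor, further perform the following steps: obtaining each interest tag of each user object group, the target group index corresponding to the interest tag being positively correlated with the first user object proportion and negatively correlated with the second user object proportion; and dividing the corresponding first user object proportion by the corresponding second user object proportion to obtain the target group index corresponding to each interest tag of each user object group.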
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910666076.0A CN110555164B (zh) | 2019-07-23 | 2019-07-23 | Method and apparatus for generating group interest tags, computer device and storage medium |
CN201910666076.0 | 2019-07-23 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021012482A1 (zh) | 2021-01-28 |
Family
ID=68735857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/116494 WO2021012482A1 (zh) | Method and apparatus for generating group interest tags, computer device and storage medium | 2019-07-23 | 2019-11-08 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110555164B (zh) |
WO (1) | WO2021012482A1 (zh) |
- 2019-07-23 CN CN201910666076.0A, patent CN110555164B (status: active)
- 2019-11-08 WO PCT/CN2019/116494, patent WO2021012482A1 (status: application filing)
Also Published As
Publication number | Publication date |
---|---|
CN110555164A (zh) | 2019-12-10 |
CN110555164B (zh) | 2024-01-05 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19938607; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 19938607; Country of ref document: EP; Kind code of ref document: A1 |