WO2018099177A1 - Method and apparatus for extending potential users - Google Patents

Info

Publication number
WO2018099177A1
WO2018099177A1 PCT/CN2017/104098 CN2017104098W WO2018099177A1 WO 2018099177 A1 WO2018099177 A1 WO 2018099177A1 CN 2017104098 W CN2017104098 W CN 2017104098W WO 2018099177 A1 WO2018099177 A1 WO 2018099177A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
seed
energy value
training
seed user
Prior art date
Application number
PCT/CN2017/104098
Other languages
English (en)
French (fr)
Inventor
张海滨
程圣军
张旭
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2018099177A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and apparatus for extending potential users.
  • APP: a terminal application.
  • the APP may be an application that provides a certain service. How to expand the user base of an application has become a top priority.
  • potential users are mainly extended among non-seed users by means of seed users: the behavior characteristics of the seed users are analyzed, and potential users whose behavior characteristics are similar to those of the seed users are sought among the large number of users who do not use the application.
  • a seed user is a user who has used the application.
  • a non-seed user is a user who has not used the application.
  • an atypical seed user is a seed user whose behavior characteristics differ significantly from those of other seed users. For example, if the seed users are the users who downloaded an APP, some of them open the APP often and operate it frequently, while others downloaded it but rarely operate it. A user who downloaded the APP but rarely operates it is an atypical seed user of that APP.
  • the potential users obtained through seed user extension may not be users who would actually use the APP, which lowers the accuracy of message push and, in turn, the recommendation success rate.
  • Embodiments of the present invention provide a method and apparatus for extending a potential user based on a seed user, so as to improve the accuracy of application push.
  • a method of extending potential users is provided, in which a sampled non-seed user set is sampled from the full non-seed user set.
  • the initial energy value of the seed user in the seed user set and the initial energy value of the non-seed user in the sampled non-seed user set are set, wherein the seed user's initial energy value is greater than the non-seed user's initial energy value.
  • the initial energy value of each seed user in the seed user set is trained based on the initial energy values of the seed users and of the non-seed users in the sampled non-seed user set, yielding a trained energy value. The magnitude of the trained energy value distinguishes the impact of typical seed users from that of atypical seed users when determining potential users. The energy value of a user represents the weight of the user's influence on surrounding users, where the influence mainly refers to the influence of the user as a typical seed user.
  • in the embodiments of the present invention, the higher a user's energy value, the more similar the user is to a typical seed user, and the more likely the user is to become a typical seed user.
  • a predicted energy value is determined for each non-seed user in the full non-seed user set based on the trained energy values of the seed users, and potential users are determined in the full non-seed user set according to the predicted energy values. This reduces the noise introduced by atypical seed users and can improve the accuracy of application push.
  • for each training user in a training user set, the K nearest neighbors of the training user are determined from the seed user set and the sampled non-seed user set, and the trained energy value of the training user is determined according to the initial energy value of each user among those K nearest neighbors. The energy value of each non-seed user in the full non-seed user set is then determined according to the trained energy values of the training users, giving the predicted energy value of the non-seed user, and whether the non-seed user is a potential user is determined according to that predicted energy value.
  • the training user set includes at least the seed user set.
  • for each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user may be determined from the seed user set and the sampled non-seed user set, and the predicted energy value of the non-seed user is determined according to the energy value of each user among those K nearest neighbors.
  • for each user among the K nearest neighbors: if the user is a seed user, the energy value used is the seed user's trained energy value; if the user is a non-seed user, the energy value used is the non-seed user's initial energy value.
  • alternatively, the training user set may include both the seed user set and the sampled non-seed user set, in which case the energy value of each user among the K nearest neighbors of the non-seed user may be that user's trained energy value.
  • in this case the energy values of the non-seed users in the sampled non-seed user set are trained as well, so that potential users can be determined more accurately.
  • the trained energy value of the training user may be determined as the sum of the mean initial energy value of the users among its K nearest neighbors and the initial energy value of the training user.
  • alternatively, for each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user are determined from the seed user set and the sampled non-seed user set. An updated energy value of each user among those K nearest neighbors is then determined according to the initial energy values of the seed users in the seed user set and of the non-seed users in the non-seed user set. The predicted energy value of the non-seed user is determined according to the updated energy values of its K nearest neighbors, and whether the non-seed user is a potential user is determined according to the predicted energy value.
  • the initial energy value of a seed user is updated using the initial energy values of the seed user's K nearest neighbors, so that the updated energy value of a typical seed user is higher than that of an atypical seed user. When determining whether a non-seed user is a potential user, this increases the influence of typical seed users and reduces the influence of atypical seed users. Expanding potential users among non-seed users based on the trained energy values can therefore reduce the noise introduced by atypical seed users.
  • an apparatus for extending potential users is provided, which has the function of implementing the extension of potential users involved in the first aspect above. The function may be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • the modules can be software and/or hardware.
  • the device for expanding potential users includes an obtaining unit and a processing unit, and the functions of the obtaining unit and the processing unit may correspond to each method step, and details are not described herein.
  • an apparatus for extending potential users is provided, comprising a processor and a memory, wherein the memory stores a computer readable program, and the processor is configured to run the program in the memory to perform any of the methods for extending potential users involved in the first aspect above.
  • a computer storage medium for storing instructions that, when executed, perform any of the methods of extending a potential user involved in the first aspect above.
  • an initial energy value is set for each user in the seed user set and in the sampled non-seed user set, and each user in the seed user set and the sampled non-seed user set is trained to obtain a trained energy value. Based on the trained energy values, the predicted energy value of each non-seed user in the full non-seed user set is determined.
  • the predicted energy value of a non-seed user reflects the degree of similarity between the non-seed user and typical seed users. Determining potential users in the full non-seed user set based on the predicted energy values can reduce the noise introduced by atypical seed users, which can further improve the accuracy of application push.
  • FIG. 1 is a schematic diagram of a system architecture applied to a method for extending a potential user according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of recording user behavior characteristic data according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of abstracting user behavior feature data into data in a 2-dimensional space according to an embodiment of the present invention
  • FIG. 4 is a flowchart of a method for extending a potential user according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of another method for extending a potential user according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an apparatus for extending a potential user according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another apparatus for extending a potential user according to an embodiment of the present invention.
  • the method for extending a potential user is applicable to an application scenario in which a similarity algorithm is used to perform potential user extension.
  • the potential user extension using the similarity algorithm can be used in the system architecture diagram shown in FIG. 1.
  • the database is used to store the users' basic metadata.
  • each type of basic metadata in the database can be stored as a table, with each table storing one type of the users' basic metadata.
  • An Extract-Transform-Load (ETL) module is configured to extract basic metadata of the user from the database and perform simple summary transformation on the extracted basic metadata to obtain behavior characteristic data of the user.
  • the behavior characteristic data of the user in FIG. 2 mainly includes the user identification, the average frequency (number of times/day) of using the terminal APP per day, and the average Internet traffic per minute (KB/minute) of the user.
  • the data ETL module can also distinguish between seed users and non-seed users, which can be marked with different flag bits. For example, a subscriber who has subscribed to an application can be called a seed user and is identified by a flag with a value of 1. A non-subscriber who has not subscribed to the application can be called a non-seed user.
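The behavior characteristic record and the seed flag described above can be pictured with a small data structure. This is only an illustrative sketch: the class name `UserRecord`, its field names, the helper `split_by_flag`, and the sample values are all hypothetical, and a boolean stands in for the flag bit because the text only states the value 1 for seed users.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UserRecord:
    """One row of behavior characteristic data (field names are hypothetical)."""
    user_id: int
    app_uses_per_day: float    # average frequency of using the terminal APP (times/day)
    traffic_kb_per_min: float  # average Internet traffic (KB/minute)
    is_seed: bool              # stands in for the flag bit; the text gives 1 for seed users


def split_by_flag(records: List[UserRecord]) -> Tuple[List[UserRecord], List[UserRecord]]:
    """Separate seed users from non-seed users using the flag."""
    seeds = [r for r in records if r.is_seed]
    non_seeds = [r for r in records if not r.is_seed]
    return seeds, non_seeds


# Made-up sample values, not taken from FIG. 2.
records = [
    UserRecord(1, 2.0, 10.0, False),
    UserRecord(2, 8.0, 35.0, True),
]
seeds, non_seeds = split_by_flag(records)
```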
  • the data mining and analysis platform is mainly used to abstract the behavior characteristic data of seed users and non-seed users and use the abstracted behavior feature data to perform similarity algorithm processing to obtain potential users to be extended.
  • the marketing platform pushes the application to the potential users obtained by the data mining and analysis platform, for example through advertisements or short messages.
  • the following describes how similarity algorithm processing is applied to the behavior characteristic data of seed users and non-seed users to implement potential user extension.
  • the execution subject of the method for extending potential users in the following embodiments may be referred to as a device for extending potential users. This device may be the data analysis and mining platform, or a component of the data analysis and mining platform.
  • the behavior characteristic data of users is abstracted as data in an N-dimensional space. For example, the user behavior characteristic data shown in FIG. 2 can be abstracted into the data in the 2-dimensional space shown in FIG. 3.
  • the users whose spatial distances are close in FIG. 3 have similar user behaviors, and these users with similar user behaviors can be regarded as potential users.
  • a k-Nearest Neighbor (kNN) lookup scheme can be used to extend potential users.
  • the K nearest neighbor refers to the K users that are closest to the user behavior characteristics of the specified object, and K is a positive integer, and the K users can be considered as users having the most similar user behavior characteristics with the specified object.
  • for example, with K equal to 3, the K nearest neighbors of user 3 include one seed user (user 2) and two non-seed users (user 4 and user 5).
  • an embodiment of the present invention provides a method for extending potential users based on seed users.
  • in this method, an energy value is used to distinguish the impact of typical seed users from that of atypical seed users when determining potential users.
  • the energy value of a user represents the weight of the user's impact on surrounding users, where the impact mainly refers to the influence of the user as a typical seed user.
  • the higher the energy value in the embodiment of the present invention the more similar it is to the typical seed user, and the more likely it is to become a typical seed user.
  • a sampled non-seed user set is sampled from the full non-seed user set.
  • the initial energy value of the seed user in the seed user set and the initial energy value of the non-seed user in the sampled non-seed user set are set, wherein the seed user's initial energy value is greater than the non-seed user's initial energy value.
  • for each user (referred to as a training user) in a training user set that includes at least the seed user set, the K nearest neighbors of the training user are determined from the seed user set and the sampled non-seed user set, and the trained energy value of the training user is determined according to the initial energy value of each user among those K nearest neighbors.
  • the initial energy value of a seed user is trained using the initial energy values of its K nearest neighbors, so that the trained energy value of a typical seed user is higher than that of an atypical seed user; the trained energy values are then used to determine whether a non-seed user is a potential user.
  • expanding potential users among non-seed users in this way can reduce the noise introduced by atypical seed users.
  • FIG. 4 is a flowchart of a method for extending potential users according to an embodiment of the present invention.
  • the method shown in FIG. 4 is performed by a device for extending potential users.
  • the process by which the device extends potential users based on seed users, as shown in FIG. 4, includes:
  • S101 Acquire a seed user set and a sampled non-seed user set.
  • the device for expanding a potential user in the embodiment of the present invention may acquire a seed user set and a sampled non-seed user set from the data ETL module, and can acquire behavior characteristic data of the seed user and the non-seed user.
  • the sampled non-seed user set may be sampled from the full non-seed user set using any existing sampling method; for example, a random sampling method may be used to extract 1% of the users from the full non-seed user set as the sampled non-seed user set.
  • the sampled non-seed user set is a subset of the full non-seed user set, and the present invention does not limit the specific sampling method.
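As one possible reading of step S101, uniform random sampling without replacement could look like the sketch below. The function name `sample_non_seed`, its parameters, and the choice to keep at least one user are assumptions made for illustration; the text leaves the sampling method open.

```python
import random
from typing import Iterable, List, Optional


def sample_non_seed(full_non_seed_ids: Iterable[int],
                    fraction: float = 0.01,
                    rng_seed: Optional[int] = None) -> List[int]:
    """Draw a random subset of the full non-seed user set (hypothetical helper)."""
    ids = list(full_non_seed_ids)
    if not ids:
        return []
    rng = random.Random(rng_seed)
    # Keep at least one user and never more than the population size.
    size = min(len(ids), max(1, round(len(ids) * fraction)))
    return rng.sample(ids, size)


# 1% of 1000 hypothetical non-seed user IDs.
sampled = sample_non_seed(range(1000), fraction=0.01, rng_seed=0)
```

Passing `rng_seed` only makes the draw repeatable; it has no counterpart in the text.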
  • S102 Set an initial energy value for the seed users in the seed user set and an initial energy value for the non-seed users in the sampled non-seed user set, wherein the initial energy value of a seed user is greater than the initial energy value of a non-seed user.
  • different initial energy values may be preset for the seed user and the non-seed user.
  • the initial energy value distinguishes the impact weight of seed users and non-seed users on surrounding users. The higher the initial energy value, the more similar the user is to a typical seed user, and the more likely the user is to become a typical seed user. Therefore, the preset initial energy value of a seed user in the embodiment of the present invention is greater than that of a non-seed user.
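Step S102 can be sketched as building a lookup table of initial energy values. The concrete values 1.0 and 0.0 below are assumptions chosen for illustration; the text only requires that the seed value be greater than the non-seed value. The function name `set_initial_energy` is likewise hypothetical.

```python
from typing import Dict, Iterable


def set_initial_energy(seed_ids: Iterable[int],
                       sampled_non_seed_ids: Iterable[int],
                       seed_init: float = 1.0,      # assumed value
                       non_seed_init: float = 0.0   # assumed value
                       ) -> Dict[int, float]:
    """Assign a preset initial energy value to every seed and sampled non-seed user."""
    if seed_init <= non_seed_init:
        raise ValueError("seed users must start with a greater energy value")
    energy = {uid: non_seed_init for uid in sampled_non_seed_ids}
    energy.update({uid: seed_init for uid in seed_ids})
    return energy


# User IDs follow the example sets listed in the description.
init = set_initial_energy(seed_ids=[2, 9, 10], sampled_non_seed_ids=[1, 3, 4])
```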
  • S103 For each training user in the training user set, determine the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determine the trained energy value of the training user according to the initial energy value of each user among those K nearest neighbors.
  • if the K nearest neighbors of a seed user contain many non-seed users, the seed user is likely to be an atypical seed user; conversely, if the K nearest neighbors of a seed user contain many seed users, the seed user is likely to be a typical seed user.
  • the K nearest neighbors may include seed users and/or non-seed users.
  • for a typical seed user, the number of seed users among its K nearest neighbors is generally greater than the number of non-seed users.
  • for an atypical seed user, the number of seed users among its K nearest neighbors is generally less than the number of non-seed users.
  • the users in the seed user set, or the users in the seed user set together with the users in the sampled non-seed user set, may be used as training users. The K nearest neighbors of each training user are determined from the seed user set and the sampled non-seed user set, and the initial energy value of the training user is trained according to the initial energy value of each user among those K nearest neighbors, so that the trained energy value of a typical seed user is greater than the trained energy value of an atypical seed user.
  • the trained energy value of the training user may be determined according to the initial energy value of each user among its K nearest neighbors as follows:
  • the trained energy value of the training user is the sum of the mean initial energy value of the users among its K nearest neighbors and the initial energy value of the training user. For example, the following formula can be used to determine the trained energy value of the training user:
  • $$\mathrm{energy}(User_i) = \mathrm{init}(User_i) + \frac{1}{k}\sum_{l=1}^{k}\mathrm{energy}(User_l)$$
  • where init(User_i) is the initial energy value of training user i, User_l is user l among the K nearest neighbors of training user i, energy(User_i) is the trained energy value of training user i, energy(User_l) is the initial energy value of user l, and k is the number of users among the K nearest neighbors of training user i.
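The training rule described above (the training user's own initial energy value plus the mean initial energy value of its K nearest neighbors) can be sketched as follows. The function name `train_energy`, the dictionary layout, and the initial values 1.0 and 0.0 are assumptions for illustration, and the neighbor lists are supplied by hand rather than computed.

```python
from typing import Dict, Hashable, List


def train_energy(init: Dict[Hashable, float],
                 neighbors: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, float]:
    """Trained energy = own initial energy + mean initial energy of the K nearest neighbors."""
    trained = {}
    for user, nbrs in neighbors.items():
        mean_neighbor_energy = sum(init[n] for n in nbrs) / len(nbrs)
        trained[user] = init[user] + mean_neighbor_energy
    return trained


# Hypothetical setup: seeds start at 1.0, non-seeds at 0.0, K = 3.
init = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0, "n1": 0.0, "n2": 0.0}
neighbors = {
    "s1": ["s2", "s3", "s4"],  # surrounded by seed users: behaves like a typical seed user
    "s4": ["s1", "n1", "n2"],  # surrounded mostly by non-seed users: atypical
}
trained = train_energy(init, neighbors)
```

With these made-up inputs the seed user surrounded by other seed users ends up with the higher trained energy value, which is the separation between typical and atypical seed users that the description relies on.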
  • the process of determining the energy value after training the user is described by taking the user behavior characteristic data shown in FIG. 2 and FIG. 3 as an example.
  • the seed user set includes user 2, user 9, user 10, user 11, user 12, user 13, and user 14.
  • the sampled non-seed user set includes user 1, user 3, user 4, user 5, user 7, user 16, user 17, and so on.
  • besides the sampled non-seed users above, the full non-seed user set also includes, for example, user 6, user 8, and user 15.
  • the training users are user 2, user 9, user 10, user 11, user 12, user 13, and user 14 in the seed user set, and may further include the users in the sampled non-seed user set, such as user 1.
  • the K nearest neighbors of each training user are determined from the seed user set and the sampled non-seed user set.
  • the Euclidean distance between two points in the 2-dimensional space can be used to determine the K nearest neighbors of a training user. For example, to determine the K nearest neighbors of user 2 in FIG. 3, the Euclidean distance between user 2 and each of the other users is calculated, for example the Euclidean distance between user 1 and user 2. Among the calculated results, the K users with the smallest Euclidean distance are selected as the K nearest neighbors.
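A brute-force K nearest neighbor lookup with Euclidean distance can be sketched as below. The coordinates are invented for illustration and are not the values of FIG. 3; the function name `k_nearest` is hypothetical. `math.dist` (Python 3.8+) works for points of any dimension, so the same sketch covers the N-dimensional case.

```python
import math
from typing import Dict, Hashable, List, Tuple


def k_nearest(target: Hashable,
              points: Dict[Hashable, Tuple[float, ...]],
              k: int) -> List[Hashable]:
    """Return the k users closest to `target` by Euclidean distance (brute force)."""
    distances = [
        (math.dist(points[target], coords), user)
        for user, coords in points.items()
        if user != target
    ]
    distances.sort(key=lambda pair: pair[0])  # stable sort: ties keep insertion order
    return [user for _, user in distances[:k]]


# Made-up 2-D behavior coordinates (uses per day, KB per minute).
points = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (5.0, 5.0), 4: (0.0, 2.0)}
nearest_to_1 = k_nearest(1, points, k=2)
```

Sorting every distance costs O(n log n) per query, which is one reason the description prefers searching the sampled non-seed user set instead of the full one.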
  • the initial energy value of each user among the K nearest neighbors is then used to train the initial energy value of the training user. Repeating this training process for every training user yields the trained energy values of all training users.
  • the three nearest neighbor users of user 14 are user 15, user 16, and user 17, respectively.
  • the K nearest neighbors of a training user may be determined based on all non-seed users in the full non-seed user set, or based on the sampled non-seed user set. To reduce computational complexity, determining them based on the sampled non-seed user set is preferred in the embodiment of the present invention.
  • S104 Determine, according to the trained energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set, to obtain the predicted energy value of the non-seed user.
  • the training user set includes at least a seed user set, and may also include a seed user set and a sampled non-seed user set.
  • for each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user are determined from the seed user set and the sampled non-seed user set, and the predicted energy value of the non-seed user is determined according to the energy value of each user among those K nearest neighbors.
  • for a seed user among the K nearest neighbors of the non-seed user, the energy value used is the seed user's trained energy value. If a non-seed user has behavior characteristic data similar to that of typical seed users, its K nearest neighbors are more likely to contain typical seed users than atypical seed users. Because the trained energy value of a typical seed user is greater than that of an atypical seed user, determining the predicted energy value from the energy values of the K nearest neighbors gives a non-seed user whose behavior characteristic data resembles that of typical seed users a larger predicted energy value than a non-seed user whose behavior characteristic data resembles that of atypical seed users.
  • for a non-seed user among the K nearest neighbors of the non-seed user, the energy value used may be that non-seed user's initial energy value or its trained energy value.
  • if the training user set includes the seed user set and the sampled non-seed user set, then for each user among the K nearest neighbors of the non-seed user: if the user is a seed user, the energy value used is the seed user's trained energy value; if the user is a non-seed user, the energy value used is the non-seed user's trained energy value.
  • the predicted energy value of the non-seed user may be determined according to the energy value of each user among its K nearest neighbors as follows:
  • the sum of the mean energy value of the users among the K nearest neighbors and the preset initial energy value of the non-seed user is determined as the predicted energy value of the non-seed user.
  • for example, the following formula can be used to determine the predicted energy value of a non-seed user:
  • $$\mathrm{energy}(User_i) = \mathrm{init}(User_i) + \frac{1}{k}\sum_{n=1}^{k}\mathrm{energy}(User_n)$$
  • where energy(User_i) is the predicted energy value of non-seed user i, init(User_i) is the initial energy value of non-seed user i, User_n is user n among the K nearest neighbors of non-seed user i, energy(User_n) is the energy value of that neighbor, and k is the number of K nearest neighbors of non-seed user i.
  • if user n is a seed user, the energy value of user n is the seed user's trained energy value.
  • if user n is a non-seed user, the energy value of user n may be the non-seed user's initial energy value, or may be its trained energy value.
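The prediction rule described above (the non-seed user's preset initial energy value plus the mean energy value of its K nearest neighbors) can be sketched as follows. The function name `predict_energy` and the numeric values are assumptions for illustration. A neighbor's trained energy value is used when one exists; otherwise its initial energy value is used, which covers the case where only seed users were trained.

```python
from typing import Dict, Hashable, List


def predict_energy(own_init: float,
                   neighbor_ids: List[Hashable],
                   trained: Dict[Hashable, float],
                   init: Dict[Hashable, float]) -> float:
    """Predicted energy = own initial energy + mean energy of the K nearest neighbors."""
    values = [trained[n] if n in trained else init[n] for n in neighbor_ids]
    return own_init + sum(values) / len(values)


# Hypothetical numbers: user 2 is a trained seed user, users 4 and 5 are
# sampled non-seed users that were not trained, so their initial values apply.
init = {2: 1.0, 4: 0.0, 5: 0.0}
trained = {2: 2.0}
predicted_user_3 = predict_energy(0.0, [2, 4, 5], trained, init)
```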
  • for example, the K nearest neighbors of user 3 in the full non-seed user set are user 2, user 4, and user 5, where the energy values used for the non-seed users among these neighbors are their energy values after training as sampled non-seed users.
  • S105 Determine a potential user in the full non-seed user set according to the predicted energy value of the non-seed user.
  • the potential user may be determined in the full non-seed user set according to the predicted energy value of the non-seed user.
  • the non-seed user whose predicted energy value is greater than the preset threshold may be selected as a potential user.
  • the specific manner of setting the threshold is not limited in the embodiment of the present invention. For example, it may be set according to an empirical value, or may be set by machine learning according to the magnitudes of the predicted energy values of the non-seed users in the full non-seed user set.
  • alternatively, the non-seed users in the full non-seed user set are sorted in descending order of predicted energy value, and a set number of top-ranked non-seed users are selected as potential users.
  • the specific manner of setting this number is not limited in the embodiment of the present invention. For example, it may be set according to an empirical value, or may be set by machine learning according to the number of non-seed users in the full non-seed user set. For example, if the predicted energy value of user 15 is greater than that of user 3 and one potential user is to be determined, user 15 is considered more similar to a typical seed user and is determined as the potential user.
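The two selection rules of step S105, a preset threshold and a set number of top-ranked users, can be sketched as below. The function names and the predicted energy values are invented for illustration; only the ordering between user 15 and user 3 follows the example in the text.

```python
from typing import Dict, Hashable, List


def select_by_threshold(predicted: Dict[Hashable, float], threshold: float) -> List[Hashable]:
    """Keep non-seed users whose predicted energy value exceeds the preset threshold."""
    return [user for user, energy in predicted.items() if energy > threshold]


def select_top_n(predicted: Dict[Hashable, float], n: int) -> List[Hashable]:
    """Sort non-seed users by predicted energy value, descending, and keep the first n."""
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:n]


# Made-up predicted energy values for three non-seed users.
predicted = {3: 0.5, 15: 1.5, 6: 0.2}
over_threshold = select_by_threshold(predicted, threshold=1.0)
top_one = select_top_n(predicted, n=1)
```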
  • in the embodiment of the present invention, the trained energy value of each training user is obtained by training each training user in the training user set, and the predicted energy value of each non-seed user in the full non-seed user set is determined based on the trained energy values. The predicted energy value of a non-seed user reflects its degree of similarity to typical seed users, so determining potential users in the full non-seed user set based on predicted energy values can reduce the noise introduced by atypical seed users and can improve the accuracy of application push.
  • the process of determining the predicted energy value of each non-seed user in the full non-seed user set based on the energy values of users in the seed user set and the sampled non-seed user set is not limited to the execution order of the foregoing embodiment.
  • for example, in an embodiment of the present invention, the K nearest neighbors of a non-seed user in the full non-seed user set may be determined first; the updated energy value of each user among those K nearest neighbors is then determined according to the initial energy values of the seed users in the seed user set and of the non-seed users in the non-seed user set; and the predicted energy value of the non-seed user is determined according to those updated energy values.
  • FIG. 5 is a flowchart of another method for extending potential users according to an embodiment of the present invention.
  • steps S201, S202, and S206 in FIG. 5 are the same as S101, S102, and S105 in FIG. 4 and are not described here again. Only the differences are described below:
  • S203 For each non-seed user in the full non-seed user set, determine the K nearest neighbor of the non-seed user from the seed user set and the sampled non-seed user set.
  • the implementation process of determining the K-nearest neighbor of the non-seed user in the embodiment of the present invention is similar to the implementation process of determining the K nearest neighbor user of the training user in the foregoing embodiment, and details are not described herein again.
  • S204 Determine, according to the initial energy values of the seed users in the seed user set and of the non-seed users in the non-seed user set, an updated energy value of each user among the K nearest neighbors of the non-seed user.
  • the process of determining the updated energy value of each user among the K nearest neighbors of the non-seed user is similar to the process in the foregoing embodiment of training the initial energy value of a training user using the initial energy values of its K nearest neighbors to obtain the trained energy value. The only difference is that the foregoing embodiment obtains the trained energy values of the training users, whereas this embodiment obtains the updated energy value of each user among the K nearest neighbors of the non-seed user. Details are not described here again.
  • S205 Determine, according to the updated energy value of each user in the K nearest neighbor of the non-seed user, the predicted energy value of the non-seed user.
  • the embodiment of the present invention further provides an apparatus for extending a potential user, based on the method for extending a potential user involved in the foregoing embodiment.
  • the device for expanding the potential user includes a corresponding hardware structure and/or software module for executing each function in order to implement the above functions.
  • in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the technical solutions of the embodiments of the present invention.
  • the embodiment of the present invention may perform functional unit division on a device for expanding a potential user according to the foregoing method example.
  • each functional unit can be divided for each function, or two or more functions can be integrated into one processing unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of the unit in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • FIG. 6 shows a simplified functional block diagram of an apparatus 100 for extending potential users provided by an embodiment of the present invention.
  • the apparatus 100 for expanding potential users includes an obtaining unit 101 and a processing unit 102, where:
  • the obtaining unit 101 is configured to obtain a seed user set and a sampled non-seed user set, wherein the sampled non-seed user set is a subset of the full non-seed user set.
  • the processing unit 102 is configured to: set an initial energy value of the seed users in the seed user set obtained by the obtaining unit 101 and an initial energy value of the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than the initial energy value of a non-seed user; for each training user in a training user set that includes at least the seed user set, determine the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determine the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors; determine, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set to obtain a predicted energy value of the non-seed user; and determine, according to the predicted energy value, whether the non-seed user is a potential user.
  • the processing unit 102 may determine, for each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user from the seed user set and the sampled non-seed user set, and determine the predicted energy value of the non-seed user according to the energy value of each user among those K nearest neighbors.
  • for each user among the K nearest neighbors of the non-seed user, if the user is a seed user, the energy value of the seed user is its post-training energy value; if the user is a non-seed user, the energy value of the non-seed user is its initial energy value.
  • if the training user set includes the seed user set and the sampled non-seed user set, the energy value of each user among the K nearest neighbors of the non-seed user may be the post-training energy value of that user.
  • optionally, the processing unit 102 may update the energy value of the training user by using the sum of the mean initial energy value of the users among the K nearest neighbors and the initial energy value of the training user, and determine the updated energy value as the post-training energy value of the training user.
  • the processing unit 102 may be a processor or a controller.
  • the obtaining unit 101 may be a communication interface, a transceiver, a transceiver circuit, etc., wherein the communication interface is a collective name and may include one or more interfaces.
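  • when the processing unit 102 is a processor and the obtaining unit 101 is a communication interface, the apparatus 100 for expanding potential users may have the structure shown in FIG. 7.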
  • FIG. 7 is a schematic structural diagram of an apparatus 1000 for expanding potential users according to an embodiment of the present invention.
  • referring to FIG. 7, the apparatus 1000 for expanding potential users adopts a general computer system structure, including a bus, a processor 1001, a memory 1002, and a communication interface 1003. The program code for executing the solution of the present invention is stored in the memory 1002, and its execution is controlled by the processor 1001.
  • the bus can include a path to transfer information between various components of the computer.
  • the processor 1001 can be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention.
  • the one or more memories included in the computer system may be read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or disk storage. These memories are connected to the processor via the bus.
  • the communication interface 1003 can use any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • a memory 1002 such as a RAM, holds an operating system and a program for executing the inventive scheme.
  • the operating system is a program that controls the running of other programs and manages system resources.
  • the program stored in the memory 1002 is used to instruct the processor 1001 to perform a method for expanding potential users, including: obtaining a seed user set and a sampled non-seed user set through the communication interface 1003; setting an initial energy value of the seed users in the obtained seed user set and an initial energy value of the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than the initial energy value of a non-seed user; for each training user in the training user set, determining the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determining the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors; determining, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set to obtain a predicted energy value of the non-seed user; and determining, according to the predicted energy value, whether the non-seed user is a potential user.
  • the embodiment of the present invention further provides a computer storage medium for storing some instructions. When the instructions are executed, any method for extending potential users involved in the foregoing embodiments may be completed.
  • as described above, an initial energy value is set for each user in the seed user set and the sampled non-seed user set, each user in a training user set that includes at least the seed user set is trained to obtain a post-training energy value, and the predicted energy value of each non-seed user in the full non-seed user set is determined based on the post-training energy values.
  • the magnitude of the predicted energy value of a non-seed user reflects how similar the non-seed user is to typical seed users. Determining potential users in the full non-seed user set based on the predicted energy values reduces the noise caused by atypical seed users and can therefore improve the accuracy of application push.

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

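A method and apparatus for expanding potential users. A seed user set and a sampled non-seed user set are obtained, and an initial energy value is set for the seed users in the seed user set and for the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than that of a non-seed user. For each training user in a training user set that includes at least the seed user set, the K nearest neighbors of the training user are determined, and the post-training energy value of the training user is determined according to the initial energy value of each user among the K nearest neighbors. The predicted energy value of each non-seed user in the full non-seed user set is determined according to the post-training energy value of each training user in the training user set, and potential users are determined in the full non-seed user set according to the predicted energy values of the non-seed users, so as to improve the accuracy of application push.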

Description

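Method and apparatus for expanding potential users
This application claims priority to Chinese Patent Application No. 201611075513.4, filed with the Chinese Patent Office on November 29, 2016 and entitled "Method and apparatus for expanding potential users", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for expanding potential users.
Background
With the rapid development of Internet technologies, new terminal applications (Application, APP for short) keep emerging, where an APP may be an application program that provides a certain service. How to expand the user base of an application has become a pressing issue.
Currently, potential users are mainly expanded among non-seed users by means of seed users: the behavior features of the seed users are analyzed, and users whose behavior features are similar to those of the seed users are sought among the massive number of users who have not used the application, as targets for expansion or new-user acquisition, thereby implementing precision marketing. Precision marketing may specifically consist of pushing messages about using the application to users with similar behavior features. A seed user is a user who already uses the application, and a non-seed user is a user who has not used the application.
However, in practical application scenarios, the seed users often include many atypical seed users. An atypical seed user is a seed user whose behavior features differ markedly from those of the other seed users. For example, among the users who have downloaded a given APP, some open it often and operate it frequently, while others have downloaded it but rarely operate it; the latter are atypical seed users of the APP.
Because atypical seed users exist among the seed users, the potential users obtained by expanding from the seed users may not be users who will really use the APP. As a result, the accuracy of message push is low and the recommendation success rate is not high.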
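Summary
Embodiments of the present invention provide a method and apparatus for expanding potential users based on seed users, so as to improve the accuracy of application push.
In a first aspect, a method for expanding potential users is provided. In the method, a sampled non-seed user set is obtained by sampling from the full set of non-seed users. Initial energy values are set for the seed users in the seed user set and for the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than that of a non-seed user. Based on the initial energy values of the seed users and of the non-seed users in the sampled non-seed user set, the initial energy values of the seed users in the seed user set are trained to obtain post-training energy values, and the magnitude of the post-training energy values distinguishes the influence of typical seed users from that of atypical seed users on the determination of potential users. The energy value of a user represents the weight of the additional influence that the user exerts on surrounding users, the influence being mainly the influence on a user becoming a typical seed user. In the embodiments of the present invention, a higher energy value indicates greater similarity to typical seed users and a greater likelihood of being a typical seed user. The predicted energy value of each non-seed user in the full non-seed user set is determined based on the post-training energy values of the seed users, and potential users are determined in the full non-seed user set according to the predicted energy values, which reduces the noise caused by atypical seed users and thereby improves the accuracy of application push.
In a possible design, for each user in a training user set (a training user for short), the K nearest neighbors of the training user are determined from the seed user set and the sampled non-seed user set, and the post-training energy value of the training user is determined according to the initial energy value of each user among the K nearest neighbors. According to the post-training energy value of each training user in the seed user set, the energy value of each non-seed user in the full non-seed user set is determined to obtain the predicted energy value of the non-seed user, and whether the non-seed user is a potential user can then be determined according to the predicted energy value. The training user set includes at least the seed user set.
For each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user may be determined from the seed user set and the sampled non-seed user set, and the predicted energy value of the non-seed user may be determined according to the energy value of each user among those K nearest neighbors.
Optionally, for each user among the K nearest neighbors of the non-seed user, if the user is a seed user, its energy value is the post-training energy value of the seed user; if the user is a non-seed user, its energy value is the initial energy value of the non-seed user. In this approach the energy values of the non-seed users in the sampled non-seed user set need not be trained, which saves system resources while still mitigating the influence of atypical seeds on the determination of potential users.
Optionally, the training user set may include the seed user set and the sampled non-seed user set, and the energy value of each user among the K nearest neighbors of the non-seed user may be the post-training energy value of that user. In this approach the energy values of the non-seed users in the sampled non-seed user set are also trained, so potential users can be determined more accurately.
Optionally, the post-training energy value of a training user may be determined as the sum of the mean initial energy value of the users among its K nearest neighbors and the initial energy value of the training user.
In another possible design, for each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user are determined from the seed user set and the sampled non-seed user set; the updated energy value of each user among those K nearest neighbors is determined according to the initial energy values of the seed users in the seed user set and the initial energy values of the non-seed users in the non-seed user set; the predicted energy value of the non-seed user is determined according to the updated energy value of each user among its K nearest neighbors; and whether the non-seed user is a potential user is determined according to the predicted energy value.
In the embodiments of the present invention, because the behavior features of atypical seed users differ markedly from those of other seed users, a typical seed user generally has more seed users among its K nearest neighbors than an atypical seed user does. Updating the initial energy value of a seed user with the initial energy values of its K nearest neighbors therefore makes the updated energy value of a typical seed user higher than that of an atypical seed user, which increases the influence of typical seeds and weakens the influence of atypical seeds when judging whether a non-seed user is a potential user. Expanding potential users among non-seed users based on the post-training energy values reduces the noise caused by atypical seed users.
In a second aspect, an apparatus for expanding potential users is provided. The apparatus has the function of expanding potential users described in the first aspect; the function may be implemented by hardware or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. The modules may be software and/or hardware.
In a possible design, the apparatus for expanding potential users includes an obtaining unit and a processing unit, whose functions may correspond to the method steps; details are not described here.
In a third aspect, an apparatus for expanding potential users is provided, including a processor and a memory, where the memory stores a computer-readable program, and the processor runs the program in the memory to perform any method for expanding potential users described in the first aspect.
In a fourth aspect, a computer storage medium is provided for storing instructions that, when executed, can perform any method for expanding potential users described in the first aspect.
As described above, an initial energy value is set for each user in the seed user set and the sampled non-seed user set, each user in the seed user set and the sampled non-seed user set is trained to obtain a post-training energy value, and the predicted energy value of each non-seed user in the full non-seed user set is determined based on the post-training energy values. The magnitude of the predicted energy value of a non-seed user reflects how similar the non-seed user is to typical seed users. Determining potential users in the full non-seed user set based on the predicted energy values reduces the noise caused by atypical seed users and thereby improves the accuracy of application push.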
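Brief Description of the Drawings
FIG. 1 shows a system architecture to which the method for expanding potential users provided in an embodiment of the present invention is applied;
FIG. 2 is a schematic diagram of recorded user behavior feature data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of user behavior feature data abstracted as data in a 2-dimensional space according to an embodiment of the present invention;
FIG. 4 is a flowchart of one method for expanding potential users according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for expanding potential users according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for expanding potential users according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another apparatus for expanding potential users according to an embodiment of the present invention.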
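Detailed Description
The technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings.
The method for expanding potential users provided in the embodiments of the present invention is applicable to scenarios in which potential users are expanded by using a similarity algorithm. Such expansion may use the system architecture shown in FIG. 1. In FIG. 1, a database stores the users' basic metadata; each type of basic metadata may be stored as a table, with each table storing one type of basic metadata for the users. A data extract-transform-load (ETL) module extracts the users' basic metadata from the database and performs simple aggregation and transformation on it to obtain the users' behavior feature data. For example, basic metadata extracted from the database, such as users' daily Internet traffic data, basic user information, and Internet access records, is aggregated to obtain the behavior feature data shown in FIG. 2. The behavior feature data in FIG. 2 mainly includes a user identifier, the average daily frequency of using the terminal APP (times/day), and the user's average Internet traffic per minute (KB/minute). The data ETL module can also distinguish seed users from non-seed users by means of different flag bits: for example, a user subscribed to an application may be called a seed user and marked with a flag bit of value 1, and a user not subscribed to the application may be called a non-seed user and marked with a flag bit of value 0. A data mining and analysis platform abstracts the behavior feature data of the seed users and non-seed users and runs the similarity algorithm on the abstracted data to obtain the potential users to be expanded. A marketing platform pushes the application to the potential users obtained by the data mining and analysis platform, for example by advertisement or short message.
The following embodiments mainly describe how potential users are expanded by running a similarity algorithm on the behavior feature data of seed users and non-seed users.
The entity that executes the method in the following embodiments may be called an apparatus for expanding potential users. The apparatus may be the data mining and analysis platform or a component of that platform.
When performing similarity algorithm processing, the apparatus may abstract the behavior feature data of seed users and non-seed users into data in an N-dimensional space; for example, the user behavior feature data shown in FIG. 2 may be abstracted into data in the 2-dimensional space shown in FIG. 3. In FIG. 3, users that are close in space have similar behavior, and such users may be taken as potential users. For example, a K-nearest-neighbor (kNN) search scheme may be used to expand potential users. The K nearest neighbors are the K users whose behavior features are closest to those of a specified object, K being a positive integer; these K users can be regarded as the users whose behavior features are most similar to those of the specified object. For example, for user 3 in FIG. 3 with K = 3, the K nearest neighbors of user 3 are the 3 users closest to user 3 in space; in FIG. 3 they include one seed user (user 2) and two non-seed users (user 4 and user 5). As FIG. 3 shows, for both user 3 and user 15 the K nearest neighbors at K = 3 consist of one seed user and two non-seed users, so the plain K-nearest-neighbor method cannot tell which of user 3 and user 15 is the more likely potential user.
However, for user 3, the seed user among its K nearest neighbors at K = 3 is clearly an atypical seed user, so user 3 may not be a potential user with behavior features similar to those of typical seed users. The conventional K-nearest-neighbor scheme nevertheless cannot distinguish which of user 3 and user 15 has behavior features similar to those of typical seed users, so the potential users it determines have relatively low accuracy.
To reduce the noise caused by atypical seed users when potential users are expanded with a similarity algorithm, and to improve the accuracy of determining potential users, the embodiments of the present invention provide a method for expanding potential users based on seed users. In the method, energy values are used to distinguish the influence of typical seed users from that of atypical seed users on the determination of potential users. The energy value of a user represents the weight of the additional influence that the user exerts on surrounding users, the influence being mainly the influence on a user becoming a typical seed user. A higher energy value indicates greater similarity to typical seed users and a greater likelihood of being a typical seed user. When potential users are expanded by using the energy values of seed users, a sampled non-seed user set is obtained by sampling from the full set of non-seed users. Initial energy values are set for the seed users in the seed user set and for the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than that of a non-seed user. For each user in a training user set that includes at least the seed user set (a training user for short), the K nearest neighbors of the training user are determined from the seed user set and the sampled non-seed user set, and the post-training energy value of the training user is determined according to the initial energy value of each user among the K nearest neighbors. According to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set is determined to obtain the predicted energy value of the non-seed user, and potential users are then determined among all non-seed users according to the predicted energy values. Because the behavior features of atypical seed users differ markedly from those of other seed users, a typical seed user generally has more seed users among its K nearest neighbors than an atypical seed user does. Training the initial energy value of a seed user with the initial energy values of its K nearest neighbors therefore makes the post-training energy value of a typical seed user higher than that of an atypical seed user, which increases the influence of typical seeds and weakens the influence of atypical seeds when judging whether a non-seed user is a potential user. Expanding potential users among non-seed users based on the post-training energy values reduces the noise caused by atypical seed users.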
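FIG. 4 is a flowchart of one method for expanding potential users according to an embodiment of the present invention. The method shown in FIG. 4 is executed by the apparatus for expanding potential users. As shown in FIG. 4, the process by which the apparatus expands potential users based on seed users includes:
S101: Obtain a seed user set and a sampled non-seed user set.
In this embodiment, the apparatus for expanding potential users may obtain the seed user set and the sampled non-seed user set from the data ETL module, together with the behavior feature data of the seed users and non-seed users. The sampled non-seed user set may be obtained from the full non-seed user set by any existing sampling method; for example, random sampling may be used to draw 1% of the users in the full non-seed user set as the sampled non-seed user set. The sampled non-seed user set is therefore a subset of the full non-seed user set; the present invention does not limit the specific sampling method.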
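The random sampling mentioned in S101 can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_non_seed_users`, the `fraction` and `seed` parameters, and the rule of keeping at least one user are hypothetical and are not part of the described method, which does not restrict the sampling technique.

```python
import random


def sample_non_seed_users(all_non_seed_ids, fraction=0.01, seed=None):
    """Draw a simple random sample, without replacement, of non-seed user IDs.

    The 1% default mirrors the example in the text; the function name and
    parameters are assumptions made for this sketch.
    """
    ids = list(all_non_seed_ids)
    if not ids:
        return set()
    rng = random.Random(seed)
    # Assumption: keep at least one user so a non-empty input never
    # produces an empty sample.
    sample_size = max(1, round(len(ids) * fraction))
    return set(rng.sample(ids, sample_size))
```

Because the result is drawn from the input IDs only, it is always a subset of the full non-seed user set, as S101 requires.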
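S102: Set an initial energy value for the seed users in the seed user set and an initial energy value for the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than the initial energy value of a non-seed user.
In this embodiment, different initial energy values may be preset for seed users and non-seed users. The magnitude of the initial energy value distinguishes the weight of the additional influence that seed users and non-seed users exert on surrounding users. A higher initial energy value indicates greater similarity to typical seed users and a greater likelihood of being a typical seed user, so the preset initial energy value of a seed user is greater than that of a non-seed user.
S103: For each training user in the training user set, determine the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determine the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors.
If the K nearest neighbors of a seed user contain many non-seed users, the seed user is probably an atypical seed user; conversely, if they contain many seed users, the seed user is probably a typical seed user.
The K nearest neighbors may include seed users and/or non-seed users. If a seed user is a typical seed user, the seed users among its K nearest neighbors generally outnumber the non-seed users; if a seed user is an atypical seed user, the seed users among its K nearest neighbors are generally fewer than the non-seed users. Therefore, in this embodiment, the users in the seed user set, or the users in the seed user set together with the users in the sampled non-seed user set, may be used as training users. The K nearest neighbors of each training user are determined from the seed user set and the sampled non-seed user set, and the initial energy value of the training user is trained according to the initial energy value of each user among the K nearest neighbors, yielding a training result in which the post-training energy value of a typical seed is higher than that of an atypical seed.
To obtain a training result in which the energy value of a typical seed is higher than that of an atypical seed, the post-training energy value of a training user may be determined from the initial energy value of each user among its K nearest neighbors in the following manner:
The post-training energy value of the training user is determined as the sum of the mean initial energy value of the users among its K nearest neighbors and the initial energy value of the training user. For example, the following formula may be used: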
$$\mathrm{energy}(\mathrm{User}_i) = \mathrm{init}(\mathrm{User}_i) + \frac{1}{k}\sum_{l=1}^{k}\mathrm{energy}(\mathrm{User}_l)$$
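where init(User_i) is the initial energy value of training user i, User_l is user l among the K nearest neighbors of training user i, energy(User_i) is the post-training energy value of training user i, energy(User_l) is the initial energy value of user l, and k is the number of users among the K nearest neighbors of training user i.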
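The process of determining the post-training energy value of a training user is described below using the user behavior feature data shown in FIG. 2 and FIG. 3 as an example.
In FIG. 3, the seed user set includes users 2, 9, 10, 11, 12, 13, and 14. The sampled non-seed user set includes users 1, 3, 4, 5, 7, 16, and 17, among others. In addition to the sampled non-seed user set, the full non-seed user set includes, for example, users 6, 8, and 15.
In this embodiment, the training users are users 2, 9, 10, 11, 12, 13, and 14 in the seed user set, and may additionally include users 1, 3, 4, 5, 7, 16, and 17, among others, in the sampled non-seed user set.
First, the K nearest neighbors of each training user are determined from the seed user set and the sampled non-seed user set. The Euclidean distance between two points in the two-dimensional space may be used for this purpose. For example, to determine the K nearest neighbors of user 2 in FIG. 3, the Euclidean distance between user 2 and each other user is calculated; for user 1 and user 2, the Euclidean distance is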
Figure PCTCN2017104098-appb-000002
Among the calculated results, the K users with the smallest Euclidean distances are selected as the K nearest neighbors.
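A minimal sketch of this neighbor search, assuming each user is represented by a numeric feature tuple such as (APP uses per day, KB per minute). The function name `k_nearest_neighbours`, its parameters, and the coordinates in the usage lines are hypothetical; they are not the values of FIG. 2 or FIG. 3.

```python
import math


def k_nearest_neighbours(target_id, target_vec, pool, k):
    """Return the IDs of the k users in `pool` closest to the target user.

    `pool` maps user ID -> feature tuple and stands for the seed user set
    plus the sampled non-seed user set. The target itself is skipped if it
    appears in the pool. Ties in distance are broken by user ID.
    """
    ranked = sorted(
        (math.dist(target_vec, vec), uid)
        for uid, vec in pool.items()
        if uid != target_id
    )
    return [uid for _, uid in ranked[:k]]


# Hypothetical 2-D coordinates, for illustration only.
pool = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (2.0, 0.0), 4: (1.0, 1.0), 5: (10.0, 10.0)}
neighbours_of_2 = k_nearest_neighbours(2, pool[2], pool, k=3)
```

A linear scan like this costs one distance computation per pool member for every query, which is one reason the text prefers searching the sampled set rather than the full non-seed user set.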
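Second, the initial energy value of each training user is trained using the initial energy value of each user among its K nearest neighbors, to obtain the post-training energy value of the training user.
For example, when K = 3, the three nearest neighbors of seed user 2 are users 1, 3, and 4, and the post-training energy value of user 2 is 0.8 + (0.2 + 0.2 + 0.2)/3 = 1.0. The K nearest neighbors of sampled non-seed user 4 are users 2, 3, and 5, and the post-training energy value of user 4 is 0.2 + (0.8 + 0.2 + 0.2)/3 = 0.6.
The same training process may be applied to every training user, using the initial energy value of each user among its K nearest neighbors, to obtain the post-training energy values of all training users. For example, the three nearest neighbors of user 14 are users 15, 16, and 17, and the post-training energy value of user 14 is 0.8 + (0.8 + 0.2 + 0.2)/3 = 1.2. The K nearest neighbors of sampled non-seed user 5 are users 3, 4, and 7, and the post-training energy value of user 5 is 0.2 + (0.2 + 0.2 + 0.2)/3 = 0.4. Correspondingly, the post-training energy value of user 16 is 0.2 + (0.8 + 0.8 + 0.2)/3 = 0.8, and the post-training energy value of user 17 is 0.2 + (0.8 + 0.8 + 0.2)/3 = 0.8.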
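The arithmetic for users 2, 4, and 5 can be reproduced with a short sketch. The initial values 0.8 (seed) and 0.2 (non-seed) are taken from the worked numbers in the text; the function and constant names are assumptions made for illustration.

```python
import math

SEED_INIT = 0.8      # initial energy of a seed user, as used in the example
NON_SEED_INIT = 0.2  # initial energy of a non-seed user, as used in the example


def trained_energy(own_init, neighbour_inits):
    """Own initial energy plus the mean initial energy of the K nearest neighbours."""
    return own_init + sum(neighbour_inits) / len(neighbour_inits)


# User 2 (seed): neighbours 1, 3, 4 are all non-seed users.
user2 = trained_energy(SEED_INIT, [NON_SEED_INIT] * 3)
# User 4 (sampled non-seed): neighbours 2 (seed), 3 and 5 (non-seed).
user4 = trained_energy(NON_SEED_INIT, [SEED_INIT, NON_SEED_INIT, NON_SEED_INIT])
# User 5 (sampled non-seed): neighbours 3, 4, 7 are all non-seed users.
user5 = trained_energy(NON_SEED_INIT, [NON_SEED_INIT] * 3)

assert math.isclose(user2, 1.0)
assert math.isclose(user4, 0.6)
assert math.isclose(user5, 0.4)
```

Only the neighbors' initial values enter the update, so the training users can be processed in any order without affecting the result.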
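It can be understood that the K nearest neighbors of a training user may be determined based on all non-seed users in the full non-seed user set or based on the sampled non-seed user set. In view of computational complexity, determination based on the sampled non-seed user set is preferred in this embodiment.
S104: Determine, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set, to obtain the predicted energy value of the non-seed user.
In this embodiment, the training user set includes at least the seed user set, and may include both the seed user set and the sampled non-seed user set.
For each non-seed user in the full non-seed user set, the K nearest neighbors of the non-seed user may be determined from the seed user set and the sampled non-seed user set, and the predicted energy value of the non-seed user may be determined according to the energy value of each user among those K nearest neighbors.
For each user among the K nearest neighbors of a non-seed user, if the user is a seed user, its energy value is the post-training energy value of the seed user. If a non-seed user has behavior feature data similar to that of typical seed users, its K nearest neighbors are more likely to contain typical seed users than atypical seed users. Because the post-training energy values of the seed users are used, the typical seed users among the K nearest neighbors carry higher energy values than the atypical seed users. Consequently, the predicted energy value obtained from the energy values of the K nearest neighbors is larger for a non-seed user that resembles typical seed users than for a non-seed user that resembles atypical seed users.
If the K nearest neighbors of the non-seed user contain a non-seed user, the energy value of that non-seed user may be either its initial energy value or its post-training energy value.
Optionally, to further improve the accuracy of determining potential users, if the training user set includes the seed user set and the sampled non-seed user set, the predicted energy value of a non-seed user may be based on the post-training energy value of each user among its K nearest neighbors. That is, for each user among the K nearest neighbors, if the user is a seed user, its energy value is the post-training energy value of the seed user, and if the user is a non-seed user, its energy value is the post-training energy value of the non-seed user.
In this embodiment, the predicted energy value of a non-seed user may be determined from the energy value of each user among its K nearest neighbors in the following manner:
The sum of the mean energy value of the users among the K nearest neighbors and the preset initial energy value of the non-seed user is determined as the predicted energy value of the non-seed user. For example, the following formula may be used: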
$$\mathrm{energy}(\mathrm{User}_i) = \mathrm{init}(\mathrm{User}_i) + \frac{1}{k}\sum_{n=1}^{k}\mathrm{energy}(\mathrm{User}_n)$$
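where energy(User_i) is the predicted energy value of non-seed user i, init(User_i) is the initial energy value of non-seed user i, User_n is a user among the K nearest neighbors of non-seed user i, energy(User_n) is the energy value of user n among the K nearest neighbors of non-seed user i, and k is the number of users among the K nearest neighbors of non-seed user i.
If user n is a seed user, the energy value of user n is the post-training energy value of that seed user.
If user n is a non-seed user, the energy value of user n may be either the initial energy value or the post-training energy value of that non-seed user.
For example, the K nearest neighbors of user 3 in the full non-seed user set are users 2, 4, and 5. If the non-seed users among the K nearest neighbors contribute their initial energy values, the predicted energy value of user 3 is 0.2 + (1.0 + 0.2 + 0.2)/3 = 0.667; correspondingly, the predicted energy value of user 15 in the full non-seed user set is 0.2 + (1.2 + 0.2 + 0.2)/3 = 0.73.
As another example, the K nearest neighbors of user 3 in the full non-seed user set are users 2, 4, and 5. If the non-seed users among the K nearest neighbors contribute the post-training energy values of the sampled non-seed users, the predicted energy value of user 3 is 0.2 + (1.0 + 0.6 + 0.4)/3 = 0.867; correspondingly, the predicted energy value of user 15 in the full non-seed user set is 0.2 + (1.2 + 0.8 + 0.8)/3 = 1.133.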
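Both variants of the prediction for user 3 can be checked with the sketch below. The energy values are the ones stated in the examples (user 2 trained to 1.0, user 4 to 0.6, user 5 to 0.4); the function name `predicted_energy` and the flag `use_trained_non_seed` are hypothetical names introduced for illustration.

```python
NON_SEED_INIT = 0.2  # initial energy of a non-seed user, as used in the example


def predicted_energy(neighbours, seed_ids, init, trained, use_trained_non_seed):
    """Initial energy of the non-seed user plus the mean energy of its K nearest neighbours.

    Seed neighbours always contribute their post-training energy. Non-seed
    neighbours contribute their initial energy, or their post-training
    energy when `use_trained_non_seed` is True.
    """
    total = 0.0
    for uid in neighbours:
        if uid in seed_ids or use_trained_non_seed:
            total += trained[uid]
        else:
            total += init[uid]
    return NON_SEED_INIT + total / len(neighbours)


seed_ids = {2}
init = {2: 0.8, 4: 0.2, 5: 0.2}
trained = {2: 1.0, 4: 0.6, 5: 0.4}

# User 3 has K nearest neighbours 2, 4, 5.
variant_initial = predicted_energy([2, 4, 5], seed_ids, init, trained, False)
variant_trained = predicted_energy([2, 4, 5], seed_ids, init, trained, True)
```

Rounded to three decimals, the two variants give 0.667 and 0.867, matching the figures in the text. The first variant avoids training the sampled non-seed users; the second uses more information at the cost of that extra training pass.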
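S105: Determine potential users in the full non-seed user set according to the predicted energy values of the non-seed users.
After the predicted energy value of each non-seed user in the full non-seed user set has been determined, potential users may be determined in the full non-seed user set according to the magnitude of the predicted energy values. In one implementation, non-seed users whose predicted energy value is greater than a preset threshold are selected as potential users. This embodiment does not limit how the threshold is set; for example, it may be set from empirical values, or by machine learning based on the magnitude of the predicted energy value of each non-seed user in the full non-seed user set. In another implementation, the non-seed users are sorted in descending order of predicted energy value and a set number of top-ranked non-seed users are selected as potential users. This embodiment does not limit how that number is set; for example, it may be set from empirical values, or by machine learning based on the number of non-seed users in the full non-seed user set. For example, the predicted energy value of user 15 is greater than that of user 3; if one potential user is to be determined, user 15 is judged to be more similar to typical seed users and may be determined as the potential user.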
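The two selection strategies in S105 can be sketched as follows. The predicted values for users 3 and 15 come from the first worked example; the threshold of 0.7 and the function names are assumptions chosen purely for illustration, since the text leaves the threshold and the count open.

```python
def select_by_threshold(predicted, threshold):
    """Return the users whose predicted energy is strictly above the threshold."""
    return {uid for uid, energy in predicted.items() if energy > threshold}


def select_top_n(predicted, n):
    """Return the n users with the highest predicted energy, highest first."""
    return sorted(predicted, key=predicted.get, reverse=True)[:n]


# Predicted energy values of users 3 and 15 from the worked example.
predicted = {3: 0.667, 15: 0.73}

top_one = select_top_n(predicted, 1)              # user 15 ranks first
above = select_by_threshold(predicted, 0.7)       # 0.7 is a hypothetical threshold
```

A threshold yields a variable number of potential users, while the top-N rule fixes the push volume in advance; which one fits better depends on how the marketing platform budgets its pushes.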
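As described above, each training user in the training user set is trained to obtain its post-training energy value, and the predicted energy value of each non-seed user in the full non-seed user set is determined based on the post-training energy values of the training users. The magnitude of the predicted energy value of a non-seed user reflects how similar the non-seed user is to typical seed users. Determining potential users in the full non-seed user set based on the predicted energy values reduces the noise caused by atypical seed users and thereby improves the accuracy of application push.
In the embodiments of the present invention, the process of determining the predicted energy value of each non-seed user in the full non-seed user set based on the energy values of the users in the seed user set and the sampled non-seed user set is not limited to the process described above. For example, after the K nearest neighbors of a non-seed user in the full non-seed user set have been determined, the updated energy value of each user among those K nearest neighbors may be determined according to the initial energy values of the seed users in the seed user set and the initial energy values of the non-seed users in the non-seed user set, and the predicted energy value of the non-seed user may then be determined according to the updated energy value of each user among its K nearest neighbors.
FIG. 5 is a flowchart of another method for expanding potential users according to an embodiment of the present invention. In the method shown in FIG. 5, steps S201, S202, and S206 are the same as S101, S102, and S105 in FIG. 4 and are not described again here; only the differences are described below:
S203: For each non-seed user in the full non-seed user set, determine the K nearest neighbors of the non-seed user from the seed user set and the sampled non-seed user set.
The process of determining the K nearest neighbors of the non-seed user is similar to the process of determining the K nearest neighbors of a training user in the foregoing embodiment, and details are not described here again.
S204: Determine, according to the initial energy values of the seed users in the seed user set and the initial energy values of the non-seed users in the non-seed user set, the updated energy value of each user among the K nearest neighbors of the non-seed user.
The process of determining the updated energy value of each user among the K nearest neighbors of the non-seed user is similar to the process, in the foregoing embodiment, of training the initial energy value of a training user with the initial energy value of each user among its K nearest neighbors to obtain the post-training energy value of the training user. The only difference is that the foregoing embodiment yields the post-training energy value of the training user, whereas this embodiment yields the updated energy value of each user among the K nearest neighbors of the non-seed user. Details are not described here again.
S205: Determine, according to the updated energy value of each user among the K nearest neighbors of the non-seed user, the predicted energy value of the non-seed user.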
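Steps S203 to S205 can be sketched for a single non-seed user as below. This is a minimal illustration under stated assumptions: the function names, the `non_seed_init` parameter, the string user IDs, and the coordinates are all hypothetical, and the update rule (own initial energy plus mean initial energy of the neighbors) is the one given for the training step in the FIG. 4 embodiment, which the text describes as similar.

```python
import math


def knn(target_id, target_vec, pool, k):
    """IDs of the k pool users closest to the target by Euclidean distance."""
    ranked = sorted(
        (math.dist(target_vec, vec), uid)
        for uid, vec in pool.items()
        if uid != target_id
    )
    return [uid for _, uid in ranked[:k]]


def predict_on_demand(user_id, user_vec, pool, init, k, non_seed_init):
    """Predict the energy of one non-seed user, updating only its own neighbours."""
    neighbours = knn(user_id, user_vec, pool, k)                 # S203
    updated = {}
    for n in neighbours:                                         # S204
        nn = knn(n, pool[n], pool, k)
        updated[n] = init[n] + sum(init[m] for m in nn) / len(nn)
    return non_seed_init + sum(updated.values()) / len(updated)  # S205


# Hypothetical pool: three seed users (s1..s3) and one sampled non-seed user (n1).
pool = {"s1": (0.0, 0.0), "s2": (1.0, 0.0), "s3": (0.0, 1.0), "n1": (5.0, 5.0)}
init = {"s1": 0.8, "s2": 0.8, "s3": 0.8, "n1": 0.2}
prediction = predict_on_demand("x", (0.2, 0.2), pool, init, k=2, non_seed_init=0.2)
```

Compared with the FIG. 4 flow, energy values are computed only for users that actually appear as neighbors of some non-seed user, instead of for every training user up front.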
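Based on the method for expanding potential users in the foregoing embodiments, an embodiment of the present invention further provides an apparatus for expanding potential users. It can be understood that, to implement the foregoing functions, the apparatus includes corresponding hardware structures and/or software modules for executing each function. In combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the technical solutions of the embodiments of the present invention.
In the embodiments of the present invention, the apparatus for expanding potential users may be divided into functional units according to the foregoing method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present invention is schematic and is only a logical function division; other divisions are possible in actual implementation.
Where integrated units are used, FIG. 6 shows a simplified functional block diagram of an apparatus 100 for expanding potential users according to an embodiment of the present invention. As shown in FIG. 6, the apparatus 100 includes an obtaining unit 101 and a processing unit 102, where:
The obtaining unit 101 is configured to obtain a seed user set and a sampled non-seed user set, where the sampled non-seed user set is a subset of the full non-seed user set.
The processing unit 102 is configured to: set an initial energy value of the seed users in the seed user set obtained by the obtaining unit 101 and an initial energy value of the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than the initial energy value of a non-seed user; for each training user in a training user set that includes at least the seed user set, determine the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determine the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors; determine, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set to obtain the predicted energy value of the non-seed user; and determine, according to the predicted energy value, whether the non-seed user is a potential user.
The processing unit 102 may, for each non-seed user in the full non-seed user set, determine the K nearest neighbors of the non-seed user from the seed user set and the sampled non-seed user set, and determine the predicted energy value of the non-seed user according to the energy value of each user among those K nearest neighbors.
For each user among the K nearest neighbors of the non-seed user, if the user is a seed user, its energy value is the post-training energy value of the seed user; if the user is a non-seed user, its energy value is the initial energy value of the non-seed user.
If the training user set includes the seed user set and the sampled non-seed user set, the energy value of each user among the K nearest neighbors of the non-seed user may be the post-training energy value of that user.
Optionally, the processing unit 102 may update the energy value of the training user by using the sum of the mean initial energy value of the users among the K nearest neighbors and the initial energy value of the training user, and determine the updated energy value as the post-training energy value of the training user.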
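When implemented in hardware, the processing unit 102 may be a processor or a controller. The obtaining unit 101 may be a communication interface, a transceiver, a transceiver circuit, or the like, where "communication interface" is a collective term and may include one or more interfaces.
When the processing unit 102 is a processor and the obtaining unit 101 is a communication interface, the apparatus 100 for expanding potential users may have the structure shown in FIG. 7.
FIG. 7 is a schematic structural diagram of an apparatus 1000 for expanding potential users according to an embodiment of the present invention. Referring to FIG. 7, the apparatus 1000 adopts a general computer system structure, including a bus, a processor 1001, a memory 1002, and a communication interface 1003. The program code for executing the solution of the present invention is stored in the memory 1002, and its execution is controlled by the processor 1001.
The bus may include a path for transferring information between the components of the computer.
The processor 1001 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention. The one or more memories included in the computer system may be read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or disk storage. These memories are connected to the processor through the bus.
The communication interface 1003 may use any transceiver-type device to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
The memory 1002, such as a RAM, stores an operating system and the program for executing the solution of the present invention. The operating system is a program that controls the running of other programs and manages system resources.
The program stored in the memory 1002 is used to instruct the processor 1001 to perform a method for expanding potential users, including: obtaining a seed user set and a sampled non-seed user set through the communication interface 1003; setting an initial energy value of the seed users in the obtained seed user set and an initial energy value of the non-seed users in the sampled non-seed user set, where the initial energy value of a seed user is greater than the initial energy value of a non-seed user; for each training user in the training user set, determining the K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determining the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors; determining, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set to obtain the predicted energy value of the non-seed user; and determining, according to the predicted energy value, whether the non-seed user is a potential user.
For the concepts, explanations, detailed descriptions, and other steps of the apparatus 100 and the apparatus 1000 that relate to the technical solutions provided in the embodiments of the present invention, refer to the descriptions in the foregoing method or other embodiments; details are not repeated here.
It can be understood that the accompanying drawings show only a simplified design of the apparatus for expanding potential users. In practical applications, the apparatus is not limited to the foregoing structure.
An embodiment of the present invention further provides a computer storage medium for storing instructions that, when executed, can perform any method for expanding potential users described in the foregoing embodiments.
As described above, an initial energy value is set for each user in the seed user set and the sampled non-seed user set, each user in a training user set that includes at least the seed user set is trained to obtain a post-training energy value, and the predicted energy value of each non-seed user in the full non-seed user set is determined based on the post-training energy values. The magnitude of the predicted energy value of a non-seed user reflects how similar the non-seed user is to typical seed users. Determining potential users in the full non-seed user set based on the predicted energy values reduces the noise caused by atypical seed users and thereby improves the accuracy of application push.
Obviously, a person skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is intended to include them as well.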

Claims (11)

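  1. A method for expanding potential users, comprising:
    obtaining a seed user set and a sampled non-seed user set, wherein the sampled non-seed user set is a subset of a full non-seed user set;
    setting an initial energy value of seed users in the seed user set and an initial energy value of non-seed users in the sampled non-seed user set, wherein the initial energy value of the seed users is greater than the initial energy value of the non-seed users;
    for each training user in a training user set, determining K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determining a post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors, wherein the training user set comprises at least the seed user set;
    determining, according to the post-training energy value of each training user in the training user set, an energy value of each non-seed user in the full non-seed user set, to obtain a predicted energy value of the non-seed user; and
    determining potential users in the full non-seed user set according to the predicted energy values of the non-seed users.
  2. The method according to claim 1, wherein determining, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set comprises:
    for each non-seed user in the full non-seed user set, determining K nearest neighbors of the non-seed user from the seed user set and the sampled non-seed user set; and
    determining the predicted energy value of the non-seed user according to an energy value of each user among the K nearest neighbors of the non-seed user.
  3. The method according to claim 2, wherein the energy value of each user among the K nearest neighbors of the non-seed user is specifically as follows:
    for each user among the K nearest neighbors of the non-seed user, if the user is a seed user, the energy value of the seed user is the post-training energy value of the seed user; if the user is a non-seed user, the energy value of the non-seed user is the initial energy value of the non-seed user.
  4. The method according to claim 2, wherein the training user set further comprises the sampled non-seed user set;
    the energy value of each user among the K nearest neighbors of the non-seed user is specifically:
    the post-training energy value of each user among the K nearest neighbors of the non-seed user.
  5. The method according to any one of claims 1 to 4, wherein determining the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors is specifically:
    determining the post-training energy value of the training user by using the sum of the mean initial energy value of the users among the K nearest neighbors and the initial energy value of the training user.
  6. An apparatus for expanding potential users, comprising:
    an obtaining unit, configured to obtain a seed user set and a sampled non-seed user set, wherein the sampled non-seed user set is a subset of a full non-seed user set; and
    a processing unit, configured to: set an initial energy value of seed users in the seed user set obtained by the obtaining unit and an initial energy value of non-seed users in the sampled non-seed user set, wherein the initial energy value of the seed users is greater than the initial energy value of the non-seed users; for each training user in a training user set, determine K nearest neighbors of the training user from the seed user set and the sampled non-seed user set, and determine a post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors, wherein the training user set comprises the seed user set; determine, according to the post-training energy value of each training user in the training user set, an energy value of each non-seed user in the full non-seed user set, to obtain a predicted energy value of the non-seed user; and determine potential users in the full non-seed user set according to the predicted energy values of the non-seed users.
  7. The apparatus according to claim 6, wherein the processing unit determines, according to the post-training energy value of each training user in the training user set, the energy value of each non-seed user in the full non-seed user set in the following manner:
    for each non-seed user in the full non-seed user set, determining K nearest neighbors of the non-seed user from the seed user set and the sampled non-seed user set; and
    determining the predicted energy value of the non-seed user according to an energy value of each user among the K nearest neighbors of the non-seed user.
  8. The apparatus according to claim 7, wherein the energy value of each user among the K nearest neighbors of the non-seed user is specifically as follows:
    for each user among the K nearest neighbors of the non-seed user, if the user is a seed user, the energy value of the seed user is the post-training energy value of the seed user; if the user is a non-seed user, the energy value of the non-seed user is the initial energy value of the non-seed user.
  9. The apparatus according to claim 7, wherein the training user set further comprises the sampled non-seed user set, and the energy value of each user among the K nearest neighbors of the non-seed user is specifically: the post-training energy value of each user among the K nearest neighbors of the non-seed user.
  10. The apparatus according to any one of claims 6 to 9, wherein the processing unit determines the post-training energy value of the training user according to the initial energy value of each user among the K nearest neighbors in the following manner:
    updating the energy value of the training user by using the sum of the mean initial energy value of the users among the K nearest neighbors and the initial energy value of the training user.
  11. An apparatus for expanding potential users, comprising a processor and a memory, wherein
    the memory stores a computer-readable program; and
    the processor runs the program in the memory to perform the method according to any one of claims 1 to 5.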
PCT/CN2017/104098 2016-11-29 2017-09-28 一种扩展潜在用户的方法及装置 WO2018099177A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611075513.4A CN108122123B (zh) 2016-11-29 2016-11-29 一种扩展潜在用户的方法及装置
CN201611075513.4 2016-11-29

Publications (1)

Publication Number Publication Date
WO2018099177A1 true WO2018099177A1 (zh) 2018-06-07

Family

ID=62225941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104098 WO2018099177A1 (zh) 2016-11-29 2017-09-28 一种扩展潜在用户的方法及装置

Country Status (2)

Country Link
CN (1) CN108122123B (zh)
WO (1) WO2018099177A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536848A (zh) * 2020-04-17 2021-10-22 中国移动通信集团广东有限公司 一种数据处理方法、装置及电子设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111973996B (zh) * 2020-08-20 2024-03-12 腾讯科技(上海)有限公司 一种游戏资源投放方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120054040A1 (en) * 2010-08-30 2012-03-01 Abraham Bagherjeiran Adaptive Targeting for Finding Look-Alike Users
CN105260414A (zh) * 2015-09-24 2016-01-20 精硕世纪科技(北京)有限公司 用户行为相似性计算方法及装置
CN105404947A (zh) * 2014-09-02 2016-03-16 阿里巴巴集团控股有限公司 用户质量侦测方法及装置
CN105447730A (zh) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 目标用户定向方法及装置
CN105550903A (zh) * 2015-12-25 2016-05-04 腾讯科技(深圳)有限公司 目标用户确定方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101685458B (zh) * 2008-09-27 2012-09-19 华为技术有限公司 一种基于协同过滤的推荐方法和系统
CN103377242B (zh) * 2012-04-25 2016-06-22 Tcl集团股份有限公司 用户行为分析方法、分析预测方法及电视节目推送系统
CN105447038A (zh) * 2014-08-29 2016-03-30 国际商业机器公司 用于获取用户特征的方法和系统
CN104751354B (zh) * 2015-04-13 2018-06-26 合一信息技术(北京)有限公司 一种广告人群筛选方法
CN106022800A (zh) * 2016-05-16 2016-10-12 北京百分点信息科技有限公司 一种用户特征数据的处理方法和装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120054040A1 (en) * 2010-08-30 2012-03-01 Abraham Bagherjeiran Adaptive Targeting for Finding Look-Alike Users
CN105404947A (zh) * 2014-09-02 2016-03-16 阿里巴巴集团控股有限公司 用户质量侦测方法及装置
CN105260414A (zh) * 2015-09-24 2016-01-20 精硕世纪科技(北京)有限公司 用户行为相似性计算方法及装置
CN105447730A (zh) * 2015-12-25 2016-03-30 腾讯科技(深圳)有限公司 目标用户定向方法及装置
CN105550903A (zh) * 2015-12-25 2016-05-04 腾讯科技(深圳)有限公司 目标用户确定方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536848A (zh) * 2020-04-17 2021-10-22 中国移动通信集团广东有限公司 一种数据处理方法、装置及电子设备
CN113536848B (zh) * 2020-04-17 2024-03-19 中国移动通信集团广东有限公司 一种数据处理方法、装置及电子设备

Also Published As

Publication number Publication date
CN108122123A (zh) 2018-06-05
CN108122123B (zh) 2021-08-20


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17876399

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17876399

Country of ref document: EP

Kind code of ref document: A1