CN109034853B - Method, device, medium and electronic equipment for searching similar users based on seed users - Google Patents

Info

Publication number
CN109034853B
CN109034853B CN201710431844.5A CN201710431844A CN109034853B CN 109034853 B CN109034853 B CN 109034853B CN 201710431844 A CN201710431844 A CN 201710431844A CN 109034853 B CN109034853 B CN 109034853B
Authority
CN
China
Prior art keywords
user
users
sku
similar
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710431844.5A
Other languages
Chinese (zh)
Other versions
CN109034853A (en
Inventor
赫南
陈英杰
黄坤
孙振鹏
陈敏
郭谦
温园旭
黄超
胡景贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710431844.5A
Publication of CN109034853A
Application granted
Publication of CN109034853B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0277 Online advertisement

Abstract

The disclosure relates to a method, a device, a storage medium and an electronic device for searching similar users based on seed users. The method comprises the following steps: acquiring a SKU corresponding to a user behavior, and performing discrete vectorization processing on the SKU to obtain SKU characteristic data; acquiring first characteristic data of a user, and performing preset model training according to the first characteristic data and the SKU characteristic data to obtain a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU; and acquiring preset user information, and predicting and determining a similar user associated with the preset user according to the preset user information and the similar user prediction model. The method and the device can solve the problems of low efficiency and long time consumption when similar users are expanded, and improve the generalization capability and the expansion effect of the model.

Description

Method, device, medium and electronic equipment for searching similar users based on seed users
Technical Field
The present disclosure relates to the field of mobile internet technologies, and in particular, to a method for searching for similar users based on a seed user, a device for searching for similar users based on a seed user, and a computer-readable storage medium and an electronic device for implementing the method for searching for similar users based on a seed user.
Background
In the current mobile Internet era, to improve brand influence and promote product sales, some e-commerce platforms use precise targeting technology to mine, from users' various behaviors on the Internet, users who have a direct or potential relationship with the products promoted by advertisers. Techniques for mining users in this way are collectively referred to as audience targeting.
In the related art, some advertising platforms provide a "new customer recommendation" audience targeting technology, which uses an advertiser's accurate customer data together with other user behavior data owned by the advertising platform to expand a similar population by "finding people through people". This is known as Lookalike technology.
Although many implementations of Lookalike technology exist, they still have problems. For example, the expansion algorithm is time-consuming and inefficient, and the number of expanded users is difficult to estimate. In addition, large algorithm errors cause some similar users to be discarded prematurely, which reduces the accuracy of the expansion, so that users similar to the seed users do not receive advertisement exposure. Therefore, a new technical solution is needed to improve on one or more of the above problems.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide a method for finding similar users based on seed users, an apparatus for finding similar users based on seed users, and a computer-readable storage medium and an electronic device implementing the method for finding similar users based on seed users, thereby overcoming, at least to some extent, one or more of the problems due to the limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the embodiments of the present disclosure, a method for finding similar users based on seed users is provided, the method including:
acquiring a SKU corresponding to a user behavior, and performing discrete vectorization processing on the SKU to obtain SKU characteristic data;
acquiring first characteristic data of a user, and performing preset model training according to the first characteristic data and the SKU characteristic data to obtain a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU;
and acquiring preset user information, and predicting and determining a similar user associated with the preset user according to the preset user information and the similar user prediction model.
In an exemplary embodiment of the present disclosure, the acquiring a SKU corresponding to a user behavior, and performing discrete vectorization processing on the SKU to obtain SKU feature data includes:
acquiring a plurality of SKUs corresponding to user behaviors, and performing discrete vectorization processing on each SKU in the plurality of SKUs to obtain a vector corresponding to each SKU;
and averaging the vectors corresponding to each SKU, and taking the average value as the characteristic data of the SKU.
In an example embodiment of the present disclosure, the SKU corresponding to the user action includes at least one of a purchase SKU and a browse SKU.
In an exemplary embodiment of the disclosure, the first characteristic data comprises user attribute information data, the user attribute information data being determined from registration information data and/or user behavior data of the user.
In an exemplary embodiment of the present disclosure, the similar user prediction model comprises a logistic regression LR model;
the obtaining of a similar user prediction model through preset model training according to the first feature data and the SKU feature data comprises:
and inputting the first characteristic data and the SKU characteristic data into a preset LR model training tool, and performing model training by using an LBFGS algorithm to obtain the logistic regression LR model.
In an exemplary embodiment of the present disclosure, the method further comprises:
before the model training is carried out, the seed users are used as positive samples, random sampling is carried out, and partial users in all the users are obtained as negative samples;
and putting the positive sample, the negative sample, the first feature data and the SKU feature data into a training set, and carrying out subsequent model training based on the training set.
In an exemplary embodiment of the present disclosure, the obtaining of a preset user information and the determining, according to the preset user information and the similar user prediction model, of a similar user associated with the preset user includes:
determining user behavior time according to the preset user information so as to take users in a preset time period as active users to be added into a candidate set;
predicting the candidate set by using the logistic regression LR model to obtain a probability value of each user similar to the seed user;
sorting the probability values of the users similar to the seed user, and sequentially selecting the users corresponding to the N probability values which are sorted at the top as the extended similar users; wherein N is a natural number.
In an exemplary embodiment of the present disclosure, the method further comprises:
and updating the active users in the candidate set at preset time intervals.
In an exemplary embodiment of the present disclosure, the method further comprises:
and adjusting the value of the N according to the number of the preset expanded users so as to adjust the expansion amount of the similar users.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for finding similar users based on seed users, the apparatus including:
the discrete vectorization module is used for acquiring a SKU corresponding to the user behavior and performing discrete vectorization processing on the SKU to obtain SKU characteristic data;
the model training module is used for acquiring first characteristic data of a user and carrying out preset model training according to the first characteristic data and the SKU characteristic data to obtain a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU;
and the user extension module is used for acquiring preset user information and determining the similar user associated with the preset user according to the preset user information and the similar user prediction model.
According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method for finding similar users based on seed users in any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the steps of the method for finding similar users based on seed users in any of the above embodiments via execution of the executable instructions.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In an embodiment of the disclosure, with the method and the device for finding similar users based on seed users, discrete vectorization processing is performed on the SKUs corresponding to a large number of discrete user behaviors to obtain SKU feature data. In addition, preset discrete feature data different from the SKU, namely the first feature data, is obtained. The resulting combined user features, consisting of the discrete features and the discretely vectorized SKU features, are used as input for model training, and similar users are expanded based on the trained model. Because discrete vectorization compresses the feature space, the model training time is greatly shortened and the generalization capability of the model is improved. This addresses the low efficiency and long running time of expanding similar users with Lookalike technology and improves the expansion effect of the model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 schematically illustrates a flow chart of a method for finding similar users based on seed users in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method for finding similar users based on seed users in an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a discrete feature vectorization training network structure in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a method for finding similar users based on seed users in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a method for finding similar users based on seed users in an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic diagram illustrating an application scenario for finding similar users in an exemplary embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a device for finding similar users based on seed users in an exemplary embodiment of the disclosure;
FIG. 8 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure;
FIG. 9 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The exemplary embodiment first provides a method for finding similar users based on seed users. Referring to FIG. 1, the method may include:
Step S101: acquiring a SKU (Stock Keeping Unit) corresponding to a user behavior, and performing discrete vectorization processing on the SKU to obtain SKU feature data.
Step S102: acquiring first feature data of a user, and performing preset model training according to the first feature data and the SKU feature data to obtain a similar user prediction model, wherein the first feature data is preset discrete feature data different from the SKU.
Step S103: acquiring preset user information, and predicting and determining a similar user associated with the preset user according to the preset user information and the similar user prediction model.
With this method, combined user features consisting of the discrete features and the discretely vectorized SKU features are obtained and used as input for model training, and similar users are expanded based on the trained model. Because discrete vectorization compresses the feature space, the model training time is greatly shortened and the generalization capability of the model is improved. This addresses the low efficiency and long running time of expanding similar users with Lookalike technology and improves the expansion effect of the model.
Hereinafter, the respective steps of the above-described method in the present exemplary embodiment will be described in more detail with reference to FIGS. 1 to 6.
In step S101, a SKU corresponding to the user behavior is obtained, and discrete vectorization processing is performed on the SKU to obtain SKU feature data.
In this exemplary embodiment, the SKU corresponding to the user behavior may include, but is not limited to, at least one of a SKU purchased and a SKU browsed by the user on an e-commerce website. An e-commerce SKU may include e-commerce item name data and item-related data associated with each item name. For example, a SKU may include a product name such as "shampoo" and associated product-related data such as "euryale", "supple", and "moisturize". Such feature data may be used to determine similarities between users, for example that they all like the same brand of shampoo. These features are discrete. An e-commerce platform usually has hundreds of millions of SKUs, so if the discrete features are used directly, the user feature dimension is huge and the features are extremely sparse. Moreover, because there is no semantic relationship between discrete SKUs, no meaningful operation can be performed between different SKUs.
In this exemplary embodiment, by analogy with word2vec, the inventors propose SKU2vec, that is, a SKU vectorization concept, to vectorize SKUs. In particular, a SKU may be vectorized using a discrete vectorization (embedding) technique. Illustratively, each SKU may be regarded as a word, and each sequence of SKUs browsed by a user within a certain time interval may be regarded as a doc. With word2vec as the training tool, the training network structure is as shown in FIG. 3. Through training, each SKU is mapped to a vector. The distance between the vectors of different SKUs can then be used to measure their relevance, and computations across the different SKUs that a user has, for example, browsed also become meaningful.
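The "SKU as word, browsing sequence as doc" analogy can be made concrete with a small sketch. It only shows how training pairs could be extracted from browsing sessions. The skip-gram style context window, the function name `skipgram_pairs`, and the sample SKU identifiers are assumptions for illustration, since the source does not state which word2vec variant or window size is used.

```python
from typing import List, Tuple


def skipgram_pairs(sessions: List[List[str]], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center SKU, context SKU) pairs from browsing sessions.

    Each session plays the role of a word2vec "doc" and each SKU the role
    of a "word". The window size is an illustrative assumption.
    """
    pairs = []
    for session in sessions:
        for i, center in enumerate(session):
            start = max(0, i - window)
            stop = min(len(session), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    pairs.append((center, session[j]))
    return pairs


# Hypothetical browsing sessions, one list of SKU ids per user and time interval.
sessions = [["sku_1", "sku_2", "sku_3"], ["sku_2", "sku_4"]]
pairs = skipgram_pairs(sessions, window=1)
```

SKUs that often appear near each other in sessions produce many shared pairs, which is what lets a word2vec-style trainer place them close together in the vector space.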
Referring to FIG. 2, in an exemplary embodiment of the present disclosure, the acquiring a SKU corresponding to a user behavior, and performing discrete vectorization processing on the SKU to obtain SKU feature data may include the following steps:
Step S201: acquiring a plurality of SKUs corresponding to user behaviors, and performing discrete vectorization processing on each SKU in the SKUs to obtain a vector corresponding to each SKU.
For example, user behavior features on SKUs may be generated by producing user sequence data, such as browsed-SKU sequence data, from user log data, and then embedding the SKUs to obtain a vector for each SKU, that is, establishing a SKU-to-vector mapping table.
Step S202: averaging the vectors corresponding to each SKU, and taking the average value as the SKU feature data.
Illustratively, the average of the vectors of all SKUs corresponding to a user behavior such as browsing is used as the SKU feature data of the user. This SKU-related feature is a sparse feature of the user, as shown in Table 1.
TABLE 1 Sparse SKU embedding feature
[Table 1 appears as an image in the original publication (Figure BDA0001317522310000071).]
In this exemplary embodiment, by vectorizing the SKUs, vectors corresponding to all SKUs corresponding to browsing or purchasing behaviors of the user may be averaged to serve as feature data of the user in the SKU dimension, so as to achieve the purpose of compressing the feature dimension.
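The averaging in steps S201 and S202 can be sketched as follows. This is a minimal illustration that assumes a SKU-to-vector mapping table already exists. The name `user_sku_feature`, the two-dimensional toy vectors, and the choice to return a zero vector when no known SKU is present are assumptions, not details from the source.

```python
from typing import Dict, List


def user_sku_feature(skus: List[str],
                     sku_to_vec: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Average the embedding vectors of the SKUs a user browsed or purchased."""
    vectors = [sku_to_vec[s] for s in skus if s in sku_to_vec]
    if not vectors:
        # Assumption: users with no known SKU get an all-zero feature.
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Hypothetical SKU-to-vector mapping table with toy 2-dimensional vectors.
sku_to_vec = {"sku_1": [1.0, 0.0], "sku_2": [0.0, 1.0], "sku_3": [1.0, 1.0]}
feature = user_sku_feature(["sku_1", "sku_2"], sku_to_vec, dim=2)
```

Whatever the number of SKUs a user touched, the resulting feature has the fixed embedding dimension, which is how the feature space is compressed.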
In step S102, first feature data of a user is obtained, and a prediction model of a similar user is obtained by performing preset model training according to the first feature data and the SKU feature data. Wherein the first characteristic data is preset discrete characteristic data different from the SKU.
In an exemplary embodiment of the present disclosure, the first feature data may include user attribute information data, which may be determined from registration information data and/or user behavior data of the user.
For example, the user attribute information data may be user portrait features, that is, tags that describe the user along various dimensions, obtained by applying data mining techniques to the user's registration information and various behavior data. These portrait features supplement the user's general attributes and round out the features subsequently used in the model training data. Table 2 below shows some of the user portrait features by way of example.
TABLE 2
[Table 2 appears as an image in the original publication (Figure BDA0001317522310000081).]
The first feature data may further include coarse-grained discrete features of the user. For example, user features may be grouped by type of user behavior; this embodiment mainly uses the user's browsing behavior features and purchasing behavior features. The user's behavior may be aggregated to the SKU's third-level, second-level, and first-level categories, the store to which the SKU belongs, the brand, and so on. Such coarse-grained features are better suited to describing the user's long-term preferences.
Through the above feature generation process, combined user feature data consisting of discrete feature data (namely, the first feature data) and embedding feature data (namely, the SKU feature data) is obtained. Using the discrete features and the embedding features together as input for model training helps address the low efficiency and long running time of similar population expansion with Lookalike technology, while also giving the model interpretability, generalization capability, and a good expansion effect.
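One way to picture the combined user feature is to concatenate one-hot encoded discrete attributes with the averaged SKU embedding. The sketch below assumes this simple concatenation. The field names `gender` and `age_band`, their vocabularies, and the helper names are hypothetical, since the actual portrait features are only given in a table image.

```python
from typing import Dict, List, Optional


def one_hot(value: Optional[str], vocabulary: List[str]) -> List[float]:
    """Encode a discrete value as a 0-1 vector over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def build_user_features(profile: Dict[str, str],
                        sku_embedding: List[float],
                        vocabularies: Dict[str, List[str]]) -> List[float]:
    """Concatenate discrete (first) features with the SKU embedding feature."""
    features: List[float] = []
    for field in sorted(vocabularies):  # fixed field order keeps columns aligned
        features.extend(one_hot(profile.get(field), vocabularies[field]))
    features.extend(sku_embedding)
    return features


# Hypothetical portrait fields and values.
vocabularies = {"gender": ["female", "male"], "age_band": ["18-25", "26-35", "36+"]}
profile = {"gender": "female", "age_band": "26-35"}
x = build_user_features(profile, [0.5, 0.5], vocabularies)
```

The discrete block stays interpretable (each column is one attribute value), while the embedding block carries the compressed SKU behavior.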
The following describes a model fitting method that uses the feature data acquired above to represent a user. In an exemplary embodiment of the present disclosure, the similar user prediction model may include, but is not limited to, a logistic regression (LR) model. This embodiment selects an LR model because it belongs to the family of linear models, has good interpretability, and is easy to implement. It should be noted that user expansion essentially means finding correlations between users according to their attributes and features. This embodiment uses the LR model to find and expand similar users by way of example, but other methods such as rule systems, clustering, and deep models may also be used, which is not limited herein.
In step S102, performing preset model training according to the first feature data and the SKU feature data to obtain a similar user prediction model may include: inputting the first feature data and the SKU feature data into a preset LR model training tool, and performing model training using the LBFGS algorithm to obtain the logistic regression LR model.
Illustratively, to balance training time against model quality, the maximum number of iterations and the convergence condition can be set according to the specific situation. In this embodiment, by way of example, the Spark platform is used, with LR in MLlib as the training tool and the LBFGS algorithm as the training method. Because the embedding technique described above compresses the feature space, the LR model training time in this embodiment is considerably shortened and the generalization capability is improved.
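To show where the maximum iteration number and the convergence condition enter, here is a minimal, dependency-free sketch of logistic regression training. It deliberately substitutes plain batch gradient descent for the LBFGS optimizer and Spark MLlib tooling named in the text, so it illustrates the model being fitted, not the embodiment's training tool. The function names, learning rate, and toy data are assumptions.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Probability that a user with feature vector x resembles the seed users."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


def train_lr(X: List[List[float]], y: List[int],
             max_iter: int = 1000, step: float = 0.5,
             tol: float = 1e-6) -> Tuple[List[float], float]:
    """Fit LR by batch gradient descent (a stand-in for LBFGS)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(max_iter):              # maximum iteration number
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for k in range(d):
                grad_w[k] += err * xi[k] / n
            grad_b += err / n
        w = [wk - step * gk for wk, gk in zip(w, grad_w)]
        b -= step * grad_b
        if max(abs(g) for g in grad_w + [grad_b]) < tol:  # convergence condition
            break
    return w, b


# Toy data: label 1 marks seed users (positives), label 0 sampled negatives.
X = [[1.0, 0.9], [1.0, 0.8], [0.0, 0.1], [0.0, 0.2]]
y = [1, 1, 0, 0]
w, b = train_lr(X, y)
probs = [predict_proba(w, b, x) for x in X]
```

A quasi-Newton method such as LBFGS typically needs far fewer iterations than this loop on large problems, which is one reason it is a common choice for LR training at scale.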
It should be noted that the predictive ability of a model is usually made up of memorization ability and generalization ability. In general, linear models memorize better, while deep learning models generalize better. In this embodiment, each SKU is mapped to a vector, so that the SKU features change from one-hot features, that is, 0-1 features, to continuous features. This introduces a non-linear factor into the linear model, which improves the generalization capability of the model and can mitigate overfitting to a certain extent.
On the basis of the above embodiment, in an exemplary embodiment of the present disclosure, the method may further include the following steps a to B:
Step A: before the model training is carried out, using the seed users as positive samples, and randomly sampling a subset of all users as negative samples.
For example, in this embodiment, positive samples may be generated by directly using the seed users specified by an advertiser, or the users who clicked after the advertisement was actually exposed, as positive samples.
The negative samples in this embodiment may be shared negative samples. Model training actually lacks explicit negative examples: all non-positive examples are simply unlabeled. In general, random sampling may be used, or alternatives such as biased sampling according to user activity may be chosen, and the negative samples may be shared among different models.
Step B: putting the positive samples, the negative samples, the first feature data and the SKU feature data into a training set, and carrying out subsequent model training based on the training set.
For example, as shown in FIG. 5, a training set is built from the obtained user features, positive samples, and negative samples, and a model can then be trained using LR in MLlib as the training tool and the LBFGS algorithm as the training method. In this embodiment, random sampling and negative sample sharing shorten the time needed to generate training data.
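Steps A and B can be sketched as a small routine that labels seed users as positives and draws random negatives from the remaining users. Excluding seed users from the negative pool, the function name `build_training_set`, and the toy feature table are assumptions made for illustration.

```python
import random
from typing import Dict, Iterable, List, Tuple


def build_training_set(seed_users: Iterable[str],
                       all_users: Iterable[str],
                       features: Dict[str, List[float]],
                       n_negative: int,
                       rng: random.Random) -> List[Tuple[List[float], int]]:
    """Return (feature vector, label) rows: seeds are 1, sampled users are 0."""
    seeds = set(seed_users)
    # Assumption: seed users are removed from the pool before sampling.
    pool = sorted(u for u in all_users if u not in seeds)
    negatives = rng.sample(pool, min(n_negative, len(pool)))
    rows = [(features[u], 1) for u in sorted(seeds) if u in features]
    rows += [(features[u], 0) for u in negatives if u in features]
    return rows


# Hypothetical users u0..u9, each with a one-dimensional toy feature.
features = {f"u{i}": [float(i)] for i in range(10)}
training = build_training_set(["u1", "u2"], features.keys(), features,
                              n_negative=3, rng=random.Random(0))
```

Because the negative draw does not depend on any particular advertiser's seeds beyond the exclusion step, the same sampled set can in principle be reused across models, which is the sharing idea described above.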
In step S103, preset user information is obtained, and a similar user associated with the preset user is predicted and determined according to the preset user information and the similar user prediction model.
Referring to FIG. 4, to reduce the time spent expanding similar users, step S103 may, for example, include the following steps:
Step S401: determining user behavior time according to the preset user information, so as to add users active within a preset time period to a candidate set as active users.
For example, the candidate set may be selected on a principle similar to caching: if a user has no recent behavior records on the e-commerce platform, the probability that the user will act in the near future is very low. Even if such inactive users are similar to the seed users, the probability that they receive an advertisement exposure is very low. Therefore, in this embodiment, users who were active within a preset time period, such as the most recent month, are selected as the candidate set. Predicting only for active users takes much less time than predicting for all users, and the users who are not scored are essentially those who are unlikely to be online in the near future, so the advertising effect and the exposure are not greatly affected.
In an exemplary embodiment of the present disclosure, to further improve the accuracy of expanding similar users, the active users in the candidate set are updated at preset time intervals. For example, the candidate set may be updated continuously, such as on a daily basis, so that newly active users are added to the candidate set in a timely manner. The embodiment supports automatic updating of the model, including feeding users who click after actual exposure back as positive examples to supplement the positive example set, updating the feature data, and periodically updating the model, which further improves the accuracy of expanding similar users.
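The candidate set selection in step S401 amounts to a time-window filter over each user's most recent activity. The sketch below assumes a mapping from user id to last activity timestamp. The 30-day window, the function name `active_candidates`, and the sample dates are illustrative assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def active_candidates(last_active: Dict[str, datetime],
                      now: datetime,
                      window_days: int = 30) -> Set[str]:
    """Keep only users whose last recorded behavior falls inside the window."""
    cutoff = now - timedelta(days=window_days)
    return {user for user, ts in last_active.items() if ts >= cutoff}


# Hypothetical last-activity timestamps.
now = datetime(2017, 6, 1)
last_active = {
    "u1": datetime(2017, 5, 20),
    "u2": datetime(2017, 3, 1),
    "u3": datetime(2017, 5, 31),
}
candidates = active_candidates(last_active, now)
```

Re-running the filter on a schedule, for example daily with a fresh `now`, would give the periodic candidate set update described above.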
Step S402: predicting the candidate set by using the logistic regression LR model to obtain the probability value of each user being similar to the seed user.
Illustratively, for a given user u_i with corresponding feature f_i, the trained LR model may estimate the probability that the user is similar to the seed user according to the following equation:
[Equation appears as an image in the original publication (Figure BDA0001317522310000111).]
Similar users may then be selected based on the probability values and the amount of expansion desired by the advertiser.
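The equation itself is only available as an image in the source. As a hedged reference, the standard logistic regression form that a trained LR model evaluates can be written as follows, where the weight vector $w$ and bias $b$ are notation assumed here rather than taken from the original figure:

```latex
P(\text{similar} \mid u_i) = \sigma\!\left(w^{\top} f_i + b\right)
  = \frac{1}{1 + e^{-\left(w^{\top} f_i + b\right)}}
```

A larger value means that user $u_i$ more closely resembles the seed users under the fitted model.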
Step S403: sorting the probability values of the users similar to the seed user, and sequentially selecting the users corresponding to the N probability values which are sorted at the top as the extended similar users; wherein N is a natural number.
Illustratively, according to the expansion amount selected by the advertiser, the prediction results may be locally ranked using, for example, the local ranking algorithm GenSort on the Spark platform, and the Top-N users among them may be selected as the expanded users.
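On a single machine, the Top-N selection in step S403 can be sketched with a heap-based partial sort. This stands in for the distributed ranking mentioned in the text and is not that implementation. The function name `top_n_users` and the sample scores are assumptions.

```python
import heapq
from typing import Dict, List


def top_n_users(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n users with the highest predicted similarity probability."""
    best = heapq.nlargest(n, scores.items(), key=lambda item: item[1])
    return [user for user, _ in best]


# Hypothetical predicted probabilities for the candidate set.
scores = {"u1": 0.91, "u2": 0.15, "u3": 0.78, "u4": 0.42}
expanded = top_n_users(scores, 2)
```

Changing `n` is all that is needed to change the expansion amount, since the scores themselves do not have to be recomputed.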
Referring to fig. 6, on the basis of the above-described embodiment, in an exemplary embodiment of the present disclosure, the method may further include the steps of:
step S404: and adjusting the value of the N according to the number of the preset expanded users so as to adjust the expansion amount of the similar users.
Illustratively, the number of users of the preset extension can be set according to the advertisement budget, the promotion strength and the like, so that proper advertisement exposure can be ensured. The similar user expansion amount can also be adjusted according to the advertisement exposure effect, such as ROI, CTR and the like, so as to achieve the purpose of balancing the advertisement effect and the advertisement exposure amount. Therefore, the problem that the quantity of the expanded users is difficult to estimate in the related technology is solved to a certain extent, and meanwhile, the users similar to the seed users are effectively and accurately exposed to the advertisements, so that the accuracy of expansion is improved.
The method can be applied to shopping scenes of an E-commerce platform, meets the requirement that an advertiser searches for similar users according to the existing seed user orientation, can be integrated into an existing advertising system, and forms a set of feasible technical scheme which can be used for industrial production.
In this embodiment, a logistic regression method is used, by way of example, to expand users, which meets the advertiser's need to target similar users based on existing seed users. First, the advertiser can adjust the expansion amount according to the advertising effect, so as to balance the advertising effect against the exposure. Second, model training uses both the discrete features and the embedding features, which can improve the efficiency of similar-population expansion, reduce time consumption, and improve the generalization capability of the model. Finally, a candidate set can be built from the active users, and prediction is performed only on the candidate set, which ensures the exposure effect and shortens the prediction time.
In the embodiment of the present invention, an advertiser may upload core users, or may select users by a combination of advertisement platform tags, to serve as seed users from which similar users are expanded; this avoids the drawbacks of expanding similar users by user tags as in the related art. In addition, in this embodiment the word2vec idea is used to perform dimensionality reduction on the high-dimensional discrete features, which reduces the time consumed by model training and improves the generalization capability and the expansion effect of the model.
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc. Additionally, it will also be readily appreciated that the steps may be performed synchronously or asynchronously, e.g., among multiple modules/processes/threads.
Further, in this example embodiment, an apparatus for finding similar users based on seed users is also provided. Referring to fig. 7, the apparatus 100 may include a discrete vectorization module 101, a model training module 102, and a user expansion module 103. Wherein:
the discrete vectorization module 101 is configured to obtain a SKU corresponding to a user behavior, and perform discrete vectorization processing on the SKU to obtain SKU feature data.
The model training module 102 is configured to obtain first feature data of a user, and perform preset model training according to the first feature data and the SKU feature data to obtain a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU.
The user extension module 103 is configured to obtain preset user information, and to determine, by prediction according to the preset user information and the similar user prediction model, a similar user associated with the preset user.
In an exemplary embodiment of the present disclosure, the discrete vectorization module 101 may be configured to obtain a plurality of SKUs corresponding to user behaviors, and perform discrete vectorization processing on each SKU in the plurality of SKUs to obtain a vector corresponding to each SKU; and then averaging the vectors corresponding to each SKU, and taking the average value as the characteristic data of the SKU.
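The averaging performed by the discrete vectorization module can be illustrated with a short sketch. It assumes the per-SKU vectors are already available in a lookup table; `sku_vectors`, the SKU identifiers, and the handling of unknown SKUs are hypothetical.

```python
def average_sku_vectors(sku_ids, sku_vectors):
    """Average the vectors of the SKUs a user interacted with.

    sku_ids: SKU identifiers taken from the user's behavior.
    sku_vectors: dict mapping SKU ID -> list of floats of equal length
                 (hypothetical output of the vectorization step).
    Returns None when none of the SKUs has a known vector.
    """
    vectors = [sku_vectors[s] for s in sku_ids if s in sku_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    # Element-wise mean across all vectors found for this user.
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

sku_vectors = {"sku_a": [1.0, 0.0], "sku_b": [0.0, 1.0], "sku_c": [1.0, 1.0]}
# "sku_x" has no vector and is skipped in this sketch.
user_feature = average_sku_vectors(["sku_a", "sku_b", "sku_x"], sku_vectors)
```

Averaging yields a fixed-length feature regardless of how many SKUs a user has purchased or browsed.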
In an exemplary embodiment of the present disclosure, the SKU corresponding to the user action may include at least one of a purchase SKU and a browse SKU.
In an exemplary embodiment of the present disclosure, the first feature data may include user attribute information data, which may be determined from registration information data and/or user behavior data of the user.
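One way the first feature data and the SKU feature data might be joined into a single input row is sketched below. One-hot encoding and simple concatenation are assumptions made for illustration, and the attribute names `gender` and `age_band` and their vocabularies are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode a discrete value as a one-hot list over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def combine_features(attributes, vocabularies, sku_feature):
    """Concatenate one-hot encoded attributes with the averaged SKU vector.

    attributes: dict of attribute name -> discrete value for one user.
    vocabularies: dict of attribute name -> ordered list of allowed values.
    sku_feature: list of floats (the averaged SKU vector).
    """
    combined = []
    for name in sorted(vocabularies):  # fixed order keeps columns aligned across users
        combined.extend(one_hot(attributes.get(name), vocabularies[name]))
    combined.extend(sku_feature)
    return combined

vocabularies = {"gender": ["f", "m"], "age_band": ["18-25", "26-35", "36+"]}
row = combine_features({"gender": "m", "age_band": "26-35"}, vocabularies, [0.5, 0.5])
```

A missing attribute encodes as all zeros in this sketch, which keeps the row length constant.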
In an exemplary embodiment of the present disclosure, the similar user prediction model may include, but is not limited to, a logistic regression LR model. Correspondingly, the model training module 102 may be configured to input the first feature data and the SKU feature data into a preset LR model training tool, and perform model training with an LBFGS algorithm to obtain the logistic regression LR model.
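To make the training step concrete, a dependency-free sketch of fitting a logistic regression model follows. Note the substitution: it uses plain batch gradient descent rather than the LBFGS algorithm named above, purely to stay within the standard library; the function names, learning rate, and toy data are all hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_lr(rows, labels, epochs=500, lr=0.5):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    dim = len(rows[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for d in range(dim):
                grad_w[d] += err * x[d]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict(w, b, x):
    """Probability that the user with feature vector x resembles the seed users."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy training set: label 1 for seed-like rows, 0 for the others.
rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [1, 1, 0, 0]
w, b = train_lr(rows, labels)
```

A quasi-Newton optimizer such as LBFGS would typically reach a comparable optimum in far fewer iterations, which matters at the feature dimensions implied by the text.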
In an exemplary embodiment of the present disclosure, the apparatus 100 may further include a preprocessing module, configured to, before the model training is performed, use the seed users as positive samples and randomly sample a portion of all users as negative samples; the positive samples, the negative samples, the first feature data and the SKU feature data are then placed into a training set, on which the model training module 102 may perform the subsequent model training.
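The construction of the training set described above might look like the following sketch. Excluding seed users from the negative pool, the one-to-one sampling ratio, and the fixed random seed are assumptions added for illustration; the text only states that negatives are randomly sampled from all users.

```python
import random

def build_training_set(seed_users, all_users, features,
                       negatives_per_positive=1, rng_seed=0):
    """Label seed users 1 and randomly sampled non-seed users 0.

    features: dict mapping user ID -> feature vector.
    negatives_per_positive and rng_seed are hypothetical knobs for this sketch.
    """
    seeds = [u for u in seed_users if u in features]
    seed_set = set(seeds)
    # Assumption: seed users are kept out of the negative pool.
    pool = sorted(u for u in all_users if u not in seed_set and u in features)
    k = min(len(pool), negatives_per_positive * len(seeds))
    negatives = random.Random(rng_seed).sample(pool, k)
    rows = [features[u] for u in seeds] + [features[u] for u in negatives]
    labels = [1] * len(seeds) + [0] * len(negatives)
    return rows, labels

# Ten hypothetical users with one-dimensional features; u0 and u1 are seeds.
features = {f"u{i}": [float(i)] for i in range(10)}
rows, labels = build_training_set(["u0", "u1"], list(features), features)
```

The resulting `rows` and `labels` are in the shape a logistic regression trainer would consume.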
In an exemplary embodiment of the disclosure, the user extension module 103 may be configured to: determining user behavior time according to the preset user information so as to take users in a preset time period as active users to be added into a candidate set; predicting the candidate set by using the logistic regression LR model to obtain a probability value of each user similar to the seed user; sorting the probability values of the users similar to the seed user, and sequentially selecting the users corresponding to the N probability values which are sorted at the top as the extended similar users; wherein N is a natural number.
In an exemplary embodiment of the present disclosure, the apparatus 100 may further include a user updating module, configured to update the active users in the candidate set at preset time intervals.
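Building and refreshing the candidate set of active users can be sketched as a time-window filter. The 30-day window, the `last_active` mapping, and the sample timestamps are hypothetical; the text specifies only a preset time period and a preset update interval.

```python
from datetime import datetime, timedelta

def build_candidate_set(last_active, now, window_days=30):
    """Return IDs of users whose latest behavior falls inside the window.

    last_active: dict mapping user ID -> datetime of the user's latest behavior.
    window_days: length of the preset time period (hypothetical default).
    """
    cutoff = now - timedelta(days=window_days)
    return {uid for uid, ts in last_active.items() if cutoff <= ts <= now}

now = datetime(2017, 6, 9)
last_active = {
    "u1": datetime(2017, 6, 1),
    "u2": datetime(2017, 3, 15),
    "u3": datetime(2017, 5, 20),
}
candidates = build_candidate_set(last_active, now)
# Rebuilding with a later reference time refreshes the set at the next interval.
refreshed = build_candidate_set(last_active, now + timedelta(days=15))
```

Restricting prediction to this set is what keeps prediction time bounded as the total user base grows.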
In an exemplary embodiment of the present disclosure, the apparatus 100 may further include a user amount adjusting module, configured to adjust the value of N according to a preset number of extended users, so as to adjust the extension amount of similar users.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units. The components shown as modules or units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the present disclosure. One of ordinary skill in the art can understand and implement it without inventive effort.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored; when executed by, for example, a processor, the program can implement the steps of the method for finding similar users based on seed users in any of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the present invention described in the section of this specification on the method for finding similar users based on seed users.
Referring to fig. 8, a program product 300 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the method for finding similar users based on seed users in any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 9. The electronic device 600 shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit 620 stores program code that can be executed by the processing unit 610 to cause the processing unit 610 to perform the steps according to the various exemplary embodiments of the present invention described in the section of this specification on the method for finding similar users based on seed users. For example, the processing unit 610 may perform the steps as shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above method for finding similar users based on seed users according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (11)

1. A method for searching similar users based on seed users is characterized by comprising the following steps:
acquiring a plurality of SKUs corresponding to user behaviors, and performing discrete vectorization processing on each SKU in the plurality of SKUs to obtain a vector corresponding to each SKU;
averaging the vectors corresponding to each SKU, and taking the average value as the characteristic data of the SKU;
acquiring first characteristic data of a user, acquiring combined user characteristic data according to the first characteristic data and the SKU characteristic data, and performing preset model training according to the combined user characteristic data to acquire a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU;
and acquiring preset user information, and predicting and determining a similar user associated with the preset user according to the preset user information and the similar user prediction model.
2. The method of claim 1, wherein the SKU corresponding to the user behavior comprises at least one of a purchase SKU and a browse SKU.
3. The seed user based method for finding similar users according to claim 1, wherein the first characteristic data comprises user attribute information data, the user attribute information data being determined by registration information data and/or user behavior data of the user.
4. The method for finding similar users based on seed users according to any one of claims 1-3, wherein the similar user prediction model comprises a Logistic Regression (LR) model;
the obtaining of a similar user prediction model through preset model training according to the first feature data and the SKU feature data comprises:
and inputting the first characteristic data and the SKU characteristic data into a preset LR model training tool, and performing model training by using an LBFGS algorithm to obtain the logistic regression LR model.
5. The method of claim 4, further comprising:
before the model training is carried out, the seed users are used as positive samples, random sampling is carried out, and partial users in all the users are obtained as negative samples;
and putting the positive sample, the negative sample, the first feature data and the SKU feature data into a training set, and carrying out subsequent model training based on the training set.
6. The method for finding similar users based on seed users according to claim 4, wherein the acquiring of preset user information and the predicting and determining of a similar user associated with the preset user according to the preset user information and the similar user prediction model comprises:
determining user behavior time according to the preset user information so as to take users in a preset time period as active users to be added into a candidate set;
predicting the candidate set by using the logistic regression LR model to obtain a probability value of each user similar to the seed user;
sorting the probability values of the users similar to the seed user, and sequentially selecting the users corresponding to the N probability values which are sorted at the top as the extended similar users; wherein N is a natural number.
7. The method for finding similar users based on seed users as claimed in claim 6, further comprising:
and updating the active users in the candidate set at preset time intervals.
8. The method for finding similar users based on seed users as claimed in claim 6, further comprising:
and adjusting the value of the N according to the number of the preset expanded users so as to adjust the expansion amount of the similar users.
9. An apparatus for finding similar users based on seed users, the apparatus comprising:
the discrete vectorization module is used for acquiring a plurality of SKUs corresponding to user behaviors and performing discrete vectorization processing on each SKU in the SKUs to obtain a vector corresponding to each SKU;
the characteristic data determining module is used for averaging the vectors corresponding to each SKU and taking the average value as the SKU characteristic data;
the model training module is used for acquiring first characteristic data of a user, acquiring combined user characteristic data according to the first characteristic data and the SKU characteristic data, and performing preset model training according to the combined user characteristic data to acquire a similar user prediction model; wherein the first characteristic data is preset discrete characteristic data different from the SKU;
and the user extension module is used for acquiring preset user information and determining the similar user associated with the preset user according to the preset user information and the similar user prediction model.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for finding similar users based on seed users according to any one of claims 1 to 8.
11. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the steps of the method for finding similar users based on seed users of any one of claims 1-8 via execution of the executable instructions.
CN201710431844.5A 2017-06-09 2017-06-09 Method, device, medium and electronic equipment for searching similar users based on seed users Active CN109034853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710431844.5A CN109034853B (en) 2017-06-09 2017-06-09 Method, device, medium and electronic equipment for searching similar users based on seed users

Publications (2)

Publication Number Publication Date
CN109034853A CN109034853A (en) 2018-12-18
CN109034853B true CN109034853B (en) 2021-11-26

Family

ID=64628702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710431844.5A Active CN109034853B (en) 2017-06-09 2017-06-09 Method, device, medium and electronic equipment for searching similar users based on seed users

Country Status (1)

Country Link
CN (1) CN109034853B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant