CN114398960A - Target user determination method and device, electronic equipment and storage medium - Google Patents

Target user determination method and device, electronic equipment and storage medium

Info

Publication number
CN114398960A
Authority
CN
China
Prior art keywords
user
seed
users
candidate
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111612769.5A
Other languages
Chinese (zh)
Inventor
吴腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd, Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202111612769.5A priority Critical patent/CN114398960A/en
Publication of CN114398960A publication Critical patent/CN114398960A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a target user determination method and device, an electronic device and a storage medium. The target user determination method may comprise the following steps: constructing a user set, wherein the user set comprises a seed user and a candidate user; determining the association relationship between any two users in the user set according to the same user characteristics of the two users; inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model; and determining whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.

Description

Target user determination method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a target user determination method and apparatus, an electronic device, and a storage medium.
Background
The target audience (namely, the target users) needs to be determined when information is pushed. If information is pushed to a non-target audience, it disturbs that audience on the one hand and fails to achieve the intended effect of the push on the other. Determining the target audience before pushing information is therefore critical.
In the related art, there are a target user determination method based on a user tag, a target user determination method based on user similarity, and a user determination method based on a supervised classification model.
The above methods have some problems, such as:
The target user determination method based on user tags defines people with the same tag as a similar population and locates target users within that population. Although its interpretability is strong, its accuracy proves insufficient in actual use.
The method based on user similarity calculates similarity entirely from user characteristics, which leads to problems such as a large amount of calculation and a limited expansion range of the target population.
The method based on a supervised classification model requires user data and user labels during model training, but in some scenarios users and labels are difficult to acquire, which limits the range of application scenarios. Accuracy may also be insufficient because of bias in the sample data and/or the labels.
Disclosure of Invention
The disclosure provides a target user determination method and device, an electronic device and a storage medium.
A first aspect of the embodiments of the present disclosure provides a target user determining method, where the method includes:
constructing a user set, wherein the user set comprises: a seed user and a candidate user; the candidate user is: a user selected from a group of potential users and having at least one representative feature; the representative feature is: determined according to the information content of the promotion information and/or the user characteristics of the seed user;
determining the association relationship between any two users in the user set according to the same user characteristics of any two users in the user set;
inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model, wherein the first characteristic parameter is the characteristic parameter of the candidate user; the second characteristic parameter is the characteristic parameter of the seed user;
and determining whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.
In some embodiments, the determining, according to the same user characteristics in the user set, an association relationship between any two users in the user set includes:
establishing an undirected graph representing the association relationship between users, wherein the undirected graph comprises: nodes and undirected edges between nodes, and one node represents one user in the user set; two users corresponding to two nodes connected by an undirected edge have at least one same user characteristic;
the undirected edge has a weight, and the size of the weight is positively correlated with the similarity of the user characteristics between the two users corresponding to the two nodes.
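In some embodiments, the inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model includes: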
determining input characteristic parameters of an mth candidate user according to the undirected graph, wherein the input characteristic parameters of the mth candidate user comprise: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one of the association characteristic values includes: the user characteristics of an nth user and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is: a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and both n and m are positive integers;
and inputting the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
In some embodiments, the inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model includes:
determining input characteristic parameters of an sth seed user according to the undirected graph, wherein the input characteristic parameters of the sth seed user comprise: the user characteristics of the sth seed user, and one or more association characteristic values indicating the association relationship; one of the association characteristic values includes: the user characteristics of a zth user and the weight of the undirected edge between the node corresponding to the sth seed user and the node corresponding to the zth user; wherein the zth user is: a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the sth seed user, and both s and z are positive integers;
and inputting the input characteristic parameters of the sth seed user into the preset model to obtain the second characteristic parameter of the sth seed user output by the preset model.
In some embodiments, the determining whether the candidate user is a target user according to the first feature parameter and the second feature parameter includes:
clustering the seed users according to the second characteristic parameters to obtain clustering results;
determining a third characteristic parameter according to the clustering result;
and determining whether the pth candidate user is the target user or not according to the similarity between the first characteristic parameter and the third characteristic parameter of the pth candidate user, wherein p is a positive integer.
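The clustering steps above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure does not name a clustering algorithm or a similarity measure, so the plain k-means routine, the use of cluster centroids as the "third characteristic parameter", the cosine similarity and the threshold value are all hypothetical choices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points initialise the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def is_target(first_param, third_params, threshold=0.9):
    """Assumed rule: target if close enough to at least one seed cluster centroid."""
    return max(cosine(first_param, c) for c in third_params) >= threshold


# Hypothetical second characteristic parameters of four seed users.
seed_params = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.1, 0.9]]
third_params = kmeans(seed_params, k=2)
print(is_target([0.8, 0.1], third_params))  # candidate near the first seed cluster
print(is_target([0.5, 0.5], third_params))  # candidate between the two clusters
```

Comparing each candidate against a handful of centroids rather than against every seed user keeps the number of similarity computations small, which is one plausible reason for clustering first.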
In some embodiments, the determining whether the candidate user is a target user according to the first feature parameter and the second feature parameter includes:
mapping the candidate user and the seed user into a feature space according to the first feature parameter of the candidate user and the second feature parameter of the seed user;
dividing the feature space X times based on locality sensitive hashing to obtain X seed user concentrations of the yth candidate user, wherein X is a positive integer; each division of the feature space yields subspaces, and the xth seed user concentration of the yth candidate user is: the seed user concentration of the subspace in which the yth candidate user is located in the xth division; x is a positive integer less than X;
and determining whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
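A minimal sketch of the locality-sensitive-hashing variant follows. Several points are assumptions rather than statements of the disclosure: random-hyperplane hashing is used as the LSH family, the "seed user concentration" is taken to be the fraction of seed users among all users in the candidate's subspace, and the final decision averages the X concentrations against a hypothetical threshold.

```python
import random


def seed_concentrations(candidate, seeds, others, num_partitions=8,
                        num_planes=3, rng_seed=0):
    """Return one seed concentration per partition for `candidate`.

    Each partition draws `num_planes` random hyperplanes; the sign pattern of
    a vector against those hyperplanes identifies its subspace (bucket).
    """
    rng = random.Random(rng_seed)
    dim = len(candidate)
    result = []
    for _ in range(num_partitions):
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]

        def bucket(vec):
            return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                         for plane in planes)

        target = bucket(candidate)
        seeds_in = sum(bucket(s) == target for s in seeds)
        others_in = sum(bucket(o) == target for o in others)
        total = seeds_in + others_in + 1  # +1 counts the candidate itself
        result.append(seeds_in / total)
    return result


def is_target_by_concentration(concentrations, threshold=0.3):
    """Assumed rule: compare the mean concentration with a threshold."""
    return sum(concentrations) / len(concentrations) >= threshold


# Hypothetical feature parameters.
seeds = [[1.0, 0.2], [0.9, 0.1]]
others = [[-1.0, 0.3], [-0.8, -0.5]]
concs = seed_concentrations([0.95, 0.15], seeds, others)
print(concs, is_target_by_concentration(concs))
```

Repeating the division X times smooths out the randomness of any single set of hyperplanes: a candidate that merely happens to share one bucket with seed users will not score well across all divisions.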
In some embodiments, the method further comprises:
selecting representative features from the user features of the seed users according to the expected number of the target users, wherein the expected number is positively correlated with the number of representative features;
the candidate users are users having at least one of the representative features.
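As an illustration of this selection, the sketch below ranks features by how many seed users share them and lets the number of selected features grow with the expected number of target users. The ranking by frequency, the `targets_per_feature` scaling constant and all sample data are hypothetical; the disclosure only states that the two quantities are positively correlated.

```python
from collections import Counter


def pick_representative_features(seed_features, expected_targets,
                                 targets_per_feature=1000):
    """Select the most common seed features; more expected targets, more features.

    seed_features: list of feature sets, one per seed user.
    targets_per_feature: hypothetical scaling constant.
    """
    counts = Counter(f for feats in seed_features for f in feats)
    k = max(1, -(-expected_targets // targets_per_feature))  # ceiling division
    ranked = sorted(counts, key=lambda f: (-counts[f], f))  # ties broken by name
    return set(ranked[:k])


def select_candidates(potential_users, representative):
    """Keep potential users that have at least one representative feature."""
    return [uid for uid, feats in potential_users.items() if feats & representative]


seed_features = [
    {"age:30-40", "city:Beijing"},
    {"age:30-40", "likes:cars"},
    {"age:30-40", "city:Beijing"},
]
potential = {
    "u1": {"city:Beijing"},
    "u2": {"age:30-40"},
    "u3": {"likes:tea"},
}
narrow = pick_representative_features(seed_features, expected_targets=1000)
wide = pick_representative_features(seed_features, expected_targets=2000)
print(select_candidates(potential, narrow))
print(select_candidates(potential, wide))
```

Because a candidate needs only one of the representative features, adding features can only enlarge the candidate pool, which is consistent with the positive correlation described above.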
In some embodiments, the representative features include:
a user attribute feature;
a user preference feature.
A second aspect of the embodiments of the present disclosure provides a target user determination apparatus, where the apparatus includes:
a construction module, configured to construct a user set, wherein the user set includes: a seed user and a candidate user; the candidate user is: a user selected from a group of potential users and having at least one representative feature; the representative feature is: determined according to the information content of the promotion information and/or the user characteristics of the seed user;
a first determining module, configured to determine the association relationship between any two users in the user set according to the same user characteristics of any two users in the user set;
an obtaining module, configured to input the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model, and obtain a first characteristic parameter and a second characteristic parameter output by the preset model, where the first characteristic parameter is the characteristic parameter of the candidate user; the second characteristic parameter is the characteristic parameter of the seed user;
and the second determining module is used for determining whether the candidate user is the target user according to the first characteristic parameter and the second characteristic parameter.
In some embodiments, the first determining module is specifically configured to establish an undirected graph representing the association relationship between users, where the undirected graph includes: nodes and undirected edges between nodes, and one node represents one user in the user set; two users corresponding to two nodes connected by an undirected edge have at least one same user characteristic;
the undirected edge has a weight, and the size of the weight is positively correlated with the similarity of the user characteristics between the two users corresponding to the two nodes.
In some embodiments, the second determining module is specifically configured to determine, according to the undirected graph, input characteristic parameters of an mth candidate user, where the input characteristic parameters of the mth candidate user include: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one of the association characteristic values includes: the user characteristics of an nth user and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is: a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and both n and m are positive integers;
and input the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
In some embodiments, the obtaining module is specifically configured to determine, according to the undirected graph, input characteristic parameters of an sth seed user, where the input characteristic parameters of the sth seed user include: the user characteristics of the sth seed user, and one or more association characteristic values indicating the association relationship; one of the association characteristic values includes: the user characteristics of a zth user and the weight of the undirected edge between the node corresponding to the sth seed user and the node corresponding to the zth user; wherein the zth user is: a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the sth seed user, and both s and z are positive integers; and input the input characteristic parameters of the sth seed user into the preset model to obtain the second characteristic parameter of the sth seed user output by the preset model.
In some embodiments, the second determining module is specifically configured to cluster the seed users according to the second feature parameter to obtain a clustering result;
determining a third characteristic parameter according to the clustering result;
and determining whether the pth candidate user is the target user or not according to the similarity between the first characteristic parameter and the third characteristic parameter of the pth candidate user, wherein p is a positive integer.
In some embodiments, the second determining module is specifically configured to map the candidate user and the seed user into a feature space according to the first feature parameter of the candidate user and the second feature parameter of the seed user; divide the feature space X times based on locality sensitive hashing to obtain X seed user concentrations of the yth candidate user, wherein X is a positive integer; each division of the feature space yields subspaces, and the xth seed user concentration of the yth candidate user is: the seed user concentration of the subspace in which the yth candidate user is located in the xth division; x is a positive integer less than X; and determine whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
In some embodiments, the apparatus further comprises:
a third determining module, configured to select representative features from the user features of the seed users according to the expected number of the target users, wherein the expected number is positively correlated with the number of representative features; the candidate users are users having at least one of the representative features.
In some embodiments, the representative features include:
a user attribute feature;
a user preference feature.
A third aspect of the embodiments of the present disclosure provides an electronic device, including:
a memory for storing processor-executable instructions;
a processor coupled to the memory;
wherein the processor is configured to perform the target user determination method as provided in any of the technical solutions of the first aspect.
A fourth aspect of the embodiments of the present disclosure provides a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a computer, enable the computer to perform the target user determination method as set forth in the technical solution of the first aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
A user set comprising seed users and candidate users having at least one representative feature is constructed, and the association relationship between any two users in the user set is then determined based on the same user characteristics that the two users have. Based on the user characteristics and the association relationship, a first characteristic parameter of each candidate user and a second characteristic parameter of each seed user are obtained, and based on the first characteristic parameter and the second characteristic parameter, it is determined which candidate users are target users. This way of determining target users considers not only the user characteristics but also the association relationship among users, so the determined target users are highly accurate.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a method for target user determination in accordance with an exemplary embodiment;
FIG. 2 is a schematic diagram of an undirected graph shown in accordance with an exemplary embodiment;
FIG. 3A is a flowchart illustrating a method of target user determination, according to an example embodiment;
FIG. 3B is a flowchart illustrating a method of target user determination, according to an example embodiment;
FIG. 4 is a diagram illustrating a method of target user determination, according to an example embodiment;
FIG. 5 is a flowchart illustrating a method of target user determination, according to an example embodiment;
FIG. 6 is a schematic diagram illustrating the structure of a target user determination device in accordance with one illustrative embodiment;
FIG. 7 is a schematic structural diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices consistent with certain aspects of the present disclosure, as detailed in the appended claims.
As shown in fig. 1, an embodiment of the present disclosure provides a target user determination method, where the method includes:
S110: constructing a user set, wherein the user set comprises: a seed user and a candidate user; the candidate user is: a user selected from a group of potential users and having at least one representative feature; the representative feature is: determined according to the information content of the promotion information and/or the user characteristics of the seed user;
S120: determining the association relationship between any two users in the user set according to the same user characteristics of any two users in the user set;
S130: inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model, wherein the first characteristic parameter is the characteristic parameter of the candidate user; the second characteristic parameter is the characteristic parameter of the seed user;
S140: determining whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.
The target user determination method can be applied to various electronic devices, for example, a server for promoting information or a content promotion server.
The target user may be a user predicted to have a greater probability of receiving and viewing the promotional information.
For example, in the embodiment of the present disclosure, the seed user may be a user who has viewed the promotion information, or a user who has viewed information similar to the promotion information, or a user who has purchased the promotion subject matter of the promotion information.
Illustratively, the promotion information may include: an advertisement and/or coupon information for a good or service.
If the promotion information is an advertisement, the target user is a user to whom the advertisement is to be pushed.
The seed user may be a user who purchased an advertised good or service, or a user who used a coupon associated with the coupon information.
The seed user may be provided by a provider of promotional information.
The representative features may be determined according to the information content of the promotion information. For example, if the promotion information is a house advertisement, it is aimed at a group of users who need to purchase a house, and this group of users may have a characteristic age distribution or economic capability distribution, so the two representative features of age and/or economic capability can be determined according to the information content of the promotion information.
As a further example, a plurality of users who read the house advertisement are discovered through users' reading behavior, and these users can be used as seed users; one or more user characteristics common to the seed users are then determined as the representative features through operations such as clustering the user characteristics of the seed users.
There may be one or more representative features. When there are several, some of them may be determined according to the information content of the promotion information and others according to the user characteristics of the seed users.
The representative features may include: attribute features representing static attributes of a user, and/or behavior features representing the behavior of a user, and the like.
Thus, the user set has both seed users and candidate users.
In some embodiments, the number of candidate users in the user set is greater than the number of seed users, so that the set of target users can be expanded quickly.
After the user set is established, the association relationship among the users is determined according to the user characteristics of the users in the user set. Such an association relationship reflects the similarity or the closeness of the relationship between users.
Because similar users and closely related users tend to be interested in similar information, determining the target user with the first characteristic parameter and the second characteristic parameter obtained in this way can be more accurate.
Exemplarily, S140 may include: determining the similarity between the seed user and the candidate user according to the first characteristic parameter and the second characteristic parameter, and determining whether the candidate user is the target user based on the similarity. Of course, this is merely an example, and the specific implementation is not limited to it.
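The similarity-based example of S140 could look like the sketch below. The cosine measure, the rule of taking the best match over all seed users and the threshold of 0.8 are assumptions for illustration; the disclosure leaves the similarity measure open.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_target_user(first_param, second_params, threshold=0.8):
    """Assumed rule: target if similar enough to at least one seed user."""
    best = max(cosine(first_param, s) for s in second_params)
    return best >= threshold


# Hypothetical model outputs: two seed users and two candidates.
second_params = [[1.0, 0.0], [0.0, 1.0]]
print(is_target_user([0.9, 0.1], second_params))  # close to the first seed
print(is_target_user([1.0, 1.0], second_params))  # equally far from both seeds
```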
In the embodiment of the present disclosure, based on the user feature and the association relationship, the first feature parameter of the candidate user and the second feature parameter of the seed user are obtained, and based on the first feature parameter and the second feature parameter, which candidate users are the target users are obtained. The determination mode of the target user not only considers the user characteristics, but also considers the incidence relation among the users, thereby having the characteristic of high accuracy of the determined target user.
The first characteristic parameter and the second characteristic parameter are both output by a preset model. For example, the first characteristic parameter and the second characteristic parameter may be: a feature vector and/or a feature matrix.
The preset model can be a machine learning model and/or a deep learning model.
The deep learning model may include: convolutional neural networks, fully-connected neural networks, recurrent neural networks, or the like.
In one embodiment, the S120 may include:
establishing an undirected graph representing the association relationship between users, wherein the undirected graph comprises: nodes and undirected edges between nodes, and one node represents one user in the user set; two users corresponding to two nodes connected by an undirected edge have at least one same user characteristic;
the undirected edge has a weight, and the size of the weight is positively correlated with the similarity of the user characteristics between the two users corresponding to the two nodes.
FIG. 2 shows an undirected graph whose nodes are connected by undirected edges. An undirected edge is an edge without direction; in the disclosed embodiments, each user maps to one node in the undirected graph. The weight of an undirected edge reflects the similarity between the two users connected by that edge.
If two users have a same user characteristic, the nodes corresponding to the two users are connected by an undirected edge.
In one implementation scenario, if any one of the user characteristics of two users is the same, an undirected edge may be connected between the nodes corresponding to the two users.
In another implementation scenario, an undirected edge may be connected between the nodes corresponding to two users only if at least one of the characteristics they have in common belongs to a feature set. The user characteristics in the feature set may be user characteristics predetermined to be associated with the information to be promoted, or characteristics common to the seed users. Introducing the feature set to decide whether the nodes corresponding to two users are connected by an undirected edge therefore allows target users to be selected accurately while reducing unnecessary calculation.
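The two implementation scenarios can be sketched with one function. Representing each user's characteristics as a set of labels and each undirected edge as a `frozenset` of two user identifiers are illustrative choices, and the sample users and feature names are hypothetical.

```python
from itertools import combinations


def build_edges(user_features, feature_set=None):
    """Return {frozenset({u, v}): shared features} for every connected pair.

    With feature_set=None, any shared characteristic creates an edge (first
    scenario); otherwise only shared characteristics that belong to
    feature_set count (second scenario).
    """
    edges = {}
    for u, v in combinations(sorted(user_features), 2):
        shared = user_features[u] & user_features[v]
        if feature_set is not None:
            shared &= feature_set
        if shared:
            edges[frozenset((u, v))] = shared
    return edges


users = {"A": {"f1", "f2"}, "B": {"f2", "f3"}, "C": {"f3"}}
print(build_edges(users))                      # edges A-B and B-C
print(build_edges(users, feature_set={"f3"}))  # only edge B-C remains
```

Restricting edges to a feature set prunes pairs that share only irrelevant characteristics, so fewer edges have to be weighted and fed to the model.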
The positive correlation between the size of the weight and the similarity of the user characteristics of the two users corresponding to the two nodes can be embodied in one or more of the following aspects:
the weight is positively correlated with the number of same user characteristics of the two users corresponding to the two nodes;
the weight is negatively correlated with the distance between the user characteristics of the two users corresponding to the two nodes when mapped into a feature space; the closer the user characteristics of the two users lie in the feature space, the higher the similarity between the two users.
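Both aspects can be illustrated with small weight functions. The specific formulas (a raw count of shared characteristics, and 1 / (1 + Euclidean distance)) are assumptions chosen only because they have the stated monotonic behaviour; the disclosure does not fix a formula.

```python
import math


def weight_from_shared(features_u, features_v):
    """More shared characteristics give a larger weight."""
    return len(features_u & features_v)


def weight_from_distance(vec_u, vec_v):
    """A smaller distance in feature space gives a larger weight, in (0, 1]."""
    return 1.0 / (1.0 + math.dist(vec_u, vec_v))


print(weight_from_shared({"a", "b", "c"}, {"b", "c", "d"}))  # two shared features
print(weight_from_distance([0.0, 0.0], [3.0, 4.0]))          # distance 5
```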
In some embodiments, some user characteristics are further graded by degree. For example, a user characteristic that represents a user's preference may be subdivided into a plurality of preference levels, with graded labels such as "likes" and "likes greatly" distinguishing three different levels of a user's preference for an item. In this case, when determining the similarity between two users, it is necessary to consider not only whether the two users have the same preference, but also how similar their degrees of preference for the same item are, so that the similarity between the two users can be determined more accurately.
As shown in the undirected graph of FIG. 2, node 1 of the undirected graph represents user A, node 2 represents user B, and node 3 represents user C. If undirected edges exist among user A, user B and user C, it means that user A and user B have at least one same user characteristic, and user B and user C also have at least one same user characteristic. W1 reflects the similarity between user A and user B; W2 reflects the similarity between user B and user C; W3 reflects the similarity between user A and user C.
In one embodiment, the S130 may include:
determining input characteristic parameters of an mth candidate user according to the undirected graph, wherein the input characteristic parameters of the mth candidate user comprise: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one of the association characteristic values includes: the user characteristics of an nth user and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is: a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and both n and m are positive integers;
and inputting the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
Assuming that the user a in fig. 2 is the mth candidate user, when determining whether the user a is the target user, the inputting the characteristic parameters of the preset model may include:
user A user profile, user B user profile, W1, user C user profile, W3, user D user profile, W4, user E user profile, W5.
It is worth noting that: the user characteristics are all numerical user characteristics. For example, values between 0 and 1 are used to represent user characteristics.
In one embodiment, the user characteristics of user A to user E may each be represented by a user characteristic vector, which may be a sequence of "0" and "1" or a sequence of arbitrary values between "0" and "1".
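As a minimal sketch of how the input characteristic parameters of one user might be assembled from the undirected graph, the following Python collects the user's own feature vector together with one (neighbor features, edge weight) pair per undirected edge. The names `features`, `edges` and `build_input_params`, and all feature values and weights, are hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: numeric user feature vectors (values in [0, 1]).
features = {
    "A": [0.0, 1.0, 1.0],
    "B": [0.0, 1.0, 0.0],
    "C": [1.0, 0.0, 1.0],
}
# Undirected weighted edges, keyed by an unordered pair of users.
edges = {
    frozenset(("A", "B")): 0.8,  # plays the role of W1
    frozenset(("B", "C")): 0.3,  # plays the role of W2
    frozenset(("A", "C")): 0.5,  # plays the role of W3
}


def build_input_params(user, features, edges):
    """Return the user's own features plus one association value per neighbor."""
    associations = []
    for pair, weight in edges.items():
        if user in pair:
            (neighbor,) = pair - {user}  # the other endpoint of the edge
            associations.append((features[neighbor], weight))
    return {"user_features": features[user], "associations": associations}


params_a = build_input_params("A", features, edges)
print(params_a)
```

Because the edges are stored as unordered pairs, the same helper serves candidate users and seed users alike.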
In some embodiments, the weights of the undirected edges are normalized weights. For example, after the undirected graph is constructed, all the original weights in the undirected graph are normalized to obtain normalized weights. Through weight normalization, the numerical values contained in the input characteristic parameters of any user fall within the same range, which reduces the excessive amount of calculation caused by large values and improves the efficiency of determining the target user.
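The text does not say which normalization is used. The sketch below assumes a simple scaling by the largest raw weight, which maps positive weights into (0, 1]; the function name `normalize_weights` and the raw values are hypothetical.

```python
def normalize_weights(edges):
    """Scale raw positive edge weights into (0, 1] by dividing by the maximum.

    Max-scaling is an assumed scheme; the source only says the weights
    are normalized.
    """
    largest = max(edges.values())
    if largest <= 0:
        raise ValueError("expected at least one positive edge weight")
    return {pair: weight / largest for pair, weight in edges.items()}


raw_weights = {("A", "B"): 4.0, ("B", "C"): 1.0, ("A", "C"): 2.5}
normalized = normalize_weights(raw_weights)
print(normalized)
```

Max-scaling keeps every existing edge strictly positive, whereas a min-max scheme would map the weakest edge to 0.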
The preset model may be any machine learning model. Illustratively, the preset model may be any neural network capable of operating on a graph. For example, the neural network includes, but is not limited to: a graph neural network such as GraphSAGE, or a Graph Convolutional Network (GCN).
After the input characteristic parameters of the mth candidate user are input into the preset model, the preset model outputs the first characteristic parameter of the mth candidate user.
In one embodiment, the first characteristic parameter may include, but is not limited to: a feature vector and/or a feature matrix.
In some embodiments, the S130 may include:
determining, according to the undirected graph, input characteristic parameters of an s-th seed user, wherein the input characteristic parameters of the s-th seed user comprise: the user characteristics of the s-th seed user and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the z-th user and the weight of the undirected edge between the nodes corresponding to the s-th seed user and the z-th user; wherein the z-th user is a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the s-th seed user, and s and z are both positive integers;
and inputting the input characteristic parameters of the s-th seed user into the preset model to obtain a second characteristic parameter of the s-th seed user output by the preset model.
Assume that user B in fig. 2 is the s-th seed user. To obtain the second characteristic parameter of user B, the input characteristic parameters of user B are input into the preset model. The input characteristic parameters of user B may include:
the user characteristics of user B, the user characteristics of user A, W1, the user characteristics of user C, W2, the user characteristics of user G, W6, the user characteristics of user H, W7, the user characteristics of user J, W8.
It is worth noting that: the user characteristics are all numerical user characteristics. For example, values between 0 and 1 are used to represent user characteristics.
In one embodiment, the user features of user B to user J may be represented by a user feature vector, which may be represented by a sequence of "0" and "1" or a sequence of any number between "0" and "1".
After the input characteristic parameters of the user B are input into the preset model, the preset model outputs second characteristic parameters of the user B.
In some embodiments, the data format of the input characteristic parameters is:
the user characteristics of the current user followed by the association characteristic values;
or,
the association characteristic values followed by the user characteristics of the current user.
In both data formats, if there are a plurality of association characteristic values, they are sorted by their corresponding weights, either from largest to smallest or from smallest to largest.
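The two data formats and the weight-based ordering can be sketched as follows. The function `flatten_input` and its parameters `features_first` and `descending` are hypothetical names introduced for illustration, and laying out each association value as "neighbor features, then weight" is an assumption.

```python
def flatten_input(user_features, associations, features_first=True, descending=True):
    """Flatten one user's input characteristic parameters into a numeric list.

    associations: list of (neighbor_features, weight) pairs, sorted here by
    weight before being laid out.
    """
    ordered = sorted(associations, key=lambda item: item[1], reverse=descending)
    assoc_flat = []
    for neighbor_features, weight in ordered:
        assoc_flat.extend(neighbor_features)
        assoc_flat.append(weight)
    if features_first:
        return list(user_features) + assoc_flat
    return assoc_flat + list(user_features)


own = [0.0, 1.0]
assoc = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.8)]
vec_a = flatten_input(own, assoc)                        # own features first
vec_b = flatten_input(own, assoc, features_first=False)  # associations first
print(vec_a)
print(vec_b)
```

Fixing one layout and one sort order for every user is what gives the model inputs a consistent format.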
In this way, the first characteristic parameter of user A and the second characteristic parameter of user B are output by the same preset model, and thus have the same data format and/or data dimension. For example, the feature vector corresponding to the first characteristic parameter of user A contains the same number of elements as the feature vector corresponding to the second characteristic parameter of user B.
After the first characteristic parameter and the second characteristic parameter are calculated, the similarity between each candidate user and the seed users may be calculated based on the Euclidean distance or a norm distance, and whether the corresponding candidate user is the target user is then determined based on the similarity.
Alternatively, the cosine similarity or the Jaccard similarity coefficient may be applied to the first characteristic parameter and the second characteristic parameter to obtain their similarity, and whether the corresponding candidate user is the target user is determined based on the calculated value representing the similarity.
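The similarity measures named here can be sketched in plain Python as follows. The decision threshold of 0.5 in `is_target` is an assumption for illustration only; the text does not specify how the similarity value is turned into a decision.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Jaccard coefficient for 0/1 vectors: |intersection| / |union|."""
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return intersection / union if union else 0.0


def is_target(candidate_vec, seed_vec, threshold=0.5):
    """Hypothetical decision rule: cosine similarity at or above a threshold."""
    return cosine_similarity(candidate_vec, seed_vec) >= threshold


print(euclidean_distance([0, 0], [3, 4]))
print(cosine_similarity([1, 0], [1, 0]))
print(jaccard_similarity([1, 1, 0], [1, 0, 1]))
```

Distances decrease as users become more alike while cosine and Jaccard values increase, so a threshold has to be applied in the matching direction.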
In one embodiment, as shown in fig. 3A, the S140 may include:
S141A: clustering the seed users according to the second characteristic parameters to obtain clustering results;
S142A: determining the third characteristic parameter according to the clustering result;
S143A: and determining whether the pth candidate user is the target user or not according to the similarity between the first characteristic parameter and the third characteristic parameter of the pth candidate user, wherein p is a positive integer.
There are a plurality of seed users, and their second characteristic parameters can be processed with various clustering algorithms to cluster the seed users. After clustering, one or more user clusters are obtained. If only one user cluster is obtained, the third characteristic parameter can be obtained directly from the second characteristic parameters corresponding to that user cluster. If a plurality of user clusters are obtained, the third characteristic parameter is obtained by, for example, a weighted average of the second characteristic parameters corresponding to the user clusters.
After the third characteristic parameter is obtained, a function value representing the similarity is computed from the first characteristic parameter of each candidate user and the third characteristic parameter, and whether each candidate user is the target user is determined based on the function value.
The clustering algorithm includes, but is not limited to, the K-means clustering algorithm, a density-based clustering algorithm, a grid-based clustering algorithm, or hierarchical clustering.
p is a positive integer less than or equal to the total number of candidate users in the user set.
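Steps S141A to S143A can be sketched with a small K-means in plain Python. Several choices here are assumptions: the first k points are used as initial centroids, the cluster centroids are combined with equal weights unless weights are supplied, and a candidate is accepted when its Euclidean distance to the third characteristic parameter is within a hypothetical `max_distance`.

```python
import math


def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points, k, iterations=10):
    """Plain K-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean_vector(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


def third_parameter(centroids, weights=None):
    """Weighted average of the cluster centroids (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(centroids)
    total = sum(weights)
    return [sum(w * c[d] for w, c in zip(weights, centroids)) / total
            for d in range(len(centroids[0]))]


# Hypothetical second characteristic parameters of five seed users.
seed_vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0]]
centroids, clusters = kmeans(seed_vectors, k=2)
third = third_parameter(centroids)

candidate = [5.0, 5.0]  # hypothetical first characteristic parameter
max_distance = 1.0      # hypothetical acceptance threshold
candidate_is_target = math.dist(candidate, third) <= max_distance
print(third, candidate_is_target)
```

With cluster sizes as weights the result is just the mean of all seed vectors, so how the weights are chosen decides whether clustering changes the outcome.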
In another embodiment, as shown in fig. 3B, the S140 may include:
S141B: mapping the candidate user and the seed user into a feature space according to the first feature parameter and the second feature parameter;
S142B: dividing the feature space X times based on locality-sensitive hashing to obtain X seed user concentrations of the yth candidate user; wherein X is a positive integer; each division of the feature space yields subspaces, and the xth seed user concentration of the yth candidate user is the seed user concentration of the subspace in which the candidate user is located in the xth division; x is a positive integer less than or equal to X;
S143B: determining whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
The feature space containing all users in the user set is divided according to locality-sensitive hashing, and each division yields at least two subspaces. In some embodiments, a subspace may also be referred to as a bucket.
The seed user concentration may be: the proportion of seed users in each subspace.
Dividing the feature space for X times, each candidate user will have X seed user concentrations. Thus, the X seed user concentrations represent the probability values that the candidate user is suitable as the target user. In general, the higher the seed user concentration, the higher the probability value that the corresponding candidate user is the target user.
Exemplarily, the S143B may include:
determining whether the y-th candidate user is the target user according to the sum of the X seed user concentrations;
or,
determining whether the y-th candidate user is the target user according to the average value of the X seed user concentrations.
Dividing the feature space by locality-sensitive hashing to determine whether a candidate user is the target user avoids a large number of similarity calculations, and therefore requires little computation and determines target users efficiently.
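A minimal sketch of steps S141B to S143B follows. The text does not fix a hash family, so random-hyperplane hashing is assumed here; the function name, the parameters `num_divisions` (playing the role of X), `num_planes` and `rng_seed`, the toy vectors and the 0.5 threshold are all hypothetical.

```python
import random


def lsh_seed_concentration(vectors, seeds, num_divisions=8, num_planes=2, rng_seed=0):
    """Average seed-user concentration of each non-seed user over several divisions.

    Each division draws random hyperplanes; users with the same sign pattern
    fall into the same bucket (subspace).
    """
    rng = random.Random(rng_seed)
    dim = len(next(iter(vectors.values())))
    totals = {user: 0.0 for user in vectors if user not in seeds}
    for _ in range(num_divisions):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        buckets = {}
        for user, vec in vectors.items():
            key = tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes)
            buckets.setdefault(key, []).append(user)
        for members in buckets.values():
            # Proportion of seed users among all users in this bucket.
            concentration = sum(1 for u in members if u in seeds) / len(members)
            for u in members:
                if u in totals:
                    totals[u] += concentration
    return {user: total / num_divisions for user, total in totals.items()}


vectors = {
    "s1": [1.0, 1.0], "s2": [2.0, 2.0],  # seed users
    "c_same": [4.0, 4.0],                # candidate pointing the same way as the seeds
    "c_far": [-1.0, -1.0],               # candidate pointing the opposite way
}
seeds = {"s1", "s2"}
scores = lsh_seed_concentration(vectors, seeds)
targets = [user for user, score in scores.items() if score >= 0.5]
print(scores, targets)
```

On this toy data `c_same` always shares a bucket with both seeds and `c_far` never does, so the averaged concentrations come out near 2/3 and 0, and only `c_same` passes the threshold.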
In some embodiments, the method further comprises:
selecting representative features from the user features of the seed users according to the expected number of the target users; wherein the expected number is positively correlated with the number of representative features;
the candidate users are users having at least one of the representative features.
The representative features may be features that represent characteristics common to the seed users.
For example, the number of representative features differs considerably between a desired number of 10,000 target users and a desired number of 1,000,000 target users. To obtain a user set containing more candidate users, more representative features are added, so that more of the potential users hit at least one representative feature.
In some embodiments, the representative features include:
a user attribute feature;
a user preference feature.
For example, the user attribute feature may be a feature describing a static characteristic of the user, and the user preference feature may be a feature describing a behavior of the user.
Typical user attribute characteristics may include, but are not limited to, at least one of: gender, age, place of frequent residence, occupation, or educational background.
Typical user preference features may include: features of the user's consumption preferences and features of the information the user pays attention to.
Typically, the user preference characteristic is changed more frequently than the user attribute characteristic.
And if the representative features at least comprise one user attribute feature and at least one user preference feature, selecting the candidate users from the two feature dimensions.
The present disclosure can be widely applied to similar-crowd expansion for e-commerce and Internet advertising; a specific application scenario is shown in fig. 1.
First, a batch of seed users is given. These are typically users who follow, or have purchased, goods in the industry to which the advertiser belongs, and the batch is usually small, on the order of hundreds of thousands. Starting from the seed users, the people most similar to the seed users are selected from a larger-scale potential crowd by mining similarity relations from user characteristics, and a larger crowd package, for example of millions of users, is generated. The advertiser then delivers advertisements to the expanded similar crowd so as to maximize delivery efficiency.
The effect of methods based on similarity calculation depends heavily on the choice of user characteristics and of the similarity function; the characteristics are not distinguished by importance, and the similarity between every pair of users must be calculated, so such methods cannot scale to large crowds. In addition, this modeling approach cannot express the interest propagation caused by mutual following among users, a social relationship newly emerging in the e-commerce field. The present method builds an index from features to crowds using two conditions, the advertiser's industry category and the scale of the target crowd, and models only users that co-occur in one or more of the advertiser's industries, which significantly reduces the complexity of calculating pairwise user similarity. The association score between two users is calculated from the number of matching 0/1 features. The relationships between users are expressed as a graph in which different users are different nodes and the association between two users is a weighted edge, so that interest propagation among users can be mined and the social relationships of current e-commerce are better accommodated. By applying a graph neural network algorithm such as GraphSAGE or GCN on the graph, low-dimensional dense vector features of the users are extracted, the complexity of finding the nearest neighbors of the seed crowd is reduced, and the overall recall of similar-crowd expansion is improved.
Referring to fig. 5, a target user determination method provided by an embodiment of the present disclosure may include:
S210: acquiring user characteristics of each potential user in the potential user group;
S220: establishing an index from features to crowds according to the target crowd size and the advertiser's industry. The target crowd is the crowd to which the target users belong. Illustratively, a subset of features is selected according to the target crowd size and the industry to which the advertiser belongs, and an index is built from these features to the users who have values on them. That is, potential users in the potential user group who have the features corresponding to the advertiser's industry are determined as candidate users. An advertiser may be an individual or entity that provides an advertisement to be promoted.
S230: constructing the association relationship between every two users through the index from features to crowds.
S240: establishing a graph according to the association relationships, and executing a graph neural network algorithm on the graph to obtain the feature vectors of the users. The feature vector is one form of the first characteristic parameter of a candidate user and of the second characteristic parameter of a seed user. The graph neural network algorithm used to calculate the feature vectors may be, for example, GraphSAGE or GCN.
To reduce the amount of calculation, after the first characteristic parameter and the second characteristic parameter, such as the feature vectors, are obtained, the sparse high-dimensional representation is reduced to a dense low-dimensional one;
S250: generating the target crowd according to the feature vectors of the seed users and the candidate users. Illustratively, a nearest neighbor search, such as a KNN search, is performed over the potential user group using the feature vectors of the seed users to generate the target crowd.
Specifically, the user characteristics of each user in the potential user group are acquired. A large number of potential users are stored on a server, and each potential user has a plurality of user characteristics. The user characteristics may be static types, such as user portraits, or real-time behaviors, and the value of a user characteristic may be any value between 0 and 1, or another numeric type. These characteristic values are then arranged into feature vectors that are high-dimensional and mostly 0, as shown in Table 1. The size of the crowd package of potential users is determined by the target user scale expected by the provider of the promotion information, such as an advertiser. Taking advertising as an example, the potential users are generally people who have browsed or purchased commodities of the industry category, brand, etc. to which the advertiser belongs.
User | Feature 1 | Feature 2 | Feature n
A    | 0         | 1         | 1
B    | 0         | 0         | 0
C    | 1         | 0         | 0
Table 1
Features such as the category of the industry in which the advertiser operates and the brands of the commodities it sells are obtained; these data can be obtained from operation data accumulated on an e-commerce platform. These features are converted into the metadata features of the first step (S210), and the number of features used to build the index from features to co-occurring crowds is chosen according to the similar-crowd expansion scale expected by the advertiser. If the advertiser wants to produce a crowd package of 1 million users, 1 or 2 representative features are selected, and users whose values on these representative features are not 0 are selected as candidate users.
If the advertiser wants to produce a crowd package of 2000 million users, more than 10 representative features are selected, and any user having at least one of the representative features can serve as a candidate user, so that a larger user scale is covered.
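The index from features to crowds and the selection of candidate users can be sketched as an inverted index. The function names are hypothetical, feature positions stand in for named features, and the feature values are those of Table 1.

```python
from collections import defaultdict


def build_feature_index(user_features):
    """Map each feature position to the set of users with a non-zero value there."""
    index = defaultdict(set)
    for user, vec in user_features.items():
        for position, value in enumerate(vec):
            if value != 0:
                index[position].add(user)
    return index


def select_candidates(index, representative_features):
    """Users that hit at least one of the representative features."""
    candidates = set()
    for position in representative_features:
        candidates |= index.get(position, set())
    return candidates


potential_users = {"A": [0, 1, 1], "B": [0, 0, 0], "C": [1, 0, 0]}
index = build_feature_index(potential_users)
few = select_candidates(index, [0])         # one representative feature
more = select_candidates(index, [0, 1, 2])  # more representative features
print(sorted(few), sorted(more))
```

Adding representative features can only enlarge the candidate set, which matches the positive correlation between the expected crowd size and the number of representative features.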
With the index from features to crowds, as shown in fig. 2 and fig. 4, the association between every two co-occurring users is calculated and a weight is determined. The weight may be calculated as, for example but not limited to, the inner product of the metadata features representing the user features of the two users.
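A minimal sketch of the inner-product weighting follows, using hypothetical 0/1 feature vectors. For 0/1 features the inner product equals the number of features the two users share, and an edge is created only when that count is non-zero.

```python
from itertools import combinations


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_weighted_graph(user_features):
    """Undirected weighted graph: edge (u, v) exists only if the inner product is > 0."""
    graph = {}
    for u, v in combinations(sorted(user_features), 2):
        weight = inner_product(user_features[u], user_features[v])
        if weight > 0:
            graph[(u, v)] = weight
    return graph


users = {
    "A": [1, 1, 0, 1],
    "B": [1, 1, 0, 0],
    "C": [0, 0, 1, 1],
    "D": [0, 0, 0, 0],
}
graph = build_weighted_graph(users)
print(graph)
```

The loop above compares every pair for brevity; restricting the pairs to users that co-occur under the same index entry is what keeps the pairwise computation manageable at scale.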
After the weighted undirected graph of the relationships between users is established, a graph neural network algorithm such as GraphSAGE is run on it. GraphSAGE can use information from multi-order neighbor nodes: in a given layer, the algorithm combines the feature vectors of the neighbor nodes with that of the current node and uses the result as the input feature vector of the next layer. In general, extending only to second-order neighbors, that is, neighbors of neighbors, already achieves a good effect. Since the users carry no labels, training is unsupervised: the loss function is designed so that neighboring nodes receive similar representations while non-neighboring nodes receive very different ones. The network parameters and the vector features of the nodes are obtained by back propagation with gradient descent. Consistent with the symbol definitions given here, the loss function has the form of the GraphSAGE unsupervised loss (in the original GraphSAGE formulation the second term is additionally scaled by the number of negative samples):

$$J(z_u) = -\log\left(\sigma\left(z_u^{\top} z_v\right)\right) - \mathbb{E}_{v_n \sim P_n(v)}\left[\log\left(\sigma\left(-z_u^{\top} z_{v_n}\right)\right)\right]$$

where u is a neighbor of node v; $z_u$ is the output vector of node u; $z_v$ is the output vector of node v, the neighboring node of node u; $\sigma$ is the sigmoid function; $v_n \sim P_n(v)$ denotes that $v_n$ is sampled from the negative sampling distribution of node v; $\mathbb{E}_{v_n \sim P_n(v)}$ is the expectation over all nodes negatively sampled for node v; $z_{v_n}$ is the output vector of a non-adjacent node of node u; and $J(z_u)$ is the loss value calculated for node u by the graph neural network algorithm.
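To make the loss concrete, the sketch below evaluates it for given output vectors, assuming the GraphSAGE-style form $-\log\sigma(z_u^{\top} z_v) - \mathbb{E}[\log\sigma(-z_u^{\top} z_{v_n})]$ and estimating the expectation by the mean over a list of negative samples. The function names and the vectors are hypothetical, and no training loop is shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unsupervised_loss(z_u, z_v, negative_samples):
    """Loss for node u with neighbor v; the expectation is a mean over negatives."""
    positive_term = -math.log(sigmoid(dot(z_u, z_v)))
    negative_term = -sum(
        math.log(sigmoid(-dot(z_u, z_n))) for z_n in negative_samples
    ) / len(negative_samples)
    return positive_term + negative_term


# Neighbor aligned with u and negative sample opposed to u: low loss.
good = unsupervised_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
# Neighbor opposed to u and negative sample aligned with u: high loss.
bad = unsupervised_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]])
print(good, bad)
```

Minimizing this quantity pulls the vectors of neighboring users together and pushes the vectors of sampled non-neighbors apart.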
After the vectors of the potential candidate crowd and of the seed crowd are available, a K-Nearest Neighbor (KNN) search is performed for each seed user. Because exact KNN is computationally expensive, an Approximate Nearest Neighbor (ANN) search is generally used instead.
A number of candidate users most similar to each seed user are found, and the number of most-similar candidate users retrieved per seed user is adjusted according to the size of the similar crowd the advertiser expects to expand to, yielding the final expanded crowd.
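The expansion step can be sketched with a brute-force nearest-neighbor search in which the number of neighbors taken per seed user grows until the desired crowd size is reached. This exact search is for illustration only and would normally be replaced by an ANN index at scale; the function name, the round-robin growth of k and the toy vectors are assumptions.

```python
import math


def expand_audience(seed_vectors, candidate_vectors, desired_size):
    """Collect the nearest candidates of each seed, increasing k until enough are found."""
    ranked = {
        seed: sorted(candidate_vectors, key=lambda c: math.dist(vec, candidate_vectors[c]))
        for seed, vec in seed_vectors.items()
    }
    selected = []
    k = 0
    while len(selected) < desired_size and k < len(candidate_vectors):
        for seed in ranked:
            candidate = ranked[seed][k]  # the (k+1)-th nearest candidate of this seed
            if candidate not in selected and len(selected) < desired_size:
                selected.append(candidate)
        k += 1
    return selected


seed_vectors = {"s1": [0.0, 0.0], "s2": [10.0, 10.0]}
candidate_vectors = {
    "c1": [0.0, 1.0], "c2": [9.0, 10.0], "c3": [5.0, 5.0], "c4": [20.0, 20.0],
}
print(expand_audience(seed_vectors, candidate_vectors, 2))
print(expand_audience(seed_vectors, candidate_vectors, 3))
```

A larger `desired_size` simply lets each seed user contribute more of its nearest candidates, mirroring the adjustment described for the advertiser's expected crowd size.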
By establishing the index from the features to the crowd on the features matched with the features of the advertiser, the complexity of calculating the similarity of every two users is reduced, and the scheme can be expanded to the large-scale crowd.
In addition, the user relationship is modeled in a graph neural network mode, the inherent social attributes of the user can be captured, the live broadcast and content e-commerce attributes appearing in the current e-commerce can be better adapted, and the accuracy of similar crowd expansion is improved.
As shown in fig. 6, an embodiment of the present disclosure provides a target user determination apparatus, which includes:
a constructing module 110, configured to construct a user set, where the user set includes: a seed user and a candidate user; the candidate users are: a user selected from the group of potential users having at least one representative characteristic; the representative features are: the method is determined according to the information content of the promotion information and/or the user characteristics of the seed user;
a first determining module 120, configured to determine, according to the same user characteristics that any two users in the user set have, an association relationship between any two users in the user set;
an obtaining module 130, configured to input the association relationship and user characteristics of two users corresponding to the association relationship into a preset model, and obtain a first characteristic parameter and a second characteristic parameter output by the preset model, where the first characteristic parameter is a characteristic parameter of the candidate user; the second characteristic parameter is the characteristic parameter of the seed user;
a second determining module 140, configured to determine whether the candidate user is the target user according to the first characteristic parameter and the second characteristic parameter.
In some embodiments, the target user determination device may be used in a variety of electronic devices.
The electronic devices include, but are not limited to: a personal computer or a laboratory computer or server.
In some embodiments, the constructing module 110, the first determining module 120, the obtaining module 130, and the second determining module 140 may be program modules; the program modules, when executed by a processor, can perform the functions of the respective modules described above.
In other embodiments, the constructing module 110, the first determining module 120, the obtaining module 130, and the second determining module 140 may be combined software-hardware modules; the combined software-hardware modules include, but are not limited to, various programmable arrays; the programmable arrays include, but are not limited to: field programmable arrays or complex programmable arrays.
In still other embodiments, the constructing module 110, the first determining module 120, the obtaining module 130, and the second determining module 140 may be pure hardware modules; the pure hardware modules include, but are not limited to: an application specific integrated circuit.
In some embodiments, the first determining module 120 is specifically configured to establish an undirected graph that characterizes the association relationships between users, where the undirected graph includes: nodes and undirected edges connecting the nodes, and one node represents one user in the user set; two users corresponding to two nodes connected by an undirected edge have at least one same user characteristic;
the undirected edge has a weight, and the magnitude of the weight is positively correlated with the similarity of the user characteristics between the two users corresponding to the two nodes.
In some embodiments, the obtaining module 130 is specifically configured to determine, according to the undirected graph, input characteristic parameters of an mth candidate user, where the input characteristic parameters of the mth candidate user include: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the nth user and the weight of the undirected edge between the nodes corresponding to the mth candidate user and the nth user; wherein the nth user is a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and both n and m are positive integers;
and to input the input characteristic parameters of the mth candidate user into the preset model to obtain a first characteristic parameter of the mth candidate user output by the preset model.
In some embodiments, the obtaining module 130 is specifically configured to determine input characteristic parameters of an s-th seed user according to the undirected graph, where the input characteristic parameters of the s-th seed user include: the user characteristics of the s-th seed user and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the z-th user and the weight of the undirected edge between the nodes corresponding to the s-th seed user and the z-th user; wherein the z-th user is a candidate user or seed user whose node is connected by an undirected edge to the node corresponding to the s-th seed user, and s and z are both positive integers; and to input the input characteristic parameters of the s-th seed user into the preset model to obtain a second characteristic parameter of the s-th seed user output by the preset model.
In some embodiments, the second determining module 140 is specifically configured to cluster the seed users according to the second characteristic parameter to obtain a clustering result;
determining the third characteristic parameter according to the clustering result;
and determining whether the pth candidate user is the target user or not according to the similarity between the first characteristic parameter and the third characteristic parameter of the pth candidate user, wherein p is a positive integer.
In some embodiments, the second determining module 140 is specifically configured to map the candidate user and the seed user into a feature space according to the first characteristic parameter of the candidate user and the second characteristic parameter of the seed user; divide the feature space X times based on locality-sensitive hashing to obtain X seed user concentrations of the yth candidate user; wherein X is a positive integer; each division of the feature space yields subspaces, and the xth seed user concentration of the yth candidate user is the seed user concentration of the subspace in which the candidate user is located in the xth division; x is a positive integer less than or equal to X; and determine whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
In some embodiments, the apparatus further comprises:
a third determining module, configured to select representative features from the user features of the seed users according to the expected number of the target users; wherein the expected number is positively correlated with the number of representative features; the candidate users are users having at least one of the representative features.
In some embodiments, the representative features include:
a user attribute feature;
a user preference feature.
An embodiment of the present disclosure provides an electronic device, including:
a memory for storing processor-executable instructions;
a processor connected with the memory;
wherein the processor is configured to execute the target user determination method provided by any of the foregoing technical solutions.
The memory may include various types of storage media, which are non-transitory computer storage media capable of retaining the information stored thereon after the electronic device is powered off.
The processor may be connected to the memory via a bus or the like for reading the executable program stored on the memory, e.g. capable of performing at least one of the methods as shown in any of fig. 1, 4 to 6.
The electronic device may comprise the first device and/or the second device.
Fig. 7 is a block diagram illustrating an electronic device 800 in accordance with an example embodiment. For example, the electronic device 800 may be included in a terminal device such as a mobile phone or a mobile computer, or a device such as a server.
Referring to fig. 7, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, multimedia data component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating state, such as a shooting state or a video state. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The multimedia data component 810 is configured to output and/or input multimedia data signals. For example, the multimedia data component 810 includes a Microphone (MIC) configured to receive external multimedia data signals when the electronic device 800 is in an operational state, such as a call state, a recording state, and a voice recognition state. The received multimedia data signal may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the multimedia data component 810 further comprises a speaker for outputting the multimedia data signal.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, communications component 816 further includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
An embodiment of the present disclosure provides a computer storage medium, which may be a non-transitory computer-readable storage medium. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the target user determination method provided in any of the foregoing technical solutions, for example at least one of the methods shown in any of fig. 1, fig. 3A, and fig. 3B to fig. 5.
The target user determination method may include: constructing a user set, wherein the user set comprises seed users and candidate users; the candidate users are users selected from potential users and having at least one representative feature; the representative features are features determined according to the information content of promotion information and/or the user characteristics of the seed users; determining the association relationship between any two users in the user set according to the user characteristics that the two users have in common; inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model, wherein the first characteristic parameter is the characteristic parameter of a candidate user and the second characteristic parameter is the characteristic parameter of a seed user; and determining whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.
It can be understood that determining the association relationship between any two users in the user set according to the user characteristics that the two users have in common includes:
establishing an undirected graph representing the association relationships between users, wherein the undirected graph comprises nodes and undirected edges between nodes, and one node represents one user in the user set; the two users corresponding to two nodes connected by an undirected edge have at least one user characteristic in common;
each undirected edge has a weight, and the magnitude of the weight is positively correlated with the similarity of the user characteristics of the two users corresponding to the two nodes.
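The graph construction described above can be sketched as follows. This is only an illustration under stated assumptions: users are represented as sets of feature labels, and the edge weight is taken to be the Jaccard similarity of the two sets, which is one possible weight that grows with feature similarity. The disclosure does not name a specific similarity measure, and the function name `build_undirected_graph` and the sample users are hypothetical.

```python
from itertools import combinations


def build_undirected_graph(user_features):
    """Build a weighted undirected graph over users.

    user_features: dict mapping a user id to a set of feature labels.
    Returns a dict mapping frozenset({u, v}) to an edge weight. An edge
    exists only if the two users share at least one feature.
    The Jaccard similarity used as the weight is an assumption.
    """
    edges = {}
    for u, v in combinations(sorted(user_features), 2):
        shared = user_features[u] & user_features[v]
        if shared:
            union = user_features[u] | user_features[v]
            edges[frozenset((u, v))] = len(shared) / len(union)
    return edges


# Hypothetical sample data: one seed user and two candidate users.
users = {
    "seed_1": {"age_25_34", "likes_games", "city_beijing"},
    "cand_1": {"age_25_34", "likes_games"},
    "cand_2": {"likes_music"},
}
graph = build_undirected_graph(users)
```

Using a `frozenset` as the edge key makes the edge direction-free, so looking up (u, v) and (v, u) yields the same weight. A user with no shared feature, such as `cand_2` here, simply has no edges.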
It can be understood that inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model includes:
determining input characteristic parameters of the mth candidate user according to the undirected graph, wherein the input characteristic parameters of the mth candidate user comprise: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the nth user, and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and n and m are both positive integers;
and inputting the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
It can be understood that inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model includes:
determining input characteristic parameters of the sth seed user according to the undirected graph, wherein the input characteristic parameters of the sth seed user comprise: the user characteristics of the sth seed user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the zth user, and the weight of the undirected edge between the node corresponding to the sth seed user and the node corresponding to the zth user; wherein the zth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the sth seed user, and s and z are both positive integers;
and inputting the input characteristic parameters of the sth seed user into the preset model to obtain the second characteristic parameter of the sth seed user output by the preset model.
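As a rough illustration of how the input characteristic parameters of a user (candidate or seed) might be assembled from the undirected graph, consider the sketch below. It assumes user characteristics are already encoded as numeric vectors. The function `toy_model` is a hypothetical stand-in for the preset model: it concatenates a user's own vector with the edge-weighted mean of the neighbours' vectors. The disclosure does not state that the preset model works this way, and all names and sample values are illustrative.

```python
def input_parameters(user, features, edges):
    """Return (own features, [(neighbour features, edge weight), ...])."""
    associations = []
    for pair, weight in edges.items():
        if user in pair:
            (other,) = pair - {user}  # the node at the other end of the edge
            associations.append((features[other], weight))
    return features[user], associations


def toy_model(own, associations):
    """Hypothetical stand-in for the preset model.

    Concatenates the user's own features with the edge-weighted mean of
    the neighbours' features.
    """
    dim = len(own)
    total = sum(weight for _, weight in associations)
    if total == 0:
        aggregated = [0.0] * dim
    else:
        aggregated = [
            sum(weight * feats[i] for feats, weight in associations) / total
            for i in range(dim)
        ]
    return list(own) + aggregated


# Hypothetical numeric features and edge weights.
features = {"seed_1": [1.0, 0.0], "cand_1": [0.0, 1.0], "cand_2": [1.0, 1.0]}
edges = {
    frozenset(("seed_1", "cand_1")): 0.5,
    frozenset(("cand_1", "cand_2")): 0.5,
}
first_param = toy_model(*input_parameters("cand_1", features, edges))
second_param = toy_model(*input_parameters("seed_1", features, edges))
```

The same two functions serve both cases: applied to a candidate user the output plays the role of the first characteristic parameter, and applied to a seed user it plays the role of the second.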
It can be understood that determining whether the candidate user is the target user according to the first characteristic parameter and the second characteristic parameter includes:
clustering the seed users according to the second characteristic parameters to obtain a clustering result;
determining a third characteristic parameter according to the clustering result;
and determining whether the pth candidate user is the target user according to the similarity between the first characteristic parameter of the pth candidate user and the third characteristic parameter, wherein p is a positive integer.
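The clustering-based decision above can be sketched as follows, under several assumptions that are not taken from the disclosure: k-means is used as the clustering algorithm, the cluster centroids are treated as the third characteristic parameter, cosine similarity is the similarity measure, and a candidate is accepted when its best similarity reaches a hypothetical threshold of 0.9.

```python
import math


def kmeans(points, k, iterations=10):
    """Tiny k-means with a deterministic start (first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def is_target(candidate_vec, centroids, threshold=0.9):
    """Accept the candidate if it is similar enough to any seed centroid."""
    return max(cosine(candidate_vec, c) for c in centroids) >= threshold


# Hypothetical second characteristic parameters of four seed users.
seed_params = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
centroids = kmeans(seed_params, k=2)
close_candidate = is_target([0.8, 0.2], centroids)
distant_candidate = is_target([0.5, 0.5], centroids)
```

Comparing each candidate against a handful of centroids rather than against every seed user keeps the cost proportional to the number of clusters, which is presumably one motivation for clustering the seeds first.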
It can be understood that determining whether the candidate user is a target user according to the first characteristic parameter of the candidate user and the second characteristic parameter of the seed user includes:
mapping the candidate users and the seed users into a feature space according to the first characteristic parameters and the second characteristic parameters;
dividing the feature space X times based on locality sensitive hashing to obtain X seed user concentrations of the yth candidate user, wherein X is a positive integer; each division partitions the feature space into subspaces, and the xth seed user concentration of the yth candidate user is the seed user concentration of the subspace in which the yth candidate user is located after the xth division; x is a positive integer less than or equal to X;
and determining whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
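A minimal sketch of the locality-sensitive-hashing variant follows. It assumes random-hyperplane hashing (the sign pattern of a vector against a few random hyperplanes identifies its subspace), defines the seed user concentration of a subspace as the number of seed users divided by the number of all users in that subspace, and decides by comparing the mean concentration with a hypothetical threshold. None of these specifics are stated in the disclosure; they are illustrative choices.

```python
import random


def lsh_concentrations(seeds, candidates, num_divisions, num_planes=2, rng_seed=0):
    """Return {candidate: [concentration for each of the X divisions]}.

    seeds, candidates: dicts mapping a user id to a feature vector.
    Each division draws num_planes random hyperplanes; the tuple of signs
    of a vector against those hyperplanes is its subspace (bucket).
    """
    rng = random.Random(rng_seed)
    dim = len(next(iter(seeds.values())))
    result = {name: [] for name in candidates}
    for _ in range(num_divisions):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

        def bucket(vec):
            return tuple(
                sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
            )

        seed_count, total_count = {}, {}
        for vec in seeds.values():
            b = bucket(vec)
            seed_count[b] = seed_count.get(b, 0) + 1
            total_count[b] = total_count.get(b, 0) + 1
        for vec in candidates.values():
            b = bucket(vec)
            total_count[b] = total_count.get(b, 0) + 1
        for name, vec in candidates.items():
            b = bucket(vec)
            result[name].append(seed_count.get(b, 0) / total_count[b])
    return result


def is_target(concentrations, threshold=0.5):
    """Hypothetical rule: accept if the mean concentration reaches the threshold."""
    return sum(concentrations) / len(concentrations) >= threshold


# Hypothetical feature vectors.
seeds = {"s1": [1.0, 0.1], "s2": [0.9, 0.2]}
candidates = {"near": [1.0, 0.1], "far": [-1.0, -0.1]}
conc = lsh_concentrations(seeds, candidates, num_divisions=5)
```

Repeating the division X times reduces the effect of any single unlucky partition: a candidate that truly lies among the seed users shares a subspace with them in most divisions, while a candidate far from them rarely does.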
It can be understood that the method further comprises:
selecting the representative features from the user characteristics of the seed users according to the desired number of target users, wherein the desired number is positively correlated with the number of representative features;
the candidate users are users having at least one of the representative features.
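The selection step above can be illustrated with the following sketch. It assumes the representative features are the most frequent features among the seed users, and that the number of features kept grows with the desired number of target users through a hypothetical ratio `users_per_feature`. Both the ranking rule and the ratio are assumptions made for illustration.

```python
import math
from collections import Counter


def select_representative_features(seed_features, desired_count, users_per_feature=1000):
    """Pick the top-k most common seed features; k grows with desired_count."""
    k = max(1, math.ceil(desired_count / users_per_feature))
    counts = Counter(f for feats in seed_features.values() for f in feats)
    # Sort by descending frequency, then by name for a deterministic order.
    ranked = sorted(counts, key=lambda f: (-counts[f], f))
    return ranked[:k]


def select_candidates(potential_users, representative):
    """Keep potential users that have at least one representative feature."""
    rep = set(representative)
    return sorted(u for u, feats in potential_users.items() if feats & rep)


# Hypothetical sample data.
seed_features = {
    "s1": {"likes_games", "age_25_34"},
    "s2": {"likes_games", "city_beijing"},
    "s3": {"likes_games", "age_25_34"},
}
potential_users = {
    "u1": {"likes_games"},
    "u2": {"age_25_34"},
    "u3": {"likes_music"},
}
small = select_representative_features(seed_features, desired_count=1000)
large = select_representative_features(seed_features, desired_count=2000)
small_pool = select_candidates(potential_users, small)
large_pool = select_candidates(potential_users, large)
```

Because a candidate needs only one representative feature, adding more representative features can only enlarge the candidate pool, which is consistent with the positive correlation described above.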
As will be appreciated, the representative features include:
a user attribute feature;
a user preference feature.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A method for target user determination, the method comprising:
constructing a user set, wherein the user set comprises seed users and candidate users; the candidate users are users selected from potential users and having at least one representative feature; the representative features are features determined according to the information content of promotion information and/or the user characteristics of the seed users;
determining the association relationship between any two users in the user set according to the user characteristics that the two users have in common;
inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model, wherein the first characteristic parameter is the characteristic parameter of a candidate user, and the second characteristic parameter is the characteristic parameter of a seed user;
and determining whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.
2. The method according to claim 1, wherein the determining the association relationship between any two users in the user set according to the user characteristics that the two users have in common comprises:
establishing an undirected graph representing the association relationships between users, wherein the undirected graph comprises nodes and undirected edges between nodes, and one node represents one user in the user set; the two users corresponding to two nodes connected by an undirected edge have at least one user characteristic in common;
each undirected edge has a weight, and the magnitude of the weight is positively correlated with the similarity of the user characteristics of the two users corresponding to the two nodes.
3. The method according to claim 2, wherein the inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model comprises:
determining input characteristic parameters of the mth candidate user according to the undirected graph, wherein the input characteristic parameters of the mth candidate user comprise: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the nth user, and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and n and m are both positive integers;
and inputting the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
4. The method according to claim 2, wherein the inputting the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model to obtain a first characteristic parameter and a second characteristic parameter output by the preset model comprises:
determining input characteristic parameters of the sth seed user according to the undirected graph, wherein the input characteristic parameters of the sth seed user comprise: the user characteristics of the sth seed user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the zth user, and the weight of the undirected edge between the node corresponding to the sth seed user and the node corresponding to the zth user; wherein the zth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the sth seed user, and s and z are both positive integers;
and inputting the input characteristic parameters of the sth seed user into the preset model to obtain the second characteristic parameter of the sth seed user output by the preset model.
5. The method according to any one of claims 1 to 4, wherein the determining whether the candidate user is the target user according to the first characteristic parameter and the second characteristic parameter comprises:
clustering the seed users according to the second characteristic parameters to obtain a clustering result;
determining a third characteristic parameter according to the clustering result;
and determining whether the pth candidate user is the target user according to the similarity between the first characteristic parameter of the pth candidate user and the third characteristic parameter, wherein p is a positive integer.
6. The method according to any one of claims 1 to 4, wherein the determining whether the candidate user is the target user according to the first characteristic parameter and the second characteristic parameter comprises:
mapping the candidate users and the seed users into a feature space according to the first characteristic parameters and the second characteristic parameters;
dividing the feature space X times based on locality sensitive hashing to obtain X seed user concentrations of the yth candidate user, wherein X is a positive integer; each division partitions the feature space into subspaces, and the xth seed user concentration of the yth candidate user is the seed user concentration of the subspace in which the yth candidate user is located after the xth division; x is a positive integer less than or equal to X;
and determining whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
7. The method according to any one of claims 1 to 4, further comprising:
selecting the representative features from the user characteristics of the seed users according to the desired number of target users, wherein the desired number is positively correlated with the number of representative features;
the candidate users are users having at least one of the representative features.
8. The method of claim 7, wherein the representative features comprise:
a user attribute feature;
a user preference feature.
9. A target user determination apparatus, the apparatus comprising:
a construction module configured to construct a user set, wherein the user set comprises seed users and candidate users; the candidate users are users selected from potential users and having at least one representative feature; the representative features are features determined according to the information content of promotion information and/or the user characteristics of the seed users;
a first determining module configured to determine the association relationship between any two users in the user set according to the user characteristics that the two users have in common;
an obtaining module configured to input the association relationship and the user characteristics of the two users corresponding to the association relationship into a preset model, and obtain a first characteristic parameter and a second characteristic parameter output by the preset model, wherein the first characteristic parameter is the characteristic parameter of a candidate user, and the second characteristic parameter is the characteristic parameter of a seed user;
and a second determining module configured to determine whether the candidate user is a target user according to the first characteristic parameter and the second characteristic parameter.
10. The apparatus according to claim 9, wherein the first determining module is specifically configured to establish an undirected graph representing the association relationships between users, wherein the undirected graph comprises nodes and undirected edges between nodes, and one node represents one user in the user set; the two users corresponding to two nodes connected by an undirected edge have at least one user characteristic in common;
each undirected edge has a weight, and the magnitude of the weight is positively correlated with the similarity of the user characteristics of the two users corresponding to the two nodes.
11. The apparatus according to claim 10, wherein the second determining module is specifically configured to determine, according to the undirected graph, input characteristic parameters of the mth candidate user, wherein the input characteristic parameters of the mth candidate user comprise: the user characteristics of the mth candidate user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the nth user, and the weight of the undirected edge between the node corresponding to the mth candidate user and the node corresponding to the nth user; wherein the nth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the mth candidate user, and n and m are both positive integers;
and to input the input characteristic parameters of the mth candidate user into the preset model to obtain the first characteristic parameter of the mth candidate user output by the preset model.
12. The apparatus of claim 10, wherein the obtaining module is specifically configured to determine input characteristic parameters of the sth seed user according to the undirected graph, wherein the input characteristic parameters of the sth seed user comprise: the user characteristics of the sth seed user, and one or more association characteristic values indicating the association relationship; one association characteristic value includes: the user characteristics of the zth user, and the weight of the undirected edge between the node corresponding to the sth seed user and the node corresponding to the zth user; wherein the zth user is a candidate user or a seed user whose node is connected by an undirected edge to the node corresponding to the sth seed user, and s and z are both positive integers; and to input the input characteristic parameters of the sth seed user into the preset model to obtain the second characteristic parameter of the sth seed user output by the preset model.
13. The apparatus according to any one of claims 9 to 12, wherein the second determining module is specifically configured to cluster the seed users according to the second characteristic parameters to obtain a clustering result;
determine a third characteristic parameter according to the clustering result;
and determine whether the pth candidate user is the target user according to the similarity between the first characteristic parameter of the pth candidate user and the third characteristic parameter, wherein p is a positive integer.
14. The apparatus according to any one of claims 9 to 12, wherein the second determining module is specifically configured to map the candidate users and the seed users into a feature space according to the first characteristic parameters and the second characteristic parameters; divide the feature space X times based on locality sensitive hashing to obtain X seed user concentrations of the yth candidate user, wherein X is a positive integer; each division partitions the feature space into subspaces, and the xth seed user concentration of the yth candidate user is the seed user concentration of the subspace in which the yth candidate user is located after the xth division; x is a positive integer less than or equal to X; and determine whether the yth candidate user is the target user according to the X seed user concentrations of the yth candidate user.
15. The apparatus of any one of claims 9 to 12, further comprising:
a third determining module configured to select the representative features from the user characteristics of the seed users according to the desired number of target users, wherein the desired number is positively correlated with the number of representative features; the candidate users are users having at least one of the representative features.
16. The apparatus of claim 15, wherein the representative features comprise:
a user attribute feature;
a user preference feature.
17. An electronic device, comprising:
a memory for storing processor-executable instructions;
a processor coupled to the memory;
wherein the processor is configured to perform the target user determination method as provided in any one of claims 1 to 8.
18. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a computer, enable the computer to perform the target user determination method of any one of claims 1 to 8.
CN202111612769.5A 2021-12-27 2021-12-27 Target user determination method and device, electronic equipment and storage medium Pending CN114398960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111612769.5A CN114398960A (en) 2021-12-27 2021-12-27 Target user determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111612769.5A CN114398960A (en) 2021-12-27 2021-12-27 Target user determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114398960A true CN114398960A (en) 2022-04-26

Family

ID=81226182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111612769.5A Pending CN114398960A (en) 2021-12-27 2021-12-27 Target user determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114398960A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114792256A (en) * 2022-06-23 2022-07-26 上海维智卓新信息科技有限公司 Population expansion method and device based on model selection

Similar Documents

Publication Publication Date Title
CN109800325B (en) Video recommendation method and device and computer-readable storage medium
CN110297848B (en) Recommendation model training method, terminal and storage medium based on federal learning
CN104239408B (en) The data access of content based on the image recorded by mobile device
CN106326391B (en) Multimedia resource recommendation method and device
CN109993627B (en) Recommendation method, recommendation model training device and storage medium
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
US11269966B2 (en) Multi-classifier-based recommendation method and device, and electronic device
CN111429161B (en) Feature extraction method, feature extraction device, storage medium and electronic equipment
CN113515942A (en) Text processing method and device, computer equipment and storage medium
WO2021155691A1 (en) User portrait generating method and apparatus, storage medium, and device
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN110727864B (en) User portrait method based on mobile phone App installation list
CN112528164B (en) User collaborative filtering recall method and device
CN111709398A (en) Image recognition method, and training method and device of image recognition model
CN110909222A (en) User portrait establishing method, device, medium and electronic equipment based on clustering
CN110245293A (en) A kind of Web content recalls method and apparatus
CN111667024B (en) Content pushing method, device, computer equipment and storage medium
CN113657087A (en) Information matching method and device
CN113722546B (en) Abnormal user account acquisition method and device, electronic equipment and storage medium
CN115757952A (en) Content information recommendation method, device, equipment and storage medium
CN114398960A (en) Target user determination method and device, electronic equipment and storage medium
CN112926310A (en) Keyword extraction method and device
CN115730125A (en) Object identification method and device, computer equipment and storage medium
CN113450167A (en) Commodity recommendation method and device
CN112632275B (en) Crowd clustering data processing method, device and equipment based on personal text information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination