CN115630996A - User crowd spreading method and related equipment - Google Patents


Info

Publication number
CN115630996A
Authority
CN
China
Prior art keywords
user
seed
dimension
diffusion
information
Legal status
Pending
Application number
CN202211380029.8A
Other languages
Chinese (zh)
Inventor
蔡凡华
胡万利
Current Assignee
Ping An Bank Co Ltd
Original Assignee
Ping An Bank Co Ltd
Application filed by Ping An Bank Co Ltd
Priority to CN202211380029.8A
Publication of CN115630996A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0261 Targeted advertisements based on user location

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a user crowd diffusion method and related equipment. A seed user group and a candidate user group are obtained. For each seed user in the seed user group, features of the seed user are extracted in at least one dimension to obtain the seed user's feature information in that at least one dimension, and target feature information of the seed user is determined from its feature information in each dimension. Diffusion processing is then performed for the seed user over the candidate user group according to the target feature information to obtain the diffusion users corresponding to that seed user, and a diffusion user group corresponding to the seed user group is determined based on the diffusion users of all seed users. The method and device can improve the accuracy of user crowd diffusion.

Description

User crowd spreading method and related equipment
Technical Field
The application relates to the field of computer technology, and in particular to a user crowd diffusion method and related equipment.
Background
With the development of internet technology, a wide variety of instant messaging and social applications have emerged. The large amount of user data involved in these applications, such as user preferences, age, and needs, is of great significance for delivering information such as advertisements.
In related technical solutions, users collected in a specific business scenario who share the same needs and interests in products and services are called the seed population; the seed population is usually small, generally fewer than one hundred thousand people. Users who have the same needs and interests as the seed population are called the extended population, which is usually several times larger than the seed population.
In the prior art, advertisement delivery generally searches, based on the user profile labels of the seed population, for an extended population carrying the same user profile labels, and then uses that extended population as the target users of the advertisement.
Disclosure of Invention
The embodiments of the application provide a user crowd diffusion method and related equipment, where the related equipment includes a user crowd diffusion device, an electronic device, a computer-readable storage medium, and a computer program product; they can improve the accuracy of user crowd diffusion.
The embodiment of the application provides a user crowd diffusion method, which comprises the following steps:
acquiring a seed user group and a candidate user group;
for each seed user in the seed user group, performing feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension;
determining target feature information of the seed user according to the feature information of the seed user in each dimension;
performing diffusion processing for the seed users over the candidate user group according to the target feature information to obtain diffusion users corresponding to the seed users;
and determining a diffusion user group corresponding to the seed user group based on the diffusion users corresponding to the seed users.
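The method steps above can be sketched end to end as follows. This is a minimal illustration, not the application's implementation: it assumes each seed user's per-dimension feature vectors are already available as NumPy arrays, that fusion is element-wise summation followed by division by the number of dimensions, and that diffusion keeps the top-k candidates by cosine similarity for each seed user. The function names, the value of k, and the data are hypothetical.

```python
import numpy as np

def target_feature(per_dim_vectors: list) -> np.ndarray:
    # Fuse the per-dimension vectors by summation, then divide by the number of dimensions.
    return np.sum(per_dim_vectors, axis=0) / len(per_dim_vectors)

def diffuse(seed_targets, candidate_features: np.ndarray, candidate_ids, k: int) -> set:
    # Cosine similarity = dot product of L2-normalised vectors.
    cand = candidate_features / np.linalg.norm(candidate_features, axis=1, keepdims=True)
    group = set()
    for t in seed_targets:
        sims = cand @ (t / np.linalg.norm(t))
        top_k = np.argsort(-sims)[:k]                  # k most similar candidates for this seed
        group.update(candidate_ids[i] for i in top_k)  # union over seeds = diffusion user group
    return group

# Hypothetical data: one seed user with two feature dimensions, three candidate users.
seed_targets = [target_feature([np.array([1.0, 0.0]), np.array([1.0, 0.2])])]
candidates = np.array([[1.0, 0.1], [0.0, 1.0], [0.9, 0.0]])
print(sorted(diffuse(seed_targets, candidates, ["u1", "u2", "u3"], k=2)))  # ['u1', 'u3']
```

Taking the union over seed users mirrors the last step, where the diffusion user group is assembled from the diffusion users of every seed user.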
Correspondingly, the embodiment of the application provides a user crowd diffusion device, including:
the acquisition unit is used for acquiring a seed user group and a candidate user group;
the feature extraction unit is used for extracting, for each seed user in the seed user group, features of the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension;
the first determining unit is used for determining target feature information of the seed user according to the feature information of the seed user in each dimension;
the diffusion unit is used for performing diffusion processing for the seed users over the candidate user group according to the target feature information to obtain diffusion users corresponding to the seed users;
and the second determining unit is used for determining the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to the seed users.
The electronic device provided by the embodiment of the application comprises a processor and a memory, wherein a plurality of instructions are stored in the memory, and the instructions are loaded by the processor to execute the steps in the user crowd spreading method provided by the embodiment of the application.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the user crowd diffusion method provided by the embodiment of the present application.
In addition, a computer program product is provided in an embodiment of the present application, and includes a computer program or instructions, and the computer program or instructions, when executed by a processor, implement the steps in the user crowd diffusion method provided in the embodiment of the present application.
The embodiments of the application provide a user crowd diffusion method and related equipment that can acquire a seed user group and a candidate user group; for each seed user in the seed user group, extract features of the seed user in at least one dimension to obtain its feature information in the at least one dimension; determine the target feature information of the seed user according to its feature information in each dimension; perform diffusion processing for the seed user over the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determine the diffusion user group corresponding to the seed user group based on the diffusion users of all seed users. The method and device can improve the accuracy of user crowd diffusion.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1a is a schematic diagram of a scenario of the user crowd diffusion method provided by an embodiment of the present application;
FIG. 1b is a flowchart of the user crowd diffusion method provided by an embodiment of the present application;
FIG. 1c is another flowchart of the user crowd diffusion method provided by an embodiment of the present application;
FIG. 2 is yet another flowchart of the user crowd diffusion method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a user crowd diffusion device provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a user crowd spreading method and related equipment, and the related equipment can comprise a user crowd spreading device, electronic equipment, a computer readable storage medium and a computer program product. The user crowd spreading device can be specifically integrated in electronic equipment, and the electronic equipment can be equipment such as a terminal or a server.
It can be understood that the user crowd spreading method of this embodiment may be executed on the terminal, may also be executed on the server, and may also be executed by both the terminal and the server. The above examples should not be construed as limiting the present application.
FIG. 1a shows an example in which the user crowd diffusion method is performed jointly by a terminal and a server. The user crowd diffusion system provided by this embodiment includes a terminal 10, a server 11, and so on; the terminal 10 and the server 11 are connected over a network, such as a wired or wireless network, and the user crowd diffusion device may be integrated in the server.
The server 11 may be configured to: acquire a seed user group and a candidate user group; for each seed user in the seed user group, extract features of the seed user in at least one dimension to obtain its feature information in the at least one dimension; determine the target feature information of the seed user according to its feature information in each dimension; perform diffusion processing for the seed user over the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determine the diffusion user group corresponding to the seed user group based on the diffusion users of all seed users. The server 11 may be a single server, or a server cluster or cloud server composed of multiple servers.
The terminal 10 may be configured to send a seed user group to the server 11, so that the server 11 performs diffusion processing on it and obtains the corresponding diffusion user group; the terminal 10 may also receive the diffusion user group sent back by the server 11. The terminal 10 may be a mobile phone, a smart television, a tablet computer, a notebook computer, a personal computer (PC), or the like. A client, such as an application client or a browser client, may also be installed on the terminal 10.
The user crowd diffusion method provided by the embodiments of the application involves natural language processing and machine learning, two areas of artificial intelligence.
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in ways similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making. AI is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. NLP is a science that integrates linguistics, computer science, and mathematics; because research in this field involves natural language, that is, the language people use every day, it is closely related to linguistics. NLP techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
Machine learning (ML) is a multidisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specifically studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Each embodiment is described in detail below. Note that the order in which the embodiments are described is not intended to indicate a preferred order.
This embodiment is described from the perspective of the user crowd diffusion device, which may be integrated in an electronic device such as a server or a terminal.
As shown in FIG. 1b, the user crowd diffusion method may proceed as follows:
101. Acquire a seed user group and a candidate user group.
The seed user group includes at least one seed user. A seed user in a given business scenario (for example, a game or an advertisement) can be understood as a user who has shown a clear interest in the products and services of that scenario.
In this embodiment, the candidate user group may be the group of potential users in the business scenario, which contains both users who may be interested in the scenario's products and services and users who may not be. The potential user group may be the full user base excluding the seed users, or a subset screened from it: for example, users who clearly do not meet expectations, that is, users clearly uninterested in the scenario's products and services, can be filtered out according to their user profiles, and the remaining users are used as the potential user group to reduce subsequent computation. Based on the seed user group of a business scenario, this application can identify, from the candidate user group, diffusion users with a higher preference for the products and services of that scenario.
In a specific scenario such as advertisement delivery, an advertising platform usually has its own advertising data management platform that helps advertisers define their potential customers or run relatively precise marketing campaigns for existing customers. The goal of crowd diffusion (Lookalike) is to find, from a massive population, other people similar to a target crowd (i.e., the seed users). Typically, when a new product has just been launched, its target users are few, and the seed customer group needs to be expanded quickly for marketing and promotion.
Lookalike is a technique that finds similar crowds from seed users and can improve the accuracy of crowd targeting. Expanding to similar crowds helps users find the crowds they are looking for and improves targeting efficiency.
102. For each seed user in the seed user group, perform feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension.
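Building the candidate pool described above amounts to a set difference plus a profile filter. The sketch below assumes users are dictionaries with an `id` key and that the profile screen is a caller-supplied predicate; the helper name and the rule in the usage lines (excluding users under 18) are purely hypothetical and are not taken from the application.

```python
from typing import Callable, Iterable

def build_candidate_pool(
    all_users: Iterable[dict],
    seed_ids: set,
    is_clearly_uninterested: Callable[[dict], bool],
) -> list[dict]:
    """Full user base minus the seed users minus users a profile rule screens out."""
    return [
        u for u in all_users
        if u["id"] not in seed_ids and not is_clearly_uninterested(u)
    ]

users = [
    {"id": "u1", "age": 30},
    {"id": "u2", "age": 16},
    {"id": "u3", "age": 45},
    {"id": "u4", "age": 28},
]
# Hypothetical profile rule, for illustration only.
pool = build_candidate_pool(users, seed_ids={"u4"}, is_clearly_uninterested=lambda u: u["age"] < 18)
print([u["id"] for u in pool])  # ['u1', 'u3']
```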
Optionally, in this embodiment, the step "performing, for each seed user in the seed user group, feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension" may include:
for each seed user in the seed user group, acquiring attribute information of the seed user in at least one dimension;
performing label coding on the attribute information of each dimension to obtain coding information corresponding to each dimension;
and, for each dimension, performing feature embedding processing on the attribute information and coding information in that dimension to obtain feature information of the seed user in that dimension.
Specifically, a user's feature space may include attribute information in multiple dimensions, and relevant attribute information can be obtained according to the current task with the user's authorization. In some embodiments, the attribute information of the seed user in at least one dimension may include, but is not limited to, the seed user's gender, age, identity, interests, geographic location, and brand preferences.
In one embodiment, the attribute information of the seed user may include discrete attribute information, such as gender, hobbies, and educational background, and continuous attribute information, such as age and income.
In this embodiment, to reduce subsequent computation, the continuous attribute information may be discretized. Specifically, a cut or qcut function may be used (or the data may be segmented by a preset step size) to divide the value range of the continuous attribute into a series of intervals, such as [0,10], [11,20], [21,30], [31,40], and so on. If higher precision is required, the value range can be divided more finely. After this processing, all continuous attribute information is converted into discrete attribute information.
Both cut and qcut (as provided by pandas) bin data: cut splits values into intervals defined by given bin edges or into equal-width bins, while qcut splits them into quantile-based bins that each hold roughly the same number of samples.
Label encoding of the attribute information may be performed with LabelEncoder, a utility class that normalizes labels by mapping each distinct label to an integer code.
For example, when the attribute is gender, after label encoding the code for the attribute value "male" may be 1 and the code for "female" may be 2.
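The two binning styles can be illustrated with pandas `pd.cut` and `pd.qcut`. This is a sketch with made-up ages, incomes, and bin edges; note that pandas intervals are half-open by default, e.g. (10, 20], rather than the integer ranges [0,10], [11,20] written above.

```python
import pandas as pd

# Made-up continuous attributes for five users.
ages = pd.Series([18, 25, 33, 41, 57])
incomes = pd.Series([3000, 5200, 8000, 12000, 30000])

# pd.cut with explicit edges: segmentation by a preset step of 10 years.
# labels=False returns the integer index of each bin instead of Interval objects.
age_bins = pd.cut(ages, bins=[0, 10, 20, 30, 40, 50, 60], labels=False)

# pd.qcut: quantile-based bins; q=2 splits at the median into two equal-frequency groups.
income_bins = pd.qcut(incomes, q=2, labels=False)

print(age_bins.tolist())     # [1, 2, 3, 4, 5]
print(income_bins.tolist())  # [0, 0, 0, 1, 1]
```

Fixed edges keep bins comparable across datasets, while quantile bins adapt to skewed distributions such as income.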
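A sketch of label encoding with scikit-learn's `LabelEncoder`. Using scikit-learn is an assumption, since the application does not name a library. scikit-learn assigns codes 0 to n-1 in sorted label order, so the actual codes differ from the illustrative 1 and 2 above; any consistent mapping serves the same purpose.

```python
from sklearn.preprocessing import LabelEncoder

genders = ["male", "female", "female", "male"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)  # one integer code per attribute value

print(encoder.classes_.tolist())  # ['female', 'male']  (sorted; list index = code)
print(codes.tolist())             # [1, 0, 0, 1]
```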
Optionally, in this embodiment, the step of performing, for each dimension, feature embedding processing on the attribute information and the encoding information in the dimension to obtain feature information of the seed user in the dimension may include:
for each dimension, performing fusion processing on the attribute information and the coding information on the dimension to obtain attribute coding information on the dimension;
and performing characteristic embedding processing on the attribute coding information to obtain characteristic information of the seed user on the dimension.
The attribute information and the encoding information can be fused in various ways, such as weighted fusion or concatenation; this embodiment does not limit the fusion method.
Feature embedding of the attribute coding information may include performing convolution and pooling operations on it. Specifically, the embedding may be performed with a word vector model such as word2vec (word to vector); the word vector model of this embodiment is not limited to the models listed here.
word2vec is an NLP (natural language processing) tool that represents words as vectors, so that relationships between words can be measured quantitatively and associations between words can be mined. word2vec is a neural network model consisting mainly of three layers: an input layer, a projection layer, and an output layer.
In this embodiment, the attribute information of each dimension of the seed user can be label-encoded with LabelEncoder to obtain the coding information for that dimension. The LabelEncoder code of each dimension is then appended to that dimension's attribute information, yielding one string text per dimension; this string text is the attribute coding information. Finally, the word2vec model performs feature embedding on these string texts to obtain a word vector for each dimension, which is the seed user's feature information in that dimension.
103. Determine the target feature information of the seed user according to the feature information of the seed user in each dimension.
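A minimal sketch of building the per-dimension string texts described above. It assumes the codes come from a per-dimension label encoder (here a hand-written dictionary that reuses the illustrative male=1/female=2 codes) and that value and code are joined with an underscore; the separator, the `age` and `city` code tables, and the helper name are hypothetical. The resulting token lists form the corpus a word2vec model would be trained on; with gensim 4, for example, that could be `Word2Vec(sentences=corpus, vector_size=..., min_count=1)`, though gensim is only one possible implementation and is not required by this sketch.

```python
def build_dimension_texts(user: dict[str, str], codes: dict[str, dict[str, int]]) -> list[str]:
    """Append each dimension's code to its attribute value, giving one string text per dimension."""
    return [f"{value}_{codes[dim][value]}" for dim, value in user.items()]

# Hypothetical per-dimension code tables (gender reuses the example codes above).
codes = {
    "gender": {"male": 1, "female": 2},
    "age": {"20-30": 3, "30-40": 4},      # discretized age bins, hypothetical codes
    "city": {"Shenzhen": 1, "Beijing": 2},
}
users = [
    {"gender": "male", "age": "20-30", "city": "Shenzhen"},
    {"gender": "female", "age": "30-40", "city": "Beijing"},
]

# One token list per user: the corpus fed to a word2vec model.
corpus = [build_dimension_texts(u, codes) for u in users]
print(corpus)
# [['male_1', '20-30_3', 'Shenzhen_1'], ['female_2', '30-40_4', 'Beijing_2']]
```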
Optionally, in this embodiment, the step of "determining the target feature information of the seed user according to the feature information of the seed user in each dimension" may include:
fusing the feature information of the seed user in each dimension to obtain fused feature information corresponding to the seed user;
and determining the target feature information of the seed user according to the fused feature information and the number of dimensions of the seed user's feature information.
The feature information of the dimensions can be fused in various ways, such as a weighted operation or concatenation; this embodiment does not limit the fusion method.
Specifically, the target feature information is obtained by dividing the fused feature information by the number of dimensions of the feature information.
The number of dimensions of the feature information is specifically the number of word vectors in the above embodiment.
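Written as a formula, assuming the fusion is element-wise summation (one of the options named above), and writing $\mathbf{v}_1,\dots,\mathbf{v}_n$ for the seed user's $n$ per-dimension word vectors and $\mathbf{t}$ for the target feature information (notation introduced here for illustration), the target feature information is the mean of the word vectors. The second line is a worked example with three made-up two-component vectors.

```latex
\mathbf{t} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{v}_i
\qquad\text{e.g.}\qquad
\mathbf{t} = \tfrac{1}{3}\big((1,2) + (3,4) + (5,6)\big) = \tfrac{1}{3}(9,12) = (3,4)
```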
104. Perform diffusion processing for the seed user over the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user.
The characteristics of the group that favors a business scenario can be derived from the target feature information of the seed users in that scenario, and the diffusion users corresponding to each seed user can be obtained based on these group characteristics.
Specifically, to speed up the screening of candidate users, this embodiment may perform diffusion processing with Faiss. Faiss is a library for efficient similarity search over dense vectors; it can be used to compute the similarity between the feature information of candidate users and that of seed users, and to determine the diffusion users in the candidate user group that are similar to each seed user.
Faiss can compute the similarity between candidate users' feature information and the seed user's target feature information in various ways, for example by Euclidean distance or by inner product. In some embodiments, an IndexIVFFlat index can be used, which greatly speeds up Faiss search; however, this type of index requires an additional training stage, which must be run on a set of vectors with the same distribution as the database vectors (i.e., the feature information of the candidate users).
Optionally, in this embodiment, before the step "performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain a diffusion user corresponding to the seed user", the method may further include:
based on the feature information of each candidate user in the candidate user group in at least one dimension, clustering the candidate users to obtain at least one category of user set and a cluster center corresponding to each category's user set, where each category's user set includes at least one candidate user;
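The two similarity measures mentioned above can be illustrated with exact (brute-force) NumPy search, which is what Faiss's flat indexes such as `IndexFlatL2` and `IndexFlatIP` compute. This sketch does not use Faiss itself, and the vectors are made up. It shows that the two measures can rank candidates differently, because the inner product also rewards vector length unless the vectors are L2-normalized first.

```python
import numpy as np

def search_l2(queries: np.ndarray, database: np.ndarray, k: int):
    """Exact k-NN by squared Euclidean distance (smaller = more similar)."""
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx

def search_ip(queries: np.ndarray, database: np.ndarray, k: int):
    """Exact k-NN by inner product (larger = more similar)."""
    scores = queries @ database.T
    idx = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, idx, axis=1), idx

# Made-up candidate features (database) and one seed user's target feature (query).
db = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
query = np.array([[0.9, 0.1]])

print(search_l2(query, db, k=2)[1].tolist())  # [[1, 0]]
print(search_ip(query, db, k=2)[1].tolist())  # [[1, 2]]
```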
the step of performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain a diffusion user corresponding to the seed user may include:
calculating the feature distance between the target feature information of the seed user and the cluster center corresponding to each category's user set;
and performing diffusion processing for the seed user according to the feature distances to obtain the diffusion users corresponding to the seed user.
The cluster center of a user set represents the candidate users in that set; it can be the center point of those candidate users in the feature space. Specifically, the cluster center may be the feature information of an actual candidate user in the set, or merely a representation of the set's group features that does not correspond to any particular candidate user.
Note that the feature distance between the feature information of each candidate user in a category's user set and the cluster center of that set is smaller than a preset value, which can be set according to the actual situation.
In some embodiments, the cluster center of a user set may be obtained by fusing the feature information of the candidate users in the set; various fusion methods can be used.
The feature distance between the target feature information and the cluster center of each category's user set may specifically be the vector distance between them, which measures their similarity. The larger the vector distance, the lower the similarity, indicating a lower preference of that set's users for the products and services of the business scenario; conversely, the smaller the vector distance, the higher the similarity, indicating a higher preference and a greater likelihood that those users convert into new users of the business scenario. The vector distance may be a cosine distance, a Euclidean distance, a Hamming distance, or the like.
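A sketch of the cluster-center and distance computations, assuming the clustering labels are already known (hard-coded here), that a cluster center is the element-wise mean of its members' feature vectors (one possible fusion), and using Euclidean distance to pick the nearest user set. The cosine and Hamming helpers show the other distances named above; Hamming distance applies to binary codes rather than dense embeddings. All names and data are hypothetical.

```python
import numpy as np

def cluster_centers(features: np.ndarray, labels: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster center = element-wise mean of the member feature vectors (one possible fusion)."""
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_clusters)])

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - cosine similarity: 0 for vectors pointing the same way.
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hamming_distance(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    # Number of positions at which two binary codes differ.
    return int(np.count_nonzero(a_bits != b_bits))

features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])          # hypothetical clustering result
centers = cluster_centers(features, labels, n_clusters=2)

seed_target = np.array([0.5, 0.5])              # a seed user's target feature information
dists = [euclidean_distance(seed_target, c) for c in centers]
nearest_set = int(np.argmin(dists))             # smallest distance = most similar user set
print(nearest_set)  # 0
```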
Optionally, in this embodiment, the step of performing diffusion processing on the seed user according to the feature distance to obtain the diffusion users corresponding to the seed user may include:
determining, according to the feature distance, a similar user set that matches the seed user from the user sets of all categories;
calculating the similarity between the seed user and the candidate users in the similar user set;
and determining the diffusion users corresponding to the seed user according to the similarity.
In some embodiments, the user set whose cluster center has the smallest feature distance from the target feature information may be determined as the similar user set matching the seed user. In other embodiments, every user set whose cluster center lies within a preset feature distance of the target feature information may be determined as a similar user set matching the seed user. The preset value may be set according to actual requirements.
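Both matching rules can be sketched as follows, assuming a Euclidean feature distance. `similar_user_sets` and the sample centers are hypothetical:

```python
import math


def euclidean(a, b):
    # Feature distance between two vectors (Euclidean, by assumption).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_user_sets(target, centers, preset_value=None):
    """Return the indices of the user sets that match a seed user's target features.

    preset_value is None -> only the set whose cluster center is nearest;
    otherwise           -> every set whose center is closer than preset_value.
    """
    dists = [euclidean(target, c) for c in centers]
    if preset_value is None:
        return [min(range(len(centers)), key=lambda i: dists[i])]
    return [i for i, d in enumerate(dists) if d < preset_value]


centers = [[0.0, 0.0], [5.0, 5.0], [1.0, 1.0]]
print(similar_user_sets([0.8, 0.9], centers))                    # [2]
print(similar_user_sets([0.8, 0.9], centers, preset_value=1.5))  # [0, 2]
```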
To determine the diffusion users from the similarity, candidate users whose similarity exceeds a preset similarity may be taken as the seed user's diffusion users. The preset similarity may be set according to actual conditions, and this embodiment does not limit it. For example, it can be set according to the number of people the business needs to reach. If that number is large, the preset similarity can be set relatively low; conversely, it should be set higher. For instance, if only 1 million units of delivery resources are actually available, the number of delivery targets must be limited, and the preset similarity is then set according to that number.
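The text leaves the exact rule open. One possible way to tie the preset similarity to a delivery budget of k targets, offered here only as an assumption, is to use the (k+1)-th highest similarity as the threshold so that at most k candidates exceed it. The function names and scores are hypothetical:

```python
def preset_similarity_for_budget(similarities, max_deliveries):
    # Hypothetical rule: threshold such that at most max_deliveries scores exceed it.
    ranked = sorted(similarities, reverse=True)
    if max_deliveries >= len(ranked):
        return float("-inf")  # the budget covers every candidate
    return ranked[max_deliveries]  # the (max_deliveries + 1)-th highest similarity


def select_diffusion_users(scores, preset_similarity):
    # Keep candidates whose similarity is strictly greater than the preset similarity.
    return [user for user, s in scores.items() if s > preset_similarity]


scores = {"u1": 0.95, "u2": 0.80, "u3": 0.70, "u4": 0.60}
threshold = preset_similarity_for_budget(scores.values(), max_deliveries=2)
print(threshold)                                  # 0.7
print(select_diffusion_users(scores, threshold))  # ['u1', 'u2']
```

A larger budget yields a lower threshold, which matches the relationship described above. Ties at the threshold can leave slightly fewer than k users selected.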
In this embodiment, the diffusion users corresponding to a seed user may be obtained through IndexIVFFlat. IndexIVFFlat is an index that first clusters the vectors and then performs similarity search within the clusters. Specifically, the vector library (i.e., the candidate users in the candidate user group) is first clustered to establish cluster centers. The cluster center closest to the query (i.e., the target feature information in the above embodiment) is then found, and the vectors of all candidate users in that cluster are compared with the query to select the vectors similar to it.
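To make the cluster-then-search idea concrete, below is a toy pure-Python analogue of an inverted-file index. `TinyIVFFlat` is a hypothetical class, not Faiss. It uses a simplified k-means with deterministic initialisation and squared Euclidean distance, and it mirrors the train, add, and search workflow of Faiss's `IndexIVFFlat`. Its `n_probe` parameter plays the role of Faiss's `nprobe`, the number of clusters scanned per query:

```python
def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


class TinyIVFFlat:
    """Toy inverted-file index: coarse k-means clustering, then exact search
    inside the n_probe clusters nearest to the query (hypothetical class)."""

    def __init__(self, n_lists, n_probe=1):
        self.n_lists = n_lists
        self.n_probe = n_probe
        self.centroids = []
        self.lists = []  # per-cluster list of (user_id, vector)

    def _nearest_centroids(self, v, k):
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sq_dist(v, self.centroids[i]))
        return order[:k]

    def train(self, vectors, iterations=10):
        # Simplified k-means, initialised with the first n_lists vectors.
        self.centroids = [list(v) for v in vectors[: self.n_lists]]
        for _ in range(iterations):
            groups = [[] for _ in range(self.n_lists)]
            for v in vectors:
                groups[self._nearest_centroids(v, 1)[0]].append(v)
            for i, members in enumerate(groups):
                if members:  # an empty cluster keeps its previous centroid
                    self.centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        self.lists = [[] for _ in range(self.n_lists)]

    def add(self, ids, vectors):
        # Store each candidate user in the inverted list of its nearest centroid.
        for uid, v in zip(ids, vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append((uid, v))

    def search(self, query, k):
        # Probe the nearest clusters, then compare the query with every member exactly.
        pool = [item for c in self._nearest_centroids(query, self.n_probe)
                for item in self.lists[c]]
        pool.sort(key=lambda item: sq_dist(query, item[1]))
        return [uid for uid, _ in pool[:k]]


ids = ["a", "d", "b", "e", "c", "f"]
vectors = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.0], [10.5, 10.0], [0.0, 0.5], [10.0, 10.5]]
index = TinyIVFFlat(n_lists=2, n_probe=1)
index.train(vectors)
index.add(ids, vectors)
print(index.search([0.2, 0.1], 2))  # ['a', 'b']
```

Scanning only the probed clusters is what makes this kind of index fast. The trade-off is that a true neighbour in an unprobed cluster can be missed, which is why Faiss exposes `nprobe` as a speed/recall setting.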
105. Determine the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user.
In some embodiments, the union of the diffusion users of all seed users may be taken as the diffusion user group of the seed user group. In other embodiments, the diffusion user group may be determined according to how often each diffusion user is selected. For example, diffusion users whose selection frequency is higher than a preset frequency may be added to the diffusion user group of the seed user group.
In a given business scenario, the diffusion user group of the seed user group may specifically serve as the resource delivery targets for that scenario.
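Both aggregation rules can be sketched as follows. The seed and user identifiers are hypothetical, and "higher than the preset frequency" is read as strictly greater:

```python
from collections import Counter


def union_group(diffusion_by_seed):
    # Diffusion user group = union of every seed user's diffusion users.
    group = set()
    for users in diffusion_by_seed.values():
        group.update(users)
    return group


def frequency_group(diffusion_by_seed, preset_frequency):
    # Count how many seed users selected each diffusion user (once per seed),
    # then keep the users selected more often than preset_frequency.
    counts = Counter(u for users in diffusion_by_seed.values() for u in set(users))
    return {u for u, c in counts.items() if c > preset_frequency}


selected = {"s1": ["u1", "u2"], "s2": ["u2", "u3"], "s3": ["u2", "u1"]}
print(sorted(union_group(selected)))         # ['u1', 'u2', 'u3']
print(sorted(frequency_group(selected, 1)))  # ['u1', 'u2']
```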
In a specific embodiment, as shown in fig. 1c, the crowd diffusion process of the user crowd spreading method provided by the present application is as follows:
1. Retrieve the data source from the database system. The data source contains seed users and candidate users, and the seed users must first be labeled to distinguish them from other users. Next, determine the indicators used to judge similarity (i.e., the attribute information in the above embodiment) and the diffusion range. For example, each seed user may be diffused to find the top-N users most similar to it, where the parameter N can be set manually according to business requirements.
2. The indicators used to judge similarity fall into two types: discrete indicators (i.e., discrete features) and continuous indicators. In this embodiment, the continuous indicators can be discretized so that all indicators become discrete;
all discretized indicators are then label-encoded, and each resulting code is spliced onto its indicator name (i.e., the attribute information) to obtain each user's feature text on each indicator (i.e., each dimension), which is the attribute coding information in the above embodiment;
3. For each user, perform feature embedding on the user's feature text for each indicator to obtain a word vector per indicator. Sum the word vectors over all indicators and take the average (i.e., divide the sum by the number of word vectors) to obtain the user's vector representation. Traverse all samples to obtain the vector representations of all users.
4. Use Faiss to run a similarity vector search that returns the top-N users most similar to each seed user, thereby obtaining the diffusion user group corresponding to the seed user group.
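Steps 2 to 4 above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions. The indicator names, bin edges, and two-dimensional embedding table are hypothetical stand-ins; in the described method, the word vectors would come from a trained word2vec model. Step 4 uses an exact brute-force search in place of Faiss:

```python
import bisect
import heapq


def discretize(value, edges):
    # Step 2a: map a continuous value to a bin index given sorted bin edges.
    return bisect.bisect_right(edges, value)


def label_encode(values):
    # Step 2b: assign integer codes to the distinct values in sorted order.
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def build_feature_texts(users, bin_edges):
    # Step 2c: splice each code onto its indicator name, e.g. "age_0".
    names = list(users[0])
    codes = {}
    for name in names:
        column = [u[name] for u in users]
        if name in bin_edges:  # continuous indicator -> discretize first
            column = [discretize(v, bin_edges[name]) for v in column]
        codes[name] = label_encode(column)
    return [[f"{name}_{codes[name][i]}" for name in names] for i in range(len(users))]


def average_vector(texts, embeddings):
    # Step 3: sum the word vectors of a user's feature texts, divide by their count.
    total = [0.0] * len(embeddings[texts[0]])
    for t in texts:
        for i, x in enumerate(embeddings[t]):
            total[i] += x
    return [x / len(texts) for x in total]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_n(query, vectors, n):
    # Step 4 (brute force instead of Faiss): ids of the n nearest vectors.
    return heapq.nsmallest(n, vectors, key=lambda uid: sq_dist(query, vectors[uid]))


ids = ["s1", "c1", "c2", "c3"]  # one seed user, three candidates
users = [
    {"age": 23, "city": "Shenzhen"},
    {"age": 25, "city": "Shenzhen"},
    {"age": 41, "city": "Beijing"},
    {"age": 35, "city": "Shenzhen"},
]
texts = build_feature_texts(users, bin_edges={"age": [30, 40]})
embeddings = {  # hypothetical word vectors
    "age_0": [1.0, 0.0], "age_1": [0.5, 0.5], "age_2": [0.0, 1.0],
    "city_0": [0.0, 0.0], "city_1": [1.0, 1.0],
}
vecs = {uid: average_vector(t, embeddings) for uid, t in zip(ids, texts)}
seed = vecs.pop("s1")
print(texts[2])              # ['age_2', 'city_0']
print(top_n(seed, vecs, 2))  # ['c1', 'c3']
```

Label encoding is applied to seeds and candidates together. This ensures that the same attribute value always maps to the same feature text, and therefore to the same word vector.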
In some embodiments, Faiss may perform user crowd diffusion using an IndexIVFFlat index, which requires an additional training phase.
With this user crowd diffusion method, feature embedding is applied to the users' attribute information to obtain word vectors for the user features. The word vectors are then summed and averaged to obtain a vector representation of each user. Finally, the Faiss vector retrieval method searches the database for the top-N vectors most similar to the seed customer group.
In a specific scenario, the hardware environment had an 8-core CPU and 48 GB of memory, with 3,000 seed users, 20 indicators used to judge similarity, a diffusion coefficient of 20, and 1.5 million users in total. Under these conditions, the user crowd diffusion method provided by the application produced results within 30 minutes, and the verified model accuracy exceeded 95%, greatly improving the efficiency and accuracy of user crowd diffusion.
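For a rough sense of scale, the figures above can be combined as follows. This assumes the diffusion coefficient plays the role of N in the top-N search, which the text does not state explicitly:

$$
\begin{aligned}
\text{diffusion users (upper bound, before de-duplication)} &= 3000 \times 20 = 60\,000,\\
\text{seed--candidate pairs for an exhaustive comparison} &\le 3000 \times 1\,500\,000 = 4.5 \times 10^{9}.
\end{aligned}
$$

The second figure counts the distance evaluations that a brute-force search would need. An index such as IndexIVFFlat reduces this number by comparing each seed user only with the candidates in a few probed clusters.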
As can be seen from the above, this embodiment can acquire a seed user group and a candidate user group; for each seed user in the seed user group, perform feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension; determine the target feature information of the seed user according to the feature information of the seed user in each dimension; perform diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determine the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user. The present application can thereby improve the accuracy of user crowd diffusion.
The method described in the previous embodiments is detailed further below, taking as an example the case where the user crowd diffusion apparatus is integrated in a server.
An embodiment of the application provides a user crowd diffusion method. As shown in fig. 2, its specific flow may be as follows:
201. The server acquires a seed user group and a candidate user group.
The seed user group includes at least one seed user. A seed user in a given business scenario (for example, games or advertising) can be understood as a user who has a clear interest in the products and services of that scenario.
202. For each seed user in the seed user group, the server performs feature extraction in at least one dimension to obtain the feature information of the seed user in the at least one dimension.
Optionally, in this embodiment, the step "performing, for each seed user in the seed user group, feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension" may include:
for each seed user in the seed user group, acquiring attribute information of the seed user in at least one dimension;
performing label coding on the attribute information on each dimension to obtain coding information corresponding to each dimension;
and, for each dimension, performing feature embedding processing on the attribute information and the coding information in that dimension to obtain the feature information of the seed user in the dimension.
In some embodiments, the attribute information of the seed user in at least one dimension may include, but is not limited to, the seed user's gender, age, identity, interests, geographic location, and brand preferences.
The label encoding of the attribute information may specifically be performed with LabelEncoder, a utility class that normalizes labels by mapping them to integer codes.
Optionally, in this embodiment, the step of performing, for each dimension, feature embedding processing on the attribute information and the encoding information in the dimension to obtain feature information of the seed user in the dimension may include:
for each dimension, performing fusion processing on the attribute information and the coding information on the dimension to obtain attribute coding information on the dimension;
and carrying out feature embedding processing on the attribute coding information to obtain feature information of the seed user on the dimension.
There are various ways to fuse the attribute information and the coding information, and this embodiment does not limit them. For example, the fusion may be weighted fusion or splicing (concatenation).
The feature embedding processing on the attribute coding information may specifically include convolution and pooling operations on the attribute coding information. Specifically, the feature embedding may be performed by a word vector model, such as a word2vec (word to vector) model.
203. The server fuses the feature information of the seed user across all dimensions to obtain fused feature information for the seed user.
There are various ways to fuse the feature information of the dimensions, and this embodiment does not limit them. For example, the fusion may be a weighting operation or splicing.
204. The server determines the target feature information of the seed user according to the fused feature information and the number of dimensions of the seed user's feature information.
Specifically, the target feature information is obtained by dividing the fused feature information by the number of dimensions of the feature information.
Here, the number of dimensions of the feature information is the number of word vectors in the above embodiment.
205. The server performs diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user.
The target feature information of the seed users in a business scenario captures the group characteristics of the users who favor that scenario. Based on these group characteristics, the diffusion users corresponding to each seed user can be obtained.
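Written as a formula, and assuming the fusion in step 203 is a plain element-wise sum (the text also allows weighting or splicing, to which this division would not apply directly), the target feature information $\mathbf{t}$ of a seed user whose feature information in $D$ dimensions is $\mathbf{e}_1, \dots, \mathbf{e}_D$ is:

$$
\mathbf{t} = \frac{1}{D} \sum_{d=1}^{D} \mathbf{e}_d
$$

For example, with $D = 2$, $\mathbf{e}_1 = (1, 0)$ and $\mathbf{e}_2 = (1, 1)$, the fused feature information is $(2, 1)$ and $\mathbf{t} = (1, 0.5)$. The symbols here are illustrative and do not appear in the original text.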
Specifically, to speed up the screening of candidate users, this embodiment may perform the diffusion processing with Faiss. Faiss (Facebook AI Similarity Search) is a library for efficient similarity search over dense vectors. Here it can be used to compute the similarity between the feature information of candidate users and that of the seed users, and to determine the diffusion users in the candidate user group that are similar to the seed users.
There are various ways to compute, with Faiss, the similarity between the candidate users' feature information and the seed user's target feature information. For example, the similarity may be computed from the Euclidean distance or from the inner product. In some embodiments, an IndexIVFFlat index can be used, which can greatly increase the speed of Faiss search. However, this type of index requires an additional training phase, and the training must be run on a vector set with the same distribution as the database vectors (i.e., the candidate users' feature information).
206. The server determines the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user.
In some embodiments, the union of the diffusion users of all seed users may be taken as the diffusion user group of the seed user group. In other embodiments, the diffusion user group may be determined according to how often each diffusion user is selected. For example, diffusion users whose selection frequency is higher than a preset frequency may be added to the diffusion user group of the seed user group.
In a given business scenario, the diffusion user group of the seed user group may specifically serve as the resource delivery targets for that scenario.
As can be seen from the above, in this embodiment the server may acquire a seed user group and a candidate user group; for each seed user in the seed user group, perform feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension; fuse the feature information of the seed user across all dimensions to obtain fused feature information for the seed user; determine the target feature information of the seed user according to the fused feature information and the number of dimensions of the seed user's feature information; perform diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determine the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user. The present application can thereby improve the accuracy of user crowd diffusion.
To better implement the above method, an embodiment of the present application further provides a user crowd diffusion apparatus. As shown in fig. 3, the apparatus may include an acquisition unit 301, a feature extraction unit 302, a first determining unit 303, a diffusion unit 304, and a second determining unit 305, as follows:
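A short derivation shows how the two similarity measures relate. It assumes the feature vectors $\mathbf{q}$ (seed) and $\mathbf{x}$ (candidate) are L2-normalised, which the text does not state:

$$
\lVert \mathbf{q} - \mathbf{x} \rVert^{2} = \lVert \mathbf{q} \rVert^{2} + \lVert \mathbf{x} \rVert^{2} - 2\,\mathbf{q} \cdot \mathbf{x} = 2 - 2\,\mathbf{q} \cdot \mathbf{x}
$$

Under this assumption, a smaller Euclidean distance corresponds exactly to a larger inner product, so both measures rank candidates identically. Without normalisation, the two measures can rank candidates differently.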
(1) An acquisition unit 301;
The acquisition unit is used for acquiring the seed user group and the candidate user group.
(2) A feature extraction unit 302;
The feature extraction unit is used for performing, for each seed user in the seed user group, feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension.
Optionally, in some embodiments of the present application, the feature extraction unit may include an acquisition subunit, a coding subunit, and a feature embedding subunit, as follows:
the acquisition subunit is configured to acquire, for each seed user in the seed user group, the attribute information of the seed user in at least one dimension;
the coding subunit is configured to perform label encoding on the attribute information of each dimension to obtain the coding information corresponding to each dimension;
and the feature embedding subunit is configured to perform, for each dimension, feature embedding processing on the attribute information and the coding information in that dimension to obtain the feature information of the seed user in the dimension.
Optionally, in some embodiments of the present application, the feature embedding subunit may be specifically configured to: for each dimension, fuse the attribute information and the coding information in the dimension to obtain the attribute coding information in the dimension; and perform feature embedding processing on the attribute coding information to obtain the feature information of the seed user in the dimension.
(3) A first determination unit 303;
The first determining unit is used for determining the target feature information of the seed user according to the feature information of the seed user in each dimension.
Optionally, in some embodiments of the present application, the first determining unit may include a fusion subunit and a determining subunit, as follows:
the fusion subunit is configured to fuse the feature information of the seed user in each dimension to obtain fused feature information corresponding to the seed user;
and the determining subunit is configured to determine the target feature information of the seed user according to the fused feature information and the number of dimensions of the seed user's feature information.
(4) A diffusion unit 304;
The diffusion unit is used for performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user.
Optionally, in some embodiments of the present application, the diffusion unit may include a clustering subunit, a calculating subunit, and a diffusion subunit, as follows:
the clustering subunit is configured to cluster the candidate users based on the feature information of each candidate user in the candidate user group in at least one dimension, to obtain a user set of at least one category and a cluster center corresponding to the user set of each category, where the user set of each category includes at least one candidate user;
the calculating subunit is configured to calculate the feature distance between the target feature information of the seed user and the cluster center corresponding to the user set of each category;
and the diffusion subunit is configured to perform diffusion processing on the seed user according to the feature distance to obtain the diffusion users corresponding to the seed user.
Optionally, in some embodiments of the present application, the diffusion subunit may be specifically configured to: determine, according to the feature distance, a similar user set that matches the seed user from the user sets of all categories; calculate the similarity between the seed user and the candidate users in the similar user set; and determine the diffusion users corresponding to the seed user according to the similarity.
(5) A second determination unit 305;
The second determining unit is used for determining the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user.
As can be seen from the above, in this embodiment the acquisition unit 301 may acquire the seed user group and the candidate user group; the feature extraction unit 302 performs, for each seed user in the seed user group, feature extraction in at least one dimension to obtain the feature information of the seed user in the at least one dimension; the first determining unit 303 determines the target feature information of the seed user according to the feature information of the seed user in each dimension; the diffusion unit 304 performs diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and the second determining unit 305 determines the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user. The present application can thereby improve the accuracy of user crowd diffusion.
An embodiment of the present application further provides an electronic device. Fig. 4 shows a schematic structural diagram of the electronic device according to this embodiment. The electronic device may be a terminal or a server. Specifically:
the electronic device may include components such as a processor 401 with one or more processing cores, a memory 402 including one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the structure shown in fig. 4 does not limit the electronic device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components. Wherein:
The processor 401 is the control center of the electronic device. It connects all parts of the electronic device through various interfaces and lines, and it performs the device's functions and processes data by running or executing the software programs and/or modules stored in the memory 402 and invoking the data stored in the memory 402. Optionally, the processor 401 may include one or more processing cores. Preferably, the processor 401 may integrate an application processor, which mainly handles the operating system, user interfaces, application programs, and the like, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 performs various functional applications and user crowd diffusion by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the electronic device, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller that gives the processor 401 access to the memory 402.
The electronic device further comprises a power supply 403 that supplies power to the components. Preferably, the power supply 403 is logically connected to the processor 401 through a power management system, which manages charging, discharging, and power consumption. The power supply 403 may also include components such as one or more DC or AC power sources, recharging systems, power failure detection circuits, power converters or inverters, and power status indicators.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
acquiring a seed user group and a candidate user group; for each seed user in the seed user group, performing feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension; determining the target feature information of the seed user according to the feature information of the seed user in each dimension; performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determining the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user.
For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
As can be seen from the above, this embodiment can acquire a seed user group and a candidate user group; for each seed user in the seed user group, perform feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension; determine the target feature information of the seed user according to the feature information of the seed user in each dimension; perform diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determine the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user. The present application can thereby improve the accuracy of user crowd diffusion.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application provide a computer-readable storage medium having stored therein a plurality of instructions, which can be loaded by a processor to perform the steps of any of the user population diffusion methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a seed user group and a candidate user group; for each seed user in the seed user group, performing feature extraction on the seed user in at least one dimension to obtain the feature information of the seed user in the at least one dimension; determining the target feature information of the seed user according to the feature information of the seed user in each dimension; performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain the diffusion users corresponding to the seed user; and determining the diffusion user group corresponding to the seed user group based on the diffusion users corresponding to each seed user.
For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.
The computer-readable storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disks, optical discs, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any user population diffusion method provided in the embodiment of the present application, beneficial effects that can be achieved by any user population diffusion method provided in the embodiment of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the methods provided in the various alternative implementations of the user crowd spreading aspect described above.
The user crowd spreading method and related devices provided by the embodiments of the application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the embodiments is only intended to help understand the method and core idea of the application. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the idea of the application. In summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A method for diffusing a population of users, comprising:
acquiring a seed user group and a candidate user group;
for each seed user in the seed user group, performing feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension;
determining target feature information of the seed user according to the feature information of the seed user in each dimension;
performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain diffusion users corresponding to the seed user;
and determining a diffusion user group corresponding to the seed user group based on the diffusion users corresponding to the seed users.
2. The method of claim 1, wherein the performing, for each seed user in the seed user group, feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension comprises:
for each seed user in the seed user group, acquiring attribute information of the seed user in at least one dimension;
performing label coding on the attribute information of each dimension to obtain coding information corresponding to each dimension;
and, for each dimension, performing feature embedding processing on the attribute information and the coding information in that dimension to obtain feature information of the seed user in the dimension.
3. The method according to claim 2, wherein the performing, for each dimension, a feature embedding process on the attribute information and the encoded information in the dimension to obtain the feature information of the seed user in the dimension comprises:
for each dimension, performing fusion processing on the attribute information and the coding information on the dimension to obtain attribute coding information on the dimension;
and performing feature embedding processing on the attribute coding information to obtain feature information of the seed user in the dimension.
4. The method of claim 1, wherein the determining the target feature information of the seed user according to the feature information of the seed user in each dimension comprises:
fusing the feature information of the seed user in each dimension to obtain fused feature information corresponding to the seed user;
and determining the target feature information of the seed user according to the fused feature information and the number of dimensions of the feature information of the seed user.
5. The method according to claim 1, wherein before performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain a diffusion user corresponding to the seed user, the method further includes:
clustering the candidate users based on the feature information of each candidate user in the candidate user group in at least one dimension, to obtain a user set of at least one category and a cluster center corresponding to the user set of each category, wherein the user set of each category comprises at least one candidate user;
the performing diffusion processing on the seed user in the candidate user group according to the target feature information to obtain a diffusion user corresponding to the seed user includes:
calculating the characteristic distance between the target characteristic information of the seed user and the clustering center corresponding to the user set of each category;
and performing diffusion treatment on the seed users according to the characteristic distance to obtain diffusion users corresponding to the seed users.
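Claim 5 does not name a clustering algorithm or a distance metric. The sketch below assumes plain k-means with Euclidean distance and a deterministic initialisation (the first k candidates), which is enough to show the candidate clustering and the seed-to-center feature distances. `kmeans` and `feature_distances` are hypothetical names; `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iters: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Plain k-means: returns (cluster centers, cluster label of each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each candidate joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members (kept if the cluster is empty).
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


def feature_distances(target: Vector, centers: List[List[float]]) -> List[float]:
    """Euclidean feature distance from a seed's target vector to every cluster center."""
    return [math.dist(target, center) for center in centers]


candidates = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels = kmeans(candidates, k=2)
print(labels)  # [0, 0, 1, 1]
print(feature_distances([0.0, 0.5], centers))  # the seed sits exactly on center 0
```

Comparing a seed against a handful of cluster centers instead of every candidate is what makes this step cheap when the candidate group is large.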
6. The method according to claim 5, wherein performing diffusion processing on the seed user according to the feature distance to obtain the diffusion user corresponding to the seed user comprises:
determining, according to the feature distance, a similar user set matching the seed user from among the category user sets;
calculating a similarity between the seed user and each candidate user in the similar user set;
and determining the diffusion user corresponding to the seed user according to the similarity.
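Claim 6 leaves open how the similar user set is chosen, which similarity is used and how many candidates are kept. A minimal sketch, assuming the single nearest category user set is the similar set, cosine similarity is the measure, and the top-k candidates become the diffusion users; `cosine_similarity`, `diffuse_seed` and the `top_k` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def diffuse_seed(
    target: Sequence[float],
    centers: List[Sequence[float]],
    clusters: Dict[int, List[str]],
    features: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[str]:
    # Step 1: the category user set whose center is nearest becomes the similar user set.
    nearest = min(range(len(centers)), key=lambda c: math.dist(target, centers[c]))
    # Step 2: similarity between the seed and every candidate in that set.
    scored = [(cosine_similarity(target, features[user]), user) for user in clusters[nearest]]
    # Step 3: highest-similarity candidates become diffusion users (ties broken by user ID).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [user for _, user in scored[:top_k]]


centers = [[0.0, 0.5], [10.0, 10.5]]
clusters = {0: ["a", "b"], 1: ["c", "d"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]}
print(diffuse_seed([1.0, 0.1], centers, clusters, features, top_k=1))  # ['a']
```

A similarity threshold could replace `top_k`; the claim is compatible with either rule.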
7. A user crowd spreading device, comprising:
an acquisition unit, configured to acquire a seed user group and a candidate user group;
a feature extraction unit, configured to perform, for each seed user in the seed user group, feature extraction on the seed user in at least one dimension to obtain feature information of the seed user in the at least one dimension;
a first determining unit, configured to determine target feature information of the seed user according to the feature information of the seed user in each dimension;
a diffusion unit, configured to perform diffusion processing on the seed user in the candidate user group according to the target feature information to obtain a diffusion user corresponding to the seed user;
and a second determining unit, configured to determine a diffusion user group corresponding to the seed user group based on the diffusion users corresponding to the seed users.
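The device of claim 7 can be pictured as five cooperating units. The sketch below is a hypothetical composition in which the first four units are injected callables and the second determining unit is implemented explicitly, on the assumption that the diffusion user group is the order-preserving, duplicate-free union of every seed's diffusion users. The class name `UserCrowdDiffusionDevice` and its field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

PerDimensionFeatures = List[List[float]]


@dataclass
class UserCrowdDiffusionDevice:
    """Hypothetical composition of the units in claim 7; each unit is an injected callable."""

    acquire: Callable[[], Tuple[List[str], List[str]]]  # acquisition unit
    extract: Callable[[str], PerDimensionFeatures]  # feature extraction unit
    determine_target: Callable[[PerDimensionFeatures], List[float]]  # first determining unit
    diffuse: Callable[[List[float], Sequence[str]], List[str]]  # diffusion unit

    def determine_group(self, per_seed: Dict[str, List[str]]) -> List[str]:
        """Second determining unit, assumed to be an order-preserving union without duplicates."""
        seen = set()
        group = []
        for users in per_seed.values():
            for user in users:
                if user not in seen:
                    seen.add(user)
                    group.append(user)
        return group

    def run(self) -> List[str]:
        seeds, candidates = self.acquire()
        per_seed = {}
        for seed in seeds:
            target = self.determine_target(self.extract(seed))
            per_seed[seed] = self.diffuse(target, candidates)
        return self.determine_group(per_seed)


# Toy units: two seeds whose diffusion users overlap on "c2".
device = UserCrowdDiffusionDevice(
    acquire=lambda: (["s1", "s2"], ["c1", "c2", "c3"]),
    extract=lambda seed: [[1.0, 0.0]] if seed == "s1" else [[0.0, 1.0]],
    determine_target=lambda feats: feats[0],
    diffuse=lambda target, cands: ["c1", "c2"] if target[0] > 0 else ["c2", "c3"],
)
print(device.run())  # ['c1', 'c2', 'c3']
```

Injecting the units keeps each step replaceable, so the label-encoding, pooling and clustering choices sketched for claims 2 to 6 could be swapped without touching the orchestration.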
8. An electronic device, comprising a memory and a processor, wherein the memory stores an application program and the processor is configured to execute the application program in the memory to perform the operations of the user crowd spreading method according to any one of claims 1 to 6.
9. A computer-readable storage medium storing instructions adapted to be loaded by a processor to perform the steps of the user crowd spreading method according to any one of claims 1 to 6.
10. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the steps of the user crowd spreading method according to any one of claims 1 to 6.
CN202211380029.8A, priority date 2022-11-04, filing date 2022-11-04: User crowd spreading method and related equipment. Status: Pending (published as CN115630996A)

Priority Applications (1)

Application Number: CN202211380029.8A (CN115630996A); Priority Date: 2022-11-04; Filing Date: 2022-11-04; Title: User crowd spreading method and related equipment

Applications Claiming Priority (1)

Application Number: CN202211380029.8A (CN115630996A); Priority Date: 2022-11-04; Filing Date: 2022-11-04; Title: User crowd spreading method and related equipment

Publications (1)

Publication Number: CN115630996A; Publication Date: 2023-01-20

Family

ID=84908195

Family Applications (1)

Application Number: CN202211380029.8A (CN115630996A); Status: Pending; Priority Date: 2022-11-04; Filing Date: 2022-11-04; Title: User crowd spreading method and related equipment

Country Status (1)

Country: CN (1); Link: CN115630996A

Similar Documents

Publication Title
CN111177575B Content recommendation method and device, electronic equipment and storage medium
CN111444428B Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
WO2022041979A1 Information recommendation model training method and related device
CN111259263B Article recommendation method and device, computer equipment and storage medium
CN102737333B For calculating user and the offer order engine to the coupling of small segmentation
CN105893406A Group user profiling method and system
CN113254711B Interactive image display method and device, computer equipment and storage medium
CN111274330A Target object determination method and device, computer equipment and storage medium
CN111275492A User portrait generation method, device, storage medium and equipment
CN115659008A Information pushing system and method for big data information feedback, electronic device and medium
CN113656699B User feature vector determining method, related equipment and medium
Wang et al. Link prediction in heterogeneous collaboration networks
CN116823410B Data processing method, object processing method, recommending method and computing device
CN114328800A Text processing method and device, electronic equipment and computer readable storage medium
CN115131052A Data processing method, computer equipment and storage medium
CN111506718A Session message determining method, device, computer equipment and storage medium
CN116956183A Multimedia resource recommendation method, model training method, device and storage medium
CN116957128A Service index prediction method, device, equipment and storage medium
OUAFTOUH et al. Flat and hierarchical user profile clustering in an e-commerce recommender system
CN113705247B Theme model effect evaluation method, device, equipment, storage medium and product
Luo et al. DeepAttr: Inferring demographic attributes via social network embedding
Nosshi et al. Hybrid recommender system via personalized users’ context
CN115630996A User crowd spreading method and related equipment
CN114817697A Method and device for determining label information, electronic equipment and storage medium
CN111860870A Training method, device, equipment and medium for interactive behavior determination model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination