CN112905897B - Similar user determination method, vector conversion model, device, medium and equipment - Google Patents

Publication number
CN112905897B
Authority
CN
China
Prior art keywords
user
vector
training
behavior
determining
Legal status
Active
Application number
CN202110340900.0A
Other languages
Chinese (zh)
Other versions
CN112905897A (en)
Inventor
曹偲
蒋能学
郑玮
王梓良
徐可
马雨浩
王成林
Current Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202110340900.0A
Publication of CN112905897A
Application granted
Publication of CN112905897B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present disclosure provide a similar user determination method, a vector conversion model, a similar user determination apparatus, a computer-readable storage medium and an electronic device, and relate to the technical field of data analysis and mining. The method comprises: acquiring user data of candidate users, and performing vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; generating a candidate user vector for each candidate user based on the user behavior vector and the behavior conversion vector; determining a seed user vector corresponding to a seed user; and calculating the similarity between each candidate user vector and the seed user vector, so as to determine, from the candidate users and according to the similarity, the similar users corresponding to the seed user. Because the method and apparatus take both the click behavior preference and the behavior conversion preference of users into account when generating user vectors and determining similar users, the conversion cost of users can be further reduced.

Description

Similar user determination method, vector conversion model, device, medium and equipment
Technical Field
Embodiments of the present disclosure relate to the field of data analysis and mining technology, and more particularly, to a similar user determination method, a vector conversion model, a similar user determination apparatus, a computer-readable storage medium, and an electronic device.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
A recommendation system generally starts from a pre-provided seed population package and applies a Look-alike population expansion algorithm to find a potential target population to which an object is recommended, thereby reducing the conversion cost of object recommendation.
Look-alike population expansion relies on user vector representations, and existing similar population expansion algorithms generally learn the user vector representation from the user's click behavior preference alone.
Disclosure of Invention
The expressive power of the user vector determines how well similar population expansion performs. If only the user's click behavior preference is considered when learning the user vector, the user's behavior conversion preference is ignored. A recommendation system, however, is more concerned with the conversion rate and conversion cost of user behavior on a recommended object, and user vectors learned from click behavior alone can hardly capture a user's conversion preference.
Therefore, the present disclosure provides an improved similar user determination method in which the user vector determined from user data learns both the behavior preference and the behavior conversion preference of the user, and similar populations are determined based on this dual-target user vector.
In this context, embodiments of the present disclosure desirably provide a similar user determination method, a vector conversion model, a similar user determination apparatus, a computer-readable storage medium, and an electronic device.
In a first aspect of the disclosed embodiments, a method for determining similar users is provided, including: acquiring user data of candidate users, and performing vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; generating a candidate user vector corresponding to the candidate user based on the user behavior vector and the behavior conversion vector; determining a seed user vector corresponding to a seed user; and calculating the similarity between the candidate user vector and the seed user vector so as to determine the similar user corresponding to the seed user from the candidate users according to the similarity.
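The four steps of the first aspect can be illustrated with a minimal Python sketch. It assumes, as a hypothesis not stated in this paragraph, that the candidate user vector is the plain concatenation of the user behavior vector and the behavior conversion vector and that similarity is cosine similarity; the user IDs and 2-D vectors are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_user_vector(behavior_vec, conversion_vec):
    """Assumed combination: concatenate the two sub-model outputs."""
    return list(behavior_vec) + list(conversion_vec)

# Hypothetical sub-model outputs for three candidate users.
candidates = {
    "a": candidate_user_vector([0.9, 0.1], [0.8, 0.2]),
    "b": candidate_user_vector([0.9, 0.1], [0.1, 0.9]),  # same behavior vector as "a"
    "c": candidate_user_vector([0.1, 0.9], [0.1, 0.9]),
}
seed = candidate_user_vector([1.0, 0.0], [1.0, 0.0])

scores = {uid: cosine(vec, seed) for uid, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # → ['a', 'b', 'c']
```

Users "a" and "b" share the same behavior vector, but the conversion half of "b" points away from the seed's, so "b" ranks below "a"; a click-only vector could not separate them.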
In one embodiment of the present disclosure, a pre-constructed vector conversion model is obtained, the vector conversion model comprising a first sub-model and a second sub-model; candidate user features of the candidate users are determined according to their user data; and the candidate user features are input into the vector conversion model, so that the user behavior vector is determined through the first sub-model and the behavior conversion vector through the second sub-model.
In one embodiment of the present disclosure, the vector conversion model is obtained by training as follows: acquiring a training sample set and an initial two-tower model, and inputting the training sample set into the initial two-tower model, the training sample set comprising training user features and training recommendation object features corresponding to training users; determining, through the first sub-model, a behavior prediction value corresponding to the training user based on the training user features and the training recommendation object features; determining, through the second sub-model, a behavior conversion prediction value corresponding to the training user based on the same features; determining a loss function of the initial two-tower model based on the behavior prediction value and the behavior conversion prediction value; and synchronously updating the model parameters of the first sub-model and of the second sub-model through the loss function to obtain the vector conversion model.
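To make the two-sub-model structure concrete, here is a minimal Python sketch of the inference path. The class names `Tower` and `VectorConversionModel`, the one-hidden-layer ReLU architecture, the layer sizes and the random (untrained) weights are all hypothetical; the text only states that the same candidate user features pass through two sub-models, one yielding the user behavior vector and the other the behavior conversion vector.

```python
import math
import random

class Tower:
    """Hypothetical user-side tower: one ReLU hidden layer, L2-normalised output."""

    def __init__(self, in_dim, hidden_dim, out_dim, rng):
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        out = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        norm = math.sqrt(sum(o * o for o in out)) or 1.0  # guard against an all-zero output
        return [o / norm for o in out]

class VectorConversionModel:
    """Two independently parameterised sub-models sharing the same input features."""

    def __init__(self, in_dim, out_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.first_sub_model = Tower(in_dim, hidden_dim, out_dim, rng)   # behavior target
        self.second_sub_model = Tower(in_dim, hidden_dim, out_dim, rng)  # conversion target

    def encode(self, user_features):
        """Return (user behavior vector, behavior conversion vector)."""
        return self.first_sub_model(user_features), self.second_sub_model(user_features)

model = VectorConversionModel(in_dim=5)
behavior_vec, conversion_vec = model.encode([0.2, 1.0, 0.0, 0.5, 0.3])
print([round(x, 3) for x in behavior_vec])
print([round(x, 3) for x in conversion_vec])
```

Normalising each output makes later cosine comparisons equivalent to dot products, which is a common choice for two-tower retrieval but is an assumption here.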
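The joint training procedure can be illustrated with a toy, pure-Python sketch. To keep it short, each sub-model is reduced to a table of user and item embeddings instead of a deep network, predictions use a sigmoid over the amplified cosine similarity, and both task losses are binary cross-entropy; the toy log, the amplification multiple `SCALE`, the weights `W_CLICK` and `W_CONV`, and the learning rate are all assumptions. What it does show is that a single weighted loss drives the parameter updates of both towers in the same training step.

```python
import math
import random

# Hypothetical toy log: (user_id, item_id, clicked, converted).
SAMPLES = [("u0", "i0", 1, 1), ("u0", "i1", 1, 0),
           ("u1", "i0", 0, 0), ("u1", "i1", 1, 1)]
SCALE = 5.0                  # assumed amplification multiple for the cosine similarity
W_CLICK, W_CONV = 0.5, 0.5   # assumed first and second loss weights
DIM = 4

def init_tower(rng):
    """A tower reduced to one embedding per user/item ID (stand-in for a DNN sub-model)."""
    ids = sorted({s[0] for s in SAMPLES} | {s[1] for s in SAMPLES})
    return {k: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for k in ids}

def cos_with_grads(u, v):
    """Cosine similarity of u and v plus its gradients with respect to u and v."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    c = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    grad_u = [b / (nu * nv) - c * a / (nu * nu) for a, b in zip(u, v)]
    grad_v = [a / (nu * nv) - c * b / (nv * nv) for a, b in zip(u, v)]
    return c, grad_u, grad_v

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(p, y):
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def joint_loss(click_tower, conv_tower):
    """Weighted sum of the behavior loss and the behavior conversion loss."""
    total = 0.0
    for u, i, clicked, converted in SAMPLES:
        p_click = sigmoid(SCALE * cos_with_grads(click_tower[u], click_tower[i])[0])
        p_conv = sigmoid(SCALE * cos_with_grads(conv_tower[u], conv_tower[i])[0])
        total += W_CLICK * bce(p_click, clicked) + W_CONV * bce(p_conv, converted)
    return total / len(SAMPLES)

def train_epoch(click_tower, conv_tower, lr=0.1):
    """One SGD pass: for each sample, the joint loss updates both towers."""
    for u, i, clicked, converted in SAMPLES:
        for tower, label, weight in ((click_tower, clicked, W_CLICK),
                                     (conv_tower, converted, W_CONV)):
            c, grad_u, grad_i = cos_with_grads(tower[u], tower[i])
            # d(weight * BCE(sigmoid(SCALE * c)))/dc = weight * SCALE * (p - label)
            coef = weight * SCALE * (sigmoid(SCALE * c) - label)
            tower[u] = [a - lr * coef * g for a, g in zip(tower[u], grad_u)]
            tower[i] = [b - lr * coef * g for b, g in zip(tower[i], grad_i)]

rng = random.Random(0)
click_tower, conv_tower = init_tower(rng), init_tower(rng)
before = joint_loss(click_tower, conv_tower)
for _ in range(200):
    train_epoch(click_tower, conv_tower)
after = joint_loss(click_tower, conv_tower)
print(f"joint loss: {before:.3f} -> {after:.3f}")
```

In the toy log, user u0 clicks both items but converts on only one, so the click tower and the conversion tower must learn different geometries for the same user; this is the gap a click-only vector cannot represent.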
In an embodiment of the present disclosure, before the training sample set is acquired, the method further comprises: acquiring user behavior data of initial training users, and determining, according to the user behavior data, initial user features and initial recommendation object features corresponding to the initial training users, the initial user features comprising initial user profile features and initial user behavior features; generating an initial training sample set according to the initial user profile features, the initial user behavior features and the initial recommendation object features, the initial training sample set comprising a first sample subset and a second sample subset; sampling the initial training sample set so that the quantity ratio between the first sample subset and the second sample subset falls within a preset value interval; and taking the sampled initial training sample set as the training sample set.
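One way to realise the sampling step is to downsample whichever subset is over-represented. The sketch below assumes downsampling (the text does not say whether subsets are up- or downsampled) and a hypothetical interval of [0.25, 1.0]; the function name `rebalance` and the subset contents are illustrative only.

```python
import random

def rebalance(first, second, lo, hi, seed=0):
    """Downsample the over-represented subset so len(first)/len(second) falls in [lo, hi].

    Assumes [lo, hi] is wide enough that integer sample sizes can land inside it.
    """
    rng = random.Random(seed)
    ratio = len(first) / len(second)
    if ratio < lo:  # second subset too large: keep about len(first) / lo of it
        second = rng.sample(second, max(1, int(len(first) / lo)))
    elif ratio > hi:  # first subset too large: keep about len(second) * hi of it
        first = rng.sample(first, max(1, int(len(second) * hi)))
    return first, second

first_subset = list(range(100))          # hypothetical first sample subset
second_subset = list(range(100, 2100))   # hypothetical second sample subset (2000 items)
balanced_first, balanced_second = rebalance(first_subset, second_subset, 0.25, 1.0)
print(len(balanced_first), len(balanced_second))  # 100 400
```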
In an embodiment of the present disclosure, determining, through the first sub-model, the behavior prediction value corresponding to the training user based on the training user features and the training recommendation object features comprises: inputting the training user features and the training recommendation object features into the first sub-model, which determines the user behavior vector representation of the training user and a first recommendation object vector; determining the cosine similarity between the user behavior vector representation and the first recommendation object vector as a first initial similarity; and amplifying the first initial similarity by a multiple to obtain a first similarity, and determining the behavior prediction value according to the first similarity.
In an embodiment of the present disclosure, determining, through the second sub-model, the behavior conversion prediction value corresponding to the training user based on the training user features and the training recommendation object features comprises: inputting the training user features and the training recommendation object features into the second sub-model, which determines the user conversion vector representation of the training user and a second recommendation object vector; determining the cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity; and amplifying the second initial similarity by a multiple to obtain a second similarity, and determining the behavior conversion prediction value according to the second similarity.
In one embodiment of the present disclosure, determining the loss function of the initial two-tower model based on the behavior prediction value and the behavior conversion prediction value comprises: determining a first loss function based on the behavior prediction value, and a first weight corresponding to the first loss function; determining a second loss function based on the behavior conversion prediction value, and a second weight corresponding to the second loss function; and performing weighted summation on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
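The reason for amplifying the cosine similarity is that it lies in [-1, 1], so a sigmoid applied directly would only yield predictions between about 0.269 and 0.731. The sketch below assumes a sigmoid output and an amplification multiple of 5 (neither is given in the text); the same function serves the second sub-model when given the user conversion vector representation and the second recommendation object vector.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (the 'initial similarity')."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def prediction_value(user_vec, item_vec, scale=5.0):
    """Amplify the cosine similarity by `scale`, then squash it to a probability."""
    amplified = scale * cosine_similarity(user_vec, item_vec)
    return 1.0 / (1.0 + math.exp(-amplified))

print(round(prediction_value([1, 0], [1, 0], scale=1.0), 3))  # 0.731
print(round(prediction_value([1, 0], [1, 0]), 3))             # 0.993
```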
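Written out as a formula, assuming (the text does not say) binary cross-entropy for both task losses, with $y_n$ the click label and $z_n$ the conversion label of training sample $n$, $\hat{y}_n$ and $\hat{z}_n$ the behavior prediction value and the behavior conversion prediction value, and $w_1, w_2$ the first and second weights:

```latex
\mathcal{L} = w_1 \mathcal{L}_1 + w_2 \mathcal{L}_2, \qquad
\mathcal{L}_1 = -\frac{1}{N}\sum_{n=1}^{N}\bigl[y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n)\bigr], \qquad
\mathcal{L}_2 = -\frac{1}{N}\sum_{n=1}^{N}\bigl[z_n \log \hat{z}_n + (1 - z_n)\log(1 - \hat{z}_n)\bigr]
```

Because the two sub-models share no parameters, the gradient of $\mathcal{L}$ with respect to the first sub-model's parameters is $w_1 \nabla\mathcal{L}_1$ and with respect to the second's is $w_2 \nabla\mathcal{L}_2$, so one backward pass updates both towers synchronously while the weights set their relative emphasis.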
In one embodiment of the present disclosure, determining the seed user vector corresponding to the seed user comprises: acquiring seed user data corresponding to the seed users, the seed user data comprising seed user identifiers; and either determining the seed user vector corresponding to a seed user from the candidate user vectors according to the seed user identifier, or obtaining the pre-constructed vector conversion model and performing vector conversion processing on the seed user data through the vector conversion model to generate the seed user vector.
In one embodiment of the present disclosure, calculating the similarity between the candidate user vector and the seed user vector comprises: clustering the seed user vectors to obtain a plurality of cluster centers; determining the cluster center vector corresponding to each cluster center; and calculating the average similarity between the candidate user vector and the cluster center vectors as the similarity.
In one embodiment of the present disclosure, determining, from the candidate users and according to the similarity, the similar users corresponding to the seed user comprises: acquiring a user expansion condition, and determining the number of similar users to be expanded according to the user expansion condition; ranking the similarities between the plurality of candidate user vectors and the seed user vector; and determining the similar users from the candidate users according to the ranking result.
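The two routes for obtaining seed user vectors can be sketched as below. Treating the model route as a fallback for seed users without a precomputed candidate vector is an assumption of this sketch (the text presents the routes as alternatives), and `encode` and `fake_encode` are hypothetical stand-ins for the trained vector conversion model.

```python
def seed_user_vectors(seed_ids, candidate_vectors, encode=None, seed_features=None):
    """Resolve a vector for each seed user ID.

    Route 1: reuse the candidate user vector already computed for that ID.
    Route 2 (used here as a fallback): run the seed user's features through
    the vector conversion model, passed in as the `encode` callable.
    """
    seed_features = seed_features or {}
    vectors = {}
    for uid in seed_ids:
        if uid in candidate_vectors:
            vectors[uid] = candidate_vectors[uid]
        elif encode is not None and uid in seed_features:
            vectors[uid] = encode(seed_features[uid])
    return vectors

def fake_encode(features):
    """Hypothetical stand-in for the trained vector conversion model."""
    return [sum(features), 0.0]

candidate_vectors = {"u1": [0.9, 0.1], "u2": [0.2, 0.8]}
result = seed_user_vectors(["u1", "s9"], candidate_vectors, fake_encode, {"s9": [0.5, 0.25]})
print(result)  # {'u1': [0.9, 0.1], 's9': [0.75, 0.0]}
```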
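A minimal sketch of the clustering-based similarity, assuming k-means (the text does not name a clustering algorithm) with a deterministic farthest-point initialisation and cosine similarity; the function names and toy seed vectors are illustrative. Averaging over cluster centers lets a seed population containing several distinct interest groups be represented by more than one reference vector.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its members; keep it if the group is empty.
        centers = [[sum(dim) / len(g) for dim in zip(*g)] if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def similarity_to_seeds(candidate, centers):
    """Average cosine similarity between a candidate vector and every cluster center."""
    return sum(cosine(candidate, c) for c in centers) / len(centers)

seed_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centers = kmeans(seed_vectors, k=2)
print([[round(x, 2) for x in c] for c in centers])  # [[0.95, 0.05], [0.05, 0.95]]
print(round(similarity_to_seeds([0.7, 0.7], centers), 3))
```

Scoring against cluster centers rather than every seed vector keeps the cost per candidate proportional to the number of clusters instead of the number of seed users.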
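The ranking step might look like the sketch below. The expansion condition used here, a multiple of the seed population size with an optional cap, is a hypothetical example since the text does not specify its form; excluding seed users from the result is also an assumption.

```python
def expand_similar_users(scores, seed_ids, multiple, max_users=None):
    """Pick the top-scoring candidates under a simple expansion condition.

    Hypothetical condition: expand to `multiple` times the seed count, optionally
    capped at `max_users`; seed users themselves are excluded from the result.
    """
    target = len(seed_ids) * multiple
    if max_users is not None:
        target = min(target, max_users)
    ranked = sorted((uid for uid in scores if uid not in seed_ids),
                    key=lambda uid: scores[uid], reverse=True)
    return ranked[:target]

scores = {"a": 0.91, "b": 0.20, "c": 0.74, "d": 0.55, "s1": 0.99}
print(expand_similar_users(scores, {"s1"}, multiple=2))  # ['a', 'c']
```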
In a second aspect of the embodiments of the present disclosure, there is provided a vector conversion model, comprising: a first sub-model, configured to determine a behavior prediction value corresponding to a training user according to the training user features and training recommendation object features of the training user; a second sub-model, independent of the first sub-model, configured to determine a behavior conversion prediction value corresponding to the training user according to the training user features and the training recommendation object features; and a matching layer, configured to perform weighted summation on the behavior prediction value and the behavior conversion prediction value to obtain a model output value of the vector conversion model, the model parameters of the first sub-model and of the second sub-model being synchronously updated by back-propagation according to the model output value.
In one embodiment of the present disclosure, the first sub-model comprises: a first input layer, configured to receive the training user features and the training recommendation object features; and a first representation layer, configured to perform first conversion processing on the training user features and the training recommendation object features to obtain the user behavior vector representation and the first recommendation object vector corresponding to the training user, and to determine a first similarity between the user behavior vector representation and the first recommendation object vector, so that the matching layer determines the behavior prediction value according to the first similarity.
In one embodiment of the present disclosure, the second sub-model comprises: a second input layer, configured to receive the training user features and the training recommendation object features; and a second representation layer, configured to perform second conversion processing on the training user features and the training recommendation object features to obtain the user conversion vector representation and the second recommendation object vector corresponding to the training user, and to determine a second similarity between the user conversion vector representation and the second recommendation object vector, so that the matching layer determines the behavior conversion prediction value according to the second similarity.
In a third aspect of the embodiments of the present disclosure, there is provided a similar user determination apparatus, comprising: a vector conversion module, configured to acquire user data of candidate users and perform vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; a vector generation module, configured to generate a candidate user vector for each candidate user based on the user behavior vector and the behavior conversion vector; a seed user vector determination module, configured to determine a seed user vector corresponding to a seed user; and a similar user determination module, configured to calculate the similarity between the candidate user vector and the seed user vector, so as to determine, from the candidate users and according to the similarity, the similar users corresponding to the seed user.
In one embodiment of the present disclosure, the vector conversion module comprises a vector conversion unit configured to: obtain a pre-constructed vector conversion model, the vector conversion model comprising a first sub-model and a second sub-model; determine candidate user features of the candidate users according to their user data; and input the candidate user features into the vector conversion model, so as to determine the user behavior vector through the first sub-model and the behavior conversion vector through the second sub-model.
In one embodiment of the present disclosure, the vector conversion module further comprises a model training unit, the model training unit comprising: a data input subunit, configured to acquire a training sample set and an initial two-tower model and input the training sample set into the initial two-tower model, the training sample set comprising training user features and training recommendation object features corresponding to training users; a first training subunit, configured to determine, through the first sub-model, a behavior prediction value corresponding to the training user based on the training user features and the training recommendation object features; a second training subunit, configured to determine, through the second sub-model, a behavior conversion prediction value corresponding to the training user based on the same features; a loss function determination subunit, configured to determine a loss function of the initial two-tower model based on the behavior prediction value and the behavior conversion prediction value; and a model training subunit, configured to synchronously update the model parameters of the first sub-model and of the second sub-model through the loss function, so as to obtain the vector conversion model.
In one embodiment of the present disclosure, the vector conversion module further comprises a sample set determination unit configured to: acquire user behavior data of initial training users, and determine, according to the user behavior data, initial user features and initial recommendation object features corresponding to the initial training users, the initial user features comprising initial user profile features and initial user behavior features; generate an initial training sample set according to the initial user profile features, the initial user behavior features and the initial recommendation object features, the initial training sample set comprising a first sample subset and a second sample subset; sample the initial training sample set so that the quantity ratio between the first sample subset and the second sample subset falls within a preset value interval; and take the sampled initial training sample set as the training sample set.
In one embodiment of the present disclosure, the first training subunit is configured to: input the training user features and the training recommendation object features into the first sub-model, which determines the user behavior vector representation of the training user and a first recommendation object vector; determine the cosine similarity between the user behavior vector representation and the first recommendation object vector as a first initial similarity; and amplify the first initial similarity by a multiple to obtain a first similarity, and determine the behavior prediction value according to the first similarity.
In one embodiment of the present disclosure, the second training subunit is configured to: input the training user features and the training recommendation object features into the second sub-model, which determines the user conversion vector representation of the training user and a second recommendation object vector; determine the cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity; and amplify the second initial similarity by a multiple to obtain a second similarity, and determine the behavior conversion prediction value according to the second similarity.
In one embodiment of the present disclosure, the loss function determination subunit is configured to: determine a first loss function based on the behavior prediction value, and a first weight corresponding to the first loss function; determine a second loss function based on the behavior conversion prediction value, and a second weight corresponding to the second loss function; and perform weighted summation on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
In one embodiment of the present disclosure, the seed user vector determination module is configured to: acquire seed user data corresponding to the seed users, the seed user data comprising seed user identifiers; and either determine the seed user vector corresponding to a seed user from the candidate user vectors according to the seed user identifier, or obtain the pre-constructed vector conversion model and perform vector conversion processing on the seed user data through the vector conversion model to generate the seed user vector.
In one embodiment of the present disclosure, the similar user determination module comprises a similarity determination unit configured to: cluster the seed user vectors to obtain a plurality of cluster centers; determine the cluster center vector corresponding to each cluster center; and calculate the average similarity between the candidate user vector and the cluster center vectors as the similarity.
In one embodiment of the present disclosure, the similar user determination module comprises a similar user determination unit configured to: acquire a user expansion condition, and determine the number of similar users to be expanded according to the user expansion condition; rank the similarities between the plurality of candidate user vectors and the seed user vector; and determine the similar users from the candidate users according to the ranking result.
In a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a similar user determination method as described above.
In a fifth aspect of embodiments of the present disclosure, there is provided an electronic device comprising: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the similar user determination method as described above.
According to the technical solutions of the embodiments of the present disclosure, a candidate user vector that incorporates both the user behavior vector and the behavior conversion vector of a candidate user can be generated from the candidate user's data, and a seed user vector of a seed user is obtained; the similar users corresponding to the seed user can then be determined from the candidate users according to the similarity between the seed user vector and the candidate user vectors. On the one hand, the candidate user vector captures not only the behavior preference of the candidate user but also the behavior conversion preference; expanding the seed users on the basis of such vectors identifies potential similar users more accurately, and objects can then be recommended to the determined similar users. On the other hand, because object recommendation depends on the conversion rate and conversion cost of the recommended object, taking the behavior conversion preference into account further reduces the conversion cost of object recommendation.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates a block diagram of the system architecture of an exemplary application scenario, in accordance with some embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a similar user determination method, in accordance with some embodiments of the present disclosure;
FIG. 3 schematically illustrates the structure of a dual-target vector conversion model based on user behavior prediction and behavior conversion prediction, according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates the structure of a first sub-model in a vector conversion model, according to some embodiments of the present disclosure;
FIG. 5 schematically illustrates the structure of a second sub-model in a vector conversion model, according to some embodiments of the present disclosure;
FIG. 6 schematically illustrates obtaining a user vector representation, according to some embodiments of the present disclosure;
FIG. 7 schematically illustrates a data flow diagram for determining a user vector with a DSSM model that has the dual targets of user behavior and behavior conversion, according to some embodiments of the present disclosure;
FIG. 8 schematically illustrates the process of determining similar users based on seed users, in accordance with some embodiments of the present disclosure;
FIG. 9 schematically illustrates a block diagram of a similar user determination apparatus, in accordance with some embodiments of the present disclosure;
FIG. 10 schematically illustrates a storage medium according to an exemplary embodiment of the present disclosure; and
FIG. 11 schematically illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
In the drawings, like or corresponding reference characters designate like or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the present disclosure, a similar user determination method, a similar user determination apparatus, a medium, and an electronic device are provided.
In this context, the relevant terms are understood as follows. Look-alike population expansion is the process of using an algorithmic evaluation model to find, starting from the provided seed users, a larger population with potential relevance whose characteristics are identical or as close as possible to those of the selected population. The seed population is the population on which similar-population matching is based. A user vector is an N-dimensional vector characterizing a user, each dimension of which is a numerical value; for example, the user vector may be learned by a deep learning model. Vector clustering is clustering performed on user vectors; clustering organizes into groups the data members of a data set that are similar in some respect, and is often referred to as unsupervised learning. Advertisement exposure is the number of times a particular advertisement is displayed on the relevant website within a specified period. The advertisement click volume is the number of times an advertisement on a web page is clicked. The advertisement click-through rate is the ratio of the number of times an advertisement on a web page is clicked to the number of times it is displayed. The advertisement conversion rate is the proportion of users who enter the promoted website by clicking the advertisement and then complete a conversion. Moreover, any number of elements in the drawings is given by way of example and not limitation, and any nomenclature is used solely for differentiation and not limitation.
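A small worked example of the click-through rate and conversion rate definitions, using hypothetical counts for two advertisements with the same exposure. It computes click-through rate as clicks divided by exposures and conversion rate as conversions divided by clicks, and shows why a higher click-through rate need not mean a higher conversion rate.

```python
def click_through_rate(clicks, impressions):
    """Clicks divided by exposures (0.0 when there were no exposures)."""
    return clicks / impressions if impressions else 0.0

def conversion_rate(conversions, clicks):
    """Conversions divided by clicks (0.0 when there were no clicks)."""
    return conversions / clicks if clicks else 0.0

# Hypothetical counts: (impressions, clicks, conversions).
ads = {"A": (10_000, 500, 10), "B": (10_000, 200, 20)}
for name, (imp, clk, conv) in ads.items():
    print(name, f"CTR={click_through_rate(clk, imp):.1%}", f"CVR={conversion_rate(conv, clk):.1%}")
# A CTR=5.0% CVR=2.0%
# B CTR=2.0% CVR=10.0%
```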
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of The Invention
The Look-align similar population expansion technology depends on user vector representation, the expression capability of a user vector determines the quality of the effect of similar population expansion, the existing user vector pre-training model only performs single-target model learning according to the click behavior of a user, the conversion preference of the user is not learned, the higher the click rate of the user is, the higher the conversion rate is not necessarily, and the recommendation system focuses more on the problems of the conversion rate and the conversion cost of a recommended object.
In one technical scheme, the Look-align similar population expansion based on user vector representation is generally divided into two parts, wherein the first part is based on pre-training model training to obtain user vector representation; the second part is to perform Look-like population expansion based on the user vector representation obtained by pre-training.
When this scheme is used for similar population expansion, the user vector representation is determined only from the single target of users' click behavior, so a single-target vector learning model can learn only users' click behavior and click preferences. However, the complete recommendation link for a recommended object is exposure -> click -> conversion. A recommendation system aims not only to raise the click-through rate of a recommended object but, more importantly, to improve its conversion rate and reduce its conversion cost. Since a higher click-through rate does not necessarily mean a higher conversion rate, user vector representations learned from click behavior alone can hardly capture users' conversion preferences.
Based on the above, the basic idea of the present disclosure is as follows: obtain user data of candidate users and perform vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; generate a candidate user vector for each candidate user based on the user behavior vector and the behavior conversion vector; determine a seed user vector corresponding to a seed user; and calculate the similarity between the candidate user vectors and the seed user vector, so as to determine the similar users corresponding to the seed user from the candidate users according to the similarity. When the user vector is determined from the user data, it learns both the behavior preference and the behavior conversion preference of the user; determining similar users based on this dual-target user vector can therefore further reduce the conversion cost of recommendation behavior.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Application scene overview
Referring first to fig. 1, fig. 1 shows a schematic block diagram of a system architecture of an exemplary application scenario to which a similar user determination method and apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The similar user determination method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, a similar user determination device is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the similar user determining method provided in the embodiment of the present disclosure may also be executed by the terminal devices 101, 102, and 103, and accordingly, the similar user determining apparatus may also be disposed in the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. For example, in an exemplary embodiment, the staff member uploads a pre-constructed knowledge graph and a user question input by a user to the server 105 through the terminal devices 101, 102, and 103, the server determines a candidate user vector according to user data of a candidate user through the similar user determination method provided by the embodiment of the present disclosure, calculates a similarity between the seed user vector and the candidate user vector, determines a similar user corresponding to the seed user from the candidate user according to the calculated similarity, and transmits information of the determined similar user to the terminal devices 101, 102, and 103, and the like, so that the terminal devices 101, 102, and 103 perform object recommendation on the similar user.
It should be understood that the application scenario illustrated in fig. 1 is only one example in which embodiments of the present disclosure may be implemented. The application scope of the embodiments of the present disclosure is not limited in any way by the application scenario.
Exemplary method
A similar user determination method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenarios are merely illustrated for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
The present disclosure first provides a method for determining a similar user, where an execution subject of the method may be a terminal device or a server, and the present disclosure is not particularly limited to this, and in this example embodiment, the method executed by the server is taken as an example for description.
Referring to fig. 2, the similar user determination method may include the following steps S210 to S240:
Step S210, obtaining user data of the candidate user, and performing vector conversion processing on the user data to generate a corresponding user behavior vector and a corresponding behavior conversion vector.
In some example embodiments, the candidate user may be a user who has interacted with the recommended object within a preset time period. For example, when the recommended object is an advertisement, the candidate user may be a user who clicked it, made a purchase through it, or performed another operation on it within the last month. The user data may be data related to the candidate user, and may include the candidate user's user portrait information, behavior data on the recommended object, and behavior conversion data following the behavior on the recommended object. For example, the user portrait information may include the candidate user's age, gender, city, interest preferences, and the like. The behavior data may include the candidate user's clicks on recommended objects, and the like. The behavior conversion data may be data on conversion behaviors such as browsing, registering, and purchasing performed after the user clicks the recommended object. The vector conversion processing may be a process of extracting user features of the candidate user from the user data and generating corresponding user vector representations. The user behavior vector, also called the click preference vector, may be a vector representing the user's click behavior preference, generated from the user features extracted from the candidate user's user data. The behavior conversion vector, also called the conversion preference vector, may be a vector representing the user's behavior conversion preference, generated from the user features extracted from the user data.
In the recommendation system, user data within a preset time period may be acquired; for example, user data from the most recent month may be acquired as the user data of the candidate users. The acquired user data then undergoes vector conversion processing: the user behavior data (such as click data) is converted into a click preference vector, and the behavior conversion data is converted into a corresponding conversion preference vector.
Step S220, generating a candidate user vector corresponding to the candidate user based on the user behavior vector and the behavior conversion vector.
In some example embodiments, the candidate user vector may be a vector obtained by performing feature extraction on user data of a candidate user and performing vector transformation processing on the extracted candidate user feature. The candidate user vector may be an N-dimensional vector, each dimension of the vector being a numerical value.
From the user data, the click preference vector (User click embedding) and the conversion preference vector (User conversion embedding) of the candidate user can be obtained separately. The two vectors are then concatenated to obtain the candidate user vector (User embedding) of the candidate user; using the concatenation of both as the candidate user vector allows similar users to be determined more accurately. Specifically, the candidate user vector can be obtained by Equation 1:
$$\text{User embedding} = \mathrm{concat}\left[\text{User conversion embedding},\ \text{User click embedding}\right] \qquad (1)$$
where concat[x, y] denotes the concatenation of vector x and vector y. If x and y have lengths k_x and k_y respectively, the resulting candidate user vector has length k_x + k_y.
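Equation 1 is a plain vector concatenation. A minimal NumPy sketch, assuming each preference vector is available as a 1-D array; the dimensions (4 and 3) and values are hypothetical, and the argument order follows Equation 1 (conversion first, then click).

```python
import numpy as np


def candidate_user_vector(conversion_emb: np.ndarray, click_emb: np.ndarray) -> np.ndarray:
    """Equation 1: concatenate the conversion preference vector and the click preference vector."""
    return np.concatenate([conversion_emb, click_emb])


# Hypothetical conversion vector (k_x = 4) and click vector (k_y = 3).
conv = np.array([0.2, -0.1, 0.5, 0.0])
click = np.array([0.3, 0.3, -0.4])
user_vec = candidate_user_vector(conv, click)
print(user_vec.shape)  # (7,)  i.e. k_x + k_y
```

One design point to keep in mind (an assumption, not stated in the source): if the two parts have very different norms, one of them will dominate any later cosine similarity, so normalizing each part before concatenation may be worth considering.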
Step S230, determining a seed user vector corresponding to the seed user.
In some example embodiments, the seed users may be the users on which the similar user determination is based. For example, the seed users may be the historical conversion population of the recommended object, such as users who have performed conversion behaviors like downloading the recommended application, purchasing an item, or submitting a form. The seed user vector may be the user vector representation corresponding to a seed user.
After the predetermined seed user is obtained, a seed user vector corresponding to the seed user can be determined. For example, the vector conversion processing may be performed on user data corresponding to the seed user to generate a seed user vector.
Step S240, calculating a similarity between the candidate user vector and the seed user vector, so as to determine a similar user corresponding to the seed user from the candidate users according to the similarity.
In some example embodiments, the similarity between the candidate user vector and the seed user vector may be the degree of similarity between the two vectors. The similar users may be the group of candidate users whose vector similarity to the seed users exceeds a certain threshold.
After the candidate user vectors of the candidate users and the seed user vectors of the seed users are determined, the similarity between the candidate user vectors and the seed user vectors can be calculated, and the similar users corresponding to the seed users are determined from the candidate users according to the calculation results of the multiple similarities.
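Step S240 can be sketched as follows. The source does not fix the similarity measure or how several seed users are combined, so this sketch assumes cosine similarity, keeps each candidate's best match over all seed users, and uses a hypothetical threshold of 0.8; the vectors are toy values.

```python
import numpy as np


def l2_normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (all-zero rows are left unchanged)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    return m / np.where(norms == 0, 1.0, norms)


def find_similar_users(candidates: np.ndarray, seeds: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of candidates whose best cosine similarity to any seed user is >= threshold."""
    sims = l2_normalize(candidates) @ l2_normalize(seeds).T  # shape: (n_candidates, n_seeds)
    best = sims.max(axis=1)                                 # closest seed user per candidate
    return np.flatnonzero(best >= threshold)


candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.3]])
seeds = np.array([[1.0, 0.1]])
print(find_similar_users(candidates, seeds, threshold=0.8))  # [0 2]
```

Taking the maximum over seed users favors candidates close to any single seed user; comparing against the mean seed vector instead would favor candidates close to the seed population as a whole. Either is a reasonable reading of the step.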
In the similar user determination method provided in this example embodiment, a candidate user vector including user click preferences and user conversion preferences of a candidate user may be generated according to user data of the candidate user, and a seed user vector of a seed user is obtained; according to the similarity between the seed user vector and the candidate user vector, the similar user corresponding to the seed user can be determined from the candidate users. On one hand, the candidate user vector determined by the candidate user data not only comprises the user behavior preference of the candidate user, but also comprises the behavior conversion preference of the candidate user; the seed users are expanded based on the candidate user vectors comprising the behavior preference and the behavior transformation preference, potential similar users can be determined more accurately, and object recommendation behaviors are carried out on the determined similar users. On the other hand, since the object recommendation depends on the conversion rate and the conversion cost of the recommended object, the conversion cost of the object recommended behavior can be further reduced by taking the behavior conversion preference as a consideration of the recommendation.
Next, steps S210 to S240 of the present exemplary embodiment will be described in more detail.
In one embodiment of the present disclosure, a pre-constructed vector conversion model is obtained, the vector conversion model comprising a first sub-model and a second sub-model; candidate user features are determined from the user data of the candidate users; and the candidate user features are input into the vector conversion model, so that the user behavior vector is determined by the first sub-model and the behavior conversion vector is determined by the second sub-model.
The vector conversion model may be the model used to perform vector conversion processing on user data. The first sub-model may be the sub-model of the vector conversion model that generates the user behavior vector, and the second sub-model may be the sub-model that generates the behavior conversion vector. The candidate user features may be the user features of the candidate users, corresponding respectively to their user information, behavior data, and behavior conversion data.
Referring to fig. 3, fig. 3 schematically illustrates the structure of a dual-target vector conversion model based on user behavior prediction and behavior conversion prediction, according to some embodiments of the present disclosure. The vector conversion model 300 in fig. 3 may include a first sub-model 310 and a second sub-model 320, and may be a pre-constructed, trained model. After the user data of a candidate user is obtained, the candidate user features can be determined from it. The candidate user features may include continuous and discrete user features: the continuous features may include the click-through rate, exposure click-through rate, exposure volume, conversion rate, and the like of the candidate user, and the discrete features may include the candidate user's age, gender, city, interest preferences, and the like.
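Continuous and discrete features usually need different encodings before they enter a neural network. The source does not describe its encoding, so the following is only a minimal sketch under common assumptions: continuous values are compressed with `log1p` so that large exposure counts do not dominate, and discrete values are mapped to integer ids for a later embedding lookup. The vocabularies and field names are hypothetical.

```python
import math

# Hypothetical per-field vocabularies; index 0 is reserved for unseen values.
VOCAB = {
    "gender": {"female": 1, "male": 2},
    "city": {"Hangzhou": 1, "Beijing": 2, "Shanghai": 3},
}


def encode_features(continuous: dict[str, float],
                    discrete: dict[str, str]) -> tuple[list[float], list[int]]:
    """Turn raw user features into dense values and categorical ids.

    Fields are processed in sorted-name order so every user gets the same layout.
    """
    dense = [math.log1p(max(v, 0.0)) for _, v in sorted(continuous.items())]
    ids = [VOCAB.get(field, {}).get(value, 0) for field, value in sorted(discrete.items())]
    return dense, ids


dense, ids = encode_features(
    {"click_rate": 0.02, "exposures": 150.0, "conversion_rate": 0.001},
    {"gender": "female", "city": "Hangzhou"},
)
print(ids)  # [1, 1]  (city, then gender)
```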
After the candidate user features are determined, the candidate user features may be input to a vector conversion model, a user behavior vector of the candidate user is determined by a first sub-model of the vector conversion model, and a behavior conversion vector corresponding to the candidate user is determined by a second sub-model.
In one embodiment of the present disclosure, the first sub-model and the second sub-model of the vector conversion model are both DSSM double-tower models and are trained as follows: a training sample set and an initial double-tower model are obtained, and the training sample set is input into the initial double-tower model, the training sample set comprising training user features and training recommended object features corresponding to training users; a behavior predicted value corresponding to the training user is determined by the first sub-model based on the training user features and the training recommended object features; a behavior conversion predicted value corresponding to the training user is determined by the second sub-model based on the same features; a loss function of the initial double-tower model is determined based on the behavior predicted value and the behavior conversion predicted value; and the model parameters of the first sub-model and of the second sub-model are updated synchronously through the loss function to obtain the vector conversion model. During training, the first sub-model and the second sub-model receive the same training data, namely the user-side features and the recommended-object-side features. The user-side features and the recommended-object-side features do not interact before the matching layer, so a user vector (User embedding) and a recommended object vector (Ad embedding) can be obtained separately.
The user-side features are user features extracted from user data, including user portrait data, user behavior data, and behavior conversion data. The recommended object features may be features of an exposed recommended object (e.g., an advertisement), and may include features extracted from the recommended object's own attribute information, such as the industry and category to which it belongs, its price range, the advertiser, the advertisement format, the advertisement plan identifier (ID), the advertiser's first-level and second-level industries, the historical click-through rate of the advertisement, and the historical click-through rate of the advertiser. The recommended object features may include continuous and discrete features of the recommended object: the continuous features may include the click-through rate, exposure click-through rate, exposure volume, conversion rate, and the like of the recommended object, and the discrete features may include the industry and category to which the recommended object belongs, and the like.
The training sample set may be the sample set used to train the vector conversion model, formed from the user data and recommended object data corresponding to the training users. The initial double-tower model may be a pre-constructed double-tower model used to train the dual-target Deep Structured Semantic Model (DSSM). The training user features may be the user features corresponding to the training users, including continuous and discrete training user features. The training recommended object features may be features of the exposed recommended objects, including continuous and discrete recommended object features.
Before the vector conversion model is trained, the training sample set and the initial double-tower model may be obtained, and the training sample set is input into the initial double-tower model, which is trained to obtain the vector conversion model. Specifically, the training user features and training recommended object features of the training users, namely the user continuous features, user discrete features, recommended object continuous features, and recommended object discrete features, are input into the first sub-model of the initial double-tower model, which performs the task of predicting the behavior predicted value. For example, when the exposed recommended object is an advertisement, the behavior predicted value may be the predicted click-through rate of the advertisement (pCTR).
Meanwhile, the same user continuous features, user discrete features, recommended object continuous features, and recommended object discrete features are input into the second sub-model of the initial double-tower model, which performs the task of predicting the behavior conversion predicted value. For example, when the exposed recommended object is an advertisement, the behavior conversion predicted value may be the predicted click conversion rate of the advertisement (pCVR).
With reference to fig. 3, after the first sub-model and the second sub-model output the behavior predicted value and the behavior transformation predicted value respectively, the obtained output results may be pushed to the matching layer for weighted calculation to obtain a target predicted value, and a loss function of the initial double-tower model is determined according to the target predicted value. And continuously adjusting the model parameters of the first sub-model and the second sub-model according to the determined loss function until the loss function is converged, and taking the corresponding model when the loss function is converged as a vector conversion model.
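The source says only that the matching layer combines the two outputs by a weighted calculation and that the loss is determined from the resulting target predicted value. One plausible reading, sketched below as an assumption, is a weighted sum of pCTR and pCVR scored with binary cross-entropy against the sample label (positive meaning clicked and converted); the weight `w` is hypothetical. Because the target depends on both outputs, the gradient of this loss reaches both sub-models, which is what allows their parameters to be updated synchronously.

```python
import math


def target_prediction(p_ctr: float, p_cvr: float, w: float = 0.5) -> float:
    """Matching-layer sketch: weighted combination of the two sub-model outputs (w is hypothetical)."""
    return w * p_ctr + (1.0 - w) * p_cvr


def bce_loss(p: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# One positive sample (clicked and converted) and one negative sample.
print(round(bce_loss(target_prediction(0.8, 0.6), 1), 4))   # 0.3567
print(round(bce_loss(target_prediction(0.1, 0.05), 0), 4))  # 0.078
```

A common alternative in multi-task recommendation is to compute one loss per target (click labels for pCTR, conversion labels for pCVR) and sum them with weights; which variant the disclosure uses is not stated here.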
Because the vector conversion model is introduced in the present disclosure to obtain user vector representations, it adopts a double-tower structure: from feature input to the generation of the user vector and the recommended object vector (i.e., the advertisement vector), the user-side features and the recommended-object-side features never interact. Existing multi-target models, by contrast, are trained to predict several targets directly, so in the scenario of predicting a user's advertisement click-through rate they cannot guarantee independence between user features and advertisement features, and a user vector cannot be obtained from the user features alone. The double-tower structure of the present disclosure guarantees independence between the user vector and the advertisement vector, so after training of the vector conversion model is completed, the user's conversion preference vector and click preference vector can each be obtained from it. Concatenating the two into the candidate user vector better serves similar user determination in advertising scenarios, helps discover potential users, and further reduces the conversion cost of recommendation behavior.
It should be noted that the model used to determine user vector representations is the trained vector conversion model. After the vector conversion model converges, the candidate user features may be input into it: the click preference vector of the candidate user is output by the first representation layer of the first sub-model, and the conversion preference vector is output by the second representation layer of the second sub-model. The click preference vector and the conversion preference vector are then concatenated to generate the candidate user vector corresponding to the candidate user.
In one embodiment of the disclosure, user behavior data of initial training users is obtained, and the initial user features and initial recommended object features corresponding to the initial training users are determined from the user behavior data, the initial user features comprising initial user portrait features and initial user behavior features; an initial training sample set, comprising a first sample subset and a second sample subset, is generated from the initial user portrait features, the initial user behavior features, and the initial recommended object features; the initial training sample set is sampled so that the ratio between the numbers of samples in the first sample subset and the second sample subset falls within a preset interval; and the sampled initial training sample set is used as the training sample set.
The initial training users may be users who were active in the recommendation system within a preset time period. The user behavior data of an initial training user may be data generated by the user's operations on recommended objects in the recommendation system, for example browsing, clicking, or submitting a form. The initial user features may be the user features corresponding to an initial training user and include initial user portrait features and initial user behavior features. The initial user portrait features may be the basic attributes and interest preferences of the initial training user and may be pre-stored in a database. The initial user behavior features may correspond to the initial training user's operations on exposed recommended objects (i.e., advertisements) within a preset time period, and may specifically be statistics of the behavior data. For example, the initial user behavior features may be statistics of the user's exposure, click, and conversion behaviors on different advertisements over the past 1, 3, 7, and 15 days. The initial recommended object features may be features of the recommended objects exposed within a preset time period.
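The windowed behavior statistics described above can be sketched as event counts over several look-back windows. This is a minimal sketch under stated assumptions: the `(date, event_type)` log format is hypothetical, a window of `w` days is taken to include today and the previous `w - 1` days, and counts are kept per user rather than split per advertisement.

```python
from datetime import date, timedelta

WINDOWS = (1, 3, 7, 15)  # look-back windows in days, as in the example above
EVENT_TYPES = ("exposure", "click", "conversion")


def window_counts(events, today: date) -> dict[str, int]:
    """Count exposures, clicks and conversions per look-back window.

    events: iterable of (event_date, event_type), event_type in EVENT_TYPES
    (a hypothetical log format). Returns keys such as "click_7d".
    """
    feats = {f"{t}_{w}d": 0 for w in WINDOWS for t in EVENT_TYPES}
    for event_date, event_type in events:
        age = (today - event_date).days  # 0 means the event happened today
        for w in WINDOWS:
            if 0 <= age < w:
                feats[f"{event_type}_{w}d"] += 1
    return feats


today = date(2021, 3, 31)
events = [
    (today, "exposure"),
    (today - timedelta(days=2), "click"),
    (today - timedelta(days=10), "conversion"),
]
feats = window_counts(events, today)
print(feats["click_3d"], feats["conversion_7d"], feats["conversion_15d"])  # 1 0 1
```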
The initial training sample set may be a sample set consisting of the features of the initial training users and the initial recommended object features, and may include a first sample subset and a second sample subset. The first sample subset may be the subset of positive samples, where a positive sample is a sample in which the user both clicked the advertisement and performed a conversion action (e.g., purchased the corresponding item). The second sample subset may be the subset of negative samples, where negative samples include samples in which the advertisement was not clicked as well as samples in which the advertisement was clicked but no conversion followed (e.g., no item in the advertisement was purchased). The sampling process may remove part of the samples in the initial training sample set. For example, in an advertising scenario the initial training sample set contains a large number of negative samples; to train the model better, the negative samples can be downsampled so that positive and negative samples are kept at a fixed ratio that better suits model training. The training sample set may be the sample set obtained by sampling the initial training sample set.
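The labeling rule above can be written as a small function that splits exposure records into the first (positive) and second (negative) sample subsets. The `Impression` record and its field names are hypothetical; only the rule itself (positive means clicked and converted, everything else is negative) comes from the text.

```python
from typing import Iterable, NamedTuple


class Impression(NamedTuple):
    """Hypothetical exposure record: one advertisement shown to one user."""
    user_id: str
    ad_id: str
    clicked: bool
    converted: bool


def label(imp: Impression) -> int:
    """1 only when the user both clicked and converted; every other exposure is 0."""
    return int(imp.clicked and imp.converted)


def split_samples(imps: Iterable[Impression]) -> tuple[list[Impression], list[Impression]]:
    """Return (first_subset, second_subset) = (positives, negatives)."""
    pos, neg = [], []
    for imp in imps:
        (pos if label(imp) else neg).append(imp)
    return pos, neg


logs = [
    Impression("u1", "ad1", clicked=True, converted=True),    # positive
    Impression("u2", "ad1", clicked=False, converted=False),  # negative: no click
    Impression("u3", "ad2", clicked=True, converted=False),   # negative: click, no conversion
]
pos, neg = split_samples(logs)
print(len(pos), len(neg))  # 1 2
```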
To obtain the initial training sample set, for example, the recommended objects that were exposed, clicked, or converted in the recommendation system during the most recent month may be taken as the training set; the corresponding user IDs are obtained from these recommended objects, and the users' portrait data and behavior data are obtained by user ID. Initial training data is then constructed from the user portrait data, the user behavior data, and the portrait data of the recommended objects. The initial user features of the initial training users can be determined from the user portrait data and user behavior data, and the initial recommended object features can be determined from the portrait data of the recommended objects, so that once the initial training set is determined, the corresponding initial user features and initial recommended object features can be attached to it to generate the initial training sample set.
In an advertising scenario, the ratio of positive to negative samples in the generated initial training sample set is generally about 1:500. At this ratio, the area under the receiver operating characteristic (ROC) curve (AUC) on the model's test set is usually 0.5, indicating that model training cannot converge. To train the model effectively, the initial training sample set can be sampled to adjust the ratio of positive to negative samples. For example, negative samples in the initial training sample set may be downsampled to bring the positive-to-negative ratio to 1:7; after this downsampling, the AUC on the model test set is about 0.68, and this experimental result shows that the model can learn useful information once negative samples are downsampled. Therefore, the sampled initial training sample set can be used as the training sample set to train the initial double-tower model and obtain the vector conversion model.
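Negative downsampling to a fixed ratio can be sketched as follows: keep every positive sample and draw a random subset of negatives so that positives to negatives is about 1:7. The sample records are hypothetical placeholders, and the fixed random seed is only there to make the sketch reproducible.

```python
import random


def downsample_negatives(positives: list, negatives: list,
                         neg_per_pos: int = 7, seed: int = 0) -> list:
    """Keep all positives and a random subset of negatives (ratio about 1:neg_per_pos)."""
    rng = random.Random(seed)
    keep = min(len(negatives), neg_per_pos * len(positives))
    data = positives + rng.sample(negatives, keep)
    rng.shuffle(data)  # avoid all positives appearing first
    return data


positives = [("pos", i) for i in range(10)]
negatives = [("neg", i) for i in range(5000)]  # 1:500, as in the example above
train = downsample_negatives(positives, negatives)
n_pos = sum(1 for kind, _ in train if kind == "pos")
print(n_pos, len(train) - n_pos)  # 10 70
```

Downsampling negatives inflates the model's predicted probabilities, so pCTR values read from such a model are not calibrated; this matters less when the model is used only to produce user vectors, as it is here.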
In one embodiment of the disclosure, the training user features and training recommended object features are input into the first sub-model, which determines a user behavior vector representation of the training user and a first recommended object vector; the cosine similarity between the user behavior vector representation and the first recommended object vector is determined as a first initial similarity; and the first initial similarity is amplified by a multiple to obtain a first similarity, from which the behavior predicted value is determined.
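The scoring step of the first sub-model (cosine similarity, amplification by a multiple k, then the sigmoid activation used in this section) can be sketched as below. The value k = 5 is one of the example values this section gives for k; the embeddings are toy values. The sketch also shows why the amplification helps: cosine similarity lies in [-1, 1], so without scaling the sigmoid output is confined to roughly [0.269, 0.731], while k = 5 widens the range to roughly [0.0067, 0.9933].

```python
import math

import numpy as np


def predicted_ctr(user_click_emb: np.ndarray, ad_click_emb: np.ndarray, k: float = 5.0) -> float:
    """pCTR = sigmoid(k * cos(user, ad)): cosine similarity, amplified k times, then sigmoid."""
    cos = float(user_click_emb @ ad_click_emb
                / (np.linalg.norm(user_click_emb) * np.linalg.norm(ad_click_emb)))
    return 1.0 / (1.0 + math.exp(-k * cos))


u = np.array([1.0, 0.0])
a = np.array([1.0, 0.0])
print(round(predicted_ctr(u, a), 4))         # 0.9933  (cos = 1, k = 5)
print(round(predicted_ctr(u, a, k=1.0), 4))  # 0.7311  (no amplification)
```

Assuming the second sub-model scores its own user and advertisement vectors in the same way, the same function would yield pCVR.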
The user behavior vector representation of the training user may be the vector obtained by inputting the training user features into the first sub-model and processing them according to the first sub-model's structure and parameters, i.e., the user's click preference vector, denoted User click embedding. The first recommended object vector may be the vector obtained by inputting the training recommended object features into the first sub-model and processing them in the same way, denoted Ad click embedding. The cosine similarity measures the similarity between the user behavior vector representation and the first recommended object vector, and the first initial similarity is this cosine similarity. The amplification may be the process of multiplying the first initial similarity by a factor, and the first similarity is the value obtained after this amplification.
The training user features further include user continuous features and user discrete features of the training user, and the training recommendation object features include recommendation object continuous features and recommendation object discrete features. Referring to fig. 4, fig. 4 schematically illustrates the structure of the first sub-model in the vector conversion model according to some embodiments of the present disclosure. The user continuous features, user discrete features, recommendation object continuous features and recommendation object discrete features are fed into the initial double-tower model through the first input layer of the first sub-model. In the first representation layer, the features pass through a neural network input layer (NN Input) and three hidden layers for vector representation processing, and the first representation layer outputs the user behavior vector representation and the first recommendation object vector, respectively. A behavior predicted value is then determined from the user behavior vector representation and the first recommendation object vector. For example, the recommendation object may be an advertisement, and the behavior predicted value may be the predicted click-through rate (pCTR) of the exposed advertisement.
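As a rough illustration of the double-tower layout described for fig. 4, the plain-Python sketch below runs a forward pass through two independent towers: the user tower only ever sees user-side features and the ad tower only ever sees recommendation-object-side features. It is a forward pass only (no training), and the layer sizes, ReLU activations, random initialization and the assumption that discrete features arrive already encoded as numbers are hypothetical choices, not details taken from the disclosure.

```python
import math
import random
from typing import List, Tuple

# A layer is (weight matrix, bias vector).
Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights scaled by 1/sqrt(n_in); training would update them."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_tower(input_dim: int, layer_dims: List[int], seed: int) -> List[Layer]:
    """Stack dense layers of the given output sizes on top of the input."""
    rng = random.Random(seed)
    dims = [input_dim] + layer_dims
    return [init_layer(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]


def tower_forward(tower: List[Layer], features: List[float]) -> List[float]:
    """Dense layers with ReLU, except the last one, whose output is the embedding."""
    x = list(features)
    for i, (weights, bias) in enumerate(tower):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        x = z if i == len(tower) - 1 else [max(0.0, v) for v in z]
    return x


# Hypothetical sizes: 8 user-side inputs and 6 ad-side inputs (continuous values
# plus already-encoded discrete features), three layers, 4-dim embeddings.
user_tower = build_tower(8, [16, 16, 4], seed=1)
ad_tower = build_tower(6, [16, 16, 4], seed=2)

user_click_embedding = tower_forward(user_tower, [0.5] * 8)  # user features only
ad_click_embedding = tower_forward(ad_tower, [1.0] * 6)      # ad features only
```

Because the towers share no inputs, the user embedding can later be computed without any recommendation object in hand, which is what makes the intermediate output usable as a standalone user vector.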
Specifically, the cosine similarity between the user behavior vector representation and the first recommendation object vector is calculated and taken as the first initial similarity. The first initial similarity is then amplified k times, where k may be, for example, 3, 5 or 7, to obtain the first similarity. Applying the sigmoid activation function to the first similarity yields pCTR, as shown in formula 2:
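$$\mathrm{pCTR} = \mathrm{sigmoid}\left(\mathrm{cosine}\langle \text{User click embedding}, \text{Ad click embedding} \rangle \cdot k\right) \qquad \text{(formula 2)}$$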
where cosine<x, y> denotes the cosine similarity between vector x and vector y, and k is the amplification multiple.
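Formula 2 (and, with the conversion embeddings, formula 3) can be written as a small function. The sketch below assumes plain Python lists as embeddings and uses k = 5 as a default only because 5 is one of the example values given above; the function names are hypothetical.

```python
import math
from typing import List


def cosine(x: List[float], y: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return dot / (nx * ny)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_rate(user_embedding: List[float], item_embedding: List[float], k: float = 5.0) -> float:
    """sigmoid(cosine<user, item> * k): pCTR with click embeddings, pCVR with conversion embeddings."""
    return sigmoid(cosine(user_embedding, item_embedding) * k)


p_ctr = predict_rate([1.0, 0.0], [1.0, 0.0])     # identical directions: sigmoid(5)
p_neutral = predict_rate([1.0, 0.0], [0.0, 1.0])  # orthogonal: sigmoid(0) = 0.5
```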
In one embodiment of the disclosure, the training user features and the training recommendation object features are input into a second sub-model, which determines a user conversion vector representation of the training user and a second recommendation object vector. The cosine similarity between the user conversion vector representation and the second recommendation object vector is determined as a second initial similarity. The second initial similarity is amplified by a multiple to obtain a second similarity, and a behavior conversion predicted value is determined from the second similarity.
The user conversion vector representation of the training user may be the vector that the second sub-model generates from the training user features according to its model structure and model parameters, that is, the conversion preference vector of the training user, and may be denoted User conversion embedding. The second recommendation object vector may be the vector that the second sub-model generates from the training recommendation object features according to its model structure and model parameters, and may be denoted Ad conversion embedding. The second initial similarity may be the cosine similarity between the user conversion vector representation and the second recommendation object vector, and the second similarity is the value obtained by amplifying the second initial similarity by a multiple.
Referring to fig. 5, fig. 5 schematically illustrates the structure of the second sub-model in the vector conversion model according to some embodiments of the present disclosure. The user continuous features, user discrete features, recommendation object continuous features and recommendation object discrete features are fed into the initial double-tower model through the second input layer of the second sub-model; the second representation layer processes them through a neural network input layer and three hidden layers and outputs the user conversion vector representation and the second recommendation object vector, respectively. A behavior conversion predicted value is then determined from the user conversion vector representation and the second recommendation object vector. The behavior conversion predicted value may be the click conversion rate of the recommendation object; since the recommendation object may be an advertisement, it may be the predicted click conversion rate (pCVR) of the advertisement.
After the user conversion vector representation and the second recommendation object vector are determined, their cosine similarity is taken as the second initial similarity, which is then amplified by the same multiple k as the first initial similarity to obtain the second similarity. Applying the sigmoid activation function to the second similarity yields pCVR, as shown in formula 3:
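$$\mathrm{pCVR} = \mathrm{sigmoid}\left(\mathrm{cosine}\langle \text{User conversion embedding}, \text{Ad conversion embedding} \rangle \cdot k\right) \qquad \text{(formula 3)}$$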
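Amplifying the cosine similarity between the vectors enlarges the vector distance between positive and negative samples, so that the trained user vectors have a stronger distance property, which also helps the model converge. Without amplification, the cosine similarity lies in [-1, 1], so the sigmoid output could only range from about 0.27 to 0.73; multiplying by k lets the predicted values move much closer to 0 for negative samples and to 1 for positive samples.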
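Because cosine similarity is bounded by [-1, 1], the reachable range of sigmoid(cosine · k) depends only on k. The short sketch below computes that range for k = 1 (no amplification) and for the example multiples 3, 5 and 7; the helper name output_range is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def output_range(k: float):
    """Smallest and largest value sigmoid(cosine * k) can take when cosine is in [-1, 1]."""
    return sigmoid(-k), sigmoid(k)


for k in (1, 3, 5, 7):
    lo, hi = output_range(k)
    print(f"k={k}: predictions confined to [{lo:.4f}, {hi:.4f}]")
```

With k = 1 even a perfectly aligned pair cannot be scored above roughly 0.73, so the cross-entropy loss on positive samples can never approach zero; a larger k removes that ceiling.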
In one embodiment of the disclosure, a first loss function is determined based on the behavior prediction value, and a first weight corresponding to the first loss function is determined; determining a second loss function based on the behavior conversion predicted value, and determining a second weight corresponding to the second loss function; and performing weighted summation processing on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
Wherein the first loss function may be a loss function corresponding to the pCTR prediction task. The first weight may be a calculated weight corresponding to the first loss function. The second loss function may be a loss function corresponding to the pCVR prediction task. The second weight may be a calculated weight corresponding to the second loss function.
The overall loss function of the model may be a weighted sum of the loss functions of the pCTR prediction task and the pCVR prediction task; both task losses are cross-entropy losses, calculated as shown in formula 4.
$$\mathrm{Loss} = w_{ctr} \cdot \mathrm{loss}(y_i, \mathrm{pCTR}) + w_{cvr} \cdot \mathrm{loss}(y_i \,\&\, z_i, \mathrm{pCTR} \times \mathrm{pCVR}) \qquad \text{(formula 4)}$$
where loss(y_i, pCTR) denotes the loss function of the pCTR task, i.e., the first loss function; loss(y_i & z_i, pCTR × pCVR) denotes the loss function of the pCVR task, i.e., the second loss function; y_i indicates whether the sample was clicked; z_i indicates whether it was converted; w_ctr is the weight of the pCTR task loss, i.e., the first weight; and w_cvr is the weight of the pCVR task loss, i.e., the second weight.
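Formula 4 can be sketched per sample as below, assuming binary cross-entropy for loss(·,·) as stated above. The default weights of 1.0, the clipping constant eps, and the function names are hypothetical; the disclosure does not give concrete weight values.

```python
import math
from typing import Iterable, Tuple


def bce(label: int, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def joint_loss(clicked: int, converted: int, p_ctr: float, p_cvr: float,
               w_ctr: float = 1.0, w_cvr: float = 1.0) -> float:
    """w_ctr * loss(y_i, pCTR) + w_cvr * loss(y_i & z_i, pCTR * pCVR)."""
    ctr_loss = bce(clicked, p_ctr)
    clicked_and_converted = 1 if (clicked and converted) else 0
    cvr_loss = bce(clicked_and_converted, p_ctr * p_cvr)
    return w_ctr * ctr_loss + w_cvr * cvr_loss


def batch_loss(samples: Iterable[Tuple[int, int, float, float]],
               w_ctr: float = 1.0, w_cvr: float = 1.0) -> float:
    """Mean joint loss over (clicked, converted, pCTR, pCVR) tuples."""
    samples = list(samples)
    return sum(joint_loss(*s, w_ctr=w_ctr, w_cvr=w_cvr) for s in samples) / len(samples)
```

Supervising pCTR × pCVR with the label y_i & z_i means the conversion tower is trained on all exposed samples rather than on clicked samples only.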
After the trained vector conversion model is obtained through the above steps, the user data of the candidate users can be input into it, and an intermediate output of the model is taken as the candidate user vector. Referring to fig. 6, fig. 6 schematically illustrates obtaining a user vector representation according to some embodiments of the present disclosure. Because the vector conversion model has a double-tower structure, the user-side features and the recommendation-object-side features do not interact at all while vectors are being generated, so the candidate user features can be input into the model on their own. They are passed through the first input layer to the first representation layer, whose neural network input layer and three hidden layers convert them into an intermediate output, namely the user behavior vector; at the same time, they are passed through the second input layer to the second representation layer, whose neural network input layer and three hidden layers output the behavior conversion vector.
For the specific process of obtaining candidate user vectors with the trained vector conversion model, refer to fig. 7, which schematically illustrates a data flow diagram of determining user vectors with a DSSM model trained on the dual objectives of user behavior and behavior conversion, according to some embodiments of the present disclosure. In step S710, exposure, click and conversion samples within a preset time period are acquired as an initial sample set. In step S720, sample cleaning and positive/negative sampling are performed on the initial sample set to obtain a training sample set. In step S730, the training sample set is input into the initial double-tower model, which is trained to obtain the vector conversion model. In step S740, the candidate user data 710 of the candidate users is input into the vector conversion model to obtain their candidate user vectors.
In an embodiment of the present disclosure, in some application scenarios the vector conversion model may be retrained every preset time period (for example, 30 days). After each retraining, the candidate user data, for example the user data of monthly active users, is input into the retrained vector conversion model; the representation layers of the model output the user behavior vector and the behavior conversion vector of each candidate user, and the two are spliced to obtain the user vector. Updating the vector conversion model periodically in this way keeps the candidate user vectors up to date.
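The splicing and periodic refresh described above can be sketched as follows. The sketch assumes "splicing" means plain concatenation of the two vectors, and it stands in a hypothetical toy_encoder for the two trained sub-models; the Encoder signature and the "plays" field are illustrative assumptions only.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical signature: takes one candidate user's data and returns
# (user behavior vector, behavior conversion vector) from the two sub-models.
Encoder = Callable[[dict], Tuple[Vector, Vector]]


def splice(behavior_vec: Vector, conversion_vec: Vector) -> Vector:
    """Concatenate the behavior vector and the behavior conversion vector."""
    return list(behavior_vec) + list(conversion_vec)


def refresh_candidate_vectors(encode: Encoder, users: Dict[str, dict]) -> Dict[str, Vector]:
    """Rebuild every candidate user vector; intended to run after each retraining."""
    return {uid: splice(*encode(data)) for uid, data in users.items()}


def toy_encoder(data: dict) -> Tuple[Vector, Vector]:
    # Stand-in for the trained model; a real encoder would run both user towers.
    x = float(data["plays"])
    return [x, 1.0], [0.5 * x, 0.0]


vectors = refresh_candidate_vectors(toy_encoder, {"u1": {"plays": 2}, "u2": {"plays": 4}})
```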
In one embodiment of the present disclosure, seed user data corresponding to a seed user is obtained; the seed user data includes seed user identification; determining a seed user vector corresponding to the seed user from the candidate user vectors according to the seed user identification; or acquiring a vector conversion model constructed in advance; and carrying out vector conversion processing on the seed user data through a vector conversion model to generate a seed user vector.
Wherein the seed user may be a portion of users that are predetermined for making similar user determinations. The seed user data may be user data corresponding to the seed user, and the seed user data may include data of basic attributes of the seed user and behavior data of the seed user acting on the recommended object. The seed user identifier may be a unique identifier corresponding to a seed user, and the seed user identifier may uniquely determine a seed user. The seed user vector may be a vector obtained by performing feature extraction on seed user data and performing vector transformation on the extracted seed user features, and the seed user vector may also be an N-dimensional vector.
Each candidate user may be assigned a unique user identifier that distinguishes the candidate user's identity. The candidate user data of all candidate users may be input into the vector conversion model to obtain their candidate user vectors, so that each candidate user identifier corresponds to one candidate user vector. Because the seed users may be a subset of the candidate users, once the seed user data is obtained, each seed user identifier can be matched against the candidate user identifiers and the corresponding candidate user vector taken as the seed user vector; in this way the seed user vectors of all seed users are obtained.
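The identifier lookup can be sketched as a dictionary lookup. The optional encode_missing callback is a hypothetical hook corresponding to the alternative route of running the vector conversion model on the seed user data directly; skipping unknown seeds when no callback is given is likewise an assumption of this sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional

Vector = List[float]


def collect_seed_vectors(
    seed_ids: Iterable[str],
    candidate_vectors: Dict[str, Vector],
    encode_missing: Optional[Callable[[str], Vector]] = None,
) -> Dict[str, Vector]:
    """Look seeds up by identifier; optionally encode seeds absent from the candidate pool."""
    seed_vectors: Dict[str, Vector] = {}
    for uid in seed_ids:
        if uid in candidate_vectors:
            seed_vectors[uid] = candidate_vectors[uid]
        elif encode_missing is not None:
            # Fallback: compute the vector with the model instead of looking it up.
            seed_vectors[uid] = encode_missing(uid)
        # Without a fallback, seeds that are not candidates are skipped.
    return seed_vectors


candidates = {"u1": [0.1, 0.9], "u2": [0.8, 0.2], "u3": [0.5, 0.5]}
seeds = collect_seed_vectors(["u2", "u3", "u9"], candidates)
```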
In addition, after the seed user data is obtained, it may be input into the vector conversion model, which performs vector conversion processing on it: the first sub-model and the second sub-model output the seed user behavior vector and the seed user behavior conversion vector of the seed user, respectively, and the two are spliced to obtain the seed user vector. The process of determining the seed user behavior vector and the seed user behavior conversion vector through the vector conversion model is the same as that for the user behavior vector and behavior conversion vector of the candidate users, and is not repeated in this disclosure.
Referring to fig. 8, fig. 8 schematically illustrates a process of determining similar users based on seed users, according to some embodiments of the present disclosure. In step S810, the candidate user vectors of the candidate users are acquired. In step S820, the seed users are determined. In step S830, the seed user vectors of the seed users are obtained; for example, they may be taken from the candidate user vectors according to the seed user identifiers. In step S840, the seed user vectors are clustered to obtain a plurality of cluster centers. In step S850, the center weight of each cluster center is determined. In step S860, a predefined user expansion condition 810 is obtained; the average similarity between each candidate user vector and the cluster center vectors is calculated and used as the similarity between that candidate user vector and the seed user vectors; and similar users are determined from the candidate users according to the user expansion condition 810 and the similarity ranking result.
In one embodiment of the present disclosure, a seed user vector is clustered to obtain a plurality of clustering centers corresponding to the seed user vector; determining a clustering center vector corresponding to each clustering center; and calculating the average similarity between the candidate user vector and each cluster center vector to determine the similarity.
The cluster center can be the center of a plurality of different cluster clusters obtained after the seed user vectors are clustered, and the seed users in each cluster have stronger similarity. The cluster center vector may be a vector representation corresponding to the cluster center, and the cluster center vector may represent the relevant features of the cluster center. The average similarity may be an average of similarities between the plurality of cluster center vectors and the candidate user vectors, respectively. The similarity may be a similarity between the candidate user vector and the seed user vector.
A plurality of seed users may form a seed audience package, which typically contains thousands to tens of thousands of seed users. All seed users are represented by K cluster center vectors obtained by clustering, and the similarity between a candidate user and the seed user package is measured by the similarity between the candidate user and these K cluster centers. Therefore, after the seed user vectors are obtained, they can be clustered into K cluster center vectors that represent the seed audience; K may be, for example, 10 or 20. The specific steps are as follows:
(1) acquiring seed user data, and determining a seed user vector of a seed user through a vector conversion model; or searching a seed user vector from the candidate user vectors;
(2) clustering seed user vectors of seed users by using a K-means algorithm, wherein K clustering center vectors are used for representing the seed users;
(3) calculating the cosine similarity between the candidate user vector and each of the K cluster center vectors of the seed users, and taking the average of the K cosine similarities as the similarity between the candidate user and the seed users, as shown in formula 5.
$$\mathrm{score}_{user\_i} = \frac{1}{K} \sum_{j=1}^{K} \mathrm{cosine\_sim}\langle \mathrm{embedding}_{user\_i}, \mathrm{lookalike}_j \rangle \qquad \text{(formula 5)}$$
where score_user_i denotes the similarity between candidate user i and the seed users; embedding_user_i denotes the n-dimensional vector representation of candidate user i; lookalike_j denotes the n-dimensional vector representation of cluster center j; and cosine_sim<x, y> denotes the cosine similarity between vector x and vector y.
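Steps (2) and (3) can be sketched together: a plain Lloyd's K-means over the seed user vectors, followed by formula 5 as the mean cosine similarity to the K centers. This is a small illustrative implementation, not the disclosure's code; the random initialization, iteration cap, empty-cluster handling and toy data are assumptions, and a production system would more likely use an optimized K-means library.

```python
import math
import random
from typing import List

Vector = List[float]


def cosine_sim(x: Vector, y: Vector) -> float:
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 0.0 if nx == 0.0 or ny == 0.0 else dot / (nx * ny)


def mean_vector(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Lloyd's K-means with squared Euclidean distance; returns the K center vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean_vector(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def lookalike_score(candidate: Vector, centers: List[Vector]) -> float:
    """Formula 5: mean cosine similarity between a candidate and the K cluster centers."""
    return sum(cosine_sim(candidate, c) for c in centers) / len(centers)


seed_vectors = [[1.0, 0.0], [1.1, 0.0], [0.0, 1.0], [0.0, 1.1]]
centers = kmeans(seed_vectors, k=2)
score = lookalike_score([1.0, 1.0], centers)
```

Scoring against K centers instead of every seed user keeps the cost per candidate at K similarity computations, independent of how many thousands of seed users the package contains.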
In one embodiment of the disclosure, a user expansion condition is obtained, and the number of similar users to be expanded is determined according to the user expansion condition; determining a ranking result of similarities between the plurality of candidate user vectors and the seed user vector; and determining similar users from the candidate users according to the sorting result.
The user expansion condition may be the condition according to which similar users of the seed users are determined from the candidate users; for example, it may specify the approximate number of similar users to be determined this time. The number of similar users is the specific number of similar users to be determined this time and may be denoted n. The ranking result may be the ordering of the similarities between all candidate users and the seed users, for example from high to low. The similar users are those candidate users who share certain characteristics with the seed users.
After the user expansion condition is obtained, the number n of similar users is determined from it, and the n candidate users with the highest similarity scores in the ranking are selected as similar users of the seed users, i.e., as the Look-alike audience expansion result, so that objects can subsequently be recommended to these similar users.
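Selecting the top n candidates by similarity can be sketched with heapq.nlargest. The optional exclude parameter, for example to drop the seed users themselves from the expansion result, is an assumption of this sketch; the disclosure does not state whether seed users are excluded.

```python
import heapq
from typing import Dict, Iterable, List


def expand_similar_users(scores: Dict[str, float], n: int, exclude: Iterable[str] = ()) -> List[str]:
    """Return the n candidate ids with the highest similarity scores, best first."""
    excluded = set(exclude)
    pool = [(score, uid) for uid, score in scores.items() if uid not in excluded]
    return [uid for _, uid in heapq.nlargest(n, pool)]


scores = {"a": 0.91, "b": 0.20, "c": 0.55, "d": 0.74}
top_two = expand_similar_users(scores, 2)
```

heapq.nlargest avoids fully sorting all candidates, which matters when n is small compared with a candidate pool of millions of users.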
Exemplary model
In a second aspect of embodiments of the present disclosure, there is provided a vector conversion model, comprising: a first sub-model, used to determine a behavior predicted value for a training user according to the training user features and the training recommendation object features of the training user; a second sub-model, independent of the first sub-model, used to determine a behavior conversion predicted value for the training user according to the training user features and the training recommendation object features; and a matching layer, used to perform weighted summation of the behavior predicted value and the behavior conversion predicted value to obtain the model output value of the vector conversion model, and to synchronously update the model parameters of the first sub-model and of the second sub-model by back-propagation according to the model output value.
In one embodiment of the present disclosure, the first submodel includes: the first input layer is used for receiving the training user characteristics and the training recommendation object characteristics; the first representation layer is used for carrying out first conversion processing on the training user characteristics and the training recommendation object characteristics to obtain user behavior vector representations and first recommendation object vectors corresponding to training users; and determining a first similarity between the user behavior vector representation and the first recommendation object vector, so that the matching layer determines a behavior prediction value according to the first similarity.
In one embodiment of the present disclosure, the second submodel includes: the second input layer is used for receiving the training user characteristics and the training recommendation object characteristics; the second representation layer is used for carrying out second conversion processing on the training user characteristics and the training recommendation object characteristics to obtain user conversion vector representations and second recommendation object vectors corresponding to the training users; and determining a second similarity between the user conversion vector representation and the second recommended object vector so that the matching layer determines the behavior conversion predicted value according to the second similarity.
Exemplary devices
Having described the method of the exemplary embodiment of the present disclosure, next, a similar user determination device of the exemplary embodiment of the present disclosure will be explained with reference to fig. 9.
In fig. 9, the similar user determining apparatus 900 may include a vector conversion module 910, a vector generation module 920, a seed user vector determining module 930, and a similar user determining module 940, wherein:
the vector conversion module 910 is configured to obtain user data of a candidate user, and perform vector conversion processing on the user data to generate a corresponding user behavior vector and a corresponding behavior conversion vector; a vector generating module 920, configured to generate a candidate user vector corresponding to the candidate user based on the user behavior vector and the behavior transformation vector; a seed user vector determining module 930, configured to determine a seed user vector corresponding to a seed user; a similar user determining module 940, configured to calculate similarity between the candidate user vector and the seed user vector, so as to determine a similar user corresponding to the seed user from the candidate users according to the similarity.
In one embodiment of the present disclosure, the vector conversion module includes a vector conversion unit configured to: acquire a pre-constructed vector conversion model, the vector conversion model including a first sub-model and a second sub-model; determine candidate user features of the candidate users according to their user data; and input the candidate user features into the vector conversion model to determine a user behavior vector through the first sub-model and a behavior conversion vector through the second sub-model.
In one embodiment of the present disclosure, the vector conversion module further includes a model training unit, the model training unit including: the data input subunit is used for acquiring a training sample set and acquiring an initial double-tower model so as to input the training sample set to the initial double-tower model; the training sample set comprises training user characteristics and training recommendation object characteristics corresponding to training users; the first training subunit is used for determining a behavior predicted value corresponding to a training user through a first submodel based on the training user characteristics and the training recommendation object characteristics; the second training subunit is used for determining a behavior conversion predicted value corresponding to the training user through a second submodel based on the training user characteristic and the training recommendation object characteristic; the loss function determining subunit is used for determining a loss function of the initial double-tower model based on the behavior predicted value and the behavior conversion predicted value; and the model training subunit is used for synchronously updating the model parameters of the first sub-model and the model parameters of the second sub-model through the loss function so as to obtain the vector conversion model.
In one embodiment of the present disclosure, the vector conversion module further includes a sample set determination unit configured to: acquire user behavior data of initial training users, and determine the initial user features and initial recommendation object features of the initial training users from the user behavior data, the initial user features including initial user portrait features and initial user behavior features; generate an initial training sample set from the initial user portrait features, the initial user behavior features and the initial recommendation object features, the initial training sample set including a first sample subset and a second sample subset; sample the initial training sample set so that the ratio between the number of samples in the first sample subset and in the second sample subset falls within a preset interval; and take the sampled initial training sample set as the training sample set.
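A minimal sketch of the sampling step, assuming the first and second sample subsets are the positive and negative samples mentioned in step S720, that the ratio is positives to negatives, and that random downsampling of the over-represented subset is used. The interval bounds and function name are hypothetical, and integer rounding can leave the ratio slightly outside a very narrow interval.

```python
import math
import random
from typing import List, Sequence, Tuple


def rebalance(positives: Sequence, negatives: Sequence,
              min_ratio: float, max_ratio: float, seed: int = 0) -> Tuple[List, List]:
    """Randomly downsample one subset so len(pos) / len(neg) moves into [min_ratio, max_ratio]."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    if not pos or not neg:
        return pos, neg  # nothing to rebalance
    ratio = len(pos) / len(neg)
    if ratio < min_ratio:
        # Too many negatives: keep about len(pos) / min_ratio of them.
        neg = rng.sample(neg, max(1, math.floor(len(pos) / min_ratio)))
    elif ratio > max_ratio:
        # Too many positives: keep about max_ratio * len(neg) of them.
        pos = rng.sample(pos, max(1, math.floor(max_ratio * len(neg))))
    return pos, neg


p, n = rebalance(list(range(10)), list(range(100, 1100)), 0.25, 0.5)
```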
In one embodiment of the disclosure, the first training subunit is configured to: inputting the training user characteristics and the training recommendation object characteristics into a first sub-model, and determining the user behavior vector representation and the first recommendation object vector of the training user by the first sub-model; determining cosine similarity between the user behavior vector representation and the first recommended object vector as first initial similarity; and performing multiple amplification processing on the first initial similarity to obtain a first similarity, and determining a behavior predicted value according to the first similarity.
In one embodiment of the disclosure, the second training subunit is configured to: input the training user features and the training recommendation object features into a second sub-model, and determine a user conversion vector representation and a second recommendation object vector of the training user by the second sub-model; determine the cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity; and amplify the second initial similarity by a multiple to obtain a second similarity, and determine a behavior conversion predicted value according to the second similarity.
In one embodiment of the present disclosure, the loss function determination subunit is configured to: determining a first loss function based on the behavior predicted value, and determining a first weight corresponding to the first loss function; determining a second loss function based on the behavior conversion predicted value, and determining a second weight corresponding to the second loss function; and performing weighted summation processing on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
In one embodiment of the disclosure, the seed user vector determination module is configured to: acquire seed user data corresponding to the seed users, the seed user data including seed user identifiers; determine the seed user vectors corresponding to the seed users from the candidate user vectors according to the seed user identifiers; or acquire a pre-constructed vector conversion model and perform vector conversion processing on the seed user data through the vector conversion model to generate the seed user vectors.
In one embodiment of the present disclosure, the similar user determination module includes a similarity determination unit configured to: clustering the seed user vectors to obtain a plurality of clustering centers corresponding to the seed user vectors; determining a clustering center vector corresponding to each clustering center; and calculating the average similarity between the candidate user vector and each cluster center vector to determine the similarity.
In one embodiment of the disclosure, the similar user determination module comprises a similar user determination unit configured to: acquiring a user expansion condition, and determining the number of similar users to be expanded according to the user expansion condition; determining a ranking result of similarities between the plurality of candidate user vectors and the seed user vector; and determining similar users from the candidate users according to the sorting result.
Since each functional module of the similar user determining apparatus in the exemplary embodiment of the present disclosure corresponds to the step of the exemplary embodiment of the similar user determining method, for details that are not disclosed in the embodiment of the apparatus of the present disclosure, please refer to the embodiment of the similar user determining method described above in the present disclosure, and details are not repeated here.
It should be noted that although several modules or units of the similar user determining apparatus are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among, and embodied by, a plurality of modules or units.
In a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a similar user determination method as described above in the first aspect.
Exemplary Medium
Having described the apparatus of the exemplary embodiments of the present disclosure, a storage medium of an exemplary embodiment of the present disclosure will now be described with reference to fig. 10.
In some embodiments, aspects of the present disclosure may also be implemented as a medium storing program code which, when executed by a processor of a device, implements the steps of the similar user determination method according to the various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
For example, when the processor of the device executes the program code, it may implement the steps shown in fig. 2:
- step S210, acquiring user data of candidate users and performing vector conversion processing on the user data to generate the corresponding user behavior vectors and behavior conversion vectors;
- step S220, generating candidate user vectors corresponding to the candidate users based on the user behavior vectors and the behavior conversion vectors;
- step S230, determining a seed user vector corresponding to a seed user;
- step S240, calculating the similarity between each candidate user vector and the seed user vector, so as to determine, according to the similarity, the similar users corresponding to the seed user from the candidate users.
Referring to fig. 10, a program product 1000 for implementing the above similar user determination method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto.
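Steps S210 to S240 can be illustrated end to end with a toy sketch. This is a simplified illustration under stated assumptions, not the patented implementation:

- The two sub-models of the vector conversion model are stood in for by plain callables (`behavior_tower`, `conversion_tower`).
- The candidate user vector is formed by concatenating the two output vectors; the text only says it is generated "based on" them.
- Similarity is cosine similarity.

All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_candidate_vectors(
    user_data: Dict[str, dict],
    behavior_tower: Callable[[dict], Vector],
    conversion_tower: Callable[[dict], Vector],
) -> Dict[str, Vector]:
    """S210 + S220: run both stand-in towers and concatenate their outputs.

    Concatenation is an assumption made for this sketch.
    """
    return {
        uid: behavior_tower(data) + conversion_tower(data)
        for uid, data in user_data.items()
    }


def rank_candidates(
    candidates: Dict[str, Vector], seed_vector: Sequence[float]
) -> List[Tuple[str, float]]:
    """S240: score every candidate against the seed vector, highest first."""
    scored = [(uid, cosine(vec, seed_vector)) for uid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this toy setting, step S230 can simply reuse the seed user's own entry in the candidate vectors as the seed vector (one of the options in claim 4). The ranked list is then what the ranking step cuts down to the requested number of similar users.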
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. The readable signal medium may also be any readable medium other than a readable storage medium.
Program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary Computing Device
Having described the similar user determination method, the vector conversion model, the similar user determination apparatus, and the storage medium of the exemplary embodiment of the present disclosure, next, an electronic device of the exemplary embodiment of the present disclosure will be explained with reference to fig. 11.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the disclosure may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."
In some possible embodiments, an electronic device according to the present disclosure may include at least one processing unit and at least one storage unit. The storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform the steps of the similar user determination method according to the various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
For example, the processing unit may perform the steps shown in fig. 2:
- step S210, acquiring user data of candidate users and performing vector conversion processing on the user data to generate the corresponding user behavior vectors and behavior conversion vectors;
- step S220, generating candidate user vectors corresponding to the candidate users based on the user behavior vectors and the behavior conversion vectors;
- step S230, determining a seed user vector corresponding to a seed user;
- step S240, calculating the similarity between each candidate user vector and the seed user vector, so as to determine, according to the similarity, the similar users corresponding to the seed user from the candidate users.
An electronic device 1100 according to an exemplary embodiment of the disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 takes the form of a general-purpose computing device. The components of the electronic device 1100 may include, but are not limited to: at least one processing unit 1101, at least one storage unit 1102, a bus 1103 connecting the different system components (including the storage unit 1102 and the processing unit 1101), and a display unit 1107.
Bus 1103 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 1121 and/or cache memory 1122, and may further include Read-Only Memory (ROM) 1123.
The storage unit 1102 may also include a program/utility 1125 having a set (at least one) of program modules 1124, such program modules 1124 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which or some combination thereof may comprise an implementation of a network environment.
The electronic device 1100 may also communicate with several kinds of devices:
- one or more external devices 1104 (e.g., a keyboard, a pointing device, or a Bluetooth device);
- one or more devices that enable a user to interact with the electronic device 1100;
- any device (e.g., a router or modem) that enables the electronic device 1100 to communicate with one or more other computing devices.

Such communication may occur via input/output (I/O) interfaces 1105. The electronic device 1100 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 1106. As shown, the network adapter 1106 communicates with the other modules of the electronic device 1100 over the bus 1103. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
It should be noted that although several units/modules or sub-units/modules of the similar user determination apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. According to embodiments of the present disclosure, the features and functions of two or more of the units/modules described above may be embodied in a single unit/module; conversely, the features and functions of one unit/module described above may be further divided among a plurality of units/modules.
Further, although the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into a single step, and/or a single step may be broken down into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed. Nor does the division into aspects mean that the features in those aspects cannot be combined to advantage; that division is for convenience of presentation only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (17)

1. A method for similar user determination, comprising:
acquiring user data of candidate users, and performing vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; the user data comprises behavior data and behavior conversion data;
generating a candidate user vector corresponding to the candidate user based on the user behavior vector and the behavior conversion vector;
determining a seed user vector corresponding to a seed user;
calculating the similarity between the candidate user vector and the seed user vector so as to determine a similar user corresponding to the seed user from the candidate users according to the similarity;
wherein performing vector conversion processing on the user data to generate the corresponding user behavior vectors and behavior conversion vectors comprises:
acquiring a pre-constructed vector conversion model, the vector conversion model comprising a first sub-model and a second sub-model;
determining candidate user features of the candidate user according to the user data of the candidate user;
inputting the candidate user features into the vector conversion model, so that the first sub-model determines the user behavior vector and the second sub-model determines the behavior conversion vector;
wherein the vector conversion model is obtained by training through the following steps:
acquiring a training sample set and an initial two-tower model, and inputting the training sample set into the initial two-tower model, the training sample set comprising training user features and training recommendation object features corresponding to training users;
determining, through the first sub-model, a behavior predicted value corresponding to the training user based on the training user features and the training recommendation object features;
determining, through the second sub-model, a behavior conversion predicted value corresponding to the training user based on the training user features and the training recommendation object features;
determining a loss function of the initial two-tower model based on the behavior predicted value and the behavior conversion predicted value;
synchronously updating the model parameters of the first sub-model and the model parameters of the second sub-model through the loss function to obtain the vector conversion model;
wherein determining, through the first sub-model, the behavior predicted value corresponding to the training user based on the training user features and the training recommendation object features comprises:
inputting the training user features and the training recommendation object features into the first sub-model, the first sub-model determining a user behavior vector representation of the training user and a first recommendation object vector;
determining the cosine similarity between the user behavior vector representation and the first recommendation object vector as a first initial similarity;
amplifying the first initial similarity by a multiple to obtain a first similarity, and determining the behavior predicted value according to the first similarity;
wherein determining, through the second sub-model, the behavior conversion predicted value corresponding to the training user based on the training user features and the training recommendation object features comprises:
inputting the training user features and the training recommendation object features into the second sub-model, the second sub-model determining a user conversion vector representation of the training user and a second recommendation object vector;
determining the cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity;
and amplifying the second initial similarity by a multiple to obtain a second similarity, and determining the behavior conversion predicted value according to the second similarity.
2. The method of claim 1, wherein before acquiring the training sample set, the method further comprises:
acquiring user behavior data of initial training users, and determining, according to the user behavior data, initial user features and initial recommendation object features corresponding to the initial training users, the initial user features comprising initial user profile features and initial user behavior features;
generating an initial training sample set according to the initial user profile features, the initial user behavior features and the initial recommendation object features, the initial training sample set comprising a first sample subset and a second sample subset;
performing sampling processing on the initial training sample set so that the quantity ratio between the first sample subset and the second sample subset falls within a preset value interval;
and taking the sampled initial training sample set as the training sample set.
3. The method of claim 1, wherein determining the loss function of the initial two-tower model based on the behavior predicted value and the behavior conversion predicted value comprises:
determining a first loss function based on the behavior predicted value, and determining a first weight corresponding to the first loss function;
determining a second loss function based on the behavior conversion predicted value, and determining a second weight corresponding to the second loss function;
and performing weighted summation processing on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
4. The method of claim 1, wherein determining the seed user vector corresponding to the seed user comprises:
acquiring seed user data corresponding to the seed user, the seed user data comprising a seed user identifier;
determining, according to the seed user identifier, the seed user vector corresponding to the seed user from the candidate user vectors; or
acquiring a pre-constructed vector conversion model;
and carrying out vector conversion processing on the seed user data through the vector conversion model to generate the seed user vector.
5. The method of claim 1, wherein the calculating the similarity between the candidate user vector and the seed user vector comprises:
clustering the seed user vectors to obtain a plurality of cluster centers corresponding to the seed user vectors;
determining a cluster center vector corresponding to each cluster center;
and calculating the average similarity between the candidate user vector and the cluster center vectors to determine the similarity.
6. The method according to claim 1, wherein the determining similar users corresponding to the seed user from the candidate users according to the similarity comprises:
acquiring a user expansion condition, and determining the number of similar users to be expanded according to the user expansion condition;
ranking the similarities between the plurality of candidate user vectors and the seed user vector to obtain a ranking result;
and determining the similar users from the plurality of candidate users according to the ranking result.
7. A vector conversion model, comprising:
a first sub-model, configured to determine a behavior predicted value corresponding to a training user according to training user features and training recommendation object features of the training user;
a second sub-model, configured to determine a behavior conversion predicted value corresponding to the training user according to the training user features and the training recommendation object features, the second sub-model being independent of the first sub-model;
a matching layer, configured to perform weighted summation on the behavior predicted value and the behavior conversion predicted value to obtain a model output value of the vector conversion model, and to synchronously update the model parameters of the first sub-model and the second sub-model by back-propagation according to the model output value;
wherein the first sub-model is further configured to determine a user behavior vector representation of the training user according to the training user features and to determine a first recommendation object vector according to the training recommendation object features;
determine the cosine similarity between the user behavior vector representation and the first recommendation object vector as a first initial similarity;
amplify the first initial similarity by a multiple to obtain a first similarity, and determine the behavior predicted value according to the first similarity;
the second sub-model is further configured to determine a user conversion vector representation of the training user according to the training user features and to determine a second recommendation object vector according to the training recommendation object features;
determine the cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity;
and amplify the second initial similarity by a multiple to obtain a second similarity, and determine the behavior conversion predicted value according to the second similarity.
8. The model of claim 7, wherein the first sub-model comprises:
a first input layer, configured to receive the training user features and the training recommendation object features;
a first representation layer, configured to perform first conversion processing on the training user features and the training recommendation object features to obtain the user behavior vector representation and the first recommendation object vector corresponding to the training user, and to determine the first similarity between the user behavior vector representation and the first recommendation object vector, so that the matching layer determines the behavior predicted value according to the first similarity.
9. The model of claim 7, wherein the second sub-model comprises:
a second input layer, configured to receive the training user features and the training recommendation object features;
a second representation layer, configured to perform second conversion processing on the training user features and the training recommendation object features to obtain the user conversion vector representation and the second recommendation object vector corresponding to the training user, and to determine the second similarity between the user conversion vector representation and the second recommendation object vector, so that the matching layer determines the behavior conversion predicted value according to the second similarity.
10. A similar user determination device, comprising:
the vector conversion module is used for acquiring user data of candidate users and performing vector conversion processing on the user data to generate corresponding user behavior vectors and behavior conversion vectors; the user data comprises behavior data and behavior conversion data;
the vector generation module is used for generating a candidate user vector corresponding to the candidate user based on the user behavior vector and the behavior conversion vector;
the seed user vector determining module is used for determining a seed user vector corresponding to a seed user;
a similar user determining module, configured to calculate a similarity between the candidate user vector and the seed user vector, so as to determine, according to the similarity, a similar user corresponding to the seed user from the candidate users;
the vector conversion module comprises a vector conversion unit configured to:
acquiring a pre-constructed vector conversion model; the vector conversion model comprises a first sub-model and a second sub-model;
determining candidate user characteristics of the candidate user according to the user data of the candidate user;
inputting the candidate user features into the vector conversion model, so that the first sub-model determines the user behavior vector and the second sub-model determines the behavior conversion vector;
the vector conversion module further comprises a model training unit, the model training unit comprising:
the data input subunit is used for acquiring a training sample set and an initial two-tower model and inputting the training sample set into the initial two-tower model; the training sample set comprises training user features and training recommendation object features corresponding to training users;
the first training subunit is used for determining a behavior predicted value corresponding to the training user through the first submodel based on the training user characteristics and the training recommendation object characteristics;
the second training subunit is used for determining a behavior conversion predicted value corresponding to the training user through the second submodel based on the training user characteristics and the training recommendation object characteristics;
a loss function determination subunit, configured to determine a loss function of the initial double-tower model based on the behavior prediction value and the behavior conversion prediction value;
the model training subunit is used for synchronously updating the model parameters of the first sub-model and the model parameters of the second sub-model through the loss function so as to obtain the vector conversion model;
the first training subunit is configured to: inputting the training user characteristics and the training recommendation object characteristics into the first submodel, and determining a user behavior vector representation and a first recommendation object vector of the training user by the first submodel;
determining cosine similarity between the user behavior vector representation and the first recommendation object vector as a first initial similarity;
amplifying the first initial similarity by a multiple to obtain a first similarity, and determining the behavior predicted value according to the first similarity;
the second training subunit is configured to: inputting the training user characteristics and the training recommendation object characteristics into the second submodel, and determining a user conversion vector representation and a second recommendation object vector of the training user by the second submodel;
determining cosine similarity between the user conversion vector representation and the second recommendation object vector as a second initial similarity;
and amplifying the second initial similarity by a multiple to obtain a second similarity, and determining the behavior conversion predicted value according to the second similarity.
11. The apparatus of claim 10, wherein the vector conversion module further comprises a sample set determination unit configured to:
acquiring user behavior data of initial training users, and determining, according to the user behavior data, initial user features and initial recommendation object features corresponding to the initial training users, the initial user features comprising initial user profile features and initial user behavior features;
generating an initial training sample set according to the initial user profile features, the initial user behavior features and the initial recommendation object features, the initial training sample set comprising a first sample subset and a second sample subset;
performing sampling processing on the initial training sample set so that the quantity ratio between the first sample subset and the second sample subset falls within a preset value interval;
and taking the sampled initial training sample set as the training sample set.
12. The apparatus of claim 10, wherein the loss function determining subunit is configured to:
determining a first loss function based on the behavior predicted value, and determining a first weight corresponding to the first loss function;
determining a second loss function based on the behavior conversion predicted value, and determining a second weight corresponding to the second loss function;
and performing weighted summation processing on the first loss function and the second loss function according to the first weight and the second weight to determine the loss function.
13. The apparatus of claim 10, wherein the seed user vector determination module is configured to:
acquiring seed user data corresponding to the seed user, the seed user data comprising a seed user identifier;
determining, according to the seed user identifier, the seed user vector corresponding to the seed user from the candidate user vectors; or
acquiring a pre-constructed vector conversion model;
and carrying out vector conversion processing on the seed user data through the vector conversion model to generate the seed user vector.
14. The apparatus of claim 10, wherein the similar user determination module comprises a similarity determination unit configured to:
clustering the seed user vectors to obtain a plurality of cluster centers corresponding to the seed user vectors;
determining a cluster center vector corresponding to each cluster center;
and calculating the average similarity between the candidate user vector and the cluster center vectors to determine the similarity.
15. The apparatus of claim 10, wherein the similar user determination module comprises a similar user determination unit configured to:
acquiring a user expansion condition, and determining the number of similar users to be expanded according to the user expansion condition;
ranking the similarities between the plurality of candidate user vectors and the seed user vector to obtain a ranking result;
and determining the similar users from the plurality of candidate users according to the ranking result.
16. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement a similar user determination method as defined in any one of claims 1 to 6.
17. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a similar user determination method as claimed in any one of claims 1 to 6.
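Claims 1, 3 and 7 describe the training objective of the two-tower model. Each tower amplifies a cosine similarity by a multiple and turns it into a predicted value, and the two per-tower losses are combined by a weighted sum. The forward computation can be sketched as follows. This is an illustration only, and several pieces are assumptions rather than details the claims specify:

- the sigmoid mapping from the amplified similarity to a predicted value;
- binary cross-entropy as each per-tower loss;
- the default multiple and weights.

One plausible motivation for the amplification, which is our reading rather than a stated reason: without it, sigmoid of a value in [-1, 1] can only range from about 0.27 to 0.73. Parameter updates (back-propagation) are omitted; a real implementation would use an autodiff framework.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tower_prediction(user_vec: Sequence[float], item_vec: Sequence[float],
                     scale: float = 5.0) -> float:
    """Predicted value of one tower (behavior or conversion).

    `scale` is a hypothetical amplification multiple; the sigmoid is assumed.
    """
    initial_similarity = cosine(user_vec, item_vec)  # lies in [-1, 1]
    similarity = scale * initial_similarity          # amplified similarity
    return 1.0 / (1.0 + math.exp(-similarity))       # squash into (0, 1)


def binary_cross_entropy(preds: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    if not preds or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


def joint_loss(behavior_preds, behavior_labels, conversion_preds, conversion_labels,
               w1: float = 1.0, w2: float = 1.0) -> float:
    """Weighted sum of the first (behavior) and second (conversion) losses."""
    first_loss = binary_cross_entropy(behavior_preds, behavior_labels)
    second_loss = binary_cross_entropy(conversion_preds, conversion_labels)
    return w1 * first_loss + w2 * second_loss
```

Because both towers contribute to one scalar loss, a single backward pass would update the parameters of both sub-models together. This matches the "synchronously updating" step in claim 1.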
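Claims 2 and 11 adjust the training set so that the quantity ratio of the first and second sample subsets falls within a preset interval. The claims do not say which subset is which or how sampling is done. The sketch below assumes they are, for example, positive and negative samples, and that the over-represented subset is randomly downsampled. The function name `rebalance` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rebalance(first: Sequence[T], second: Sequence[T], low: float, high: float,
              seed: int = 0) -> Tuple[List[T], List[T]]:
    """Downsample the over-represented subset so len(first) / len(second)
    falls within [low, high].

    Best effort: with very small subsets or a very narrow interval, no
    integer subset size may land exactly inside the interval.
    """
    if not first or not second or not 0 < low <= high:
        raise ValueError("subsets must be non-empty and 0 < low <= high")
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    kept_first, kept_second = list(first), list(second)
    ratio = len(kept_first) / len(kept_second)
    if ratio < low:
        # Too many second-subset samples: keep at most len(first) / low of them.
        kept_second = rng.sample(kept_second, max(1, int(len(kept_first) / low)))
    elif ratio > high:
        # Too many first-subset samples: keep at most len(second) * high of them.
        kept_first = rng.sample(kept_first, max(1, int(len(kept_second) * high)))
    return kept_first, kept_second
```

For example, 10 first-subset and 1000 second-subset samples with an interval of [0.25, 0.5] are reduced to 10 and 40. Subsets whose ratio is already inside the interval are returned unchanged.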
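Claims 5 and 14 cluster the seed user vectors and score each candidate by its average similarity to the cluster center vectors. The clustering algorithm is not named. The sketch below uses plain k-means (Lloyd's algorithm) with a simple deterministic initialisation and cosine similarity for scoring. Both choices are assumptions for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[Vector]:
    """Lloyd's k-means; seeding with the first k points is a simplification."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of seed vectors")
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            # Assign each seed vector to its nearest current center.
            dists = [squared_distance(p, c) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def average_center_similarity(candidate: Sequence[float],
                              centers: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between one candidate vector and every cluster center."""
    return sum(cosine(candidate, c) for c in centers) / len(centers)
```

Because the score is an average over all centers, a candidate moderately close to every cluster can outscore one that is very close to a single cluster. For example, with centers near (1, 0) and (0, 1), the vector (1, 1) scores higher than (1, 0).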
CN202110340900.0A 2021-03-30 2021-03-30 Similar user determination method, vector conversion model, device, medium and equipment Active CN112905897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110340900.0A CN112905897B (en) 2021-03-30 2021-03-30 Similar user determination method, vector conversion model, device, medium and equipment

Publications (2)

Publication Number Publication Date
CN112905897A CN112905897A (en) 2021-06-04
CN112905897B true CN112905897B (en) 2022-09-09

Family

ID=76109677

Country Status (1)

Country Link
CN (1) CN112905897B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362139B (en) * 2021-06-17 2023-04-28 震坤行工业超市(上海)有限公司 Data processing method and device based on double-tower structure model
CN113378071A (en) * 2021-08-16 2021-09-10 武汉卓尔数字传媒科技有限公司 Advertisement recommendation method and device, electronic equipment and storage medium
CN114048294B (en) * 2022-01-11 2022-04-08 智者四海(北京)技术有限公司 Similar population extension model training method, similar population extension method and device
CN114792256B (en) * 2022-06-23 2023-05-26 上海维智卓新信息科技有限公司 Crowd expansion method and device based on model selection

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110147882A (en) * 2018-09-03 2019-08-20 腾讯科技(深圳)有限公司 Training method, crowd's method of diffusion, device and the equipment of neural network model
CN110162703A (en) * 2019-05-13 2019-08-23 腾讯科技(深圳)有限公司 Content recommendation method, training method, device, equipment and storage medium
CN110956209A (en) * 2019-11-28 2020-04-03 上海风秩科技有限公司 Model training and predicting method, device, electronic equipment and storage medium
CN111242752A (en) * 2020-04-24 2020-06-05 支付宝(杭州)信息技术有限公司 Method and system for determining recommended object based on multi-task prediction
CN111523044A (en) * 2020-07-06 2020-08-11 南京梦饷网络科技有限公司 Method, computing device, and computer storage medium for recommending target objects

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN108921221B (en) * 2018-07-04 2022-11-18 腾讯科技(深圳)有限公司 User feature generation method, device, equipment and storage medium
JP6748759B1 (en) * 2019-05-09 2020-09-02 楽天株式会社 Behavior analysis device, advertisement distribution device, behavior analysis method, advertisement distribution method, behavior analysis program, and advertisement distribution program.
CN110647921B (en) * 2019-09-02 2024-03-15 腾讯科技(深圳)有限公司 User behavior prediction method, device, equipment and storage medium
CN111160638B (en) * 2019-12-20 2022-09-02 深圳前海微众银行股份有限公司 Conversion estimation method and device
CN112070542A (en) * 2020-09-09 2020-12-11 深圳前海微众银行股份有限公司 Information conversion rate prediction method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN112905897A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN112905897B (en) Similar user determination method, vector conversion model, device, medium and equipment
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
WO2020125445A1 (en) Classification model training method, classification method, device and medium
US20230102337A1 (en) Method and apparatus for training recommendation model, computer device, and storage medium
CN110163647B (en) Data processing method and device
CN110909165B (en) Data processing method, device, medium and electronic equipment
CN111259263B (en) Article recommendation method and device, computer equipment and storage medium
CN109471978B (en) Electronic resource recommendation method and device
CN111400599A (en) User group portrait generation method, device and system
KR102326744B1 (en) Control method, device and program of user participation keyword selection system
US20220138770A1 (en) Method and apparatus for analyzing sales conversation based on voice recognition
CN111429161B (en) Feature extraction method, feature extraction device, storage medium and electronic equipment
CN112148975A (en) Session recommendation method, device and equipment
CN113946754A (en) User portrait based rights and interests recommendation method, device, equipment and storage medium
CN111754278A (en) Article recommendation method and device, computer storage medium and electronic equipment
WO2024041483A1 (en) Recommendation method and related device
CN111209469A (en) Personalized recommendation method and device, computer equipment and storage medium
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN114328800A (en) Text processing method and device, electronic equipment and computer readable storage medium
US11823217B2 (en) Advanced segmentation with superior conversion potential
Karthikeyan et al. Machine learning techniques application: social media, agriculture, and scheduling in distributed systems
CN116910357A (en) Data processing method and related device
CN112632275B (en) Crowd clustering data processing method, device and equipment based on personal text information
CN112967100B (en) Similar crowd expansion method, device, computing equipment and medium
CN115293818A (en) Advertisement putting and selecting method and device, equipment and medium thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant