WO2023123933A1

WO2023123933A1 - User type information determination method and device, and storage medium

Info

Publication number: WO2023123933A1
Application number: PCT/CN2022/101734
Authority: WO
Inventors: 张海川
Original assignee: 深圳前海微众银行股份有限公司
Priority date: 2021-12-30
Filing date: 2022-06-28
Publication date: 2023-07-06
Also published as: CN114298232A

Abstract

The present application provides a user type information determination method and device, and a storage medium, and belongs to the field of Techfin. The method comprises: acquiring feature data of a user; respectively inputting the feature data of the user into a first prediction model and a second prediction model, acquiring a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, the first prediction model being used for predicting a probability that target data of the user is generated under a preset execution condition, and the second prediction model being used for predicting a probability that target data of the user is generated without the preset execution condition; and determining type information of the user according to the first conversion rate and the second conversion rate. By means of this mode, types of users may be effectively classified under the preset execution condition.

Description

Method, device and storage medium for determining user type information

This application claims priority to Chinese patent application No. 202111655734.X, entitled "Method, device and storage medium for determining user type information" and filed with the China Patent Office on December 30, 2021, the entire content of which is incorporated in this application by reference.

Technical field

The present application relates to the field of financial technology (Fintech), and in particular to a method, device and storage medium for determining user type information.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). Technology for determining user type information is no exception; however, the security and real-time requirements of the financial industry place higher demands on such technology.

In the related art, samples of customers' cumulative loan usage rate are obtained, the customer's basic information (identity, transaction, asset, credit, purchase, etc.) is used as features and input into a machine learning model, and the customer's predicted cumulative loan usage rate is output. If the customer's cumulative loan usage rate exceeds a preset threshold, the customer is regarded as a potential customer, and users are classified into types on that basis.

However, the existing method of classifying user types cannot reflect the user type under different marketing conditions, so the resulting user types are of limited validity.

Summary of the invention

The present application provides a method, device, and storage medium for determining user type information, so as to solve the problem in the prior art that the classified user types are of low validity.

In the first aspect, the embodiment of the present application provides a method for determining user type information, the method including:

Obtaining characteristic data of a user;

Inputting the characteristic data of the user into a first prediction model and a second prediction model respectively, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, where the first prediction model is used to predict the probability of generating the user's target data under a preset execution condition, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution condition;

Determining the type information of the user according to the first conversion rate and the second conversion rate.

In an optional implementation manner, the determining the type information of the user according to the first conversion rate and the second conversion rate includes:

Taking the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the target data of the user;

The type information of the user is determined according to the first conversion rate, the second conversion rate and the contribution value.

In an optional implementation manner, the first prediction model is trained on a first sample set, and the first sample set contains characteristic data of historical users and result data on whether the target data of the historical users was generated under the preset execution condition;

The second prediction model is trained on a second sample set, and the second sample set contains characteristic data of historical users and result data on whether the target data of the historical users was generated without applying the preset execution condition.

In an optional implementation manner, after determining the type information of the user according to the first conversion rate and the second conversion rate, the method further includes:

According to the multidimensional vector of the user, querying the database for transfer users of users of a target type, where the multidimensional vector is used to characterize the association relationships between users in multiple dimensions, and a transfer user is a user who can be used to expand the user scale under the preset execution condition.

In an optional implementation manner, the querying the transfer user of the target type of user in the database according to the multidimensional vector of the user includes:

According to the cosine similarity between the multidimensional vector of the user to be queried in the database and the multidimensional vector of the user of the target type, determining whether the user to be queried is a transfer user;

or,

Sampling the multidimensional vectors of users of the target type and the multidimensional vectors of users of non-target types according to a preset sampling ratio to generate a third sample set, where the multidimensional vectors of users of the target type are the positive samples of the third sample set and the multidimensional vectors of users of non-target types are the negative samples of the third sample set;

Training a similar population expansion model using the third sample set;

Inputting the multidimensional vector of the user to be queried in the database into the trained similar population expansion model, and obtaining the population conversion probability output by the trained similar population expansion model;

Determining whether the user to be queried is a transfer user according to the population conversion probability.
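The cosine-similarity check in this implementation manner can be illustrated with a minimal sketch. The function names and the cut-off `min_similarity=0.9` are assumptions made for illustration only; the application does not specify a similarity threshold or how the vectors of several target-type users are combined.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def is_transfer_user(candidate_vec, target_vecs, min_similarity=0.9):
    """Hypothetical rule: the user to be queried counts as a transfer user
    if it is close enough to at least one target-type user."""
    return any(
        cosine_similarity(candidate_vec, target) >= min_similarity
        for target in target_vecs
    )
```

For example, `is_transfer_user([1.0, 0.1], [[1.0, 0.0]])` compares one queried user against a single target-type user.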

In an optional implementation manner, before querying the transfer user of the target type of user in the database according to the multidimensional vector of the user, the method further includes:

A multidimensional vector of a user in the database is determined according to the association information between the users.

In an optional implementation manner, the determining the multidimensional vector of the user in the database according to the association information between the users includes:

selecting a target user from the database as a target node in the user relationship network;

sequentially determining the next user node of the current end node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches a preset sequence length;

generating an associated node array of the target node according to the associated user node sequence of the target node;

Determine the multidimensional vector of the user in the database according to the associated node array of the target node.
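The sequence-generation steps above can be sketched as a fixed-length walk over the user relationship network. This is only an illustration under stated assumptions: the graph is a plain adjacency dictionary, the next node is picked uniformly at random (the weighted choice is described in the following implementation manners), and `walks_per_node` is a hypothetical parameter. The final step of turning the node arrays into multidimensional vectors is not shown.

```python
import random


def generate_walk(graph, start, walk_length, rng):
    """Extend the node sequence from `start` until it reaches `walk_length`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:
            break  # dead end: the sequence stays shorter than requested
        walk.append(rng.choice(neighbours))  # uniform choice, an assumption
    return walk


def build_node_arrays(graph, walk_length, walks_per_node=2, seed=0):
    """Collect several node sequences per target node into an array."""
    rng = random.Random(seed)
    return {
        node: [generate_walk(graph, node, walk_length, rng)
               for _ in range(walks_per_node)]
        for node in graph
    }
```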

In an optional implementation manner, the determining the next user node of the current end node in the associated user node sequence of the target node includes:

Determining the normalized transition probability of the current end node and performing weighted sampling on associated nodes of the current end node to determine the next user node.

In an optional implementation manner, the determining the normalized transition probability of the current end node includes:

According to the association information between the users, determine the transition probability between the current end node and any associated user node;

Normalize the transition probability between the current end node and any associated user node, and determine the normalized transition probability of the current end node.
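As a rough sketch of the normalization and weighted sampling described above, the unnormalized transition values of the current end node can be divided by their sum and then used as sampling weights. The dictionary layout and function names are assumptions for illustration.

```python
import random


def normalize(transition_weights):
    """Scale the transition values of one end node so that they sum to 1."""
    total = sum(transition_weights.values())
    if total <= 0:
        raise ValueError("at least one transition value must be positive")
    return {node: w / total for node, w in transition_weights.items()}


def sample_next_node(transition_weights, rng):
    """Pick the next user node by weighted sampling over associated nodes."""
    probs = normalize(transition_weights)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
```

A call such as `sample_next_node({"B": 2.0, "C": 6.0}, random.Random(0))` would pick `"C"` three times as often as `"B"` over many draws.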

In an optional implementation manner, the determining the transition probability between the current end node and any associated user node according to the association information between the users includes:

generating weight data between the current end node and any associated user node according to the association information between the users and the identifier of the user;

Determine the weight correction coefficient between the current end node and any associated user node according to the value of the shortest path distance between the current end node and any associated user node in the user relationship network;

According to the weight data between the current end node and any associated user node and the weight correction coefficient between the current end node and that associated user node, determine the transition probability between the current end node and that associated user node.

In an optional implementation manner, the value of the shortest path distance includes a first value, a second value, and a third value; there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;

If the associated user node is the previous node of the current end node, the value of the shortest path distance is the first value; if the associated user node and the current end node are adjacent nodes, the value of the shortest path distance is the second value; if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the value of the shortest path distance is the third value.
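The mapping from distance value to weight correction coefficient can be sketched as a lookup table. Everything concrete here is an assumption: the first, second and third values are taken to be 0, 1 and 2, the coefficients 1/p, 1 and 1/q follow the convention used by node2vec-style biased walks, and the weight and coefficient are combined by multiplication. The application itself only states that a mapping exists.

```python
# Hypothetical distance values for the three cases described above.
FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 0, 1, 2


def correction_coefficient(distance_value, p=2.0, q=0.5):
    """Map a shortest-path distance value to a weight correction coefficient.

    p and q are assumed tuning parameters, not values from the application.
    """
    mapping = {
        FIRST_VALUE: 1.0 / p,   # associated node is the previous node
        SECOND_VALUE: 1.0,      # associated node is an adjacent node
        THIRD_VALUE: 1.0 / q,   # neither of the above
    }
    return mapping[distance_value]


def transition_probability(weight, distance_value, p=2.0, q=0.5):
    """Unnormalized transition value: edge weight times its correction."""
    return weight * correction_coefficient(distance_value, p, q)
```

The values returned by `transition_probability` are unnormalized and would still need to be normalized over all associated nodes of the current end node.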

In the second aspect, the embodiment of the present application provides an apparatus for determining user type information, including:

The obtaining module is used to obtain user characteristic data.

A prediction module, configured to input the characteristic data of the user into the first prediction model and the second prediction model respectively, and acquire the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model, where the first prediction model is used to predict the probability of generating the user's target data under a preset execution condition, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution condition.

A determining module, configured to determine the type information of the user according to the first conversion rate and the second conversion rate.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;

Wherein, the memory stores a computer program, and the computer program is suitable for being loaded by the processor and executing the method for determining user type information according to any one of the first aspect and its optional manners.

In the fourth aspect, the embodiment of the present application provides a computer storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the method for determining user type information according to any one of the implementations of the first aspect.

The method, device, and storage medium for determining user type information provided in the embodiments of the present application first obtain the user's characteristic data. The characteristic data is then input into the first prediction model and the second prediction model respectively, and the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model are obtained. The first prediction model is used to predict the probability of generating the user's target data under a preset execution condition, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution condition. Finally, the type information of the user is determined according to the first conversion rate and the second conversion rate. In this manner, user types can be effectively classified under the preset execution condition.

Description of drawings

In order to illustrate the technical solutions in this application or the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without any creative effort.

FIG. 1 is a schematic diagram of a scenario of an operating environment provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for determining user type information provided by an embodiment of the present application;

FIG. 3 is a schematic flowchart of a method for mining transfer users provided in an embodiment of the present application;

FIG. 4 is a schematic flowchart of another method for mining transfer users provided by the embodiment of the present application;

FIG. 5 is a schematic diagram of an association relationship between user nodes provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a multidimensional vector for determining a user provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a transition probability between nodes provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of an apparatus for determining user type information provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present disclosure.

Inclusive finance is one of the important keynotes of the transformation and development of China's financial industry at present. Many commercial banks are creating new marketing models for online customer acquisition and operation. How to accurately locate potential customers of their own bank among a large number of people, recommend the most matching products to them, and improve the active stickiness of existing customers has become a key consideration for many banks.

In the related technology, by obtaining the customer's cumulative loan usage rate sample, the customer's basic information (identity, transaction, asset, credit, purchase, etc.) is used as a feature, input into the machine learning model, and the predicted customer's cumulative loan usage is output Rate. If the customer's cumulative loan usage rate exceeds a preset threshold, it indicates that the customer is a potential customer, thereby classifying the types of users. However, the existing user type division method cannot reflect the user type under different marketing conditions, so that the effectiveness of the divided user type is not high.

In order to solve the above problems, an embodiment of the present disclosure provides a method, device, and storage medium for determining user type information, which predict a first conversion rate at which the user's target data is generated under a preset execution condition and a second conversion rate at which the user's target data is generated when the preset execution condition is not applied, and then determine the type information of the user based on the first conversion rate and the second conversion rate, so that user types can be effectively classified under the preset execution condition.

Before describing the method for determining the type information of the user in the present disclosure, first understand the example operating environment of the present disclosure according to FIG. 1 .

FIG. 1 is a schematic diagram of a scenario of an operating environment provided by an embodiment of the present disclosure. FIG. 1 shows subjects that want to obtain user type information, such as enterprises 101 and banking institutions 102, and these subjects can request the system platform 103 to query user type information as needed. Of course, the above subjects are only illustrative; other subjects can also initiate a query (for example, the system platform 103 may initiate a query automatically), and no further examples are given here. The query request from each subject is provided to the system platform 103 through the network, and the system platform 103 performs the task of determining the type information of the user. The system platform 103 may include not only a query module for querying the type information of the user, but also a marketing module and a mining module: the marketing module provides different marketing schemes for different types of users, and the mining module is used to mine potential transfer users of users of a target type. In addition, the system platform 103 may also be provided with a database 104 for use when querying for transfer users; the database 104 contains the users to be queried and may be, for example, an enterprise information base. It should be understood that the enterprise information base in the example environment is only exemplary, and other types of information bases all fall within the scope of protection of the present disclosure. Moreover, in the above example operation scenario, the subjects that acquire user type information can use various devices to access the network, such as personal computers, servers, tablets, mobile phones, PDAs, notebooks or any other computing devices with networking capabilities.
The system platform 103 can be implemented using a server or server group with stronger processing capability and higher security. The networks used between them can include various types of wired and wireless networks, such as but not limited to: the Internet, local area networks, WIFI, WLAN, cellular communication networks (GPRS, CDMA, 2G/3G/4G/5G cellular networks), satellite communication networks, etc.

It can be understood that the above-mentioned method for determining user type information can be realized by the device for determining user type information provided in the embodiments of the present disclosure, and the device for determining user type information can be part or all of a certain device, such as a server or a server chip.

The technical solutions of the embodiments of the present disclosure will be described in detail below with specific embodiments by taking a server integrated or installed with related execution codes as an example. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.

FIG. 2 is a schematic flowchart of a method for determining user type information provided by an embodiment of the present disclosure. This embodiment relates to the process by which a server determines user type information. Unlike existing methods for determining user type information, the present disclosure separately predicts a first conversion rate at which the user's target data is generated under a preset execution condition and a second conversion rate at which the user's target data is generated without the preset execution condition, and determines the type information of the user according to the first conversion rate and the second conversion rate. Therefore, the method for determining user type information provided in the present disclosure can effectively classify user types while taking into account the influence of the preset execution condition on the user type.

Specifically, as shown in Figure 2, the method includes:

S201. Obtain characteristic data of a user.

In the present disclosure, when it is necessary to classify the types of users, the server may obtain the characteristic data of the users.

It should be understood that the embodiment of the present disclosure does not limit how the characteristic data of the user is obtained. In some embodiments, the unique identifier (ID) of the user may be obtained, and the characteristic information related to the user may then be determined according to that identifier.

It should be noted that the users involved in the embodiments of the present disclosure may be enterprise users or individual users, which is not limited in the embodiments of the present disclosure.

It should be understood that the embodiments of the present disclosure do not limit the characteristic data of the user, which may be specifically determined according to actual conditions. Exemplarily, if it is an enterprise user, the characteristic data of the user may include but not limited to financial data, industrial and commercial data, regional data, equity data and abnormal business data.

Among them, financial data represents total assets, owner's equity, investment income, operating income, non-operating income, total profit, main business income, net profit, total liabilities, total tax payment, operating costs, sales expenses, asset impairment losses, etc. Industrial and commercial data represents company type, year of establishment, industry, enterprise status, registered capital, number of industrial and commercial changes, etc. Regional data represents provinces, cities, etc. Equity data represents the number of direct shareholders, the number of direct natural-person shareholders, the shareholding ratio of direct natural-person shareholders, the number of direct non-natural-person shareholders, the shareholding ratio of direct non-natural-person shareholders, etc. Abnormal business data represents the numbers of administrative penalties, business abnormalities, tax violations, cases as defendant, and enforcement records for dishonesty.

S202. Input the user's characteristic data into the first prediction model and the second prediction model respectively, and obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model. The first prediction model is used to predict the probability of generating the user's target data under a preset execution condition, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution condition.

In this step, after the server obtains the characteristic data of the user, the characteristic data of the user may be input into the first prediction model and the second prediction model respectively, so as to obtain the first conversion rate and the second conversion rate.

It should be understood that the embodiments of the present disclosure do not limit the preset execution condition or the target data. In some embodiments, the preset execution condition may be marketing to the user, and correspondingly, the target data may be transaction data generated by the user. The first conversion rate and the second conversion rate then reflect the probability that the user converts, that is, generates a transaction, when marketed to and when not marketed to, respectively.

The following describes how to construct the first prediction model and the second prediction model.

In some embodiments, the first prediction model is trained on the first sample set, which contains the characteristic data of historical users and result data on whether the target data of the historical users was generated under the preset execution condition. Correspondingly, the second prediction model is trained on the second sample set, which contains the characteristic data of historical users and result data on whether the target data of the historical users was generated without applying the preset execution condition.

Exemplarily, the server may divide all customers into two groups according to whether they were marketed to: the marketed group (treated) and the unmarketed group (control). For both groups, the result data on whether the customer converted (responded) is used as the target label, and a training set and a validation set are divided.

For example, the marketed group can be taken separately, with converted customers labeled 1 and non-converted customers labeled 0. 80% of the samples are randomly drawn as the training set and the remaining 20% as the validation set, and the first prediction model is trained using the XGBoost algorithm or another classification method. The second prediction model is trained in the same way on the customers who were not marketed to.
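The grouping and 80/20 split described above might look like the following sketch. The sample layout (dictionaries with `features`, `treated` and `label` keys) and the function name are assumptions for illustration; fitting the classifier itself, whether with XGBoost or another method, is not shown.

```python
import random


def split_group(samples, treated, train_fraction=0.8, seed=0):
    """Select one group (marketed or unmarketed) and split it at random
    into a training set and a validation set."""
    group = [s for s in samples if s["treated"] == treated]
    rng = random.Random(seed)
    rng.shuffle(group)
    cut = int(len(group) * train_fraction)
    return group[:cut], group[cut:]


# Hypothetical historical samples: label 1 = converted, 0 = not converted.
samples = [
    {"features": [float(i)], "treated": i % 2 == 0, "label": int(i % 3 == 0)}
    for i in range(20)
]
treated_train, treated_val = split_group(samples, treated=True)
control_train, control_val = split_group(samples, treated=False)
```

The first prediction model would then be fitted on `treated_train` and the second on `control_train`, each checked against its own validation set.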

It should be understood that the embodiments of the present disclosure do not limit the types of the first prediction model and the second prediction model. Exemplarily, the first prediction model and the second prediction model may be binary classification models.

S203. Determine user type information according to the first conversion rate and the second conversion rate.

In this step, after the server acquires the first conversion rate and the second conversion rate, the type information of the user may be determined according to the first conversion rate and the second conversion rate.

It should be understood that the embodiment of the present disclosure does not limit how the type information of the user is determined. In some embodiments, the server may take the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the user's target data, and then determine the type information of the user according to the first conversion rate, the second conversion rate and the contribution value.

Exemplarily, the first prediction model and the second prediction model can be used to predict the user's conversion rate p in the marketing and non-marketing scenarios, giving the first conversion rate p_treated in the marketing scenario and the second conversion rate p_control in the non-marketing scenario. Taking the difference between the first conversion rate and the second conversion rate gives the contribution value lift of marketing to the probability of generating the user's transaction data, where lift = p_treated - p_control.

Subsequently, in some embodiments, if the first conversion rate is greater than the first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than the second threshold, the server may determine that the user belongs to the first user type.

If the first conversion rate is greater than the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is greater than or equal to the third threshold and less than or equal to the second threshold, the server may determine that the user belongs to the second user type.

If the first conversion rate is less than or equal to the first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than or equal to the third threshold and less than or equal to the second threshold, the server may determine that the user belongs to the third user type.

If the first conversion rate is less than or equal to the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is less than the third threshold, the server may determine that the user belongs to the fourth user type.

Wherein, the absolute value of the second threshold is equal to the absolute value of the third threshold.

It should be understood that the embodiment of the present disclosure does not limit the value of the first threshold, for example, the first threshold may be 0.5.

It should be understood that the embodiment of the present disclosure does not limit the value of the second threshold. Exemplarily, the second threshold thres is the threshold at which lift changes significantly; it may be a positive decimal less than 1 and close to 0, and may be determined from a quantile of the overall statistical distribution. Correspondingly, the third threshold may be -thres.

Exemplarily, if p_treated>0.5, p_control≤0.5, lift>thres, then the user is the first type of user. If p_treated>0.5, p_control>0.5, -thres≤lift≤thres, then the user is the second type of user. If p_treated≤0.5, p_control≤0.5, -thres≤lift≤thres, the user is a third type user. If p_treated≤0.5, p_control>0.5, lift<-thres, the user is the fourth type of user.
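The four rules above can be written as a small function. This is a sketch only: the function name, the returned labels and the default `thres=0.05` are illustrative assumptions, and combinations not covered by the four rules return `None` because the description does not assign them a type.

```python
def classify_user(p_treated, p_control, first_threshold=0.5, thres=0.05):
    """Assign a user type from the two predicted conversion rates."""
    lift = p_treated - p_control  # contribution value of the condition
    high_treated = p_treated > first_threshold
    high_control = p_control > first_threshold
    neutral = -thres <= lift <= thres

    if high_treated and not high_control and lift > thres:
        return "first"
    if high_treated and high_control and neutral:
        return "second"
    if not high_treated and not high_control and neutral:
        return "third"
    if not high_treated and high_control and lift < -thres:
        return "fourth"
    return None  # not covered by the four rules in the description
```

For instance, `classify_user(0.8, 0.3)` falls under the first rule, while `classify_user(0.3, 0.8)` falls under the fourth.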

Among them, for users of the first user type, the probability of generating target data increases when the preset execution condition is applied; for users of the second user type, the probability of generating target data is higher than a target upper limit whether or not the preset execution condition is applied; for users of the third user type, the probability of generating target data is lower than a target lower limit whether or not the preset execution condition is applied; for users of the fourth user type, the probability of generating target data decreases when the preset execution condition is applied.

Exemplarily, taking the preset execution condition of marketing to users as an example, the four types of users may correspondingly include marketing-sensitive groups, natural conversion groups, indifferent groups, and reactionary groups.

Among them, the proportion of marketing-sensitive groups is relatively low, but they are easily affected by marketing activities and have active behaviors. For this part of the users, we can further conduct stratified operations according to whether they are sensitive to prices, discounts, and profit concessions.

Naturally transform the crowd into spontaneous active users. Even if the bank does not invest marketing resources in them, the users will be spontaneously active and more high-quality. For some users, you can use the similar user expansion model to find more similar users in the enterprise information database, and guide them to become bank users through online marketing, telemarketing and other means.

The indifferent group refers to users who have been lost and cannot be recovered through marketing, or users who rarely read marketing messages, and there is no need to continue to invest more marketing resources.

The reactionary group will be spontaneously active, but they are averse to marketing interruptions. Marketing interruptions to this part of users should be avoided, and there is no need to invest marketing resources in them.

In some embodiments, marketing-sensitive groups can be further divided, so that different marketing methods can be adopted based on the types of further divisions.

Exemplarily, marketing sensitive groups can be further divided into two types. The first type is price-sensitive. When there are marketing activities related to subsidies and discounts, there will be corresponding active behaviors. The second type is the price-insensitive type, which is less affected by whether there is a subsidy or not. For this type of users, only regular marketing reminders are required. After the price-sensitive and price-insensitive samples are constructed, the classic classification method can be used to classify users, which will not be repeated here.

Exemplarily, when a user exhibits any one of the following behaviors, the user can be labeled as a price-sensitive sample (label=1), where N is an adjustable threshold that can be set according to the data distribution and is usually not greater than 30:

1. The average loan interest rate in the past 3 months is within the lowest N% of the group;

2. The subsidy utilization rate of loan documents in the past 3 months is within the top N% of the group;

3. Actively share channel information to receive coupons.

Exemplarily, except for the above-mentioned price-sensitive customers, other customers are price-insensitive customers (label=0).
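The labeling rule above can be sketched as follows. The record layout (UserBehavior and its three fields) and the default N = 20 are assumptions made purely for illustration; the disclosure only states the three behaviors and that N is usually not greater than 30.

```python
from dataclasses import dataclass


@dataclass
class UserBehavior:
    # Hypothetical fields; percentile ranks are expressed in the range 0-100.
    rate_rank_pct: float      # rank of the 3-month average loan rate, 0 = lowest in group
    subsidy_rank_pct: float   # rank of the 3-month subsidy utilization, 0 = highest in group
    shared_for_coupon: bool   # actively shared channel information to receive coupons


def price_sensitive_label(user: UserBehavior, n: float = 20.0) -> int:
    """Return 1 (price-sensitive) if any of the three behaviors holds, else 0."""
    if not 0 < n <= 30:
        raise ValueError("N is expected to be a positive percentage not greater than 30")
    low_rate = user.rate_rank_pct <= n          # behavior 1: lowest N% interest rate
    heavy_subsidy = user.subsidy_rank_pct <= n  # behavior 2: top N% subsidy utilization
    return 1 if (low_rate or heavy_subsidy or user.shared_for_coupon) else 0
```

The resulting labels can then serve as targets for any standard binary classifier.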

In some embodiments, for price-sensitive customers, activities and rights with greater discounts can be recommended during marketing, so as to encourage them to be more active. For customers who are not price-sensitive, multiple products within the bank can be cross-recommended through regular marketing reminders.

The method for determining user type information provided by the embodiments of the present disclosure can classify users more finely, which avoids wasting marketing resources. Combined with the marketing response model, the price sensitivity model and other models, existing and incremental users can be divided in detail, which helps the bank allocate marketing resources to the marketing-sensitive users who need them most.

In the method for determining user type information provided by the embodiments of the present disclosure, user feature data is first acquired. Subsequently, the user's feature data is input into the first prediction model and the second prediction model respectively, and the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model are obtained. The first prediction model is used to predict the probability of generating the user's target data under preset execution conditions, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution conditions. Finally, the type information of the user is determined according to the first conversion rate and the second conversion rate. In this manner, user types can be effectively classified under preset execution conditions.

On the basis of the above embodiments, after determining the type information of the user, the server may also query the database, according to the multidimensional vector of the user, for transfer users of users of the target type. The multidimensional vector is used to characterize the association relationship between users in multiple dimensions, and a transfer user is a user to whom the preset execution condition is applied for magnitude expansion. FIG. 3 is a schematic flowchart of a method for mining transfer users provided by an embodiment of the present disclosure. As shown in FIG. 3, the method includes:

S301. Determine the cosine similarity between the multidimensional vector of the user to be queried and the multidimensional vector of the user of the target type in the database.

It should be understood that this embodiment of the present disclosure does not limit the target type of users, and in some embodiments, the target type of users may be the above-mentioned second type of users, that is, a natural conversion group.

Exemplarily, assuming that the multidimensional (embedding) vectors of two users u_i and u_j are (x_i1, x_i2, ..., x_in) and (x_j1, x_j2, ..., x_jn) respectively, the cosine similarity cos(u_i, u_j) between users u_i and u_j can be determined by formula (1):

$$\cos(u_i,u_j)=\frac{\sum_{k=1}^{n}x_{ik}\,x_{jk}}{\sqrt{\sum_{k=1}^{n}x_{ik}^{2}}\;\sqrt{\sum_{k=1}^{n}x_{jk}^{2}}} \tag{1}$$

It should be understood that the embodiment of the present disclosure does not limit how to determine the multidimensional vector of the user. In some embodiments, the server may determine the multidimensional vector of the user in the database according to the association information between users.

Exemplarily, the server may first select a target user from the database as a target node in the user relationship network. Secondly, the server may sequentially determine the next user node of the current end node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches a preset sequence length. Again, the server can generate an associated node array of the target node according to the associated user node sequence of the target node. Finally, the server can determine the multidimensional vector of the user in the database according to the associated node array of the target node.

It should be understood that to determine the normalized transition probability of the current end node, the transition probability between the current end node and any associated user node can be determined first according to the association information between users. Then, after normalizing the transition probability between the current end node and any associated user node, the normalized transition probability of the current end node is determined.

It should be understood that the embodiment of the present disclosure does not limit how to determine the transition probability between user nodes. For example, the server may first generate weight data between the current end node and any associated user node according to the association information between users and the user identification. Subsequently, the server determines the weight correction coefficient between the current end node and any associated user node according to the value of the shortest path distance between the current end node and any associated user node in the user relationship network. Finally, the server determines the transition probability between the current end node and any associated user node according to the weight data and the weight correction coefficient between the current end node and that associated user node.

Wherein, the value of the shortest path distance includes the first value, the second value and the third value; there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;

If the associated user node is the previous node of the current end node, the value of the shortest path distance is the first value; if the associated user node and the current end node are adjacent nodes, the value of the shortest path distance is the second value; if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the value of the shortest path distance is the third value.

S302. According to the cosine similarity, determine whether the user to be queried is a transferred user.

Wherein, the transfer user can be understood as a user who can perform marketing conversion.

It should be understood that the embodiment of the present disclosure does not limit how to determine whether the user to be queried is a transfer user according to the cosine similarity. In some embodiments, the greater the cosine similarity between two users, the more similar they are. Correspondingly, marketing target customers can be found by looking for the customers with the smallest cosine distance (that is, the greatest cosine similarity) to users of the target type.
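Steps S301 and S302 can be sketched as below. This is only an illustrative reading: the helper names, the dictionary layout of the embeddings and the min_similarity cut-off are assumptions, since the disclosure does not fix how the similarity is turned into a decision.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_transfer_users(candidates, seeds, min_similarity):
    """Return candidate ids ranked by their best similarity to any target-type user.

    candidates / seeds: dict mapping user id -> embedding vector.
    min_similarity: hypothetical cut-off; candidates below it are dropped.
    """
    scored = []
    for user_id, vec in candidates.items():
        best = max(cosine_similarity(vec, seed) for seed in seeds.values())
        if best >= min_similarity:
            scored.append((best, user_id))
    scored.sort(reverse=True)  # most similar first
    return [user_id for _, user_id in scored]
```

Because every candidate is compared with every target-type user, this brute-force search is only practical when the target group is small.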

It should be noted that the method for mining and transferring users in FIG. 3 may be applicable to a situation where the magnitude of the target type of users is small, for example, less than or equal to 200 people. When the magnitude of the target type of users is large (for example, more than 200), the method shown in FIG. 4 may be used.

FIG. 4 is a schematic flowchart of another method for mining and transferring users provided by an embodiment of the present disclosure. As shown in FIG. 4 , the method for mining and transferring users includes:

S401. Sample the multidimensional vectors of users of the target type and the multidimensional vectors of users of non-target types according to a preset sampling ratio to generate a third sample set, where the multidimensional vectors of users of the target type are the positive samples of the third sample set, and the multidimensional vectors of users of non-target types are the negative samples of the third sample set.

Exemplarily, the natural conversion group can be used as seed customers (label=1) and non-natural-conversion customers can be used as negative samples (label=0), and the samples can be appropriately sampled so that the ratio label1:label0 is between 1:1 and 1:3. Then, the samples are randomly divided, with 80% taken as the training set and the remaining 20% as the verification set.
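The sampling and splitting in S401 can be sketched with the standard library alone. The function name, the neg_per_pos parameter (negatives drawn per positive, kept between 1 and 3) and the fixed random seed are illustrative assumptions.

```python
import random


def build_third_sample_set(positives, negatives, neg_per_pos=2.0, train_frac=0.8, seed=0):
    """Down-sample negatives to the requested ratio, then split into train/verification.

    positives / negatives: lists of embedding vectors.
    Returns two lists of (vector, label) pairs.
    """
    if not 1.0 <= neg_per_pos <= 3.0:
        raise ValueError("label1:label0 is expected to lie between 1:1 and 1:3")
    rng = random.Random(seed)
    n_neg = min(len(negatives), int(len(positives) * neg_per_pos))
    sampled_neg = rng.sample(negatives, n_neg)
    samples = [(x, 1) for x in positives] + [(x, 0) for x in sampled_neg]
    rng.shuffle(samples)  # random division of the samples
    cut = int(len(samples) * train_frac)
    return samples[:cut], samples[cut:]
```

The two returned lists can be passed to whichever binary classifier is chosen in the next step.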

S402. Use the third sample set to train the similar group expansion model.

Exemplarily, a model can be trained on the sample users in the training set and verification set, using their equity-holding embedding vectors as features (for example, a combination of XGBoost and LR can be used), and the resulting binary classification model lookalike.model can be saved.

S403. Input the multidimensional vector of the user to be queried in the database into the trained similar group expansion model, and obtain the group conversion probability output by the trained similar group expansion model.

Exemplarily, a large number of users who also have embedding vector features can be used as the users to be queried, so that the trained lookalike.model can be used to make predictions and obtain the crowd conversion probability score of each user to be queried.

S404. Determine whether the user to be queried is a transferred user according to the crowd conversion probability.

It should be understood that the embodiment of the present disclosure does not limit how to determine whether the user to be queried is a transfer user according to the crowd conversion probability. In some embodiments, the crowd conversion probability may be compared with a threshold.

Exemplarily, if score≥thres, it can be determined that the user to be queried is a transfer user, and if score<thres, it can be determined that the user to be queried is not a transfer user.

It should be understood that the methods for mining transfer users provided in FIG. 3 and FIG. 4 use the user's multidimensional vector. Because the user's multidimensional vector captures both the holding homogeneity and the holding structural similarity between users, it is possible, compared with traditional methods, to uncover friend or acquaintance relationships between users, thereby improving the conversion success rate of transfer users.

Based on the methods for mining transfer users provided in FIG. 3 and FIG. 4, the server can determine the multidimensional vector of the users in the database according to the association information between users. The following describes how to determine the multidimensional vector of the user.

FIG. 5 is a schematic diagram of an association relationship between user nodes provided by an embodiment of the present disclosure. As shown in FIG. 5, each node in the figure represents a company, and each edge between nodes represents a holding relationship, directed from the investing company to the company it holds; the weight of the edge represents the holding ratio.

As shown in Figure 5, the enterprises corresponding to the two nodes can contain two similar relationships.

The first kind of similarity relationship: since enterprise u is a neighbor of s1, s2, s3, and s4, it can be considered that there is a certain similarity between enterprise u and enterprises s1, s2, s3, and s4, which is called homogeneity.

The second kind of similarity relationship, u and s6 are both central nodes of the corresponding subgraph, and have the largest degree in the corresponding subgraph, and they also have a certain similarity, which can be called structural similarity.

It should be noted that both depth-first traversal (DFS) and breadth-first traversal (BFS) are required to discover homogeneity and structural similarity at the same time and reflect them in the embedding results. In order to better integrate the advantages of the two traversal methods, the node2vec algorithm can be used. The algorithm uses a random walk, which takes both depth-first traversal and breadth-first traversal into account, and generates a traversal queue composed of nodes. The traversal node queue is then used as the context, and the skip-gram method is used to obtain the embedding vector representation of each node.

Fig. 6 is a schematic diagram of determining a user's multidimensional vector provided by an embodiment of the present disclosure. As shown in Fig. 6, the method includes:

S501. Encode the user's name in a unique table.

It should be understood that this embodiment of the present disclosure does not limit how to encode the user's name, and the encoding may be performed according to a preset encoding sequence. Exemplarily, "Shenzhen Qianhai WeBank Co., Ltd." may be coded as s5.

S502. Generate weight data between user nodes corresponding to each directed edge in the user relationship network.

In some embodiments, the weight data between user nodes may include a start node, an end node and a weight coefficient. Exemplarily, as shown in FIG. 5, the weight data may be, for example, "u s1 0.7", "u s2 0.35", "u s3 0.65" and so on.
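Weight records in the "start end weight" form quoted above can be loaded into a nested dictionary as follows. This is a minimal sketch; the function name and the in-memory layout are assumptions, not part of the disclosure.

```python
from collections import defaultdict


def parse_weight_data(lines):
    """Turn 'start end weight' records into {start: {end: weight}} for directed edges."""
    graph = defaultdict(dict)
    for line in lines:
        start, end, weight = line.split()
        graph[start][end] = float(weight)
    return dict(graph)


edges = parse_weight_data(["u s1 0.7", "u s2 0.35", "u s3 0.65"])
```

Here edges["u"]["s1"] holds the holding ratio of the directed edge from u to s1.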

S503. Determine the weight correction coefficient between the nodes according to the value of the shortest path distance between the two nodes in the user relationship network.

Exemplarily, suppose the current node is v and the previous node of v is t (there is a directed edge t→v). Then, for an adjacent node x of the current node v, the weight correction coefficient can be defined as shown in formula (2):

$$\alpha(t,x)=\begin{cases}\dfrac{1}{p}, & d_{tx}=0\\[4pt] 1, & d_{tx}=1\\[4pt] \dfrac{1}{q}, & d_{tx}=2\end{cases} \tag{2}$$

Among them, d_tx represents the shortest path distance between x and node t. There are only three cases for the shortest path distance: if the walk returns to node t (regardless of the direction of the edge), then d_tx = 0; if x and t are directly adjacent, then d_tx = 1; in all other cases, d_tx = 2.

It should be understood that specific values of p and q may be specified in advance, which are not limited here.

S504. Determine the transition probability between nodes according to the weight data between user nodes and the weight correction coefficient between nodes.

Exemplarily, the transition probability between nodes can be determined by formula (3).

$$\pi(v,x)=\alpha(t,x)\,w_{vx} \tag{3}$$

Among them, w_vx is the weight of the edge between nodes v and x in the user relationship network, and π(v, x) is the transition probability from node v to node x.

FIG. 7 is a schematic diagram of a transition probability between nodes provided by an embodiment of the present disclosure. As shown in FIG. 7, s7 is the current node, and s6 is the previous node of s7. Then, for the nodes adjacent to s7, namely s6, s8 and s5, the transition probabilities are respectively: π(s7, s6) = 1/p × 60%, π(s7, s8) = 1 × 15%, and π(s7, s5) = 1/q × 25%.
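The FIG. 7 example can be reproduced with a short sketch of the weight correction coefficient and the unnormalized transition probability. The helper names, the undirected neighbor sets and the values p = 2 and q = 0.5 are assumptions chosen only to make the arithmetic concrete.

```python
def alpha(d_tx: int, p: float, q: float) -> float:
    """Weight correction coefficient for shortest path distance d_tx in {0, 1, 2}."""
    if d_tx == 0:
        return 1.0 / p   # stepping back to the previous node t
    if d_tx == 1:
        return 1.0       # x is directly adjacent to t
    return 1.0 / q       # x is farther away from t


def shortest_distance(t, x, neighbors):
    """Distance between t and x, capped at 2; edge direction is ignored."""
    if x == t:
        return 0
    if x in neighbors.get(t, set()):
        return 1
    return 2


def transition_probabilities(t, v, weights, neighbors, p, q):
    """Unnormalized pi(v, x) = alpha(t, x) * w_vx for every x adjacent to v."""
    return {
        x: alpha(shortest_distance(t, x, neighbors), p, q) * w
        for x, w in weights[v].items()
    }


# Values from the FIG. 7 example: s7 is the current node, s6 the previous node.
weights = {"s7": {"s6": 0.60, "s8": 0.15, "s5": 0.25}}
neighbors = {"s6": {"s7", "s8"}}  # assumed: s8 is adjacent to s6, s5 is not
pi = transition_probabilities("s6", "s7", weights, neighbors, p=2.0, q=0.5)
```

With these assumed p and q, pi evaluates to roughly 0.30 for s6, 0.15 for s8 and 0.50 for s5, so a small q pushes the walk away from the previous node.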

S505. Determine the normalized probability between nodes according to the transition probability between nodes.

Exemplarily, for each adjacent node x_i of node v, the transition probability π_i is obtained, and the transition probabilities are normalized so that they sum to 1:

$$p_i=\frac{\pi_i}{\sum_{j}\pi_j}$$

S506. Select a target user from the database as a target node in the user relationship network.

Exemplarily, a node t can be randomly selected from the user relationship network shown in FIG. 5 , and an adjacent node v of t can be selected as the target node, where t→v has a directed edge.

S507. Sequentially determine the next user node of the current end node in the associated user node sequence of the target node, until the length of the node sequence of the target user reaches a preset sequence length.

The embodiment of the present disclosure does not limit how to determine the next user node. In some embodiments, the next user node can be determined by determining the normalized transition probability of the current end node and performing weighted sampling on the associated nodes of the current end node.

Wherein, the weighted sampling may specifically be alias sampling.
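Alias sampling draws from a discrete distribution in constant time after a linear-time setup. The sketch below follows the commonly used Vose construction; the function names are illustrative, and the disclosure does not specify which variant of alias sampling is used.

```python
import random


def build_alias_table(probs):
    """Build the (prob, alias) tables for a normalized discrete distribution."""
    n = len(probs)
    scaled = [p * n for p in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # column s keeps its own mass ...
        alias[s] = l             # ... and is topped up from column l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:      # leftovers are full columns (up to rounding)
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A table built once per node can be reused for every walk that passes through that node, which is why this method suits repeated random walks.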

Exemplarily, the normalized transition probabilities p_i of all adjacent nodes x_i of the target node v can be calculated, and node sampling based on alias sampling can be performed to obtain the next node x_1, so that the sequence at this time is (v, x_1). Subsequently, the above process is repeated for the last node of the target user's node sequence to obtain the next user node, giving (v, x_1, x_2).

By predefining the length of the sequence to be obtained as m+1 and repeating the above process m times, the sequence result of node v is obtained: (v, x_1, x_2, ..., x_m).
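The walk that produces a sequence of length m+1 can be sketched as follows. For brevity this sketch uses random.choices from the standard library as a stand-in for alias sampling, and transition_fn is a hypothetical callback returning the unnormalized transition probabilities of the current end node given its previous node.

```python
import random


def normalize(pi):
    """Scale unnormalized transition probabilities so that they sum to 1."""
    total = sum(pi.values())
    return {x: w / total for x, w in pi.items()}


def random_walk(t, v, m, transition_fn, rng):
    """Generate (v, x_1, ..., x_m), starting at v with previous node t.

    transition_fn(prev, cur) -> dict {next_node: unnormalized probability}.
    The walk stops early if the current end node has no associated nodes.
    """
    sequence = [v]
    prev = t
    for _ in range(m):
        cur = sequence[-1]
        pi = transition_fn(prev, cur)
        if not pi:
            break
        probs = normalize(pi)
        nodes = list(probs)
        # Weighted sampling; a stand-in here for the alias sampling named in the text.
        nxt = rng.choices(nodes, weights=[probs[x] for x in nodes], k=1)[0]
        prev = cur
        sequence.append(nxt)
    return sequence
```

Calling random_walk once per selected start node and stacking the results gives one row per start node, each row holding a node sequence.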

S508. Generate an associated node array of the target node according to the associated user node sequence of the target node.

Exemplarily, by predefining the required number of sequences M, M adjacent nodes of node t may be selected, and respective user node sequences of the M adjacent nodes of t may be calculated accordingly. By combining the respective user node sequences of the M adjacent nodes of t, the associated node array of the target node can be obtained as follows:

v_1, x_11, x_12, …, x_1m

v_2, x_21, x_22, …, x_2m

…

v_M, x_M1, x_M2, …, x_Mm

S509. Determine the multidimensional vector of the user in the database according to the associated node array of the target node.

Exemplarily, the associated node array of the target node can be input into word2vec to obtain the n-dimensional (n can be manually set) embedding vector representation of each node, the format is as follows:

"id1 0.13716 0.05973 -0.05692 0.34796…

id2 0.55362 -0.24561 0.67832 0.89571…

..."

The embodiment of the present disclosure proposes a better way of applying relational data between enterprises (such as equity holding relationships, supply chain relationships, etc.), thereby improving the accuracy of determining transfer users, and can to a large extent solve the current difficulties in the banking industry of acquiring customers and activating existing customers.

Those of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions, and the aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it performs the steps of the above method embodiments; the aforementioned storage medium includes ROM, RAM, magnetic disks, optical disks and other media that can store program code.

FIG. 8 is a schematic structural diagram of an apparatus for determining user type information provided by an embodiment of the present disclosure. The device for determining the type information of the user may be implemented by software, hardware or a combination of the two, so as to execute the method for determining the type information of the user in the foregoing embodiments. As shown in FIG. 8 , the device 600 for determining type information of the user includes: an acquisition module 601 , a prediction module 602 and a determination module 603 .

An acquisition module 601, configured to acquire user characteristic data.

The prediction module 602 is configured to input the characteristic data of the user into the first prediction model and the second prediction model respectively, and obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model, where the first prediction model is used for predicting the probability of generating the user's target data under preset execution conditions, and the second prediction model is used for predicting the probability of generating the user's target data without applying the preset execution conditions.

A determining module 603, configured to determine user type information according to the first conversion rate and the second conversion rate.

The device for determining user type information provided in the embodiments of the present disclosure can perform the actions of the method for determining user type information in the above embodiments, and its implementation principle and technical effect are similar, and will not be repeated here.

FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 9, the electronic device may include at least one processor 701 and a memory 702. FIG. 9 shows an electronic device with one processor as an example.

The memory 702 is used to store programs. Specifically, the program may include program code, and the program code includes computer operation instructions.

The memory 702 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.

The processor 701 is configured to execute the computer-executable instructions stored in the memory 702, so as to implement the above method for determining user type information.

Wherein, the processor 701 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.

Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are implemented independently, they may be connected to each other through a bus to communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, etc., but this does not mean that there is only one bus or one type of bus.

Optionally, in terms of specific implementation, if the communication interface, memory 702 and processor 701 are integrated and implemented on one chip, the communication interface, memory 702 and processor 701 may complete communication through an internal interface.

The embodiment of the present disclosure also provides a chip, including a processor and an interface. The interface is used to input and output data or instructions processed by the processor. The processor is configured to execute the methods provided in the above method embodiments.

The present disclosure also provides a computer-readable storage medium, which may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code. Specifically, the computer-readable storage medium stores program information, and the program information is used to implement the above method for determining user type information.

An embodiment of the present disclosure further provides a program, which is used to execute the method for determining user type information provided by the above method embodiments when executed by a processor.

An embodiment of the present disclosure also provides a program product, such as a computer-readable storage medium, in which instructions are stored; when the instructions are run on a computer, the computer executes the method for determining user type information provided by the above method embodiments.

In the above embodiments, all or part of them may be implemented by software, hardware, firmware or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. Available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)).

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it is still possible to modify the technical solutions described in the foregoing embodiments, or to make equivalent replacements for some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims

A method for determining user type information, characterized in that the method includes:

Obtain the user's characteristic data;

Inputting the characteristic data of the user into the first prediction model and the second prediction model respectively, and obtaining the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model, wherein the first prediction model is used to predict the probability of generating the user's target data under preset execution conditions, and the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution conditions;

Determine the type information of the user according to the first conversion rate and the second conversion rate.
The method according to claim 1, wherein the determining the type information of the user according to the first conversion rate and the second conversion rate comprises:

Taking the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the user's target data;

The type information of the user is determined according to the first conversion rate, the second conversion rate and the contribution value.
The method according to claim 1 or 2, wherein the first prediction model is generated after training through a first sample set, and the first sample set contains characteristic data of historical users and result data of generating the target data of the historical users under the preset execution conditions;

The second prediction model is generated after training through a second sample set, and the second sample set contains characteristic data of historical users and result data of generating the target data of the historical users without applying the preset execution conditions.
The method according to any one of claims 1-3, wherein after determining the type information of the user according to the first conversion rate and the second conversion rate, the method further comprises:

Querying the database, according to the multidimensional vector of the user, for transfer users of users of the target type, wherein the multidimensional vector is used to characterize the association relationship between users in multiple dimensions, and the transfer user is a user to whom the preset execution condition is applied for magnitude expansion.
The method according to claim 4, characterized in that, according to the multidimensional vector of the user, querying the transfer user of the user of the target type in the database includes:

According to the cosine similarity between the multidimensional vector of the user to be queried in the database and the multidimensional vector of the user of the target type, determine whether the user to be queried is the transferred user.
The method according to claim 4, characterized in that, according to the multidimensional vector of the user, querying the transfer user of the user of the target type in the database includes:

Sampling the multidimensional vectors of users of the target type and the multidimensional vectors of users of non-target types according to a preset sampling ratio to generate a third sample set, where the multidimensional vectors of users of the target type are the positive samples of the third sample set, and the multidimensional vectors of users of non-target types are the negative samples of the third sample set;

using the third sample set to train a similar crowd expansion model;

Inputting the multidimensional vector of the user to be queried in the database into the trained similar crowd expansion model, and obtaining the crowd conversion probability output by the trained similar crowd expansion model;

Determine whether the user to be queried is the transferred user according to the crowd conversion probability.
The method according to claim 4, characterized in that, before the transfer user of the user of the target type is inquired in the database according to the multidimensional vector of the user, the method further comprises:

A multidimensional vector of a user in the database is determined according to the association information between the users.
The method according to claim 7, wherein the determining the multidimensional vector of the user in the database according to the association information between the users comprises:

selecting a target user from the database as a target node in the user relationship network;

sequentially determining the next user node of the current end node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches a preset sequence length;

generating an associated node array of the target node according to the associated user node sequence of the target node;

Determine the multidimensional vector of the user in the database according to the associated node array of the target node.
The method according to claim 8, wherein the determining the next user node of the current end node in the associated user node sequence of the target node comprises:

Determining the normalized transition probability of the current end node and performing weighted sampling on associated nodes of the current end node to determine the next user node.
The method according to claim 9, wherein said determining the normalized transition probability of said current end node comprises:

determining, according to the association information between the users, the transition probability between the current end node and any associated user node;

normalizing the transition probability between the current end node and any associated user node to determine the normalized transition probability of the current end node.
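
The walk described in the preceding claims (extend the sequence from the target node, normalize the transition probabilities of the current end node, then pick the next node by weighted sampling) can be sketched as follows. This is an illustrative sketch only: the `graph` layout (a dict mapping each node to a dict of associated nodes and unnormalized transition probabilities), the function names and the seed are assumptions, not part of the claims.

```python
import random


def normalize(transition):
    """Scale unnormalized transition probabilities so they sum to 1."""
    total = sum(transition.values())
    return {node: p / total for node, p in transition.items()}


def walk(graph, target_node, sequence_length, seed=0):
    """Generate the associated user node sequence of target_node.

    graph: node -> {associated node: unnormalized transition probability}
    (a hypothetical representation of the user relationship network).
    """
    rng = random.Random(seed)
    sequence = [target_node]
    while len(sequence) < sequence_length:
        current_end = sequence[-1]
        associated = graph.get(current_end, {})
        if not associated:
            break  # dead end: no associated user node to move to
        probs = normalize(associated)
        nodes = list(probs)
        # Weighted sampling over the associated nodes of the current end node.
        next_node = rng.choices(nodes, weights=[probs[n] for n in nodes], k=1)[0]
        sequence.append(next_node)
    return sequence
```

For example, `normalize({"b": 3.0, "c": 1.0})` gives `{"b": 0.75, "c": 0.25}`, so from that node the walk moves to `b` three times as often as to `c`.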
The method according to claim 10, wherein the determining the transition probability between the current end node and any associated user node according to the association information between the users comprises:

generating weight data between the current end node and any associated user node according to the association information between the users and the identification of the user;

determining the weight correction coefficient between the current end node and any associated user node according to the value of the shortest path distance between the current end node and any associated user node in the user relationship network;

determining the transition probability between the current end node and any associated user node according to the weight data between the current end node and any associated user node and the weight correction coefficient between the current end node and any associated user node.
The method according to claim 11, wherein the value of the shortest path distance includes a first value, a second value and a third value, and there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;

if the associated user node is the previous node of the current end node, the value of the shortest path distance is the first value; if the associated user node and the current end node are adjacent nodes, the value of the shortest path distance is the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the value of the shortest path distance is the third value.
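
As a hedged illustration of the mapping between shortest path distance values and weight correction coefficients, the sketch below multiplies each weight by the coefficient mapped from its distance value. The claims do not state how the two quantities are combined or what the values are; the multiplication, the values 0, 1 and 2, and the coefficients `1/p`, `1` and `1/q` follow the convention of node2vec-style biased walks and are assumptions here, as are all names.

```python
# Hypothetical encodings of the first, second and third distance values.
FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 0, 1, 2


def correction_mapping(p, q):
    """Map each distance value to a weight correction coefficient (assumed form)."""
    return {FIRST_VALUE: 1.0 / p, SECOND_VALUE: 1.0, THIRD_VALUE: 1.0 / q}


def transition_probabilities(weights, distances, correction):
    """Unnormalized transition probability = weight data * correction coefficient.

    weights:   associated node -> weight data
    distances: associated node -> shortest path distance value
    """
    return {node: weights[node] * correction[distances[node]] for node in weights}


weights = {"prev": 2.0, "near": 1.0, "far": 4.0}
distances = {"prev": FIRST_VALUE, "near": SECOND_VALUE, "far": THIRD_VALUE}
probs = transition_probabilities(weights, distances, correction_mapping(p=2.0, q=4.0))
```

In this example the three corrected values come out equal (1.0 each), which shows how the coefficients can offset differences in the raw weight data before normalization.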
An electronic device, characterized in that it comprises: a processor and a memory; wherein the memory stores a computer program, and the computer program is adapted to be loaded by the processor to execute the method according to any one of claims 1-12.
A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to execute the method steps according to any one of claims 1-12.
A computer program, characterized by comprising program code, wherein, when a computer runs the computer program, the program code executes the method according to any one of claims 1-12.