CN114298232A

CN114298232A - Method, device and storage medium for determining type information of user

Info

Publication number: CN114298232A
Application number: CN202111655734.XA
Authority: CN
Inventors: 张海川
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2022-04-08
Also published as: WO2023123933A1

Abstract

The embodiment of the disclosure provides a method, equipment and a storage medium for determining type information of a user, belonging to the technical and financial field, wherein the method comprises the following steps: acquiring characteristic data of a user; respectively inputting the characteristic data of the user into a first prediction model and a second prediction model, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under a preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition; and determining the type information of the user according to the first conversion rate and the second conversion rate. In this way, the types of users can be effectively divided under the preset execution condition.

Description

Method, device and storage medium for determining type information of user

Technical Field

The present disclosure relates to the field of science and technology finance, and in particular, to a method, an apparatus, and a storage medium for determining type information of a user.

Background

With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting to financial technology (Fintech). Technology for determining the type information of a user is no exception; however, the financial industry's requirements for security and real-time performance also place higher demands on this technology.

In the related art, samples of a client's cumulative credit line utilization rate are obtained, the client's basic information (identity, transaction, property, credit, purchase, and the like) is input into a machine learning model as features, and the model outputs the predicted utilization rate of the client's cumulative loan credit line. If the predicted utilization rate exceeds a preset threshold, the client is regarded as a potential client, and the types of users are divided accordingly.

However, the existing user type division method cannot reflect the user types under different marketing conditions, so the divided user types are of limited effectiveness.

Disclosure of Invention

The embodiment of the disclosure provides a method, a device and a storage medium for determining user type information, so as to solve the problem that the validity of a user type divided in the prior art is not high.

In a first aspect, an embodiment of the present disclosure provides a method for determining type information of a user, where the method includes:

acquiring characteristic data of a user;

respectively inputting the feature data of the user into a first prediction model and a second prediction model, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under a preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition;

and determining the type information of the user according to the first conversion rate and the second conversion rate.

In an optional implementation, the determining the type information of the user according to the first conversion rate and the second conversion rate includes:

taking the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the target data of the user;

and determining the type information of the user according to the first conversion rate, the second conversion rate and the contribution value.

In an optional embodiment, the first prediction model is generated after being trained by a first sample set, and the first sample set includes feature data of a historical user and result data of generating target data of the historical user under the preset execution condition;

the second prediction model is generated after training through a second sample set, and the second sample set comprises feature data of a historical user and result data of target data of the historical user generated under the condition that the preset execution condition is not applied.

In an optional embodiment, after determining the type information of the user according to the first conversion rate and the second conversion rate, the method further includes:

querying a database for a transfer user of a target-type user according to the multidimensional vector of the user, wherein the multidimensional vector is used for representing the association relationship among users under multiple dimensions, and the transfer user is a user to whom the preset execution condition is applied for scale expansion.

In an alternative embodiment, the querying a database for a transfer user of a target type of user according to a multidimensional vector of users includes:

and determining whether the user to be queried is the transfer user or not according to cosine similarity between the multidimensional vector of the user to be queried in the database and the multidimensional vector of the user of the target type.
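The cosine-similarity check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the `min_similarity` cut-off of 0.9, and the rule "best similarity to any target-type user" are assumptions introduced here, since the text does not specify how the similarity is turned into a decision.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def find_transfer_users(candidates, target_vectors, min_similarity=0.9):
    """Return IDs of candidate users close enough to any target-type user.

    candidates: dict mapping user ID -> multidimensional vector.
    target_vectors: non-empty list of vectors of target-type users.
    min_similarity: hypothetical cut-off, not taken from the source.
    """
    selected = []
    for user_id, vec in candidates.items():
        best = max(cosine_similarity(vec, t) for t in target_vectors)
        if best >= min_similarity:
            selected.append(user_id)
    return selected
```

For example, `find_transfer_users({"a": [1.0, 0.1], "b": [0.0, 1.0]}, [[1.0, 0.0]])` keeps only user `"a"`, whose vector points in almost the same direction as the target vector.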

sampling the multidimensional vector of the target type user and the multidimensional vector of the non-target type user according to a preset sampling proportion to generate a third sample set, wherein the multidimensional vector of the target type user is a positive sample of the third sample set, and the multidimensional vector of the non-target type user is a negative sample of the third sample set;

training a similar population expansion model using the third sample set;

inputting the multidimensional vector of the user to be inquired in the database into a trained similar population expansion model, and acquiring the population conversion probability output by the trained similar population expansion model;

and determining whether the user to be inquired is the transfer user or not according to the crowd conversion probability.
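The construction of the third sample set and the final decision can be sketched as below. The names `build_third_sample_set` and `is_transfer_user`, the 1:3 positive-to-negative ratio, and the 0.5 probability cut-off are assumptions for illustration; the text only states that a preset sampling proportion is used. The classifier itself is left out, since the text does not name one.

```python
import random


def build_third_sample_set(target_vectors, other_vectors, neg_per_pos=3, seed=0):
    """Build labeled samples: target-type vectors are positives (label 1),
    sampled non-target vectors are negatives (label 0).

    neg_per_pos is the assumed sampling proportion (negatives per positive).
    """
    rng = random.Random(seed)
    n_neg = min(len(other_vectors), neg_per_pos * len(target_vectors))
    negatives = rng.sample(other_vectors, n_neg)
    samples = [(vec, 1) for vec in target_vectors]
    samples += [(vec, 0) for vec in negatives]
    rng.shuffle(samples)  # avoid feeding the model all positives first
    return samples


def is_transfer_user(conversion_probability, cutoff=0.5):
    """Decide from the model's crowd conversion probability.

    cutoff is a hypothetical value, not taken from the source.
    """
    return conversion_probability >= cutoff
```

The resulting list of `(vector, label)` pairs can be passed to any binary classifier that outputs a probability.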

In an optional embodiment, before the querying the database for the transfer user of the target type of user according to the multidimensional vector of the user, the method further includes:

and determining the multidimensional vector of the users in the database according to the association information among the users.

In an optional embodiment, the determining the multidimensional vector of the users in the database according to the association information between the users includes:

selecting a target user from the database as a target node in a user relationship network;

sequentially determining a next user node of a current tail node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches a preset sequence length;

generating a correlation node array of the target node according to the correlation user node sequence of the target node;

and determining the multidimensional vector of the users in the database according to the associated node array of the target node.

In an optional embodiment, the determining a next user node of a current end node in the associated user node sequence of the target node includes:

and determining the normalized transition probability of the current tail node, performing weighted sampling on the associated node of the current tail node, and determining the next user node.

In an optional embodiment, the determining the normalized transition probability of the current end node includes:

determining the transition probability between the current tail node and any associated user node according to the associated information between the users;

normalizing the transition probability between the current tail node and any associated user node, and determining the normalized transition probability of the current tail node.

In an optional implementation manner, the determining, according to the association information between the users, a transition probability between the current end node and any associated user node includes:

generating weight data between the current tail node and any associated user node according to the associated information among the users and the user identification;

determining a weight correction coefficient between the current tail node and any associated user node according to the value of the shortest path distance between the current tail node and any associated user node in the user relationship network;

and determining the transition probability between the current tail node and any associated user node according to the weight data between the current tail node and any associated user node and the weight correction coefficient between the current tail node and any associated user node.

In an optional embodiment, the values of the shortest path distance include a first value, a second value, and a third value; the value of the shortest path distance and the weight correction coefficient have a mapping relation;

if the associated user node is the previous node of the current end node, the shortest path distance value is the first value, if the associated user node and the current end node are adjacent nodes, the shortest path distance value is the second value, and if the associated user node is not the previous node of the current end node or the adjacent node of the current end node, the shortest path distance value is the third value.
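The walk described above (weight data multiplied by a distance-dependent correction coefficient, normalization, then weighted sampling until the preset sequence length is reached) can be sketched as follows. Several points are assumptions made for this sketch: the graph is a dict of weighted adjacency dicts, the three distance values are taken to be 0, 1, and 2 measured from the previous node of the walk (a node2vec-style reading of the text), and the coefficients in `CORRECTION` are arbitrary illustrative numbers.

```python
import random

# Hypothetical mapping: shortest-path distance value -> weight correction coefficient.
CORRECTION = {0: 0.5, 1: 1.0, 2: 2.0}


def distance_value(graph, prev, candidate):
    """0 if candidate is the previous node, 1 if adjacent to it, else 2."""
    if candidate == prev:
        return 0
    if candidate in graph[prev]:
        return 1
    return 2


def normalized_transition_probs(graph, prev, current):
    """Normalized transition probabilities from the current tail node."""
    raw = {}
    for nbr, weight in graph[current].items():
        # On the first step there is no previous node, so no correction applies.
        coeff = 1.0 if prev is None else CORRECTION[distance_value(graph, prev, nbr)]
        raw[nbr] = weight * coeff
    total = sum(raw.values())
    return {nbr: p / total for nbr, p in raw.items()}


def walk(graph, start, length, rng):
    """Generate an associated user node sequence of at most `length` nodes."""
    seq = [start]
    while len(seq) < length:
        current = seq[-1]
        prev = seq[-2] if len(seq) > 1 else None
        if not graph[current]:
            break  # isolated node: the walk cannot continue
        probs = normalized_transition_probs(graph, prev, current)
        nodes = list(probs)
        weights = [probs[n] for n in nodes]
        seq.append(rng.choices(nodes, weights=weights, k=1)[0])
    return seq
```

Calling `walk(graph, "A", 5, random.Random(0))` on a small graph returns a five-node sequence starting at `"A"`; running it from many start nodes yields the node arrays from which the user vectors are then learned.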

In a second aspect, an embodiment of the present disclosure provides an apparatus for determining type information of a user, including:

an acquisition module, configured to acquire the characteristic data of the user;

a prediction module, configured to input the characteristic data of the user into a first prediction model and a second prediction model respectively, and to acquire a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under a preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition; and

a determining module, configured to determine the type information of the user according to the first conversion rate and the second conversion rate.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;

wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method of determining type information of a user according to any of the first aspect and its alternatives.

In a fourth aspect, the present disclosure provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method for determining type information of a user according to any one of the first aspect.

According to the method, the device and the storage medium for determining the type information of the user, provided by the embodiment of the disclosure, the characteristic data of the user is firstly acquired. And then, respectively inputting the characteristic data of the user into a first prediction model and a second prediction model, and acquiring a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under the preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition. And finally, determining the type information of the user according to the first conversion rate and the second conversion rate. In this way, the types of users can be effectively divided under the preset execution condition.

Drawings

In order to more clearly illustrate the technical solutions of the present disclosure or the prior art, the following briefly introduces the drawings needed to be used in the description of the embodiments or the prior art, and obviously, the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive labor.

FIG. 1 is a schematic diagram of a scenario of an operating environment provided by an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart of a method for determining type information of a user according to an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of a method for mining a transfer user according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of another method for mining a transfer user according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of an association relationship between user nodes according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of determining a multidimensional vector of a user according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of transition probabilities between nodes according to an embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an apparatus for determining type information of a user according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.

Inclusive finance is currently one of the key directions for the transformation and development of the financial industry in China, and many commercial banks are creating new marketing models for acquiring and operating customers online. How to accurately locate a bank's own potential customers among a massive population and recommend the best-matched products to them, thereby improving the activity and stickiness of existing customers, has become a key consideration for many banks.

In the related art, samples of a client's cumulative credit line utilization rate are obtained, the client's basic information (identity, transaction, property, credit, purchase, etc.) is input into a machine learning model as features, and the model outputs the predicted utilization rate of the client's cumulative loan credit line. If the predicted utilization rate exceeds a preset threshold, the client is regarded as a potential client, and the types of users are divided accordingly. However, the existing user type division method cannot reflect the user types under different marketing conditions, so the divided user types are of limited effectiveness.

In order to solve the above problem, embodiments of the present disclosure provide a method, an apparatus, and a storage medium for determining user type information, in which a first conversion rate for generating target data of a user under a preset execution condition and a second conversion rate for generating target data of the user under a non-preset execution condition are predicted, and then the user type information is determined based on the first conversion rate and the second conversion rate, so that the user types can be effectively divided under the preset execution condition.

Before describing the determination method of the type information of the user of the present disclosure, an example operating environment of the present disclosure will be understood with reference to fig. 1.

Fig. 1 is a scene schematic diagram of an operating environment according to an embodiment of the present disclosure. As shown in fig. 1, there are shown subjects who want to acquire type information of a user, such as an enterprise 101, a banking institution 102, and the like, and these subjects can request from a system platform 103 to inquire about the type information of the user as needed. Of course, the above-mentioned subjects are only for illustration, and there are other subjects that may actually initiate the query, such as the system platform 103 automatically initiating the query, which is not illustrated here. Query requests from the respective subjects are provided to the system platform 103 through the network, the system platform 103 is used for performing a determination task of the type information of the users, the system platform 103 may include not only a query module for querying the type information of the users, but also a marketing module for providing different marketing schemes for different types of users and a mining module for mining potential transfer users of the target type. In addition, the system platform 103 may further provide a database 104 during the process of querying the transfer user, where the database 104 includes the user to be queried, for example, an enterprise information base. It should be understood that the enterprise information repository in the example environment is merely exemplary, and that other types of information repositories are within the scope of the present disclosure. Also, in the above example operational scenario, the subject obtaining the type information of the user may access a network using various devices, such as a personal computer, a server, a tablet, a mobile phone, a PDA, a notebook, or any other computing device with networking capability. The system platform 103 may be implemented using a server or group of servers with greater processing power and greater security. 
The networks connecting these devices and the system platform 103 may include various types of wired and wireless networks, such as, but not limited to: the internet, local area networks, Wi-Fi, WLAN, cellular communication networks (GPRS, CDMA, 2G/3G/4G/5G cellular networks), satellite communication networks, and so forth.

It is understood that the method for determining the type information of the user may be implemented by the device for determining the type information of the user provided in the embodiment of the present disclosure, and the device for determining the type information of the user may be a part or all of a certain device, for example, a server or a chip of the server.

The following takes a server integrated or installed with relevant execution codes as an example, and details technical solutions of the embodiments of the present disclosure with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 2 is a flowchart illustrating a method for determining type information of a user according to an embodiment of the present disclosure; this embodiment relates to the process by which a server determines the type information of the user. Unlike existing methods for determining the type information of a user, the present disclosure separately predicts a first conversion rate of generating the target data of the user under the preset execution condition and a second conversion rate of generating the target data of the user without applying the preset execution condition, and determines the type information of the user according to the first conversion rate and the second conversion rate. Therefore, the method provided by the present disclosure can effectively divide user types while reflecting the influence of the preset execution condition on the user.

Specifically, as shown in fig. 2, the method includes:

S201, acquiring characteristic data of a user.

In the present disclosure, the server may obtain the feature data of the user when the type of the user needs to be classified.

It should be understood that the embodiment of the present disclosure does not limit how the feature data of the user is obtained. In some embodiments, the server may obtain a unique identification (ID) of the user and then determine the feature data related to the user according to that unique identification.

It should be noted that the user related to the embodiment of the present disclosure may be an enterprise user or an individual user, and the embodiment of the present disclosure does not limit this.

It should be understood that the embodiment of the present disclosure is not limited to the feature data of the user, and may be determined specifically according to the actual situation. For example, if the user is an enterprise user, the characteristic data of the user may include, but is not limited to, financial data, business data, geographic data, equity data, and abnormal operation data.

The financial data is used for characterizing the total amount of assets, owner's equity, return on investment, revenue on business, revenue off business, total amount of profits, revenue on business, net profits, total amount of liabilities, total amount of taxes, cost of business, sales expenses, loss of equity, and the like. The business data is used for characterizing the company type, the year of establishment, the industry, the enterprise state, the registered capital, the number of business changes, and the like. The geographic data is used for characterizing provinces, cities, and the like. The equity data is used for characterizing the number of direct shareholders, the number of direct natural-person shareholders, the ratio of direct natural-person shareholders, the number of direct non-natural-person shareholders, the ratio of direct non-natural-person shareholders, and the like. The abnormal operation data is used for characterizing the number of administrative punishments, the number of operation exceptions, the number of tax violations, the number of times reported, the number of times of execution of lost credit, and the like.

S202, inputting the characteristic data of the user into a first prediction model and a second prediction model respectively, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under the preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition.

In this step, after the server obtains the feature data of the user, the feature data of the user may be input into the first prediction model and the second prediction model, respectively, so as to obtain the first conversion rate and the second conversion rate.

It should be understood that the preset execution condition and the target data are not limited in the embodiments of the present disclosure, and in some embodiments, the preset execution condition may be marketing to a user, and correspondingly, the target data may be transaction data generated by the user. Through the first conversion rate and the second conversion rate, the probability that the user is converted into the transaction behavior under the condition of being marketed and not being marketed can be reflected.

The following is a description of how the first prediction model and the second prediction model are constructed.

In some embodiments, the first prediction model is generated after training through a first sample set, and the first sample set contains feature data of the historical user and result data of generating target data of the historical user under a preset execution condition. Accordingly, the second prediction model is generated after training through a second sample set, and the second sample set comprises feature data of the historical user and result data of generating target data of the historical user under the condition that the preset execution condition is not applied.

For example, the server may divide all clients into two groups according to whether they have been marketed to: a marketed group (treated) and an unmarketed group (control). For both groups of clients, the training set and the validation set are divided using the result data of whether the client converted (responded) as the target label.

For example, the marketed group may be taken out separately, with converted clients labeled 1 and unconverted clients labeled 0. A random 80% of the samples are used as the training set and 20% as the validation set, and the first prediction model is trained based on the XGBoost algorithm or another classification method. The same processing is applied to the unmarketed clients to train the second prediction model.
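The data preparation described above can be sketched as follows. The record layout (keys `treated`, `converted`, `features`) and the helper names are assumptions for illustration only; the classifier itself is omitted, and any binary classifier could be fitted separately on each group's training set.

```python
import random


def split_by_treatment(records):
    """Partition client records into the marketed (treated) and control groups."""
    treated = [r for r in records if r["treated"]]
    control = [r for r in records if not r["treated"]]
    return treated, control


def train_validation_split(samples, train_fraction=0.8, seed=0):
    """Randomly split samples into a training set and a validation set."""
    shuffled = samples[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def to_xy(samples):
    """Turn records into feature rows and 0/1 conversion labels."""
    X = [s["features"] for s in samples]
    y = [1 if s["converted"] else 0 for s in samples]
    return X, y
```

One model is then trained on `to_xy(train)` from the treated group and a second, independent model on the same output from the control group.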

It should be understood that the disclosed embodiments do not limit the type of the first prediction model and the second prediction model; for example, the first prediction model and the second prediction model may be binary classification models.

S203, determining the type information of the user according to the first conversion rate and the second conversion rate.

In this step, after the server obtains the first conversion rate and the second conversion rate, the type information of the user may be determined according to the first conversion rate and the second conversion rate.

It should be understood that the present disclosure does not limit how the type information of the user is determined. In some embodiments, the server may take the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the target data of the user, and then determine the type information of the user according to the first conversion rate, the second conversion rate, and the contribution value.

For example, the first prediction model and the second prediction model can be used to predict the conversion rate p of the user in the marketing and non-marketing scenarios, yielding a first conversion rate p_treated in the marketing scenario and a second conversion rate p_control in the non-marketing scenario. Taking the difference between the first conversion rate and the second conversion rate gives the contribution value lift of marketing to the probability of generating transaction data of the user, where lift = p_treated - p_control.
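The computation of the contribution value can be sketched as below. The two models are represented as plain callables that map a feature list to a probability; this interface and the name `predict_lift` are assumptions for illustration, and the stand-in models in the usage note are not trained models.

```python
def predict_lift(features, model_treated, model_control):
    """Score one user with both models and return (p_treated, p_control, lift).

    model_treated / model_control: callables returning a conversion
    probability in [0, 1] (assumed interface for this sketch).
    """
    p_treated = model_treated(features)
    p_control = model_control(features)
    for p in (p_treated, p_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError("model outputs must be probabilities")
    lift = p_treated - p_control
    return p_treated, p_control, lift
```

With stand-in models such as `lambda f: 0.7` and `lambda f: 0.4`, the call returns a lift of about 0.3, meaning marketing is predicted to raise this user's conversion probability by roughly 30 percentage points.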

Subsequently, in some embodiments, the server may determine that the user belongs to the first user type if the first conversion rate is greater than the first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than the second threshold.

If the first conversion rate is greater than the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is greater than or equal to the third threshold and less than or equal to the second threshold, the server may determine that the user belongs to the second user type.

If the first conversion rate is less than or equal to the first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than or equal to the third threshold and less than or equal to the second threshold, the server may determine that the user belongs to the third user type.

If the first conversion rate is less than or equal to the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is less than a third threshold, the server may determine that the user belongs to a fourth user type;

wherein the absolute value of the second threshold is equal to the absolute value of the third threshold.

It should be understood that the value of the first threshold is not limited in the embodiments of the present disclosure, and the first threshold may be 0.5 by way of example.

It should be understood that the value of the second threshold is not limited in the embodiments of the present disclosure. For example, the second threshold thres is the threshold at which lift changes significantly; it may be a positive decimal smaller than 1 and close to 0, and may be determined from a quantile of the overall statistical distribution. Correspondingly, the third threshold may be -thres.
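One way to derive thres from a quantile is sketched below. The text does not say which distribution or which quantile is used, so this sketch assumes the distribution of absolute lift values over all scored users and a default quantile of 0.9; both choices, and the function name, are hypothetical.

```python
def lift_threshold(lifts, quantile=0.9):
    """Return thres as the given quantile of |lift| over all users.

    Uses a simple lower-index quantile on the sorted magnitudes.
    The third threshold is then -thres.
    """
    if not lifts:
        raise ValueError("at least one lift value is required")
    if not 0.0 <= quantile <= 1.0:
        raise ValueError("quantile must be between 0 and 1")
    magnitudes = sorted(abs(x) for x in lifts)
    index = int(quantile * (len(magnitudes) - 1))
    return magnitudes[index]
```

A higher quantile yields a larger thres, so fewer users are treated as strongly affected (positively or negatively) by marketing.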

Illustratively, if p_treated > 0.5, p_control ≤ 0.5, and lift > thres, the user is a first type user. If p_treated > 0.5, p_control > 0.5, and -thres ≤ lift ≤ thres, the user is a second type user. If p_treated ≤ 0.5, p_control ≤ 0.5, and -thres ≤ lift ≤ thres, the user is a third type user. If p_treated ≤ 0.5, p_control > 0.5, and lift < -thres, the user is a fourth type user.
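The four rules above translate directly into a small function. The name `classify_user` and the `None` return value are choices made for this sketch: the rules as stated do not cover every combination of inputs (for example, a high p_treated, low p_control, and a lift that does not exceed thres), so such users are left unclassified here.

```python
def classify_user(p_treated, p_control, thres, first_threshold=0.5):
    """Return the user type (1 to 4) per the stated rules, or None if no rule applies."""
    lift = p_treated - p_control
    high_treated = p_treated > first_threshold
    high_control = p_control > first_threshold
    small_lift = -thres <= lift <= thres

    if high_treated and not high_control and lift > thres:
        return 1  # conversion probability rises when the condition is applied
    if high_treated and high_control and small_lift:
        return 2  # converts with or without the condition
    if not high_treated and not high_control and small_lift:
        return 3  # does not convert either way
    if not high_treated and high_control and lift < -thres:
        return 4  # conversion probability falls when the condition is applied
    return None  # combination not covered by the four rules
```

For example, `classify_user(0.8, 0.3, 0.1)` returns 1, and `classify_user(0.2, 0.7, 0.1)` returns 4.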

When the preset execution condition is applied, the probability that a user of the first user type generates the target data increases; the probability that a user of the second user type generates the target data is higher than the target upper limit value whether or not the preset execution condition is applied; the probability that a user of the third user type generates the target data is lower than the target lower limit value whether or not the preset execution condition is applied; and the probability that a user of the fourth user type generates the target data decreases when the preset execution condition is applied.

For example, the preset execution condition is taken as an example for marketing to the users, and four types of users can correspondingly comprise marketing sensitive people, natural transformation people, involuntary people and counteractive people.

Wherein, the active activity proportion of the marketing sensitive people is low, but the marketing sensitive people are easily influenced by the marketing activities to generate active behaviors. For the part of users, the hierarchical management can be further carried out according to the sensitivity to price, discount, interest offering and the like.

The natural conversion crowd is a spontaneous active user, and even if a bank does not put marketing resources into the spontaneous active user, the spontaneous active user can also be active and has high quality. For the part of users, similar user expansion models can be used, more users similar to the similar users can be found in the enterprise information base, and the users can be guided to be bank users through means of internet marketing, electric marketing and the like.

The unresponsive crowd (the third type of user) consists of users who are certain to churn and cannot be won back through marketing, or users who rarely see marketing messages; no further marketing resources need to be invested in them.

The counteractive crowd (the fourth type of user) is spontaneously active but averse to marketing disturbance, so marketing to these users should be avoided and no marketing resources need to be invested in them.

In some embodiments, marketing-sensitive people may be further divided, such that different marketing approaches are employed based on the type of further division.

Illustratively, the marketing-sensitive crowd may be further divided into two types. The first type is price sensitive: these users become active when there is a subsidy-related or discount-related marketing campaign. The second type is price insensitive: subsidies have little influence on these users, and they only need regular marketing reminders. After the price-sensitive and price-insensitive samples are constructed, users can be classified with a classical classification method, and details are not repeated herein.

For example, a user who exhibits any one of the following behaviors can be classified as a price-sensitive sample with label = 1 (where N is an adjustable threshold that can be set according to the data distribution and is usually not greater than 30):

1. the average loan interest rate over the past 3 months is within the lowest N% of the population;

2. the subsidy utilization rate of the loan documents over the past 3 months is within the top N%;

3. the user actively shares channel information to obtain coupons.

Illustratively, all customers other than the price-sensitive customers described above are price-insensitive customers (label = 0).
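The labeling rule for price-sensitive samples might be sketched as follows. The function and parameter names are hypothetical, and the inputs are assumed to be precomputed percentile ranks (0 to 100, ascending) within the population; the source does not specify how these ranks are obtained. The default N = 20 is only an example value within the "usually not greater than 30" range.

```python
def price_sensitive_label(rate_percentile: float,
                          subsidy_usage_percentile: float,
                          shared_for_coupon: bool,
                          n: float = 20.0) -> int:
    """Return 1 for a price-sensitive sample and 0 otherwise.

    rate_percentile: ascending percentile rank of the user's average loan
        interest rate over the past 3 months (small value = low rate).
    subsidy_usage_percentile: ascending percentile rank of the user's subsidy
        utilization rate over the past 3 months (large value = heavy use).
    shared_for_coupon: whether the user actively shared channel information
        to obtain coupons.
    """
    lowest_rate = rate_percentile <= n                    # criterion 1
    top_subsidy = subsidy_usage_percentile >= 100.0 - n   # criterion 2
    # Any single criterion is enough for label = 1.
    return 1 if (lowest_rate or top_subsidy or shared_for_coupon) else 0
```

Because the three criteria are combined with a logical OR, a user who meets none of them is labeled 0, matching the price-insensitive case.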

In some embodiments, when marketing, price-sensitive customers can be preferentially recommended activity benefits with larger discounts to encourage more active behavior. For price-insensitive customers, regular marketing reminders can be used to cross-recommend multiple products within the bank.

The method for determining the type information of the user provided by the embodiment of the disclosure divides users finely and thereby avoids wasting marketing resources. By combining models such as the marketing response model and the price sensitivity model, both existing and new users can be finely divided, which helps the bank direct marketing resources to the marketing-sensitive users who need them most.

The method for determining the type information of the user, provided by the embodiment of the disclosure, first obtains the characteristic data of the user. And then, respectively inputting the characteristic data of the user into a first prediction model and a second prediction model, and acquiring a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used for predicting the probability of generating the target data of the user under the preset execution condition, and the second prediction model is used for predicting the probability of generating the target data of the user under the condition of not applying the preset execution condition. And finally, determining the type information of the user according to the first conversion rate and the second conversion rate. In this way, the types of users can be effectively divided under the preset execution condition.

On the basis of the above embodiment, after determining the type information of the user, the server may also query the database for transfer users of a target type of user according to the multidimensional vectors of users. The multidimensional vector represents the association relationships among users in multiple dimensions, and a transfer user is a user to whom the preset execution condition is applied in order to expand the magnitude of the target type of user. Fig. 3 is a schematic flowchart of a method for mining a transfer user according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:

S301, determining the cosine similarity between the multidimensional vector of the user to be queried in the database and the multidimensional vector of the user of the target type.

It should be understood that the embodiments of the present disclosure are not limited to the target type of user, which in some embodiments may be the second type of user described above, i.e., the natural conversion crowd.

For example, assuming that the multidimensional vectors (embeddings) of two users u_i and u_j are (x_i1, x_i2, …, x_in) and (x_j1, x_j2, …, x_jn), respectively, the cosine similarity cos(u_i, u_j) between users u_i and u_j can be determined by equation (1):

$$\cos(u_i, u_j) = \frac{\sum_{k=1}^{n} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{n} x_{ik}^{2}}\,\sqrt{\sum_{k=1}^{n} x_{jk}^{2}}} \qquad (1)$$
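The cosine similarity between two embedding vectors can be computed with the standard definition (dot product divided by the product of the Euclidean norms). The following is a minimal sketch; the function name and the handling of zero vectors are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """Cosine similarity between two n-dimensional embedding vectors."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same dimension n")
    dot = sum(a * b for a, b in zip(x_i, x_j))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_j = math.sqrt(sum(b * b for b in x_j))
    if norm_i == 0.0 or norm_j == 0.0:
        # Assumption: a zero vector is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_i * norm_j)
```

The result lies in [-1, 1]; vectors pointing in the same direction give a value close to 1, and orthogonal vectors give 0.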

It should be understood that the disclosed embodiments are not limited to how the multidimensional vector of the user is determined, and in some embodiments, the server may determine the multidimensional vector of the user in the database according to the association information between the users.

For example, the server may first select a target user from the database as a target node in the user relationship network. Secondly, the server may sequentially determine a next user node of the current end node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches a preset sequence length. And thirdly, the server can generate the associated node array of the target node according to the associated user node sequence of the target node. Finally, the server may determine the multidimensional vector of the users in the database according to the associated node array of the target node.

It should be understood that, when determining the normalized transition probability of the current end node, the transition probability between the current end node and any associated user node may first be determined according to the association information between users. The transition probabilities between the current end node and its associated user nodes are then normalized to obtain the normalized transition probability of the current end node.

It should be understood that the embodiment of the present disclosure does not limit how to determine the transition probability between user nodes. For example, the server may first generate weight data between the current end node and any associated user node according to the association information between the users and the identifiers of the users. The server then determines a weight correction coefficient between the current end node and any associated user node according to the value of the shortest path distance between the current end node and that associated user node in the user relationship network. Finally, the server determines the transition probability between the current end node and any associated user node according to the weight data and the weight correction coefficient between the current end node and that associated user node.

The value of the shortest path distance is one of a first value, a second value and a third value, and the value of the shortest path distance has a mapping relation with the weight correction coefficient.

If the associated user node is the previous node of the current end node, the shortest path distance takes the first value; if the associated user node and the current end node are adjacent nodes, the shortest path distance takes the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the shortest path distance takes the third value.

S302, determining whether the user to be inquired is a transfer user according to the cosine similarity.

Wherein, the transfer user can be understood as a user who can perform marketing conversion.

It should be understood that the embodiment of the present disclosure does not limit how to determine whether the user to be queried is a transfer user according to the cosine similarity. In some embodiments, the larger the cosine similarity between two users, the more similar they are. Accordingly, marketing target customers can be found by finding the customers with the largest cosine similarity to (that is, the smallest cosine distance from) customers of the target type.
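One possible way to turn the similarity into a decision is sketched below. The source does not fix the decision rule, so both the rule (best similarity to any seed user of the target type, compared against a cutoff) and the names `find_transfer_users` and `min_similarity` are assumptions made for illustration. At least one seed user is expected.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def find_transfer_users(seeds: Dict[str, Sequence[float]],
                        candidates: Dict[str, Sequence[float]],
                        min_similarity: float = 0.9) -> List[Tuple[str, float]]:
    """Return candidates whose best similarity to any seed user reaches the
    cutoff, sorted from most similar to least similar."""
    hits = []
    for cid, cvec in candidates.items():
        best = max(_cos(cvec, svec) for svec in seeds.values())
        if best >= min_similarity:  # hypothetical cutoff, not from the source
            hits.append((cid, best))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

This brute-force comparison costs one similarity computation per seed and candidate pair, which is consistent with using the approach only when the number of target-type users is small.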

It should be noted that the method for mining transfer users shown in fig. 3 may be applied when the number of users of the target type is small, for example, less than or equal to 200. When the number of users of the target type is large (e.g., greater than 200), the method shown in fig. 4 may be employed.

Fig. 4 is a schematic flowchart of another method for mining a transfer user according to an embodiment of the present disclosure. As shown in fig. 4, the method includes:

S401, sampling the multidimensional vectors of users of the target type and the multidimensional vectors of users not of the target type according to a preset sampling proportion to generate a third sample set, wherein the multidimensional vectors of users of the target type are the positive samples of the third sample set, and the multidimensional vectors of users not of the target type are the negative samples of the third sample set.

Illustratively, the natural conversion crowd may be used as seed customers (label = 1) and customers outside the natural conversion crowd as negative samples (label = 0), and the samples may be sampled so that the ratio of label 1 to label 0 is between 1:1 and 1:3. Subsequently, the samples are randomly divided, with 80% taken as the training set and the remaining 20% as the validation set.
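The sampling and splitting step could look like the following sketch. The function name, the fixed random seed and the choice to downsample only the negatives are assumptions; the sketch also assumes there are at least as many negatives as positives, so that the ratio does not fall below 1:1.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (embedding vector, label)


def build_third_sample_set(positives: List[Sequence[float]],
                           negatives: List[Sequence[float]],
                           neg_per_pos: float = 3.0,
                           train_frac: float = 0.8,
                           seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Downsample negatives to at most neg_per_pos per positive, shuffle,
    and split into training and validation sets."""
    rng = random.Random(seed)
    max_neg = int(len(positives) * neg_per_pos)
    sampled_neg = rng.sample(list(negatives), min(max_neg, len(negatives)))
    samples: List[Sample] = [(v, 1) for v in positives]
    samples += [(v, 0) for v in sampled_neg]
    rng.shuffle(samples)
    cut = int(len(samples) * train_frac)  # 80% training by default
    return samples[:cut], samples[cut:]
```

With 10 positives, 100 negatives and neg_per_pos = 3, this keeps 30 negatives, giving 40 samples that are split into 32 for training and 8 for validation.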

S402, training the similar population expansion model by using the third sample set.

For example, the equity investment embedding vectors of the sample users in the training set and the validation set may be used as features for model training (XGBoost, LR, feature combinations and the like may be used) to obtain a binary classification model, the lookalike model.
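As a rough illustration of the LR option, the sketch below fits a logistic regression on embedding vectors with plain batch gradient descent and no external libraries. It is a stand-in under stated assumptions, not the model used in the disclosure: the function names, learning rate and epoch count are hypothetical, and a production setup would more likely use an established XGBoost or LR implementation together with the validation set.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lookalike(X: List[Sequence[float]], y: List[int],
                    lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Fit logistic regression; returns [bias, w_1, ..., w_n]."""
    n = len(X[0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            score = w[0] + sum(wk * xk for wk, xk in zip(w[1:], xi))
            err = _sigmoid(score) - yi
            grad[0] += err
            for k, xk in enumerate(xi):
                grad[k + 1] += err * xk
        # Average the gradient over the batch and take one descent step.
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad)]
    return w


def predict_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Crowd conversion probability score for one embedding vector."""
    return _sigmoid(w[0] + sum(wk * xk for wk, xk in zip(w[1:], x)))
```

A user to be queried would then be scored with `predict_score`, and the score compared with a threshold such as score ≥ thres.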

S403, inputting the multi-dimensional vector of the user to be inquired in the database into the trained similar population expansion model, and obtaining the population conversion probability output by the trained similar population expansion model.

Exemplarily, a large number of users that have embedding vector features of the same form can be used as users to be queried, and the trained lookalike model is used to predict the crowd conversion probability score of each user to be queried.

S404, determining whether the user to be inquired is a transfer user or not according to the crowd conversion probability.

It should be understood that the embodiments of the present disclosure do not limit how to determine whether the user to be queried is the transfer user according to the crowd conversion probability. In some embodiments, the crowd transition probability may be compared to a threshold.

For example, if score ≧ thres, it may be determined that the user to be queried is the transferring user, and if score < thres, it may be determined that the user to be queried is not the transferring user.

It should be understood that the methods for mining transfer users provided in fig. 3 and fig. 4 use the multidimensional vectors of users. Because these vectors capture both the shareholding homogeneity and the shareholding structural similarity between users, friends or acquaintances of users can be mined, which, compared with conventional methods, improves the success rate of converting transfer users.

On the basis of the methods for mining transfer users provided in fig. 3 and fig. 4, the server can determine the multidimensional vectors of the users in the database according to the association information between the users. The following describes how to determine the multidimensional vector of a user.

Fig. 5 is a schematic diagram of an association relationship between user nodes according to an embodiment of the present disclosure. As shown in fig. 5, each node in the graph represents an enterprise, each edge between nodes represents a shareholding relationship and points from the investing enterprise to the invested enterprise, and the weight of an edge represents the shareholding proportion.

As can be seen in fig. 5, the enterprises corresponding to two nodes may be similar in two ways.

In the first similarity relationship, enterprise u is a neighbor of s1, s2, s3 and s4, so enterprise u has a certain similarity with enterprises s1, s2, s3 and s4; this is called homogeneity.

In the second similarity relationship, u and s6 are each the central node of their respective subgraphs, with the highest degree in that subgraph, so they also have a certain similarity; this may be referred to as structural similarity.

It should be noted that, to capture homogeneity and structural similarity simultaneously and reflect both in the embedding result, both depth-first search (DFS) and breadth-first search (BFS) are needed. To better integrate the advantages of both traversal methods, the node2vec algorithm can be used. The algorithm uses random walks that balance depth-first and breadth-first traversal to generate traversal sequences of nodes. These sequences are then used as context, and the embedding vector representation of each node is obtained with the skip-gram method.

Fig. 6 is a schematic diagram of determining a multidimensional vector of a user according to an embodiment of the present disclosure, as shown in fig. 6, the method includes:

S501, uniquely encoding the name of the user.

It should be understood that the embodiment of the present disclosure does not limit how to encode the name of the user; the encoding may be performed according to a preset encoding order. Illustratively, "Shenzhen Shenhai Zhongzhong Bank limited" may be encoded as s5.

S502, generating weight data between user nodes corresponding to each directed edge in the user relationship network.

In some embodiments, the weight data between the user nodes may include a start node, an end node, and a weight coefficient. Illustratively, as shown in fig. 5, the weight data may be, for example, "u s 10.7.7", "u s 20.35.35", "u s 30.65.30", and the like.

S503, determining a weight correction coefficient between the nodes according to the value of the shortest path distance between the two nodes in the user relationship network.

For example, if the current node is v and the previous node of v is t (there is a directed edge t → v), the weight correction coefficient for a neighboring node x of the current node v may be defined as shown in formula (2):

$$\alpha(t, x) = \begin{cases} 1/p, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ 1/q, & d_{tx} = 2 \end{cases} \qquad (2)$$

where d_tx represents the shortest path distance between node x and node t. There are only three cases: if the walk returns to node t (ignoring the direction of the edge), d_tx = 0; if x and t are directly adjacent, d_tx = 1; otherwise, d_tx = 2.

It is to be understood that p and q may be set to specific values in advance, which is not limited herein.
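The three cases of d_tx and their mapping to a weight correction coefficient can be sketched as follows, using the node2vec convention of 1/p for returning to t, 1 for a node adjacent to t, and 1/q otherwise, which matches the 1/p, 1 and 1/q factors in the Fig. 7 example. The function name and the `neighbors` structure (an undirected adjacency map, since edge direction is ignored here) are hypothetical.

```python
from typing import Dict, Set


def weight_correction(t: str, x: str, neighbors: Dict[str, Set[str]],
                      p: float, q: float) -> float:
    """Return alpha(t, x) for a walk that reached the current node from t.

    neighbors maps each node to the set of nodes it shares an edge with,
    ignoring edge direction.
    """
    if x == t:
        return 1.0 / p  # d_tx = 0: stepping back to the previous node
    if x in neighbors.get(t, set()):
        return 1.0      # d_tx = 1: x is directly adjacent to t
    return 1.0 / q      # d_tx = 2: x moves further away from t
```

A larger p makes returning to t less likely, and a larger q keeps the walk closer to t, which is how the walk trades off breadth-first against depth-first behavior.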

S504, determining the transition probability between the nodes according to the weight data between the user nodes and the weight correction coefficient between the nodes.

Illustratively, the transition probability between nodes can be determined by equation (3).

$$\pi(v, x) = \alpha(t, x) \cdot w_{vx} \qquad (3)$$

where w_vx is the weight of the edge between node v and node x in the user relationship network, and π(v, x) is the transition probability from node v to node x.

Fig. 7 is a schematic diagram of transition probabilities between nodes according to an embodiment of the present disclosure. As shown in fig. 7, s7 is the current node, and s6 is the previous node of s7. For returning to s6 and for the two other neighboring nodes of s7, namely s8 and s5, the transition probabilities are, respectively: π(s7, s6) = 1/p × 60%, π(s7, s8) = 1 × 15%, and π(s7, s5) = 1/q × 25%.

S505, determining the normalized transition probability between nodes according to the transition probability between nodes.

Illustratively, the transition probability π_i is obtained for each adjacent node x_i of node v, and the transition probabilities are normalized so that they sum to 1:

$$P(x_i \mid v) = \frac{\pi_i}{\sum_{j} \pi_j}$$
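Normalization simply divides each transition probability by the sum over all adjacent nodes. The sketch below applies this to the Fig. 7 values; p = 2 and q = 0.5 are assumed example values, since the source leaves p and q open, and the function name is hypothetical.

```python
from typing import Dict


def normalize(pi: Dict[str, float]) -> Dict[str, float]:
    """Scale unnormalized transition probabilities so that they sum to 1."""
    total = sum(pi.values())
    if total <= 0.0:
        raise ValueError("transition probabilities must have a positive sum")
    return {node: value / total for node, value in pi.items()}


# Assumed example values for the return and in-out parameters.
p, q = 2.0, 0.5

# Unnormalized transition probabilities from s7, following the Fig. 7 example.
pi = {
    "s6": (1.0 / p) * 0.60,  # back to the previous node
    "s8": 1.0 * 0.15,
    "s5": (1.0 / q) * 0.25,
}
probs = normalize(pi)
```

With these assumed values the unnormalized probabilities are 0.30, 0.15 and 0.50, so after normalization s5 is the most likely next node and s8 the least likely.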

S506, selecting the target user from the database as a target node in the user relationship network.

Illustratively, a node t may be randomly selected from the user relationship network shown in fig. 5, and an adjacent node v of t is selected as a target node, where t → v has a directed edge.

And S507, sequentially determining the next user node of the current tail node in the associated user node sequence of the target node until the length of the node sequence of the target user reaches the preset sequence length.

The embodiment of the present disclosure does not limit how to determine the next user node, and in some embodiments, the next user node may be determined by determining a normalized transition probability of the current end node and performing weighted sampling on the associated node of the current end node.

The weighted sampling may specifically be alias sampling (the alias method).
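For reference, the alias method preprocesses a discrete distribution into two tables so that each later draw takes constant time. The following is a minimal sketch of one common construction (Vose's variant); the function names are hypothetical, and `probs` is assumed to be an already normalized list of transition probabilities.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Build the acceptance and alias tables for a normalized distribution."""
    n = len(probs)
    scaled = [pr * n for pr in probs]
    accept = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass needed to fill slot s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers equal 1.0 up to rounding error
        accept[i] = 1.0
        alias[i] = i
    return accept, alias


def alias_draw(accept: Sequence[float], alias: Sequence[int],
               rng: random.Random) -> int:
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]
```

Building the tables costs O(n) per node, which pays off when many walks sample repeatedly from the same neighbor distribution.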

For example, the normalized transition probabilities of all neighbor nodes x_i of the target node v may be calculated from π, and a node is sampled by alias sampling to obtain the next node x_1; at this point the sequence is (v, x_1). The above process is then repeated for the last node of the node sequence of the target user to obtain the next user node, giving (v, x_1, x_2). With the required sequence length predefined as m + 1, repeating the above process m times yields the sequence of node v: (v, x_1, x_2, …, x_m).
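The sequence generation can be sketched as a biased random walk. In this sketch the graph is an undirected weighted adjacency map (a simplifying assumption), the names are hypothetical, and `random.Random.choices` is used as a stand-in for alias sampling; it normalizes the weights internally, so the unnormalized transition probabilities can be passed directly.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # graph[a][b] = weight of edge a-b


def biased_walk(graph: Graph, t: str, v: str, m: int,
                p: float, q: float, rng: random.Random) -> List[str]:
    """Walk that has just moved from t to v; returns (v, x_1, ..., x_m)."""
    walk = [v]
    prev, cur = t, v
    for _ in range(m):
        nbrs = list(graph[cur])
        if not nbrs:
            break  # dead end: return a shorter sequence
        weights = []
        for x in nbrs:
            if x == prev:
                alpha = 1.0 / p   # return to the previous node
            elif x in graph[prev]:
                alpha = 1.0       # x is adjacent to the previous node
            else:
                alpha = 1.0 / q   # x moves away from the previous node
            weights.append(alpha * graph[cur][x])
        # Stand-in for alias sampling over the normalized probabilities.
        nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk
```

Calling this function once per selected starting node and stacking the results gives one row of node identifiers per walk, which is the kind of input a word2vec-style model expects.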

And S508, generating a related node array of the target node according to the related user node sequence of the target node.

Illustratively, by predefining the required sequence number M, M neighboring nodes of the node t may be selected, and respective user node sequences of the M neighboring nodes of the node t may be calculated accordingly. By combining the user node sequences of the M adjacent nodes of t, the associated node array of the target node can be obtained as follows:

v1，x11，x12，…x1m

v2，x21，x22，…x2m

…

vM，xM1，xM2，…xMm

S509, determining the multidimensional vector of the user in the database according to the associated node array of the target node.

Illustratively, the associated node array of the target node may be input into word2vec, and an n-dimensional (n can be set manually) embedding vector representation of each node is obtained, the format is as follows:

“id1 0.13716 0.05973 -0.05692 0.34796…

id2 0.55362 -0.24561 0.67832 0.89571…

…”

The embodiment of the disclosure provides a better way of applying relationship data among enterprises (such as equity shareholding relationships and supply chain relationships), which improves the accuracy of determining transfer users and largely alleviates the difficulty the banking industry currently faces in acquiring new customers and retaining existing ones.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Fig. 8 is a schematic structural diagram of an apparatus for determining type information of a user according to an embodiment of the present disclosure. The determination means of the type information of the user may be implemented by software, hardware, or a combination of both to execute the determination method of the type information of the user of the above embodiment. As shown in fig. 8, the apparatus 600 for determining the type information of the user includes: an acquisition module 601, a prediction module 602, and a determination module 603.

The obtaining module 601 is configured to obtain feature data of a user.

The prediction module 602 is configured to input feature data of a user into a first prediction model and a second prediction model respectively, and obtain a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, where the first prediction model is used to predict a probability that target data of the user is generated under a preset execution condition, and the second prediction model is used to predict a probability that target data of the user is generated without applying the preset execution condition.

A determining module 603, configured to determine type information of the user according to the first conversion rate and the second conversion rate.

The device for determining the type information of the user provided in the embodiment of the present disclosure may perform the actions of the method for determining the type information of the user in the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 9, the electronic device may include at least one processor 701 and a memory 702. Fig. 9 shows an electronic device with one processor as an example.

The memory 702 is configured to store programs. In particular, a program may include program code, and the program code includes computer operating instructions.

The memory 702 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.

The processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the method for determining the type information of the user.

The processor 701 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.

Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are implemented independently, the communication interface, the memory 702 and the processor 701 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, etc., but do not represent only one bus or type of bus.

Alternatively, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are integrated into a chip, the communication interface, the memory 702 and the processor 701 may complete communication through an internal interface.

The embodiment of the disclosure also provides a chip, which comprises a processor and an interface. Wherein the interface is used for inputting and outputting data or instructions processed by the processor. The processor is configured to perform the methods provided in the above method embodiments.

The present disclosure also provides a computer-readable storage medium, which may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk. Specifically, the computer-readable storage medium stores program information, and the program information is used to implement the method for determining the type information of the user.

The disclosed embodiments also provide a program that, when executed by a processor, is configured to perform the method for determining type information of a user provided by the above method embodiments.

The embodiment of the present disclosure further provides a program product, such as a computer-readable storage medium, having stored therein instructions, which when run on a computer, cause the computer to execute the method for determining the type information of the user provided by the above method embodiment.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the disclosure are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims

1. A method for determining type information of a user, the method comprising:

acquiring characteristic data of a user;

2. The method of claim 1, wherein determining the type information of the user according to the first conversion rate and the second conversion rate comprises:

taking the difference value of the first conversion rate and the second conversion rate as a contribution value of the preset execution condition to the probability of generating the target data of the user;

3. The method according to claim 1, wherein the first prediction model is generated after training through a first sample set, and the first sample set includes feature data of a historical user and result data of generating target data of the historical user under the preset execution condition;

4. The method according to any of claims 1-3, wherein after determining the type information of the user based on the first conversion rate and the second conversion rate, the method further comprises:

5. The method of claim 4, wherein querying the database for a transition user of the target type of user based on the multidimensional vector of users comprises:

6. The method of claim 4, wherein querying the database for a transition user of the target type of user based on the multidimensional vector of users comprises:

training a similar population extension model using the third sample;

7. The method of claim 4, wherein prior to said querying a database for a transferring user of the target type of user based on the multidimensional vector of users, the method further comprises:

8. The method of claim 7, wherein determining the multidimensional vector of users in the database according to the association information between the users comprises:

9. The method of claim 8, wherein determining the next user node of the current end node in the sequence of associated user nodes of the target node comprises:

10. The method of claim 9, wherein the determining the normalized transition probability for the current end node comprises:

11. The method according to claim 10, wherein the determining the transition probability between the current end node and any associated user node according to the association information between the users comprises:

12. The method of claim 11, wherein the shortest path distance value includes a first value, a second value, and a third value; the value of the shortest path distance and the weight correction coefficient have a mapping relation;

13. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method according to any of claims 1-12.

14. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-12.