WO2023123933A1 - User type information determination method and device, and storage medium - Google Patents


Info

Publication number
WO2023123933A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
node
target
users
conversion rate
Application number
PCT/CN2022/101734
Other languages
French (fr)
Chinese (zh)
Inventor
张海川 (Zhang Haichuan)
Original Assignee
深圳前海微众银行股份有限公司 (Shenzhen Qianhai WeBank Co., Ltd.)
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023123933A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present application relates to the field of financial technology (fintech), and in particular to a method, device and storage medium for determining user type information.
  • In an existing approach, a customer's basic information (identity, transaction, asset, credit, purchase data, etc.) is used as features input into a machine learning model, which outputs the customer's predicted cumulative loan-quota utilization rate. If this rate exceeds a preset threshold, the customer is regarded as a potential customer, and users are classified into types accordingly.
  • However, existing user type classification methods cannot reflect user types under different marketing conditions, so the resulting user types are of limited validity.
  • The present application provides a method, device and storage medium for determining user type information, to solve the problem that user types classified by the prior art have low validity.
  • In a first aspect, an embodiment of the present application provides a method for determining user type information, the method including:
  • obtaining characteristic data of a user; inputting the user's characteristic data into a first prediction model and a second prediction model respectively, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, where the first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition, and the second prediction model is used to predict the probability that the user's target data is generated without applying the preset execution condition; and determining type information of the user according to the first conversion rate and the second conversion rate.
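  • The determining of the type information of the user according to the first conversion rate and the second conversion rate includes:
  • taking the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the user's target data, and determining the type information of the user according to the first conversion rate, the second conversion rate and the contribution value.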
  • The first prediction model is trained on a first sample set containing characteristic data of historical users and result data indicating whether the historical users' target data was generated under the preset execution condition;
  • the second prediction model is trained on a second sample set containing characteristic data of historical users and result data indicating whether the historical users' target data was generated without applying the preset execution condition.
  • The method further includes:
  • querying a database, according to the users' multidimensional vectors, for transfer users of a target type of user, where a multidimensional vector characterizes the association relationships between users in multiple dimensions, and the transfer users are users through whom the target-type population can be expanded in magnitude under the preset execution condition.
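  • In one optional manner, the querying of the database for transfer users of the target type of user according to the users' multidimensional vectors includes: determining the cosine similarity between the multidimensional vector of a target-type user and that of a user to be queried in the database, and determining, according to the cosine similarity, whether the user to be queried is a transfer user.
  • In another optional manner, the querying of the database for transfer users of the target type of user according to the users' multidimensional vectors includes: training a binary classification model with the multidimensional vectors of sample users as features, using the model to predict the population conversion probability of a user to be queried, and determining, according to the population conversion probability, whether the user to be queried is a transfer user.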
  • Before the querying of the database for transfer users of the target type of user according to the users' multidimensional vectors, the method further includes:
  • determining the multidimensional vectors of the users in the database according to the association information between users.
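  • The determining of the multidimensional vectors of the users in the database according to the association information between users includes: selecting a target user from the database as a target node in a user relationship network; sequentially determining the next user node of the current end node in the target node's associated user node sequence until the sequence reaches a preset length; generating an associated node array of the target node from that sequence; and determining the multidimensional vectors of the users in the database according to the associated node array.
  • The determining of the next user node of the current end node in the target node's associated user node sequence includes: determining the normalized transition probability of the current end node, and performing weighted sampling on the associated user nodes of the current end node accordingly.
  • The determining of the normalized transition probability of the current end node includes: determining the transition probability between the current end node and each associated user node according to the association information between users, and normalizing these transition probabilities.
  • The determining of the transition probability between the current end node and any associated user node according to the association information between users includes: generating weight data between the current end node and the associated user node according to the association information between users and the user identifiers; determining a weight correction coefficient according to the value of the shortest path distance between the current end node and the associated user node in the user relationship network; and determining the transition probability between them according to the weight data and the weight correction coefficient.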
  • The value of the shortest path distance includes a first value, a second value and a third value, and there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;
  • if the associated user node is the previous node of the current end node, the value of the shortest path distance is the first value; if the associated user node and the current end node are adjacent nodes, the value of the shortest path distance is the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the value of the shortest path distance is the third value.
  • An embodiment of the present application provides an apparatus for determining user type information, including:
  • an obtaining module, configured to obtain characteristic data of a user;
  • a prediction module, configured to input the user's characteristic data into a first prediction model and a second prediction model respectively, and obtain a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, where the first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition, and the second prediction model is used to predict the probability that the user's target data is generated without applying the preset execution condition; and
  • a determining module, configured to determine type information of the user according to the first conversion rate and the second conversion rate.
  • An embodiment of the present application provides an electronic device, including a processor and a memory;
  • the memory stores a computer program;
  • the computer program is adapted to be loaded by the processor to execute the method for determining user type information according to the first aspect or any optional manner thereof.
  • An embodiment of the present application provides a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to execute the method for determining user type information according to the first aspect or any optional manner thereof.
  • In the method, device and storage medium for determining user type information provided in the embodiments of the present application, the user's characteristic data is first obtained. The characteristic data is then input into the first prediction model and the second prediction model respectively, to obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model.
  • The first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition,
  • and the second prediction model is used to predict the probability that the user's target data is generated without applying the preset execution condition.
  • Finally, the type information of the user is determined according to the first conversion rate and the second conversion rate. In this manner, user types can be classified effectively with respect to the preset execution condition.
  • FIG. 1 is a schematic diagram of a scenario of an operating environment provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for determining user type information provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a method for mining and transferring users provided in an embodiment of the present application
  • FIG. 4 is a schematic flowchart of another method for mining and transferring users provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of an association relationship between user nodes provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of determining a user's multidimensional vector provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a transition probability between nodes provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus for determining user type information provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Inclusive finance is currently one of the important keynotes in the transformation and development of China's financial industry, and many commercial banks are creating new marketing models for online customer acquisition and operation. How to accurately locate a bank's own potential customers among a large population, recommend the best-matching products to them, and improve the engagement (stickiness) of existing customers has become a key consideration for many banks.
  • In an existing approach, samples of customers' cumulative loan-quota utilization rates are obtained, and each customer's basic information (identity, transaction, asset, credit, purchase data, etc.) is used as features input into a machine learning model, which outputs the customer's predicted cumulative loan-quota utilization rate. If this rate exceeds a preset threshold, the customer is regarded as a potential customer, and users are classified into types accordingly.
  • However, existing user type classification methods cannot reflect user types under different marketing conditions, so the resulting user types are of limited validity.
  • To address this, embodiments of the present disclosure provide a method, device and storage medium for determining user type information. They predict a first conversion rate, at which the user's target data is generated under a preset execution condition, and a second conversion rate, at which the user's target data is generated without the preset execution condition, and then determine the user's type information from the two conversion rates, so that user types can be classified effectively with respect to the preset execution condition.
  • FIG. 1 is a schematic diagram of a scenario of an operating environment provided by an embodiment of the present disclosure. As shown in FIG. 1 , it shows subjects who want to obtain user type information, such as enterprises 101, banking institutions 102, etc., and these subjects can request the system platform 103 to query user type information as needed.
  • Alternatively, the system platform 103 may initiate queries automatically; further examples are not given here.
  • the query request from each subject is provided to the system platform 103 through the network, and the system platform 103 is used to perform the task of determining the type information of the user.
  • Besides a query module for querying user type information, the system platform 103 may also include a marketing module and a mining module. The marketing module provides different marketing schemes for different types of users, and the mining module is used to mine potential transfer users of a target type of user.
  • When querying transfer users, the system platform 103 may also use a database 104 that contains the users to be queried, for example an enterprise information base. It should be understood that the enterprise information base in this example environment is only exemplary, and other types of information bases all fall within the scope of protection of the present disclosure.
  • the above-mentioned subject that acquires user type information can use various devices to access the network, such as personal computers, servers, tablets, mobile phones, PDAs, notebooks or any other computing devices with networking capabilities.
  • the system platform 103 can be implemented by using a server or server group with stronger processing capability and higher security.
  • The networks used between them can include various types of wired and wireless networks, such as but not limited to: the Internet, local area networks, WIFI, WLAN, cellular communication networks (GPRS, CDMA, 2G/3G/4G/5G cellular networks), satellite communication networks, etc.
  • the above-mentioned method for determining user type information can be realized by the device for determining user type information provided in the embodiments of the present disclosure, and the device for determining user type information can be part or all of a certain device, such as a server or a server chip.
  • FIG. 2 is a schematic flowchart of a method for determining user type information provided by an embodiment of the present disclosure.
  • This embodiment relates to a process of how a server determines user type information.
  • The present disclosure separately predicts a first conversion rate, at which the user's target data is generated under the preset execution condition, and a second conversion rate, at which the user's target data is generated without the preset execution condition,
  • and determines the type information of the user according to the first conversion rate and the second conversion rate. The method for determining user type information provided in the present disclosure can therefore classify user types effectively while accounting for the influence of the preset execution condition.
  • the method includes:
  • S201. Obtain the user's characteristic data.
  • In this step, the server may obtain the characteristic data of the user.
  • the embodiment of the present disclosure does not limit how to obtain the characteristic data of the user.
  • For example, the unique identifier (ID) of the user may be obtained, and characteristic information related to the user may then be determined according to that identifier.
  • the users involved in the embodiments of the present disclosure may be enterprise users or individual users, which is not limited in the embodiments of the present disclosure.
  • the embodiments of the present disclosure do not limit the characteristic data of the user, which may be specifically determined according to actual conditions.
  • For example, the characteristic data of the user may include, but is not limited to, financial data, industrial and commercial data, regional data, equity data and abnormal business data.
  • financial data are used to represent total assets, owner's equity, investment income, operating income, non-operating income, total profit, main business income, net profit, total liabilities, total tax payment, operating costs, sales expenses, asset impairment loss etc.
  • Industrial and commercial data are used to represent company type, year of establishment, industry, enterprise status, registered capital, number of industrial and commercial changes, etc.
  • Regional data represents the province, city, etc.
  • Equity data is used to represent the number of direct shareholders, the number of direct shareholders of natural persons, the shareholding ratio of direct shareholders of natural persons, the number of direct shareholders of non-natural persons, the shareholding ratio of direct shareholders of non-natural persons, etc.
  • Abnormal business data represents the numbers of administrative penalties, abnormal business operation records, tax violations, judicial cases and dishonest-debtor enforcement records.
  • S202. Input the user's characteristic data into the first prediction model and the second prediction model respectively, and obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model.
  • The first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition,
  • the second prediction model is used to predict the probability of generating the user's target data without applying the preset execution conditions.
  • the characteristic data of the user may be input into the first prediction model and the second prediction model respectively, so as to obtain the first conversion rate and the second conversion rate.
  • the embodiments of the present disclosure do not limit the preset execution conditions and target data.
  • For example, the preset execution condition may be marketing to the user, and correspondingly, the target data may be transaction data generated by the user.
  • In this case, the first conversion rate and the second conversion rate reflect the probability that the user converts into transaction behavior when marketed and when not marketed, respectively.
  • the following describes how to construct the first prediction model and the second prediction model.
  • The first prediction model is generated by training on the first sample set, which contains the characteristic data of historical users and result data indicating whether the historical users' target data was generated under the preset execution condition.
  • the second prediction model is generated after being trained through the second sample set, which contains characteristic data of historical users and result data of generating target data of historical users without applying preset execution conditions.
  • For example, the server may divide all customers into two groups according to whether they were marketed to (treated): the marketed group (treated) and the non-marketed group (control). For each group, whether the customer converted (responded) is used as the target label, and the data is divided into a training set and a validation set.
  • For example, the marketed group can be taken out separately, with converted customers labeled 1 and non-converted customers labeled 0.
  • The first prediction model and the second prediction model may be binary classification models.
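  • The two-model construction above (often called a two-model or T-learner uplift setup) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. Plain logistic regression trained by batch gradient descent stands in for whatever binary classifier is actually used. The record layout `(features, treated, converted)` and the names `train_logistic`, `predict_proba` and `fit_two_models` are hypothetical.

```python
import math


def predict_proba(model, x):
    """Return the predicted conversion probability for feature vector x."""
    w, b = model
    z = b + sum(wk * xk for wk, xk in zip(w, x))
    # Numerically stable logistic (sigmoid) function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Fit logistic regression with batch gradient descent on 0/1 labels."""
    n_features = len(xs[0])
    w, b = [0.0] * n_features, 0.0
    m = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba((w, b), x) - y  # d(log-loss)/dz
            for k in range(n_features):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * gk / m for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def fit_two_models(records):
    """Split historical users by treatment and train one classifier per group.

    records: iterable of (features, treated, converted), treated/converted in {0, 1}.
    Returns (first_model, second_model) for the marketed and non-marketed groups.
    """
    treated = [(x, y) for x, t, y in records if t == 1]
    control = [(x, y) for x, t, y in records if t == 0]
    first_model = train_logistic([x for x, _ in treated], [y for _, y in treated])
    second_model = train_logistic([x for x, _ in control], [y for _, y in control])
    return first_model, second_model


if __name__ == "__main__":
    # Toy data: marketed users with feature 1.0 convert; non-marketed users never do.
    history = [([1.0], 1, 1), ([0.0], 1, 0)] * 10 + [([1.0], 0, 0), ([0.0], 0, 0)] * 10
    first, second = fit_two_models(history)
    p_treated = predict_proba(first, [1.0])
    p_control = predict_proba(second, [1.0])
    print(round(p_treated, 3), round(p_control, 3), round(p_treated - p_control, 3))
```

  • Keeping the two groups in separate models means each model only sees users under one condition. The difference of their outputs can then be read directly as the contribution of the preset execution condition.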
  • S203. Determine the user type information according to the first conversion rate and the second conversion rate.
  • the type information of the user may be determined according to the first conversion rate and the second conversion rate.
  • For example, the server may take the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the user's target data,
  • and then determine the type information of the user according to the first conversion rate, the second conversion rate and the contribution value.
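  • Using the first prediction model and the second prediction model, the user's conversion rate p can be predicted in both the marketing and non-marketing scenarios. This gives the first conversion rate p_treated in the marketing scenario and the second conversion rate p_control in the non-marketing scenario, and the contribution value is lift = p_treated - p_control.
  • If the first conversion rate is greater than a first threshold, the second conversion rate is less than the first threshold, and the contribution value is greater than a second threshold, the server may determine that the user belongs to the first user type.
  • If both conversion rates are greater than the first threshold and the contribution value lies between a third threshold and the second threshold, the server may determine that the user belongs to the second user type.
  • If both conversion rates are less than the first threshold and the contribution value lies between the third threshold and the second threshold, the server may determine that the user belongs to the third user type.
  • If the first conversion rate is less than the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is less than the third threshold, the server may determine that the user belongs to the fourth user type.
  • The absolute value of the second threshold is equal to the absolute value of the third threshold.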
  • the embodiment of the present disclosure does not limit the value of the first threshold, for example, the first threshold may be 0.5.
  • the embodiment of the present disclosure does not limit the value of the second threshold.
  • For example, the second threshold thres is the threshold at which lift changes significantly. It may be a positive decimal less than 1 and close to 0, and may be determined from a quantile of the overall statistical distribution. Correspondingly, the third threshold may be -thres.
  • For example, if p_treated > 0.5, p_control < 0.5 and lift > thres, the user is a first-type user. If p_treated > 0.5, p_control > 0.5 and -thres < lift < thres, the user is a second-type user. If p_treated < 0.5, p_control < 0.5 and -thres < lift < thres, the user is a third-type user. If p_treated < 0.5, p_control > 0.5 and lift < -thres, the user is a fourth-type user.
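  • The four threshold rules can be written as a small classifier, sketched below. The first threshold defaults to 0.5 as in the example above. In the usage lines, `thres` is taken as a quantile of the absolute lift values over the whole population. Using the absolute lift and the quantile level (0.5 here) are assumptions, because the text only says a quantile of the overall distribution is used. The function names are hypothetical. Users that satisfy none of the four rules return `None`, since the text does not assign them a type.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    s = sorted(values)
    pos = (len(s) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def classify_user(p_treated, p_control, thres, first_threshold=0.5):
    """Return user type 1-4 according to the four rules, or None if no rule applies."""
    lift = p_treated - p_control  # contribution value of the preset execution condition
    if p_treated > first_threshold and p_control < first_threshold and lift > thres:
        return 1  # marketing-sensitive group
    if p_treated > first_threshold and p_control > first_threshold and -thres < lift < thres:
        return 2  # natural conversion group
    if p_treated < first_threshold and p_control < first_threshold and -thres < lift < thres:
        return 3  # indifferent group
    if p_treated < first_threshold and p_control > first_threshold and lift < -thres:
        return 4  # reactionary group
    return None


if __name__ == "__main__":
    scores = [(0.8, 0.2), (0.9, 0.85), (0.2, 0.15), (0.3, 0.7)]
    # Hypothetical choice: thres is the median of |lift| over the population.
    thres = quantile([abs(pt - pc) for pt, pc in scores], 0.5)
    print([classify_user(pt, pc, thres) for pt, pc in scores])
```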
  • The probability that a first-type user generates target data increases when the preset execution condition is applied. The probability that a second-type user generates target data is higher than the target upper limit whether or not the preset execution condition is applied;
  • the probability that a third-type user generates target data is lower than the target lower limit whether or not the preset execution condition is applied; and the probability that a fourth-type user generates target data decreases when the preset execution condition is applied.
  • The four types of users may correspond, respectively, to the marketing-sensitive group, the natural conversion group, the indifferent group and the reactionary (marketing-averse) group.
  • the proportion of marketing-sensitive groups is relatively low, but they are easily affected by marketing activities and have active behaviors. For this part of the users, we can further conduct stratified operations according to whether they are sensitive to prices, discounts, and profit concessions.
  • the indifferent group refers to users who have been lost and cannot be recovered through marketing, or users who rarely read marketing messages, and there is no need to continue to invest more marketing resources.
  • The reactionary group is spontaneously active but dislikes marketing interruptions. To avoid disturbing these users, no marketing resources need to be invested in them.
  • marketing-sensitive groups can be further divided, so that different marketing methods can be adopted based on the types of further divisions.
  • marketing sensitive groups can be further divided into two types.
  • the first type is price-sensitive. When there are marketing activities related to subsidies and discounts, there will be corresponding active behaviors.
  • the second type is the price-insensitive type, which is less affected by whether there is a subsidy or not. For this type of users, only regular marketing reminders are required. After the price-sensitive and price-insensitive samples are constructed, the classic classification method can be used to classify users, which will not be repeated here.
  • For example, one criterion for constructing these samples may be whether the user's average loan interest rate over the past 3 months falls within the lowest N% of the group.
  • The method for determining user type information provided by the embodiments of the present disclosure classifies users more finely, which helps avoid wasting marketing resources. Combined with the marketing response model, the price sensitivity model and other models, both existing and new users can be divided in detail, which helps the bank allocate marketing resources to the marketing-sensitive users who need them most.
  • In the method for determining user type information provided by the embodiments of the present disclosure, the user's characteristic data is first obtained. The characteristic data is then input into the first prediction model and the second prediction model respectively, to obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model.
  • The first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition,
  • and the second prediction model is used to predict the probability that the user's target data is generated without applying the preset execution condition.
  • Finally, the type information of the user is determined according to the first conversion rate and the second conversion rate. In this manner, user types can be classified effectively with respect to the preset execution condition.
  • FIG. 3 is a schematic flowchart of a method for mining and transferring users provided by an embodiment of the present disclosure. As shown in FIG. 3 , the method includes:
  • the target type of users may be the above-mentioned second type of users, that is, a natural conversion group.
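  • If the multidimensional (embedding) vectors of two users $u_i$ and $u_j$ are $(x_{i1}, x_{i2}, \ldots, x_{in})$ and $(x_{j1}, x_{j2}, \ldots, x_{jn})$ respectively,
  • the cosine similarity $\cos(u_i,u_j)$ between $u_i$ and $u_j$ can be determined by formula (1):
    $$\cos(u_i,u_j)=\frac{\sum_{k=1}^{n}x_{ik}x_{jk}}{\sqrt{\sum_{k=1}^{n}x_{ik}^{2}}\,\sqrt{\sum_{k=1}^{n}x_{jk}^{2}}} \tag{1}$$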
  • the server may determine the multidimensional vector of the user in the database according to the association information between users.
  • Specifically, the server may first select a target user from the database as a target node in the user relationship network. Next, the server may sequentially determine the next user node of the current end node in the target node's associated user node sequence, until the length of the target user's node sequence reaches a preset sequence length. The server may then generate an associated node array of the target node from this associated user node sequence. Finally, the server may determine the multidimensional vectors of the users in the database according to the associated node array of the target node.
  • To determine the normalized transition probability of the current end node, the transition probability between the current end node and each associated user node can first be determined according to the association information between users. These transition probabilities are then normalized to obtain the normalized transition probability of the current end node.
  • To determine these transition probabilities, the server may first generate weight data between the current end node and each associated user node according to the association information between users and the user identifiers. The server then determines the weight correction coefficient between the current end node and each associated user node according to the value of the shortest path distance between them in the user relationship network. Finally, the server determines the transition probability between the current end node and each associated user node according to the corresponding weight data and weight correction coefficient.
  • The value of the shortest path distance includes the first value, the second value and the third value, and there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;
  • if the associated user node is the previous node of the current end node, the value of the shortest path distance is the first value; if the associated user node and the current end node are adjacent nodes, the value of the shortest path distance is the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, the value of the shortest path distance is the third value.
  • A transfer user can be understood as a user who can be converted through marketing.
  • the embodiment of the present disclosure does not limit how to determine whether the user to be queried is a transfer user according to the cosine similarity.
  • For example, the greater the cosine similarity between two users (equivalently, the smaller their cosine distance, 1 - cos), the more similar they are.
  • Marketing target customers can therefore be found by searching for the users with the smallest cosine distance, that is, the highest cosine similarity, to the target type of users.
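  • Formula (1) and the nearest-neighbour search above can be sketched in Python as follows. The text does not say how similarities to several target-type users are combined, or what cut-off decides that a candidate is a transfer user. This sketch assumes the best (maximum) similarity to any target-type user is used and that a hypothetical `min_similarity` cut-off is applied. The function names are also hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Formula (1): dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def rank_transfer_candidates(seed_vectors, candidates, min_similarity):
    """Return candidate ids whose best similarity to any target-type (seed) user
    is at least min_similarity, most similar first."""
    scored = []
    for user_id, vec in candidates.items():
        best = max(cosine_similarity(vec, seed) for seed in seed_vectors)
        if best >= min_similarity:
            scored.append((best, user_id))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [user_id for _, user_id in scored]


if __name__ == "__main__":
    seeds = [[1.0, 0.0]]  # embeddings of target-type users
    pool = {"a": [2.0, 0.1], "b": [0.0, 1.0], "c": [1.0, 1.0]}  # users to be queried
    print(rank_transfer_candidates(seeds, pool, 0.5))
```

  • Because every candidate is compared with every seed, the cost grows with the product of the two set sizes. This is consistent with the note below that this approach suits a small number of target-type users.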
  • The method for mining transfer users in FIG. 3 may be applicable when the number of target-type users is small, for example 200 or fewer.
  • When the number of target-type users is larger, the method shown in FIG. 4 may be used.
  • FIG. 4 is a schematic flowchart of another method for mining and transferring users provided by an embodiment of the present disclosure. As shown in FIG. 4 , the method for mining and transferring users includes:
  • Model training can be performed on the sample users in the training set and validation set, using their equity-holding embedding vectors as features (for example, an XGBoost/LR feature combination). The resulting binary classification model lookalike.model can then be saved.
  • A large number of users who also have embedding vector features can be used as users to be queried. The trained lookalike.model is then used to predict, for each of them, a probability score of belonging to the converted population.
  • The embodiment of the present disclosure does not limit how to determine, according to the population conversion probability, whether a user to be queried is a transfer user.
  • For example, the population conversion probability score may be compared with a threshold.
  • If score ≥ thres, the user to be queried can be determined to be a transfer user; if score < thres, the user to be queried can be determined not to be a transfer user.
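  • The lookalike step can be sketched as a binary classifier over embedding vectors followed by a score threshold. This is only an illustration. Logistic regression trained by gradient descent stands in for the XGBoost/LR model mentioned above, and no model file such as lookalike.model is saved. The toy embeddings, the function names and the `>=` boundary convention are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lookalike(embeddings, labels, lr=1.0, epochs=1000):
    """Fit logistic regression on embedding features (label 1 = converted seed user)."""
    dim = len(embeddings[0])
    w, b = [0.0] * dim, 0.0
    m = len(embeddings)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def conversion_score(model, x):
    """Probability score that a user belongs to the converted population."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def select_transfer_users(model, candidates, thres):
    """Keep candidates whose score >= thres (boundary convention is an assumption)."""
    return [uid for uid, x in candidates.items() if conversion_score(model, x) >= thres]


if __name__ == "__main__":
    train_x = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.2, 0.9]]
    train_y = [1, 1, 0, 0]
    model = train_lookalike(train_x, train_y)
    print(select_transfer_users(model, {"x": [0.9, 0.1], "y": [0.1, 0.9]}, 0.5))
```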
  • As described above, the server can determine the multidimensional vectors of users in the database according to the association information between users. The following describes how a user's multidimensional vector is determined.
  • FIG. 5 is a schematic diagram of an association relationship between user nodes provided by an embodiment of the present disclosure. As shown in FIG. 5, each node represents a company. Each directed edge represents a shareholding relationship and points from the investing company to the company in which it holds shares, and the edge weight represents the shareholding ratio.
  • Two kinds of similarity can exist between the enterprises corresponding to two nodes.
  • The first kind: since enterprise u is a neighbor of s1, s2, s3 and s4, enterprise u can be considered to have a certain similarity to enterprises s1, s2, s3 and s4. This is called homophily (homogeneity).
  • the second kind of similarity relationship, u and s6 are both central nodes of the corresponding subgraph, and have the largest degree in the corresponding subgraph, and they also have a certain similarity, which can be called structural similarity.
  • The node2vec algorithm uses biased random walks that can balance depth-first and breadth-first traversal to generate node sequences (traversal queues). Each node sequence is then treated as context, and the skip-gram method is used to obtain an embedding vector representation of each node.
  • FIG. 6 is a schematic diagram of determining a user's multidimensional vector provided by an embodiment of the present disclosure. As shown in FIG. 6, the method includes:
  • this embodiment of the present disclosure does not limit how to encode the user's name, and the encoding may be performed according to a preset encoding sequence.
  • for example, "Shenzhen Qianhai WeBank Co., Ltd." may be coded as s5.
  • the weight data between user nodes may include a start node, an end node and a weight coefficient.
  • the weight data may be, for example, "u s1 0.7", "u s2 0.35", "u s3 0.65" and so on.
  • the shortest path distance takes one of three values: a first value, a second value and a third value; there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;
  • if the associated user node is the previous node of the current end node, the shortest path distance takes the first value; if the associated user node and the current end node are adjacent nodes, it takes the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, it takes the third value.
  • the current node is v
  • the previous node of v is t (there is a directed edge t → v)
  • the weight correction coefficient can be defined as shown in formula (2):
  • the transition probability between nodes can be determined by formula (3).
  • w_vx is the weight of the edge between nodes v and x in the user relationship network
  • π(v, x) is the transition probability from node v to node x.
  • FIG. 7 is a schematic diagram of a transition probability between nodes provided by an embodiment of the present disclosure.
  • s7 is the current node
  • s6 is the previous node of s7.
  • S505. Determine the normalized probability between nodes according to the transition probability between nodes.
  • after the transition probabilities π_i are obtained, they are normalized
  • a node t can be randomly selected from the user relationship network shown in FIG. 5, and an adjacent node v of t can be selected as the target node, where there is a directed edge t → v.
  • the embodiment of the present disclosure does not limit how to determine the next user node.
  • the next user node can be determined by determining the normalized transition probability of the current end node and performing weighted sampling on the associated nodes of the current end node.
  • the weighted sampling may specifically be alias sampling (the alias method).
  • the length of the sequence to be obtained is predefined as m+1, and the above process is repeated m times to obtain the sequence result of node v: (v, x_1, x_2, ..., x_m).
  • M adjacent nodes of node t may be selected, and respective user node sequences of the M adjacent nodes of t may be calculated accordingly.
  • the associated node array of the target node can be obtained as follows:
  • S509. Determine the multidimensional vectors of the users in the database according to the associated node array of the target node.
  • the associated node array of the target node can be input into word2vec to obtain the n-dimensional (n can be manually set) embedding vector representation of each node, the format is as follows:
  • the embodiment of the present disclosure proposes a better way of using relational data between enterprises (such as equity holding relationships and supply chain relationships). This improves the accuracy of identifying transfer users and can largely alleviate two current difficulties in the banking industry: acquiring new customers and activating existing customers.
  • FIG. 8 is a schematic structural diagram of an apparatus for determining user type information provided by an embodiment of the present disclosure.
  • the device for determining the type information of the user may be implemented by software, hardware or a combination of the two, so as to execute the method for determining the type information of the user in the foregoing embodiments.
  • the device 600 for determining type information of the user includes: an acquisition module 601 , a prediction module 602 and a determination module 603 .
  • An acquisition module 601, configured to acquire user characteristic data.
  • the prediction module 602 is configured to input the user's feature data into the first prediction model and the second prediction model respectively, and to obtain the first conversion rate output by the first prediction model and the second conversion rate output by the second prediction model.
  • the first prediction model is used for predicting the probability of generating the user's target data under preset execution conditions
  • the second prediction model is used for predicting the probability of generating the user's target data without applying the preset execution conditions.
  • a determining module 603, configured to determine user type information according to the first conversion rate and the second conversion rate.
  • the device for determining user type information provided in the embodiments of the present disclosure can perform the actions of the method for determining user type information in the above embodiments, and its implementation principle and technical effect are similar, and will not be repeated here.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 9 , the electronic device may include: at least one processor 701 and a memory 702 . FIG. 9 shows an electronic device with a processor as an example.
  • the memory 702 is used to store programs.
  • the program may include program code, and the program code includes computer operation instructions.
  • the memory 702 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device.
  • the processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to implement the above method for determining user type information;
  • the processor 701 may be a central processing unit (CPU), an application specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • the communication interface, memory 702 and processor 701 may be connected to each other through a bus to complete mutual communication.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be classified into an address bus, a data bus, a control bus and so on; showing a single bus does not mean that there is only one bus or one type of bus.
  • the communication interface, memory 702 and processor 701 may complete communication through an internal interface.
  • the embodiment of the present disclosure also provides a chip, including a processor and an interface.
  • the interface is used to input and output data or instructions processed by the processor.
  • the processor is configured to execute the methods provided in the above method embodiments.
  • the present disclosure also provides a computer-readable storage medium, which may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • the computer-readable storage medium stores program information, and the program information is used to perform the above method for determining user type information.
  • An embodiment of the present disclosure further provides a program, which is used to execute the method for determining user type information provided by the above method embodiments when executed by a processor.
  • An embodiment of the present disclosure also provides a program product, such as a computer-readable storage medium, in which instructions are stored; when the instructions are run on a computer, the computer executes the method for determining user type information provided by the above method embodiments.
  • a computer program product includes one or more computer instructions.
  • computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc. integrated with one or more available media.
  • Available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)).

Abstract

The present application provides a user type information determination method and device, and a storage medium, and belongs to the field of Techfin. The method comprises: acquiring feature data of a user; respectively inputting the feature data of the user into a first prediction model and a second prediction model, acquiring a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, the first prediction model being used for predicting a probability that target data of the user is generated under a preset execution condition, and the second prediction model being used for predicting a probability that target data of the user is generated without the preset execution condition; and determining type information of the user according to the first conversion rate and the second conversion rate. By means of this mode, types of users may be effectively classified under the preset execution condition.

Description

Method, device and storage medium for determining user type information
This application claims priority to Chinese patent application No. 202111655734.X, entitled "Method, device and storage medium for determining user type information" and filed with the China Patent Office on December 30, 2021. The entire content of that application is incorporated herein by reference.
Technical Field
The present application relates to the field of technology finance (Techfin), and in particular to a method, device and storage medium for determining user type information.
Background
As computer technology develops, more and more technologies are applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). Technology for determining user type information is no exception. However, the security and real-time requirements of the financial industry also place higher demands on that technology.
In the related art, samples of customers' cumulative credit-line utilization rates are collected. The customers' basic information (identity, transactions, assets, credit granted, purchases, etc.) is used as features and input into a machine learning model, which outputs a predicted cumulative loan credit-line utilization rate for each customer. If a customer's cumulative utilization rate exceeds a preset threshold, the customer is regarded as a potential customer, and users are classified by type accordingly.
However, existing ways of classifying users cannot reflect user types under different marketing conditions, so the classified user types are of limited effectiveness.
Summary
The present application provides a method, device and storage medium for determining user type information. They address the low validity of the user types classified in the prior art.
In a first aspect, an embodiment of the present application provides a method for determining user type information, the method including:
acquiring feature data of a user;
inputting the user's feature data into a first prediction model and a second prediction model respectively, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model. The first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition. The second prediction model is used to predict the probability that the user's target data is generated without the preset execution condition being applied; and
determining type information of the user according to the first conversion rate and the second conversion rate.
In an optional implementation, determining the type information of the user according to the first conversion rate and the second conversion rate includes:
taking the difference between the first conversion rate and the second conversion rate as the contribution value of the preset execution condition to the probability of generating the user's target data; and
determining the type information of the user according to the first conversion rate, the second conversion rate and the contribution value.
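As a rough illustration of this step, the sketch below computes the contribution value and maps the three quantities to a user type. The text here states only that the type is determined from the two conversion rates and their difference. The thresholds `uplift_thres` and `rate_thres` and the four labels are hypothetical; the labels are borrowed from common uplift-modeling terminology, not taken from the source.

```python
from typing import Tuple


def classify_user(p_treated: float, p_control: float,
                  uplift_thres: float = 0.05,
                  rate_thres: float = 0.5) -> Tuple[float, str]:
    """Return (contribution value, hypothetical type label) for one user.

    p_treated: first conversion rate (preset execution condition applied).
    p_control: second conversion rate (condition not applied).
    """
    contribution = p_treated - p_control  # contribution of the condition
    if contribution >= uplift_thres:
        label = "persuadable"    # converts mainly because of the condition
    elif contribution <= -uplift_thres:
        label = "sleeping_dog"   # the condition lowers conversion
    elif p_control >= rate_thres:
        label = "sure_thing"     # converts whether or not the condition applies
    else:
        label = "lost_cause"     # unlikely to convert either way
    return contribution, label
```

For example, `classify_user(0.6, 0.2)` gives a contribution of about 0.4 and the label `"persuadable"`.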
In an optional implementation, the first prediction model is generated by training on a first sample set. This set contains feature data of historical users and result data indicating whether each historical user's target data was generated under the preset execution condition;
the second prediction model is generated by training on a second sample set. This set contains feature data of historical users and result data indicating whether each historical user's target data was generated without the preset execution condition being applied.
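These two sample sets fit the "two-model" (T-learner) approach from uplift modeling: one model is fit on each set, and both models score the same user. The patent does not name a model family. The sketch below therefore assumes plain logistic regression trained by full-batch gradient descent, using only the Python standard library. `train_logreg`, `predict` and `two_model_conversion_rates` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

# (feature vector, 1 if the target data was generated, else 0)
Sample = Tuple[Sequence[float], int]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Sigmoid of bias + w.x, i.e. the predicted conversion rate."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(samples: List[Sample], lr: float = 1.0,
                 epochs: int = 1000) -> List[float]:
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    dim = len(samples[0][0])
    w = [0.0] * (dim + 1)  # w[0] is the bias term
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in samples:
            err = predict(w, x) - y  # derivative of the log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * gi / len(samples) for wi, gi in zip(w, grad)]
    return w


def two_model_conversion_rates(first_set: List[Sample],
                               second_set: List[Sample],
                               x: Sequence[float]) -> Tuple[float, float]:
    """Train one model per sample set and score one user's features x."""
    model_1 = train_logreg(first_set)   # condition applied
    model_2 = train_logreg(second_set)  # condition not applied
    return predict(model_1, x), predict(model_2, x)
```

Consider toy data in which 80% of condition-applied users with feature value 1 generate target data, versus 20% of the other users. The two returned rates approach 0.8 and 0.2, so the contribution value is about 0.6.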
In an optional implementation, after the type information of the user is determined according to the first conversion rate and the second conversion rate, the method further includes:
querying the database for transfer users of target-type users according to users' multidimensional vectors. A multidimensional vector characterizes the association relationships between users across multiple dimensions. A transfer user is a user to whom the preset execution condition is applied in order to scale up the user base.
In an optional implementation, querying the database for transfer users of target-type users according to users' multidimensional vectors includes:
determining whether a user to be queried in the database is a transfer user according to the cosine similarity between that user's multidimensional vector and the multidimensional vector of a target-type user.
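A minimal sketch of the cosine-similarity check follows. It uses a hypothetical threshold `thres`, and decides "transfer user" when the score reaches it. It also assumes that a candidate is compared with every target-type user and the best match is kept. The text does not say whether a best match, an average or a centroid vector is used.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_transfer_user(candidate: Sequence[float],
                     target_vectors: Sequence[Sequence[float]],
                     thres: float = 0.8) -> bool:
    """score = best similarity to any target-type user; transfer user iff score >= thres."""
    score = max(cosine_similarity(candidate, t) for t in target_vectors)
    return score >= thres
```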
In another optional implementation, querying the database for transfer users of target-type users according to users' multidimensional vectors includes:
sampling the multidimensional vectors of target-type users and of non-target-type users according to a preset sampling ratio to generate a third sample set. The multidimensional vectors of target-type users are the positive samples of this set, and those of non-target-type users are the negative samples;
training a lookalike (similar-population expansion) model using the third sample set;
inputting the multidimensional vector of a user to be queried in the database into the trained lookalike model, and obtaining the population conversion probability output by that model; and
determining, according to the population conversion probability, whether the user to be queried is a transfer user.
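The sampling and decision steps can be sketched as follows. The negative-to-positive ratio `neg_per_pos`, the decision threshold `thres` and the function names are hypothetical. The lookalike model itself can be any binary classifier trained on the third sample set, and its training is omitted here.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_third_sample_set(target_vectors: List[Vector],
                           non_target_vectors: List[Vector],
                           neg_per_pos: int = 3,
                           seed: int = 0) -> List[Tuple[Vector, int]]:
    """Positives: target-type users (label 1). Negatives: a random subset of
    non-target users (label 0), capped at neg_per_pos negatives per positive."""
    rng = random.Random(seed)
    n_neg = min(len(non_target_vectors), neg_per_pos * len(target_vectors))
    negatives = rng.sample(non_target_vectors, n_neg)
    samples = [(v, 1) for v in target_vectors] + [(v, 0) for v in negatives]
    rng.shuffle(samples)
    return samples


def select_transfer_users(user_ids: Sequence[str],
                          conversion_probs: Sequence[float],
                          thres: float = 0.5) -> List[str]:
    """Keep queried users whose predicted population conversion probability reaches thres."""
    return [u for u, p in zip(user_ids, conversion_probs) if p >= thres]
```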
In an optional implementation, before querying the database for transfer users of target-type users according to users' multidimensional vectors, the method further includes:
determining the multidimensional vectors of the users in the database according to association information between the users.
In an optional implementation, determining the multidimensional vectors of the users in the database according to the association information between the users includes:
selecting a target user from the database as a target node in a user relationship network;
successively determining the next user node after the current end node of the target node's associated user node sequence, until the sequence reaches a preset sequence length;
generating an associated node array of the target node according to its associated user node sequences; and
determining the multidimensional vectors of the users in the database according to the associated node array of the target node.
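These sequence-building steps follow the node2vec walk described earlier in this document, and can be sketched as below. Several assumptions apply:

- The relationship network is an in-memory adjacency map treated as undirected (each edge stored in both directions), although FIG. 5 uses directed edges.
- `p`, `q`, `length` (the preset sequence length m+1) and `walks_per_node` are hypothetical parameters, and the function names are hypothetical too.
- Weighted sampling uses `random.choices` for brevity rather than alias sampling.

```python
import random
from typing import Dict, List

# node -> {neighbour: edge weight}; each edge is stored in both directions
Graph = Dict[str, Dict[str, float]]


def next_node(graph: Graph, prev: str, cur: str, p: float, q: float,
              rng: random.Random) -> str:
    """Second-order step: weight each neighbour x of cur by alpha(prev, x) * w(cur, x)."""
    nbrs = list(graph[cur])
    weights = []
    for x in nbrs:
        if x == prev:
            alpha = 1.0 / p      # step back to the previous node
        elif x in graph[prev]:
            alpha = 1.0          # x is also a neighbour of prev
        else:
            alpha = 1.0 / q      # x moves away from prev
        weights.append(alpha * graph[cur][x])
    return rng.choices(nbrs, weights=weights, k=1)[0]


def random_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    """Append nodes one at a time until the sequence holds `length` (= m + 1) nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not graph.get(cur):
            break                # no neighbours: stop early
        if len(walk) == 1:       # first step has no previous node: weights only
            nbrs = list(graph[cur])
            walk.append(rng.choices(nbrs, weights=[graph[cur][x] for x in nbrs], k=1)[0])
        else:
            walk.append(next_node(graph, walk[-2], cur, p, q, rng))
    return walk


def build_node_array(graph: Graph, length: int = 5, walks_per_node: int = 2,
                     p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List[str]]:
    """Collect walks_per_node walks starting from every node into one array."""
    rng = random.Random(seed)
    array = []
    for _ in range(walks_per_node):
        starts = list(graph)
        rng.shuffle(starts)
        for node in starts:
            array.append(random_walk(graph, node, length, p, q, rng))
    return array
```

The resulting array can then be passed, as a list of "sentences", to a skip-gram word2vec trainer to obtain an n-dimensional embedding per node. gensim's `Word2Vec` with `sg=1` and `vector_size=n` is one option.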
In an optional implementation, determining the next user node after the current end node of the target node's associated user node sequence includes:
determining the normalized transition probabilities of the current end node, and performing weighted sampling over the associated nodes of the current end node to determine the next user node.
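The alias method mentioned in the embodiments can be sketched with Vose's variant. After O(n) preprocessing of a current end node's normalized transition probabilities, each draw of the next node costs O(1). This is a generic implementation of the technique, not code from the source, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_alias_table(probs: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Vose's alias method: O(n) setup for a normalized distribution `probs`."""
    n = len(probs)
    scaled = [pr * n for pr in probs]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        lo_i, hi_i = small.pop(), large.pop()
        prob[lo_i] = scaled[lo_i]   # column lo_i keeps its own mass...
        alias[lo_i] = hi_i          # ...and is topped up by outcome hi_i
        scaled[hi_i] = scaled[hi_i] + scaled[lo_i] - 1.0
        (small if scaled[hi_i] < 1.0 else large).append(hi_i)
    for i in small + large:         # leftovers equal 1 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """O(1) draw: pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

To pick a neighbour, build the table once from the end node's normalized probabilities over a fixed neighbour list. Then use `neighbours[alias_draw(prob, alias, rng)]`.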
In an optional implementation, determining the normalized transition probabilities of the current end node includes:
determining the transition probability between the current end node and each associated user node according to the association information between the users; and
normalizing the transition probabilities between the current end node and its associated user nodes to obtain the normalized transition probabilities of the current end node.
In an optional implementation, determining the transition probability between the current end node and each associated user node according to the association information between the users includes:
generating weight data between the current end node and each associated user node according to the association information between the users and the users' identifiers;
determining a weight correction coefficient between the current end node and each associated user node according to the value of the shortest path distance between them in the user relationship network; and
determining the transition probability between the current end node and each associated user node according to the weight data and the weight correction coefficient between them.
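The sketch below covers the three sub-steps (weight data, weight correction coefficient, transition probability) together with the normalization step above. It makes these assumptions:

- Weight records have the "start end weight" form, such as "u s1 0.7".
- The graph is treated as undirected for simplicity.
- "Adjacent" is read in the node2vec sense, meaning adjacent to the previous node.
- The node2vec convention maps the first, second and third distance values to coefficients 1/p, 1 and 1/q.

`p`, `q` and all function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]


def parse_weight_data(lines: List[str]) -> Graph:
    """Parse 'start end weight' records into an undirected adjacency map."""
    graph: Graph = {}
    for line in lines:
        start, end, weight = line.split()
        graph.setdefault(start, {})[end] = float(weight)
        graph.setdefault(end, {})[start] = float(weight)
    return graph


def correction_coefficient(graph: Graph, prev: str, x: str,
                           p: float, q: float) -> float:
    """Map the shortest-path distance between prev and candidate x to a coefficient."""
    if x == prev:
        return 1.0 / p   # distance 0: the first value
    if x in graph[prev]:
        return 1.0       # distance 1: the second value
    return 1.0 / q       # distance 2: the third value


def normalized_transition_probs(graph: Graph, prev: str, cur: str,
                                p: float = 1.0, q: float = 1.0) -> Dict[str, float]:
    """pi(cur, x) = coefficient * weight for each neighbour x, then normalize to sum 1."""
    raw = {x: correction_coefficient(graph, prev, x, p, q) * w
           for x, w in graph[cur].items()}
    total = sum(raw.values())
    return {x: v / total for x, v in raw.items()}
```

With p = 2 and q = 0.5, for instance, a walk that arrived at u from s1 becomes less likely to return to s1 and more likely to move on to s2 or s3.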
In an optional implementation, the shortest path distance takes a first value, a second value or a third value, and there is a mapping relationship between the value of the shortest path distance and the weight correction coefficient;
if the associated user node is the previous node of the current end node, the shortest path distance takes the first value; if the associated user node and the current end node are adjacent nodes, it takes the second value; and if the associated user node is neither the previous node of the current end node nor an adjacent node of the current end node, it takes the third value.
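The exact formulas (2) and (3) are not reproduced in this text. The three-valued distance rule above mirrors the standard node2vec formulation, which is written below for reference. Here t is the previous node, v the current end node, x a candidate next node, d_{tx} the shortest-path distance between t and x, and N(v) the neighbours of v; p (the return parameter) and q (the in-out parameter) are tunable. Reading these as the patent's own formulas is an assumption.

```latex
\alpha_{pq}(t, x) =
\begin{cases}
  \dfrac{1}{p}, & d_{tx} = 0 \\[4pt]
  1,            & d_{tx} = 1 \\[4pt]
  \dfrac{1}{q}, & d_{tx} = 2
\end{cases}
\qquad
\pi(v, x) = \alpha_{pq}(t, x)\, w_{vx},
\qquad
P(x \mid v, t) = \frac{\pi(v, x)}{\sum_{y \in N(v)} \pi(v, y)}
```

In node2vec, q > 1 keeps the walk near t (breadth-first-like behaviour, which tends to capture structural similarity). q < 1 pushes the walk outward (depth-first-like behaviour, which tends to capture homophily).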
In a second aspect, an embodiment of the present application provides an apparatus for determining user type information, including:
an acquisition module, configured to acquire feature data of a user;
a prediction module, configured to input the user's feature data into a first prediction model and a second prediction model respectively, and to obtain a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model. The first prediction model is used to predict the probability that the user's target data is generated under a preset execution condition. The second prediction model is used to predict that probability without the preset execution condition being applied; and
a determination module, configured to determine the type information of the user according to the first conversion rate and the second conversion rate.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory,
where the memory stores a computer program adapted to be loaded by the processor to execute the method for determining user type information according to the first aspect or any of its optional implementations.
In a fourth aspect, an embodiment of the present application provides a computer storage medium. The medium stores a plurality of instructions adapted to be loaded by a processor to execute the method for determining user type information according to any implementation of the first aspect.
In the method, device and storage medium for determining user type information provided by the embodiments of the present application, feature data of a user is first acquired. The feature data is then input into a first prediction model and a second prediction model respectively, which output a first conversion rate and a second conversion rate. The first prediction model predicts the probability that the user's target data is generated under a preset execution condition, and the second predicts that probability without the condition being applied. Finally, the type information of the user is determined according to the two conversion rates. In this way, users can be effectively classified by type under the preset execution condition.
Description of Drawings
To describe the technical solutions in the present application or the prior art more clearly, the drawings required for the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an operating environment provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for determining user type information provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for mining transfer users provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of another method for mining transfer users provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an association relationship between user nodes provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of determining a user's multidimensional vector provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of transition probabilities between nodes provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an apparatus for determining user type information provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings, to make their objectives, technical solutions and advantages clearer. The described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present disclosure.
Inclusive finance is currently one of the key themes in the transformation and development of China's financial industry, and many commercial banks are building new marketing models for acquiring and serving customers online. Many banks now focus on three questions: how to accurately identify their potential customers among a vast population, how to recommend the best-matching products to them, and how to increase the engagement and stickiness of existing customers.
In the related art, samples of customers' cumulative credit-line utilization rates are collected. The customers' basic information (identity, transactions, assets, credit granted, purchases, etc.) is used as features and input into a machine learning model, which outputs a predicted cumulative loan credit-line utilization rate for each customer. If a customer's cumulative utilization rate exceeds a preset threshold, the customer is regarded as a potential customer, and users are classified by type accordingly. However, existing ways of classifying users cannot reflect user types under different marketing conditions, so the classified user types are of limited effectiveness.
To solve this problem, embodiments of the present disclosure provide a method, device and storage medium for determining user type information. They predict a first conversion rate, at which the user's target data is generated under a preset execution condition, and a second conversion rate, at which it is generated without that condition. The user's type information is then determined from the two conversion rates, so that users can be effectively classified by type under the preset execution condition.
在描述本公开的用户的类型信息的确定方法之前,先根据图1来了解本公开的示例运行环境。Before describing the method for determining the type information of the user in the present disclosure, first understand the example operating environment of the present disclosure according to FIG. 1 .
FIG. 1 is a schematic diagram of an operating environment according to an embodiment of the present disclosure. As shown in FIG. 1, entities that want to obtain user type information, such as an enterprise 101 or a banking institution 102, can request user type information from a system platform 103 as needed. These entities are only illustrative. Other entities may also initiate a query; for example, the system platform 103 may initiate a query automatically. Query requests from each entity reach the system platform 103 over a network. The system platform 103 performs the task of determining user type information. Besides a query module for querying user type information, it may include a marketing module, which provides different marketing schemes for different types of users, and a mining module, which mines potential transfer users for users of a target type. In addition, when querying for transfer users, the system platform 103 may provide a database 104 containing the users to be queried, for example an enterprise information base.

It should be understood that the enterprise information base in this example environment is only exemplary; other types of information bases also fall within the scope of protection of the present disclosure. In the example scenario above, an entity that obtains user type information may access the network with any of various devices, such as a personal computer, server, tablet, mobile phone, PDA, notebook computer, or any other computing device with networking capability. The system platform 103 may be implemented by a server or server group with greater processing capability and higher security. The networks between them may include various wired and wireless networks, including but not limited to the Internet, local area networks, Wi-Fi, WLAN, cellular communication networks (GPRS, CDMA, 2G/3G/4G/5G), and satellite communication networks.
It can be understood that the above method may be implemented by the apparatus for determining user type information provided in the embodiments of the present disclosure. The apparatus may be part or all of a device, for example a server or a chip in a server.
The technical solutions of the embodiments are described in detail below through specific embodiments, taking as an example a server into which the relevant execution code is integrated or installed. The following embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some embodiments.
FIG. 2 is a schematic flowchart of a method for determining user type information according to an embodiment of the present disclosure. This embodiment concerns how a server determines user type information. Unlike existing methods, the present disclosure separately predicts a first conversion rate, the probability that the user's target data is generated when a preset execution condition is applied, and a second conversion rate, the probability that the user's target data is generated when the preset execution condition is not applied, and then determines the user's type information from the two rates. The method can therefore classify users effectively while accounting for the effect of the preset execution condition on each user.
Specifically, as shown in FIG. 2, the method includes the following steps.
S201. Obtain characteristic data of a user.
In the present disclosure, when users need to be classified by type, the server may obtain the users' characteristic data.
The embodiments do not limit how the characteristic data is obtained. In some embodiments, the server may obtain the user's unique identifier (ID) and then determine the characteristic information related to the user according to that identifier.
It should be noted that the users in the embodiments of the present disclosure may be enterprise users or individual users; this is not limited.
The embodiments also do not limit the content of the characteristic data, which may be determined according to actual conditions. For example, for an enterprise user, the characteristic data may include, but is not limited to, financial data, industrial and commercial registration data, regional data, equity data, and business-abnormality data.
Specifically, financial data represents total assets, owner's equity, investment income, operating income, non-operating income, total profit, main business income, net profit, total liabilities, total tax paid, operating costs, selling expenses, asset impairment losses, and the like. Industrial and commercial data represents the company type, years since establishment, industry, enterprise status, registered capital, number of registration changes, and the like. Regional data represents the province, city, and the like. Equity data represents the number of direct shareholders, the number and shareholding ratio of direct shareholders who are natural persons, the number and shareholding ratio of direct shareholders who are not natural persons, and the like. Business-abnormality data represents the number of administrative penalties, recorded business abnormalities, tax violations, times as a defendant, and times listed as a dishonest judgment debtor.
S202. Input the user's characteristic data into a first prediction model and a second prediction model, respectively, and obtain a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model. The first prediction model predicts the probability that the user's target data is generated under the preset execution condition, and the second prediction model predicts the probability that the user's target data is generated when the preset execution condition is not applied.
In this step, after obtaining the user's characteristic data, the server inputs it into the first and second prediction models to obtain the first and second conversion rates.
The embodiments do not limit the preset execution condition or the target data. In some embodiments, the preset execution condition may be marketing to the user, and correspondingly the target data may be transaction data generated by the user. The first and second conversion rates then reflect the probability that the user is converted into making a transaction when marketed to and when not marketed to, respectively.
The construction of the first and second prediction models is described below.
In some embodiments, the first prediction model is obtained by training on a first sample set, which contains characteristic data of historical users and result data indicating whether the historical users' target data was generated under the preset execution condition. Correspondingly, the second prediction model is obtained by training on a second sample set, which contains characteristic data of historical users and result data indicating whether their target data was generated without the preset execution condition.
For example, the server may divide all customers into two groups according to whether they were marketed to: a marketed (treated) group and an unmarketed (control) group. For both groups, whether the customer converted (responded) is used as the target label, and the data is divided into a training set and a validation set.
For example, the marketed group can be taken out separately, with converted customers labeled 1 and unconverted customers labeled 0. A random 80% sample is used as the training set and the remaining 20% as the validation set, and the first prediction model is trained with the XGBoost algorithm or another classification method. The same procedure is applied to the unmarketed customers to train the second prediction model.
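The two-model procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: a small hand-written logistic regression stands in for XGBoost so the example needs only the standard library, and the names `LogisticModel`, `train_test_split`, and `train_two_models` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# (feature vector, label) where label 1 = converted, 0 = not converted
Sample = Tuple[List[float], int]


def train_test_split(samples: Sequence, train_ratio: float = 0.8,
                     seed: int = 0) -> Tuple[list, list]:
    """Randomly split samples into a training set and a validation set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticModel:
    """Hypothetical stand-in for an XGBoost binary classifier."""

    def __init__(self, n_features: int, lr: float = 0.1, epochs: int = 200):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.epochs = epochs

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def fit(self, samples: Sequence[Sample]) -> "LogisticModel":
        # Plain stochastic gradient descent on the log-loss.
        for _ in range(self.epochs):
            for x, y in samples:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self


def train_two_models(treated: Sequence[Sample], control: Sequence[Sample],
                     n_features: int) -> Tuple[LogisticModel, LogisticModel]:
    """Fit one model on the marketed group and one on the unmarketed group."""
    models = []
    for group in (treated, control):
        train, _validation = train_test_split(group)  # validation split unused in this sketch
        models.append(LogisticModel(n_features).fit(train))
    return models[0], models[1]  # (first prediction model, second prediction model)
```

For a user with feature vector `x`, `first.predict_proba(x)` plays the role of the first conversion rate and `second.predict_proba(x)` the second.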
The embodiments do not limit the types of the first and second prediction models; for example, both may be binary classification models.
S203. Determine the user's type information according to the first conversion rate and the second conversion rate.
In this step, after obtaining the first and second conversion rates, the server determines the user's type information from them.
The embodiments do not limit how the type information is determined. In some embodiments, the server may take the difference between the first and second conversion rates as the contribution value of the preset execution condition to the probability of generating the user's target data, and then determine the user's type information according to the first conversion rate, the second conversion rate, and the contribution value.
For example, the first and second prediction models can predict the user's conversion rate p in the marketing and non-marketing scenarios, giving the first conversion rate p_treated (with marketing) and the second conversion rate p_control (without marketing). Their difference is the contribution value lift of marketing to the probability of generating the user's transaction data: lift = p_treated - p_control.
Then, in some embodiments, if the first conversion rate is greater than a first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than a second threshold, the server may determine that the user belongs to a first user type.
If the first conversion rate is greater than the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is greater than or equal to a third threshold and less than or equal to the second threshold, the server may determine that the user belongs to a second user type.
If the first conversion rate is less than or equal to the first threshold, the second conversion rate is less than or equal to the first threshold, and the contribution value is greater than or equal to the third threshold and less than or equal to the second threshold, the server may determine that the user belongs to a third user type.
If the first conversion rate is less than or equal to the first threshold, the second conversion rate is greater than the first threshold, and the contribution value is less than the third threshold, the server may determine that the user belongs to a fourth user type.
The absolute value of the second threshold equals the absolute value of the third threshold.
The embodiments do not limit the value of the first threshold; for example, it may be 0.5.
Nor do the embodiments limit the value of the second threshold. For example, the second threshold thres is the threshold beyond which lift is considered to change significantly. It may be a positive decimal less than 1 and close to 0, and may be determined from a quantile of the overall statistical distribution. Correspondingly, the third threshold may be -thres.
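One way to pick thres from a quantile is sketched below. The source does not say which distribution or which quantile is used; this sketch assumes, hypothetically, the nearest-rank q-quantile of the absolute lift values over all scored users, with q = 0.9 as an illustrative default. The function name `lift_threshold` is hypothetical, and whether the result is small and close to 0, as the text expects, depends on the data.

```python
import math
from typing import Sequence


def lift_threshold(lifts: Sequence[float], q: float = 0.9) -> float:
    """Return the nearest-rank q-quantile of |lift| as the significance threshold thres.

    The third threshold is then -thres, so both thresholds have equal absolute values.
    """
    if not lifts or not 0.0 < q <= 1.0:
        raise ValueError("need at least one lift value and 0 < q <= 1")
    ordered = sorted(abs(v) for v in lifts)
    rank = math.ceil(q * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]
```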
For example: if p_treated > 0.5, p_control ≤ 0.5, and lift > thres, the user is of the first type. If p_treated > 0.5, p_control > 0.5, and -thres ≤ lift ≤ thres, the user is of the second type. If p_treated ≤ 0.5, p_control ≤ 0.5, and -thres ≤ lift ≤ thres, the user is of the third type. If p_treated ≤ 0.5, p_control > 0.5, and lift < -thres, the user is of the fourth type.
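The four rules above translate directly into a small decision function. This is a sketch: the default thres = 0.05 is a hypothetical value, and the source does not say what happens when no rule matches (for example p_treated = 0.9 and p_control = 0.6, where lift exceeds thres but both rates are high), so the function returns None in that case. The segment names follow the marketing example given in the text.

```python
from typing import Optional

# Segment names for the marketing example (types 1 to 4).
SEGMENTS = {
    1: "marketing-sensitive",
    2: "natural converter",
    3: "indifferent",
    4: "adverse reaction",
}


def classify_user(p_treated: float, p_control: float,
                  first_threshold: float = 0.5,
                  thres: float = 0.05) -> Optional[int]:
    """Map the two conversion rates to user type 1-4, or None if no rule matches."""
    lift = p_treated - p_control  # contribution of the preset execution condition
    treated_high = p_treated > first_threshold
    control_high = p_control > first_threshold
    if treated_high and not control_high and lift > thres:
        return 1
    if treated_high and control_high and -thres <= lift <= thres:
        return 2
    if not treated_high and not control_high and -thres <= lift <= thres:
        return 3
    if not treated_high and control_high and lift < -thres:
        return 4
    return None  # combination not covered by the four rules
```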
Users of the first type become more likely to generate target data when the preset execution condition is applied. Users of the second type have a probability of generating target data above a target upper limit whether or not the condition is applied. Users of the third type have a probability below a target lower limit whether or not the condition is applied. Users of the fourth type become less likely to generate target data when the condition is applied.
For example, when the preset execution condition is marketing to users, the four types correspond to marketing-sensitive users, natural converters, indifferent users, and adverse-reaction users.
Marketing-sensitive users have a low rate of spontaneous activity but are easily prompted into activity by marketing campaigns. These users can be further stratified according to whether they are sensitive to prices, discounts, concessions, and the like.
Natural converters are spontaneously active: even if the bank invests no marketing resources in them, they stay active, and they are relatively high-quality users. For these users, a lookalike (similar-user expansion) model can be used to find more similar users in the enterprise information base, who can then be guided to become bank users through online marketing, telemarketing, and similar means.
Indifferent users have already churned and cannot be recovered through marketing, or rarely read marketing messages; there is no need to invest further marketing resources in them.
Adverse-reaction users are spontaneously active but dislike being disturbed by marketing. Marketing contact with them should be avoided, and no marketing resources need to be invested in them.
In some embodiments, marketing-sensitive users can be further subdivided, and a different marketing approach can be used for each subtype.
For example, marketing-sensitive users can be divided into two subtypes. The first is price-sensitive: these users become active in response to marketing campaigns involving subsidies or discounts. The second is price-insensitive: subsidies have little effect on them, and periodic marketing reminders are sufficient. Once price-sensitive and price-insensitive samples have been constructed, a classical classification method can be used to classify users; details are not repeated here.
For example, a user may be labeled as a price-sensitive sample (label=1) if the user exhibits any one of the following behaviors, where N is an adjustable threshold that can be set according to the data distribution and is usually no greater than 30:
1. The average loan interest rate over the past 3 months is within the lowest N% of the group;
2. The subsidy utilization rate of loan orders over the past 3 months is within the top N%;
3. The user has actively shared channel information to claim coupons.
All customers other than the price-sensitive customers above are labeled price-insensitive (label=0).
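The labeling rules above can be sketched as a single function. This sketch assumes, hypothetically, that each user's 3-month loan rate and subsidy utilization have already been converted into percentile ranks within the group (0 = lowest, 100 = highest); the `LoanHistory` record, its field names, and the default N = 20 are illustrative, not from the source.

```python
from dataclasses import dataclass


@dataclass
class LoanHistory:
    """Hypothetical per-user summary over the past 3 months."""
    rate_percentile: float      # percentile of average loan rate in the group (0 = lowest rate)
    subsidy_percentile: float   # percentile of subsidy utilization in the group (100 = highest)
    shared_for_coupon: bool     # actively shared channel information to claim a coupon


def price_sensitive_label(h: LoanHistory, n: float = 20.0) -> int:
    """Return 1 (price-sensitive) if any of the three rules holds, else 0."""
    if h.rate_percentile <= n:               # rule 1: lowest N% of average loan rate
        return 1
    if h.subsidy_percentile >= 100.0 - n:    # rule 2: top N% of subsidy utilization
        return 1
    if h.shared_for_coupon:                  # rule 3: shared to claim a coupon
        return 1
    return 0
```

The resulting 0/1 labels are the training targets for whichever classical classifier is used to separate the two subtypes.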
In some embodiments, when marketing to price-sensitive customers, campaigns and benefits with larger discounts can be recommended to encourage better activity. For price-insensitive customers, periodic marketing reminders can be used to cross-recommend multiple products within the bank.
The method provided by the embodiments classifies users at a fine granularity, which avoids wasting marketing resources. Combined with models such as a marketing-response model and a price-sensitivity model, it can finely segment both existing and new users, helping the bank direct marketing resources to the marketing-sensitive users who need them most.
In summary, in the method provided by the embodiments, the user's characteristic data is first obtained. The characteristic data is then input into the first and second prediction models, respectively, to obtain the first conversion rate output by the first model and the second conversion rate output by the second model, where the first model predicts the probability of generating the user's target data under the preset execution condition and the second predicts that probability without the condition. Finally, the user's type information is determined from the first and second conversion rates. In this way, users can be classified effectively with respect to the preset execution condition.
On the basis of the above embodiments, after determining the users' type information, the server may further query the database, using the users' multidimensional vectors, for transfer users of the users of a target type. The multidimensional vector characterizes the relationships between users across multiple dimensions, and a transfer user is a user to whom the preset execution condition is applied in order to expand the target population in scale. FIG. 3 is a schematic flowchart of a method for mining transfer users according to an embodiment of the present disclosure. As shown in FIG. 3, the method includes the following steps.
S301. Determine the cosine similarity between the multidimensional vector of a user to be queried in the database and the multidimensional vector of a user of the target type.
The embodiments do not limit the target type. In some embodiments, the target-type users may be the second type of users described above, that is, natural converters.
For example, let the multidimensional (embedding) vectors of two users $u_i$ and $u_j$ be $(x_{i1}, x_{i2}, \dots, x_{in})$ and $(x_{j1}, x_{j2}, \dots, x_{jn})$, respectively. The cosine similarity $\cos(u_i, u_j)$ between $u_i$ and $u_j$ can then be determined by formula (1):

$$\cos(u_i, u_j) = \frac{\sum_{k=1}^{n} x_{ik}\, x_{jk}}{\sqrt{\sum_{k=1}^{n} x_{ik}^{2}}\;\sqrt{\sum_{k=1}^{n} x_{jk}^{2}}} \tag{1}$$
The embodiments do not limit how users' multidimensional vectors are determined. In some embodiments, the server may determine the multidimensional vectors of the users in the database according to association information between users.
For example, the server may first select a target user from the database as a target node in the user relationship network. Next, it repeatedly determines the next user node after the current end node of the target node's associated-user node sequence, according to the current end node's normalized transition probabilities, until the sequence reaches a preset length. The server then generates an associated-node array for the target node from this sequence, and finally determines the multidimensional vectors of the users in the database from the associated-node arrays.
To determine the normalized transition probabilities of the current end node, the transition probability between the current end node and each associated user node is first determined according to the association information between users. These transition probabilities are then normalized to give the normalized transition probabilities of the current end node.
Nor do the embodiments limit how transition probabilities between user nodes are determined. For example, the server may first generate weight data between the current end node and each associated user node according to the association information between users and the user identifiers. The server then determines a weight correction coefficient between the current end node and each associated user node according to the value of their shortest-path distance in the user relationship network. Finally, the server determines the transition probability between the current end node and each associated user node from the corresponding weight data and weight correction coefficient.
The shortest-path distance takes a first value, a second value, or a third value, and each value maps to a weight correction coefficient.
If the associated user node is the node preceding the current end node, the shortest-path distance takes the first value. If the associated user node and the current end node are adjacent nodes, it takes the second value. If the associated user node is neither the node preceding the current end node nor a node adjacent to it, it takes the third value.
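The walk-generation steps above resemble the second-order random walk of node2vec. The sketch below makes two assumptions that the source does not state: the first, second, and third distance values map to correction coefficients 1/p, 1, and 1/q as in node2vec, and "adjacent" is read as adjacent to the previous node (the node2vec convention). The graph is an undirected weighted adjacency dict with positive edge weights such as shareholding ratios; all function names are hypothetical.

```python
import random
from typing import Dict, List

# node -> {neighbour: edge weight}, e.g. shareholding ratio; treated as undirected
Graph = Dict[str, Dict[str, float]]


def correction(prev: str, candidate: str, graph: Graph, p: float, q: float) -> float:
    """Weight correction coefficient from the distance between prev and candidate."""
    if candidate == prev:            # distance 0: step back to the previous node
        return 1.0 / p
    if candidate in graph[prev]:     # distance 1: candidate is a neighbour of prev
        return 1.0
    return 1.0 / q                   # distance 2: move away from prev


def next_node(walk: List[str], graph: Graph, p: float, q: float,
              rng: random.Random) -> str:
    """Sample the node that follows the current end node of the walk."""
    cur = walk[-1]
    neighbours = list(graph[cur])
    if len(walk) == 1:               # no previous node yet: use the raw weights
        scores = [graph[cur][n] for n in neighbours]
    else:
        prev = walk[-2]
        scores = [graph[cur][n] * correction(prev, n, graph, p, q) for n in neighbours]
    total = sum(scores)
    probs = [s / total for s in scores]  # normalized transition probabilities
    return rng.choices(neighbours, weights=probs, k=1)[0]


def biased_walk(graph: Graph, start: str, length: int,
                p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[str]:
    """Extend the walk from start until it reaches the preset length (or a dead end)."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length and graph[walk[-1]]:
        walk.append(next_node(walk, graph, p, q, rng))
    return walk


def generate_walks(graph: Graph, length: int, p: float = 1.0, q: float = 1.0,
                   seed: int = 0) -> Dict[str, List[str]]:
    """One walk per node, standing in for each node's associated-node array."""
    return {node: biased_walk(graph, node, length, p, q, seed + i)
            for i, node in enumerate(graph)}
```

A small p makes the walk tend to step back (BFS-like, local), while a small q pushes it outward (DFS-like).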
S302. Determine, according to the cosine similarity, whether the user to be queried is a transfer user.
Here, a transfer user can be understood as a user who can be converted through marketing.
The embodiments do not limit how this determination is made. In some embodiments, the larger the cosine similarity between two users, the more similar they are. Correspondingly, marketing target customers can be found by looking for the customers with the highest cosine similarity (that is, the smallest cosine distance) to the target-type users.
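Formula (1) and the ranking step can be sketched as follows. The source does not say how similarity to several target-type users is aggregated; this sketch assumes, hypothetically, that each candidate is scored by its maximum similarity to any seed user (mean similarity or similarity to the seed centroid would be equally plausible). The names `cosine` and `rank_candidates` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity as in formula (1); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(seeds: Dict[str, List[float]],
                    candidates: Dict[str, List[float]],
                    top_k: int) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any seed; return the top_k highest."""
    scored = [(user, max(cosine(vec, s) for s in seeds.values()))
              for user, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```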
It should be noted that the method of FIG. 3 is suited to cases where the number of target-type users is small, for example no more than 200. When the number is large (for example, more than 200), the method shown in FIG. 4 may be used.
FIG. 4 is a schematic flowchart of another method for mining transfer users according to an embodiment of the present disclosure. As shown in FIG. 4, the method includes the following steps.
S401. Sample the multidimensional vectors of target-type users and of non-target-type users according to a preset sampling ratio to generate a third sample set, in which the vectors of target-type users are positive samples and those of non-target-type users are negative samples.
For example, natural converters can be used as seed customers (label=1) and other customers as negative samples (label=0), with the sample sizes adjusted by sampling so that label1:label0 lies between 1:1 and 1:3. The samples are then randomly divided, with 80% used as the training set and the remaining 20% as the validation set.
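The sampling and splitting step above can be sketched as follows. This is a minimal illustration: only negatives are down-sampled here, so if negatives are scarce the ratio can fall below 1:1, and the function name and the default of two negatives per positive are hypothetical choices within the stated 1:1 to 1:3 range.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def build_lookalike_samples(positives: Sequence[Vector], negatives: Sequence[Vector],
                            neg_per_pos: float = 2.0, train_ratio: float = 0.8,
                            seed: int = 0) -> Tuple[list, list]:
    """Down-sample negatives to about neg_per_pos per positive, then split 80/20."""
    if not 1.0 <= neg_per_pos <= 3.0:
        raise ValueError("label1:label0 should lie between 1:1 and 1:3")
    rng = random.Random(seed)
    n_neg = min(len(negatives), int(len(positives) * neg_per_pos))
    samples = [(v, 1) for v in positives]                               # seed customers
    samples += [(v, 0) for v in rng.sample(list(negatives), n_neg)]     # sampled negatives
    rng.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]  # (training set, validation set)
```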
S402. Train a lookalike (similar-population expansion) model using the third sample set.
For example, for the sample users in the training and validation sets, their equity-holding embedding vectors can be used as features for model training (for example with XGBoost, or LR with feature combinations), and the resulting binary classification model is saved as lookalike.model.
S403. Input the multidimensional vector of each user to be queried in the database into the trained lookalike model, and obtain the conversion probability output by the model.
For example, a large number of users who also have embedding-vector features can serve as users to be queried, and the trained lookalike.model predicts for each of them a score, the probability that the user belongs to the converting population.
S404. Determine, according to the conversion probability, whether the user to be queried is a transfer user.
The embodiments do not limit how this is determined. In some embodiments, the conversion probability can be compared with a threshold.
For example, if score ≥ thres, the user to be queried is determined to be a transfer user; if score < thres, the user is determined not to be a transfer user.
It should be understood that the methods of FIG. 3 and FIG. 4 use users' multidimensional vectors, which capture both the homophily and the structural similarity of users' shareholding relationships. Compared with traditional methods, this makes it possible to uncover friend or acquaintance relationships between users, thereby improving the success rate of converting transfer users.
On the basis of the methods of FIG. 3 and FIG. 4, the server may determine the multidimensional vectors of the users in the database according to association information between users. How these multidimensional vectors are determined is described below.
FIG. 5 is a schematic diagram of associations between user nodes according to an embodiment of the present disclosure. As shown in FIG. 5, each node represents an enterprise, and each edge represents a shareholding relationship directed from the investing enterprise to the enterprise in which it holds shares; the edge weight is the shareholding ratio.
如图5可知,两个节点对应的企业可以包含两种相似关系。As shown in Figure 5, the enterprises corresponding to the two nodes can contain two similar relationships.
第一种相似关系,考虑到企业u与s1、s2、s3、s4是邻居关系,可以认为企业u与企业s1、s2、s3、s4之间是具有一定相似性的,称之为同质性。The first kind of similarity relationship, considering that enterprise u and s1, s2, s3, and s4 are neighbors, it can be considered that there is a certain similarity between enterprise u and enterprises s1, s2, s3, and s4, which is called homogeneity .
第二种相似关系,u和s6都是对应子图的中心节点,在对应子图中的度最大,也是具有一定相似性的,可以称之为结构相似性。The second kind of similarity relationship, u and s6 are both central nodes of the corresponding subgraph, and have the largest degree in the corresponding subgraph, and they also have a certain similarity, which can be called structural similarity.
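The two kinds of similarity can be illustrated on a small hypothetical holding graph. This is not the graph of FIG. 5: apart from the u→s1, s2, s3 ratios given for step S502, all nodes (h1 to h3 in particular) and ratios are made up. Homogeneity corresponds to a node's direct neighbors. Structural similarity corresponds to being the highest-degree node of a subgraph.

```python
from collections import defaultdict

# Hypothetical holding edges (investor, held, ratio); not the exact graph of FIG. 5.
EDGES = [
    ("u", "s1", 0.7), ("u", "s2", 0.35), ("u", "s3", 0.65), ("u", "s4", 0.5),
    ("s6", "h1", 0.6), ("s6", "h2", 0.3), ("s6", "h3", 0.4),
]


def undirected_adjacency(edges):
    """Neighbor sets with edge direction ignored."""
    adj = defaultdict(set)
    for investor, held, _ratio in edges:
        adj[investor].add(held)
        adj[held].add(investor)
    return adj


def homogeneous_peers(adj, node):
    """First kind of similarity (homogeneity): the node's direct neighbors."""
    return sorted(adj[node])


def central_nodes(adj, subgraph):
    """Second kind (structural similarity): the node(s) of largest degree within a subgraph."""
    degree = {n: len(adj[n] & subgraph) for n in subgraph}
    top = max(degree.values())
    return sorted(n for n, d in degree.items() if d == top)


adj = undirected_adjacency(EDGES)
print(homogeneous_peers(adj, "u"))                          # ['s1', 's2', 's3', 's4']
print(central_nodes(adj, {"u", "s1", "s2", "s3", "s4"}))    # ['u']
print(central_nodes(adj, {"s6", "h1", "h2", "h3"}))         # ['s6']
```

In this toy graph, u and s6 play the same structural role even though they share no neighbors. That is the kind of relation a purely neighbor-based view would miss.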
Note that discovering both homogeneity and structural similarity, and reflecting them in the embedding results, requires both depth-first traversal (DFS) and breadth-first traversal (BFS). The node2vec algorithm can be used to combine the strengths of the two traversal strategies. It uses random walks that balance depth-first and breadth-first exploration to generate traversal node sequences. These sequences are then used as context, and the skip-gram method is applied to obtain an embedding vector for each node.
FIG. 6 is a schematic diagram of determining a user's multidimensional vector according to an embodiment of the present disclosure. As shown in FIG. 6, the method includes the following steps.
S501. Encode each user's name with a unique code.
It should be understood that the embodiments of the present disclosure do not limit how user names are encoded; encoding may follow a preset encoding order. For example, "Shenzhen Qianhai WeBank Co., Ltd." may be encoded as s5.
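One possible reading of the preset encoding order is to number names in the order they are first seen, as in the sketch below. The `s` prefix mirrors the s5 example, and the company names are hypothetical.

```python
def encode_names(names, prefix="s"):
    """Give each distinct user name a unique code, numbered in first-seen order."""
    codes = {}
    for name in names:
        if name not in codes:
            codes[name] = f"{prefix}{len(codes) + 1}"
    return codes


# Hypothetical names; a real system would use registered company names.
codes = encode_names(["Company A", "Company B", "Company A", "Company C"])
print(codes)  # {'Company A': 's1', 'Company B': 's2', 'Company C': 's3'}
```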
S502. Generate the weight data for the pair of user nodes joined by each directed edge in the user relationship network.
In some embodiments, the weight data between user nodes may include a start node, an end node and a weight coefficient. For example, for the network shown in FIG. 5, the weight data may be "u s1 0.7", "u s2 0.35", "u s3 0.65", and so on.
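Weight data in the `start end weight` form can be loaded into a directed adjacency map, for instance as follows. This is a sketch, and the function name is hypothetical.

```python
from collections import defaultdict


def parse_weight_data(lines):
    """Parse 'start end weight' records into {start: {end: weight}} (directed)."""
    graph = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 'start end weight', got {line!r}")
        start, end, weight = fields
        graph[start][end] = float(weight)
    return dict(graph)


print(parse_weight_data(["u s1 0.7", "u s2 0.35", "u s3 0.65"]))
# {'u': {'s1': 0.7, 's2': 0.35, 's3': 0.65}}
```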
S503. Determine the weight correction coefficient between nodes from the shortest-path distance between the two nodes in the user relationship network.
The shortest-path distance takes one of three values (a first value, a second value and a third value), and there is a mapping between these values and the weight correction coefficient.
The rule for assigning the value is as follows. If the associated user node is the node preceding the current end node, the shortest-path distance takes the first value. If the associated user node is adjacent to that preceding node, it takes the second value. If the associated user node is neither the preceding node nor adjacent to it, it takes the third value.
For example, let the current node be v and its preceding node be t, with a directed edge t→v. For a node x adjacent to the current node v, the weight correction coefficient may be defined as in formula (2):
$$\alpha(t,x)=\begin{cases}\dfrac{1}{p}, & d_{tx}=0\\[4pt] 1, & d_{tx}=1\\[4pt] \dfrac{1}{q}, & d_{tx}=2\end{cases}\qquad(2)$$
Here d_tx denotes the shortest-path distance between x and node t. It takes only three values. If the walk returns to node t (edge direction is ignored), d_tx = 0. If x and t are directly adjacent, d_tx = 1. In all other cases, d_tx = 2.
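Formula (2) only needs the three distance classes, so d_tx can be read off neighbor sets without a full shortest-path search. The sketch below assumes an undirected adjacency map `{node: set_of_neighbors}`. The example adjacency is inferred from the FIG. 7 coefficients and is illustrative.

```python
def d_tx(t, x, undirected_adj):
    """Distance class between x and the previous node t, ignoring edge direction.

    Only the three cases in the text are needed: 0 (x is t), 1 (x is adjacent
    to t), 2 (anything else).
    """
    if x == t:
        return 0
    if x in undirected_adj.get(t, set()):
        return 1
    return 2


def alpha(t, x, undirected_adj, p, q):
    """Weight correction coefficient alpha(t, x) of formula (2)."""
    return {0: 1.0 / p, 1: 1.0, 2: 1.0 / q}[d_tx(t, x, undirected_adj)]


adj = {"s6": {"s7", "s8"}, "s7": {"s6", "s8", "s5"}, "s8": {"s6", "s7"}, "s5": {"s7"}}
print([alpha("s6", x, adj, p=2.0, q=0.5) for x in ("s6", "s8", "s5")])  # [0.5, 1.0, 2.0]
```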
It should be understood that the values of p and q may be specified in advance and are not limited here.
S504. Determine the transition probability between nodes from the weight data between user nodes and the weight correction coefficient between nodes.
For example, the transition probability between nodes may be determined by formula (3):
$$\pi(v,x)=\alpha(t,x)\,w_{vx}\qquad(3)$$
Here w_vx is the weight of the edge between nodes v and x in the user relationship network, and π(v,x) is the (unnormalized) transition probability from node v to node x.
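Combining formulas (2) and (3), the sketch below computes the unnormalized π(v, x) for every neighbor x of v. The s7 edge weights are those of the FIG. 7 example. The adjacency (s8 adjacent to s6, s5 not) is inferred from the coefficients used there. The values p = 2 and q = 0.5 are illustrative, since the text leaves p and q open.

```python
def transition_weights(t, v, weights, undirected_adj, p, q):
    """Unnormalized pi(v, x) = alpha(t, x) * w_vx for every neighbor x of v."""
    pis = {}
    for x, w_vx in weights[v].items():
        if x == t:
            a = 1.0 / p        # d_tx = 0: step back to the previous node
        elif x in undirected_adj.get(t, set()):
            a = 1.0            # d_tx = 1: stay next to the previous node
        else:
            a = 1.0 / q        # d_tx = 2: move further away
        pis[x] = a * w_vx
    return pis


# Edge weights of s7 as in the FIG. 7 example; p and q are illustrative.
weights = {"s7": {"s6": 0.60, "s8": 0.15, "s5": 0.25}}
adj = {"s6": {"s7", "s8"}, "s7": {"s6", "s8", "s5"}, "s8": {"s6", "s7"}, "s5": {"s7"}}
print(transition_weights("s6", "s7", weights, adj, p=2.0, q=0.5))
# {'s6': 0.3, 's8': 0.15, 's5': 0.5}
```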
FIG. 7 is a schematic diagram of transition probabilities between nodes according to an embodiment of the present disclosure. As shown in FIG. 7, s7 is the current node and s6 is the node preceding s7. For the neighbors of s7, namely the preceding node s6 and the two other adjacent nodes s8 and s5, the transition probabilities are π(s7,s6) = 1/p × 60%, π(s7,s8) = 1 × 15% and π(s7,s5) = 1/q × 25%.
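The text leaves p and q unspecified. With illustrative values p = 2 and q = 0.5, the FIG. 7 example works out as follows, including the normalization of step S505:

$$
\begin{aligned}
\pi(s7,s6) &= \tfrac{1}{p}\times 0.60 = 0.30,\\
\pi(s7,s8) &= 1\times 0.15 = 0.15,\\
\pi(s7,s5) &= \tfrac{1}{q}\times 0.25 = 0.50,\\
Z &= 0.30 + 0.15 + 0.50 = 0.95,\\
P(s6\mid s7) &= 0.30/0.95 \approx 0.316,\quad P(s8\mid s7) = 0.15/0.95 \approx 0.158,\quad P(s5\mid s7) = 0.50/0.95 \approx 0.526.
\end{aligned}
$$

With q < 1, the walk is pushed toward s5, which lies farther from s6. This is the depth-first tendency. A larger q would instead favor s8, which stays close to s6.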
S505. Determine the normalized transition probabilities between nodes from the transition probabilities.
For example, for each adjacent node x_i of node v, compute the transition probability π_i and normalize it:

$$P(x_i\mid v)=\frac{\pi_i}{\sum_{j}\pi_j}$$
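Normalization divides each weight by the sum of all the weights. The sketch below uses the π values of the FIG. 7 example under the illustrative choice p = 2, q = 0.5.

```python
def normalize(pis):
    """Turn unnormalized weights {x_i: pi_i} into probabilities pi_i / sum_j pi_j."""
    total = sum(pis.values())
    if total <= 0:
        raise ValueError("transition weights must have a positive sum")
    return {x: w / total for x, w in pis.items()}


probs = normalize({"s6": 0.30, "s8": 0.15, "s5": 0.50})
print({x: round(pr, 3) for x, pr in probs.items()})
# {'s6': 0.316, 's8': 0.158, 's5': 0.526}
```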
S506. Select a target user from the database as the target node in the user relationship network.
For example, a node t may be selected at random from the user relationship network shown in FIG. 5. A neighbor v of t, joined to it by a directed edge t→v, is then selected as the target node.
S507. Repeatedly determine the next user node after the current end node of the target node's associated user node sequence, until the length of the target user's node sequence reaches the preset sequence length.
The embodiments of the present disclosure do not limit how the next user node is determined. In some embodiments, the normalized transition probabilities of the current end node are computed, and weighted sampling over the nodes associated with the current end node then determines the next user node.
The weighted sampling may specifically be alias sampling.
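Alias sampling is commonly implemented with Vose's alias method. The table costs O(n) to build, and each draw then costs O(1), which pays off when the same node's distribution is sampled many times during the walks. The sketch below is a standard version of that method, not code from the source. The probabilities are the normalized FIG. 7 values under the illustrative choice p = 2, q = 0.5.

```python
import random


def build_alias_table(probs):
    """Vose's alias method: O(n) set-up for O(1) draws from a discrete distribution."""
    n = len(probs)
    scaled = [pr * n for pr in probs]
    accept = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        accept[s] = scaled[s]          # keep column s with this probability
        alias[s] = g                   # otherwise jump to column g
        scaled[g] -= 1.0 - scaled[s]   # g donated mass to fill column s
        (small if scaled[g] < 1.0 else large).append(g)
    for i in small + large:            # leftover columns are full (up to rounding)
        accept[i] = 1.0
    return accept, alias


def alias_draw(accept, alias, rng):
    """Pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


nodes = ["s6", "s8", "s5"]
accept, alias = build_alias_table([0.30 / 0.95, 0.15 / 0.95, 0.50 / 0.95])
print(nodes[alias_draw(accept, alias, random.Random(0))])
```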
For example, for every adjacent node x_i of the target node v, the normalized transition probability is computed from π_i, and the next node x_1 is drawn by alias sampling, giving the sequence (v, x_1). The same process is then repeated on the last node of the target user's node sequence to obtain the next user node, giving (v, x_1, x_2). With the desired sequence length predefined as m+1, repeating this process m times yields the sequence for node v: (v, x_1, x_2, …, x_m).
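Steps S506 and S507 together amount to a node2vec-style biased random walk. The sketch below makes three assumptions. First, holding edges can be walked in both directions, which is consistent with the FIG. 7 example, where the walk can return from s7 to s6. Second, `random.Random.choices` stands in for alias sampling. Third, the graph and parameters are hypothetical.

```python
import random


def to_undirected(edges):
    """{node: {neighbor: weight}}; each holding edge can be walked both ways."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def biased_walk(adj, t, v, length, p, q, rng):
    """Return (v, x1, ..., xm), len == length (= m + 1), continuing the step t -> v."""
    walk = [v]
    prev, cur = t, v
    while len(walk) < length:
        candidates = list(adj[cur])
        weights = []
        for x in candidates:
            if x == prev:
                a = 1.0 / p          # d_tx = 0: go back
            elif x in adj[prev]:
                a = 1.0              # d_tx = 1: stay near prev
            else:
                a = 1.0 / q          # d_tx = 2: move outward
            weights.append(a * adj[cur][x])
        # choices() samples in proportion to the weights, so no explicit normalization.
        nxt = rng.choices(candidates, weights=weights, k=1)[0]
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk


# Hypothetical graph: the first three weights follow the S502 example, the last is made up.
adj = to_undirected([("u", "s1", 0.7), ("u", "s2", 0.35), ("u", "s3", 0.65), ("s1", "s2", 0.2)])
print(biased_walk(adj, t="u", v="s1", length=5, p=1.0, q=0.5, rng=random.Random(3)))
```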
S508. Generate the associated node array of the target node from its associated user node sequences.
For example, with a predefined number of sequences M, M adjacent nodes of node t may be selected and a user node sequence computed for each of them. Combining these M sequences gives the associated node array of the target node:
$$\begin{matrix}
v_1 & x_{11} & x_{12} & \cdots & x_{1m}\\
v_2 & x_{21} & x_{22} & \cdots & x_{2m}\\
\vdots & \vdots & \vdots & & \vdots\\
v_M & x_{M1} & x_{M2} & \cdots & x_{Mm}
\end{matrix}$$
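Assembling the associated node array is then a loop over M start nodes. The sketch below takes the walk function as a parameter, so it does not depend on how each row is generated. Reusing neighbors when t has fewer than M of them is an assumption, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def associated_node_array(
    neighbors_of_t: Sequence[str],
    M: int,
    walk_from: Callable[[str], List[str]],
    rng: random.Random,
) -> List[List[str]]:
    """Stack M node sequences (v_k, x_k1, ..., x_km), one per chosen neighbor v_k of t."""
    if M <= len(neighbors_of_t):
        starts = rng.sample(list(neighbors_of_t), M)   # M distinct neighbors
    else:
        # Assumption: with fewer than M neighbors, start nodes are reused.
        starts = [rng.choice(neighbors_of_t) for _ in range(M)]
    return [walk_from(v) for v in starts]


# Toy walk function (hypothetical); real rows would come from the biased random walk.
rows = associated_node_array(["a", "b", "c"], 2, lambda v: [v, v + "1", v + "2"], random.Random(0))
for row in rows:
    print(row)
```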
S509. Determine the multidimensional vectors of the users in the database from the associated node array of the target node.
For example, the associated node array of the target node may be fed into word2vec to obtain an n-dimensional embedding vector for each node (n can be set manually), in the following format:
    id1 0.13716 0.05973 -0.05692 0.34796 …
    id2 0.55362 -0.24561 0.67832 0.89571 …
    …
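The skip-gram variant of word2vec learns each node's vector by predicting the nodes around it in a walk. The sketch below shows only the (center, context) pairs that skip-gram trains on. To obtain actual vectors, one would typically pass the rows to a library instead, for example gensim's `Word2Vec(sentences=walks, vector_size=n, sg=1, min_count=1)`. The window size and node names here are illustrative.

```python
def skipgram_pairs(walks, window):
    """(center, context) pairs inside a symmetric window, as used to train skip-gram."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


print(skipgram_pairs([["v1", "x11", "x12"]], window=1))
# [('v1', 'x11'), ('x11', 'v1'), ('x11', 'x12'), ('x12', 'x11')]
```

Nodes that often appear in each other's windows end up with similar vectors. This is why both neighbors (homogeneity) and nodes reached through similar walk patterns can become close in the embedding space.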
The embodiments of the present disclosure offer a better way to use relational data between enterprises, such as equity holding and supply-chain relationships. This improves the accuracy with which transfer users are identified and can go a long way toward addressing the banking industry's current difficulties in acquiring new customers and reactivating existing ones.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
FIG. 8 is a schematic structural diagram of an apparatus for determining user type information according to an embodiment of the present disclosure. The apparatus may be implemented in software, hardware or a combination of both, and performs the method for determining user type information of the foregoing embodiments. As shown in FIG. 8, the apparatus 600 includes an acquisition module 601, a prediction module 602 and a determination module 603.
The acquisition module 601 is configured to acquire the user's feature data.
The prediction module 602 is configured to input the user's feature data into a first prediction model and a second prediction model, and to obtain a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model. The first prediction model predicts the probability of generating the user's target data under a preset execution condition. The second prediction model predicts the probability of generating the user's target data without applying the preset execution condition.
The determination module 603 is configured to determine the user's type information from the first conversion rate and the second conversion rate.
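A structural sketch of apparatus 600 in Python follows. The two prediction models are passed in as plain callables. The type rule in module 603 is a hypothetical four-way uplift segmentation based on the first rate, the second rate and their difference (the contribution of claim 2). The source does not specify the thresholds or labels; these are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Mapping

Features = Mapping[str, float]


def classify_by_uplift(first_rate, second_rate, uplift_thres=0.05, base_thres=0.5):
    """Hypothetical rule using both rates and their difference (the contribution)."""
    contribution = first_rate - second_rate
    if contribution >= uplift_thres:
        return "persuadable"      # converts mainly because of the condition
    if contribution <= -uplift_thres:
        return "sleeping dog"     # the condition lowers the conversion rate
    if second_rate >= base_thres:
        return "sure thing"       # converts with or without the condition
    return "lost cause"           # unlikely to convert either way


@dataclass
class UserTypeDeterminer:
    """Mirrors apparatus 600: modules 601 (acquire), 602 (predict), 603 (determine)."""
    acquire: Callable[[str], Features]
    first_model: Callable[[Features], float]    # rate with the preset condition
    second_model: Callable[[Features], float]   # rate without the condition
    determine_rule: Callable[[float, float], str] = classify_by_uplift

    def determine(self, user_id: str) -> str:
        features = self.acquire(user_id)                      # module 601
        first_rate = self.first_model(features)               # module 602
        second_rate = self.second_model(features)             # module 602
        return self.determine_rule(first_rate, second_rate)   # module 603


det = UserTypeDeterminer(lambda uid: {"age": 35.0}, lambda f: 0.62, lambda f: 0.20)
print(det.determine("s5"))  # persuadable
```

Passing the models and the rule in as parameters keeps the apparatus independent of any particular model library. The two models would be trained on the first and second sample sets described in claim 3.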
The apparatus for determining user type information provided by the embodiments of the present disclosure can perform the actions of the method in the foregoing embodiments. Its implementation principle and technical effect are similar and are not repeated here.
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in FIG. 9, the electronic device may include at least one processor 701 and a memory 702. FIG. 9 takes an electronic device with one processor as an example.
The memory 702 is configured to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions.
The memory 702 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.
The processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to implement the above method for determining user type information.
The processor 701 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are implemented independently, they may be connected to one another through a bus and communicate with one another over it. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be classified as address buses, data buses, control buses and so on, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the communication interface, the memory 702 and the processor 701 are integrated on one chip, they may communicate with one another through an internal interface.
An embodiment of the present disclosure further provides a chip including a processor and an interface. The interface is used to input and output the data or instructions processed by the processor, and the processor is configured to perform the methods provided in the above method embodiments.
The present disclosure further provides a computer-readable storage medium. It may be any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. Specifically, the computer-readable storage medium stores program information used for the above method for determining user type information.
An embodiment of the present disclosure further provides a program which, when executed by a processor, performs the method for determining user type information provided by the above method embodiments.
An embodiment of the present disclosure further provides a program product, such as a computer-readable storage medium, that stores instructions. When run on a computer, the instructions cause the computer to perform the method for determining user type information provided by the above method embodiments.
All or some of the above embodiments may be implemented in software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. A computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present disclosure are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted from one website, computer, server or data center to another by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD) or a semiconductor medium (for example, a solid state disk (SSD)).
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present disclosure, not to limit them. The present disclosure has been described in detail with reference to the foregoing embodiments. Those of ordinary skill in the art should nevertheless understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.

Claims (15)

  1. A method for determining user type information, characterized in that the method comprises:
    acquiring feature data of a user;
    inputting the feature data of the user into a first prediction model and a second prediction model respectively, and obtaining a first conversion rate output by the first prediction model and a second conversion rate output by the second prediction model, wherein the first prediction model is used to predict the probability of generating target data of the user under a preset execution condition, and the second prediction model is used to predict the probability of generating the target data of the user without applying the preset execution condition; and
    determining type information of the user according to the first conversion rate and the second conversion rate.
  2. The method according to claim 1, wherein determining the type information of the user according to the first conversion rate and the second conversion rate comprises:
    taking the difference between the first conversion rate and the second conversion rate as the contribution of the preset execution condition to the probability of generating the target data of the user; and
    determining the type information of the user according to the first conversion rate, the second conversion rate and the contribution.
  3. The method according to claim 1 or 2, wherein the first prediction model is generated by training on a first sample set, the first sample set containing feature data of historical users and result data on whether target data of the historical users was generated under the preset execution condition;
    the second prediction model is generated by training on a second sample set, the second sample set containing feature data of historical users and result data on whether target data of the historical users was generated without applying the preset execution condition.
  4. The method according to any one of claims 1 to 3, wherein after determining the type information of the user according to the first conversion rate and the second conversion rate, the method further comprises:
    querying a database, according to multidimensional vectors of users, for transfer users of users of a target type, wherein the multidimensional vectors characterize association relationships between users in multiple dimensions, and the transfer users are users to whom the preset execution condition is applied for scale expansion.
  5. The method according to claim 4, wherein querying the database, according to the multidimensional vectors of users, for transfer users of users of the target type comprises:
    determining, according to the cosine similarity between the multidimensional vector of a user to be queried in the database and the multidimensional vector of a user of the target type, whether the user to be queried is a transfer user.
  6. The method according to claim 4, wherein querying the database, according to the multidimensional vectors of users, for transfer users of users of the target type comprises:
    sampling, according to a preset sampling ratio, the multidimensional vectors of users of the target type and the multidimensional vectors of users of non-target types to generate a third sample set, wherein the multidimensional vectors of users of the target type are positive samples of the third sample set and the multidimensional vectors of users of non-target types are negative samples of the third sample set;
    training a lookalike (similar-population expansion) model using the third sample set;
    inputting the multidimensional vector of a user to be queried in the database into the trained lookalike model, and obtaining the population conversion probability output by the trained lookalike model; and
    determining, according to the population conversion probability, whether the user to be queried is a transfer user.
  7. The method according to claim 4, wherein before querying the database, according to the multidimensional vectors of users, for transfer users of users of the target type, the method further comprises:
    determining the multidimensional vectors of the users in the database according to association information between the users.
  8. The method according to claim 7, wherein determining the multidimensional vectors of the users in the database according to the association information between the users comprises:
    selecting a target user from the database as a target node in a user relationship network;
    successively determining the next user node after the current end node of an associated user node sequence of the target node, until the length of the node sequence of the target user reaches a preset sequence length;
    generating an associated node array of the target node according to the associated user node sequence of the target node; and
    determining the multidimensional vectors of the users in the database according to the associated node array of the target node.
  9. The method according to claim 8, wherein determining the next user node after the current end node of the associated user node sequence of the target node comprises:
    determining normalized transition probabilities of the current end node, and performing weighted sampling over the associated nodes of the current end node to determine the next user node.
  10. The method according to claim 9, wherein determining the normalized transition probabilities of the current end node comprises:
    determining, according to the association information between the users, the transition probability between the current end node and each associated user node; and
    normalizing the transition probabilities between the current end node and each associated user node to determine the normalized transition probabilities of the current end node.
  11. The method according to claim 10, wherein determining, according to the association information between the users, the transition probability between the current end node and each associated user node comprises:
    generating weight data between the current end node and each associated user node according to the association information between the users and identifiers of the users;
    determining a weight correction coefficient between the current end node and each associated user node according to the value of the shortest-path distance between them in the user relationship network; and
    determining the transition probability between the current end node and each associated user node according to the weight data and the weight correction coefficient between them.
  12. The method according to claim 11, wherein the shortest-path distance takes a first value, a second value or a third value, and there is a mapping between the value of the shortest-path distance and the weight correction coefficient;
    if the associated user node is the node preceding the current end node, the shortest-path distance takes the first value; if the associated user node and the current end node are adjacent nodes, the shortest-path distance takes the second value; if the associated user node is neither the node preceding the current end node nor an adjacent node of the current end node, the shortest-path distance takes the third value.
  13. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program adapted to be loaded by the processor to perform the method according to any one of claims 1 to 12.
  14. A computer storage medium, storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the method according to any one of claims 1 to 12.
  15. A computer program, comprising program code which, when a computer runs the computer program, performs the method according to any one of claims 1 to 12.
PCT/CN2022/101734 2021-12-30 2022-06-28 User type information determination method and device, and storage medium WO2023123933A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111655734.XA CN114298232A (en) 2021-12-30 2021-12-30 Method, device and storage medium for determining type information of user
CN202111655734.X 2021-12-30

Publications (1)

Publication Number Publication Date
WO2023123933A1 true WO2023123933A1 (en) 2023-07-06

Family

ID=80974556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101734 WO2023123933A1 (en) 2021-12-30 2022-06-28 User type information determination method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN114298232A (en)
WO (1) WO2023123933A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20140029994A (en) * 2012-08-31 2014-03-11 중소기업은행 Method for verifying sutability of loss given default/exposure at default
CN110728323A (en) * 2019-10-12 2020-01-24 中诚信征信有限公司 Target type user identification method and device, electronic equipment and storage medium
CN111090677A (en) * 2018-10-23 2020-05-01 北京嘀嘀无限科技发展有限公司 Method and device for determining data object type
CN113763019A (en) * 2021-01-28 2021-12-07 北京沃东天骏信息技术有限公司 User information management method and device
CN114298232A (en) * 2021-12-30 2022-04-08 深圳前海微众银行股份有限公司 Method, device and storage medium for determining type information of user

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117097909A (en) * 2023-10-20 2023-11-21 深圳市星易美科技有限公司 Distributed household audio and video processing method and system
CN117097909B (en) * 2023-10-20 2024-02-02 深圳市星易美科技有限公司 Distributed household audio and video processing method and system

Also Published As

Publication number Publication date
CN114298232A (en) 2022-04-08

Similar Documents

Publication Publication Date Title
WO2023123933A1 (en) User type information determination method and device, and storage medium
WO2019169756A1 (en) Product recommendation method and apparatus, and storage medium
CN112148987B (en) Message pushing method based on target object activity and related equipment
WO2021174693A1 (en) Data analysis method and apparatus, and computer system and readable storage medium
WO2022048363A1 (en) Website classification method and apparatus, computer device, and storage medium
US20230023630A1 (en) Creating predictor variables for prediction models from unstructured data using natural language processing
WO2020035075A1 (en) Method and system for carrying out maching learning under data privacy protection
CN111210335A (en) User risk identification method and device and electronic equipment
CN115018656B (en) Risk identification method, and training method, device and equipment of risk identification model
CN114139539A (en) Enterprise social responsibility index quantification method, system and application
CN111598360A (en) Service policy determination method and device and electronic equipment
Zhou et al. Extreme value modeling of coincident lane load effects for multi-lane factors of bridges using peaks-over-threshold method
CN116664306A (en) Intelligent recommendation method and device for wind control rules, electronic equipment and medium
TWI792101B (en) Data Quantification Method Based on Confirmed Value and Predicted Value
CN115099875A (en) Data classification method based on decision tree model and related equipment
US20140324523A1 (en) Missing String Compensation In Capped Customer Linkage Model
CN115018608A (en) Risk prediction method and device and computer equipment
WO2021129368A1 (en) Method and apparatus for determining client type
CN112818235A (en) Violation user identification method and device based on associated features and computer equipment
CN113094595A (en) Object recognition method, device, computer system and readable storage medium
CN112307334A (en) Information recommendation method, information recommendation device, storage medium and electronic equipment
Tu et al. A novel grey relational clustering model under sequential three-way decision framework
Wu et al. Variance reduced Shapley value estimation for trustworthy data valuation
JP7302107B1 (en) LEARNING SYSTEMS, LEARNING METHODS AND PROGRAMS
Şahinarslan et al. Machine learning algorithms to forecast population: Turkey example