CN112488863A - Dangerous seed recommendation method and related equipment in user cold start scene - Google Patents

Publication number
CN112488863A
Authority
CN
China
Prior art keywords
user
users
determining
similar
dangerous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011388337.6A
Other languages
Chinese (zh)
Inventor
林鹏程
鞠芳
梁曦
侯成文
王帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Life Insurance Co Ltd China
Original Assignee
China Life Insurance Co Ltd China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Life Insurance Co Ltd China
Priority to CN202011388337.6A
Publication of CN112488863A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

Provided are an insurance type recommendation method and a related device for a user cold start scenario. One or more embodiments of the present specification provide an insurance type recommendation method in a user cold start scenario, including: acquiring registration information of a target user; determining, according to the registration information, a user set corresponding to the target user through a pre-constructed user set determination model; determining, within the user set, similar users of the target user through a pre-constructed similar user determination model; and determining recommended insurance types for the target user according to the insurance types selected by the similar users. By first determining the user set and then determining the similar users within it, the method improves the accuracy and coverage of insurance type recommendation and reduces resource consumption.

Description

Insurance type recommendation method and related device in a user cold start scenario
Technical Field
One or more embodiments of the present disclosure relate to the field of data processing technologies, and in particular to an insurance type recommendation method in a user cold start scenario and a related device.
Background
A product recommendation system receives a large number of new users every day. If the system can recommend products that these new users like, it gains the trust of more users, increases user loyalty, and lets users obtain high-quality personalized service at any time.
Most existing product recommendation systems are based on collaborative filtering. The cold start problem is a classic, widely studied problem in collaborative filtering recommendation algorithms; it has long hindered the development of traditional collaborative filtering systems and seriously degrades recommendation quality. Recommendation in a user cold start scenario generally means recommendation aimed at a new user. Because the company holds little feature information about a new user and no selection history, targeted recommendation is difficult with existing recommendation algorithms.
An existing approach to the cold start problem works as follows: from the user registration information database, an item frequency table is computed periodically for each feature; for a new user, the TOP-N list of most frequent items is retrieved for each feature (assuming M features); the resulting M x N item records are aggregated into t rows, each holding an item and its accumulated ratio; finally, the t rows are sorted by accumulated ratio in descending order and the first N items are recommended.
This existing method tends to favor popular items, so its recommendations are inaccurate, the coverage of recommended items is low, and the long-tail effect is not reduced. It also requires frequent, periodic, large-scale frequency statistics, which consume substantial resources.
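The frequency-statistics approach described above can be sketched as follows. This is a minimal illustration, not the prior-art system itself: the function names, the sample records, and the reading of "accumulated ratio" as the sum of each item's share within every matching feature group are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def top_n_by_feature(history, n):
    """For every (feature, value) pair, keep the n most frequent items and their share."""
    counts = defaultdict(Counter)
    for features, item in history:
        for name, value in features.items():
            counts[(name, value)][item] += 1
    table = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        table[key] = [(item, c / total) for item, c in counter.most_common(n)]
    return table

def recommend_baseline(table, new_user, n):
    """Aggregate the M x N per-feature records and return the n items with the largest summed share."""
    scores = Counter()
    for name, value in new_user.items():
        for item, share in table.get((name, value), []):
            scores[item] += share
    return [item for item, _ in scores.most_common(n)]

# Hypothetical registration records: (features, selected item).
history = [
    ({"region": "north", "industry": "mining"}, "accident"),
    ({"region": "north", "industry": "retail"}, "accident"),
    ({"region": "south", "industry": "mining"}, "accident"),
    ({"region": "south", "industry": "retail"}, "health"),
    ({"region": "south", "industry": "retail"}, "health"),
    ({"region": "north", "industry": "retail"}, "accident"),
]
table = top_n_by_feature(history, n=2)
recommended = recommend_baseline(table, {"region": "south", "industry": "retail"}, n=1)
```

Every feature contributes with the same weight in this sketch, which is why a globally popular item can dominate the aggregated score.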
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure aim to provide an insurance type recommendation method and a related device for a user cold start scenario, so as to solve the prior-art problem of poor accuracy when recommending products to new users.
In view of the above, one or more embodiments of the present specification provide an insurance type recommendation method in a user cold start scenario, including:
acquiring registration information of a target user;
determining a user set corresponding to the target user through a pre-constructed user set determination model according to the registration information;
in the user set, determining similar users of the target user through a pre-constructed similar user determination model;
and determining recommended insurance types for the target user according to the insurance types selected by the similar users.
Optionally, the method further includes:
constructing a first sample set comprising a number of first samples; wherein each first sample comprises first sample data and first tag data; the first sample data comprises registration information and insurance type selection information of historical users; the first tag data comprises the user set corresponding to the historical user;
and according to the first sample set, constructing and training to obtain the user set determination model through a preset first machine learning algorithm.
Optionally, the method further includes:
constructing a second sample set comprising a plurality of second samples; wherein each second sample comprises second sample data and second tag data; the second sample data comprises registration information of the historical user, insurance type selection information, and the user set corresponding to the historical user; the second tag data comprises similar users of the historical user;
and constructing and training the similar user determination model according to the second sample set through a preset second machine learning algorithm.
Optionally, the constructing and training the similar user determination model according to the second sample set by using a predetermined second machine learning algorithm includes:
dividing the characteristics of the registration information of the historical users in the user set into discrete characteristics and continuous characteristics;
calculating the similarity of the discrete features using the Hamming distance $N_{\mathrm{unequal}}(x, y)$ and the similarity of the continuous features using the Manhattan distance $\mathrm{sum}(|x - y|)$, and then calculating the user similarity according to the formula $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$; wherein KL is a discrete feature weight vector, and KX is a continuous feature weight vector;
constructing an objective function $P = F(KL, KX)$; wherein P is the recommendation accuracy;
performing iterative computation using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function $P = F(KL, KX)$, such that the value of P is maximized;
and substituting the optimal solutions KL and KX into $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$, thereby constructing the similar user determination model.
Optionally, the constructing an objective function $P = F(KL, KX)$ includes:
initializing KL and KX, and determining similar users of the historical users according to the user similarity formula $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$;
determining recommended insurance types for the historical users according to the insurance types selected by the similar users of the historical users;
counting the number of times the insurance types recommended to a historical user hit the insurance type that the historical user actually selected, as the number of hits;
and obtaining the recommendation accuracy P from the number of hits and the number of recommendations.
Optionally, the performing iterative computation using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function $P = F(KL, KX)$, such that the value of P is maximized, includes:
initializing the weight vector (KL, KX) to K0;
substituting K0 into the objective function $P = F(KL, KX)$ to obtain a P value P0, and denoting the first obtained vector as (K0, P0);
feeding the first obtained vector {(K0, P0)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K1; substituting K1 into the objective function $P = F(KL, KX)$ to obtain a P value P1, and denoting the second obtained vector as (K1, P1);
feeding the first and second obtained vectors {(K0, P0), (K1, P1)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K2; substituting K2 into the objective function $P = F(KL, KX)$ to obtain a P value P2, and denoting the third obtained vector as (K2, P2);
and repeating the iteration a preset number of times, and taking the KL and KX corresponding to the maximum P value as the optimal solution of the objective function $P = F(KL, KX)$.
Optionally, the determining recommended insurance types for the target user according to the insurance types selected by the similar users includes:
sorting the insurance types selected by the similar users by selection frequency from high to low, and taking a preset number of top-ranked insurance types as the recommended insurance types.
Based on the same inventive concept, one or more embodiments of the present specification provide an insurance type recommendation device in a user cold start scenario, including:
a registration information acquisition module, configured to acquire the registration information of the target user;
a user set determination module, configured to determine, according to the registration information, a user set corresponding to the target user through a pre-constructed user set determination model;
a similar user determination module, configured to determine, in the user set, similar users of the target user through a pre-constructed similar user determination model;
and a recommended insurance type determination module, configured to determine the recommended insurance types of the target user according to the insurance types selected by the similar users.
Based on the same inventive concept, one or more embodiments of the present specification provide an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
Based on the same inventive concept, one or more embodiments of the present specification provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described method.
As can be seen from the above description, one or more embodiments of the present specification provide an insurance type recommendation method in a user cold start scenario, including: acquiring registration information of a target user; determining, according to the registration information, a user set corresponding to the target user through a pre-constructed user set determination model; determining, within the user set, similar users of the target user through a pre-constructed similar user determination model; and determining recommended insurance types for the target user according to the insurance types selected by the similar users. By first determining the user set and then determining the similar users within it, the method improves the accuracy and coverage of insurance type recommendation and reduces resource consumption.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a flowchart illustrating an insurance type recommendation method in a user cold start scenario according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic flow diagram of a similar user determination model construction method provided in one or more embodiments of the present disclosure;
FIG. 3 is a flow diagram of a method for constructing an objective function according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic flow diagram of a method for solving an optimal solution according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram illustrating an architecture of an insurance type recommendation device in a user cold start scenario according to one or more embodiments of the present disclosure;
FIG. 6 is a schematic diagram of a more specific hardware structure of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
A product recommendation system receives a large number of new users every day. If the system can recommend products that these new users like, it gains the trust of more users, increases user loyalty, and lets users obtain high-quality personalized service at any time.
Conventional product recommendation systems are based on collaborative filtering. The cold start problem is a classic, widely studied problem in collaborative filtering recommendation algorithms; it has long hindered the development of conventional collaborative filtering systems and seriously degrades recommendation quality. Recommendation in a user cold start scenario generally means recommendation aimed at a new user. Because the company holds little feature information about a new user and no purchase history, targeted recommendation is difficult with existing recommendation algorithms.
An existing approach to the cold start problem works as follows: from the user registration information database, an item frequency table is computed periodically for each feature; for a new user, the TOP-N list of most frequent items is retrieved for each feature (assuming M features); the resulting M x N item records are aggregated into t rows, each holding an item and its accumulated ratio; finally, the t rows are sorted by accumulated ratio in descending order and the first N items are recommended.
The inventor finds that this technical scheme has the following defects in practice:
Poor recommendation accuracy, so customer satisfaction cannot be improved: the traditional method cannot deliver individually tailored recommendations and can only compute frequency statistics over the feature groups that a customer's registration information falls into. Because it cannot distinguish the importance of feature weights, a single feature group may not be discriminative, so customers are very likely to receive uniform recommendations that are always biased toward popular items, and customer satisfaction cannot be improved.
Low coverage of recommended items, so the long-tail effect is not reduced: recommendations concentrated on popular items mean that the recommended items make up a small share of all items. The 80/20 pattern is not broken, that is, only a small number of items are recommended to the great majority of customers, and the long-tail effect remains pronounced.
High resource consumption: traditional user cold start handling requires frequent, periodic, large-scale frequency statistics (e.g., over billions of customers); in addition, feature aggregation and sorting must be performed multiple times for each customer on every request, which consumes considerable resources.
To solve the above problems, one or more embodiments of the present specification provide an insurance type recommendation method in a user cold start scenario. Fig. 1 is a flowchart of this method, which includes:
S110, acquiring the registration information of the target user.
The registration information is the information a user enters at registration. The user may be an individual or an enterprise. For an individual user, the registration information includes age, gender, region, occupation, income, intended premium, and the like; for an enterprise user, it includes enterprise size, enterprise nature, number of employees, occupation type, time of application, and the like. As one example, the obtained registration information of the target user includes unit nature (economic), unit nature (legal), unit industry category, occupation code, quarter of application, institution, number of employees, claims payout ratio over the last three years, intended per-capita premium, time interval from the current time, and the like.
And S120, determining a user set corresponding to the target user through a pre-constructed user set determination model according to the registration information.
The user set determination model is established as follows: constructing a sample set comprising a plurality of samples, wherein each sample comprises sample data and label data; the sample data comprises registration information and insurance type selection information of a historical user; the label data comprises the user set corresponding to the historical user;
and constructing and training the user set determination model from the sample set through a preset machine learning algorithm.
Wherein the predetermined machine learning algorithm may be selected from one or more of a naive bayes algorithm, a decision tree algorithm, a support vector machine algorithm, a kNN algorithm, a neural network algorithm, a deep learning algorithm, and a logistic regression algorithm.
The users can be divided into a plurality of user sets through the user set determination model, and the insurance requirements of the users in the same user set have relevance.
After the user set determination model is built, for a newly registered user, namely a target user, the registration information of the target user is input into the user set determination model, and a user set corresponding to the target user is output.
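As a minimal sketch of this step, the example below trains a small categorical naive Bayes classifier (one of the algorithm families listed above) that maps registration information to a user set. It is an illustration under stated assumptions, not the patented model: the class name, the feature names, the user set labels, and the choice to use only registration fields as inputs are all hypothetical.

```python
import math
from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    """Tiny categorical naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)            # feature -> values seen in training
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.value_counts[(label, feature)][value] += 1
                self.values[feature].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior of the user set
            for feature, value in row.items():
                seen = self.value_counts[(label, feature)][value]
                vocab = len(self.values[feature]) + 1  # +1 leaves room for unseen values
                score += math.log((seen + 1) / (count + vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical historical users: registration fields and their labelled user set.
rows = [
    {"nature": "state-owned", "industry": "mining"},
    {"nature": "state-owned", "industry": "construction"},
    {"nature": "private", "industry": "retail"},
    {"nature": "private", "industry": "software"},
]
labels = ["high-risk-work", "high-risk-work", "office-work", "office-work"]

model = CategoricalNaiveBayes().fit(rows, labels)
user_set = model.predict({"nature": "private", "industry": "retail"})
```

Any of the other listed algorithms (decision tree, support vector machine, and so on) could take the place of the classifier; only the input and output of the step are fixed by the description.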
Determining the user set corresponding to the target user serves as a preliminary match on the target user's registration information: historical user information with low relevance is filtered out, which makes the insurance type recommendation more targeted and the subsequent computation more efficient.
S130, determining similar users of the target user through a pre-constructed similar user determination model in the user set.
The inventor finds that existing recommendation methods base their recommendations on the data of all historical users. This raises the probability of recommending the insurance types chosen by the larger groups of similar users, that is, the popular insurance types, and the needs of smaller groups of similar users are correspondingly easy to overlook. For users, this lowers recommendation accuracy; for products, only a small portion of items is recommended to most users, so the long-tail effect is pronounced and product coverage is low.
The present method determines the similar users of the target user only after the target user's user set has been determined in the preceding step. The most direct result is that, because different user sets have different needs, the insurance types recommended to different user sets differ, which improves recommendation accuracy, increases product coverage, and effectively mitigates the long-tail effect.
The similar user determination model is constructed as follows: constructing a second sample set comprising a plurality of second samples, wherein each second sample comprises second sample data and second tag data; the second sample data comprises registration information of the historical user, insurance type selection information, and the user set corresponding to the historical user; the second tag data comprises similar users of the historical user;
and constructing and training the similar user determination model according to the second sample set through a preset second machine learning algorithm.
Wherein the predetermined machine learning algorithm may be selected from one or more of a naive bayes algorithm, a decision tree algorithm, a support vector machine algorithm, a kNN algorithm, a neural network algorithm, a deep learning algorithm, and a logistic regression algorithm.
After the similar user determination model is built, for a newly registered user, namely a target user, the registration information of the target user and the user set information corresponding to the target user are input into the similar user determination model, and the similar user corresponding to the target user is output.
S140, determining the recommended insurance types of the target user according to the insurance types selected by the similar users.
Following the idea of collaborative filtering, the more similar a user is to the target user, the more likely the target user is to select the insurance types that user selected. Therefore, all insurance types selected by the screened similar users are ranked from high to low by the frequency with which users in the user set selected them, and the top-ranked insurance types are taken as the recommended insurance types. As an example, the top three or top five insurance types may be taken as the recommended insurance types.
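The ranking in step S140 can be sketched in a few lines. This is a hedged illustration: the function name and sample data are hypothetical, and two details are assumptions not stated in the text, namely that each similar user counts at most once per insurance type and that ties are broken alphabetically.

```python
from collections import Counter

def recommend_top_types(similar_users_selections, top_n=3):
    """Rank insurance types by how many similar users selected them; return the top_n."""
    counts = Counter()
    for selections in similar_users_selections:
        counts.update(set(selections))  # assumption: one vote per user per type
    # Sort by descending frequency; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [insurance_type for insurance_type, _ in ranked[:top_n]]

# Hypothetical selections of four similar users.
selections = [
    ["accident", "health"],
    ["accident"],
    ["accident", "liability"],
    ["health"],
]
top_two = recommend_top_types(selections, top_n=2)
```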
Fig. 2 is a schematic flowchart of a similar user determination model construction method according to one or more embodiments of the present specification.
As an alternative embodiment, the similar user determination model provided by the present invention is constructed based on the KNN (K-Nearest Neighbor) algorithm, and the construction method includes steps S210 to S250 below.
The idea of the KNN algorithm is as follows: in feature space, if the majority of the k nearest (i.e., nearest neighbor in feature space) samples in the vicinity of a sample belong to a certain class, then the sample also belongs to this class. Through the similar user determination model, similar users of the target user can be determined. In one possible implementation, one hundred similar users are determined.
S210, dividing the characteristics of the registration information of the historical users in the user set into discrete characteristics and continuous characteristics.
The registration information includes several items. For individual users, it includes age, gender, region, occupation, income, intended premium, and the like; for enterprise users, it includes unit nature (economic), unit nature (legal), unit industry category, occupation code, quarter of application, institution, number of employees, claims payout ratio over the last three years, intended per-capita premium, time interval from the current time, and the like. The registration information is split according to the business meaning of each item; in general, continuous features are uncountable and discrete features are countable. For example, for an individual user, gender, region, and occupation are discrete features, while age, income, and intended premium are continuous features. For enterprise users, the discrete features include unit nature (economic), unit nature (legal), unit industry category, occupation code, quarter of application, institution, and the like; the continuous features include number of employees, claims payout ratio over the last three years, intended per-capita premium, time interval from the current time, and the like.
S220, calculating the similarity of the discrete features using the Hamming distance $N_{\mathrm{unequal}}(x, y)$ and the similarity of the continuous features using the Manhattan distance $\mathrm{sum}(|x - y|)$, wherein the formula for calculating the user similarity is $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$.
In the prior art, similarity between features is usually calculated with the Euclidean distance, which treats differences in the different attributes (that is, indexes or variable dimensions) of a sample as equally important. However, the inventor finds that in insurance type recommendation, different attributes of an individual differ in how much they help distinguish individuals. The present invention therefore calculates the similarity between features using Hamming and Manhattan distances with different weights.
The formula for calculating the user similarity is: $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$.
Wherein KL is a discrete feature weight vector, and KX is a continuous feature weight vector; x and y are the registration information features of the two users being compared.
For discrete features, the present invention calculates similarity using the Hamming distance $N_{\mathrm{unequal}}(x, y)$. For example, if x = (00, 02, 05, 50103, 4, 410023) and y = (00, 01, 05, 50102, 4, 410023), then $N_{\mathrm{unequal}}(x, y) = 2$. The Hamming distance is the number of positions at which the two feature vectors differ; in this example two elements differ (the second and the fourth), so the distance is 2.
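The weighted similarity of step S220 can be sketched as below. The text does not spell out how the weight vectors combine with the distances, so this example assumes a per-feature reading: each discrete mismatch contributes its weight from KL, and each absolute difference of a continuous feature is scaled by its weight from KX. Function names and the continuous sample values are hypothetical. The result behaves as a distance, so a smaller value means more similar users.

```python
def hamming_mismatches(x, y):
    """Return 1 for each position where the discrete features differ, else 0."""
    return [0 if a == b else 1 for a, b in zip(x, y)]

def weighted_distance(x_disc, y_disc, x_cont, y_cont, kl, kx):
    """Weighted Hamming part plus weighted Manhattan part (assumed per-feature weights)."""
    discrete = sum(w * m for w, m in zip(kl, hamming_mismatches(x_disc, y_disc)))
    continuous = sum(w * abs(a - b) for w, a, b in zip(kx, x_cont, y_cont))
    return discrete + continuous

# Discrete vectors from the example in the text.
x = ("00", "02", "05", "50103", "4", "410023")
y = ("00", "01", "05", "50102", "4", "410023")
n_unequal = sum(hamming_mismatches(x, y))  # unweighted Hamming distance

# With unit discrete weights and two hypothetical continuous features.
d = weighted_distance(x, y, (3.0, 1.0), (1.0, 2.0), kl=[1.0] * 6, kx=[0.5, 2.0])
```

In practice the continuous features would likely need to be scaled to comparable ranges before the Manhattan term is meaningful; the text does not say how this is done.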
S230, constructing an objective function $P = F(KL, KX)$; wherein P is the recommendation accuracy.
The objective function $P = F(KL, KX)$ is constructed in order to train the weights in the user similarity formula $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$. The weights that maximize the value of P are taken as the optimal solution.
S240, performing iterative computation using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function $P = F(KL, KX)$, such that the value of P is maximized.
The Tree-structured Parzen Estimator (TPE) algorithm builds a probabilistic model from past evaluation results and chooses the next set of hyper-parameters to evaluate in the objective function by maximizing the expected improvement.
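The outer loop of step S240 (evaluate a weight vector, record the pair (K, P), ask for the next K, keep the best) can be sketched as follows. Note that this sketch does not implement TPE: uniform random sampling is swapped in as a stand-in for the proposal step so that the example stays self-contained. The function names, the toy objective, and the sampling range are all assumptions; a real TPE implementation, such as those shipped with Hyperopt or Optuna, would replace `random_suggest`.

```python
import random

def search_weights(objective, dim, n_iter, suggest, seed=0):
    """Iterate: propose K from the history of (K, P) pairs, evaluate P, keep the best pair."""
    rng = random.Random(seed)
    k = [1.0] * dim                       # K0: initial weight vector (assumed all ones)
    history = [(k, objective(k))]         # {(K0, P0)}
    for _ in range(n_iter):
        k = suggest(history, dim, rng)    # next K proposed from all past (K, P) pairs
        history.append((k, objective(k)))
    return max(history, key=lambda pair: pair[1])

def random_suggest(history, dim, rng):
    """Stand-in proposer: ignores the history and samples uniformly. This is NOT TPE."""
    return [rng.uniform(0.0, 2.0) for _ in range(dim)]

def toy_objective(k):
    """Hypothetical objective with its maximum at K = (0.3, 1.7)."""
    return -sum((w - t) ** 2 for w, t in zip(k, (0.3, 1.7)))

best_k, best_p = search_weights(toy_objective, dim=2, n_iter=50, suggest=random_suggest)
```

The difference between this stand-in and TPE lies entirely in the proposer: TPE uses the accumulated history to concentrate new proposals where good values of P are more likely.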
S250, substituting the optimal solutions KL and KX into $N_{\mathrm{unequal}}(x, y) \cdot KL + \mathrm{sum}(|x - y|) \cdot KX$ to construct the similar user determination model.
After the similar user determination model is built, for a newly registered user, namely a target user, the registration information of the target user and the user set information corresponding to the target user are input into the similar user determination model, and the similar user corresponding to the target user is output.
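Once the weights are fixed, looking up similar users reduces to a nearest-neighbour query inside the target user's user set. The sketch below shows one way to do this with the standard library; the record layout (`disc` and `cont` tuples), the sample users, and the weight values are hypothetical, and the distance uses the same assumed per-feature weighting as the similarity formula.

```python
import heapq

def make_distance(kl, kx):
    """Build a distance function from discrete weights kl and continuous weights kx."""
    def distance(a, b):
        disc = sum(w for w, p, q in zip(kl, a["disc"], b["disc"]) if p != q)
        cont = sum(w * abs(p - q) for w, p, q in zip(kx, a["cont"], b["cont"]))
        return disc + cont
    return distance

def k_nearest_users(target, user_set, k, distance):
    """Return the k users in user_set with the smallest distance to the target user."""
    return heapq.nsmallest(k, user_set, key=lambda user: distance(target, user))

# Hypothetical historical users in the target user's user set.
user_set = [
    {"id": "u1", "disc": ("mining", "Q1"), "cont": (0.9,)},
    {"id": "u2", "disc": ("mining", "Q2"), "cont": (0.8,)},
    {"id": "u3", "disc": ("retail", "Q3"), "cont": (0.1,)},
]
target = {"id": "new", "disc": ("mining", "Q1"), "cont": (0.85,)}

distance = make_distance(kl=(1.0, 0.5), kx=(1.0,))
neighbours = k_nearest_users(target, user_set, k=2, distance=distance)
```

Restricting the search to one user set is what keeps the query cheap: the distance is computed only against users that the first model already judged relevant.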
Fig. 3 is a flowchart illustrating an objective function constructing method according to one or more embodiments of the present disclosure, where the objective function constructing method includes:
S310, initializing KL and KX, and determining similar users of the historical users according to the user-similarity formula N_unequal(x, y) × KL + sum(|x - y|) × KX.
The registration information and insurance selection information of all users in a user set are taken as the training set. For each user in the training set, the most similar users (for example, 100, 200, or 500 of them) are selected from the user set as similar users according to the similarity formula N_unequal(x, y) × KL + sum(|x - y|) × KX.
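Selecting the most similar users can be sketched as a nearest-neighbour lookup over the user set. The record layout (`id`, `discrete`, `continuous`), the helper names, and the sample users are hypothetical; the distance again assumes one weight per feature.

```python
import heapq


def weighted_distance(a, b, kl, kx):
    """Weighted Hamming part plus weighted Manhattan part (per-feature weights assumed)."""
    disc = sum(w for p, q, w in zip(a["discrete"], b["discrete"], kl) if p != q)
    cont = sum(w * abs(p - q) for p, q, w in zip(a["continuous"], b["continuous"], kx))
    return disc + cont


def most_similar_users(target, user_set, kl, kx, n):
    """Return the n users in user_set closest to target, excluding target itself."""
    candidates = [u for u in user_set if u["id"] != target["id"]]
    return heapq.nsmallest(
        n, candidates, key=lambda u: weighted_distance(target, u, kl, kx)
    )


# Hypothetical user set.
users = [
    {"id": "u1", "discrete": ("A", "X"), "continuous": (30.0,)},
    {"id": "u2", "discrete": ("A", "X"), "continuous": (31.0,)},
    {"id": "u3", "discrete": ("B", "X"), "continuous": (30.0,)},
    {"id": "u4", "discrete": ("B", "Y"), "continuous": (50.0,)},
]
similar = most_similar_users(users[0], users, kl=(1.0, 1.0), kx=(0.1,), n=2)
print([u["id"] for u in similar])  # ['u2', 'u3']
```

In practice n would be on the order of the 100, 200, or 500 users mentioned above; `heapq.nsmallest` avoids sorting the whole user set when n is small relative to it.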
S320, determining the recommended insurance types of the historical users according to the insurance types selected by the similar users of the historical users.
The insurance types selected by the similar users are the candidate insurance types. The candidates are sorted by selection frequency in descending order, and the most frequently selected ones (for example, the top 1, 3, or 5) are taken as the recommended insurance types.
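The frequency ranking described here maps directly onto a counter. The sketch below is illustrative only; the function name and the sample selections are hypothetical.

```python
from collections import Counter


def recommend_by_frequency(similar_user_choices, top_n):
    """Rank the types chosen by similar users by frequency and keep the top_n."""
    counts = Counter(t for choices in similar_user_choices for t in choices)
    return [t for t, _ in counts.most_common(top_n)]


# Hypothetical selections made by four similar users.
choices = [["life", "health"], ["health"], ["accident", "health"], ["life"]]
print(recommend_by_frequency(choices, top_n=2))  # ['health', 'life']
```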
S330, counting the number of times the recommended insurance types of a historical user hit the insurance types actually selected by that historical user, as the number of hits.
Each training set user has actually selected insurance types in the historical record, namely the insurance selection information. The actually selected insurance types are compared with the recommended insurance types, and if they are the same, the recommendation is considered to hit the selected insurance type.
As an alternative embodiment, the selected insurance types can be preset in order to steer the direction of the recommendations.
S340, obtaining the recommendation accuracy P from the number of hits and the number of recommendations.
After training is finished, the recommendation accuracy is the ratio of the total number of hits to the total number of tests.
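The accuracy computation can be sketched as follows. The text does not say whether a hit requires all recommended types to match or only one; this sketch assumes a test counts as a hit when at least one recommended type appears among the types the user actually selected. The function name and the sample data are hypothetical.

```python
def recommendation_accuracy(recommended, selected):
    """Return hits / tests, where each user with recommendations is one test.

    Assumption: a test is a hit if any recommended type was actually selected.
    """
    tests = len(recommended)
    if tests == 0:
        return 0.0
    hits = sum(
        1
        for user_id, recs in recommended.items()
        if set(recs) & set(selected.get(user_id, ()))
    )
    return hits / tests


# Hypothetical recommendations and historical selections.
recommended = {
    "u1": ["health", "life"],
    "u2": ["accident"],
    "u3": ["life"],
    "u4": ["health"],
}
selected = {
    "u1": {"life"},
    "u2": {"health"},
    "u3": {"life", "accident"},
    "u4": {"health"},
}
print(recommendation_accuracy(recommended, selected))  # 0.75
```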
Fig. 4 is a flowchart illustrating an optimal solution solving method according to one or more embodiments of the present disclosure, where the optimal solution solving method includes:
S410, initializing the weight vector (KL, KX) to K0.
S420, substituting K0 into the objective function P = F(KL, KX) to obtain a P value P0, and denoting the first obtained vector by (K0, P0).
S430, substituting the first obtained vector {(K0, P0)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K1; substituting K1 into the objective function P = F(KL, KX) to obtain a P value P1, and denoting the second obtained vector by (K1, P1).
S440, substituting the first and second obtained vectors {(K0, P0), (K1, P1)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K2; substituting K2 into the objective function P = F(KL, KX) to obtain a P value P2, and denoting the third obtained vector by (K2, P2).
S450, performing the iterative calculation a preset number of times, and taking the KL and KX corresponding to the maximum P value as the optimal solution of the objective function P = F(KL, KX).
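The loop in S410 to S450 can be sketched in pure Python with a deliberately simplified TPE. This is not the patent's implementation and not a full TPE: it assumes independent one-dimensional Gaussian Parzen estimators with a fixed bandwidth, uniform random sampling while fewer than `n_startup` observations exist, and a toy objective standing in for P = F(KL, KX). All names and parameter values are hypothetical. The suggestion step picks the candidate that maximizes the density ratio l(x)/g(x) between the "good" and "bad" observations, which is the quantity TPE uses in place of expected improvement.

```python
import math
import random


def parzen_density(value, centers, bandwidth):
    """Unnormalised Gaussian kernel (Parzen window) density estimate at `value`."""
    total = sum(math.exp(-0.5 * ((value - c) / bandwidth) ** 2) for c in centers)
    return total / len(centers) + 1e-12  # small floor avoids division by zero


def tpe_suggest(history, dim, low, high, rng, gamma=0.25, n_candidates=24, n_startup=5):
    """Propose the next weight vector from the (K, P) pairs seen so far."""
    if len(history) < max(2, n_startup):
        # Too few observations to model: fall back to uniform random sampling.
        return [rng.uniform(low, high) for _ in range(dim)]
    ranked = sorted(history, key=lambda kp: kp[1], reverse=True)  # best P first
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good, bad = ranked[:n_good], ranked[n_good:]
    bandwidth = 0.2 * (high - low)
    suggestion = []
    for d in range(dim):
        good_d = [k[d] for k, _ in good]
        bad_d = [k[d] for k, _ in bad]
        # Draw candidates around the good points, clipped to the search range.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good_d), bandwidth)))
            for _ in range(n_candidates)
        ]
        # Keep the candidate with the highest density ratio l(x) / g(x).
        suggestion.append(max(
            candidates,
            key=lambda c: parzen_density(c, good_d, bandwidth)
            / parzen_density(c, bad_d, bandwidth),
        ))
    return suggestion


def optimise(objective, k0, low, high, n_iter, seed=0):
    """Evaluate K0, then n_iter - 1 suggested vectors; return the best (K, P)."""
    rng = random.Random(seed)
    history = [(list(k0), objective(k0))]
    for _ in range(n_iter - 1):
        k = tpe_suggest(history, len(k0), low, high, rng)
        history.append((k, objective(k)))
    return max(history, key=lambda kp: kp[1])


def toy_objective(k):
    """Toy stand-in for P = F(KL, KX); peaks at 1.0 when k equals `target`."""
    target = (0.6, 1.2, 0.8, 0.3)
    return 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(k, target)))


k0 = (1.0, 1.0, 1.0, 1.0)
best_k, best_p = optimise(toy_objective, k0, low=0.0, high=2.0, n_iter=60)
print(best_k, best_p)
```

Because the history always contains (K0, P0) and the best pair is returned at the end, the result can never be worse than the initial weights, which matches the "take the maximum P value" rule in S450.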
As an example, the first time: the weight vector (KL, KX) is initialized to K0. Assuming K0 is (1, 1, 1, 1, 1, 1, 1, 1, 1, 1), the objective function P = F(KL, KX) gives a P0 value of 0.72, and (K0, P0) is taken as the first obtained vector.
The second time: the first obtained vector {(K0, P0)} is substituted into the TPE algorithm, giving a K1 value of (1, 0.1, 0.3, 0.7, 0.1, 0.4, 0.6, 0.7, 0.3, 0.4); the objective function P = F(KL, KX) gives a P1 value of 0.65, and the vectors obtained so far are {(K0, P0), (K1, P1)}.
The third time: the first and second obtained vectors {(K0, P0), (K1, P1)} are substituted into the TPE algorithm, giving a K2 value of (0.75, 0.9, 0.7, 0.66, 0.32, 0.43, 0.62, 0.63, 0.23, 0.13); the objective function P = F(KL, KX) gives a P2 value of 0.78, and the vectors obtained so far are {(K0, P0), (K1, P1), (K2, P2)}.
The fourth time: the first, second, and third obtained vectors {(K0, P0), (K1, P1), (K2, P2)} are substituted into the TPE algorithm, giving a K3 value of (0.6, 1.2, 0.77, 0.63, 0.31, 0.49, 0.34, 0.67, 0.98, 0.34); the objective function P = F(KL, KX) gives a P3 value of 0.85, and the vectors obtained so far are {(K0, P0), (K1, P1), (K2, P2), (K3, P3)}.
……
The ten-thousandth time: the previously obtained vectors {(K0, P0), (K1, P1), (K2, P2), …, (K9999, P9999)} are substituted into the TPE algorithm, giving a K10000 value of (0.21, 1.21, 0.88, 1.33, 2.31, 9.23, 3.31, 6.71, 3.89, 4.41); the objective function gives a P10000 value of 0.83.
The K value corresponding to the maximum P value over the ten thousand iterations is taken as the optimal solution. Assuming that the maximum P value occurs the fourth time, that is, P3 = 0.85, then (0.6, 1.2, 0.77, 0.63, 0.31, 0.49, 0.34, 0.67, 0.98, 0.34) is taken as the optimal solution for K, that is, KL = (0.6, 1.2, 0.77, 0.63, 0.31) and KX = (0.49, 0.34, 0.67, 0.98, 0.34).
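The final selection step can be reproduced with the four (K, P) pairs spelled out in the example. The iterations elided in the text are not included here, so this sketch only shows the mechanics of taking the maximum P and splitting K into KL (first five weights) and KX (last five weights), as the example does.

```python
# The four (K, P) pairs listed explicitly in the example above.
history = [
    ((1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 0.72),
    ((1, 0.1, 0.3, 0.7, 0.1, 0.4, 0.6, 0.7, 0.3, 0.4), 0.65),
    ((0.75, 0.9, 0.7, 0.66, 0.32, 0.43, 0.62, 0.63, 0.23, 0.13), 0.78),
    ((0.6, 1.2, 0.77, 0.63, 0.31, 0.49, 0.34, 0.67, 0.98, 0.34), 0.85),
]

n_discrete = 5  # the example assigns the first five weights to KL

# Keep the weight vector whose recommendation accuracy P is largest.
best_k, best_p = max(history, key=lambda kp: kp[1])
kl, kx = best_k[:n_discrete], best_k[n_discrete:]

print(best_p)  # 0.85
print(kl)      # (0.6, 1.2, 0.77, 0.63, 0.31)
print(kx)      # (0.49, 0.34, 0.67, 0.98, 0.34)
```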
As can be seen from the above description, one or more embodiments of the present specification provide an insurance type recommendation method for a user cold start scenario, including: acquiring registration information of a target user; determining a user set corresponding to the target user through a pre-constructed user set determination model according to the registration information; in the user set, determining similar users of the target user through a pre-constructed similar user determination model; and determining the recommended insurance types of the target user according to the frequency of the insurance types selected by the similar users. By first determining the user set and then determining the similar users within that user set, the method improves the accuracy and coverage of insurance type recommendation and reduces resource consumption.
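The four steps summarised above can be strung together as a small pipeline. In this sketch the two pre-constructed models are passed in as plain callables; the stand-ins used below (grouping by city, ranking by age difference) are hypothetical placeholders chosen only to make the example runnable, not the models described in this specification.

```python
from collections import Counter


def recommend_for_new_user(registration, assign_user_set, find_similar_users, top_n=3):
    """Cold-start flow: user set -> similar users -> most frequently selected types."""
    user_set = assign_user_set(registration)              # user set determination model
    similar = find_similar_users(registration, user_set)  # similar user determination model
    counts = Counter(t for user in similar for t in user["selected"])
    return [t for t, _ in counts.most_common(top_n)]


# Hypothetical historical users.
USERS = [
    {"id": "u1", "city": "A", "age": 30, "selected": ["health", "life"]},
    {"id": "u2", "city": "A", "age": 32, "selected": ["health"]},
    {"id": "u3", "city": "A", "age": 60, "selected": ["annuity"]},
    {"id": "u4", "city": "B", "age": 31, "selected": ["accident"]},
]


def assign_user_set(registration):
    """Placeholder model: users registered in the same city."""
    return [u for u in USERS if u["city"] == registration["city"]]


def find_similar_users(registration, user_set):
    """Placeholder model: the two users closest in age."""
    return sorted(user_set, key=lambda u: abs(u["age"] - registration["age"]))[:2]


new_user = {"city": "A", "age": 31}
result = recommend_for_new_user(new_user, assign_user_set, find_similar_users, top_n=2)
print(result)  # ['health', 'life']
```

Narrowing to a user set first means the similarity search only runs over that subset rather than over all historical users, which is where the reduced resource consumption comes from.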
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, one or more embodiments of the present specification further provide an insurance type recommendation apparatus for a user cold start scenario.
Referring to fig. 5, the insurance type recommendation apparatus for a user cold start scenario includes:
a registration information obtaining module 510, configured to obtain registration information of a target user;
a user set determining module 520, configured to determine, according to the registration information, a user set corresponding to the target user through a pre-constructed user set determination model;
a similar user determination module 530, configured to determine, in the user set, similar users of the target user through a pre-constructed similar user determination model;
and a recommended insurance type determining module 540, configured to determine the recommended insurance types of the target user according to the insurance types selected by the similar users.
Optionally, the similar user determining module 530 is specifically configured to:
dividing the registration information characteristics of the historical users in the user set into discrete characteristics and continuous characteristics;
calculating the similarity of the discrete features by using the Hamming distance N_unequal(x, y) and the similarity of the continuous features by using the Manhattan distance sum(|x - y|), and then calculating the user similarity according to the following formula: N_unequal(x, y) × KL + sum(|x - y|) × KX; wherein KL is a discrete feature weight vector, and KX is a continuous feature weight vector;
constructing an objective function P = F(KL, KX); wherein P is the recommendation accuracy;
performing iterative computation by using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function P = F(KL, KX), so that the value of P is maximum;
and substituting the optimal solutions KL and KX into N_unequal(x, y) × KL + sum(|x - y|) × KX to construct the similar user determination model.
For convenience of description, the above apparatus is described as being divided into modules by function, each described separately. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
The apparatus of the above embodiment is used to implement the corresponding insurance type recommendation method for a user cold start scenario in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, it implements the insurance type recommendation method for a user cold start scenario described in any of the above embodiments.
Fig. 6 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The input/output module may be configured as a component in the device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the above embodiment is used to implement the corresponding insurance type recommendation method for a user cold start scenario in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Based on the same inventive concept, and corresponding to the method of any of the above embodiments, one or more embodiments of the present specification further provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the insurance type recommendation method for a user cold start scenario according to any of the above embodiments.
Computer-readable media of the present embodiments include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to cause the computer to execute the insurance type recommendation method for a user cold start scenario described in any of the above embodiments, and have the beneficial effects of the corresponding method embodiments, which are not repeated here.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An insurance type recommendation method in a user cold start scenario, characterized by comprising:
acquiring registration information of a target user;
determining a user set corresponding to the target user through a pre-constructed user set determination model according to the registration information;
in the user set, determining similar users of the target user through a pre-constructed similar user determination model;
and determining the recommended insurance types of the target user according to the insurance types selected by the similar users.
2. The method of claim 1, further comprising:
constructing a first sample set comprising a number of first samples; wherein each first sample comprises first sample data and first label data; the first sample data comprises registration information and insurance type selection information of a historical user; the first label data comprises a user set corresponding to the historical user;
and constructing and training the user set determination model according to the first sample set through a preset first machine learning algorithm.
3. The method of claim 2, further comprising:
constructing a second sample set comprising a number of second samples; wherein each second sample comprises second sample data and second label data; the second sample data comprises registration information of the historical user, insurance type selection information, and a user set corresponding to the historical user; the second label data comprises similar users of the historical user;
and constructing and training the similar user determination model according to the second sample set through a preset second machine learning algorithm.
4. The method of claim 3, wherein constructing and training the similar user determination model according to the second sample set through a preset second machine learning algorithm comprises:
dividing the characteristics of the registration information of the historical users in the user set into discrete characteristics and continuous characteristics;
calculating the similarity of the discrete features by using the Hamming distance N_unequal(x, y) and the similarity of the continuous features by using the Manhattan distance sum(|x - y|), and then calculating the user similarity according to the following formula: N_unequal(x, y) × KL + sum(|x - y|) × KX; wherein KL is a discrete feature weight vector, and KX is a continuous feature weight vector;
constructing an objective function P = F(KL, KX); wherein P is the recommendation accuracy;
performing iterative computation by using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function P = F(KL, KX), so that the value of P is maximum;
and substituting the optimal solutions KL and KX into N_unequal(x, y) × KL + sum(|x - y|) × KX to construct the similar user determination model.
5. The method according to claim 4, wherein constructing an objective function P = F(KL, KX) comprises:
initializing KL and KX, and determining similar users of the historical users according to the user-similarity formula N_unequal(x, y) × KL + sum(|x - y|) × KX;
determining recommended insurance types of the historical users according to the insurance types selected by the similar users of the historical users;
counting the number of times the recommended insurance types of a historical user hit the insurance types selected by that historical user, as the number of hits;
and obtaining the recommendation accuracy P from the number of hits and the number of recommendations.
6. The method of claim 4, wherein performing iterative computation by using a Tree-structured Parzen Estimator algorithm to obtain the optimal solutions KL and KX of the objective function P = F(KL, KX), so that the value of P is maximum, comprises:
initializing the weight vector (KL, KX) to K0;
substituting K0 into the objective function P = F(KL, KX) to obtain a P value P0, and denoting the first obtained vector by (K0, P0);
substituting the first obtained vector {(K0, P0)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K1; substituting K1 into the objective function P = F(KL, KX) to obtain a P value P1, and denoting the second obtained vector by (K1, P1);
substituting the first and second obtained vectors {(K0, P0), (K1, P1)} into the Tree-structured Parzen Estimator algorithm to obtain a weight vector K2; substituting K2 into the objective function P = F(KL, KX) to obtain a P value P2, and denoting the third obtained vector by (K2, P2);
and performing the iterative calculation a preset number of times, and taking the KL and KX corresponding to the maximum P value as the optimal solution of the objective function P = F(KL, KX).
7. The method of claim 1, wherein determining the recommended insurance types of the target user according to the insurance types selected by the similar users comprises:
sorting the insurance types selected by the similar users by selection frequency from high to low, and taking a preset number of top-ranked insurance types as the recommended insurance types.
8. An insurance type recommendation apparatus in a user cold start scenario, comprising:
a registration information acquisition module, configured to acquire registration information of a target user;
a user set determining module, configured to determine, according to the registration information, a user set corresponding to the target user through a pre-constructed user set determination model;
a similar user determination module, configured to determine, in the user set, similar users of the target user through a pre-constructed similar user determination model;
and a recommended insurance type determining module, configured to determine the recommended insurance types of the target user according to the insurance types selected by the similar users.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the program.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202011388337.6A 2020-12-01 2020-12-01 Dangerous seed recommendation method and related equipment in user cold start scene Pending CN112488863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011388337.6A CN112488863A (en) 2020-12-01 2020-12-01 Dangerous seed recommendation method and related equipment in user cold start scene

Publications (1)

Publication Number Publication Date
CN112488863A true CN112488863A (en) 2021-03-12

Family

ID=74938831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011388337.6A Pending CN112488863A (en) 2020-12-01 2020-12-01 Dangerous seed recommendation method and related equipment in user cold start scene

Country Status (1)

Country Link
CN (1) CN112488863A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030177110A1 (en) * 2002-03-15 2003-09-18 Fujitsu Limited Profile information recommendation method, program and apparatus
CN108510402A (en) * 2018-06-06 2018-09-07 中国平安人寿保险股份有限公司 Insurance kind information recommendation method, device, computer equipment and storage medium
CN109858999A (en) * 2018-12-29 2019-06-07 重庆金链科技有限公司 A kind of insurance products recommended method, device, equipment and readable storage medium storing program for executing
CN110046965A (en) * 2019-04-18 2019-07-23 北京百度网讯科技有限公司 Information recommendation method, device, equipment and medium
CN111476595A (en) * 2020-03-06 2020-07-31 深圳壹账通智能科技有限公司 Product pushing method and device, computer equipment and storage medium
CN111506724A (en) * 2020-07-02 2020-08-07 北京梦天门科技股份有限公司 Standard phrase recommendation method and device
CN111739657A (en) * 2020-07-20 2020-10-02 北京梦天门科技股份有限公司 Epidemic infected person prediction method and system based on knowledge graph
CN111861760A (en) * 2020-06-22 2020-10-30 中国平安财产保险股份有限公司 Product recommendation method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘鑫 等: "改进的协同过滤算法在资源推荐系统中的应用研究", 科技传播, pages 155 - 156 *
李超群;: "基于贝叶斯网络对全国PM_(2.5)浓度影响因素分析", 科技创新与应用, no. 22, pages 1 - 5 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127749A (en) * 2021-05-18 2021-07-16 北京大米科技有限公司 Content recommendation method and device, readable storage medium and electronic equipment
CN113592605A (en) * 2021-08-10 2021-11-02 平安银行股份有限公司 Product recommendation method, device, equipment and storage medium based on similar products
CN113592605B (en) * 2021-08-10 2023-08-22 平安银行股份有限公司 Product recommendation method, device, equipment and storage medium based on similar products
CN113761370A (en) * 2021-09-08 2021-12-07 广东百家投资咨询有限公司 Insurance product recommendation method, system, device and storage medium
CN113761370B (en) * 2021-09-08 2024-02-13 广东百家投资咨询有限公司 Insurance product recommendation method, system, device and storage medium
US11869015B1 (en) 2022-12-09 2024-01-09 Northern Trust Corporation Computing technologies for benchmarking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination