CN109710890A - Method and system for identifying false material in real time based on constructed behavior portrait model - Google Patents

Method and system for identifying false material in real time based on constructed behavior portrait model

Info

Publication number
CN109710890A
Authority
CN
China
Prior art keywords
user
model
portrait
false
building
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811566426.8A
Other languages
Chinese (zh)
Other versions
CN109710890B (en)
Inventor
王萍
贾坤
陈少磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan XW Bank Co Ltd
Original Assignee
Sichuan XW Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan XW Bank Co Ltd
Priority to CN201811566426.8A
Publication of CN109710890A
Application granted
Publication of CN109710890B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for identifying false material in real time based on a constructed behavior portrait model, mainly applied to real-time auditing of the authenticity of user-submitted material in the internet industry. The method addresses the problems of the prior art, which relies on single-point judgment and suffers from a high false-hit rate, low precision and very high cost. The invention comprises building an identification model, auditing user material with the identification model, interacting with the user in real time on a mobile terminal or PC, and monitoring in real time the authenticity of the material the user submits. The invention is used to identify online, in real time, whether the material a user submits is genuine.

Description

Method and system for identifying false material in real time based on constructed behavior portrait model
Technical field
A method and system for identifying false material in real time based on a constructed behavior portrait model, used to accurately identify the authenticity of material filled in by users. It belongs to the technical field of internet anti-fraud and can be extended to any online scenario that requires verifying the authenticity of data filled in by users.
Background
A user behavior portrait is a set of labels describing a user's behavioral characteristics, built from the user's clickstream data.
False data: in internet business scenarios, users are required to fill in a series of personal details, including home address, contact's mobile phone number, contact's name, employer, employer address and so on. False data means that the data filled in by the user is untrue.
Industry surveys of internet businesses that need to verify user information show that the technical means of the black-market fraud industry are constantly being upgraded. To make unqualified material meet the requirements, all kinds of packaged false data keep emerging, and identifying the authenticity of user data effectively has become a focus of risk prevention and control in the internet industry. The prior art generally identifies the authenticity of data by manual telephone verification, cross-validation and similar methods.
The first method is manual telephone verification: a customer service agent calls the user, reviews the data the user filled in, and judges the authenticity of the information from the call. Its main drawbacks are: 1. high capital and labor costs; 2. long review time, which lengthens the whole application process; 3. very poor user experience; 4. the authenticity of the data is hard to establish from a telephone conversation.
The second method is cross-validation: external third-party data is queried and compared with the data the user filled in. Its main drawbacks are: 1. external data has to be purchased, which inevitably increases capital cost; 2. external data suffers from a limited query hit rate and point-in-time errors, which raises the false rejection rate; 3. for text information, fully accurate fuzzy matching is not currently achievable. For example, if the user fills in the company as "Chengdu Huawei" and the external data query returns "Huawei Technologies Co., Ltd. Chengdu Research Institute", the system treats the two as inconsistent and misidentifies the record.
In summary, the main problems of existing methods for verifying application material are: 1. very low accuracy; 2. high cost; 3. inability to judge the authenticity of data online in real time.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a method and system for identifying false material in real time based on a constructed behavior portrait model, which solves the problems of the prior art: single-point judgment, a high false-hit rate, low precision and very high cost. The invention comprises building an identification model, auditing user material with the identification model, interacting with the user in real time on a mobile terminal or PC, and monitoring in real time the authenticity of the material the user submits.
To achieve the above purpose, the present invention adopts the following technical solution:
A method for building a behavior portrait model that identifies false material, characterized by the following steps:
S1, obtaining online, in real time, user behavior portraits for training, as feature set V1;
S2, ranking the features of feature set V1 by importance in real time using a random forest, rejecting the features that are never split on, performing feature processing, and building the user behavior portrait after feature rejection, as feature set V2;
S3, performing secondary modeling on feature set V2 to obtain feature set V3;
S4, building a logistic regression model on feature set V3 to obtain the final identification model.
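The rejection rule in step S2 (drop every feature that no tree in the forest ever splits on) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the nested-dict tree layout, the feature names and the use of raw split counts as the importance ranking are all assumptions made for the example. With a trained scikit-learn `RandomForestClassifier`, a comparable signal would be a zero entry in `feature_importances_`.

```python
from collections import Counter

def split_counts(tree, counts=None):
    """Count how often each feature is used as a split in one tree.

    Assumed layout: internal nodes are dicts with 'feature', 'left', 'right';
    leaves are dicts with only 'value'.
    """
    if counts is None:
        counts = Counter()
    if "feature" in tree:  # internal node
        counts[tree["feature"]] += 1
        split_counts(tree["left"], counts)
        split_counts(tree["right"], counts)
    return counts

def select_features(forest, v1_features):
    """Rank V1 features by split count and drop those never split on."""
    total = Counter()
    for tree in forest:
        total.update(split_counts(tree))
    ranked = sorted(v1_features, key=lambda f: total[f], reverse=True)
    return [f for f in ranked if total[f] > 0]

# Toy forest with hypothetical feature names.
forest = [
    {"feature": "typing_speed",
     "left": {"value": 0},
     "right": {"feature": "login_count",
               "left": {"value": 0}, "right": {"value": 1}}},
    {"feature": "typing_speed", "left": {"value": 0}, "right": {"value": 1}},
]
v1 = ["login_count", "typing_speed", "screen_brightness"]
v2 = select_features(forest, v1)  # "screen_brightness" is never split on
```

Split counts are used here only because they make the "never split on" rule explicit; an impurity-based importance would give the same set of rejected features, since a feature with no splits contributes no impurity reduction.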
Further, the specific steps of step S1 are:
S1.1, obtaining the user's clickstream data by embedding an SDK in the page, such as;
S1.2, processing the clickstream data in a feature factory to generate user behavior portrait labels, all of which together constitute the user behavior portrait.
Further, in step S1.2, the system feature factory processes the clickstream data in real time to generate the user behavior portrait labels.
Further, the specific steps of step S3 are:
S3.1, performing random stratified sampling on feature set V2 to generate M1, M2, ..., Mn, i.e. divisions for n different types, each type being divided into a training set and a validation set;
S3.2, performing secondary modeling on the different types using the XGBoost method, building n different models;
S3.3, taking the outputs of the built models as new features and combining them with feature set V2 to build the new feature set V3.
Further, the specific steps of step S4 are:
S4.1, selecting the most significant feature in feature set V3 as an explanatory variable by ridge regression;
S4.2, performing maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeating steps S4.1 to S4.3; otherwise, taking all features retained by the likelihood estimation as the identification model.
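The loop in S4.1 to S4.3 reads like forward stepwise selection around a logistic regression fitted by maximum likelihood. The sketch below illustrates that reading under stated assumptions: it uses plain forward selection with a likelihood-ratio style threshold (a log-likelihood gain of 1.92, half the 5% critical value 3.84 of a chi-square with one degree of freedom) rather than ridge regression, and fits the model by simple gradient ascent. The function names, step count and learning rate are illustrative.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, cols, steps=500, lr=0.5):
    """Maximise the log-likelihood over the chosen columns by gradient ascent.

    Returns (weights, log_likelihood); weights[0] is the intercept.
    """
    w = [0.0] * (len(cols) + 1)
    n = len(y)

    def score(row):
        return w[0] + sum(wj * row[c] for wj, c in zip(w[1:], cols))

    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - _sigmoid(score(row))
            grad[0] += err
            for j, c in enumerate(cols, start=1):
                grad[j] += err * row[c]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]

    ll = 0.0
    for row, target in zip(X, y):
        z = score(row)
        # y*z - log(1 + e^z), written in an overflow-safe form
        ll += target * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return w, ll

def forward_stepwise(X, y, min_gain=1.92):
    """Add the feature with the largest log-likelihood gain until none is significant."""
    selected, remaining = [], list(range(len(X[0])))
    _, best_ll = fit_logistic(X, y, selected)
    while remaining:
        gains = {c: fit_logistic(X, y, selected + [c])[1] - best_ll
                 for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        best_ll += gains[best]
    return selected, fit_logistic(X, y, selected)[0]

# Toy data: column 0 tracks the label, column 1 is unrelated to it.
X = [[i / 39.0, float(i % 2)] for i in range(40)]
y = [1 if i >= 20 else 0 for i in range(40)]
y[18], y[22] = 1, 0  # two flips so the classes are not perfectly separable
selected, weights = forward_stepwise(X, y)
```

In practice a library solver would replace the hand-written gradient ascent; it is spelled out here only to keep the example self-contained.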
A method for identifying false material in real time based on the constructed behavior portrait model, characterized by the following steps:
Step 1, when the user fills in data, obtaining the user's clickstream data online in real time and processing it into the user behavior portrait labels to be identified;
Step 2, inputting the user behavior portrait labels to be identified into the identification model for judgment, and intercepting the material identified as false.
Further, the identification model in step 2 is implanted by embedding an SDK in the page.
A system for identifying false material in real time based on the constructed behavior portrait model, characterized by comprising a rule engine system, the rule engine system comprising:
System feature factory module: for obtaining the user's clickstream data online in real time when the user fills in data, and processing it into the user behavior portrait labels to be identified;
Identification module: for inputting the user behavior portrait labels to be identified into the identification model for judgment, and intercepting the material identified as false.
Further, the identification model in the identification module is implanted by embedding an SDK in the page.
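The two-module structure of the rule engine can be pictured with a small composition sketch. Everything below is a placeholder: the class name, the callable signatures and the toy rule are assumptions that only show how a feature factory and an identification model could be wired together.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Labels = Dict[str, float]

@dataclass
class RuleEngine:
    """Wires the two modules named in the text; both callables are placeholders."""
    feature_factory: Callable[[List[dict]], Labels]  # clickstream events -> labels
    identification_model: Callable[[Labels], bool]   # labels -> True if judged false

    def handle_submission(self, events: List[dict]) -> str:
        labels = self.feature_factory(events)
        return "intercept" if self.identification_model(labels) else "pass"

# Toy wiring: a submission with fewer than two events is treated as suspicious.
engine = RuleEngine(
    feature_factory=lambda events: {"event_count": float(len(events))},
    identification_model=lambda labels: labels["event_count"] < 2,
)
```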
Compared with the prior art, the beneficial effects of the present invention are:
1. The AUC of the identification model in the present invention is 0.9-0.98 and the Gini coefficient is 0.8-0.96; the AUC and Gini values show that the false-data model has very strong discriminating ability. The optimal model is selected and the optimal cut-off point is found by F1. When a false-data index >= 75 is treated as false, the identification method of the present invention has high precision and a very low false-hit rate: the false-hit rate is 0.7% and the precision is 92%. Compared with traditional identification methods the identification ability is very high, with precision improved by 80% over the conventional cross-validation method;
2. The identification method of the present invention can interact with the user in real time and prompt the user in real time to fill in correct application material, improving the usability of the material;
3. The identification method of the present invention greatly reduces capital investment and saves time. The traditional cross-validation method needs to query external data for comparison, whereas the online real-time judgment of this method avoids the problems of external data: a very low query hit rate, high query cost and long turnaround time. The authenticity of the material is identified automatically by the system;
4. The identification method of the present invention judges online in real time through the SDK and completely replaces manual verification, which not only makes the process intelligent but also genuinely saves labor cost;
5. The identification method of the present invention can cope with the continuous evolution of fraud techniques, and its adjustment speed can keep up with new fraud techniques in real time;
6. The present invention uses secondary modeling, i.e. a further model is built on top of the initial models, in order to better achieve accuracy and robustness on the data.
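The AUC and Gini ranges quoted in item 1 are consistent with the standard relation Gini = 2 * AUC - 1 (0.9 gives 0.8, 0.98 gives 0.96). The sketch below shows that relation and one simple way to pick a cut-off by F1; the scores, labels and candidate grid are toy values invented for the example, not results from the patent.

```python
def gini_from_auc(auc):
    """Standard relation between the Gini coefficient and AUC."""
    return 2 * auc - 1

def f1_at(scores, labels, cutoff):
    """F1 when every score >= cutoff is flagged as false material."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_cutoff(scores, labels, candidates=range(0, 101, 5)):
    """Return the candidate cut-off with the highest F1."""
    return max(candidates, key=lambda c: f1_at(scores, labels, c))

# Toy index values (0-100) with 1 = false material.
scores = [10, 30, 50, 70, 72, 78, 88, 95]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
cutoff = best_cutoff(scores, labels)
```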
Description of the drawings
None
Detailed description of the embodiments
Embodiment
This document proposes a false-data identification method based on user behavior portraits. Based on a large system of user behavior feature labels, the method applies secondary modeling and then a logistic regression based on stepwise regression to generate a scorecard for application data (i.e. the identification model), which outputs an application-data index and quantitatively judges the authenticity of the data the user fills in. It not only replaces manual data review but can also identify in real time the likelihood that a user has filled in false data.
The effects in a specific application scenario are as follows:
In the application step of an internet credit business, online applicants are assessed in real time, and the system can interact with the user in real time to remind the user to fill in correct information.
The specific implementation is as follows:
The identification model is built first. A method for building a behavior portrait model that identifies false material comprises the following steps:
S1, obtaining user behavior portraits for training, as feature set V1;
The specific steps are:
S1.1, obtaining the user's clickstream data by embedding an SDK in the page. The user behavior portrait depends on a large volume of user behavior data, so instrumentation points are needed on the mobile terminal: through the embedded SDK, the user's basic information (such as IP, GPS and device information) is obtained and all of the user's operations are recorded;
S1.2, processing the clickstream data to generate user behavior portrait labels, all of which together constitute the user behavior portrait. The features of the user behavior portrait are computed in real time by the system feature factory module, for example the anti-fraud feature factory module of the anti-fraud rule engine system used in the credit industry, whose strong data-processing and computing capacity generates the user behavior portrait labels in real time (at millisecond level).
The system feature factory module generates a series of specific user behavior portrait labels in real time, such as: 1. the user's time and place preferences, e.g. business-district preference (tourist area, shopping district, office district) and usage-time preference (working day or holiday; morning, noon or evening); 2. the user's operation frequency and period preferences, e.g. monthly login count and the interval between registration and application; 3. clickstream information, e.g. the user's typing speed and the time the user takes to enter the employer; 4. gyroscope information, e.g. the position in which the user holds the phone and frequently tapped locations; 5. association information across the user's dimensions, e.g. the number of mobile numbers associated with a device and the number of ID cards associated with a mobile number.
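A few of the labels listed above (usage time, typing speed, time spent entering the employer) can be derived from raw events roughly as follows. The event schema, field names and label names are hypothetical, and a weekend flag stands in for the working-day/holiday distinction, which would need a holiday calendar.

```python
from datetime import datetime

def build_labels(events):
    """Turn raw clickstream events into a few behavior-portrait labels.

    Assumed schema: each event is a dict with 'ts' (ISO timestamp), 'type',
    and for input events an optional 'field' and 'chars' count.
    """
    times = [datetime.fromisoformat(e["ts"]) for e in events]
    typing = [e for e in events if e["type"] == "input"]
    chars = sum(e.get("chars", 0) for e in typing)
    company = [datetime.fromisoformat(e["ts"])
               for e in typing if e.get("field") == "company"]
    duration = (max(times) - min(times)).total_seconds() if times else 0.0
    return {
        # weekend used as a simple stand-in for "working day vs holiday"
        "is_weekend": float(times[0].weekday() >= 5) if times else 0.0,
        "hour_of_day": float(times[0].hour) if times else 0.0,
        "typing_speed_cps": chars / duration if duration > 0 else 0.0,
        "company_input_seconds":
            (max(company) - min(company)).total_seconds() if company else 0.0,
    }

events = [
    {"ts": "2018-12-20T09:00:00", "type": "click"},
    {"ts": "2018-12-20T09:00:02", "type": "input", "field": "company", "chars": 4},
    {"ts": "2018-12-20T09:00:06", "type": "input", "field": "company", "chars": 6},
    {"ts": "2018-12-20T09:00:10", "type": "input", "field": "contact", "chars": 10},
]
labels = build_labels(events)
```

A production feature factory would compute many more labels (location, device associations, gyroscope readings) and would do so incrementally as events arrive; this sketch only shows the event-to-label shape of the computation.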
The resulting user behavior portrait labels are input into the identification module of the anti-fraud rule engine system, and steps S2 to S4 are carried out to build the identification model.
S2, ranking the features of feature set V1 by importance using a random forest, rejecting the features that are never split on, and building the user behavior portrait after feature rejection, as feature set V2;
S3, performing secondary modeling on feature set V2 to obtain feature set V3;
The specific steps are:
S3.1, performing random stratified sampling on feature set V2 to generate M1, M2, ..., Mn, i.e. divisions for n different types, each type being divided into a training set and a validation set. The different types are different types of false material, for example three models: a false-contact model, a false-employer-name model and a false-office-address model;
S3.2, performing secondary modeling on the different types using the XGBoost method, building n different models;
S3.3, taking the outputs of the built models as new features and combining them with feature set V2 to build the new feature set V3.
S4, building a logistic regression model on feature set V3 to obtain the final identification model, i.e. a scorecard for false data and a false-data index (Index, 0-100). The specific steps are:
S4.1, selecting the most significant feature in feature set V3 as an explanatory variable by ridge regression;
S4.2, performing maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeating steps S4.1 to S4.3; otherwise, taking all features retained by the likelihood estimation as the identification model.
In the embodiment, a score of 75 or above may be defined as high risk of false data.
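The text does not state how the logistic regression output is turned into the 0-100 index. The sketch below assumes the simplest possible mapping, a linear scaling of the predicted probability, and applies the 75-point threshold named in the embodiment; real scorecards often use a log-odds based scaling instead.

```python
HIGH_RISK_INDEX = 75  # threshold named in the embodiment

def false_material_index(probability):
    """Map a model probability in [0, 1] to an index in 0-100 (assumed linear)."""
    clipped = min(max(probability, 0.0), 1.0)
    return int(round(clipped * 100))

def is_high_risk(probability):
    """True when the index reaches the high-risk threshold."""
    return false_material_index(probability) >= HIGH_RISK_INDEX
```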
Before false material is actually identified, the identification model for identifying false material is first implanted by embedding an SDK in the page.
A method for identifying false material in real time based on the constructed behavior portrait model comprises the following steps:
Step 1, when the user fills in data, obtaining the user's clickstream data online in real time and processing it in real time, through the feature factory module, into the user behavior portrait labels to be identified;
Step 2, inputting the user behavior portrait labels to be identified into the identification model for judgment, and intercepting the material identified as false. The processed user behavior portrait labels are passed to the SDK, which runs the false-data identification model and feeds back the result in real time. If the SDK judges the data to be false, the page prompts the user "Please enter correct information". The interaction repeats (i.e. after an incorrect entry the user re-enters the data and it is judged again) until the identification model deems the data genuine. To prevent misjudgment, a limit on the number of interactions, for example at most 3, can be built into the SDK. Once the interaction is complete, the subsequent process continues.
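The prompt-and-retry interaction in step 2 can be sketched as a bounded loop. The function name, the outcome strings and the representation of each re-entry as a label dict are assumptions; what happens to a submission that is still judged false when the limit is reached is not specified in the text, so the sketch only reports that outcome.

```python
MAX_ATTEMPTS = 3  # the text suggests a cap such as 3

def review_submission(attempts, is_false, max_attempts=MAX_ATTEMPTS):
    """Replay a user's successive entries against the identification model.

    `attempts` holds the label dict produced for each (re-)entry and
    `is_false` is the identification model as a callable (placeholder).
    Returns (outcome, number of "please enter correct information" prompts).
    """
    prompts = 0
    for labels in attempts[:max_attempts]:
        if not is_false(labels):
            return "accepted", prompts
        prompts += 1  # page prompts the user to correct the entry
    return "not_accepted", prompts

# Toy model: an index of 75 or more is judged false.
toy_model = lambda labels: labels["index"] >= 75
ok = review_submission([{"index": 90}, {"index": 80}, {"index": 40}], toy_model)
capped = review_submission([{"index": 90}] * 4, toy_model)
```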
In summary, the present invention judges user material with the identification model, interacts with the user in real time on a mobile terminal or PC, and monitors in real time the authenticity of the material the user submits.
The above is only a representative embodiment among the many specific applications of the present invention and does not limit the protection scope of the present invention in any way. All technical solutions formed by transformation or equivalent replacement fall within the protection scope of the present invention.

Claims (9)

1. A method for building a behavior portrait model that identifies false material, characterized by the following steps:
S1, obtaining user behavior portraits for training, as feature set V1;
S2, ranking the features of feature set V1 by importance using a random forest, rejecting the features that are never split on, and building the user behavior portrait after feature rejection, as feature set V2;
S3, performing secondary modeling on feature set V2 to obtain feature set V3;
S4, building a logistic regression model on feature set V3 to obtain the final identification model.
2. The method for building a behavior portrait model that identifies false material according to claim 1, characterized in that the specific steps of step S1 are:
S1.1, obtaining the user's clickstream data by embedding an SDK in the page;
S1.2, processing the clickstream data to generate user behavior portrait labels, all of which together constitute the user behavior portrait.
3. The method for building a behavior portrait model that identifies false material according to claim 2, characterized in that in step S1.2 the system feature factory processes the clickstream data in real time to generate the user behavior portrait labels.
4. The method for building a behavior portrait model that identifies false material according to claim 1 or 3, characterized in that the specific steps of step S3 are:
S3.1, performing random stratified sampling on feature set V2 to generate M1, M2, ..., Mn, i.e. divisions for n different types, each type being divided into a training set and a validation set;
S3.2, performing secondary modeling on the different types using the XGBoost method, building n different models;
S3.3, taking the outputs of the built models as new features and combining them with feature set V2 to build the new feature set V3.
5. The method for building a behavior portrait model that identifies false material according to claim 4, characterized in that the specific steps of step S4 are:
S4.1, selecting the most significant feature in feature set V3 as an explanatory variable by ridge regression;
S4.2, performing maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeating steps S4.1 to S4.3; otherwise, taking all features retained by the likelihood estimation as the identification model.
6. A method for identifying false material in real time based on the constructed behavior portrait model, characterized by the following steps:
Step 1, when the user fills in data, obtaining the user's clickstream data online in real time and processing it into the user behavior portrait labels to be identified;
Step 2, inputting the user behavior portrait labels to be identified into the identification model for judgment, and intercepting the material identified as false.
7. The method for identifying false material in real time based on the constructed behavior portrait model according to claim 6, characterized in that the identification model in step 2 is implanted by embedding an SDK in the page.
8. A system for identifying false material in real time based on the constructed behavior portrait model, characterized by comprising a rule engine system, the rule engine system comprising:
System feature factory module: for obtaining the user's clickstream data online in real time when the user fills in data, and processing it into the user behavior portrait labels to be identified;
Identification module: for inputting the user behavior portrait labels to be identified into the identification model for judgment, and intercepting the material identified as false.
9. The system for identifying false material in real time based on the constructed behavior portrait model according to claim 8, characterized in that the identification model in the identification module is implanted by embedding an SDK in the page.
CN201811566426.8A 2018-12-20 2018-12-20 Method and system for identifying false material in real time based on constructed behavior portrait model Active CN109710890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811566426.8A CN109710890B (en) 2018-12-20 2018-12-20 Method and system for identifying false material in real time based on constructed behavior portrait model

Publications (2)

Publication Number Publication Date
CN109710890A true CN109710890A (en) 2019-05-03
CN109710890B CN109710890B (en) 2023-06-09

Family

ID=66257041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811566426.8A Active CN109710890B (en) 2018-12-20 2018-12-20 Method and system for identifying false material in real time based on constructed behavior portrait model

Country Status (1)

Country Link
CN (1) CN109710890B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150378A (en) * 2013-03-13 2013-06-12 珠海市君天电子科技有限公司 Method for identifying false favorable comments in microblog advertisements
CN106529773A (en) * 2016-10-31 2017-03-22 宜人恒业科技发展(北京)有限公司 Online credit and fraud risk evaluation method based on identifying code type question answering
CN106599022A (en) * 2016-11-01 2017-04-26 中山大学 User portrait forming method based on user access data
CN107317688A (en) * 2017-07-25 2017-11-03 薛江炜 The device and method of communication group is created based on tag along sort
CN107423442A (en) * 2017-08-07 2017-12-01 火烈鸟网络(广州)股份有限公司 Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis
CN108256691A (en) * 2018-02-08 2018-07-06 成都智宝大数据科技有限公司 Refund Probabilistic Prediction Model construction method and device
CN108304884A (en) * 2018-02-23 2018-07-20 华东理工大学 A kind of cost-sensitive stacking integrated study frame of feature based inverse mapping
CN108364191A (en) * 2018-01-11 2018-08-03 国网山东省电力公司 Top-tier customer Optimum Identification Method and device based on random forest and logistic regression
US20180285731A1 (en) * 2017-03-30 2018-10-04 Atomwise Inc. Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel
CN108898479A (en) * 2018-06-28 2018-11-27 中国农业银行股份有限公司 The construction method and device of Credit Evaluation Model
CN108932625A (en) * 2017-05-23 2018-12-04 北京京东尚科信息技术有限公司 Analysis method, device, medium and the electronic equipment of user behavior data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIANQI CHEN等: "XGBoost:A scalable tree boosting system", 《HTTPS://ARXIV.ORG/PDF/1603.02754.PDF》 *
钟雪灵等: "网络个人信贷大数据风险控制", 《电子科技大学学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581326A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Method, device, storage medium and equipment for discriminating false litigation
CN113157516A (en) * 2020-12-11 2021-07-23 四川新网银行股份有限公司 Model monitoring system and method for quasi-real-time calculation
CN113157516B (en) * 2020-12-11 2023-06-23 四川新网银行股份有限公司 Model monitoring system and method for quasi-real-time calculation

Also Published As

Publication number Publication date
CN109710890B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant