CN109710890A - Method and system for identifying false material in real time based on a constructed behavior portrait model - Google Patents
Method and system for identifying false material in real time based on a constructed behavior portrait model
- Publication number
- CN109710890A (application CN201811566426.8A)
- Authority
- CN
- China
- Prior art keywords
- user
- model
- portrait
- false
- building
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and system for identifying false material in real time based on a constructed behavior portrait model, mainly applied to real-time verification of the authenticity of user-submitted material in the internet industry. It addresses the problems of the prior art, which relies on single-point judgment and suffers from a high false-hit rate, low precision, and very high input cost. The invention comprises building an identification model, auditing user material with that model, and interacting with the user in real time on a mobile terminal or PC so that the authenticity of the submitted material is monitored as it is entered. The invention is used to identify, online and in real time, whether the material a user submits is genuine.
Description
Technical field
A method and system for identifying false material in real time based on a constructed behavior portrait model, used to accurately identify the authenticity of the material a user fills in. It belongs to the technical field of internet anti-fraud and can be extended to any online field that needs to verify the authenticity of user-entered data.
Background art
A user behavior portrait is a set of labels describing a user's behavioral characteristics, built from the user's clickstream data.
False data: in internet service scenarios, users are required to fill in a series of personal details, including home address, contact's mobile number, contact's name, employer, employer address, and so on. False data means that the information the user fills in is untrue.
Surveys of internet businesses that must verify user information show that the technical means of the black-market fraud industry keep upgrading. To make unqualified material appear to meet requirements, falsified data packaged in various ways keeps emerging, and effectively identifying the authenticity of user data has become a focus of risk prevention and control in the internet industry. The prior art generally identifies the authenticity of data through manual telephone verification, cross-validation, and similar methods.
The first method, manual telephone verification, has customer service staff call the user to review the data the user filled in and verify the information over the phone. Its main drawbacks are: 1, high capital and labor costs; 2, long audit time, which lengthens the whole application process; 3, very poor user experience; 4, the authenticity of the data is hard to establish from a telephone conversation.
The second method, cross-validation, queries external third-party data and compares it with the data the user filled in. Its main drawbacks are: 1, external data has to be purchased, which inevitably increases capital cost; 2, external data suffers from limited hit rates and point-in-time errors, which raises the false rejection rate; 3, for text information, fully accurate fuzzy matching is not currently achievable. For example, if the user enters the company as "Chengdu Huawei" and the external data query returns "Chengdu research institute, Huawei Tech Co., Ltd", the system treats the two as inconsistent, causing a misidentification.
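The mismatch described in drawback 3 of cross-validation can be reproduced with a short sketch. It uses Python's standard `difflib` module; the helper names `exact_match` and `similarity` are illustrative and do not come from the patent.

```python
from difflib import SequenceMatcher


def exact_match(filled: str, external: str) -> bool:
    """Strict comparison after trimming whitespace and folding case."""
    return filled.strip().casefold() == external.strip().casefold()


def similarity(filled: str, external: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, filled.casefold(), external.casefold()).ratio()


filled = "Chengdu Huawei"
external = "Chengdu research institute, Huawei Tech Co., Ltd"

# The strict check rejects the pair even though both name the same employer.
print(exact_match(filled, external))
# The fuzzy score is only partial, so it would still need a hand-tuned threshold.
print(round(similarity(filled, external), 2))
```

A strict comparison rejects the pair outright, and a character-level score only moves the problem to choosing a threshold, which is the weakness the text attributes to text matching.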
In summary, the main problems of existing methods for verifying application materials are: 1, extremely low accuracy; 2, high input cost; 3, inability to judge the authenticity of the data online in real time.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a method and system for identifying false material in real time based on a constructed behavior portrait model. It addresses the problems of the prior art, which relies on single-point judgment and suffers from a high false-hit rate, low precision, and very high input cost. The invention comprises building an identification model, auditing user material with that model, and interacting with the user in real time on a mobile terminal or PC so that the authenticity of the submitted material is monitored in real time.
In order to achieve the above object, the present invention adopts the following technical scheme:
A method for building a behavior portrait model that identifies false material, characterized by the following steps:
S1, obtain online, in real time, the user behavior portraits used for training, as feature set V1;
S2, rank the features of V1 by importance in real time using a random forest and, as the feature processing step, reject the features that are never used for a split; build the user behavior portrait after feature rejection, as feature set V2;
S3, perform secondary modeling on feature set V2 to obtain feature set V3;
S4, build a logistic regression model on feature set V3 to obtain the final identification model.
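Step S2 can be sketched as follows, assuming scikit-learn's `RandomForestClassifier` and synthetic data. The feature names are hypothetical, and an impurity importance of exactly zero is used here as a proxy for "never used for a split"; the patent does not describe its own implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical behavior labels: two informative, one constant (it can never be split on).
typing_speed = rng.normal(0.0, 1.0, n)
login_count = rng.normal(0.0, 1.0, n)
constant_flag = np.zeros(n)
names = ["typing_speed", "login_count", "constant_flag"]
X_v1 = np.column_stack([typing_speed, login_count, constant_flag])
y = (typing_speed + 0.5 * login_count + rng.normal(0.0, 0.5, n) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_v1, y)
importances = forest.feature_importances_

# Importance ranking, highest first.
ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)

# Feature set V2: keep only the features that contributed to at least one split.
kept = [name for name, imp in zip(names, importances) if imp > 0.0]
X_v2 = X_v1[:, [names.index(name) for name in kept]]

print(ranked)
print(kept)
```

In a real label system the rejected features would typically be sparse or near-constant labels rather than an artificial constant column.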
Further, the specific steps of step S1 are:
S1.1, obtain the user's clickstream data by embedding a tracking SDK in the page;
S1.2, process the clickstream data in the feature factory to generate user behavior portrait labels; all user behavior portrait labels together constitute the user behavior portrait.
Further, in step S1.2, the system feature factory processes the clickstream data in real time to generate the user behavior portrait labels.
Further, the specific steps of step S3 are:
S3.1, perform random stratified sampling on feature set V2 to generate n partitions M1, M2, ..., Mn, one per type; each type is divided into a training set and a validation set;
S3.2, perform secondary modeling for the different types with the XGBoost method, building n different models;
S3.3, take the outputs of the built models as new features and combine them with feature set V2 to construct the new feature set V3.
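The stratified split in step S3.1 might look like the following minimal sketch in plain Python. The patent gives neither the split ratio nor the sampling code, so the 70/30 split, the `stratified_split` helper, and the toy targets are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, valid_fraction=0.3, seed=0):
    """Return (train_idx, valid_idx), keeping each label's share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_valid = max(1, round(len(idxs) * valid_fraction))
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical targets, one per false-material type (1 = false, 0 = genuine).
targets = {
    "contact": [1] * 10 + [0] * 90,
    "employer_name": [1] * 20 + [0] * 80,
}

# One partition M_i per type, each with its own training and validation indices.
partitions = {
    name: stratified_split(y, seed=i) for i, (name, y) in enumerate(targets.items())
}
train_idx, valid_idx = partitions["contact"]
print(len(train_idx), len(valid_idx))
```

Stratifying on the label keeps the rare false cases represented in both the training and the validation set of every type.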
Further, the specific steps of step S4 are:
S4.1, select the most significant feature in feature set V3 as an explanatory variable by means of ridge regression;
S4.2, perform maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeat steps S4.1 to S4.3; otherwise, take all the features retained by the likelihood estimation as the identification model.
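The loop in S4.1 to S4.3 resembles forward stepwise selection. The text names ridge regression as the selection mechanism without further detail, so the sketch below substitutes a different, simpler criterion and should not be read as the patent's method: at each round it adds the candidate feature with the largest likelihood-ratio gain and stops when no gain exceeds 3.84, the 5% critical value of a chi-square distribution with one degree of freedom. It assumes NumPy; the data and feature names are synthetic.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic fit by Newton's method; returns (weights, log-likelihood)."""
    Xb = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        grad = Xb.T @ (y - p)
        # Tiny diagonal term for numerical stability only; this is not ridge regression.
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + 1e-8 * np.eye(Xb.shape[1])
        w = w + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ w)), 1e-12, 1.0 - 1e-12)
    return w, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def forward_select(X, y, names, min_gain=3.84):
    """Add the feature with the largest likelihood-ratio gain until none exceeds min_gain."""
    selected = []
    _, best_ll = fit_logistic(np.empty((len(y), 0)), y)  # intercept-only model
    while len(selected) < X.shape[1]:
        gains = {}
        for j in range(X.shape[1]):
            if j not in selected:
                _, ll = fit_logistic(X[:, selected + [j]], y)
                gains[j] = 2.0 * (ll - best_ll)
        j_best = max(gains, key=gains.get)
        if gains[j_best] < min_gain:
            break
        selected.append(j_best)
        best_ll += gains[j_best] / 2.0
    return [names[j] for j in selected]


rng = np.random.default_rng(1)
X_v3 = rng.normal(size=(500, 3))
logits = 2.0 * X_v3[:, 0] - 0.8 * X_v3[:, 1]  # the third column carries no signal
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

selected = forward_select(X_v3, y, ["typing_speed", "field_dwell", "noise"])
print(selected)
```

The features that survive the loop, together with their fitted weights, play the role of the final identification model in this sketch.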
A method for identifying false material in real time based on the constructed behavior portrait model, characterized in that the specific steps are:
Step 1, while the user fills in data, obtain the user's clickstream data online in real time and process it into the user behavior portrait labels to be identified;
Step 2, input the user behavior portrait labels to be identified into the identification model for judgment, and intercept the material identified as false.
Further, the identification model in step 2 is embedded by means of the page-embedded SDK.
A system for identifying false material in real time based on the constructed behavior portrait model, characterized by comprising a rule engine system, the rule engine system comprising:
System feature factory module: obtains the user's clickstream data online in real time while the user fills in data, and processes it into the user behavior portrait labels to be identified;
Identification module: inputs the user behavior portrait labels to be identified into the identification model for judgment, and intercepts the material identified as false.
Further, the identification model in the identification module is embedded by means of the page-embedded SDK.
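One way to picture how the two modules fit together inside the rule engine is the sketch below. The class names, the `review` method, the single label, and the stand-in scorer are all hypothetical; the patent only names the modules and their responsibilities.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]
Labels = Dict[str, float]


@dataclass
class FeatureFactory:
    """Turns raw clickstream events into behavior portrait labels."""
    label_fns: Dict[str, Callable[[List[Event]], float]]

    def build_labels(self, events: List[Event]) -> Labels:
        return {name: fn(events) for name, fn in self.label_fns.items()}


@dataclass
class IdentificationModule:
    """Scores the labels with the identification model and decides on interception."""
    score_fn: Callable[[Labels], float]  # returns a false data index in 0-100
    threshold: float = 75.0

    def is_false(self, labels: Labels) -> bool:
        return self.score_fn(labels) >= self.threshold


@dataclass
class RuleEngine:
    factory: FeatureFactory
    identifier: IdentificationModule

    def review(self, events: List[Event]) -> str:
        labels = self.factory.build_labels(events)
        return "intercept" if self.identifier.is_false(labels) else "pass"


# Hypothetical label and stand-in scorer: input with almost no keystrokes scores high.
engine = RuleEngine(
    FeatureFactory({"key_events": lambda ev: float(sum(e["type"] == "key" for e in ev))}),
    IdentificationModule(lambda labels: 90.0 if labels["key_events"] < 3 else 10.0),
)
typed = [{"type": "key"}] * 12
pasted = [{"type": "paste"}]
print(engine.review(typed), engine.review(pasted))
```

Keeping label computation and scoring in separate components mirrors the split between the feature factory module and the identification module in the text.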
Compared with the prior art, the beneficial effects of the present invention are:
1, the AUC of the identification model is 0.9-0.98 and the Gini coefficient is 0.8-0.96; both show that the false data model has very strong discriminating ability. The optimal model is selected and the optimal cut-off point is found by F1. When a false data index >= 75 is treated as false, the identification method has high precision and a very low false-hit rate: the false-hit rate is 0.7% and the precision is 92%. Compared with traditional identification methods the recognition ability is very high, with precision improved by 80% over the traditional cross-validation method;
2, the identification method can interact with the user in real time and prompt the user in real time to fill in correct application material, improving the usability of the material;
3, the identification method greatly reduces capital investment and saves time. The traditional cross-validation method has to query external data for cross comparison, whereas the online real-time judgment of this method avoids the problems of external data: an extremely low hit rate, expensive queries, and a long turnaround time. The authenticity of the material is identified automatically by the system;
4, the identification method judges online in real time through the SDK and completely replaces manual verification; it not only achieves intelligent processing but also genuinely saves labor cost;
5, the identification method can cope with the continuous updating of fraud techniques, and its adjustment rate can keep up with new fraud methods in real time;
6, the invention uses secondary modeling, i.e. a model is built again on top of the initial model, in order to achieve better accuracy and robustness.
Description of the drawings
None
Specific embodiment
Embodiment
This document proposes a false data identification method based on user behavior portraits. The method builds on a large system of user behavior feature labels; after secondary modeling, a logistic regression based on stepwise regression produces an application-material scorecard (i.e. the identification model), which outputs an application-material index and quantitatively judges the authenticity of the data the user fills in. It not only replaces the manual data audit process but can also identify in real time how likely it is that the user has filled in false data.
The effect in a concrete application scenario is as follows:
In the application stage of an internet credit business, online applicants are assessed in real time, and the system can interact with the user in real time to remind the user to fill in correct information.
The specific embodiment is as follows:
First the identification model is built. A method for building a behavior portrait model that identifies false material comprises the following steps:
S1, obtain the user behavior portraits used for training, as feature set V1;
The specific steps are:
S1.1, obtain the user's clickstream data by embedding a tracking SDK in the page. The user behavior portrait depends on a large volume of user behavior data, so tracking points must be embedded on the mobile terminal. Through the page-embedded SDK, the user's basic information is obtained, such as IP, GPS, and device information, and all of the user's operation records are logged;
S1.2, process the clickstream data to generate user behavior portrait labels; all user behavior portrait labels together constitute the user behavior portrait. The features of the user behavior portrait are computed in real time by the system feature factory module, for example the anti-fraud feature factory module of the anti-fraud rule engine system used in the credit industry, whose strong data processing and computing capacity generates the user behavior portrait labels in real time (at millisecond level).
Through the system feature factory module, a series of specific user behavior portrait labels is generated in real time, such as: 1, the user's space-time usage preferences, such as commercial-district preference (tourist area, shopping area, office area) and time-of-use preference (working day/holiday; morning, midday, evening; etc.); 2, the user's operation frequency/period preferences, such as monthly login count and the interval between registration and application; 3, clickstream information, such as the user's typing speed and the time the user takes to enter the employer; 4, gyroscope information, such as the position in which the user holds the phone and the commonly clicked locations; 5, association information across the user's dimensions (such as the number of mobile numbers associated with a device, the number of ID cards associated with a mobile number, etc.).
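A few of the labels listed above can be derived from raw events as in the following sketch. The event schema (`t` in seconds since the session started, `type`, `field`) and all function names are assumptions made for illustration; the patent does not describe the SDK's event format.

```python
from datetime import datetime, timezone


def field_dwell_seconds(events, field):
    """Total time between focus and blur events on one form field."""
    total, started = 0.0, None
    for e in events:
        if e["field"] != field:
            continue
        if e["type"] == "focus":
            started = e["t"]
        elif e["type"] == "blur" and started is not None:
            total += e["t"] - started
            started = None
    return total


def typing_speed(events):
    """Keystrokes per second of total field dwell time."""
    keys = sum(1 for e in events if e["type"] == "key")
    dwell = sum(field_dwell_seconds(events, f) for f in {e["field"] for e in events})
    return keys / dwell if dwell > 0 else 0.0


def is_working_day(started_at):
    """Monday to Friday; public holidays are ignored in this sketch."""
    return started_at.weekday() < 5


def build_labels(events, started_at, logins_this_month):
    return {
        "typing_speed": typing_speed(events),
        "employer_field_seconds": field_dwell_seconds(events, "employer"),
        "is_working_day": 1.0 if is_working_day(started_at) else 0.0,
        "monthly_login_count": float(logins_this_month),
    }


events = [
    {"t": 0.0, "type": "focus", "field": "employer"},
    {"t": 0.5, "type": "key", "field": "employer"},
    {"t": 1.0, "type": "key", "field": "employer"},
    {"t": 1.5, "type": "key", "field": "employer"},
    {"t": 2.0, "type": "key", "field": "employer"},
    {"t": 4.0, "type": "blur", "field": "employer"},
]
session_start = datetime(2018, 12, 20, 8, 0, tzinfo=timezone.utc)
labels = build_labels(events, session_start, logins_this_month=3)
print(labels)
```

Each label is a plain number, so the resulting dictionary can be fed directly into the feature selection and modeling steps.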
The user behavior portrait labels obtained are input into the identification module of the anti-fraud rule engine system, which then carries out steps S2-S4 to build the identification model.
S2, rank the features of V1 by importance using a random forest, reject the features that are never used for a split, and build the user behavior portrait after feature rejection, as feature set V2;
S3, perform secondary modeling on feature set V2 to obtain feature set V3;
The specific steps are:
S3.1, perform random stratified sampling on feature set V2 to generate n partitions M1, M2, ..., Mn, one per type; each type is divided into a training set and a validation set. The different types are different types of false material, for example three models: a false-contact model, a false-employer-name model, and a false-employer-address model;
S3.2, perform secondary modeling for the different types with the XGBoost method, building n different models;
S3.3, take the outputs of the built models as new features and combine them with feature set V2 to construct the new feature set V3.
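Steps S3.2 and S3.3 amount to a form of stacking: the per-type model outputs become extra columns. The sketch below uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost so that it depends on one library only; the data, the three target names, and the 70/30 split are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300
X_v2 = rng.normal(size=(n, 4))  # feature set V2 (synthetic)

# One hypothetical binary target per false-material type.
targets = {
    "contact_false": (X_v2[:, 0] + rng.normal(0, 0.5, n) > 0.8).astype(int),
    "employer_name_false": (X_v2[:, 1] - X_v2[:, 2] + rng.normal(0, 0.5, n) > 1.0).astype(int),
    "employer_address_false": (X_v2[:, 3] + rng.normal(0, 0.5, n) > 0.8).astype(int),
}

new_columns, validation_accuracy = [], {}
for name, y in targets.items():
    # Stratified training/validation split for this type.
    X_tr, X_va, y_tr, y_va = train_test_split(
        X_v2, y, test_size=0.3, stratify=y, random_state=0
    )
    model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
    validation_accuracy[name] = model.score(X_va, y_va)
    # The model's predicted probability becomes one new feature column.
    new_columns.append(model.predict_proba(X_v2)[:, 1])

# Feature set V3 = V2 plus one score column per type-specific model.
X_v3 = np.column_stack([X_v2] + new_columns)
print(X_v3.shape, validation_accuracy)
```

Scoring rows that were also used for training can leak label information into V3; out-of-fold predictions are a common way to limit this, although the text does not say how the patent handles it.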
S4, build a logistic regression model on feature set V3 to obtain the final identification model, i.e. construct the false data scorecard and the false data index Index (0-100). The specific steps are:
S4.1, select the most significant feature in feature set V3 as an explanatory variable by means of ridge regression;
S4.2, perform maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeat steps S4.1 to S4.3; otherwise, take all the features retained by the likelihood estimation as the identification model.
In this embodiment, an index of 75 or above can be defined as a high risk of false data.
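How a logistic regression output could become a 0-100 index with a cut-off of 75 is sketched below. The patent does not state the mapping, so scaling the predicted probability by 100 is an assumption, and the weights, intercept, and label names are invented for illustration.

```python
import math


def false_data_index(labels, weights, intercept):
    """Logistic score scaled to 0-100; higher means more likely false."""
    z = intercept + sum(w * labels.get(name, 0.0) for name, w in weights.items())
    return 100.0 / (1.0 + math.exp(-z))


def is_high_risk(index, threshold=75.0):
    """The embodiment treats an index of 75 or above as high risk."""
    return index >= threshold


# Hypothetical fitted coefficients and one applicant's labels.
weights = {"typing_speed": -0.8, "employer_field_seconds": 0.05, "contact_model_score": 3.0}
intercept = -1.5
applicant = {"typing_speed": 1.0, "employer_field_seconds": 20.0, "contact_model_score": 0.9}

index = false_data_index(applicant, weights, intercept)
print(round(index, 1), is_high_risk(index))
```

Because the index is a monotone transform of the model's probability, moving the cut-off trades precision against recall without retraining the model.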
Before specific false material is identified, the identification model for false material is first embedded by means of the page-embedded SDK.
A method for identifying false material in real time based on the constructed behavior portrait model, with the specific steps:
Step 1, while the user fills in data, obtain the user's clickstream data online in real time and process it in real time through the feature factory module into the user behavior portrait labels to be identified;
Step 2, input the user behavior portrait labels to be identified into the identification model for judgment, and intercept the material identified as false. The processed user behavior portrait labels are passed to the SDK, which runs the false data model and feeds back the identification result in real time. If the SDK judges the data to be false, the page prompts the user "please input correct information". The interaction repeats (i.e. after an input is judged wrong, the user re-enters it and it is judged again) until the identification model deems the data genuine. To prevent misjudgment, a limit on the number of interactions, for example at most 3, can be built into the SDK. Once the interaction is complete, the subsequent process continues.
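The interaction described in step 2 can be sketched as a bounded retry loop. The function name, the return structure, and the scoring callback are hypothetical; the text does not say what happens to an input that is still rejected at the limit, so the sketch simply reports it and leaves the decision to the caller.

```python
def review_field(attempts, score_fn, threshold=75.0, max_attempts=3):
    """Score successive inputs; stop when one is accepted or the attempt limit is reached."""
    prompts = []
    for n, labels in enumerate(attempts, start=1):
        if score_fn(labels) < threshold:
            return {"accepted": True, "attempts": n, "prompts": prompts}
        prompts.append("please input correct information")
        if n >= max_attempts:
            break
    return {"accepted": False, "attempts": len(prompts), "prompts": prompts}


# Stand-in scorer: each attempt already carries its false data index.
def score(labels):
    return labels["index"]


corrected = review_field([{"index": 90}, {"index": 80}, {"index": 20}], score)
stubborn = review_field([{"index": 90}] * 5, score)
print(corrected["accepted"], corrected["attempts"])
print(stubborn["accepted"], stubborn["attempts"])
```

Capping the loop keeps a misjudged but genuine user from being blocked indefinitely, which is the stated reason for the interaction limit.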
In conclusion the present invention judges user's material by identification model, and in mobile terminal or the end PC with user
Real-time, interactive is carried out, real-time monitoring user submits the true and false of material.
The above is only the representative embodiment in the numerous concrete application ranges of the present invention, to protection scope of the present invention not structure
At any restrictions.It is all using transformation or equivalence replacement and the technical solution that is formed, all fall within rights protection scope of the present invention it
It is interior.
Claims (9)
1. A method for building a behavior portrait model that identifies false material, characterized by the following steps:
S1, obtain the user behavior portraits used for training, as feature set V1;
S2, rank the features of V1 by importance using a random forest, reject the features that are never used for a split, and build the user behavior portrait after feature rejection, as feature set V2;
S3, perform secondary modeling on feature set V2 to obtain feature set V3;
S4, build a logistic regression model on feature set V3 to obtain the final identification model.
2. The method for building a behavior portrait model that identifies false material according to claim 1, characterized in that the specific steps of step S1 are:
S1.1, obtain the user's clickstream data by embedding a tracking SDK in the page;
S1.2, process the clickstream data to generate user behavior portrait labels; all user behavior portrait labels together constitute the user behavior portrait.
3. The method for building a behavior portrait model that identifies false material according to claim 2, characterized in that in step S1.2 the system feature factory processes the clickstream data in real time to generate the user behavior portrait labels.
4. The method for building a behavior portrait model that identifies false material according to claim 1 or 3, characterized in that the specific steps of step S3 are:
S3.1, perform random stratified sampling on feature set V2 to generate n partitions M1, M2, ..., Mn, one per type; each type is divided into a training set and a validation set;
S3.2, perform secondary modeling for the different types with the XGBoost method, building n different models;
S3.3, take the outputs of the built models as new features and combine them with feature set V2 to construct the new feature set V3.
5. The method for building a behavior portrait model that identifies false material according to claim 4, characterized in that the specific steps of step S4 are:
S4.1, select the most significant feature in feature set V3 as an explanatory variable by means of ridge regression;
S4.2, perform maximum log-likelihood estimation on the explanatory variables;
S4.3, if significant explanatory variables remain in feature set V3, repeat steps S4.1 to S4.3; otherwise, take all the features retained by the likelihood estimation as the identification model.
6. A method for identifying false material in real time based on the constructed behavior portrait model, characterized in that the specific steps are:
Step 1, while the user fills in data, obtain the user's clickstream data online in real time and process it into the user behavior portrait labels to be identified;
Step 2, input the user behavior portrait labels to be identified into the identification model for judgment, and intercept the material identified as false.
7. The method for identifying false material in real time based on the constructed behavior portrait model according to claim 6, characterized in that the identification model in step 2 is embedded by means of the page-embedded SDK.
8. A system for identifying false material in real time based on the constructed behavior portrait model, characterized by comprising a rule engine system, the rule engine system comprising:
System feature factory module: obtains the user's clickstream data online in real time while the user fills in data, and processes it into the user behavior portrait labels to be identified;
Identification module: inputs the user behavior portrait labels to be identified into the identification model for judgment, and intercepts the material identified as false.
9. The system for identifying false material in real time based on the constructed behavior portrait model according to claim 8, characterized in that the identification model in the identification module is embedded by means of the page-embedded SDK.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811566426.8A CN109710890B (en) | 2018-12-20 | 2018-12-20 | Method and system for identifying false material in real time based on constructed behavior portrait model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811566426.8A CN109710890B (en) | 2018-12-20 | 2018-12-20 | Method and system for identifying false material in real time based on constructed behavior portrait model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109710890A true CN109710890A (en) | 2019-05-03 |
CN109710890B CN109710890B (en) | 2023-06-09 |
Family
ID=66257041
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811566426.8A Active CN109710890B (en) | 2018-12-20 | 2018-12-20 | Method and system for identifying false material in real time based on constructed behavior portrait model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109710890B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150378A (en) * | 2013-03-13 | 2013-06-12 | 珠海市君天电子科技有限公司 | Method for identifying false favorable comments in microblog advertisements |
CN106529773A (en) * | 2016-10-31 | 2017-03-22 | 宜人恒业科技发展(北京)有限公司 | Online credit and fraud risk evaluation method based on identifying code type question answering |
CN106599022A (en) * | 2016-11-01 | 2017-04-26 | 中山大学 | User portrait forming method based on user access data |
US20180285731A1 (en) * | 2017-03-30 | 2018-10-04 | Atomwise Inc. | Systems and methods for correcting error in a first classifier by evaluating classifier output in parallel |
CN108932625A (en) * | 2017-05-23 | 2018-12-04 | 北京京东尚科信息技术有限公司 | Analysis method, device, medium and the electronic equipment of user behavior data |
CN107317688A (en) * | 2017-07-25 | 2017-11-03 | 薛江炜 | The device and method of communication group is created based on tag along sort |
CN107423442A (en) * | 2017-08-07 | 2017-12-01 | 火烈鸟网络(广州)股份有限公司 | Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis |
CN108364191A (en) * | 2018-01-11 | 2018-08-03 | 国网山东省电力公司 | Top-tier customer Optimum Identification Method and device based on random forest and logistic regression |
CN108256691A (en) * | 2018-02-08 | 2018-07-06 | 成都智宝大数据科技有限公司 | Refund Probabilistic Prediction Model construction method and device |
CN108304884A (en) * | 2018-02-23 | 2018-07-20 | 华东理工大学 | A kind of cost-sensitive stacking integrated study frame of feature based inverse mapping |
CN108898479A (en) * | 2018-06-28 | 2018-11-27 | 中国农业银行股份有限公司 | The construction method and device of Credit Evaluation Model |
Non-Patent Citations (2)
Title |
---|
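TIANQI CHEN et al.: "XGBoost:A scalable tree boosting system", 《HTTPS://ARXIV.ORG/PDF/1603.02754.PDF》 * |
Zhong Xueling et al.: "Big data risk control for online personal credit", 《Journal of University of Electronic Science and Technology of China》 * |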
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112581326A (en) * | 2019-09-30 | 2021-03-30 | 北京国双科技有限公司 | Method, device, storage medium and equipment for discriminating false litigation |
CN113157516A (en) * | 2020-12-11 | 2021-07-23 | 四川新网银行股份有限公司 | Model monitoring system and method for quasi-real-time calculation |
CN113157516B (en) * | 2020-12-11 | 2023-06-23 | 四川新网银行股份有限公司 | Model monitoring system and method for quasi-real-time calculation |
Also Published As
Publication number | Publication date |
---|---|
CN109710890B (en) | 2023-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102134792B1 (en) | Method for providing used goods trade service using fraud detection and appraisal based on blockchain with safe transaction | |
CN109409896B (en) | Bank fraud recognition model training method, bank fraud recognition method and device | |
CN108121824A (en) | A kind of chat robots and system towards financial service | |
TW202025060A (en) | Vehicle insurance automatic compensation method and system | |
CN106447434A (en) | Personal credit ecological platform | |
CN109063931A (en) | A kind of model method for predicting freight logistics driver Default Probability | |
CN110348850A (en) | The arbitrage risk checking method and device, electronic equipment of polymerization payment trade company | |
CN111611487A (en) | Stock information application analysis system | |
CN109710890A (en) | Behavior portrait model based on building identifies the method and system of false material in real time | |
CN111899100A (en) | Service control method, device and equipment and computer storage medium | |
CN111861716B (en) | Method for generating monitoring early warning level in credit based on software system | |
CN113111250A (en) | Service recommendation method and device, related equipment and storage medium | |
CN114881651B (en) | Abnormal transaction identification system and method for big data service intelligent finance | |
CN115641125A (en) | Trade method and system for identifying commodity without scanning bar code or code | |
CN113449753A (en) | Service risk prediction method, device and system | |
CN110728570A (en) | Anti-fraud fund analysis method | |
CN117172795A (en) | Intelligent technical service fee online consultation system | |
CN111915329A (en) | Personalized recommendation method and system based on after-sale scenes in automobile industry | |
CN110210868A (en) | The processing method and electronic equipment of numerical value transfer data | |
CN115880077A (en) | Recommendation method and device based on client label, electronic device and storage medium | |
CN112967062B (en) | User identity identification method based on cautious degree | |
CN115564591A (en) | Financing product determination method and related equipment | |
CN115907968A (en) | Wind control rejection inference method and device based on pedestrian credit | |
CN114971017A (en) | Bank transaction data processing method and device | |
CN115423601A (en) | Method and device for designing online credit product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||