CN115713399A - User credit assessment system combined with third-party data source

Info

Publication number: CN115713399A
Application number: CN202211188296.5A
Authority: CN
Inventors: 陈亚娟; 李翰璐; 金光丽
Original assignee: Smart Co Ltd Beijing Technology Co ltd
Current assignee: Smart Co Ltd Beijing Technology Co ltd
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2023-02-24
Anticipated expiration: 2042-09-28
Also published as: CN115713399B

Abstract

The invention discloses a user credit evaluation system combined with a third-party data source, comprising: an acquisition module for acquiring a third-party data source; a data processing module for processing the third-party data source to obtain target data; and an evaluation module for processing the combined labels of the target data to form code value labels, and evaluating the user's credit according to the code value labels to obtain an evaluation result. By screening high-quality third-party data sources and producing standardized output, the system allows institutions to connect to third-party data sources quickly, ensures data security during execution while maintaining connection efficiency, and improves the accuracy of the evaluation result.

Description

User credit assessment system combined with third-party data source

Technical Field

The invention relates to the technical field of credit assessment, in particular to a user credit assessment system combined with a third-party data source.

Background

At present, amid the rapid development of the financial credit industry, financial institutions in the Chinese market face information asymmetry: data cannot be shared quickly and accurately, and third-party data cannot be screened accurately, so the credit assessment of users is inaccurate.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems described above. To this end, the invention provides a user credit evaluation system combined with a third-party data source, which screens high-quality third-party data sources and produces standardized output, allows institutions to connect to third-party data sources quickly, ensures data security during execution while maintaining connection efficiency, and improves the accuracy of the evaluation result.

In order to achieve the above object, an embodiment of the present invention provides a user credit evaluation system combined with a third-party data source, including:

the acquisition module is used for acquiring a third-party data source;

the data processing module is used for carrying out data processing on the third-party data source to obtain target data;

and the evaluation module is used for processing the combined label of the target data to form a code value label, and evaluating the credit of the user according to the code value label to obtain an evaluation result.

According to some embodiments of the invention, the data processing module comprises:

the screening module is used for screening data of the third-party data source based on a preset rule to obtain screening data;

and the processing module is used for carrying out derivative variable processing on the screened data to obtain target data.

According to some embodiments of the invention, the preset rules include a blacklist class, a multi-head class, a rating class, an early warning class, and a verification class; wherein:

the blacklist class comprises a blacklist, a high-risk list, and a grey list;

the multi-head class comprises D90_ID number_total number of applying institutions, D180_ID number_total number of applying institutions, the number of credit grants in the last 6 months, and the number of credit grants in the last 24 months;

the rating class comprises a credit score and a fraud score;

the early warning class comprises an early warning level;

the verification class comprises an on-network duration and an on-network status.

According to some embodiments of the invention, the derived-variable processing comprises: existence checks, logical judgment and processing, counting (including deduplicated counting), and other processing indexes; wherein:

the existence check comprises whether a blacklist entry indicates a serious overdue and whether the user's number is an unassigned number;

the logical judgment and processing comprises merging information under the same user ID card number and the same mobile phone number, and outputting variables after merge logic processing; the logic processing includes determining at least one of a maximum value, a minimum value, or a sum;

the counting, including deduplicated counting, comprises the number of repeated logins for the same account, the volume of customer records applying for credit from the same residential address, and the number of different employer names sharing the same employer telephone number;

the other processing indexes comprise:

time difference calculation, including the time elapsed since the latest application;

longitude and latitude parsing, including resolving the province from a longitude and latitude and calculating the straight-line distance between two sets of longitude and latitude data;

application classification counting, including counting the number of apps of each type installed by the user, according to the app classification labels provided by risk control;

user identity query, including querying whether the user is a customer according to the mobile phone number;

other custom logic, including whether the application was made at night and whether the user applied to a non-bank institution.
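Three of the "other processing indexes" listed above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the straight-line distance is approximated with the haversine great-circle formula (the text does not name a formula), and the 22:00 to 06:00 night window is an assumed default, not a value given in the text.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def days_since(last_application: datetime, now: datetime) -> float:
    """Time difference from the latest application to now, in days."""
    return (now - last_application).total_seconds() / 86400.0


def straight_line_distance_km(lat1: float, lon1: float,
                              lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two coordinates, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_night_application(applied_at: datetime,
                         start_hour: int = 22, end_hour: int = 6) -> bool:
    """True if the application hour falls inside the assumed night window."""
    return applied_at.hour >= start_hour or applied_at.hour < end_hour
```

Resolving a province from coordinates would additionally need a boundary dataset or a geocoding service, which is outside the scope of this sketch.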

According to some embodiments of the invention, the data processing module further comprises:

a desensitization module, used for inspecting the third-party data source before the screening module screens it based on the preset rules, determining whether sensitive data is present, and performing desensitization when sensitive data is present.

According to some embodiments of the invention, the acquisition module comprises a plurality of data source interfaces, each used for receiving a different type of third-party data source.

According to some embodiments of the invention, the system further comprises a storage module for storing the evaluation result.

According to some embodiments of the invention, the desensitization module comprises:

the conversion module is used for converting the third-party data source into a character string;

and the matching module is used for matching the character strings with the sensitive character strings in the sensitive database and judging whether sensitive data exist according to a matching result.

According to some embodiments of the invention, the evaluation module comprises:

a fusion module to:

classifying the code value labels according to different scenes, determining code value labels respectively corresponding to a plurality of scenes of a user, and establishing a binding relationship between each scene and the corresponding code value label as an evaluation vector;

determining a feature space corresponding to the scene category according to the evaluation vector;

mapping the feature space corresponding to each scene category to obtain a plurality of kernel spaces, wherein the kernel spaces comprise the association relation among the evaluation vectors;

normalizing the plurality of kernel spaces to obtain a plurality of target kernel spaces;

acquiring a weight coefficient corresponding to each scene category in a plurality of scene categories;

fusing the plurality of target kernel spaces according to the weight coefficients to obtain a fusion kernel space;

an establishment module to:

acquiring a sample code value label set and credit data corresponding to each sample code value label in the sample code value label set;

screening the sample code value label set to determine a target sample code value label set;

determining a corresponding sample fusion kernel space based on sample code value labels in the target sample code value label set;

analyzing the credit data to determine a credit score;

establishing a matching relation between the credit score and the sample fusion kernel space, and generating a database of the credit score and the sample fusion kernel space;

establishing sample fusion kernel space protocol dictionaries in different dimensions for sample fusion kernel spaces in the database;

establishing a regression model of the credit score matched between the sample fusion kernel space protocol dictionary and the sample fusion kernel space based on a regression algorithm;

and a determining module, used for performing classification, identification, and compensation processing on the fusion kernel space according to the regression model, and determining the evaluation result.

According to some embodiments of the invention, the establishing module comprises:

a numerical processing module, used for converting the plurality of sample code value labels in the sample code value label set into numerical form to obtain a data matrix; each sample code value label comprises a numerical value for every user credit evaluation parameter, together with the user data corresponding to that parameter;

a culling module for:

calculating the data proportion of each parameter in the user credit evaluation parameters according to the data matrix and a first preset algorithm;

calculating a user credit data score corresponding to each sample code value label according to the data proportion and a second preset algorithm, comparing the user credit data scores with a first preset threshold and a second preset threshold respectively, and removing the sample code value labels corresponding to the user credit data scores larger than the first preset threshold and the sample code value labels corresponding to the user credit data scores smaller than the second preset threshold to obtain a target sample code value label set; the first preset threshold is greater than the second preset threshold.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a block diagram of a user credit evaluation system incorporating third party data sources, according to one embodiment of the present invention;

FIG. 2 is a block diagram of a data processing module according to one embodiment of the invention;

FIG. 3 is a block diagram of an evaluation module according to one embodiment of the invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

As shown in fig. 1, an embodiment of the present invention provides a user credit evaluation system combined with a third-party data source, including:

the acquisition module is used for acquiring a third-party data source;

The working principle of the technical scheme is as follows: a code value label represents the user's label data for each scene. The acquisition module acquires a third-party data source; the data processing module processes the third-party data source to obtain target data; and the evaluation module processes the combined labels of the target data to form code value labels, and evaluates the user's credit according to the code value labels to obtain an evaluation result.
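The flow through the three modules can be pictured with a minimal sketch. The class name, the callable-based interfaces, and the record shape are all assumptions made for illustration; the text does not prescribe any particular programming interface.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
DataSource = Callable[[], List[Record]]  # one hypothetical data source interface


class CreditEvaluationSystem:
    """Wires the three modules together: acquire -> process -> evaluate."""

    def __init__(self,
                 sources: List[DataSource],
                 process: Callable[[List[Record]], List[Record]],
                 evaluate: Callable[[List[Record]], Any]) -> None:
        self.sources = sources    # acquisition module: one callable per source
        self.process = process    # data processing module
        self.evaluate = evaluate  # evaluation module

    def acquire(self) -> List[Record]:
        """Collect records from every registered third-party data source."""
        return [record for source in self.sources for record in source()]

    def run(self) -> Any:
        """Run the full pipeline and return the evaluation result."""
        return self.evaluate(self.process(self.acquire()))
```

Passing the modules in as callables keeps the sketch small and lets each stage be replaced independently, for example when a new data source interface is added.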

The beneficial effects of the above technical scheme are as follows: by screening high-quality third-party data sources and producing standardized output, the system allows institutions to connect to third-party data sources quickly, ensures data security during execution while maintaining connection efficiency, and improves the accuracy of the evaluation result.

As shown in fig. 2, according to some embodiments of the invention, the data processing module includes:

The working principle of the technical scheme is as follows: the screening module is used for screening data of the third-party data source based on a preset rule to obtain screening data; and the processing module is used for carrying out derivative variable processing on the screened data to obtain target data.

The beneficial effects of the above technical scheme are as follows: screening the third-party data source based on the preset rules normalizes and integrates the data and extracts each type of data; derived-variable processing is then performed on each type of data to obtain the target data. This improves data processing efficiency, yields the target data quickly, and selects high-quality data.

According to some embodiments of the invention, the preset rules include a blacklist class, a multi-head class, a rating class, an early warning class, and a verification class; wherein:

the rating class comprises a credit score and a fraud score;

the early warning class comprises an early warning level;

the verification class comprises an on-network duration and an on-network status.

The working principle of the technical scheme is as follows: D90_ID number_total number of applying institutions and D180_ID number_total number of applying institutions indicate relevant information about users in different areas.

The beneficial effects of the above technical scheme are as follows: effective screening and classification of the data are achieved.
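As a rough illustration of rule-based screening, the five preset rule classes can be held in a lookup table and used to keep only the fields they cover. The field names below are hypothetical placeholders; the text names the classes and what they contain but does not define a concrete schema.

```python
from typing import Any, Dict, Tuple

# Hypothetical field names for each preset rule class (not from the source).
PRESET_RULES: Dict[str, Tuple[str, ...]] = {
    "blacklist": ("blacklist", "high_risk_list", "grey_list"),
    "multi_head": ("d90_id_total_applying_orgs", "d180_id_total_applying_orgs",
                   "credit_grants_6m", "credit_grants_24m"),
    "rating": ("credit_score", "fraud_score"),
    "early_warning": ("warning_level",),
    "verification": ("on_network_duration", "on_network_status"),
}


def screen(record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep only fields covered by a preset rule class, grouped by class."""
    screened: Dict[str, Dict[str, Any]] = {}
    for rule_class, fields in PRESET_RULES.items():
        kept = {name: record[name] for name in fields if name in record}
        if kept:  # omit classes with no matching field in this record
            screened[rule_class] = kept
    return screened
```

Grouping the output by rule class gives the later derived-variable step a normalized structure regardless of which third-party source supplied the record.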

According to some embodiments of the invention, the derived-variable processing comprises: existence checks, logical judgment and processing, counting (including deduplicated counting), and other processing indexes; wherein:

the counting, including deduplicated counting, comprises the number of repeated logins for the same account, the volume of customer records applying for credit from the same residential address, and the number of different employer names sharing the same employer telephone number;

the other processing indexes comprise:

The working principle and the beneficial effects of the technical scheme are as follows: derived variables are processed in real time from the collected basic data, using the following processing modes:

(1) Existence check (EXIST):

For example: whether a blacklist entry indicates a serious overdue, whether the user's number is an unassigned number, etc.

(2) Logical judgment and processing:

For example: merging information under the same user ID card number and mobile phone number, and outputting variables after merge logic processing (maximum, minimum, sum, etc.).

(3) Counting, including deduplicated counting (COUNT, COUNT DISTINCT):

For example: the number of repeated logins for the same account, the volume of customer records applying for credit from the same residential address, the number of different employer names sharing the same employer telephone number, etc.

(4) Other processing indexes:

- Time difference calculation, e.g. the time elapsed since the latest application

- Longitude and latitude parsing, e.g. resolving the province from a longitude and latitude, or calculating the straight-line distance between two sets of longitude and latitude data

- Application classification counting, e.g. counting the number of apps of each type installed by the user, according to the app classification labels provided by risk control

(Note: the app classification list must support subsequent manual additions, deletions, and changes.)

- User identity query, e.g. whether the user is a customer, according to the mobile phone number

- Other custom logic, e.g. whether the application was made at night, whether the user applied to a non-bank institution, etc.
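Processing modes (1) to (3) above map naturally onto small helper functions. The sketch below is illustrative only: the record keys (`id_card`, `phone`, `amount`, `employer_phone`, `employer_name`) and the function names are assumptions, not names taken from the text.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def exists(value: Optional[object]) -> bool:
    """EXIST: whether a field (e.g. a serious-overdue flag) is present and truthy."""
    return bool(value)


def merge_amounts(records: Iterable[Dict], id_card: str, phone: str) -> Dict[str, float]:
    """Merge records sharing an ID card number and phone; output max, min, sum."""
    amounts: List[float] = [r["amount"] for r in records
                            if r["id_card"] == id_card and r["phone"] == phone]
    if not amounts:
        return {}
    return {"max": max(amounts), "min": min(amounts), "sum": sum(amounts)}


def login_counts(accounts: Iterable[str]) -> Counter:
    """COUNT: number of logins observed per account."""
    return Counter(accounts)


def distinct_employer_names(records: Iterable[Dict], employer_phone: str) -> int:
    """COUNT DISTINCT: different employer names sharing one employer phone."""
    names = {r["employer_name"] for r in records
             if r["employer_phone"] == employer_phone}
    return len(names)
```

Using a set for the deduplicated count mirrors the COUNT DISTINCT semantics mentioned in mode (3), while `Counter` covers the plain COUNT case.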

Historical data must be retained for the most recent two years. Indexes with no specified query time window are computed over all valid historical data. For derived variables tied to a specific time window, windows longer than 5 days are measured in natural (calendar) days, and windows shorter than 5 days are measured in minutes. This allows different data to be processed in different ways, so that the target data can be obtained accurately.
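One possible reading of the time-window rule is sketched below. Several points are assumptions rather than statements from the text: the retention period is taken as 730 days, a window of exactly 5 days is treated as minute-based, "natural days" is read as aligning the window start to midnight, and all names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

RETENTION = timedelta(days=730)       # "two years", assumed to mean 730 days
MINUTE_THRESHOLD = timedelta(days=5)  # boundary between day and minute windows


def window_start(now: datetime, window: timedelta) -> datetime:
    """Start of a query window, aligned to days or minutes by its length."""
    if window > MINUTE_THRESHOLD:
        # Natural-day window: align to midnight, then step back whole days.
        midnight = datetime.combine(now.date(), datetime.min.time())
        return midnight - timedelta(days=window.days)
    # Minute window: truncate the current time to the minute.
    return now.replace(second=0, microsecond=0) - window


def count_in_window(events: Iterable[datetime], now: datetime,
                    window: Optional[timedelta] = None) -> int:
    """Count events in the window, or in all retained history if none is given."""
    earliest = now - RETENTION
    start = earliest if window is None else max(earliest, window_start(now, window))
    return sum(1 for t in events if start <= t <= now)
```

With this reading, a 7-day window queried at 15:30 starts at 00:00 seven calendar days earlier, whereas a 1-day window starts exactly 24 hours before the current minute.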

The beneficial effects of the above technical scheme are as follows: data security is improved.

According to some embodiments of the invention, the system further comprises a storage module for storing the evaluation result.

and the matching module is used for matching the character string with the sensitive character string in the sensitive database and judging whether sensitive data exist according to a matching result.

The beneficial effects of the above technical scheme are as follows: whether sensitive data is present is determined accurately from the result of matching the character string against the sensitive character strings. A matching degree greater than the preset matching degree indicates that sensitive data is present; otherwise, no sensitive data is present.
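The conversion and matching steps might look like the sketch below. The text does not say how the matching degree is computed, so this example assumes a similarity ratio from Python's standard `difflib.SequenceMatcher`, taken over sliding slices of the serialized source; the sensitive strings and the 0.8 threshold are placeholder values.

```python
import json
from difflib import SequenceMatcher
from typing import Any, List

SENSITIVE_STRINGS: List[str] = ["id_card", "bank_account"]  # placeholder database
PRESET_MATCH_DEGREE = 0.8                                   # assumed threshold


def to_string(source: Any) -> str:
    """Conversion module: serialize the data source into one character string."""
    return json.dumps(source, ensure_ascii=False, sort_keys=True)


def match_degree(text: str, sensitive: str) -> float:
    """Best similarity between `sensitive` and any equally long slice of `text`."""
    n = len(sensitive)
    if n == 0 or len(text) < n:
        return 0.0
    return max(SequenceMatcher(None, text[i:i + n], sensitive).ratio()
               for i in range(len(text) - n + 1))


def has_sensitive_data(source: Any) -> bool:
    """Matching module: True if any sensitive string exceeds the preset degree."""
    text = to_string(source)
    return any(match_degree(text, s) > PRESET_MATCH_DEGREE
               for s in SENSITIVE_STRINGS)
```

The sliding comparison is quadratic in the worst case, so a production system would likely index the sensitive database differently; the sketch only shows the threshold comparison described above.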

As shown in fig. 3, according to some embodiments of the invention, the evaluation module comprises:

a fusion module to:

classifying the code value labels according to different scenes, determining code value labels respectively corresponding to a plurality of scenes of a user, and establishing a binding relationship between each scene and the corresponding code value label to serve as an evaluation vector;

mapping the feature space corresponding to each scene category to obtain a plurality of kernel spaces, wherein the kernel spaces comprise association relations among evaluation vectors;

normalizing the multiple kernel spaces to obtain multiple target kernel spaces;

an establishment module to:

analyzing the credit data to determine a credit score;

The working principle and the beneficial effects of the technical scheme are as follows. The fusion module classifies the code value labels according to different scenes, determines the code value labels corresponding to each of the user's scenes, and establishes a binding relationship between each scene and its code value labels as an evaluation vector; determines the feature space corresponding to each scene category according to the evaluation vectors; maps the feature space of each scene category to obtain a plurality of kernel spaces, which contain the association relations among the evaluation vectors; normalizes the kernel spaces to obtain a plurality of target kernel spaces; acquires the weight coefficient corresponding to each scene category; and fuses the target kernel spaces according to the weight coefficients to obtain a fusion kernel space. This makes it easy to represent the user's code value labels across different scenes and to determine the user's overall evaluation space, namely the fusion kernel space, which represents the user's comprehensive data.
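The normalize-then-fuse steps resemble a weighted combination of kernel (Gram) matrices, and can be sketched as follows. This is an interpretation, not the patented method: the text does not name the kernel function, the normalization, or the fusion rule, so the linear and RBF kernels, cosine-style normalization, and weighted sum below are all assumptions, as are the function names.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vectors = Sequence[Sequence[float]]


def linear_kernel(vectors: Vectors) -> Matrix:
    """Gram matrix of inner products between one scene's evaluation vectors."""
    return [[sum(x * y for x, y in zip(a, b)) for b in vectors] for a in vectors]


def rbf_kernel(vectors: Vectors, gamma: float = 0.5) -> Matrix:
    """Gram matrix using a Gaussian (RBF) kernel."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in vectors] for a in vectors]


def normalize_kernel(K: Matrix) -> Matrix:
    """Scale K so every diagonal entry is 1 (requires a nonzero diagonal)."""
    d = [math.sqrt(K[i][i]) for i in range(len(K))]
    return [[K[i][j] / (d[i] * d[j]) for j in range(len(K))]
            for i in range(len(K))]


def fuse_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of normalized kernels; weights are rescaled to sum to 1."""
    total = sum(weights)
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        Kn = normalize_kernel(K)
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * Kn[i][j]
    return fused
```

Normalizing before fusing keeps scenes with large raw values from dominating the sum, so the weight coefficients alone control each scene's contribution.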
The establishment module acquires a sample code value label set and the credit data corresponding to each sample code value label in the set; screens the sample code value label set to determine a target sample code value label set; determines the corresponding sample fusion kernel space based on the sample code value labels in the target sample code value label set; analyzes the credit data to determine a credit score; establishes a matching relation between the credit score and the sample fusion kernel space, and generates a database of credit scores and sample fusion kernel spaces; establishes sample fusion kernel space protocol dictionaries in different dimensions for the sample fusion kernel spaces in the database; and establishes, based on a regression algorithm, a regression model between the sample fusion kernel space protocol dictionary and the credit score matched to the sample fusion kernel space. The determining module performs classification, identification, and compensation processing on the fusion kernel space according to the regression model and determines the evaluation result. In short, a regression model is built from the sample code value label set and the credit data corresponding to each sample code value label, and the fusion kernel space is then classified, identified, and compensated using that model, so that the evaluation result is determined accurately.

the numerical processing module is used for carrying out numerical processing on a plurality of sample code value labels in the sample code value label set to obtain a data matrix; each sample code value label comprises numerical values corresponding to all parameters in the user credit evaluation parameters and corresponding user data of the corresponding user credit evaluation parameters;

a culling module for:

The working principle of the technical scheme is as follows: the numerical processing module converts the plurality of sample code value labels in the sample code value label set into numerical form to obtain a data matrix; each sample code value label comprises a numerical value for every user credit evaluation parameter, together with the user data corresponding to that parameter. The culling module calculates the data proportion of each parameter in the user credit evaluation parameters according to the data matrix and a first preset algorithm; calculates the user credit data score corresponding to each sample code value label according to the data proportions and a second preset algorithm; compares each user credit data score with a first preset threshold and a second preset threshold; and removes the sample code value labels whose user credit data scores are greater than the first preset threshold and those whose scores are smaller than the second preset threshold, to obtain the target sample code value label set. The first preset threshold is greater than the second preset threshold.

The beneficial effects of the above technical scheme are as follows: the data is preprocessed so that the values of different parameters do not differ too greatly, and sample code value labels with extreme user credit data scores are removed. This improves the accuracy of data screening and ensures the accuracy of the resulting target sample code value label set.

In an embodiment, calculating the data proportion of each parameter in the user credit evaluation parameters according to the data matrix and the first preset algorithm includes:

where $w_j$ is the data proportion of the $j$-th parameter in the user credit evaluation parameters; $P$ is the number of sample code value labels in the sample code value label set; and $X_{i,j}$ is the preprocessed value of the $j$-th user credit evaluation parameter of the $i$-th of the $P$ sample code value labels.

Based on this formula, the data proportion of each parameter in the user credit evaluation parameters is calculated accurately.

Calculating the user credit data score corresponding to each sample code value label according to the data proportions and the second preset algorithm includes:

$$F_i = w_1 X_{i,1} + w_2 X_{i,2} + w_3 X_{i,3} + w_4 X_{i,4}$$

where $F_i$ is the user credit data score corresponding to the $i$-th of the $P$ sample code value labels.

Based on this formula, the user credit data score corresponding to each sample code value label is calculated accurately, which makes the comparison against the first and second preset thresholds more accurate and allows the data to be removed to be identified accurately.
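The scoring and culling steps can be illustrated with a short sketch. The data proportions `w` are taken as given inputs here because the first preset algorithm is not reproduced in the text; the function names and the example thresholds are hypothetical.

```python
from typing import List, Sequence


def credit_data_scores(X: Sequence[Sequence[float]],
                       w: Sequence[float]) -> List[float]:
    """F_i = sum_j w_j * X[i][j] for every sample code value label i."""
    return [sum(wj * xij for wj, xij in zip(w, row)) for row in X]


def cull(X: Sequence[Sequence[float]], w: Sequence[float],
         first_threshold: float, second_threshold: float) -> List[int]:
    """Indices of labels kept after removing scores outside the two thresholds."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must exceed the second threshold")
    scores = credit_data_scores(X, w)
    # Drop scores above the first threshold and scores below the second.
    return [i for i, f in enumerate(scores)
            if second_threshold <= f <= first_threshold]


# Three labels with four preprocessed parameters each, equal proportions.
X = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]
w = [0.25, 0.25, 0.25, 0.25]
kept = cull(X, w, first_threshold=0.9, second_threshold=0.1)
```

In this example the scores are 1.0, 0.5, and 0.0, so only the middle label (index 1) survives; the two extreme labels are removed.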

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A system for assessing user credit in conjunction with a third party data source, comprising:

the acquisition module is used for acquiring a third-party data source;

2. The user credit evaluation system in combination with a third party data source of claim 1, wherein the data processing module comprises:

3. The system of claim 2, wherein the preset rules include a blacklist class, a multi-head class, a rating class, an early warning class, and a verification class; wherein:

the rating class comprises a credit score and a fraud score;

the early warning class comprises an early warning level;

the verification class comprises an on-network duration and an on-network status.

4. The system of claim 2, wherein the derived-variable processing comprises: existence checks, logical judgment and processing, counting (including deduplicated counting), and other processing indexes; wherein:

the logical judgment and processing comprises merging information under the same user ID card number and mobile phone number, and outputting variables after merge logic processing; the logic processing includes determining at least one of a maximum value, a minimum value, or a sum;

the other processing indexes comprise:

user identity query, including querying whether the user is a customer according to the mobile phone number;

other custom logic, including whether the application was made at night and whether the user applied to a non-bank institution.

5. The user credit evaluation system in combination with a third-party data source of claim 1, wherein the data processing module further comprises:

6. The system of claim 1, wherein the acquisition module comprises a plurality of data source interfaces, each used for receiving a different type of third-party data source.

7. The system of claim 1, further comprising a storage module for storing the results of the evaluation.

8. The user credit assessment system in combination with a third party data source of claim 5, wherein the desensitization module comprises:

9. The system of claim 1, wherein the evaluation module comprises:

a fusion module to:

normalizing the multiple kernel spaces to obtain multiple target kernel spaces;

an establishment module to:

determining a corresponding sample fusion kernel space based on sample code value labels in a target sample code value label set;

analyzing the credit data to determine a credit score;

10. The system of claim 9, wherein the establishment module comprises:

the numerical processing module is used for carrying out numerical processing on a plurality of sample code value labels in the sample code value label set to obtain a data matrix; each sample code value label comprises numerical values corresponding to all parameters in the user credit evaluation parameters and user data corresponding to the corresponding user credit evaluation parameters;

a culling module for: