CN113886372A - User portrait construction method based on improved analytic hierarchy process - Google Patents


Info

Publication number
CN113886372A
Authority
CN
China
Legal status
Pending
Application number
CN202111047789.2A
Other languages
Chinese (zh)
Inventor
郭长营
崔乐乐
杨宝华
Current Assignee
Tianyuan Big Data Credit Management Co Ltd
Original Assignee
Tianyuan Big Data Credit Management Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The invention provides a user portrait construction method based on an improved analytic hierarchy process (AHP), in the technical field of user portraits. The improved process combines AHP with the xgboost algorithm: an initial judgment matrix is built from the index importance scores produced by xgboost feature-importance analysis, the matrix is adjusted using expert experience, and the weight fitted for each index after adjustment is taken as that index's final weight. As a result, the weights of the indexes in each portrait dimension are more reasonable and reliable, and each dimension of the user portrait is evaluated more objectively and fairly.

Description

User portrait construction method based on improved analytic hierarchy process
Technical Field
The invention relates to the technical field of user portraits, and in particular to a user portrait construction method based on an improved analytic hierarchy process.
Background
With the rapid development of the internet, people's daily work and living habits have changed fundamentally. Internet technology has spread into every industry at great speed, the financial industry in particular, and the traditional financial industry faces a serious test from the imbalance and asymmetry among big data, the internet and user experience. Big-data risk control has become one of the big-data applications that practitioners focus on most, and financial risk-control models are an inevitable requirement for internet finance to develop healthily and transparently. The current internet financial credit system is still imperfect, and the credit reporting system and related laws have certain shortcomings. Extracting users' static personal data and behavioral data from the internet for analysis and modeling, that is, user portrait research for credit risk control, is therefore the core of a risk-control model.
User portraits are often built with rules, the analytic hierarchy process and similar methods. The analytic hierarchy process (AHP) is a decision-making method that decomposes the elements relevant to a decision into levels such as goals, criteria and alternatives, and then performs qualitative and quantitative analysis on that basis. It is a hierarchical weighting method for decision analysis, proposed in the early 1970s by applying network system theory and multi-objective comprehensive evaluation.
In AHP, the decision problem is decomposed into a hierarchy consisting of the overall goal, the sub-goals at each level, the evaluation criteria and the specific alternatives. The eigenvector of each judgment matrix is then solved to obtain the priority weight of each element of a level with respect to an element of the level above. Finally, weighted sums are taken level by level to obtain the final weight of each alternative with respect to the overall goal; the alternative with the largest weight is the optimal one.
The analytic hierarchy process is widely applied to management decisions in economics, science and technology, culture, the military, the environment and even social development.
It is commonly used for comprehensive evaluation, selection among decision alternatives, estimation and prediction, allocation of inputs and similar problems.
Solving a problem with AHP can be roughly divided into four steps:
1. Establish the hierarchical structure of the problem. A complex problem is first broken down into components called elements, which are grouped by their attributes into different levels. An element at one level acts as a criterion governing some elements at the next level, while itself being governed by an element at the level above; this top-down governing relationship forms the hierarchy. The top level typically has only one element, which is usually the predetermined goal or desired result of the problem being analyzed.
2. Construct pairwise comparison judgment matrices.
3. Calculate the relative weights of the compared elements from each judgment matrix.
4. Calculate the combined weights of the elements at each level.
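Step 4 (hierarchical synthesis) can be illustrated with a three-level hierarchy of goal, criteria and alternatives: the combined weight of an alternative is the sum, over all criteria, of the criterion's weight times the alternative's local weight under that criterion. The Python sketch below assumes the local weights were already obtained from the judgment matrices in step 3; the function name, criteria, alternatives and numbers are illustrative only and do not come from the patent.

```python
from typing import Dict


def combine_weights(
    criterion_weights: Dict[str, float],
    local_weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Hierarchical synthesis for a goal -> criteria -> alternatives hierarchy.

    criterion_weights: weight of each criterion with respect to the goal.
    local_weights: for each criterion, the weight of each alternative
        with respect to that criterion.
    Returns the combined weight of each alternative with respect to the goal.
    """
    combined: Dict[str, float] = {}
    for criterion, c_weight in criterion_weights.items():
        for alternative, a_weight in local_weights[criterion].items():
            combined[alternative] = combined.get(alternative, 0.0) + c_weight * a_weight
    return combined


if __name__ == "__main__":
    # Illustrative numbers only.
    criteria = {"cost": 0.6, "quality": 0.4}
    local = {"cost": {"A": 0.7, "B": 0.3}, "quality": {"A": 0.4, "B": 0.6}}
    result = combine_weights(criteria, local)
    print(result, max(result, key=result.get))  # alternative A wins
```

When every weight vector sums to 1, the combined weights also sum to 1, so the alternative with the largest combined weight is the preferred one.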
In the commonly used AHP, the relative importance of the indexes within each dimension is given by human experience, so the weight coefficient of each index is overly subjective and the dimensions of a user cannot be evaluated objectively.
Disclosure of Invention
To solve these technical problems, the invention provides a user portrait construction method based on an improved analytic hierarchy process, so that the weights corresponding to the indexes of each portrait dimension are more reasonable and reliable and each dimension of the user portrait is evaluated more objectively and fairly.
The technical scheme of the invention is as follows:
A user portrait construction method based on an improved analytic hierarchy process mainly comprises the following steps: starting from multi-source data, different original fields are aligned by comparing the data, and a standard database is established through analysis and fusion of the multi-source heterogeneous data; information useful for the user portrait is obtained from high-dimensional data in the established standard database; a user portrait index system is screened and established based on the standard database, and the model-input features of the user portrait are screened and formed through data cleaning, invalid-value processing, same-value statistics, missing-value statistics, collinearity analysis, outlier detection, missing-value imputation and other processes; users are labeled based on the model-input features so that index importance can be judged later; xgboost feature-importance analysis is performed; the remaining indexes are divided by their characteristics into portrait dimensions such as basic ability, performance ability and debt-paying ability; an AHP judgment matrix is constructed for each dimension from the xgboost index scores; the judgment matrices are checked for consistency; and a score is constructed for each dimension.
Further:
Establishing the standard database: the multi-source data comprise three sources, namely department data, internet data and third-party data, from which a standard database is established through data aggregation, fusion, comparison and similar operations;
Constructing the index system: an index system for the user portrait is constructed based on the established multi-source data standard library and the business meaning of the data;
Establishing the user portrait model: after feature engineering and feature screening, such as data cleaning, invalid-value processing, same-value statistics, missing-value statistics and collinearity analysis, the modeling indexes are used to establish the user portrait model. Feature screening has two parts: threshold-based screening on missing-value and same-value rates during feature engineering, and screening based on perturbation feature importance during model building. After the final model-input features are determined, the importance ranking and score of each index are obtained from the xgboost feature-importance ranking. The indexes are classified into dimensions according to their characteristics; for the indexes of each dimension, an initial judgment matrix is constructed from the xgboost feature-importance scores, the matrix is checked for consistency, and the index weight coefficients are calculated. Finally, scoring intervals are defined for each index, the user portrait model is constructed, and the score of each dimension is output;
Converting user portrait scores into ratings: the indexes are classified by meaning into five dimensions, namely basic ability, user stability, performance ability, debt-paying ability and development ability, and each dimension is processed through the model-building steps above to obtain each user's score in every portrait dimension. In this part, statistics are computed on the scores of each portrait dimension, each sample is assigned a dimension grade according to quantiles, and the grades are divided into five levels: A, B, C, D and E.
Further:
Steps of constructing the standard database:
Processing multi-source heterogeneous data: the multi-source data come from enterprises, departments, the internet and other sources, and include structured and semi-structured data, both stock (historical) data and data provided through APIs (application programming interfaces). Semi-structured data undergo text processing, data extraction and structuring to form structured data for warehousing. A unified data standard is established to manage the warehoused multi-source data; storable data such as internet data are pulled periodically, real-time interface data are processed in memory, and data processing, data standardization and light feature mining are carried out in a combined batch and stream processing mode;
Data fusion: the multi-source data from the three parties are fused horizontally and vertically, and complementary, redundant and overlapping data are merged into a unified database through different fusion strategies. After fusion, the database stores the standard data, the derived index database, the feature database and other information.
Further:
The two-part composition of the index system and the steps for constructing it:
an index system for the user portrait is constructed from big data based on the established multi-source data standard library.
Still further, the method comprises the steps of:
xgboost importance analysis based on the constructed labels: the invention selects a subset of more definite indexes to label users, for example whether the person is deceased, whether the enterprise has been deregistered, the tax payment grade and tax arrears information. The optimal values of the minimum loss reduction required for a node split, the fraction of samples used to build each tree, the maximum tree depth, the fraction of features used to build each tree and the minimum sum of instance weights in a leaf node are found by cross-validation; the xgboost model is trained repeatedly with the tuned parameters, and the importance ranking and importance score of each index are output.
Constructing the initial judgment matrix: xgboost importance analysis is performed on the constructed labels to obtain index importance scores. First, the indexes are divided by their properties into five dimensions: basic ability, user stability, performance ability, debt-paying ability and development ability. A judgment matrix is then established for each dimension, its size equal to the number of indexes in that dimension; within a dimension, the ratio of the importance scores of two indexes is used as the element at the corresponding position, forming the initial judgment matrix.
Matrix consistency check: the judgment matrix of each dimension is checked for consistency. When CR > 0.1, the matrix fails the check and the importance of some indexes may be adjusted; when CR ≤ 0.1, the current judgment matrix is accepted and the weight of each index is solved by the arithmetic mean method, the geometric mean method or the eigenvalue method. In the arithmetic mean method, the judgment matrix is first normalized by columns and each row of the normalized matrix is then averaged to obtain the weight coefficients.
Training sample interval conversion: each transformed model-input index is divided into intervals at its 20th, 40th, 60th and 80th percentiles. Indexes with positive meaning receive initial scores of 20, 40, 60, 80 and 100 according to the interval, and indexes with negative meaning are scored in reverse order.
Forming the user portrait score: after the above steps, each index of a portrait dimension has an initial score (initial_score) and a weight coefficient (weight) calculated by the arithmetic mean method; the portrait score of the dimension is SCORE = Σ initial_score × weight over the indexes of that dimension.
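The labeling step above names the kinds of definite facts used as labels (deceased persons, deregistered enterprises, tax payment grade, tax arrears) but not the record layout or how the facts are combined. A minimal Python sketch, assuming a hypothetical `LabelFacts` record, a hypothetical set of "poor" tax grades, and a logical OR that turns any negative fact into a bad-sample label of 1:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical set of tax payment grades treated as negative; the patent
# does not say which grades count.
POOR_TAX_GRADES: FrozenSet[str] = frozenset({"D"})


@dataclass
class LabelFacts:
    """Hypothetical record of the definite facts used for labeling."""
    is_deceased: bool
    is_deregistered: bool
    tax_grade: Optional[str]
    has_tax_arrears: bool


def make_label(facts: LabelFacts, poor_grades: FrozenSet[str] = POOR_TAX_GRADES) -> int:
    """Return 1 (bad sample) if any negative fact holds, else 0.

    Combining the facts with a logical OR is an illustrative assumption.
    """
    return int(
        facts.is_deceased
        or facts.is_deregistered
        or facts.has_tax_arrears
        or (facts.tax_grade in poor_grades)
    )
```

The resulting 0/1 labels are what the xgboost importance analysis would be trained against.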
The user portrait rating conversion comprises the following steps:
the indexes are classified by meaning into five dimensions, namely basic ability, user stability, performance ability, debt-paying ability and development ability, and each dimension is processed through the above steps to obtain each user's score in every portrait dimension. In this part, statistics are computed on the scores of each portrait dimension, each sample is assigned a dimension grade according to quantiles, and the grades are divided into five levels, A, B, C, D and E, which form the final user portrait evaluation result.
The advantages of the invention are as follows:
1. Unlike the traditional AHP approach to building the initial matrix, the invention calculates the importance score of each index with xgboost and uses the ratio of the importance scores of two indexes as the corresponding element of the judgment matrix;
2. During feature extraction the indexes undergo correlation and collinearity analysis, and xgboost importance analysis is performed before the judgment matrix is constructed. This feature processing extracts the indexes that are key to the user portrait, so the model results are more accurate and reliable, user quality is identified effectively, correct judgments are made, and help and guidance are provided for financial credit;
3. Because the improved AHP obtains the initial judgment matrix from xgboost scores, the process is more efficient. When a user portrait uses many indexes, dozens or even hundreds, judging their importance by manual experience is difficult and time-consuming, whereas the initial judgment matrix can be built this way quickly and accurately;
4. With the convergence of massive data and continuing technological progress, the analytic hierarchy process has been applied in many fields, including submarine cable condition assessment, air quality assessment and performance evaluation.
Drawings
FIG. 1 is a schematic radar chart of the dimensions of a user portrait according to the invention;
FIG. 2 is a flow chart of user portrait model building according to the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of the invention.
The user portrait construction method based on an improved analytic hierarchy process mainly comprises the following steps: starting from multi-source data, different original fields are aligned by comparing the data, and a standard database is established through analysis and fusion of the multi-source heterogeneous data; information useful for the user portrait is obtained from high-dimensional data in the established standard database; a user portrait index system is screened and established based on the standard database, and the model-input features of the user portrait are screened and formed through data cleaning, invalid-value processing, same-value statistics, missing-value statistics, collinearity analysis, outlier detection, missing-value imputation and other processes; users are labeled based on the model-input features so that index importance can be judged later; xgboost feature-importance analysis is performed; the remaining indexes are divided by their characteristics into portrait dimensions such as basic ability, performance ability and debt-paying ability; an AHP judgment matrix is constructed for each dimension from the xgboost index scores; the judgment matrices are checked for consistency; and a score is constructed for each dimension.
1. Establishing a user standard database by analyzing and fusing multi-source heterogeneous data
The user's multi-source heterogeneous data cover government data, internet data and third-party data; the third-party data include information such as external guarantees, water and electricity consumption, equity pledges, land mortgages and transfers, and pledges by major shareholders. The multi-source data comprise warehoused stock data and API data, structured data such as basic information, changes, blacklists and identification information, and semi-structured data such as announcements.
Semi-structured data are converted into structured data by text processing, data extraction and structuring before being warehoused. For the warehoused structured data, a unified standard data table is established by data alignment and fusion comparison: for information of a given dimension, a standard table covering multiple ranges is created, and the data from different sources are fused and compared to build a unified standard data set. Fusion between data sources mainly involves aligning and merging data that complement each other, de-duplicating redundant data, and selecting the data of better quality. The fused and compared multi-dimensional data form a standard data set that is stored in the standard database.
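The alignment and fusion rules above (rename source fields to standard names, merge complementary data, de-duplicate redundant data, prefer the better-quality source) can be sketched in Python. Everything concrete here is hypothetical: the source names, the field mapping, the choice of a credit code as the join key, and the assumed quality ranking of sources.

```python
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical mapping from each source's original field names to the
# standard field names of the unified table (field alignment).
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "department": {"uscc": "credit_code", "ent_name": "name", "reg_capital": "capital"},
    "internet": {"code": "credit_code", "company": "name", "phone": "phone"},
    "third_party": {"credit_no": "credit_code", "pledge_cnt": "equity_pledges"},
}

# Assumed quality ranking: when sources overlap, the first one listed wins.
SOURCE_PRIORITY: List[str] = ["department", "third_party", "internet"]


def align(source: str, record: Record) -> Record:
    """Rename a raw record's fields to standard names, dropping unmapped ones."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def fuse(raw: Dict[str, List[Record]], key: str = "credit_code") -> Dict[object, Record]:
    """Merge aligned records per entity key.

    Sources are visited in priority order, so a field is only filled if it is
    still missing: complementary fields are added, while redundant or
    overlapping fields keep the higher-priority value.
    """
    fused: Dict[object, Record] = {}
    for source in SOURCE_PRIORITY:
        for record in raw.get(source, []):
            row = align(source, record)
            entity = fused.setdefault(row[key], {})
            for field, value in row.items():
                if value is not None and entity.get(field) is None:
                    entity[field] = value
    return fused
```

For example, a department record and an internet record for the same credit code end up as one row whose `name` comes from the department data and whose `phone` comes from the internet data.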
2. Building an index system for the user portrait
An index system for the user portrait is constructed based on the established multi-source data standard library. The data used to construct the indexes come mainly from department data; in addition, relational data are used to construct some of the indexes.
3. Building the user portrait model
After feature engineering and feature screening, such as data cleaning, invalid-value processing, same-value statistics, missing-value statistics and collinearity analysis, the modeling indexes are used to establish the user portrait model. Feature screening has two parts: threshold-based screening on missing-value and same-value rates during feature engineering, and screening based on perturbation feature importance during model building. After the final model-input features are determined, the importance ranking and score of each index are obtained from the xgboost feature-importance ranking. The indexes are classified into dimensions according to their characteristics; for the indexes of each dimension, an initial judgment matrix is constructed from the xgboost feature-importance scores, the matrix is checked for consistency, and the weight coefficient of each index is calculated. Finally, scoring intervals are defined for each index, the user portrait model is constructed, and the score of each dimension is output.
3.1 Feature engineering
First, invalid values in the model-input indexes are processed and some quantifiable indexes are converted to numerical values. Missing-value statistics are then computed, and indexes with more than 75% missing values are removed. Next, the same-value rate of the remaining indexes is computed: features with only one distinct value are removed, as are indexes whose same-value rate exceeds 80%. Finally, VIF (variance inflation factor) collinearity analysis is performed on the indexes that survive the missing-value and same-value screening, and collinear indexes are removed.
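The screening rules above translate into a short standard-library Python sketch. The 75% and 80% thresholds come from the text; the rest are assumptions: the same-value rate is taken as the share of the most frequent value among non-missing entries, the VIF step expects complete (already imputed) columns, and because the text gives no VIF cut-off, the commonly used value of 10 is assumed. VIF is computed as the diagonal of the inverse correlation matrix, which equals 1 / (1 - R²) from regressing each index on the others.

```python
import math
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def missing_rate(col: Column) -> float:
    return sum(v is None for v in col) / len(col)


def same_value_rate(col: Column) -> float:
    """Share of the most frequent value among non-missing entries (assumed definition)."""
    present = [v for v in col if v is not None]
    if not present:
        return 1.0
    counts: Dict[float, int] = {}
    for v in present:
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values()) / len(present)


def screen(columns: Dict[str, Column], max_missing: float = 0.75,
           max_same: float = 0.80) -> Dict[str, Column]:
    """Drop indexes that are >75% missing, single-valued, or >80% one value."""
    kept = {}
    for name, col in columns.items():
        distinct = {v for v in col if v is not None}
        if missing_rate(col) > max_missing or len(distinct) <= 1:
            continue
        if same_value_rate(col) > max_same:
            continue
        kept[name] = col
    return kept


def _corr(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _inverse(m: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            raise ValueError("singular correlation matrix: perfectly collinear indexes")
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[c])]
    return [row[n:] for row in a]


def vif(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """VIF of each index = diagonal entry of the inverse correlation matrix."""
    names = list(columns)
    corr = [[_corr(columns[a], columns[b]) for b in names] for a in names]
    inv = _inverse(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def drop_collinear(columns: Dict[str, List[float]],
                   threshold: float = 10.0) -> Dict[str, List[float]]:
    """Repeatedly remove the index with the largest VIF above the threshold."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = vif(cols)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del cols[worst]
    return cols
```

Exact duplicates make the correlation matrix singular, so this sketch raises an error for them; they would need to be removed before the VIF step.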
3.2 xgboost importance analysis based on the constructed labels
The importance of the indexes of the labeled samples is analyzed with xgboost, and the initial judgment matrix is then constructed from the resulting index importance scores. The xgboost training parameters are tuned as follows:
(1) gamma, the minimum loss reduction required to split a node: the optimal value is searched in the interval [0, 1/3] with a step of 3/1000;
(2) subsample, the fraction of samples used to build each tree: searched in [1/2, 7/10] with a step of 1/500;
(3) max_depth, the maximum depth of a tree: searched in [3, 7];
(4) colsample_bytree, the fraction of features used to build each tree: searched in [1/2, 7/10] with a step of 1/250;
(5) min_child_weight, the minimum sum of instance weights in a leaf node: searched in [1/4, 1/2] with a step of 3/1000;
(6) all remaining parameters keep their default values.
The model outputs the importance ranking and importance score of each selected index.
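The search spaces listed above can be written down directly. The text says the values are found by cross-validation but not whether the parameters are searched jointly or one after another. A full grid would be about 2.4 × 10^8 combinations, so this Python sketch assumes a one-parameter-at-a-time (coordinate) search instead. `score` is a placeholder you would implement yourself, for instance by running xgboost's `xgboost.cv` with the candidate parameters and returning the mean validation metric. The helper names `grid`, `SEARCH_SPACE` and `coordinate_search` are illustrative.

```python
from fractions import Fraction
from typing import Callable, Dict, List


def grid(start: Fraction, stop: Fraction, step: Fraction) -> List[float]:
    """Inclusive grid; Fraction arithmetic avoids float drift at the end point."""
    values: List[float] = []
    v = start
    while v <= stop:
        values.append(float(v))
        v += step
    return values


F = Fraction
# Intervals and steps as listed in 3.2; parameter names follow xgboost.
SEARCH_SPACE: Dict[str, List[float]] = {
    "gamma": grid(F(0), F(1, 3), F(3, 1000)),
    "subsample": grid(F(1, 2), F(7, 10), F(1, 500)),
    "max_depth": [3, 4, 5, 6, 7],
    "colsample_bytree": grid(F(1, 2), F(7, 10), F(1, 250)),
    "min_child_weight": grid(F(1, 4), F(1, 2), F(3, 1000)),
}


def coordinate_search(score: Callable[[Dict[str, float]], float],
                      base: Dict[str, float]) -> Dict[str, float]:
    """Tune one parameter at a time in SEARCH_SPACE order, keeping the best
    value found so far. `score` must return a cross-validated metric where
    higher is better."""
    params = dict(base)
    for name, candidates in SEARCH_SPACE.items():
        best_value, best_score = params.get(name), float("-inf")
        for value in candidates:
            s = score({**params, name: value})
            if s > best_score:
                best_value, best_score = value, s
        params[name] = best_value
    return params
```

This sweep costs 353 cross-validation runs instead of a full grid. After the final model is trained, per-index importance scores can be read with xgboost's `Booster.get_score(importance_type=...)`; the text does not state which importance type it uses.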
3.3 User portrait modeling
The invention constructs the initial judgment matrix as follows: an initial judgment matrix is established from the index importance scores given by xgboost and adjusted with expert experience; the matrix is then checked for consistency and the index weight coefficients are calculated.
3.3.1 Constructing the initial judgment matrix
(1) Index importance scores are obtained from the xgboost importance analysis of 3.2 on the constructed labels, and the 18 selected indexes are ranked by score as s_01 > s_02 > … > s_18;
(2) the indexes are divided by meaning into five dimensions, namely basic ability, user stability, performance ability, debt-paying ability and development ability, containing 4, 4, 3, 4 and 3 indexes respectively;
(3) within each dimension, the ratio of the importance scores of two indexes is used as the element at the corresponding position of the judgment matrix (a_ij = s_i / s_j), forming the initial judgment matrix.
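Building one ratio matrix per dimension is a few lines of Python. In this sketch the function names, index names and scores are illustrative assumptions, and a zero importance score is treated as an error: xgboost reports no importance for features that are never used in a split.

```python
from typing import Dict, List

Matrix = List[List[float]]


def ratio_matrix(scores: List[float]) -> Matrix:
    """Initial judgment matrix with a_ij = s_i / s_j."""
    if any(s <= 0 for s in scores):
        # Indexes with no xgboost importance need a floor value or removal first.
        raise ValueError("importance scores must be positive")
    return [[si / sj for sj in scores] for si in scores]


def dimension_matrices(scores: Dict[str, float],
                       dimensions: Dict[str, List[str]]) -> Dict[str, Matrix]:
    """One initial judgment matrix per portrait dimension."""
    return {dim: ratio_matrix([scores[name] for name in names])
            for dim, names in dimensions.items()}
```

Because a_ij · a_jk = a_ik holds exactly for such a matrix, it is perfectly consistent (λ_max = n, so CR = 0). Any inconsistency flagged by the later check can only come from the expert adjustments.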
3.3.2 Matrix consistency check
(1) The judgment matrices of the five dimensions from 3.3.1 are checked for consistency. When CR > 0.1, the matrix fails the check and can be adjusted by returning to step (3) of 3.3.1; when CR ≤ 0.1, the process continues to the next step;
Note: the consistency ratio is calculated as
$$CR = \frac{CI}{RI}, \qquad CI = \frac{\lambda_{\max} - n}{n - 1}$$
where CR is the consistency ratio, CI is the consistency index, RI is the average random consistency index, obtained from a lookup table according to the order of the matrix, λ_max is the maximum eigenvalue (characteristic root) of the judgment matrix, and n is the order of the judgment matrix.
(2) Once the judgment matrix passes the consistency check, the weight of each index is calculated by the arithmetic mean method: the judgment matrix is first normalized by columns, and each row of the normalized matrix is then averaged to give the weight coefficient. Alternatively, the weights may be obtained by the geometric mean method or the eigenvalue method.
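The consistency check and the three weighting methods named in (2) can be sketched together in Python. The sketch assumes Saaty's standard table of average random consistency indices RI for matrix orders 1 to 10. It finds the principal eigenvector by power iteration, estimates λ_max from it, and applies CR = CI / RI with CI = (λ_max - n) / (n - 1). Function names are illustrative.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]

# Saaty's average random consistency index RI for matrix orders 1-10.
RANDOM_INDEX: Dict[int, float] = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                                  6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def weights_arithmetic(a: Matrix) -> List[float]:
    """Normalize each column to sum to 1, then average each row."""
    n = len(a)
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return [sum(a[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def weights_geometric(a: Matrix) -> List[float]:
    """Geometric mean of each row, normalized to sum to 1."""
    n = len(a)
    g = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(g)
    return [x / total for x in g]


def weights_eigen(a: Matrix, iterations: int = 1000, tol: float = 1e-12) -> List[float]:
    """Principal eigenvector by power iteration, normalized to sum to 1."""
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        if max(abs(x - y) for x, y in zip(v, w)) < tol:
            return v
        w = v
    return w


def lambda_max(a: Matrix, w: List[float]) -> float:
    """Estimate the largest eigenvalue as the mean of (A w)_i / w_i."""
    n = len(a)
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(aw[i] / w[i] for i in range(n)) / n


def consistency_ratio(a: Matrix) -> float:
    n = len(a)
    if n <= 2:
        return 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    ci = (lambda_max(a, weights_eigen(a)) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

A matrix is accepted when `consistency_ratio(a) <= 0.1`. For a perfectly consistent matrix, all three weighting methods return the same vector.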
3.3.3 Data verification and transformation
(1) Each model-input index of the training sample is explored by calculating its skewness and kurtosis, which measure the asymmetry and tail heaviness of the data distribution;
(2) indexes whose skewness is greater than 3 and whose kurtosis is greater than 3 are log-transformed.
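The text fixes both thresholds at 3 but does not define skewness and kurtosis precisely or say how zeros are handled. This Python sketch therefore makes three assumptions. It uses the population moment skewness. It uses non-excess kurtosis by default, for which a normal distribution scores 3; pandas and SciPy report excess kurtosis by default, which is what the `excess` flag switches to. Finally, it applies `log1p` and accepts only non-negative values. Single-valued columns, which would divide by zero here, were already removed by the same-value screening.

```python
import math
from typing import List, Tuple


def _central_moment(x: List[float], k: int) -> float:
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)


def skewness(x: List[float]) -> float:
    """Population moment skewness m3 / m2**1.5."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x: List[float], excess: bool = False) -> float:
    """Moment kurtosis m4 / m2**2 (3 for a normal distribution)."""
    k = _central_moment(x, 4) / _central_moment(x, 2) ** 2
    return k - 3.0 if excess else k


def maybe_log_transform(x: List[float], skew_limit: float = 3.0,
                        kurt_limit: float = 3.0,
                        excess: bool = False) -> Tuple[List[float], bool]:
    """Apply log1p when both skewness and kurtosis exceed their limits."""
    if skewness(x) > skew_limit and kurtosis(x, excess) > kurt_limit:
        if min(x) < 0:
            raise ValueError("this sketch only log-transforms non-negative indexes")
        return [math.log1p(v) for v in x], True
    return list(x), False
```

Returning the flag alongside the values makes it easy to record which indexes were transformed, so the same transform can be applied to new samples before interval conversion.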
3.3.4 Training sample interval conversion
(1) Each transformed model-input index is divided into intervals at its 20th, 40th, 60th and 80th percentiles;
(2) indexes with positive meaning are assigned 20, 40, 60, 80 and 100 points according to the interval, and indexes with negative meaning are assigned the same points in reverse order;
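The interval conversion can be sketched with Python's `statistics.quantiles` and `bisect`. The interpolation method (`"inclusive"`) and the rule that a value equal to a cut point falls into the lower interval are assumptions; the text does not specify either. The cut points are computed on the training sample and then reused for any new sample.

```python
import bisect
import statistics
from typing import List


def quintile_cuts(values: List[float]) -> List[float]:
    """20th, 40th, 60th and 80th percentiles of one index on the training sample."""
    return statistics.quantiles(values, n=5, method="inclusive")


def interval_score(value: float, cuts: List[float], positive: bool = True) -> int:
    """Map a value to 20/40/60/80/100 by interval; reversed for negative indexes.

    bisect_left places a value equal to a cut point in the lower interval.
    """
    k = bisect.bisect_left(cuts, value)  # interval index 0..4
    return 20 * (k + 1) if positive else 100 - 20 * k
```

For the values 1 to 10, the cut points are 2.8, 4.6, 6.4 and 8.2. The value 1 then scores 20 on a positive index and 100 on a negative one.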
3.3.5 User portrait score formation
After the above steps, each index of a portrait dimension has an initial score, and the index weight coefficients have been calculated by the arithmetic mean method; the portrait score of the dimension is the sum, over its indexes, of the initial score initial_score multiplied by the weight coefficient weight.
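The dimension score is a weighted sum. This minimal Python sketch assumes the weights of one dimension are normalized to sum to 1. Under that assumption, the score stays on the same 20 to 100 scale as the initial scores. The function name and the index names in the example are illustrative.

```python
from typing import Dict


def dimension_score(initial_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """SCORE = sum of initial_score * weight over the indexes of one dimension."""
    if abs(sum(weights.values()) - 1.0) > 1e-6:
        raise ValueError("weights of a dimension should sum to 1")
    return sum(initial_scores[name] * w for name, w in weights.items())
```

For example, `dimension_score({"a": 100, "b": 20}, {"a": 0.75, "b": 0.25})` gives 80.0.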
4. User portrait rating transformation
In the above steps, the indexes are classified by meaning into five dimensions, namely basic ability, user stability, performance ability, debt-paying ability and development ability, and each dimension is processed through the steps of section 3 to obtain each user's score in every portrait dimension. In this part, statistics are computed on the scores of each portrait dimension, each sample is assigned a dimension grade according to quantiles, and the grades are divided into five levels: A, B, C, D and E.
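The text gives five grades but not the quantile cut points or the grade order. This Python sketch assumes cuts at the 20th, 40th, 60th and 80th percentiles, matching the interval conversion, and assumes that A is the best grade and goes to the highest scores. The function name is illustrative.

```python
import bisect
import statistics
from typing import Dict

# Lowest interval -> E, highest -> A (assumed ordering).
GRADES = "EDCBA"


def assign_grades(scores: Dict[str, float]) -> Dict[str, str]:
    """Grade every user's score in one portrait dimension by its quintile."""
    cuts = statistics.quantiles(list(scores.values()), n=5, method="inclusive")
    return {user: GRADES[bisect.bisect_left(cuts, s)] for user, s in scores.items()}
```

Running this once per dimension gives each user five letter grades, which can also be plotted as the radar chart mentioned below.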
In addition, a radar chart can be plotted from the scores of the dimensions to give a clearer picture of the user.
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (9)

1. A user portrait construction method based on an improved analytic hierarchy process, characterized by comprising:
1) based on the user's multi-source data, aligning different original fields through fusion and comparison among the data, and establishing a standard database by analyzing and fusing the multi-source heterogeneous data;
2) obtaining information useful for the user portrait from high-dimensional data based on the established standard database;
3) screening and establishing a user portrait index system based on the standard database, and forming the model-input features of the user portrait through screening by data cleaning, invalid-value processing, same-value statistics, missing-value statistics, collinearity analysis, outlier detection and missing-value imputation;
4) labeling users based on the model-input features for the subsequent judgment of index importance;
5) performing xgboost feature-importance analysis; dividing the remaining indexes into portrait dimensions according to their characteristics; constructing an AHP judgment matrix for each dimension from the xgboost index scores; checking the judgment matrices for consistency; and constructing the score of each dimension and giving the user's rating.
2. The method of claim 1,
wherein an index system for the user portrait is constructed based on the established multi-source data standard library and the business meaning of the data.
3. The method of claim 2,
User portrait model establishment: after data cleaning, invalid value processing, same-value statistics, missing value statistics, collinearity analysis and feature screening, the model-entry indexes are used to establish the user portrait model.
The feature screening comprises two parts: threshold-based feature screening during the missing-value and same-value statistics of the feature engineering process, and feature screening based on perturbation feature importance during construction of the user portrait model. After the final model-entry features are determined, the importance ranking and score of each index are determined by xgboost feature importance ranking. The indexes are then classified into dimensions according to their characteristics. For the indexes of each dimension, an initial judgment matrix is constructed from the xgboost feature importance scores, the matrix is checked for consistency, and the index weight coefficients are calculated. Finally, scoring intervals are divided for each index, the user portrait model is constructed, and the score of each dimension is output.
4. The method of claim 3,
wherein the user portrait rating conversion step comprises: classifying the indexes into five dimensions according to their meaning, and processing each dimension to finally obtain the score of each user in each portrait dimension.
5. The method of claim 4,
wherein, based on score statistics for each portrait dimension, each sample is assigned a portrait dimension grade according to quantiles, the grades being set as five levels: A, B, C, D and E.
6. The method of claim 5,
wherein the standard database construction comprises:
Processing of multi-source heterogeneous data: the multi-source data include structured and semi-structured data, covering both stock (historical) data and data provided through APIs (application programming interfaces). Semi-structured data undergo text processing, data extraction and structuring to form structured warehoused data. The warehoused multi-source data are managed under a unified data standard specification. Internet data can be stored and pulled periodically, and real-time interface data are processed in memory. Data processing, data standardization and lightweight feature mining are performed in combination with batch and stream processing.
Data fusion: the third-party multi-source data are fused horizontally and vertically. Complementary, redundant and overlapping data from multiple sources are fused and converged into a unified database through different fusion strategies. The database stores the standard database data, the index database obtained by processing, and the feature database information produced after multi-source fusion.
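Field alignment and fusion can be illustrated with a minimal sketch. The text does not specify the fusion strategies, so the approach here is an assumption: source field names are mapped to standard names through an alias table, and records are merged per user. Sources are ordered by priority, so for overlapping fields the first non-null value wins, and complementary fields are filled from lower-priority sources. The alias table, field names, sample values and function names are all hypothetical.

```python
# Hypothetical mapping from source-specific field names to standard field names.
FIELD_ALIASES = {"mobile": "phone", "tel": "phone", "uid": "user_id", "id_no": "user_id"}


def standardize(record):
    """Rename source fields to their standard names (field alignment)."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def fuse(sources):
    """Merge records per user; sources are ordered from highest to lowest priority."""
    fused = {}
    for records in sources:
        for record in map(standardize, records):
            target = fused.setdefault(record["user_id"], {})
            for field, value in record.items():
                # Overlapping field: keep the higher-priority value.
                # Complementary field: fill it from this source.
                if value is not None and field not in target:
                    target[field] = value
    return fused


source_a = [{"uid": "u1", "mobile": "138", "city": None}]
source_b = [{"id_no": "u1", "tel": "139", "city": "city_a"}, {"id_no": "u2", "tel": "137"}]
print(fuse([source_a, source_b]))
```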
7. The method of claim 6,
wherein users are labeled by selecting an index. Optimal values are sought through cross-validation for the following parameters: the minimum loss reduction required for a node split, the proportion of samples used to build each tree, the maximum tree depth, the proportion of features used to build each tree, and the minimum sample weight of a leaf node. The xgboost model is trained multiple times with the optimized parameters, and the importance ranking and importance score of each index are output.
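The five tuned quantities correspond to the XGBoost parameters `gamma`, `subsample`, `max_depth`, `colsample_bytree` and `min_child_weight`. This mapping is an interpretation of the translated text. The sketch below covers only the bookkeeping around training, using just the standard library:

- enumerating a candidate grid whose values are hypothetical;
- averaging importance scores over several training runs;
- ranking the indexes by normalized score.

The per-run score dicts are assumed to come from something like XGBoost's `Booster.get_score(importance_type="gain")`. Training and cross-validation themselves are omitted, and all function names are hypothetical.

```python
from itertools import product

# XGBoost parameter names for the quantities in the claim; candidate values are hypothetical.
PARAM_GRID = {
    "gamma": [0, 0.1, 0.5],          # minimum loss reduction required to split a node
    "subsample": [0.7, 0.9],         # fraction of samples used per tree
    "max_depth": [3, 5],             # maximum tree depth
    "colsample_bytree": [0.7, 0.9],  # fraction of features used per tree
    "min_child_weight": [1, 5],      # minimum sum of instance weight in a leaf
}


def candidate_params(grid):
    """Yield every parameter combination to evaluate with cross-validation."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def average_importance(runs):
    """Average per-feature scores over runs; a feature absent from a run counts as 0."""
    features = {feature for run in runs for feature in run}
    return {f: sum(run.get(f, 0.0) for run in runs) / len(runs) for f in features}


def rank_importance(scores):
    """Sort features by score (descending) and normalize scores to sum to 1."""
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score / total) for name, score in ranked]


print(rank_importance(average_importance([{"a": 1.0, "b": 2.0}, {"a": 3.0}])))
```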
8. The method of claim 7,
Initial judgment matrix construction:
First, the indexes are divided into five dimensions according to their properties. Judgment matrices of different sizes are established according to the number of indexes in each dimension. For indexes in the same dimension, the ratio of their importance scores is used as the element at the corresponding position of the judgment matrix, which forms the initial judgment matrix.
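The construction above can be sketched with the element at row i, column j set to a_ij = s_i / s_j, where s_i is the xgboost importance score of index i in the same dimension. The sketch assumes:

- indexes with zero importance have already been screened out, since a ratio needs a positive denominator;
- `judgment_matrix`, `matrices_by_dimension` and the dimension labels are hypothetical names.

```python
def judgment_matrix(scores):
    """Build a_ij = s_i / s_j from the importance scores of one dimension's indexes."""
    if any(s <= 0 for s in scores):
        raise ValueError("importance scores must be positive")
    return [[si / sj for sj in scores] for si in scores]


def matrices_by_dimension(importance, dimension_of):
    """Group indexes by dimension and build one judgment matrix per dimension."""
    groups = {}
    for index, score in importance.items():
        groups.setdefault(dimension_of[index], []).append((index, score))
    return {dim: ([name for name, _ in items], judgment_matrix([s for _, s in items]))
            for dim, items in groups.items()}


print(matrices_by_dimension({"x": 2.0, "y": 1.0, "z": 5.0},
                            {"x": "repayment", "y": "repayment", "z": "stability"}))
```

Each matrix built this way is reciprocal (a_ji = 1 / a_ij) with ones on the diagonal, which is the form AHP expects.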
Matrix consistency check: a consistency check is performed on the judgment matrix of each dimension. When CR > 0.1, the judgment matrix fails the consistency check; when CR ≤ 0.1, the current judgment matrix is accepted. The weight of each index is then solved by the arithmetic mean method, the geometric mean method or the eigenvalue method. With the arithmetic mean method, the judgment matrix is first normalized by column, and the normalized values are then averaged across each row to obtain the weight coefficients.
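The check and the arithmetic mean method can be sketched using the standard AHP definitions, CR = CI / RI with CI = (λ_max − n) / (n − 1). In this sketch:

- λ_max is approximated by averaging (Aw)_i / w_i;
- RI values are Saaty's commonly tabulated random indexes, which vary slightly between references;
- the function names are hypothetical.

One consequence worth noting: a matrix built purely from score ratios (a_ij = s_i / s_j) is perfectly consistent, so its CR is 0 up to rounding. The check only bites if the matrix is adjusted afterwards.

```python
# Random consistency index RI by matrix order n (Saaty's commonly tabulated values).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def arithmetic_mean_weights(A):
    """Normalize each column to sum to 1, then average each row."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [sum(A[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(A, w):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(A)
    if n <= 2:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent (RI = 0)
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n  # standard approximation
    ci = (lambda_max - n) / (n - 1)
    return ci / RI[n]  # KeyError for n > 10: extend RI from a reference table if needed


def weights_if_consistent(A, threshold=0.1):
    """Return arithmetic-mean weights, or raise if CR exceeds the threshold."""
    w = arithmetic_mean_weights(A)
    cr = consistency_ratio(A, w)
    if cr > threshold:
        raise ValueError(f"judgment matrix fails consistency check (CR={cr:.3f})")
    return w


print(weights_if_consistent([[1.0, 1.5, 3.0], [2 / 3, 1.0, 2.0], [1 / 3, 0.5, 1.0]]))
```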
Training sample interval conversion: each model-entry index is divided into intervals at the 20th, 40th, 60th and 80th percentiles of the training sample. Indexes with a positive meaning are initially assigned 20, 40, 60, 80 and 100 points according to their interval, and indexes with a negative meaning are assigned the same points in reverse order.
User portrait score formation: after the above steps, each index of each portrait dimension has been initially assigned and its weight coefficient calculated by the arithmetic mean method. The portrait score of the dimension is obtained from the initial score initial_score and the weight coefficient weight of each index as SCORE = Σ(initial_score × weight).
9. The method of claim 8,
User portrait rating conversion:
Finally, the score of each user in each portrait dimension is obtained through processing. In this part, based on score statistics for each portrait dimension, each sample is assigned a portrait dimension grade according to quantiles. The grades are divided into five levels, A, B, C, D and E, which constitute the final user portrait evaluation result.
CN202111047789.2A 2021-09-08 2021-09-08 User portrait construction method based on improved analytic hierarchy process Pending CN113886372A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111047789.2A CN113886372A (en) 2021-09-08 2021-09-08 User portrait construction method based on improved analytic hierarchy process

Publications (1)

Publication Number Publication Date
CN113886372A 2022-01-04

Family

ID=79008580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111047789.2A Pending CN113886372A (en) 2021-09-08 2021-09-08 User portrait construction method based on improved analytic hierarchy process

Country Status (1)

Country Link
CN (1) CN113886372A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2018102040A4 * 2018-12-10 2019-01-17 Chen, Shixuan Mr The method of an efficient and accurate credit rating system through the gradient boost decision tree
WO2020041955A1 * 2018-08-28 2020-03-05 Dalian University of Technology Method for evaluating comprehensive performance of numerical control machine tool based on improved pull-apart grade method
CN111126775A * 2019-11-26 2020-05-08 State Grid Zhejiang Electric Power Co., Ltd. Electric Power Research Institute Hierarchical analysis method based resident customer value grading model construction method
CN111429970A * 2019-12-24 2020-07-17 Dalian Maritime University Method and system for obtaining multi-gene risk scores by performing feature selection based on extreme gradient boosting method
CN111832966A * 2020-07-24 2020-10-27 Shandong University of Traditional Chinese Medicine Traditional Chinese medicine hospital regional portrait construction method and system
AU2020103500A4 * 2020-11-18 2021-01-28 Sichuan Agricultural University Integrated Quality Evaluation Method for Huangguogan
CN112884590A * 2021-01-26 2021-06-01 Zhejiang University of Technology Power grid enterprise financing decision method based on machine learning algorithm
CN113065789A * 2021-04-15 2021-07-02 Nanjing University of Aeronautics and Astronautics Manufacturing maturity grade rapid self-evaluation method based on three-scale analytic hierarchy process
CN113159364A * 2020-12-30 2021-07-23 China Mobile Group Guangdong Co., Ltd. Zhuhai Branch Passenger flow prediction method and system for large-scale traffic station

Non-Patent Citations (2)

Title
MEIHONG MA等: "XGBoost-based method for flash flood risk assessment", JOURNAL OF HYDROLOGY, vol. 598, 28 April 2021 (2021-04-28), XP086636611, DOI: 10.1016/j.jhydrol.2021.126382 *
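Song Chuanzhou et al.: "Feature selection analysis for task-oriented determination of carried aviation material varieties and consumption prediction", Ordnance Industry Automation, vol. 40, no. 06, 15 June 2021 (2021-06-15), pages 0 - 2 *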

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN114372835A (en) * 2022-03-22 2022-04-19 佰聆数据股份有限公司 Comprehensive energy service potential customer identification method, system and computer equipment
CN114783007A (en) * 2022-06-22 2022-07-22 成都新希望金融信息有限公司 Equipment fingerprint identification method and device and electronic equipment
CN114783007B (en) * 2022-06-22 2022-09-27 成都新希望金融信息有限公司 Equipment fingerprint identification method and device and electronic equipment
CN115907308A (en) * 2023-01-09 2023-04-04 佰聆数据股份有限公司 User portrait-based electric power material supplier evaluation method and device
CN116304974A (en) * 2023-02-17 2023-06-23 国网浙江省电力有限公司营销服务中心 Multi-channel data fusion method and system
CN116304974B (en) * 2023-02-17 2023-09-29 国网浙江省电力有限公司营销服务中心 Multi-channel data fusion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination