CN112418987B - Method and system for rating credit of transportation unit, electronic device and storage medium - Google Patents


Info

Publication number
CN112418987B
Authority
CN
China
Prior art keywords
rating
feature data
data set
clustering
clusters
Prior art date
Legal status
Active
Application number
CN202011307960.4A
Other languages
Chinese (zh)
Other versions
CN112418987A (en)
Inventor
文琰杰
许旺土
李传明
黄永燊
丁昌星
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202011307960.4A
Publication of CN112418987A
Application granted
Publication of CN112418987B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method, a system, an electronic device and a computer-readable storage medium for rating the credit of a transportation unit. The method comprises: extracting a feature data set for the traffic credit rating of a transportation unit; clustering the feature data set, selecting the optimal number of clusters, and establishing a rating model; and inputting the feature data set into the rating model to rate the unlabeled data of the transportation units. Because the scheme rates transportation-unit credit by unsupervised learning (i.e., on unlabeled data), it requires neither a large amount of supporting data nor reference to historical empirical data, data acquisition is inexpensive, and there is no need to consider label accuracy as in traditional supervised learning. Compared with similar traditional clustering schemes, the method offers higher accuracy, stronger robustness and faster convergence, and can provide decision support for the credit rating of transportation enterprises.

Description

Method and system for rating credit of transportation unit, electronic device and storage medium
Technical Field
The invention relates to the technical field of transportation, and in particular to a method and a system for rating the credit of a transportation unit, an electronic device, and a computer-readable storage medium.
Background
Building a traffic credit system has been one of the key tasks of transportation management departments in recent years. Traffic credit assessment of transportation enterprises is of great significance for standardizing enterprise credit information and for building a credit market that promotes industry self-discipline.
The development of enterprise credit rating models has attracted extensive research interest in both academia and industry. Evaluation methods can be divided into traditional methods and artificial-intelligence-based methods. Traditional methods mainly include factor analysis, empirical discrimination, multivariate discriminant analysis, and prediction methods; artificial-intelligence rating methods mainly include support vector machines, artificial neural networks, inductive learning, and the like. All of these methods rely on supervised learning, which presupposes labeled data: they need a large amount of supporting data, data acquisition is costly, and the accuracy of the labels cannot be guaranteed. Moreover, because traditional rating methods and models cannot accurately and effectively preserve the characteristics of the feature data, they suffer from high clustering loss, slow convergence and poor robustness, and cannot provide accurate and stable decision support for evaluating the traffic credit of transportation units.
Disclosure of Invention
The invention aims to solve at least one technical problem in the background art and provides a method, a system, an electronic device and a computer-readable storage medium for rating the credit of a transportation unit.
In order to achieve the above object, the present invention provides a method for rating credit of transportation units, comprising:
extracting a feature data set for the traffic credit rating of a transportation unit;
clustering the feature data set, selecting the optimal number of clusters, and establishing a rating model;
inputting the feature data set into the rating model, and rating the unlabeled data of the transportation units.
According to an aspect of the invention, the method further comprises: after the rating model is established, optimizing the rating model with the Mahalanobis distance of the feature data set as the optimization target, where the Mahalanobis distance is:
$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\,\Sigma^{-1}\,(x_i - m_j)}$$
where $x_i$ is the $i$-th data point, $m_j$ is the centroid of the cluster $j$ to which data point $i$ belongs, and $\Sigma^{-1}$ is the inverse of the covariance matrix.
According to one aspect of the invention, the feature data set comprises: an administrative penalty factor $Z_1$, covering general administrative penalties received ($x_4$) and severe administrative penalties received ($x_5$); and a reputation evaluation factor $Z_2$, covering AAA-level reputation evaluations obtained ($x_1$), AA-level reputation evaluations obtained ($x_2$), and B-level reputation evaluations obtained ($x_3$).
According to one aspect of the invention, the feature data set is obtained based on a factor analysis method, wherein the factor analysis method comprises:
acquiring a feature data set of traffic credit rating;
constructing the matrix model $X = AZ + \varepsilon$, where $A$ is an $n \times m$ loading matrix, $Z$ is the $m$-dimensional vector of common factors, $\varepsilon$ is an $n$-dimensional error vector, and $X$ is an $n$-dimensional observation vector; $Z$ is the feature data set obtained by screening the data set $X$;
solving for the loading matrix $A$ from $\mathrm{Cov}(X) = AA^{T} + \mathrm{Cov}(\varepsilon)$, where $\mathrm{Cov}(\cdot)$ denotes the covariance of the variables;
determining the common factors $Z = A^{T}\,\mathrm{Cov}(X)^{-1}X$;
requiring a KMO value greater than 0.5 and a Bartlett test significance (sig) value less than 0.05 before the data set is subjected to factor analysis; otherwise, the data set must be revised.
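The suitability check in the last step can be sketched in Python with NumPy and SciPy. This is a minimal sketch, assuming the standard textbook formulas for the overall Kaiser-Meyer-Olkin (KMO) statistic and Bartlett's test of sphericity, both computed on the sample correlation matrix; the function names and the defaults `kmo_min=0.5` and `alpha=0.05` are illustrative and not taken from the patent.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of sphericity (H0: the correlation matrix is the identity).

    Returns (chi-square statistic, p-value); the p-value is the 'sig' value in the text.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # log|R|, numerically safer than log(det(R))
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2.0
    return float(statistic), float(chi2.sf(statistic, dof))


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations: -R_inv[i, j] / sqrt(R_inv[i, i] * R_inv[j, j])
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale
    off_diag = ~np.eye(R.shape[0], dtype=bool)
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


def suitable_for_factor_analysis(X, kmo_min=0.5, alpha=0.05):
    """Apply the thresholds from the text: KMO > 0.5 and Bartlett sig < 0.05."""
    _, p_value = bartlett_sphericity(X)
    return kmo(X) > kmo_min and p_value < alpha


# Synthetic data: five indicators driven by one common factor (illustrative only)
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
X = factor @ np.full((1, 5), 0.8) + 0.4 * rng.normal(size=(500, 5))
print(kmo(X), bartlett_sphericity(X), suitable_for_factor_analysis(X))
```

Strongly correlated indicators give a KMO well above 0.5 and a near-zero Bartlett p-value, so such a data set would pass the check.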
According to one aspect of the invention, the feature data set is clustered with a global K-means algorithm, the optimal number of clusters is selected according to the within-cluster sum of squares and the elbow rule, and a global K-means traffic credit rating model is established;
the method for selecting the optimal number of clusters comprises the following steps:
initializing the cluster center as the mean of all the data;
taking the Mahalanobis distance as the optimization target, running N iterations and taking the cluster centers with the smallest within-cluster sum-of-squares loss as the optimal cluster centers;
successively increasing the number of clusters from k to k+1, where k is the current number of clusters and the initial cluster centers for k+1 clusters consist of the optimal cluster centers already obtained plus one randomly selected initial point;
observing how the within-cluster sum-of-squares loss curve changes for different values of k, and determining the optimal number of clusters by the elbow rule.
According to one aspect of the invention, after the global K-means traffic credit rating model is established, the cluster centers are normalized to obtain the rating result of each cluster; the feature data set is input into the global K-means traffic credit rating model, each feature data point is assigned to the cluster at the shortest distance, and it is rated according to that cluster's rating result.
According to an aspect of the present invention, normalizing the cluster centers to obtain the rating results of the corresponding clusters, inputting the feature data set into the global K-means traffic credit rating model, assigning the feature data to clusters by shortest distance, and rating according to the rating results of the clusters comprises:
inputting the feature data set into the global K-means traffic credit rating model with k cluster centers to obtain the centroid of each cluster.
According to an aspect of the present invention, the step of normalizing the cluster centers, assigning the feature data to clusters by shortest distance, and rating according to the cluster ratings further comprises:
normalizing the cluster centers, and assigning each centroid to a credit grade according to the nominal meaning of the indicators;
calculating the Mahalanobis distance between each centroid and the feature data of the transportation unit, and assigning the feature data to the cluster at the shortest distance;
converting the cluster of each feature data point into the corresponding traffic credit rating result, completing the traffic credit rating of the transportation units.
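The last two steps (nearest-centroid assignment under the Mahalanobis distance, then mapping the cluster to a grade) can be sketched in Python as follows. This assumes the centroids and the inverse covariance matrix come from an already-fitted model and that a cluster-to-grade mapping has been fixed beforehand; `rate_units` and `cluster_grades` are hypothetical names, and the grade labels in the usage line are placeholders rather than the patent's grade scale.

```python
import numpy as np


def rate_units(X_new, centroids, cov_inv, cluster_grades):
    """Rate each transportation unit by the grade of its nearest centroid.

    X_new: (n, p) feature data; centroids: (k, p); cov_inv: (p, p) inverse covariance;
    cluster_grades: sequence or dict mapping cluster index -> grade label (hypothetical).
    """
    diff = X_new[:, None, :] - centroids[None, :, :]
    # Squared Mahalanobis distance of every unit to every centroid, shape (n, k);
    # the square root is skipped because it does not change the argmin.
    d2 = np.einsum("nkp,pq,nkq->nk", diff, cov_inv, diff)
    labels = d2.argmin(axis=1)  # shortest distance decides the cluster
    return [cluster_grades[int(j)] for j in labels]


centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
print(rate_units(np.array([[1.0, 1.0], [9.0, 8.0]]), centroids, np.eye(2), {0: "A", 1: "B"}))  # prints ['A', 'B']
```

With an identity `cov_inv` the rule reduces to ordinary Euclidean nearest-centroid assignment, which makes the usage line easy to check by hand.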
In order to achieve the above object, the present invention further provides a credit rating system for transportation units, comprising:
the data extraction module, configured to extract a feature data set for the traffic credit rating of a transportation unit;
the data processing module, configured to cluster the feature data set, select the optimal number of clusters, and establish a rating model;
the rating module, configured to input the feature data set into the rating model and rate the unlabeled data of the transportation units.
According to an aspect of the invention, the system further comprises:
a model optimization module, configured to optimize the rating model with the Mahalanobis distance as the optimization target after the data processing module establishes the rating model.
To achieve the above object, the present invention also provides an electronic device, including:
at least one processor; and
a memory coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to implement the above-described method.
To achieve the above object, the present invention also provides a computer-readable storage medium for storing a computer program, which when executed, is capable of implementing the above method.
According to one scheme of the invention, the data-driven traffic credit rating method rates the credit of transportation units by unsupervised learning (i.e., on unlabeled data). Rating unlabeled data requires neither a large amount of supporting data nor reference to historical empirical data, keeps data acquisition costs low, and avoids the label-accuracy concerns of traditional supervised learning. Therefore, when rating unlabeled data, the proposed method offers faster data processing, higher stability and higher accuracy. Moreover, the invention takes the spatial distribution of the data into account and clusters and rates the data set with a global K-means algorithm optimized with the Mahalanobis distance; compared with traditional models, it achieves lower clustering loss, faster convergence and stronger robustness, and can provide accurate and stable decision support for evaluating enterprise traffic credit.
Drawings
FIG. 1 is a flow chart of a method for credit rating of a transportation unit according to the present invention;
FIG. 2 shows a flow chart of a factor analysis method;
FIG. 3 shows a global K-means flow diagram;
FIG. 4 is a schematic illustration of the elbow rule;
FIG. 5 is a graph comparing the within-cluster squared errors of conventional K-means and global K-means under the Euclidean and Mahalanobis distances;
FIG. 6 is a flow chart of an enterprise traffic credit rating process;
FIG. 7 is a block diagram of a transportation unit credit rating system according to the present invention;
FIG. 8 is a graph of the classification results of a data set at the optimal number of clusters.
Detailed Description
The content of the invention will now be discussed with reference to exemplary embodiments. It is to be understood that the embodiments discussed are merely intended to enable one of ordinary skill in the art to better understand and thus implement the teachings of the present invention, and do not imply any limitations on the scope of the invention.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to." The term "based on" is to be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".
Fig. 1 schematically shows a flow chart of a method for credit rating of a transportation unit according to the invention. As shown in fig. 1, the method for rating credit of transportation unit according to the present invention comprises the following steps:
a. extracting a feature data set for the traffic credit rating of a transportation unit;
b. clustering the feature data set, selecting the optimal number of clusters, and establishing a rating model;
c. inputting the feature data set into the rating model, and rating the unlabeled data of the transportation units.
In the invention, unlabeled-data rating means rating the feature data by unsupervised learning. Labeled-data rating needs historical empirical data to obtain the feature data. In unlabeled-data rating, by contrast, the feature data required for rating are input directly into the rating model, with no reference to historical empirical data.
According to an embodiment of the present invention, before step b or between step b and step c, the method further comprises step e: optimizing the rating model with the Mahalanobis distance of the feature data set as the optimization target, to address differences in scale (dimension) among the variables.
Traditional K-means clustering uses the Euclidean distance, so when the variables differ greatly in scale, variables with small values have almost no effect on the clustering. The Mahalanobis distance is therefore introduced: covariance is incorporated into the Euclidean distance, modifying the distance formula and hence the optimization objective. The Mahalanobis distance is:
$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\,\Sigma^{-1}\,(x_i - m_j)}$$
where $x_i$ is the $i$-th data point, $m_j$ is the centroid of the cluster $j$ to which data point $i$ belongs, and $\Sigma^{-1}$ is the inverse of the covariance matrix.
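The scale effect described above can be checked numerically. This Python sketch assumes that $\Sigma$ is a single covariance matrix shared by all clusters (the text does not say whether it is pooled or per-cluster). With a diagonal covariance whose variances differ by a factor of $10^6$, a unit step in the small-scale variable and a 1000-unit step in the large-scale variable are equally far in Mahalanobis terms, whereas the Euclidean distance is dominated by the large-scale variable. The covariance values are illustrative.

```python
import numpy as np


def mahalanobis(x, m, cov_inv):
    """D(x, m) = sqrt((x - m)^T Sigma^{-1} (x - m))."""
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    return float(np.sqrt(diff @ cov_inv @ diff))


# Two variables whose variances differ by a factor of 10^6 (illustrative values)
cov = np.diag([1.0, 1.0e6])
cov_inv = np.linalg.inv(cov)
origin = np.zeros(2)

for point in ([1.0, 0.0], [0.0, 1000.0]):
    euclid = float(np.linalg.norm(np.subtract(point, origin)))
    print(point, "Euclidean:", euclid, "Mahalanobis:", mahalanobis(point, origin, cov_inv))
```

The Euclidean distances are 1 and 1000, while both Mahalanobis distances are 1, so the small-scale variable is no longer drowned out.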
According to an embodiment of the present invention, the feature data set comprises: an administrative penalty factor $Z_1$, covering general administrative penalties received ($x_4$) and severe administrative penalties received ($x_5$); and a reputation evaluation factor $Z_2$, covering AAA-level reputation evaluations obtained ($x_1$), AA-level reputation evaluations obtained ($x_2$), and B-level reputation evaluations obtained ($x_3$).
Further, in step a, the feature data set is obtained by a factor analysis method, which, as shown in fig. 2, comprises:
a1. acquiring a feature data set of traffic credit rating;
a2. constructing the matrix model $X = AZ + \varepsilon$, where $A$ is an $n \times m$ loading matrix, $Z$ is the $m$-dimensional vector of common factors, $\varepsilon$ is an $n$-dimensional error vector, and $X$ is an $n$-dimensional observation vector;
a3. solving for the loading matrix $A$ from $\mathrm{Cov}(X) = AA^{T} + \mathrm{Cov}(\varepsilon)$, where $\mathrm{Cov}(\cdot)$ denotes the covariance of the variables;
a4. determining the common factors $Z = A^{T}\,\mathrm{Cov}(X)^{-1}X$;
a5. requiring a KMO value greater than 0.5 and a Bartlett test significance (sig) value less than 0.05 before the data set is subjected to factor analysis; otherwise, the data set must be revised.
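Steps a2 to a4 can be sketched as follows. The text does not say how $A$ is extracted from $\mathrm{Cov}(X) = AA^{T} + \mathrm{Cov}(\varepsilon)$. This Python sketch therefore assumes the principal-component method on standardized data (so that $\mathrm{Cov}(X)$ is the correlation matrix) and applies no factor rotation. It then computes the regression scores $Z = A^{T}\mathrm{Cov}(X)^{-1}X$ for each observation. Rows of the data matrix here are observations and columns are the observed indicators, so the code's `p` plays the role of the text's $n$. The function name `factor_scores` and the synthetic loadings are illustrative.

```python
import numpy as np


def factor_scores(X, m):
    """Principal-component estimate of the loadings A and regression scores Z.

    X: (n_samples, p) observations, one column per indicator; m: number of common factors.
    Returns A (p x m), the specific variances psi (diagonal of Cov(eps)), and Z (n_samples x m).
    """
    # Standardize each indicator so that Cov(X) is the correlation matrix R
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]           # indices of the m largest
    A = eigvecs[:, top] * np.sqrt(eigvals[top])   # loadings, so that R is approx. A A^T + Psi
    psi = np.diag(R - A @ A.T)                    # specific variances
    # Regression scores z = A^T R^{-1} x for each observation x (one per row)
    Z = Xs @ np.linalg.solve(R, A)
    return A, psi, Z


# Synthetic data: x1..x3 load on one factor, x4..x5 on another (illustrative only)
rng = np.random.default_rng(1)
F = rng.normal(size=(400, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0], [0.0, 0.9], [0.0, 0.8]])
X = F @ loadings.T + 0.3 * rng.normal(size=(400, 5))
A, psi, Z = factor_scores(X, m=2)
print(A.round(2), psi.round(2), Z.shape)
```

A rotation such as varimax is often applied to $A$ before interpreting factors like $Z_1$ and $Z_2$; the text does not say whether the original method does so.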
According to an embodiment of the invention, in step b, the feature data set is clustered with a global K-means algorithm, the optimal number of clusters is selected according to the within-cluster sum of squares and the elbow rule, and a global K-means traffic credit rating model is established;
the method for selecting the optimal number of clusters, shown in fig. 3, comprises:
b1. initializing the cluster center as the mean of all the data;
b2. taking the Mahalanobis distance as the optimization target, running N iterations and taking the cluster centers with the smallest within-cluster sum-of-squares loss as the optimal cluster centers;
b3. successively increasing the number of clusters from k to k+1, where k is the current number of clusters and the initial cluster centers for k+1 clusters consist of the optimal cluster centers already obtained plus one randomly selected initial point;
b4. observing how the within-cluster sum-of-squares loss curve changes for different values of k, and determining the optimal number of clusters by the elbow rule.
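Steps b1 to b3 can be sketched in Python as below, under several assumptions the text does not fix:

- $\Sigma$ is the covariance of the whole data set.
- The loss is the sum of squared Mahalanobis distances to the assigned centroids.
- Each cluster center is updated as the mean of its members, which minimizes that loss for a fixed $\Sigma$.
- For each new k, a few random data points (the hypothetical parameter `n_trials`) are tried as the added center, and the run with the lowest loss is kept.

The parameter `n_iter` stands in for the text's N.

```python
import numpy as np


def _sq_mahalanobis(X, C, cov_inv):
    """Squared Mahalanobis distances from every row of X to every center in C, shape (n, k)."""
    diff = X[:, None, :] - C[None, :, :]
    return np.einsum("nkp,pq,nkq->nk", diff, cov_inv, diff)


def lloyd(X, C, cov_inv, n_iter=100):
    """K-means iterations under the squared Mahalanobis distance; returns (centers, loss)."""
    C = C.copy()
    for _ in range(n_iter):
        labels = _sq_mahalanobis(X, C, cov_inv).argmin(axis=1)
        # The cluster mean minimizes the summed squared Mahalanobis distance;
        # an empty cluster keeps its previous center.
        new_C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return C, float(_sq_mahalanobis(X, C, cov_inv).min(axis=1).sum())


def global_kmeans(X, k_max, n_trials=10, n_iter=100, seed=0):
    """Return {k: (centers, loss)} for k = 1..k_max, growing the solution one center at a time."""
    rng = np.random.default_rng(seed)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    C = X.mean(axis=0, keepdims=True)  # b1: k = 1 uses the mean of all data
    results = {1: (C, float(_sq_mahalanobis(X, C, cov_inv).sum()))}
    for k in range(2, k_max + 1):
        best = None
        # b3: previous optimal centers plus one randomly chosen data point
        for idx in rng.choice(len(X), size=min(n_trials, len(X)), replace=False):
            candidate = lloyd(X, np.vstack([C, X[idx]]), cov_inv, n_iter)
            if best is None or candidate[1] < best[1]:  # b2: keep the lowest loss
                best = candidate
        C = best[0]
        results[k] = best
    return results


# Three synthetic, well-separated groups (illustrative only)
rng = np.random.default_rng(42)
X = np.vstack([np.array(c) + 0.5 * rng.normal(size=(50, 2)) for c in ([0, 0], [10, 0], [0, 10])])
for k, (centers, loss) in global_kmeans(X, k_max=5).items():
    print(k, round(loss, 2))
```

Each run for k+1 starts from the k-cluster optimum plus one extra center, and Lloyd iterations never increase the loss. The returned losses are therefore non-increasing in k, which keeps the loss curve in step b4 well-behaved.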
For example, when selecting the optimal number of clusters, the mean of the feature data set is taken as the centroid when the number of clusters is 1. The value of k is then increased in steps of 1 with initial centroids $(o_1, o_2, \ldots, o_{k-1}, o_k)$, where $o_1, o_2, \ldots, o_{k-1}$ are the optimal cluster centroids already obtained and $o_k$ is a randomly selected centroid point. An inner loop then runs N iterations for each selection of the k optimal centroids, until the within-cluster squared error stabilizes. An outer loop then examines the loss-versus-number-of-clusters curve after the k optimal centroids have been selected. If the curve satisfies the elbow rule, the elbow (inflection point) gives the optimal k, which is 3 in this example, as shown in fig. 4. Finally, the optimal k and the corresponding k centroids are output.
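The text reads the elbow off the loss curve by inspection. One common way to automate this is sketched below. It is an assumed heuristic, not the patent's stated rule: pick the k whose point lies farthest from the straight line joining the first and last points of the normalized curve. The sketch assumes the loss decreases from its maximum at the smallest k to its minimum at the largest k. The loss values in the usage line are made up for illustration.

```python
import numpy as np


def elbow_k(ks, losses):
    """Return the k farthest from the chord joining the first and last points of a
    decreasing loss curve, after rescaling both axes to [0, 1]."""
    ks = np.asarray(ks, dtype=float)
    w = np.asarray(losses, dtype=float)
    x = (ks - ks.min()) / (ks.max() - ks.min())
    y = (w - w.min()) / (w.max() - w.min())
    # For a decreasing curve the chord runs from (0, 1) to (1, 0), i.e. the line x + y = 1
    dist = np.abs(x + y - 1.0) / np.sqrt(2.0)
    return int(ks[np.argmax(dist)])


print(elbow_k([1, 2, 3, 4, 5, 6], [100.0, 40.0, 15.0, 12.0, 10.0, 9.0]))  # prints 3
```

Rescaling both axes first keeps the choice independent of the units of the loss, which matters because Mahalanobis and Euclidean losses differ greatly in magnitude.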
According to an embodiment of the present invention, in step c, after the global K-means traffic credit rating model is established, the cluster centers are normalized to obtain the rating result of each cluster. The feature data set is then input into the global K-means traffic credit rating model, each feature data point is assigned to the cluster at the shortest distance, and it is rated according to that cluster's rating result. In this embodiment, inputting the feature data set into the rating model for rating comprises:
inputting the feature data set into the global K-means traffic credit rating model with k cluster centers to obtain the centroid of each cluster.
Four comparison models were applied to a traffic-credit data set of transportation enterprises: K-means with Euclidean distance (KE), K-means with Mahalanobis distance (KM), global K-means with Euclidean distance (GKE), and global K-means with Mahalanobis distance (GKM). The comparison results are shown as within-cluster squared-loss curves in fig. 5.
According to an embodiment of the present invention, step c covers normalizing the cluster centers to obtain the rating result of each cluster, inputting the feature data set into the optimized global K-means traffic credit rating model, and assigning the feature data to clusters by shortest distance and rating them. As shown in fig. 6, this further comprises the following steps:
d1. normalizing the cluster centers, and assigning each centroid to a credit grade according to the nominal meaning of the indicators;
d2. calculating the Mahalanobis distance between each centroid and the feature data of the transportation unit, and assigning the feature data to the cluster at the shortest distance;
d3. converting the cluster of each feature data point into the corresponding traffic credit rating result, completing the traffic credit rating of the transportation units.
Specifically, the feature data set is input into the optimized global K-means model, and the centroid coordinates are normalized, i.e., mapped to [0, 1]. The relevant management departments determine the relative importance of the attributes. The data are weighted to obtain the final evaluation data, which are sorted according to the meaning of each indicator's scale to determine the grade represented by each cluster. The Mahalanobis distance between each enterprise's data and each centroid is then calculated, and the enterprise's traffic credit is rated according to the shortest distance.
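The normalization and ranking in this paragraph can be sketched as follows. The sketch assumes three things: min-max normalization across the k centroids for each attribute, a weight vector supplied by the management department, and a flag marking whether larger values of an attribute are better (for example, AAA-level ratings) or worse (for example, penalties). `grade_clusters`, `benefit`, the weights, and the grade labels are hypothetical.

```python
import numpy as np


def grade_clusters(centroids, weights, benefit, grade_labels):
    """Map each cluster to a grade by a weighted score of its min-max-normalized centroid.

    benefit[j] is True if larger values of attribute j are better (e.g. AAA ratings),
    False if smaller is better (e.g. penalties). grade_labels are ordered best first.
    """
    C = np.asarray(centroids, dtype=float)
    lo, hi = C.min(axis=0), C.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid dividing by zero
    N = (C - lo) / span                          # each attribute mapped to [0, 1]
    N = np.where(benefit, N, 1.0 - N)            # flip cost-type attributes
    score = N @ np.asarray(weights, dtype=float)
    order = np.argsort(-score)                   # best cluster first
    grades = {int(j): grade_labels[rank] for rank, j in enumerate(order)}
    return grades, score


# Columns: reputation, penalties (illustrative centroids and weights)
centroids = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
grades, score = grade_clusters(centroids, weights=[0.5, 0.5], benefit=[True, False],
                               grade_labels=["A", "B", "C"])
print(grades)  # prints {0: 'A', 1: 'B', 2: 'C'}
```

Flipping cost-type attributes before weighting means a higher score is always better. Sorting the scores then fixes which cluster represents which grade, as the text describes.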
Compared with similar traditional clustering methods, the method provided by the invention offers higher accuracy, stronger robustness and faster convergence, and can provide decision support for the credit rating of transportation enterprises.
In order to achieve the object of the present invention, the present invention further provides a credit rating system for transportation units, the block diagram of the system is shown in fig. 7, and the system comprises:
the data extraction module, configured to extract a feature data set for the traffic credit rating of a transportation unit;
the data processing module, configured to cluster the feature data set, select the optimal number of clusters, and establish a rating model;
the rating module, configured to input the feature data set into the rating model and rate the unlabeled data of the transportation units.
According to an embodiment of the present invention, the data processing module further comprises a model optimization module. After the data processing module establishes the rating model, this module optimizes it with the Mahalanobis distance as the optimization target, to address differences in scale among the variables.
Traditional K-means clustering uses the Euclidean distance, so when the variables differ greatly in scale, variables with small values have almost no effect on the clustering. The Mahalanobis distance is therefore introduced: covariance is incorporated into the Euclidean distance, modifying the distance formula and hence the optimization objective. The Mahalanobis distance is:
$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\,\Sigma^{-1}\,(x_i - m_j)}$$
where $x_i$ is the $i$-th data point, $m_j$ is the centroid of the cluster $j$ to which data point $i$ belongs, and $\Sigma^{-1}$ is the inverse of the covariance matrix.
According to an embodiment of the present invention, the feature data set comprises: an administrative penalty factor $Z_1$, covering general administrative penalties received ($x_4$) and severe administrative penalties received ($x_5$); and a reputation evaluation factor $Z_2$, covering AAA-level reputation evaluations obtained ($x_1$), AA-level reputation evaluations obtained ($x_2$), and B-level reputation evaluations obtained ($x_3$).
Further, the data extraction module obtains the feature data set by using a factor analysis method, where the factor analysis method is shown in fig. 2 and includes:
a1. acquiring a feature data set of traffic credit rating;
a2. constructing the matrix model $X = AZ + \varepsilon$, where $A$ is an $n \times m$ loading matrix, $Z$ is the $m$-dimensional vector of common factors, $\varepsilon$ is an $n$-dimensional error vector, and $X$ is an $n$-dimensional observation vector;
a3. solving for the loading matrix $A$ from $\mathrm{Cov}(X) = AA^{T} + \mathrm{Cov}(\varepsilon)$, where $\mathrm{Cov}(\cdot)$ denotes the covariance of the variables;
a4. determining the common factors $Z = A^{T}\,\mathrm{Cov}(X)^{-1}X$;
a5. requiring a KMO value greater than 0.5 and a Bartlett test significance (sig) value less than 0.05 before the data set is subjected to factor analysis; otherwise, the data set must be revised.
According to one embodiment of the invention, the data processing module clusters the feature data set with a global K-means algorithm, selects the optimal number of clusters according to the within-cluster sum of squares and the elbow rule, and establishes a global K-means traffic credit rating model;
the method for selecting the optimal number of clusters, shown in fig. 3, comprises:
b1. initializing the cluster center as the mean of all the data;
b2. taking the Mahalanobis distance as the optimization target, running N iterations and taking the cluster centers with the smallest within-cluster sum-of-squares loss as the optimal cluster centers;
b3. successively increasing the number of clusters from k to k+1, where k is the current number of clusters and the initial cluster centers for k+1 clusters consist of the optimal cluster centers already obtained plus one randomly selected initial point;
b4. observing how the within-cluster sum-of-squares loss curve changes for different values of k, and determining the optimal number of clusters by the elbow rule.
For example, when selecting the optimal number of clusters, the mean of the feature data set is taken as the centroid when the number of clusters is 1. The value of k is then increased in steps of 1 with initial centroids $(o_1, o_2, \ldots, o_{k-1}, o_k)$, where $o_1, o_2, \ldots, o_{k-1}$ are the optimal cluster centroids already obtained and $o_k$ is a randomly selected centroid point. An inner loop then runs N iterations for each selection of the k optimal centroids, until the within-cluster squared error stabilizes. An outer loop then examines the loss-versus-number-of-clusters curve after the k optimal centroids have been selected. If the curve satisfies the elbow rule, the elbow (inflection point) gives the optimal k, which is 3 in this example, as shown in fig. 4. Finally, the optimal k and the corresponding k centroids are output.
According to one embodiment of the invention, after the data processing module establishes the global K-means traffic credit rating model, the cluster centers are normalized to obtain the rating result of each cluster. The feature data set is then input into the global K-means traffic credit rating model, each feature data point is assigned to the cluster at the shortest distance, and it is rated according to that cluster's rating result. In this embodiment, inputting the feature data set into the rating model for rating comprises:
inputting the feature data set into the global K-means traffic credit rating model with k cluster centers to obtain the centroid of each cluster.
Four comparison models were applied to a traffic-credit data set of transportation enterprises: K-means with Euclidean distance (KE), K-means with Mahalanobis distance (KM), global K-means with Euclidean distance (GKE), and global K-means with Mahalanobis distance (GKM). The comparison results are shown as within-cluster squared-loss curves in fig. 5.
According to an embodiment of the present invention, the rating module normalizes the cluster center to obtain a rating result of the corresponding cluster, inputs the feature data set into the optimized global K-means traffic credit rating model, and divides the feature data into the corresponding clusters according to the shortest distance and performs rating, as shown in fig. 6, the method further includes the following steps:
d1. normalizing the clustering centers, and dividing each centroid into corresponding credit grades according to the nominal meaning of the indexes;
d2. calculating the distance between each centroid and the feature data of the transportation unit based on the Mahalanobis distance, and dividing the feature data into corresponding clusters according to the shortest distance;
d3. and converting the corresponding clusters of the feature data into corresponding traffic credit rating results to finish the traffic credit rating of the transportation units.
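Steps d2 and d3 amount to a nearest-centroid lookup under the Mahalanobis metric followed by a lookup from cluster to grade. A minimal NumPy sketch, assuming the centroids, the inverse covariance matrix `VI` and the cluster-to-grade mapping have already been obtained; the function name `rate_units` and the toy numbers are hypothetical:

```python
import numpy as np


def rate_units(X, centroids, VI, cluster_grade):
    """Assign each row of X to its nearest centroid (Mahalanobis) and return the grade.

    X             : (n, d) feature data of the transportation units
    centroids     : (k, d) cluster centers of the trained model
    VI            : (d, d) inverse covariance matrix defining the metric
    cluster_grade : dict mapping cluster index -> credit grade (hypothetical format)
    """
    diff = X[:, None, :] - centroids[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diff, VI, diff)   # squared Mahalanobis distances
    nearest = d2.argmin(axis=1)                          # shortest-distance rule (step d2)
    return [cluster_grade[int(j)] for j in nearest]      # cluster -> credit grade (step d3)


# Toy example: one unit, two centroids whose grades are already known.
centroids = np.array([[0.0, 0.0], [4.0, 1.0]])
x = np.array([[3.0, 0.0]])
grades = {0: "A", 1: "B"}
print(rate_units(x, centroids, np.eye(2), grades))              # ['B'] (Euclidean special case)
print(rate_units(x, centroids, np.diag([1.0, 100.0]), grades))  # ['A'] (2nd feature has variance 0.01)
```

With the identity matrix the rule reduces to the Euclidean distance; once the small variance of the second feature is taken into account, the same unit falls into the other cluster, which is the kind of scale effect the Mahalanobis distance is meant to handle.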
Specifically, the feature data set is input into the optimized global K-means model and the centroid coordinates are normalized, that is, mapped to [0, 1]. The relevant management department determines the relative importance of the attributes; the data are weighted to obtain the final evaluation values, which are sorted according to their scale meaning to determine the grade represented by each cluster. Finally, the Mahalanobis distance between each enterprise's data and each centroid is calculated, and the enterprise's traffic credit is rated according to the nearest centroid.
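The normalize, weight and sort procedure can be sketched as follows. This is an illustrative sketch only: min-max normalization across the centroids is assumed for "mapped to [0, 1]", the `benefit` flags (whether a larger attribute value means better credit) are an assumption standing in for "sorted according to the scale meaning", and the function name and numbers are hypothetical.

```python
import numpy as np


def grade_centroids(centroids, weights, benefit, grades="ABCDE"):
    """Map each centroid to a credit grade via min-max normalisation and a weighted score.

    centroids : (k, d) centroid coordinates
    weights   : (d,) attribute weights set by the management department
    benefit   : (d,) booleans; True if a larger value means better credit,
                False for cost-type attributes such as penalties (assumption)
    """
    lo, hi = centroids.min(axis=0), centroids.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)               # avoid division by zero
    norm = (centroids - lo) / span                       # each attribute mapped to [0, 1]
    norm = np.where(benefit, norm, 1.0 - norm)           # make "larger is better" for all attributes
    score = norm @ np.asarray(weights, dtype=float)      # composite merit of each centroid
    order = np.argsort(-score)                           # best centroid first
    return {int(j): grades[rank] for rank, j in enumerate(order)}, score


# Columns: (administrative penalty Z1, reputation evaluation Z2), equal weights of 0.5.
centroids = np.array([[0.1, 2.0], [0.9, -1.0], [0.5, 0.5]])
mapping, score = grade_centroids(centroids, [0.5, 0.5], [False, True])
print(mapping)   # {0: 'A', 2: 'B', 1: 'C'}
```

The first centroid combines the lowest penalty score with the highest reputation score, so it receives the best grade.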
Compared with similar conventional clustering systems, the system provided by the invention achieves higher precision, stronger robustness and faster convergence, and can provide decision support for the credit rating of transportation enterprises.
In order to achieve the object of the present invention, the present invention also provides an electronic device comprising: at least one processor; and a memory coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to implement the above method.
The present invention also provides a computer-readable storage medium for storing a computer program which, when executed, is capable of carrying out the above-mentioned method.
The present invention is further described below by way of an embodiment with reference to the accompanying drawings.
Example 1:
With reference to Fig. 1, the steps of the rating method are as follows:
a: Extract the feature set for the traffic credit rating;
a1: Obtain the traffic credit rating data shown in Table 1. Each record gives the numbers of rewards and penalties the enterprise received within half a year, and each row represents one transportation enterprise;
Based on steps a2, a3, a4 and a5, the common factors Z and the loading matrix A are obtained, as shown in Table 2, where $z_i$ is a common factor and $x_j$ is an observed variable.
TABLE 1 (traffic credit rating data; image not reproduced)
TABLE 2 (common factor loading matrix; image not reproduced)
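The factor-analysis step, together with the adequacy conditions stated in the claims (KMO greater than 0.5 and Bartlett sig less than 0.05), can be sketched with NumPy and SciPy. This is an illustrative sketch, not the patented procedure: here the loading matrix is estimated by the principal-component method on the correlation matrix, which is only one of several estimation methods and is an assumption; the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 says the correlation matrix is the identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    dof = p * (p - 1) / 2.0
    return chi2, stats.chi2.sf(chi2, dof)                # statistic and sig (p-value)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale                             # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)                # off-diagonal entries only
    r2, q2 = np.sum(R[off] ** 2), np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)


def pc_loadings(X, m):
    """Loading matrix A (p x m) by the principal-component method (an assumption)."""
    R = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)                   # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:m]                   # keep the m largest
    return eigvec[:, idx] * np.sqrt(eigval[idx])
```

A data set passes the screening when `kmo(X) > 0.5` and the second value returned by `bartlett_sphericity(X)` is below 0.05; only then are the loadings computed.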
e: Use the Mahalanobis distance instead of the Euclidean distance as the optimization target, which addresses the differences in dimension (scale) among the features;
b: Cluster the constructed data set with the global K-means algorithm, and select the optimal number of clusters according to the within-cluster sum of squares and the elbow rule;
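Step e relies on a property of the Mahalanobis distance: because it is weighted by the inverse covariance matrix, rescaling a feature (for example, changing its unit) leaves the distance unchanged, whereas the Euclidean distance changes. A small NumPy check on synthetic data, illustrative only and not drawn from the data set of the embodiment:

```python
import numpy as np


def euclidean(x, y):
    return float(np.linalg.norm(x - y))


def mahalanobis(x, y, VI):
    """Mahalanobis distance sqrt((x - y)^T VI (x - y)), with VI the inverse covariance."""
    d = x - y
    return float(np.sqrt(d @ VI @ d))


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
S = np.diag([1.0, 100.0])                 # rescale the second feature by 100 (a change of unit)
Xs = X @ S

VI = np.linalg.inv(np.cov(X, rowvar=False))
VIs = np.linalg.inv(np.cov(Xs, rowvar=False))

print(euclidean(X[0], X[1]), euclidean(Xs[0], Xs[1]))                # the two values differ
print(mahalanobis(X[0], X[1], VI), mahalanobis(Xs[0], Xs[1], VIs))   # equal up to rounding
```

Algebraically, the covariance of the rescaled data is $S\Sigma S$, so its inverse $S^{-1}\Sigma^{-1}S^{-1}$ cancels the scaling applied to the difference vector.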
By observing how the loss function changes with the number of clusters and applying the elbow rule, global K-means and K-means were compared under both the Euclidean and Mahalanobis distances. As shown in Fig. 5, the loss of the Mahalanobis-distance-based global K-means algorithm converges quickly, is small and is stable. The number of clusters for the Mahalanobis-distance-based global K-means is determined to be 5, corresponding to credit grades A, B, C, D and E, where A denotes the best credit and E the worst.
c: Input the data set into the optimized global K-means traffic credit rating model to evaluate the transportation enterprises without labels;
For 5 clusters, the centroid coordinates are given in Table 3, and the clustering result of the data set is shown in Fig. 8.
TABLE 3 (centroid coordinates for 5 clusters; image not reproduced)
d: Normalize the cluster centers and sort them to obtain the rating results of the corresponding clusters; input the data set into the global K-means model, assign the data to the corresponding clusters by the shortest distance, and rate them;
d1: Select 10 example records as input to the K-means model configured with the optimal number of clusters;
d2: Normalize the centroids;
d3: Assume that the management department regards reputation evaluation and administrative penalty as equally important, so each receives a weight of 0.5;
d4: Based on d2 and d3, determine the composite merit ranking of the centroids, as shown in Table 4;
TABLE 4 (image not reproduced)
d5: Calculate the Mahalanobis distance between each enterprise's credit-related data and each centroid;
d6: Based on the results of d5, rate each enterprise according to its nearest centroid; the results are shown in Table 5.
TABLE 5 (image not reproduced)
As described above, this implementation of the present invention first identifies 5 factors (observed variables) affecting traffic credit; these 5 variables are then summarized into common factors by factor analysis, mapping them to a two-dimensional space that better explains the relationships among the variables; finally, taking the spatial distribution of the data into account, a global K-means algorithm optimized with the Mahalanobis distance is proposed to cluster and rate the data set.
Compared with the traditional model, the method provided by the invention has the advantages of low clustering loss, fast convergence and stronger robustness, and can provide decision support for enterprise traffic credit evaluation.
Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method for rating credit of a transportation unit according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.
It should be understood that the order in which the steps are described in the summary of the invention and in the embodiments does not imply a fixed order of execution; the execution order of the steps should be determined by their functions and inherent logic and should not be construed as limiting the implementation of the embodiments of the present invention.

Claims (4)

1. A method for rating credit of a transportation unit, comprising:
extracting a feature data set for the traffic credit rating of transportation units;
clustering the feature data set, selecting the optimal number of clusters, and establishing a rating model;
inputting the feature data set into the rating model to rate the unlabeled data of the transportation units;
further comprising: after a rating model is established, optimizing the rating model by taking the Mahalanobis distance of the feature data set as an optimization target, wherein the Mahalanobis distance formula is as follows:
$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{\mathsf{T}} S^{-1} (x_i - m_j)}$$
wherein $x_i$ is the $i$-th data record, $m_j$ is the centroid of the cluster $j$ to which record $i$ belongs, and $S^{-1}$ is the inverse of the covariance matrix;
the feature data set includes: administrative penalty $Z_1$; general administrative penalties received $x_4$; severe administrative penalties received $x_5$; reputation evaluation $Z_2$; reputation evaluation rated AAA $x_1$; reputation evaluation rated AA $x_2$; reputation evaluation rated B $x_3$;
the feature data set is obtained by a factor analysis method, wherein the factor analysis method comprises:
acquiring a feature data set of traffic credit rating;
constructing the matrix formula:
$$X = AZ + \varepsilon$$
wherein the loading matrix $A$ is a $p \times m$ loading matrix, the vector $Z$ is the $m$-dimensional common factor, $\varepsilon$ is the $p$-dimensional error vector, and $X$ is the $p$-dimensional observation vector;
determining the loading matrix $A$ according to a formula expressed in terms of the covariance of the variables (formula image not reproduced in the source);
finding the common factor $Z$ (formula image not reproduced in the source);
requiring the KMO value to be greater than 0.5 and the significance (sig) value of Bartlett's test of sphericity to be less than 0.05, which indicates that the data set is suitable for factor analysis; otherwise, revising the data set;
clustering the feature data set based on a global K-means algorithm, selecting the optimal number of clusters according to the within-cluster sum of squares and the elbow rule, and establishing a global K-means traffic credit rating model;
the method for selecting the optimal number of clusters comprises the following steps:
initializing the cluster center as the mean of all data;
taking the Mahalanobis distance as the optimization target, and taking the cluster centers with the smallest within-cluster sum-of-squares loss after N iterations as the optimal cluster centers;
increasing the number of clusters k in turn, wherein the initial cluster centers $(o_1, o_2, \ldots, o_{k-1}, o_k)$ consist of the optimal cluster centers already obtained and a randomly selected initial point, k being the number of clusters;
observing how the within-cluster sum-of-squares loss curve changes with k, and determining the optimal number of clusters by the elbow rule;
after the global K-means traffic credit rating model is established, normalizing the cluster centers to obtain the rating result of each corresponding cluster, inputting the feature data set into the global K-means traffic credit rating model, assigning the feature data to the corresponding clusters by the shortest distance, and rating according to the rating results of the clusters;
the step of normalizing the cluster centers, inputting the feature data set, assigning the feature data to clusters and rating comprises:
inputting the feature data set into the global K-means traffic credit rating model with K cluster centers to obtain the centroid of each cluster;
the step of normalizing the cluster centers, inputting the feature data set, assigning the feature data to clusters and rating further comprises:
normalizing the cluster centers, and assigning each centroid a credit grade according to the scale meaning of the indicators;
calculating the Mahalanobis distance between each centroid and the feature data of the transportation unit, and assigning the feature data to the corresponding clusters by the shortest distance;
converting the clusters to which the feature data belong into the corresponding traffic credit rating results, thereby completing the traffic credit rating of the transportation units.
2. A transportation unit credit rating system, comprising:
the data extraction module, configured to extract a feature data set for the traffic credit rating of transportation units;
the data processing module, configured to cluster the feature data set, select the optimal number of clusters and establish a rating model;
the rating module, configured to input the feature data set into the rating model and rate the unlabeled data of the transportation units;
further comprising:
the model optimization module is used for optimizing the rating model by taking the Mahalanobis distance as an optimization target after the data processing module establishes the rating model, wherein the Mahalanobis distance formula is as follows:
$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{\mathsf{T}} S^{-1} (x_i - m_j)}$$
wherein $x_i$ is the $i$-th data record, $m_j$ is the centroid of the cluster $j$ to which record $i$ belongs, and $S^{-1}$ is the inverse of the covariance matrix;
the feature data set includes: administrative penalty $Z_1$; general administrative penalties received $x_4$; severe administrative penalties received $x_5$; reputation evaluation $Z_2$; reputation evaluation rated AAA $x_1$; reputation evaluation rated AA $x_2$; reputation evaluation rated B $x_3$;
the feature data set is obtained by a factor analysis method, wherein the factor analysis method comprises:
acquiring a feature data set of traffic credit rating;
constructing the matrix formula:
$$X = AZ + \varepsilon$$
wherein the loading matrix $A$ is a $p \times m$ loading matrix, the vector $Z$ is the $m$-dimensional common factor, $\varepsilon$ is the $p$-dimensional error vector, and $X$ is the $p$-dimensional observation vector;
determining the loading matrix $A$ according to a formula expressed in terms of the covariance of the variables (formula image not reproduced in the source);
finding the common factor $Z$ (formula image not reproduced in the source);
requiring the KMO value to be greater than 0.5 and the significance (sig) value of Bartlett's test of sphericity to be less than 0.05, which indicates that the data set is suitable for factor analysis; otherwise, revising the data set;
clustering the feature data set based on a global K-means algorithm, selecting the optimal number of clusters according to the within-cluster sum of squares and the elbow rule, and establishing a global K-means traffic credit rating model;
the method for selecting the optimal number of clusters comprises the following steps:
initializing the cluster center as the mean of all data;
taking the Mahalanobis distance as the optimization target, and taking the cluster centers with the smallest within-cluster sum-of-squares loss after N iterations as the optimal cluster centers;
increasing the number of clusters k in turn, wherein the initial cluster centers $(o_1, o_2, \ldots, o_{k-1}, o_k)$ consist of the optimal cluster centers already obtained and a randomly selected initial point, k being the number of clusters;
observing how the within-cluster sum-of-squares loss curve changes with k, and determining the optimal number of clusters by the elbow rule;
after the global K-means traffic credit rating model is established, normalizing the cluster centers to obtain the rating result of each corresponding cluster, inputting the feature data set into the global K-means traffic credit rating model, assigning the feature data to the corresponding clusters by the shortest distance, and rating according to the rating results of the clusters;
the step of normalizing the cluster centers, inputting the feature data set, assigning the feature data to clusters and rating comprises:
inputting the feature data set into the global K-means traffic credit rating model with K cluster centers to obtain the centroid of each cluster;
the step of normalizing the cluster centers, inputting the feature data set, assigning the feature data to clusters and rating further comprises:
normalizing the cluster centers, and assigning each centroid a credit grade according to the scale meaning of the indicators;
calculating the Mahalanobis distance between each centroid and the feature data of the transportation unit, and assigning the feature data to the corresponding clusters by the shortest distance;
converting the clusters to which the feature data belong into the corresponding traffic credit rating results, thereby completing the traffic credit rating of the transportation units.
3. An electronic device, comprising:
at least one processor; and
a memory coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to implement the method of claim 1.
4. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program, which when executed is capable of implementing the method of claim 1.
CN202011307960.4A 2020-11-20 2020-11-20 Method and system for rating credit of transportation unit, electronic device and storage medium Active CN112418987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011307960.4A CN112418987B (en) 2020-11-20 2020-11-20 Method and system for rating credit of transportation unit, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112418987A CN112418987A (en) 2021-02-26
CN112418987B true CN112418987B (en) 2022-04-29

Family

ID=74773836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011307960.4A Active CN112418987B (en) 2020-11-20 2020-11-20 Method and system for rating credit of transportation unit, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112418987B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837836A (en) * 2021-09-18 2021-12-24 Gree Electric Appliances, Inc. of Zhuhai Model recommendation method, device, equipment and storage medium
CN115511506A (en) * 2022-09-30 2022-12-23 The 15th Research Institute of China Electronics Technology Group Corporation Enterprise credit rating method, device, terminal equipment and storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN111027836A (en) * 2019-12-03 2020-04-17 Taizhou Enterprise Credit Reporting Service Co., Ltd. Enterprise public credit rating system and method
CN111047193A (en) * 2019-12-13 2020-04-21 Shanghai Dolphin Enterprise Credit Reporting Service Co., Ltd. Enterprise credit scoring model generation algorithm based on credit big data label

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2017181346A1 (en) * 2016-04-19 2017-10-26 Dalian University of Technology Optimal dividing method for credit grade based on credit similarity maximization

Non-Patent Citations (5)

Title
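Mahalanobis Distances on Factor Model Based Estimation; Deliang Dai; MDPI; 2020-03-05; pp. 1-11 *
Analysis of travel characteristics of regular bus passengers based on the two-step clustering algorithm; Zhang Yimu et al.; Comprehensive Transportation (综合运输); 2020-03-20; Vol. 42, No. 3; pp. 120-126 *
Research on credit rating of small and medium-sized manufacturing enterprises based on factor analysis and cluster analysis; Zhao Dongmei et al.; Electronic Design Engineering (电子设计工程); 2015-04-05; Vol. 23, No. 7; pp. 82-85 *
Credit risk evaluation and control of small-enterprise loans of commercial banks on a big data platform; Wang Shaojun; China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库); 2019-07-15, No. 7; pp. 15-38 *
Research on the customer credit rating system of rural credit cooperatives in China; Jiang Hua et al.; Science & Technology Review (科技导报); 2007-04-25; Vol. 25, No. 8; pp. 54-60 *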

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant