CN112418987B - Method and system for rating credit of transportation unit, electronic device and storage medium

Info

Publication number: CN112418987B
Application number: CN202011307960.4A
Authority: CN
Inventors: 文琰杰; 许旺土; 李传明; 黄永燊; 丁昌星
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2022-04-29
Anticipated expiration: 2040-11-20
Also published as: CN112418987A

Abstract

The invention provides a method, a system, electronic equipment and a computer readable storage medium for credit rating of a transportation unit, wherein the method comprises the following steps: extracting a characteristic data set of the traffic credit rating of a traffic transportation unit; clustering the characteristic data set, selecting the optimal cluster number, and establishing a rating model; inputting the characteristic data set into the rating model, and rating the label-free data of the transportation units. According to the scheme of the invention, the credit of the transportation unit is graded in an unsupervised learning mode (namely, unlabeled data is adopted), the method for grading the unlabeled data does not need a large amount of data support, does not need to refer to historical experience data, has low cost for acquiring the data, and does not need to consider the accuracy of the labeled data in the traditional supervised learning. Compared with the similar traditional clustering scheme, the method has the advantages of higher precision, stronger robustness and quicker convergence, and can provide decision support for credit rating of transportation enterprises.

Description

Method and system for rating credit of transportation unit, electronic device and storage medium

Technical Field

The invention relates to the technical field of transportation, in particular to a method and a system for rating credit of a transportation unit, electronic equipment and a computer-readable storage medium.

Background

Building a traffic credit system has been one of the key tasks of traffic management departments in recent years. Traffic credit assessment of transportation enterprises is of great significance for standardizing enterprise credit information, building an urban credit market, and promoting self-discipline.

The development of enterprise credit rating models has attracted extensive research interest in academia and industry. Evaluation methods can be divided into traditional methods and artificial-intelligence-based methods. Traditional methods mainly include factor analysis, empirical discrimination, and multivariate discriminant analysis and prediction; artificial-intelligence-based rating mainly includes support vector machines, artificial neural networks, inductive learning, and the like. These methods are all based on supervised learning, which presupposes labeled data: a large amount of data is needed, the cost of acquiring data is high, and the accuracy of the labels cannot be guaranteed. Moreover, traditional rating methods and models suffer from high clustering loss, slow convergence, and poor robustness; because they cannot accurately and effectively preserve the characteristics of the feature data, they cannot provide accurate and stable decision support for evaluating the traffic credit of transportation units.

Disclosure of Invention

The invention aims to solve at least one technical problem in the background art and provides a method, a system, an electronic device and a computer-readable storage medium for rating the credit of a transportation unit.

In order to achieve the above object, the present invention provides a method for rating credit of transportation units, comprising:

extracting a characteristic data set of the traffic credit rating of a traffic transportation unit;

clustering the characteristic data set, selecting the optimal cluster number, and establishing a rating model;

inputting the characteristic data set into the rating model, and rating the label-free data of the transportation units.

According to an aspect of the invention, further comprising: after establishing a rating model, optimizing the rating model by taking the Mahalanobis distance of the feature data set as an optimization target, wherein the Mahalanobis distance formula is as follows:

$$d(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\,\Sigma^{-1}\,(x_i - m_j)}$$

wherein x_i is the ith data item, m_j is the centroid of the cluster j to which data item i belongs, and Σ^-1 is the inverse of the covariance.

According to one aspect of the invention, the feature data set comprises: administrative penalty Z₁; subjected to a general administrative penalty x₄; subjected to a severe administrative penalty x₅; reputation evaluation Z₂; reputation evaluation of AAA level x₁; reputation evaluation of AA level x₂; reputation evaluation of B level x₃.

According to one aspect of the invention, the feature data set is obtained based on a factor analysis method, wherein the factor analysis method comprises:

acquiring a feature data set of traffic credit rating;

constructing a matrix formula: X = AZ + ε, wherein the matrix A is an n × m load matrix, the vector Z is an m-dimensional common factor, ε is an n-dimensional error vector, X is an n-dimensional observation vector, and Z is the feature data set obtained after the data set X is screened;

solving for the load matrix A according to the formula Cov(X) = AA^T + Cov(ε), where Cov(X) represents the covariance of the variables;

determining the common factor Z, Z = A^T Cov(X)^-1 X;

requiring the KMO value to be greater than 0.5 and the sig value of the Bartlett test to be less than 0.05 for the data set to be suitable for factor analysis; otherwise, the data set needs to be revised.
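The suitability check in the last step can be illustrated with a short sketch. The patent does not give formulas for the KMO value or the Bartlett test, so the code below uses the standard textbook definitions (KMO from correlations and partial correlations, Bartlett's test of sphericity with a chi-square approximation); the function names and the layout of the data matrix (one row per enterprise, one column per observed variable) are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import chi2


def kmo(X):
    """Kaiser-Meyer-Olkin measure; rows of X are enterprises, columns are variables."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)          # partial correlations (off-diagonal)
    off = ~np.eye(R.shape[0], dtype=bool)      # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(X):
    """Bartlett's test of sphericity; returns (chi-square statistic, sig value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2
    return stat, chi2.sf(stat, df)


def suitable_for_factor_analysis(X, kmo_min=0.5, sig_max=0.05):
    """Thresholds taken from the text: KMO > 0.5 and Bartlett sig < 0.05."""
    _, sig = bartlett_sphericity(X)
    return bool(kmo(X) > kmo_min and sig < sig_max)
```

If the check fails, the text only says that the data set needs to be revised; which variables to drop or transform is left open here as well.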

According to one aspect of the invention, the feature data set is clustered based on a global K-means algorithm, the optimal cluster number is selected according to the within-cluster sum of squares and the elbow rule, and a global K-means traffic credit rating model is established;

the method for selecting the optimal cluster number comprises the following steps:

initializing a clustering center as an average value of the full data;

taking the Mahalanobis distance as the optimization target, finding the cluster centers with the lowest within-cluster sum-of-squares loss after N iterations, and taking them as the optimal cluster centers;

increasing the cluster number step by step, k' = k + 1, wherein the initial cluster centers for k' consist of the optimal cluster centers obtained for k and one randomly selected initial point, k being the cluster number;

and observing how the within-cluster sum-of-squares loss curve changes across different k values, and determining the optimal cluster number according to the elbow rule.
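The text determines the elbow by observing the loss curve and gives no numerical criterion. As one possible way to automate that observation, the sketch below picks the k at which the discrete second difference of the loss curve is largest, that is, where the curve bends most sharply. This criterion and the function name `elbow_k` are assumptions for illustration, not part of the described method.

```python
import numpy as np


def elbow_k(losses, k_start=1):
    """Return the cluster number at the sharpest bend of a loss curve.

    losses[i] is the loss for k = k_start + i. The bend is measured by the
    second difference loss(k-1) - 2*loss(k) + loss(k+1), which is an assumed
    heuristic standing in for visual inspection of the curve.
    """
    losses = np.asarray(losses, dtype=float)
    if losses.size < 3:
        raise ValueError("need at least three loss values")
    second_diff = losses[:-2] - 2 * losses[1:-1] + losses[2:]
    return k_start + 1 + int(np.argmax(second_diff))
```

For a made-up curve such as `[100, 60, 20, 17, 15, 14]` for k = 1 to 6, the loss drops steeply up to k = 3 and flattens afterwards, so the heuristic selects k = 3.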

According to one aspect of the invention, after the global K-means traffic credit rating model is established, the cluster centers are normalized to obtain the rating results of the corresponding clusters, the feature data sets are input into the global K-means traffic credit rating model, the feature data sets are divided into the corresponding clusters according to the shortest distance, and the rating is carried out according to the rating results of the clusters.

According to an aspect of the present invention, the normalizing the cluster centers to obtain the rating results of the corresponding clusters, inputting the feature data sets into the global K-means traffic credit rating model, dividing the feature data into the corresponding clusters according to the shortest distance, and rating according to the rating results of the clusters includes:

and inputting the feature data set into the global K-means traffic credit rating model with K cluster centers to obtain the centroid of each cluster.

According to an aspect of the present invention, the normalizing the cluster centers to obtain the rating results of the corresponding clusters, inputting the feature data sets into the global K-means traffic credit rating model, dividing the feature data into the corresponding clusters according to the shortest distance, and rating according to the rating results of the clusters, further includes:

normalizing the clustering centers, and dividing each centroid into corresponding credit grades according to the nominal meaning of the indexes;

calculating the distance between each centroid and the feature data of the transportation unit based on the Mahalanobis distance, and dividing the feature data into corresponding clusters according to the shortest distance;

and converting the corresponding clusters of the feature data into corresponding traffic credit rating results to finish the traffic credit rating of the transportation units.

In order to achieve the above object, the present invention further provides a credit rating system for transportation units, comprising:

the data extraction module is used for extracting a characteristic data set of the traffic credit rating of the traffic transportation unit;

the data processing module is used for clustering the characteristic data set, selecting the optimal cluster number and establishing a rating model;

and the rating module is used for inputting the characteristic data set into the rating model and rating the label-free data of the transportation units.

According to an aspect of the invention, further comprising:

and the model optimization module is used for optimizing the rating model by taking the Mahalanobis distance as an optimization target after the data processing module establishes the rating model.

To achieve the above object, the present invention also provides an electronic device, including:

at least one processor; and

a memory coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor to implement the above-described method.

To achieve the above object, the present invention also provides a computer-readable storage medium for storing a computer program, which when executed, is capable of implementing the above method.

According to one scheme of the invention, the traffic credit rating method based on data driving for the transportation unit adopts an unsupervised learning mode (namely, unlabeled data is adopted) to rate the credit of the transportation unit, the method adopting the unlabeled data rating does not need a large amount of data support, does not need to refer to historical experience data, has low cost for acquiring data, and does not need to consider the accuracy of the labeled data in the traditional supervised learning. Therefore, under the condition of rating the non-tag data, the rating method provided by the invention has the advantages of higher data processing speed, higher stability and higher accuracy. Moreover, the invention takes the spatial distribution relation of data into consideration, and provides a global K-means algorithm based on Mahalanobis distance optimization to cluster and grade the data set, so that the method has the advantages of low clustering loss, fast convergence and stronger robustness compared with the traditional model, and can provide accurate and stable decision support for enterprise traffic credit evaluation.

Drawings

FIG. 1 is a flow chart of a method for credit rating of a transportation unit according to the present invention;

FIG. 2 shows a flow chart of a factor analysis method;

FIG. 3 shows a global K-means flow diagram;

FIG. 4 is a schematic illustration of the elbow rule;

FIG. 5 is a graph comparing the within-cluster sum-of-squares errors of conventional K-means and global K-means under the Euclidean distance and the Mahalanobis distance;

FIG. 6 is a flow chart of an enterprise traffic credit rating process;

FIG. 7 is a block diagram of a transportation unit credit rating system according to the present invention;

FIG. 8 is a graph of the classification results of a data set at an optimal cluster number.

Detailed Description

The content of the invention will now be discussed with reference to exemplary embodiments. It is to be understood that the embodiments discussed are merely intended to enable one of ordinary skill in the art to better understand and thus implement the teachings of the present invention, and do not imply any limitations on the scope of the invention.

As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" are to be read as "at least one embodiment".

Fig. 1 schematically shows a flow chart of a method for credit rating of a transportation unit according to the invention. As shown in fig. 1, the method for rating credit of transportation unit according to the present invention comprises the following steps:

a. extracting a characteristic data set of the traffic credit rating of a traffic transportation unit;

b. clustering the characteristic data set, selecting the optimal cluster number, and establishing a rating model;

c. inputting the characteristic data set into the rating model, and rating the label-free data of the transportation units.

In the invention, rating unlabeled data means rating the feature data by unsupervised learning. Whereas rating labeled data requires referring to historical experience data, rating unlabeled data does not: the rating is obtained by inputting the unlabeled feature data into the rating model.

According to an embodiment of the present invention, before step b or between step b and step c, the method further comprises step e: optimizing the rating model by taking the Mahalanobis distance of the feature data set as the optimization target, so as to solve the problem of differing scales (units) among the variables.

Traditional K-means clustering uses the Euclidean distance; when the scales of the variables differ greatly, the contribution of variables with small values is ignored during clustering. The Mahalanobis distance is therefore introduced: the covariance is incorporated into the Euclidean distance, which modifies the distance formula and the optimization target. The Mahalanobis distance formula is:

$$d(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\,\Sigma^{-1}\,(x_i - m_j)}$$

wherein x_i is the ith data item, m_j is the centroid of the cluster j to which data item i belongs, and Σ^-1 is the inverse of the covariance.
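As a minimal sketch of the distance described here, the function below computes the standard Mahalanobis distance between one data item and one centroid. The function name and the choice to pass the inverse covariance in as an argument are illustrative assumptions.

```python
import numpy as np


def mahalanobis(x, m, cov_inv):
    """Mahalanobis distance between data item x and centroid m.

    cov_inv is the inverse of the covariance matrix of the feature data.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(m, dtype=float)
    return float(np.sqrt(diff @ cov_inv @ diff))
```

With the identity matrix as `cov_inv` the result equals the Euclidean distance. With a diagonal covariance such as diag(1, 10^6), a difference of 1000 in the second variable counts the same as a difference of 1 in the first, which is how the large-scale variable is kept from dominating.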

According to an embodiment of the present invention, the feature data set includes: administrative penalty Z₁; subjected to a general administrative penalty x₄; subjected to a severe administrative penalty x₅; reputation evaluation Z₂; reputation evaluation of AAA level x₁; reputation evaluation of AA level x₂; reputation evaluation of B level x₃.

Further, in the step a, the feature data set is obtained by a factor analysis method, where the factor analysis method is shown in fig. 2 and includes:

a1. acquiring a feature data set of traffic credit rating;

a2. constructing a matrix formula: X = AZ + ε, wherein the matrix A is an n × m load matrix, the vector Z is an m-dimensional common factor, ε is an n-dimensional error vector, and X is an n-dimensional observation vector;

a3. solving for the load matrix A according to the formula Cov(X) = AA^T + Cov(ε), where Cov(X) represents the covariance of the variables;

a4. determining the common factor Z, Z = A^T Cov(X)^-1 X;

a5. requiring the KMO value to be greater than 0.5 and the sig value of the Bartlett test to be less than 0.05 for the data set to be suitable for factor analysis; otherwise, the data set needs to be revised.
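Steps a2 to a4 can be sketched as follows. The text does not say how the load matrix A is solved from Cov(X) = AA^T + Cov(ε); this sketch assumes the principal-component extraction (the leading eigenvectors of the covariance scaled by the square roots of their eigenvalues) and assumes the data are centered first. The function name `factor_scores` and the row-per-enterprise layout are also illustrative.

```python
import numpy as np


def factor_scores(data, m):
    """Estimate an n x m load matrix A and the common factors Z.

    data: array with one row per enterprise and n observed variables.
    m:    number of common factors to keep.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)          # assumption: work on centered data
    cov = np.cov(centered, rowvar=False)         # Cov(X), n x n
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:m]          # indices of the m largest eigenvalues
    A = eigvecs[:, top] * np.sqrt(eigvals[top])  # principal-component loadings (assumed)
    # Row-wise form of Z = A^T Cov(X)^-1 X: each row holds one enterprise's factors.
    Z = centered @ np.linalg.inv(cov) @ A
    return A, Z
```

Under this choice of A the resulting factors are uncorrelated with unit variance. In the example of the description, the five observed variables are reduced to two common factors, which would correspond to `m=2`.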

According to an embodiment of the invention, in step b, the feature data set is clustered based on a global K-means algorithm, the optimal cluster number is selected according to the within-cluster sum of squares and the elbow rule, and a global K-means traffic credit rating model is established;

the method for selecting the optimal cluster number is shown in fig. 3, and includes:

b1. initializing a clustering center as an average value of the full data;

b2. taking the Mahalanobis distance as the optimization target, finding the cluster centers with the lowest within-cluster sum-of-squares loss after N iterations, and taking them as the optimal cluster centers;

b3. increasing the cluster number step by step, k' = k + 1, wherein the initial cluster centers for k' consist of the optimal cluster centers obtained for k and one randomly selected initial point, k being the cluster number;

b4. observing how the within-cluster sum-of-squares loss curve changes across different k values, and determining the optimal cluster number according to the elbow rule.

For example, in the method for selecting the optimal cluster number, the mean of the feature data set is used as the centroid when the cluster number is 1, and k is then increased with a step of 1. The initial centroids are (o₁, o₂, ..., o_k-1, o_k), where o₁, o₂, ..., o_k-1 are the optimal cluster centroids already obtained and o_k is a randomly selected centroid point. Inner-loop iteration is then carried out: each time k optimal centroids are selected, N iterations are performed until the within-cluster sum-of-squares error stabilizes. Outer-loop iteration follows: after the k optimal centroids are selected, the loss versus cluster-number curve is observed; if the elbow rule is satisfied, the elbow lies at the inflection point, and the optimal k is 3 in this example, as shown in fig. 4. Finally, the optimal k value and the corresponding k centroids are output.
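The inner and outer loops described above might be sketched as follows. Several details are assumptions made for illustration: the "N iterations" are read as N trials that each add a different randomly selected data point, the covariance is estimated once from the full data set, and the names `sq_mahalanobis`, `lloyd` and `global_kmeans` are hypothetical.

```python
import numpy as np


def sq_mahalanobis(X, centers, cov_inv):
    """Squared Mahalanobis distance from every row of X to every center."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)


def lloyd(X, centers, cov_inv, max_iter=100):
    """Standard K-means updates from given initial centers; returns (centers, loss)."""
    for _ in range(max_iter):
        labels = sq_mahalanobis(X, centers, cov_inv).argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):                 # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    loss = sq_mahalanobis(X, centers, cov_inv).min(axis=1).sum()
    return centers, loss


def global_kmeans(X, k_max, n_trials=10, seed=0):
    """Return {k: (best centers, loss)} for k = 1 .. k_max."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    best = X.mean(axis=0, keepdims=True)     # k = 1: mean of the full data
    results = {1: (best, sq_mahalanobis(X, best, cov_inv).sum())}
    for k in range(2, k_max + 1):            # outer loop over the cluster number
        trials = []
        for _ in range(n_trials):            # inner loop: N trials for this k
            extra = X[rng.integers(len(X))][None, :]   # randomly selected point
            trials.append(lloyd(X, np.vstack([best, extra]), cov_inv))
        best, loss = min(trials, key=lambda t: t[1])
        results[k] = (best, loss)
    return results
```

The losses `results[k][1]` form the loss versus cluster-number curve to which the elbow rule is applied. Because each step starts from the previous optimal centers, the loss in this sketch does not increase as k grows.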

According to an embodiment of the present invention, in the step c, after the global K-means traffic credit rating model is established, the cluster center is normalized to obtain the rating result of the corresponding cluster, the feature data set is input into the global K-means traffic credit rating model, the feature data is divided into the corresponding clusters according to the shortest distance, and the rating is performed according to the rating result of the clusters. In this embodiment, inputting the feature data set into a rating model for rating includes:

Four comparison models are applied to the traffic-credit data set of transportation enterprises: K-means + Euclidean distance (KE), K-means + Mahalanobis distance (KM), global K-means + Euclidean distance (GKE), and global K-means + Mahalanobis distance (GKM). The comparison results are represented by the within-cluster sum-of-squares loss curves, as shown in fig. 5.

According to an embodiment of the present invention, in the step c, the cluster center is normalized to obtain a rating result of the corresponding cluster, the feature data set is input into the optimized global K-means traffic credit rating model, and the feature data is divided into the corresponding clusters according to the shortest distance and rated, as shown in fig. 6, the method further includes the following steps:

d1. normalizing the clustering centers, and dividing each centroid into corresponding credit grades according to the nominal meaning of the indexes;

d2. calculating the distance between each centroid and the feature data of the transportation unit based on the Mahalanobis distance, and dividing the feature data into corresponding clusters according to the shortest distance;

d3. and converting the corresponding clusters of the feature data into corresponding traffic credit rating results to finish the traffic credit rating of the transportation units.

Specifically, the feature data set is input into the optimized global K-means model and the centroid coordinates are normalized, that is, mapped to [0, 1]. The relevant management departments determine the relative importance of the attributes; the data are weighted to produce the final evaluation data, which are sorted according to the scale meaning to determine the rating level represented by each cluster. The distance between the enterprise data and the centroid coordinates is then calculated based on the Mahalanobis distance, and the enterprise traffic credit is rated according to the shortest distance.
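The rating procedure just described can be sketched in two small functions. The text does not state how attributes whose larger values indicate worse credit (such as administrative penalties) are handled when the weighted data are sorted, so the `higher_is_better` flags below are an assumption, as are the function names and the grade letters passed in.

```python
import numpy as np


def grade_clusters(centroids, weights, higher_is_better, grades="ABCDE"):
    """Map each cluster index to a grade from its normalized, weighted centroid."""
    c = np.asarray(centroids, dtype=float)
    span = c.max(axis=0) - c.min(axis=0)
    norm = (c - c.min(axis=0)) / np.where(span == 0, 1.0, span)   # map to [0, 1]
    norm = np.where(higher_is_better, norm, 1.0 - norm)   # assumed direction handling
    score = norm @ np.asarray(weights, dtype=float)       # weights set by the department
    order = np.argsort(-score)                            # best composite score first
    return {int(j): grades[rank] for rank, j in enumerate(order)}


def rate(X, centroids, cov_inv, cluster_grade):
    """Assign each enterprise the grade of its nearest centroid (Mahalanobis)."""
    X = np.asarray(X, dtype=float)
    c = np.asarray(centroids, dtype=float)
    diff = X[:, None, :] - c[None, :, :]
    d2 = np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)
    return [cluster_grade[int(j)] for j in d2.argmin(axis=1)]
```

With two factors ordered as (administrative penalty, reputation evaluation), equal weights of 0.5 and `higher_is_better=[False, True]`, a centroid with few penalties and a high reputation score receives the best grade.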

Compared with the similar traditional clustering method, the method provided by the invention has the advantages of higher precision, stronger robustness and quicker convergence, and can provide decision support for credit rating of transportation enterprises.

In order to achieve the object of the present invention, the present invention further provides a credit rating system for transportation units, the block diagram of the system is shown in fig. 7, and the system comprises:

and the rating module is used for inputting the characteristic data set into a rating model and rating the label-free data of the transportation units.

According to an embodiment of the present invention, the data processing module further comprises a model optimization module, configured to optimize the rating model by using mahalanobis distance as an optimization target after the data processing module establishes the rating model, so as to solve the dimension difference problem.

Further, the data extraction module obtains the feature data set by using a factor analysis method, where the factor analysis method is shown in fig. 2 and includes:

a1. acquiring a feature data set of traffic credit rating;

a4. determining the common factor Z, Z = A^T Cov(X)^-1 X;

According to one embodiment of the invention, the data processing module clusters the feature data set based on a global K-means algorithm, selects the optimal cluster number according to the class square sum and the elbow principle, and establishes a global K-means traffic credit rating model;

b1. initializing a clustering center as an average value of the full data;

For example, in the method for selecting the optimal cluster number, the mean of the feature data set is used as the centroid when the cluster number is 1, and k is then increased with a step of 1. The initial centroids are (o₁, o₂, ..., o_k-1, o_k), where o₁, o₂, ..., o_k-1 are the optimal cluster centroids already obtained and o_k is a randomly selected centroid point. Inner-loop iteration is then carried out: each time k optimal centroids are selected, N iterations are performed until the within-cluster sum-of-squares error stabilizes. Outer-loop iteration follows: after the k optimal centroids are selected, the loss versus cluster-number curve is observed; if the elbow rule is satisfied, the elbow lies at the inflection point, and the optimal k is 3 in this example, as shown in fig. 4. Finally, the optimal k value and the corresponding k centroids are output.

According to one embodiment of the invention, after the data processing module establishes the global K-means traffic credit rating model, the rating model normalizes the clustering center to obtain the rating result of the corresponding cluster, inputs the feature data set into the global K-means traffic credit rating model, divides the feature data into the corresponding cluster according to the shortest distance, and performs rating according to the rating result of the cluster. In this embodiment, inputting the feature data set into a rating model for rating includes:

According to an embodiment of the present invention, the rating module normalizes the cluster center to obtain a rating result of the corresponding cluster, inputs the feature data set into the optimized global K-means traffic credit rating model, and divides the feature data into the corresponding clusters according to the shortest distance and performs rating, as shown in fig. 6, the method further includes the following steps:

Specifically, the feature data set is input into the optimized global K-means model and the centroid coordinates are normalized, that is, mapped to [0, 1]. The relevant management departments determine the relative importance of the attributes; the data are weighted to produce the final evaluation data, which are sorted according to the scale meaning to determine the rating level represented by each cluster. The distance between the enterprise data and the centroid coordinates is then calculated based on the Mahalanobis distance, and the enterprise traffic credit is rated according to the shortest distance.

Compared with the similar traditional clustering system, the system provided by the invention has the advantages of higher precision, stronger robustness and quicker convergence, and can provide decision support for credit rating of transportation enterprises.

In order to achieve the object of the present invention, the present invention also provides an electronic device comprising: at least one processor; and a memory coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor to implement the above method.

The present invention also provides a computer-readable storage medium for storing a computer program which, when executed, is capable of carrying out the above-mentioned method.

In addition, the present invention is specifically described in the above aspects of the present invention by way of an embodiment with reference to the accompanying drawings.

Example 1:

with reference to fig. 1, the rating method steps are as follows:

a: extracting the feature set for traffic credit rating

a1: the traffic credit rating data shown in Table 1 are obtained; each entry is the number of rewards or penalties an enterprise received in half a year, and each row represents one transportation enterprise;

based on a2, a3, a4 and a5, the common factors Z and the load matrix A can be obtained, as shown in Table 2, where z_i is a common factor and x_j is an observed variable.

TABLE 1

TABLE 2

e: the Mahalanobis distance is used for replacing the Euclidean distance to serve as an optimization target, and the problem of dimension difference is solved;

b: clustering the constructed data set based on a global K-means algorithm, and selecting the optimal cluster number according to the class square sum and the elbow principle;

By observing how the loss function changes with the cluster number and applying the elbow rule, global K-means and K-means based on the Euclidean distance and the Mahalanobis distance are compared. As shown in fig. 5, the global K-means algorithm based on the Mahalanobis distance converges quickly, with low and stable loss. The cluster number of the Mahalanobis-distance-based global K-means is determined to be 5, corresponding to credit ratings A, B, C, D and E, where A denotes the best credit and E the worst.

c: inputting the data set into the optimized global K-means traffic credit rating model to evaluate the transportation enterprises under the condition of no label;

the respective centroid data coordinates are as in table 3 for cluster number 5. When the number of clusters is 5, the data set clustering result is shown in FIG. 8.

TABLE 3

d: normalizing the clustering centers and then sorting to obtain the rating results of the corresponding clusters; inputting the data set into global K-means, dividing the data into corresponding clusters according to the shortest distance and grading;

d1: selecting 10 example records as input to the K-means model for which the optimal cluster number has been determined;

d2: normalizing the centroids;

d3: it is assumed that the management department regards reputation evaluation and administrative penalty as equally important, so each weight is 0.5;

d4: determining the composite ranking of the centroids based on d2 and d3, as shown in Table 4

TABLE 4

d5: calculating the Mahalanobis distance between the credit-related data of each enterprise and the centroids;

d6: based on the calculation result of d5, the enterprises are rated according to the shortest distance, and the results are shown in Table 5

TABLE 5

As described above, the implementation of the present invention first analyzes and identifies 5 factors affecting traffic credit; on this basis, the 5 factors are summarized by factor analysis and the variables are mapped to a two-dimensional space, which better explains the relationships among the variables; and, taking the spatial distribution of the data into account, a global K-means algorithm based on Mahalanobis distance optimization is provided to cluster and rate the data set.

Compared with the traditional model, the method provided by the invention has the advantages of low clustering loss, fast convergence and stronger robustness, and can provide decision support for enterprise traffic credit evaluation.

Those of ordinary skill in the art will appreciate that the modules and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and devices may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, each functional module in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

It should be understood that the order in which the steps are presented in the summary of the invention and in the embodiments of the present invention does not necessarily imply an order of execution. The order of execution of the steps should be determined by their functions and inherent logic, and should not be construed as limiting the implementation of the embodiments of the present invention.

Claims

1. A method for rating credit of a transportation unit, comprising:

inputting the feature data set into the rating model, and rating the label-free data of the transportation units;

further comprising: after the rating model is established, optimizing the rating model by taking the Mahalanobis distance of the feature data set as the optimization target, wherein the Mahalanobis distance formula is as follows:

$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\, S^{-1}\, (x_i - m_j)}$$

wherein $x_i$ is the $i$-th data record, $m_j$ is the centroid of the cluster $j$ to which data record $i$ belongs, and $S^{-1}$ is the inverse of the covariance;

the feature data set includes: administrative penalty Z₁; subject to a general administrative penalty x₄; subject to a severe administrative penalty x₅; reputation evaluation Z₂; reputation evaluation obtaining AAA level x₁; reputation evaluation obtaining AA level x₂; reputation evaluation obtaining B level x₃;

obtaining the feature data set based on a factor analysis method, wherein the factor analysis method comprises:

acquiring a feature data set of traffic credit rating;

constructing a matrix formula:

$$X = AZ + \varepsilon$$

wherein the load matrix $A$ is a $p \times m$ load matrix, the vector $Z$ is the $m$-dimensional common factor, $\varepsilon$ is a $p$-dimensional error vector, and $X$ is a $p$-dimensional observation vector;

according to the formula

$$\Sigma = AA^{T} + \operatorname{Cov}(\varepsilon)$$

determining the load matrix $A$, wherein $\Sigma$ represents the covariance of the variables;

and solving for the common factor $Z$;

requiring the KMO value to be greater than 0.5 and the significance (sig) value of the Bartlett test to be less than 0.05, which indicates that the data set is suitable for factor analysis; otherwise, revising the data set;

clustering the feature data set based on a global K-means algorithm, selecting the optimal number of clusters according to the within-cluster sum of squares and the elbow rule, and establishing a global K-means traffic credit rating model;

initializing the cluster center as the mean of the full data;

letting $k = 2, 3, \ldots$ so that the number of clusters increases in turn, wherein the initial cluster centers for $k$ clusters consist of the optimal cluster centers already obtained and one randomly selected initial point, and $k$ is the number of clusters;

observing how the within-cluster sum-of-squares loss curve changes under different values of $k$, and determining the optimal number of clusters according to the elbow rule;

after the global K-means traffic credit rating model is established, normalizing the cluster centers to obtain the rating result of each corresponding cluster, inputting the feature data set into the global K-means traffic credit rating model, assigning each feature data record to the cluster at the shortest distance, and rating the record according to the rating result of that cluster;

wherein the normalizing of the cluster centers to obtain the rating result of each corresponding cluster, inputting the feature data set into the global K-means traffic credit rating model, assigning each feature data record to the cluster at the shortest distance, and rating according to the rating result of the cluster comprises the following steps:

inputting the feature data set into a global K-means traffic credit rating model with K cluster centers to obtain the centroid of each cluster;

and wherein the normalizing of the cluster centers to obtain the rating result of each corresponding cluster, inputting the feature data set into the global K-means traffic credit rating model, assigning each feature data record to the cluster at the shortest distance, and rating according to the rating result of the cluster further comprises the following steps:
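For illustration only, the incremental clustering recited in claim 1 (a first center at the mean of the full data, then one added center per step, with the within-cluster sum of squares recorded for each k) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`lloyd`, `global_kmeans_curve`), the iteration cap, the fixed random seed, the use of squared Euclidean distance, and the handling of empty clusters are all hypothetical choices made for the example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def lloyd(points, centers, iters=100):
    """Standard K-means refinement starting from the given centers.

    Returns the refined centers and the within-cluster sum of squares.
    """
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[j].append(p)
        # Assumption: an empty cluster keeps its previous center.
        new = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    wcss = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, wcss


def global_kmeans_curve(points, k_max, seed=0):
    """Return {k: (centers, wcss)} for k = 1..k_max, adding one center per step."""
    rng = random.Random(seed)
    # k = 1: the single center is the mean of the full data.
    results = {1: lloyd(points, [mean_point(points)])}
    for k in range(2, k_max + 1):
        prev_centers = results[k - 1][0]
        # Previous centers plus one randomly selected data point.
        init = prev_centers + [rng.choice(points)]
        results[k] = lloyd(points, init)
    return results
```

Plotting the recorded within-cluster sum of squares against k gives the loss curve from which the elbow is read. The sketch leaves the elbow choice to the reader, since the claim does not specify an automatic criterion.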

2. A transportation unit credit rating system, comprising:

the rating module is used for inputting the feature data set into the rating model and rating the label-free data of the transportation units;

further comprising:

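the model optimization module is used for optimizing the rating model by taking the Mahalanobis distance as the optimization target after the data processing module establishes the rating model, wherein the Mahalanobis distance formula is as follows:

$$D(x_i, m_j) = \sqrt{(x_i - m_j)^{T}\, S^{-1}\, (x_i - m_j)}$$

wherein $x_i$ is the $i$-th data record, $m_j$ is the centroid of the cluster $j$ to which data record $i$ belongs, and $S^{-1}$ is the inverse of the covariance;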

acquiring a feature data set of traffic credit rating;

constructing a matrix formula:

$$X = AZ + \varepsilon$$

wherein the load matrix $A$ is a $p \times m$ load matrix, the vector $Z$ is the $m$-dimensional common factor, $\varepsilon$ is a $p$-dimensional error vector, and $X$ is a $p$-dimensional observation vector;

according to the formula

$$\Sigma = AA^{T} + \operatorname{Cov}(\varepsilon)$$

determining the load matrix $A$, wherein $\Sigma$ represents the covariance of the variables;

and solving for the common factor $Z$;

initializing the cluster center as the mean of the full data;

letting $k = 2, 3, \ldots$ so that the number of clusters increases in turn,
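For illustration only, the Mahalanobis distance used by the model optimization module of claim 2 can be sketched for two-dimensional feature data, for example the two common factors Z₁ and Z₂. This is a minimal sketch under stated assumptions: the claim does not say which covariance is inverted, so a single sample covariance estimated from the full feature data set is assumed here, and the helper names (`covariance_2d`, `inverse_2x2`, `mahalanobis`, `assign`) are hypothetical.

```python
import math


def covariance_2d(points):
    """Sample covariance matrix [[sxx, sxy], [sxy, syy]] of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis(x, m, cov_inv):
    """sqrt((x - m)^T S^{-1} (x - m)) for 2-D x and centroid m."""
    dx, dy = x[0] - m[0], x[1] - m[1]
    q = (dx * (cov_inv[0][0] * dx + cov_inv[0][1] * dy)
         + dy * (cov_inv[1][0] * dx + cov_inv[1][1] * dy))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(q, 0.0))


def assign(x, centroids, cov_inv):
    """Index of the centroid at the shortest Mahalanobis distance from x."""
    return min(range(len(centroids)),
               key=lambda j: mahalanobis(x, centroids[j], cov_inv))
```

With an identity covariance the distance reduces to the Euclidean distance, which gives a quick sanity check. A strongly correlated or unevenly scaled feature pair changes which centroid is nearest.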

3. An electronic device, comprising:

at least one processor; and

a memory, wherein the memory stores a computer program executable by the at least one processor to implement the method of claim 1.

4. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store a computer program, which when executed is capable of implementing the method of claim 1.