CN113837859B

CN113837859B - Image construction method for small and micro enterprises

Info

Publication number: CN113837859B
Application number: CN202110979314.0A
Authority: CN
Inventors: Yin Panpan (尹盼盼); Bian Songhua (边松华); Cui Lele (崔乐乐)
Original assignee: Tianyuan Big Data Credit Management Co Ltd
Current assignee: Tianyuan Big Data Credit Management Co Ltd
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2024-05-14
Anticipated expiration: 2041-08-25
Also published as: CN113837859A

Abstract

The invention relates to the field of financial credit, and in particular to a method for constructing portraits of small and micro enterprises. The method comprises the following steps:

- S1, establishing a standard database through data convergence and fusion;
- S2, establishing an enterprise portrait tag system;
- S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
- S4, forming the input indexes of the clustering model through feature engineering;
- S5, establishing a fusion cluster analysis model.

Compared with the prior art, the method fuses multi-source enterprise data through operations such as data merging, data alignment and data fusion. On that basis it establishes an enterprise portrait tag system and an enterprise comprehensive evaluation and dimension evaluation index system. The resulting enterprise portraits cover richer dimensions with more comprehensive evaluation indexes. This overcomes the one-sided coverage of portrait evaluation dimensions that results from relying on a single data source.

Description

Image construction method for small and micro enterprises

Technical Field

The invention relates to the field of financial credit, and in particular provides a method for constructing portraits of small and micro enterprises.

Background

With the application of technologies such as big data, machine learning and artificial intelligence, traditional financial institutions have fundamentally changed their service models, service forms and management and operation modes. Financial technology has developed rapidly, and big data and artificial intelligence are among its most important enabling technologies.

Small and micro enterprises typically need financing that is "short, small, frequent and urgent": short-term, small in amount, frequent and urgent. One mainstream business model for meeting this demand is an intelligent risk-control system built on the multi-source data covering these enterprises. Such a system spans the entire credit process before, during and after lending.

Before lending, such a system provides a comprehensive interpretation of the enterprise, so that the bank can form a preliminary understanding of it. During lending, it reflects the enterprise's risks in a timely manner. The bank can then monitor risk points in the enterprise's operation and development, and promptly take measures such as lowering interest rates or adjusting the enterprise's loan products to keep risk under control.

However, in the prior art, intelligent risk-control systems cannot accurately characterize the behavior of enterprises that submit malicious loan applications. Their evaluation indexes are also few and incomplete.

Disclosure of Invention

The invention addresses these shortcomings of the prior art and provides a highly practical method for constructing portraits of small and micro enterprises.

The technical scheme adopted for solving the technical problems is as follows:

A small and micro enterprise portrait construction method comprises the following steps:

S1, establishing a standard database through data convergence and fusion;

S2, establishing an enterprise portrait tag system;

S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;

S4, forming the input indexes of the clustering model through feature engineering;

S5, establishing a fusion cluster analysis model.

Further, in step S1, multi-source data from multiple government departments and third parties are fused and aggregated using big-data ETL technology. After noise removal, data alignment and redundancy removal, the data are stored in a standard database.

Further, in step S2, the enterprise portrait tags include enterprise owned-data tags and enterprise model tags. The owned-data tags are derived from the enterprise's own data in the standard database. The model tags are mainly generated by cluster analysis, in two parts:

- Enterprise comprehensive evaluation indexes are generated from the enterprise's multi-source data and passed to the comprehensive cluster analysis model, which produces a comprehensive model prediction tag.
- Dimension indexes for five dimensions are likewise generated from the multi-source data: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability. These are passed to the dimension cluster analysis models, which produce dimension model prediction tags.

Further, in step S3, enterprise indexes are extracted from the enterprise standard data table in the enterprise standard database. The enterprise comprehensive evaluation indexes span five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability.

Further, in step S4, exploratory data analysis and data cleaning are applied to the enterprise multi-source data, based on the indexes formed in step S3. This finally yields the model input features required to train the fusion cluster model.

Furthermore, the exploratory data analysis proceeds as follows:

- It computes simple descriptive statistics for the generated indexes and performs basic statistical analysis of the data.
- It segments specific index data to analyze in depth how the data change over time and what values they take under specific conditions.
- It visualizes the model input indexes by plotting the histogram of each single variable and the relationship curve between each single variable and the target variable.

The data cleaning proceeds in the following order:

1. Invalid values in the indexes are handled, and the indexes that can be quantized are numerically quantized.
2. Missing-value statistics are computed for the modeling indexes, and indexes whose missing rate exceeds 80% are removed.
3. For the remaining indexes, the same-value rate is computed: the share of samples taking the most frequent value. Features that take only one value are removed, as are indexes whose same-value rate exceeds 85%.
4. VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that survive the missing-value and same-value filtering. After the correlated features are removed, several modeling indexes remain.
5. Missing values in these model input indexes are filled with 0 by default.
6. The filled training samples are Z-Score standardized to form standardized training vectors.
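The missing-value, same-value, zero-fill and Z-Score steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the patented implementation. It assumes that indexes are stored as a dict mapping each index name to a list of values, with `None` for missing entries. It also assumes the same-value rate is computed over non-missing samples. The VIF step is omitted here.

```python
import statistics
from collections import Counter

# Thresholds stated in the text.
MISSING_MAX = 0.80     # drop indexes with more than 80% missing values
SAME_VALUE_MAX = 0.85  # drop indexes whose same-value rate exceeds 85%


def missing_rate(col):
    """Fraction of samples whose value is missing (None)."""
    return sum(v is None for v in col) / len(col)


def same_value_rate(col):
    """Share of non-missing samples taking the most frequent value.
    A single-valued index scores 1.0 and is therefore always removed."""
    present = [v for v in col if v is not None]
    if not present:
        return 1.0
    return Counter(present).most_common(1)[0][1] / len(present)


def filter_indexes(table):
    """Keep indexes that pass both the missing-rate and same-value filters."""
    return {
        name: col
        for name, col in table.items()
        if missing_rate(col) <= MISSING_MAX and same_value_rate(col) <= SAME_VALUE_MAX
    }


def fill_and_standardize(table):
    """Fill missing values with 0, then Z-Score each index: (x - mean) / std."""
    out = {}
    for name, col in table.items():
        filled = [0.0 if v is None else float(v) for v in col]
        mu = statistics.mean(filled)
        sd = statistics.pstdev(filled)
        out[name] = [(v - mu) / sd if sd > 0 else 0.0 for v in filled]
    return out


# Hypothetical toy indexes for five enterprises.
table = {
    "registered_capital": [100, 200, None, 400, 300],
    "mostly_missing": [None, None, None, None, None],
    "constant": [1, 1, 1, 1, 1],
}
kept = filter_indexes(table)          # only "registered_capital" survives
vectors = fill_and_standardize(kept)  # zero-filled, then mean 0 and std 1 per index
```

Because the zero fill happens before standardization (the order given in the text), the filled zeros take part in the mean and standard deviation.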

Further, in step S5, the comprehensive cluster analysis model is established as follows. kmeans cluster analysis is used to cluster enterprises on the comprehensive evaluation indexes. The value of K is determined with the Calinski-Harabasz (CH) metric, and stability analysis and cluster effect analysis are used to evaluate the clustering;

kmeans clustering is also applied separately to the dimension indexes of each of the five enterprise dimensions, forming one cluster analysis model per dimension.

Further, when K is determined with the Calinski-Harabasz metric, a larger CH score indicates a better clustering effect. The procedure is as follows:

1. kmeans cluster analysis is run for each K in the range 1 to 10.
2. A plot of the clustering results is drawn.
3. The CH index is calculated for each K in turn.
4. The K with the best clustering effect is selected by combining the visualized clustering results with the CH values.

Further, once kmeans has been selected as the clustering algorithm with the best effect, its stability is checked:

1. The clustering parameter random_state is left unset during modeling.
2. The optimal K is fixed according to the CH value.
3. kmeans clustering is run 3 to 10 times in succession, to observe whether the distribution of samples across clusters fluctuates significantly between runs.

Over these runs, the order in which clusters are numbered may change at random. If the proportion of samples in each cluster nevertheless stays relatively fixed, kmeans is suitable for cluster analysis of the current data set, given the existing features, the training samples and the chosen K.

Further, the comprehensive model is built from the indexes of all five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability. These indexes are pooled, and feature preprocessing removes useless indexes, including those whose missing-value or same-value rate exceeds the threshold. The modeling indexes that remain serve as the enterprise comprehensive evaluation indexes. A kmeans cluster analysis model is built on them, and the optimal K is determined by the CH metric. The clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters;

the screened indexes of each of the five dimensions undergo feature preprocessing and feature quantization to generate dimension training vectors. A kmeans cluster analysis model is established for each dimension's evaluation indexes, and its optimal K is determined by the CH metric. Its clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters. This yields five dimension cluster analysis models in total. The enterprise comprehensive evaluation model and the five dimension cluster analysis models are then fused into an enterprise portrait fusion cluster analysis model;

the enterprise dimension indexes of the five dimensions are acquired, preprocessed and feature-quantized to generate dimension training vectors. Each dimension cluster analysis model is called to assign the enterprise to a dimension cluster, and enterprise dimension cluster analysis model tags are formed from these clusters.

An automatic enterprise portrait tag generation module is established on the basis of the owned tags, the comprehensive cluster model tags and the dimension cluster model tags. Given input enterprise information, the module:

- acquires the enterprise's owned data, comprehensive evaluation indexes and dimension evaluation indexes;
- calls the fusion cluster model to generate each model tag;
- automatically generates the enterprise portrait.

Compared with the prior art, the small and micro enterprise portrait construction method of the invention has the following outstanding beneficial effects:

1. The method improves on enterprise portrait evaluation methods based on a single data source. It performs data merging, data alignment and data fusion on multi-source enterprise data. On the basis of the fused data, it establishes an enterprise portrait tag system and an enterprise comprehensive evaluation and dimension evaluation index system. The enterprise portrait therefore covers richer dimensions with more comprehensive evaluation indexes. This overcomes the one-sided coverage of portrait evaluation dimensions that results from relying on a single data source.

2. The method improves on enterprise portrait modeling methods based on supervised classification. Cluster analysis can analyze the distribution of enterprises in depth even when enterprise labels are missing or inaccurate, and can thereby divide small and micro enterprises into groups. This extends the range of cases and scenarios in which enterprise portraits can be built from high-dimensional features and massive training samples. The method therefore has a wider scope of application.

3. The method improves on plain kmeans cluster analysis by applying a fusion clustering approach to the construction of enterprise portrait tags. This supports enterprise credit services, extends the application scenarios of financial technology in the credit field and enriches its content.

4. The method is increasingly relevant as conditions change in several ways:

- massive enterprise data continue to converge;
- artificial intelligence risk-control modeling methods are introduced;
- enterprise portrait indexes keep growing;
- scenarios lacking training-sample labels become more common;
- multiple algorithms are fused.

Under these conditions the method is well suited to risk-control modeling on massive enterprise big data. It is particularly suited to building risk-control models when no labels are available, and it has very broad application prospects.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow diagram of the small and micro enterprise portrait construction method;

FIG. 2 is a schematic diagram of establishing the enterprise portrait tag system and the comprehensive evaluation and dimension evaluation index system in the small and micro enterprise portrait construction method;

FIG. 3 is a diagram of an example application scenario of the small and micro enterprise portrait construction method.

Detailed Description

In order to provide a better understanding of the aspects of the present invention, the present invention will be described in further detail with reference to specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

A preferred embodiment is given below:

As shown in FIGS. 1 to 3, the small and micro enterprise portrait construction method of this embodiment builds the enterprise portrait with cluster analysis, an unsupervised learning method. The embodiment proceeds as follows.

Multi-source data from multiple government departments and third parties are fused and converged using big-data ETL technology. After processing such as noise removal, data alignment and redundancy removal, the data are stored in a standard database.

The small and micro enterprise data in the standard database are screened and sorted to establish a tag system for enterprise portraits. The portrait tags mainly comprise enterprise owned-data tags and enterprise model tags.

The enterprise data in the standard database undergo data cleaning, feature preprocessing and similar operations. Part of the data is used directly as the enterprise's owned tags. The rest, after feature preprocessing and standardization, serves as unsupervised training samples for the subsequent cluster modeling.

A comprehensive feature model and grouping (dimension) cluster models are each built with kmeans clustering. K selection and cluster effect evaluation use stability analysis and the Calinski-Harabasz metric, forming the final clustering models. The fusion cluster model combines the comprehensive clusters with the grouping clusters; it predicts each enterprise's classification clusters to form enterprise cluster model tags.

Small and micro enterprise portrait tags are then built from the model tags and the owned-data tags. When enterprise information is input externally, the enterprise's original data are read and preprocessed. The fusion cluster model is then called to predict the enterprise's classification clusters and automatically generate its portrait tags.

The specific steps are as follows:

S1, establishing a standard database through data convergence and fusion

The enterprise multi-source data come from three kinds of source:

- Enterprise government data include industry and commerce registration, housing provident fund, social security, the development and reform commission, banking and insurance regulation, administrative penalties and similar information.
- Enterprise internet data include e-commerce data, marketing information, identification information, online store information, legal litigation, dishonest judgment-debtor records, bidding and similar information.
- Enterprise third-party data include enterprise business information, personnel relationship data and similar information.

The data are processed in three stages:

1. A unified data standard specification is established, so that multi-source data are managed in a standardized way once stored.
2. The multi-source data are processed with ETL and other data processing tools. Storable data such as internet data are pulled periodically, and real-time interface data are processed in memory. Combined with batch processing, the data then undergo processing, standardization, index calculation and lightweight feature mining.
3. The data from the three kinds of source are fused and converged into a unified data warehouse through horizontal and vertical data fusion. After fusion, the warehouse stores the standard library data together with the derived index library, feature library and similar information.
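A minimal sketch of the alignment and fusion step follows. It assumes each source has already been pulled into Python dictionaries, and that the unified social credit code (a field hypothetically named `credit_code`) is the join key. `FIELD_MAP` stands in for the unified data standard specification. "First non-empty value wins" is just one simple conflict rule, not necessarily the one used in the method.

```python
from collections import defaultdict

# Hypothetical mapping from each source's raw field names to standard names;
# it stands in for the "unified data standard specification".
FIELD_MAP = {
    "gov": {"uscc": "credit_code", "reg_cap": "registered_capital"},
    "internet": {"code": "credit_code", "shop_cnt": "online_store_count"},
    "third_party": {"credit_code": "credit_code", "staff": "employee_count"},
}


def standardize(source, record):
    """Rename source-specific fields to standard names and drop unmapped ones."""
    mapping = FIELD_MAP[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def fuse(batches):
    """Align records from all sources on credit_code (one row per enterprise).
    Records without a key are discarded as noise; for each field the first
    non-empty value wins, which also drops redundant duplicates."""
    fused = defaultdict(dict)
    for source, records in batches:
        for raw in records:
            rec = standardize(source, raw)
            key = rec.get("credit_code")
            if not key:
                continue
            row = fused[key]
            for field, value in rec.items():
                if value not in (None, "") and field not in row:
                    row[field] = value
    return dict(fused)


# Hypothetical records for one enterprise from the three kinds of source.
batches = [
    ("gov", [{"uscc": "91370000X", "reg_cap": 500}]),
    ("internet", [{"code": "91370000X", "shop_cnt": 2}, {"code": "", "shop_cnt": 9}]),
    ("third_party", [{"credit_code": "91370000X", "staff": 12}]),
]
table = fuse(batches)  # one fused row keyed by "91370000X"
```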

S2, establishing an enterprise portrait tag system

All data sources covering the enterprise in the standard database are sorted out, and an enterprise portrait tag system is established. The portrait tags comprise enterprise owned-data tags and enterprise model tags.

The owned tags are derived from the enterprise's own data in the standard database. They mainly include:

- Basic information: establishment date, registered capital, enterprise type, number of current employees and similar information.
- Honors and penalties: for example, recipients of the Mayor's Quality Award, holders of brand-name product titles, "contract-abiding and credit-worthy" enterprises, specialized, refined, distinctive and innovative small and medium-sized enterprises, gazelle enterprises, and science and technology innovation enterprises.
- Tax credit information: for example, whether the enterprise is a Class A taxpayer, or holds Class A in its latest tax credit rating.
- Negative information: for example, a latest tax credit rating of Class C or D, revocation of the business licence, abnormal operation, listing as a seriously illegal or dishonest enterprise, or listing as a party to a major tax violation.

The enterprise model tags are mainly generated by cluster analysis, in two parts:

- Enterprise comprehensive evaluation indexes are generated from the multi-source data and passed to the comprehensive cluster analysis model, which produces a comprehensive model prediction tag.
- Dimension indexes for five dimensions in total are generated from the multi-source data: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability. These are passed to the dimension cluster analysis models, which produce dimension model prediction tags.
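Owned-data tags are essentially rules over standardized fields. The sketch below turns one standardized record into tags. Every field name and rule in it is a hypothetical illustration of the tag categories listed above (basic information, honors, tax credit and negative information). It is not the method's actual rule set.

```python
def owned_tags(row):
    """Derive owned-data tags from one standardized enterprise record.
    All field names and rules below are hypothetical illustrations."""
    tags = []
    # Basic information.
    if row.get("registered_capital") is not None:
        tags.append(f"Registered capital: {row['registered_capital']}")
    # Tax credit information (positive) or negative tax credit grade.
    level = row.get("tax_credit_level")
    if level == "A":
        tags.append("Class A taxpayer")
    elif level in ("C", "D"):
        tags.append(f"Negative: tax credit grade {level}")
    # Other negative information.
    if row.get("licence_revoked"):
        tags.append("Negative: business licence revoked")
    if row.get("abnormal_operation"):
        tags.append("Negative: abnormal operation")
    # Honors.
    for honor in row.get("honors", []):
        tags.append(f"Honor: {honor}")
    return tags


example = owned_tags({
    "registered_capital": 500,
    "tax_credit_level": "C",
    "abnormal_operation": True,
    "honors": ["gazelle enterprise"],
})
```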

S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system

Enterprise indexes are extracted from the enterprise standard data table in the enterprise standard database. The enterprise comprehensive evaluation indexes span five primary dimensions:

- Enterprise background comprises 20 basic indexes in total, such as establishment time, registered capital and number of employees.
- Enterprise stability comprises 20 basic indexes in total, covering industry and commerce changes, tax changes, changes of legal representative and the like.
- Enterprise operation capability comprises 200 indexes in total across six secondary dimensions: management capability, debt repayment capability, repayment willingness, operation capability, profitability and enterprise qualifications.
- Enterprise development potential comprises 50 indexes in total across two dimensions: development capability and innovation capability.
- Enterprise technological innovation capability comprises 10 secondary items in total, such as patent count, software copyrights and intellectual property.

Together, these 300 indexes across the five primary dimensions form the enterprise comprehensive evaluation indexes.
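The index system can be held as a simple mapping from each primary dimension to its index count. The sketch below checks that the stated counts add up to 300. It reads the technological innovation figure of 10 as an index count, since that is the reading under which the total balances. The placeholder index names are hypothetical.

```python
# Basic-index counts per primary dimension as stated in step S3; the figure
# for technological innovation capability is read as an index count (assumption).
INDEX_SYSTEM = {
    "enterprise background": 20,
    "enterprise stability": 20,
    "enterprise operation capability": 200,
    "enterprise development capability": 50,
    "enterprise technological innovation capability": 10,
}

# The six secondary dimensions named for operation capability (counts not given).
OPERATION_SECONDARY = [
    "management capability", "debt repayment capability", "repayment willingness",
    "operation capability", "profitability", "enterprise qualifications",
]


def index_names(system):
    """Expand the system into placeholder identifiers, one per basic index."""
    return [f"{dim}#{i + 1}" for dim, count in system.items() for i in range(count)]


names = index_names(INDEX_SYSTEM)
total = len(names)  # 300, matching the stated total
```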

S4, forming the input indexes of the clustering model through feature engineering

The 300 indexes extracted from the enterprise multi-source data pass through several processes, including exploratory data analysis and data cleaning. Together these finally form the model input features required to train the fusion cluster model.

The exploratory data analysis mainly proceeds as follows:

- It computes simple descriptive statistics for the 300-plus indexes, such as variance, mean, median and data distribution, and performs basic statistical analysis of the data.
- It segments specific index data, and analyzes in depth how the data change over time and what values they take under specific conditions.
- It visualizes the model input sample indexes by plotting, among others, the histogram of each single variable and the relationship curve between each single variable and the target variable.
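The descriptive and segmented statistics can be sketched with Python's `statistics` module. The functions below compute per-index summaries, per-segment summaries (segment labels such as industry are hypothetical) and equal-width histogram counts. The histogram counts are the data behind the univariate histograms; plotting itself is omitted.

```python
import statistics


def describe(values):
    """Descriptive statistics for one index; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.mean(present) if present else None,
        "median": statistics.median(present) if present else None,
        "variance": statistics.pvariance(present) if present else None,
    }


def describe_by_segment(values, segments):
    """Describe one index separately within each segment (e.g. each industry)."""
    buckets = {}
    for v, s in zip(values, segments):
        buckets.setdefault(s, []).append(v)
    return {s: describe(vs) for s, vs in buckets.items()}


def histogram_counts(values, bins=5):
    """Equal-width bin counts, the data behind a univariate histogram."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


# Hypothetical registered-capital values for six enterprises in two industries.
capital = [100, 200, None, 400, 300, 1000]
industry = ["retail", "retail", "retail", "tech", "tech", "tech"]
summary = describe(capital)
by_industry = describe_by_segment(capital, industry)
```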

The data cleaning proceeds as follows:

1. Invalid values in the indexes are handled, and quantifiable indexes are numerically quantized.
2. Missing-value statistics are computed for the modeling indexes, and indexes whose missing rate exceeds 80% are removed.
3. The same-value rate is computed for the remaining indexes. Features that take only one value are removed, as are indexes whose same-value rate exceeds 85%.
4. VIF collinearity analysis is performed on the evaluation indexes that survive the missing-value and same-value filtering. After the correlated features are removed, 20 modeling indexes remain.
5. Missing values in these 20 model input indexes are filled with 0 by default.
6. The filled training samples are Z-Score standardized to form standardized training vectors.
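The VIF collinearity step can be sketched as an iterative filter. The sketch relies on a standard identity: the VIF of index j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix. The text does not state a VIF threshold, so the cut-off of 10 used here is a common rule of thumb and an assumption. The column names and data are hypothetical.

```python
import math


def correlation_matrix(cols):
    """Pearson correlation matrix of equal-length, non-constant columns."""
    n = len(cols[0])
    centered = []
    for c in cols:
        m = sum(c) / n
        centered.append([v - m for v in c])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(cols)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear indexes: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_filter(table, threshold=10.0):
    """Repeatedly drop the index with the largest VIF until all VIFs <= threshold."""
    names = list(table)
    while len(names) > 1:
        inv = invert(correlation_matrix([table[n] for n in names]))
        vifs = [inv[i][i] for i in range(len(names))]  # VIF_j = (R^-1)_jj
        worst = max(range(len(names)), key=lambda i: vifs[i])
        if vifs[worst] <= threshold:
            break
        names.pop(worst)
    return names


table = {
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],  # roughly 2 * x: strongly collinear with x
    "z": [5, 3, 6, 2, 7, 1],
}
kept = vif_filter(table)  # "z" plus one of "x" / "y"
```

Constant columns would make the correlation undefined. They are assumed to have been removed already by the same-value filter, which runs before this step in the text.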

S5, establishing a fusion cluster analysis model

kmeans cluster analysis is used to cluster enterprises on the comprehensive evaluation indexes. K is determined with the Calinski-Harabasz metric, and stability analysis and cluster effect analysis are used to evaluate the clustering. Together these establish a comprehensive cluster analysis model. kmeans clustering is also applied to the dimension indexes of each of the five enterprise dimensions, forming one cluster analysis model per dimension.

Several methods exist for determining K in kmeans clustering. Here K is determined with the Calinski-Harabasz metric, for which a larger CH score indicates a better clustering effect. The procedure is as follows:

1. kmeans cluster analysis is run for each K in the range 1 to 10.
2. A plot of the clustering results is drawn.
3. The CH index is calculated for each K in turn.
4. The K with the best clustering effect is selected by combining the visualized clustering results with the CH values.
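A minimal sketch of choosing K with the Calinski-Harabasz index follows, written in plain Python with kmeans++ seeding. The CH index is [B/(K-1)] / [W/(n-K)], where B and W are the between-cluster and within-cluster dispersion. It is undefined for a single cluster, so the sketch scans K from 2 rather than 1. The toy data, the fixed seeds and the `n_init` restarts are assumptions for illustration. The method's visual inspection step is not reproduced.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return [sum(c) / len(points) for c in zip(*points)]


def kmeans(points, k, seed=None, n_init=10, max_iter=100):
    """Lloyd's kmeans with k-means++ seeding; returns the labels of the run
    with the lowest within-cluster sum of squares (inertia)."""
    rng = random.Random(seed)
    best_labels, best_inertia = None, float("inf")
    for _ in range(n_init):
        centroids = [list(rng.choice(points))]
        while len(centroids) < k:  # k-means++: far-away points are sampled more often
            d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
            pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
            centroids.append(list(pick))
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new.append(mean_point(members) if members else centroids[c])
            if new == centroids:
                break
            centroids = new
        inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; higher means better-separated clusters."""
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    if k < 2:
        raise ValueError("the CH index needs at least two clusters")
    overall = mean_point(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = mean_point(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_values=range(2, 11)):
    """Score each candidate K with the CH index and return (best_k, scores)."""
    scores = {k: calinski_harabasz(points, kmeans(points, k, seed=k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical standardized vectors: three well-separated groups of four enterprises.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]
best_k, scores = choose_k(points)  # the CH score peaks at K = 3 for this data
```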

After kmeans has been selected as the clustering algorithm with the best effect, its stability is checked:

1. The clustering parameter random_state is left unset during modeling.
2. The optimal K is fixed according to the CH value.
3. kmeans clustering is run 5 times in succession, to observe whether the distribution of samples across clusters fluctuates significantly between runs.

Over the 5 runs, the order in which clusters are numbered is random, but the proportion of samples in each cluster stays relatively fixed. This indicates that kmeans is suitable for cluster analysis of the current data set, given the existing features, the training samples and the chosen K.
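The stability check can be sketched as repeated unseeded kmeans runs whose cluster shares are compared after sorting, since cluster numbering is arbitrary from run to run. The tolerance `tol`, which turns "relatively fixed" into a yes/no answer, is a hypothetical parameter. The kmeans routine is a compact re-implementation, not any particular library's.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, rng, n_init=10, max_iter=100):
    """kmeans++-seeded Lloyd iterations; best of n_init restarts by inertia."""
    if len(set(map(tuple, points))) < k:
        raise ValueError("need at least k distinct points")
    best, best_inertia = None, float("inf")
    for _ in range(n_init):
        cents = [list(rng.choice(points))]
        while len(cents) < k:
            d2 = [min(sq_dist(p, c) for c in cents) for p in points]
            cents.append(list(rng.choices(points, weights=d2)[0]))
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda c: sq_dist(p, cents[c])) for p in points]
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new.append([sum(col) / len(members) for col in zip(*members)] if members else cents[c])
            if new == cents:
                break
            cents = new
        inertia = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
        if inertia < best_inertia:
            best, best_inertia = labels, inertia
    return best


def stability_check(points, k, runs=5, tol=0.05):
    """Run kmeans `runs` times without a fixed seed (random_state unset) and
    compare the sorted cluster shares of each run."""
    rng = random.Random()  # unseeded, mirroring an unset random_state
    shares = []
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        shares.append(sorted(labels.count(c) / len(labels) for c in range(k)))
    spread = max(max(s[i] for s in shares) - min(s[i] for s in shares) for i in range(k))
    return shares, spread <= tol


# Hypothetical standardized vectors: three well-separated groups of four enterprises.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]
shares, stable = stability_check(points, 3)
```

Sorting the shares before comparing them is what lets the check ignore the random cluster numbering that the text mentions.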

Establishing the fusion cluster model:

All indexes of the five primary dimensions undergo feature preprocessing and feature screening. Indexes whose missing-value or same-value rate exceeds the threshold, and other useless indexes, are removed, leaving 25 indexes in total. These 25 indexes serve as the enterprise comprehensive evaluation indexes, and a kmeans cluster analysis model is established on them. The optimal K is determined by the CH metric. The clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters.

The screened indexes of each of the five dimensions undergo feature preprocessing and feature quantization to generate dimension training vectors. A kmeans cluster analysis model is established for each dimension's evaluation indexes, and its optimal K is determined by the CH metric. Its clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters. This yields five dimension cluster analysis models in total. The enterprise comprehensive evaluation model is then fused with the five dimension cluster analysis models to form the enterprise portrait fusion cluster analysis model.
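One way to picture the fused model is as a container holding the comprehensive model and the per-dimension models, each of which assigns an enterprise to its nearest centroid. The class name, the centroid representation and the two-dimension toy setup are assumptions; the method itself uses five dimensions. The text does not specify how the fused model is stored.

```python
from dataclasses import dataclass


def nearest(centroids, x):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


@dataclass
class FusionClusterModel:  # hypothetical name
    comprehensive: list  # centroids of the comprehensive kmeans model
    dimensions: dict     # dimension name -> centroids of that dimension's model

    def predict(self, comp_vector, dim_vectors):
        """Assign one enterprise to a cluster in every component model."""
        result = {"comprehensive": nearest(self.comprehensive, comp_vector)}
        for name, centroids in self.dimensions.items():
            result[name] = nearest(centroids, dim_vectors[name])
        return result


# Toy centroids; only two of the five dimensions are shown for brevity.
model = FusionClusterModel(
    comprehensive=[[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]],
    dimensions={
        "enterprise background": [[-1.0], [1.0]],
        "enterprise stability": [[-0.5], [0.5]],
    },
)
labels = model.predict([0.9, 1.2], {"enterprise background": [-0.8], "enterprise stability": [0.7]})
```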

Generating enterprise portrait tags:

The enterprise portrait tags comprise enterprise owned-data tags and cluster model analysis tags. They are generated as follows:

- Owned tags: the enterprise's original data fields stored in the standard database undergo data preprocessing and data quantization. This produces a standardized tag format, which serves as the owned tags of the portrait to be generated.
- Comprehensive cluster model tags: the indexes required by the enterprise comprehensive cluster model are acquired, preprocessed and feature-quantized to generate training vectors. The comprehensive cluster analysis model is called to assign the enterprise to a classification cluster, and the tags are formed from that cluster. The tag information includes a good, general or high-quality enterprise credit condition, the enterprise's comprehensive condition, and the share of enterprises in that comprehensive condition.
- Dimension cluster model tags: the enterprise dimension indexes of the five dimensions are acquired, preprocessed and feature-quantized to generate dimension training vectors. Each dimension cluster analysis model is called to assign the enterprise to a dimension cluster, and the tags are formed from these clusters. They mainly include tags such as "enterprise of a certain scale", "enterprise in its start-up stage", "relatively stable enterprise" and "strong technological innovation capability".

An automatic enterprise portrait tag generation module is established on the basis of the owned tags, the comprehensive cluster model tags and the dimension cluster model tags. Given input enterprise information, the module:

- acquires the enterprise's owned data, comprehensive evaluation indexes and dimension evaluation indexes;
- calls the fusion cluster model to generate each model tag;
- automatically generates the enterprise portrait.
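The automatic tag generation module can be sketched as combining owned tags with a text tag for each predicted cluster. The cluster-to-tag mapping below is hypothetical; in practice each cluster would be named after inspecting its profile. Its tag texts echo the examples given above. The optional share annotation corresponds to the "share of enterprises in that comprehensive condition" tag.

```python
# Hypothetical cluster-to-tag mapping; clusters without an entry get no tag.
CLUSTER_TAGS = {
    "comprehensive": {0: "general credit condition", 1: "good credit condition", 2: "high-quality enterprise"},
    "enterprise background": {0: "enterprise in its start-up stage", 1: "enterprise of a certain scale"},
    "enterprise stability": {1: "relatively stable enterprise"},
    "enterprise technological innovation capability": {1: "strong technological innovation capability"},
}


def build_portrait(enterprise_id, owned, predicted, shares=None):
    """Combine owned tags with tags for each predicted cluster.
    `predicted` maps model name -> cluster id; `shares` optionally maps
    model name -> {cluster id: share of enterprises in that cluster}."""
    model_tags = []
    for model, cluster in predicted.items():
        text = CLUSTER_TAGS.get(model, {}).get(cluster)
        if text is None:
            continue  # this cluster has no agreed tag text
        if shares and model in shares:
            text += f" ({shares[model][cluster]:.0%} of enterprises)"
        model_tags.append(text)
    return {"enterprise": enterprise_id, "owned_tags": list(owned), "model_tags": model_tags}


portrait = build_portrait(
    "E001",
    ["Class A taxpayer"],
    {"comprehensive": 2, "enterprise background": 1, "enterprise stability": 0},
    shares={"comprehensive": {2: 0.25}},
)
```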

The above specific embodiments merely illustrate particular cases of the invention. The scope of protection includes, but is not limited to, these embodiments. Any suitable modification or replacement made by a person of ordinary skill in the art that accords with the claims of the small and micro enterprise portrait construction method of the invention shall fall within its scope of protection.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A small and micro enterprise portrait construction method, characterized by comprising the following steps:

S1, establishing a standard database through data convergence and fusion;

Multi-source data from multiple government departments and third parties are fused and converged using big-data ETL technology, and, after noise removal, data alignment and redundancy removal, the data are stored in a standard database;

S2, establishing an enterprise portrait tag system;

The enterprise portrait tags comprise enterprise owned-data tags and enterprise model tags, wherein:

the owned tags are derived from the enterprise's own data in the standard database;

the model tags are mainly generated by cluster analysis, wherein enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data and the comprehensive cluster analysis model is called to generate a comprehensive model prediction tag; and

indexes for five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability) are generated from the enterprise multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction tags;

S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;

Enterprise indexes are extracted from the enterprise standard data tables in the standard database; the enterprise comprehensive evaluation indexes cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability;
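The index system can be pictured as a mapping from the five primary dimensions to their secondary indexes. The secondary index names below are hypothetical placeholders, since the claim does not list them; the sketch only shows how one standard-table record splits into per-dimension index sets and a comprehensive index set formed as their union.

```python
# Hypothetical secondary indexes under the five primary dimensions named in the claim.
INDEX_SYSTEM = {
    "enterprise_background": ["registered_capital", "years_established"],
    "enterprise_stability": ["legal_rep_changes", "shareholder_changes"],
    "operation_capability": ["annual_tax_paid", "invoice_count"],
    "development_capability": ["revenue_growth", "headcount_growth"],
    "innovation_capability": ["patent_count", "software_copyrights"],
}


def extract_indexes(record):
    """Split one standard-table record into per-dimension and comprehensive index sets."""
    dimensions = {
        dim: {name: record.get(name) for name in names}  # indexes absent from the record become None
        for dim, names in INDEX_SYSTEM.items()
    }
    # The comprehensive index set is the union of all dimension indexes.
    comprehensive = {
        name: value for indexes in dimensions.values() for name, value in indexes.items()
    }
    return comprehensive, dimensions
```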

S4, forming the clustering-model input indexes through feature engineering;

Based on the indexes formed in step S3, exploratory data analysis and data cleaning are performed on the enterprise multi-source data to produce the input features required for training the fusion cluster model;

The exploratory data analysis computes simple descriptive statistics for the generated indexes and performs basic statistical analysis of the data; the data of specific indexes are then segmented to analyze in depth how they change dynamically and what values they take under specific conditions; finally, the input sample indexes are analyzed visually by plotting a histogram curve of each single variable and the relationship curve between each single variable and the target variable;
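A minimal sketch of the descriptive statistics and the histogram data for a single index, using only the Python standard library. Missing values are assumed to be encoded as `None`; the plotting itself and the relationship curve against a target variable are left out.

```python
import statistics


def describe(values):
    """Simple descriptive statistics for one index; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing_rate": 1 - len(present) / len(values) if values else 0.0,
    }
    if present:
        summary.update(
            mean=statistics.fmean(present),
            std=statistics.pstdev(present),
            min=min(present),
            median=statistics.median(present),
            max=max(present),
        )
    return summary


def histogram(values, bins=10):
    """Equal-width bin counts: the data behind a single-variable histogram."""
    present = [v for v in values if v is not None]
    if not present:
        return [0] * bins
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in present:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # the maximum goes in the last bin
    return counts
```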

The data cleaning first handles invalid values in the indexes and numerically quantifies those indexes that can be quantified. Missing-value statistics are then computed for the modeling indexes, and indexes with a missing rate greater than 80% are removed. For the remaining indexes the identical-value rate is computed: features whose attribute takes only one value are removed, as are indexes whose identical-value rate exceeds 85%. VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that pass the missing-value and identical-value filters, and correlated features are removed, leaving a number of modeling indexes. Missing values in these modeling indexes are filled with 0 by default, and the filled training samples are Z-Score standardized to form standardized training vectors;
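The cleaning rules can be sketched as below, with the 80% missing-rate and 85% identical-value-rate thresholds taken from the text. Several details are assumptions: the identical-value rate is computed over non-missing values, the VIF cut-off of 10 is a common rule of thumb rather than a value given here, and VIF is computed on the zero-filled columns because it needs complete rows. Each VIF is read off the diagonal of the inverse correlation matrix, VIF_j = (R^-1)_jj, with a tiny ridge on the diagonal so that perfectly collinear columns yield a very large VIF instead of a singular matrix.

```python
import math
import statistics
from collections import Counter


def _corr(a, b):
    """Pearson correlation of two equal-length columns."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def _invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns, ridge=1e-9):
    """VIF of each column: the matching diagonal entry of the inverse correlation matrix."""
    names = list(columns)
    corr = [[_corr(columns[a], columns[b]) + (ridge if a == b else 0.0) for b in names] for a in names]
    inverse = _invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def clean(data, max_missing=0.8, max_same=0.85, max_vif=10.0):
    """data maps index name -> list of values (None = missing)."""
    kept = {}
    for name, values in data.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # missing rate above threshold
        if len(set(present)) <= 1:
            continue  # attribute takes only one value
        if Counter(present).most_common(1)[0][1] / len(present) > max_same:
            continue  # identical-value rate above threshold
        kept[name] = [0.0 if v is None else float(v) for v in values]  # default fill with 0
    # Collinearity screening: drop the index with the largest VIF until all are acceptable.
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= max_vif:
            break
        del kept[worst]
    # Z-Score standardization of every remaining index.
    result = {}
    for name, values in kept.items():
        mean, std = statistics.fmean(values), statistics.pstdev(values)
        result[name] = [(v - mean) / std for v in values]
    return result
```

Dropping one index at a time and recomputing is a deliberate choice: removing a single member of a collinear pair usually brings the other member's VIF back below the cut-off.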

S5, establishing a fusion cluster analysis model;

Cluster modeling is performed on the enterprise comprehensive evaluation indexes using the k-means cluster analysis method: the K value is determined with the Calinski-Harabasz (CH) index, and the clustering effect is evaluated through stability analysis and cluster-effect analysis, thereby establishing the comprehensive cluster analysis model;

Based on the indexes of each of the five enterprise dimensions, enterprise cluster analysis is carried out per dimension using the k-means clustering method, forming a cluster analysis model for each dimension;

When the K value is determined with the Calinski-Harabasz method, a larger CH score indicates a better clustering effect. k-means cluster analysis is run with K taking values in the interval 1 to 10, a graph of the clustering results is drawn, the CH index is computed for each K in turn, and the K giving the best clustering effect is selected by combining the visualized clustering results with the CH values;
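The K-selection loop can be sketched with a plain-Python k-means and the Calinski-Harabasz index CH = [B/(K-1)] / [W/(n-K)], where B is the between-cluster and W the within-cluster sum of squares. Because CH is only defined for at least two clusters, the sweep below starts at K = 2. The deterministic farthest-first initialization is a simplification chosen here for reproducibility (scikit-learn's `KMeans` uses k-means++ by default), and the visual inspection step is not shown.

```python
def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        centroids.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))))
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(mean_point(members) if members else centroids[j])  # keep empty clusters in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


def calinski_harabasz(points, labels):
    """CH = (B / (K - 1)) / (W / (n - K)); larger means better-separated clusters."""
    n, clusters = len(points), sorted(set(labels))
    k = len(clusters)
    if not 2 <= k < n:
        raise ValueError("the CH index needs 2 <= K < n non-empty clusters")
    overall = mean_point(points)
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = mean_point(members)
        between += len(members) * sq_dist(centre, overall)
        within += sum(sq_dist(p, centre) for p in members)
    return float("inf") if within == 0 else (between / (k - 1)) / (within / (n - k))


def select_k(points, k_values=range(2, 11)):
    """Compute CH for each candidate K and return the best K together with all scores."""
    scores = {}
    for k in k_values:
        labels, _ = kmeans(points, k)
        if 2 <= len(set(labels)) < len(points):
            scores[k] = calinski_harabasz(points, labels)
    return max(scores, key=scores.get), scores
```

In practice the returned `scores` dictionary is what would be plotted against K and read together with the cluster visualizations.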

After the k-means algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is fixed according to the CH value, and k-means clustering is run 3 to 10 times in succession to observe whether the distribution of samples across the clusters fluctuates significantly between runs. Over these 3 to 10 runs the order in which the clusters are numbered is random, but the proportion of samples in each cluster remains relatively stable, which shows that k-means cluster analysis, with the existing features, training samples and chosen K value, is suitable for the current data set;
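The stability check can be sketched as repeated k-means runs with an unseeded random initialization, mirroring an unset `random_state`. Sorting each run's cluster proportions removes the arbitrary cluster numbering, so the remaining spread measures only how much the cluster sizes move between runs. The function and parameter names are hypothetical; the optional `seed` exists only to make the check itself reproducible.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_labels(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans_labels(points, k, rng, iters=100):
    """Lloyd's k-means with random initial centroids drawn from the data."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = nearest_labels(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return nearest_labels(points, centroids)


def stability(points, k, runs=5, seed=None):
    """Repeat clustering and compare cluster-size proportions across runs.

    seed=None seeds from OS entropy, similar to leaving random_state unset.
    """
    rng = random.Random(seed)
    profiles = []
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        # Sorting discards the arbitrary cluster numbering and keeps only the proportions.
        profiles.append(sorted((labels.count(j) / len(points) for j in range(k)), reverse=True))
    # Largest change in any size-ranked cluster's share between runs.
    spread = max(max(shares) - min(shares) for shares in zip(*profiles))
    return profiles, spread
```

What counts as a "relatively fixed" proportion is a judgment call; the text does not give a numeric tolerance for `spread`.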

All indexes of the five primary dimensions undergo feature preprocessing and feature screening; indexes whose missing rate or identical-value rate exceeds the threshold, as well as useless indexes, are removed, and the remaining model-input indexes are used as the enterprise comprehensive evaluation indexes. A k-means cluster analysis model is established on these comprehensive evaluation indexes, the optimal K value is determined by the CH index, and the clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters;

The screened indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability) undergo feature preprocessing and feature quantification to generate dimension training vectors. A k-means cluster analysis model is established separately on the evaluation indexes of each dimension, the optimal K value is determined by the CH index, and the clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters, yielding five dimension cluster analysis models in total. The comprehensive evaluation model of the enterprise and the five dimension cluster analysis models are then fused to form the enterprise portrait fusion cluster analysis model;

The enterprise indexes of the five dimensions are acquired, preprocessed and feature-quantified to generate dimension training vectors; each dimension cluster analysis model is called to assign the enterprise to a classification cluster in that dimension, and enterprise dimension cluster model labels are formed from these clusters. An automatic enterprise portrait label generation module is built on the own-data labels, the comprehensive cluster model labels and the dimension cluster model labels: given enterprise information as input, it acquires the enterprise's own data, comprehensive evaluation indexes and dimension evaluation indexes, calls the fusion cluster model to generate each model label, and automatically generates the enterprise portrait.
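A sketch of how the fused model might turn trained clusters into portrait labels, assuming each trained model is reduced to its Z-score parameters, its centroids and a hypothetical mapping from cluster ids to label text. Assigning a new enterprise to the nearest centroid is how a fitted k-means model predicts; the class names, dimension keys and label strings are illustrative assumptions, not part of the method as claimed.

```python
from dataclasses import dataclass


@dataclass
class ClusterModel:
    """A trained k-means model reduced to what prediction needs (assumed structure)."""
    means: list           # per-index training means for Z-Score standardization
    stds: list            # per-index training standard deviations
    centroids: list       # cluster centres in standardized space
    cluster_labels: dict  # hypothetical mapping: cluster id -> portrait label text

    def predict_label(self, raw):
        filled = [0.0 if v is None else v for v in raw]  # same default 0-fill as training
        z = [(v - m) / s for v, m, s in zip(filled, self.means, self.stds)]
        nearest = min(
            range(len(self.centroids)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(z, self.centroids[j])),
        )
        return self.cluster_labels[nearest]


@dataclass
class FusedPortraitModel:
    """Comprehensive model plus one model per dimension, called together."""
    comprehensive: ClusterModel
    dimensions: dict  # dimension name -> ClusterModel

    def portrait(self, own_labels, comprehensive_indexes, dimension_indexes):
        tags = dict(own_labels)  # labels taken directly from the enterprise's own data
        tags["comprehensive"] = self.comprehensive.predict_label(comprehensive_indexes)
        for name, model in self.dimensions.items():
            tags[name] = model.predict_label(dimension_indexes[name])
        return tags
```

Keeping the training means and standard deviations inside each model ensures that a new enterprise is filled and standardized exactly as the training samples were before the nearest-centroid lookup.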