CN113837859B - Image construction method for small and micro enterprises - Google Patents
- Publication number
- CN113837859B CN202110979314.0A
- Authority
- CN
- China
- Prior art keywords
- enterprise
- data
- indexes
- clustering
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Abstract
The invention relates to the field of financial credit, and in particular to a method for constructing portraits of small and micro enterprises, comprising the following steps: S1, establishing a standard database through data convergence and fusion; S2, establishing an enterprise portrait label system; S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system; S4, forming the input indexes of the clustering model through feature engineering; S5, establishing a fusion cluster analysis model. Compared with the prior art, the method is based on the fusion of multi-source enterprise data: the multi-source data are merged, aligned and fused, and the enterprise portrait label system and the comprehensive and dimension evaluation index systems are built on the fused data. The enterprise portrait therefore has richer dimensions and more comprehensive evaluation indexes, which overcomes the one-sidedness of the portrait evaluation dimensions covered by a single data source.
Description
Technical Field
The invention relates to the field of financial credit, and in particular provides a method for constructing portraits of small and micro enterprises.
Background
With the application of technologies such as big data, machine learning and artificial intelligence, the service models, service forms and management and operation modes of traditional financial institutions have changed fundamentally, and financial technology has developed rapidly; big data and artificial intelligence are among its most important applied technologies. To meet the "short, small, frequent and urgent" financing needs of small and micro enterprises, one mainstream business model is to build, on the multi-source data covering such enterprises, an intelligent risk control system that runs through the whole credit process: before, during and after the loan.
Before the loan, a comprehensive reading of the enterprise gives the bank a preliminary understanding of it. During the loan, the relevant risks of the enterprise are reflected promptly, so that the bank can keep track of the risk points in the enterprise's operation and development status and can promptly adjust interest rates and the loan products granted to the enterprise in order to control risk.
In the prior art, however, intelligent risk control systems cannot accurately read the behavioral characteristics of enterprises that apply for loans maliciously, and the indexes used for evaluation are few and incomplete.
Disclosure of Invention
In view of the above shortcomings of the prior art, the invention provides a highly practical method for constructing portraits of small and micro enterprises.
The technical solution adopted by the invention to solve the technical problem is as follows:
A method for constructing portraits of small and micro enterprises comprises the following steps:
S1, establishing a standard database through data convergence and fusion;
S2, establishing an enterprise portrait label system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
S4, forming the input indexes of the clustering model through feature engineering;
S5, establishing a fusion cluster analysis model.
Further, in step S1, multi-source data covering multiple government departments and third parties are fused and converged by big data ETL technology, and are stored in the standard database after noise removal, data alignment and redundancy removal.
Further, in step S2, the enterprise portrait labels comprise enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database. The model labels are mainly generated by cluster analysis: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data, and the comprehensive cluster analysis model is called to generate a comprehensive model prediction label; dimension indexes for the five dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability are generated from the enterprise multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction labels.
Further, in step S3, enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database; the enterprise comprehensive evaluation indexes cover five primary dimensions in total: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability.
Further, in step S4, based on the indexes formed in step S3, exploratory data analysis and data cleaning are performed on the enterprise multi-source data to form the model input features required for training the fusion clustering model.
Further, the exploratory data analysis computes simple descriptive statistics of the generated indexes and performs simple statistical analysis of the data; it then slices the data of specific indexes to analyze in depth how the data change over time and what values they take under specific conditions; and it analyzes the model input indexes visually by plotting the histogram of each single variable and the relationship curve between each single variable and the target variable.
The data cleaning first processes invalid values in the indexes and numerically quantizes those indexes that can be quantified. Missing-value statistics are then computed for the modeling indexes, and training indexes with a missing rate greater than 80% are removed. For the remaining indexes, the same-value rate is computed; features that take only one value are removed, as are indexes whose same-value rate is greater than 85%. VIF collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filtering, and correlated features are removed, leaving a number of modeling indexes. Missing values in these modeling indexes are filled with 0 by default, and the cleaned and filled training samples undergo Z-Score standardization to form standardized training vectors.
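The VIF collinearity step can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names `vif` and `drop_collinear`, the threshold of 10 (a common rule of thumb, not stated in the source) and the greedy drop-the-worst strategy are all assumptions. The variance inflation factor of a feature is computed as 1 / (1 - R²), where R² comes from regressing that feature on the remaining ones; constant columns are assumed to have been removed already by the same-value filter.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column: 1 / (1 - R^2)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_collinear(X: np.ndarray, threshold: float = 10.0) -> list:
    """Greedily drop the column with the largest VIF until all are <= threshold.

    The threshold value is an assumption; the source does not give one.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        scores = vif(X[:, keep])
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        keep.pop(worst)
    return keep

# Synthetic example: column 2 is almost a copy of column 0.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, c])
kept_columns = drop_collinear(X)
```

In this synthetic example one of the two nearly identical columns is dropped and the independent column is kept.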
Further, in step S5, the kmeans cluster analysis method is used to perform cluster modeling on the enterprise comprehensive evaluation indexes, the Calinski-Harabasz measurement method is used to determine the K value, stability analysis and clustering effect analysis are used to evaluate the clustering effect, and a comprehensive cluster analysis model is established;
Based on the dimension indexes of the five enterprise dimensions, enterprise cluster analysis is carried out for each dimension with the kmeans clustering method to form a cluster analysis model for each dimension.
Further, when the K value is determined by the Calinski-Harabasz measurement method, a larger CH index value indicates a better clustering effect. kmeans cluster analysis is performed for each value in the K interval 1-10 and a cluster analysis result graph is drawn; the CH index values under the different K values are calculated in turn; and the K value with the best clustering effect is selected by combining the visual result graph of the cluster analysis with the CH values.
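For reference, the Calinski-Harabasz index in its standard form is given below. The notation is introduced here and is not taken from the source: n is the number of samples, C_k is the k-th cluster with centroid c_k and size n_k, and c is the centroid of the whole data set. The index is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom, so it is defined only for K of at least 2.

```latex
\mathrm{CH}(K) = \frac{\operatorname{tr}(B_K)/(K-1)}{\operatorname{tr}(W_K)/(n-K)},
\qquad
\operatorname{tr}(B_K) = \sum_{k=1}^{K} n_k \,\lVert c_k - c \rVert^2,
\qquad
\operatorname{tr}(W_K) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2
```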
Further, after the kmeans clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset in the modeling process, the optimal K value is fixed according to the CH value, and kmeans clustering is run 3-10 times in succession, observing after each run whether the distribution of samples across the clusters fluctuates greatly. If, over the 3-10 runs, the order of the cluster sizes is random but the proportion of samples in each cluster remains relatively fixed, this indicates that cluster analysis with the kmeans algorithm and the determined K value, based on the existing features and training samples, is suitable for the current data set.
Further, all indexes of the five primary dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability) are aggregated; after feature preprocessing removes the indexes whose missing rate or same-value rate exceeds the threshold, together with other useless indexes, a number of modeling indexes remain and serve as the comprehensive evaluation indexes of the enterprise; a kmeans cluster analysis model is established on these comprehensive evaluation indexes, the optimal K value is determined by the CH measure, and the clustering effect is evaluated by clustering stability analysis and visual analysis of the resulting clusters;
The screened indexes of the five dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability undergo feature preprocessing and feature quantization to generate dimension training vectors; a kmeans cluster analysis model is established separately on the evaluation indexes of each dimension, the optimal K value is determined by the CH measure, and the clustering effect is evaluated by clustering stability analysis and visual analysis of the resulting clusters, which yields dimension cluster analysis models for the five dimensions; the comprehensive evaluation model of the enterprise and the five dimension cluster analysis models are fused to form the enterprise portrait fusion cluster analysis model;
The enterprise dimension indexes of the five dimensions are acquired and, after data preprocessing and feature quantization, dimension training vectors are generated; each dimension cluster analysis model is called to assign the enterprise to a cluster in that dimension, and the enterprise dimension cluster analysis model labels are formed from these clusters. An automatic enterprise portrait label generation module is established on the basis of the own-data labels, the comprehensive clustering model label and the dimension clustering model labels: enterprise information is input to obtain the enterprise's own data, comprehensive evaluation indexes and dimension evaluation indexes, the fusion clustering model is called to generate each model label, and the enterprise portrait is generated automatically.
Compared with the prior art, the method for constructing portraits of small and micro enterprises has the following outstanding beneficial effects:
1. Compared with enterprise portrait evaluation based on a single data source, the method is based on multi-source enterprise data fusion: the multi-source data are merged, aligned and fused, and the enterprise portrait label system and the comprehensive and dimension evaluation index systems are built on the fused data. The enterprise portrait has richer dimensions and more comprehensive evaluation indexes, which overcomes the one-sidedness of the portrait evaluation dimensions covered by a single data source.
2. Compared with enterprise portrait modeling based on supervised classification, cluster analysis can analyze the distribution of enterprises in depth even when enterprise labels are missing or inaccurate, so that small and micro enterprises can be divided into groups; this extends the categories and scenarios in which enterprise portraits can be built from high-dimensional features and massive training samples, and gives the method a wider range of application.
3. The plain kmeans cluster analysis method is improved and the fusion clustering method is applied to the construction of enterprise portrait labels; this supports enterprise credit services, expands the application scenarios of financial technology in the credit field and enriches the content of financial technology.
4. With the convergence of massive enterprise data, the introduction of artificial-intelligence risk control modeling methods, the continuous enrichment of enterprise portrait indexes, the growing number of scenarios in which training samples lack labels, and the fusion of multiple algorithms, the method is well suited to risk control modeling on massive enterprise data, in particular to building risk control models when no labels are available, and has very broad application prospects.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of the method for constructing portraits of small and micro enterprises;
FIG. 2 is a schematic diagram of the enterprise portrait label system and the comprehensive evaluation and dimension evaluation index system established in the method for constructing portraits of small and micro enterprises;
FIG. 3 is a diagram of an example application scenario of the method for constructing portraits of small and micro enterprises.
Detailed Description
In order to provide a better understanding of the aspects of the present invention, the present invention will be described in further detail with reference to specific embodiments. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A preferred embodiment is given below:
As shown in FIGS. 1-3, the method for constructing portraits of small and micro enterprises in this embodiment builds the enterprise portrait with cluster analysis, an unsupervised learning method. Multi-source data covering multiple government departments and third parties are fused and converged by big data ETL technology and stored in the standard database after noise removal, data alignment, redundancy removal and similar processing. The small and micro enterprise data in the standard database are screened and sorted, and a label system for enterprise portraits is established; the portrait labels mainly comprise enterprise own-data labels and enterprise model labels. After data cleaning, feature preprocessing and similar operations, one part of the enterprise data in the standard database is used directly as the enterprise's own-data labels, and the other part, after feature preprocessing and standardization, serves as unsupervised training samples for the subsequent cluster modeling. A comprehensive feature model and grouped (dimension) clustering models are each established with the kmeans clustering method; the K value is determined by the Calinski-Harabasz measurement method, and stability analysis and clustering effect analysis are applied, to form the final clustering models. The fusion clustering model, which combines the comprehensive clustering and the grouped clustering, predicts the clusters to which an enterprise belongs and thereby forms the enterprise clustering model labels. The small and micro enterprise portrait labels are established from the model labels and the own-data labels: when enterprise information is input from outside, the enterprise's raw data are read and preprocessed, the fusion clustering model is called to predict the enterprise's clusters, and the enterprise portrait labels are generated automatically.
The specific steps are as follows:
S1, establishing a standard database through data convergence and fusion
The enterprise multi-source data cover three families of sources. Enterprise government data include information from industry and commerce registration, the housing provident fund, social security, the Development and Reform Commission, banking and insurance regulation, administrative penalties and the like. Enterprise internet data include e-commerce data, marketing information, identification information, online store information, legal litigation, dishonest judgment debtor records, bidding and the like. Enterprise third-party data include enterprise business information, personnel relationship data and the like. First, a unified data standard specification is established so that the multi-source data are managed in a standardized way when they are loaded into storage. Second, the multi-source data are processed with ETL and other data processing tools: storable data such as internet data are pulled periodically, real-time interface data are processed in memory, and, in combination with batch processing, the data undergo data processing, data standardization, index calculation, light feature mining and the like. Finally, the multi-source data from the three families are fused and converged into a unified data warehouse through horizontal and vertical data fusion; the data warehouse stores the standard library data obtained after multi-source fusion, together with the index library and feature library obtained through processing.
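The alignment, redundancy removal and fusion described in S1 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the join key `enterprise_id`, every field name, the sample rows and the "first value seen wins" merge rule are hypothetical and do not come from the source, which performs these operations with ETL tooling on a data warehouse.

```python
from collections import defaultdict

# Hypothetical rows from the three source families; all field names are illustrative.
government = [
    {"enterprise_id": "E001", "registered_capital": 500.0, "social_security_headcount": 12},
    {"enterprise_id": "E002", "registered_capital": 80.0, "social_security_headcount": 3},
    {"enterprise_id": "E001", "registered_capital": 500.0, "social_security_headcount": 12},  # redundant row
]
internet = [
    {"enterprise_id": " e001 ", "lawsuit_count": 1},  # identifier needs alignment
    {"enterprise_id": "E003", "lawsuit_count": 0},
]
third_party = [
    {"enterprise_id": "E002", "related_person_count": 4},
]

def align_id(raw_id: str) -> str:
    """Align identifiers across sources (trim whitespace, upper-case)."""
    return raw_id.strip().upper()

def fuse(*sources):
    """Merge rows from all sources into one record per enterprise."""
    fused = defaultdict(dict)
    for rows in sources:
        for row in rows:
            key = align_id(row["enterprise_id"])
            for field, value in row.items():
                if field == "enterprise_id":
                    continue
                # The first value seen for a field wins, so exact duplicates collapse.
                fused[key].setdefault(field, value)
    return {key: {"enterprise_id": key, **fields} for key, fields in sorted(fused.items())}

standard_db = fuse(government, internet, third_party)
```

Each entry of `standard_db` then holds one fused record per enterprise, combining whatever fields the individual sources supplied.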
S2, establishing an enterprise portrait label system
All data sources covering the enterprise in the standard database are sorted out and an enterprise portrait label system is established; the enterprise portrait labels comprise enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database and mainly comprise: basic information of the enterprise, such as its establishment date, registered capital, enterprise type and number of employees; awards and honors of the enterprise, such as mayor's quality award enterprise, famous-brand product enterprise, contract-abiding and trustworthy enterprise, "specialized, refined, distinctive and innovative" small and medium-sized enterprise, gazelle enterprise and science and technology innovation enterprise; tax rating information of the enterprise, such as whether it is a class A taxpayer and whether its latest tax credit rating is class A; and negative information of the enterprise, such as whether its latest tax credit rating is class C or class D, whether it has had its license revoked, whether it is listed for abnormal operation, whether it is listed as a seriously illegal enterprise and whether it is listed for serious tax violations. The enterprise model labels are mainly generated by cluster analysis: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data and the comprehensive cluster analysis model is called to generate a comprehensive model prediction label; dimension indexes for five dimensions in total (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability) are generated from the enterprise multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction labels.
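As a rough illustration of how own-data labels might be derived from a fused record, the sketch below applies a few simple rules. The function `own_data_labels`, the field names and the label wording are hypothetical; the source lists the kinds of labels but does not specify how they are computed.

```python
from datetime import date

def own_data_labels(record: dict, today: date) -> list:
    """Derive own-data labels from one enterprise record (hypothetical fields)."""
    labels = []
    # Approximate age in whole years (ignores leap days).
    years = (today - record["establishment_date"]).days // 365
    labels.append(f"established {years} years")
    rating = record.get("tax_credit_rating")
    if rating == "A":
        labels.append("class A taxpayer")
    if rating in {"C", "D"}:
        labels.append("low tax credit rating")  # negative information
    if record.get("is_gazelle"):
        labels.append("gazelle enterprise")
    if record.get("abnormal_operation"):
        labels.append("abnormal operation")     # negative information
    return labels

record = {
    "establishment_date": date(2015, 6, 1),
    "tax_credit_rating": "A",
    "is_gazelle": True,
    "abnormal_operation": False,
}
labels = own_data_labels(record, today=date(2021, 8, 25))
```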
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system
Enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database. The enterprise comprehensive evaluation indexes cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability. The enterprise background mainly comprises 20 basic indexes in total, such as the establishment time, registered capital and number of employees of the enterprise. The enterprise stability covers 20 basic indexes in total, such as industry and commerce registration changes, tax changes and changes of legal representative. The enterprise operation capability covers 200 indexes in total across six secondary dimensions: management capability, debt repayment capability, repayment willingness, operating capability, profitability and enterprise qualifications. The enterprise development capability comprises 50 indexes in total across two secondary dimensions, development capability and innovation capability. The enterprise technological innovation capability comprises 10 indexes in total, such as the number of patents, software copyrights and intellectual property of the enterprise. Together, the 300 indexes of the five primary dimensions form the comprehensive evaluation indexes of the enterprise.
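The index counts quoted above can be checked with a few lines of Python. The sketch reads the figure of 10 given for technological innovation capability as an index count, which is the reading under which the five dimensions add up to the stated total of 300; the dictionary keys are simply the dimension names.

```python
# Number of indexes per primary dimension, as stated in the description.
index_counts = {
    "enterprise background": 20,
    "enterprise stability": 20,
    "enterprise operation capability": 200,
    "enterprise development capability": 50,
    "enterprise technological innovation capability": 10,
}

total = sum(index_counts.values())
# Share of each dimension in the comprehensive evaluation index set.
shares = {name: count / total for name, count in index_counts.items()}
```

Operation capability accounts for two thirds of the raw indexes, which is one reason the later filtering and per-dimension clustering matter.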
S4, forming the clustering model input indexes through feature engineering
The 300 indexes extracted from the enterprise multi-source data must pass through several processing steps, such as exploratory data analysis and data cleaning, before they form the model input features required for training the fusion clustering model.
The exploratory data analysis mainly computes simple descriptive statistics of the more than 300 indexes, analyzing their variance, mean, median, data distribution and the like, and performs simple statistical analysis of the data; it then slices the data of specific indexes to analyze in depth how the data change over time and what values they take under specific conditions; and it analyzes the model input sample indexes visually by plotting the histogram of each single variable, the relationship curve between each single variable and the target variable, and the like.
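A minimal NumPy sketch of the descriptive statistics for a single index follows. The helper name `describe`, the sample values and the choice of three histogram bins are illustrative assumptions; plotting is omitted and only the histogram counts are computed.

```python
import numpy as np

def describe(column: np.ndarray) -> dict:
    """Descriptive statistics of one index, ignoring missing (NaN) entries."""
    valid = column[~np.isnan(column)]
    return {
        "count": int(valid.size),
        "missing_rate": 1.0 - valid.size / column.size,
        "mean": float(valid.mean()),
        "variance": float(valid.var(ddof=1)),  # sample variance
        "median": float(np.median(valid)),
    }

# Hypothetical values of one index (for example registered capital), one entry missing.
registered_capital = np.array([50.0, 80.0, np.nan, 120.0, 500.0, 80.0])
summary = describe(registered_capital)

# Histogram counts of the non-missing values over three equal-width bins.
valid_values = registered_capital[~np.isnan(registered_capital)]
counts, edges = np.histogram(valid_values, bins=3)
```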
The data cleaning proceeds as follows. First, invalid values in the indexes are processed and those indexes that can be quantified are numerically quantized. Missing-value statistics are computed for the modeling indexes, and training indexes with a missing rate greater than 80% are removed. The same-value rate of the remaining indexes is computed; features that take only one value are removed, as are indexes whose same-value rate is greater than 85%. VIF collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filtering, and correlated features are removed, leaving 20 modeling indexes. Missing values in these 20 modeling indexes are filled with 0 by default, and the cleaned and filled training samples undergo Z-Score standardization to form standardized training vectors.
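The missing-value filter, same-value filter, zero filling and Z-Score standardization can be sketched with NumPy as below. The 80% and 85% thresholds come from the text; everything else is an assumption made for illustration, including the function names, the use of NaN to mark missing values, and the choice to compute the same-value rate over non-missing samples only. The VIF step is left out of this sketch.

```python
import numpy as np

MISSING_THRESHOLD = 0.80     # drop indexes with a missing rate above 80%
SAME_VALUE_THRESHOLD = 0.85  # drop indexes whose most common value exceeds 85%

def same_value_rate(column: np.ndarray) -> float:
    """Share of non-missing samples taken by the most frequent value (assumed definition)."""
    valid = column[~np.isnan(column)]
    if valid.size == 0:
        return 1.0
    _, counts = np.unique(valid, return_counts=True)
    return counts.max() / valid.size

def clean_and_standardize(X: np.ndarray):
    """Return the standardized training matrix and the indices of the kept columns."""
    missing_rate = np.isnan(X).mean(axis=0)
    keep = [
        j for j in range(X.shape[1])
        if missing_rate[j] <= MISSING_THRESHOLD
        and same_value_rate(X[:, j]) <= SAME_VALUE_THRESHOLD
    ]
    kept = X[:, keep]
    filled = np.where(np.isnan(kept), 0.0, kept)  # missing values default to 0
    mean = filled.mean(axis=0)
    std = filled.std(axis=0)
    std[std == 0] = 1.0                           # guard against division by zero
    return (filled - mean) / std, keep

X = np.column_stack([
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, np.nan],         # kept
    [np.nan] * 9 + [1.0],                                          # 90% missing
    [1.0] * 10,                                                    # single value
    [10.0, 12.0, 9.0, 11.0, 10.5, 13.0, 8.0, 12.5, 9.5, 11.5],     # kept
])
Z, kept_columns = clean_and_standardize(X)
```

Note that filling with 0 before standardization treats a missing value as a genuine zero, which is what the description specifies.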
S5, establishing a fusion cluster analysis model
Cluster modeling is performed on the enterprise comprehensive evaluation indexes with the kmeans cluster analysis method, the K value is determined by the Calinski-Harabasz measurement method, the clustering effect is evaluated by stability analysis and clustering effect analysis, and a comprehensive cluster analysis model is established; based on the dimension indexes of the five enterprise dimensions, enterprise cluster analysis is carried out for each dimension with the kmeans clustering method to form a cluster analysis model for each dimension.
Several methods exist for determining the K value in kmeans clustering; here the Calinski-Harabasz measurement method is adopted, under which a larger CH index value indicates a better clustering effect. kmeans cluster analysis is performed for each value in the K interval 1-10 and a cluster analysis result graph is drawn; the CH index values under the different K values are calculated in turn; and the K value with the best clustering effect is selected by combining the visual result graph of the cluster analysis with the CH values.
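The K selection can be sketched with scikit-learn, whose `KMeans` estimator exposes the `random_state` parameter named in this description. The sketch assumes scikit-learn is the library in use, which the source does not state, and runs on synthetic data. The scan starts at K = 2 because the CH index is not defined for a single cluster; the helper name `choose_k` and the fixed seed are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def choose_k(X, k_values=range(2, 11), seed=0):
    """Fit k-means for each candidate K and return the K with the largest CH index."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = calinski_harabasz_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic standardized vectors: four well-separated groups of 50 samples each.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = choose_k(X)
```

The description also calls for inspecting a visual result graph alongside the CH values, so in practice the numerical maximum would be one input to the choice of K, not the only one.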
After a kmeans clustering algorithm is selected as an optimal clustering effect, a clustering parameter random_state in a modeling process is not set, an optimal K value is determined according to a CH value, kmeans clustering is continuously carried out for 5 times, and whether the distribution condition of samples in each cluster is greatly fluctuated or not after each clustering is observed. Through observing the clustering effect of 5 times of circulation, the distribution value sequence of each cluster is random, but the duty ratio of samples in each cluster is relatively fixed, which indicates that the cluster analysis is suitable for the current data set by selecting kmeans clustering algorithm based on the existing characteristics, training samples and determining K value.
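The stability check can be illustrated with a small, unseeded k-means run several times, comparing the sorted share of samples per cluster between runs. Sorting the shares removes the dependence on cluster order, which the text notes is random. The helper names `kmeans`, `cluster_shares`, and `is_stable`, and the 0.05 tolerance, are assumptions made for this sketch; the source does not state a numeric fluctuation threshold.

```python
import random

def kmeans(points, k, iters=50, rng=None):
    """Plain Lloyd's algorithm; unseeded by default, like leaving random_state unset."""
    rng = rng or random.Random()
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def cluster_shares(labels, k):
    """Sorted fraction of samples per cluster, independent of cluster numbering."""
    n = len(labels)
    return sorted(labels.count(j) / n for j in range(k))

def is_stable(points, k, runs=5, tolerance=0.05):
    """True if cluster shares differ by at most `tolerance` across repeated runs."""
    shares = [cluster_shares(kmeans(points, k), k) for _ in range(runs)]
    return all(
        abs(a - b) <= tolerance
        for other in shares[1:]
        for a, b in zip(shares[0], other)
    )
```

A result of `True` corresponds to the observation in the text that the proportion of samples in each cluster stays relatively fixed from run to run.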
Establishing a fusion clustering model:
All indexes of the five primary dimensions of the enterprise (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) undergo feature preprocessing and feature screening, in which indexes whose missing-value rate or same-value rate exceeds the threshold, together with other useless indexes, are removed, leaving 25 indexes in total. These 25 indexes serve as the enterprise comprehensive evaluation indexes. A kmeans cluster analysis model is established based on the enterprise comprehensive evaluation indexes, the optimal K value is determined by the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters.
The screened indexes of each of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) undergo feature preprocessing and feature quantification to generate dimension training vectors. A kmeans cluster analysis model is established separately on the enterprise evaluation indexes of each dimension, the optimal K value is determined by the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, yielding five dimension cluster analysis models in total. The enterprise comprehensive evaluation model is then fused with the five dimension cluster analysis models to form the enterprise portrait fusion cluster analysis model.
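One way to picture the fused model is as a container that holds the centroids of the comprehensive model together with the centroids of the five dimension models, and assigns an enterprise to the nearest centroid in each. The sketch below rests on that assumption; the class name `FusionClusterModel`, the short dimension keys, and nearest-centroid prediction are illustrative choices, not details given in the source.

```python
from dataclasses import dataclass

# Hypothetical short keys for the five dimensions named in the text.
DIMENSIONS = ("background", "stability", "operation", "development", "innovation")

def nearest(vector, centers):
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(vector, centers[j])),
    )

@dataclass
class FusionClusterModel:
    comprehensive_centers: list  # centroids fitted on the comprehensive indexes
    dimension_centers: dict      # dimension key -> centroids fitted on that dimension

    def predict(self, comprehensive_vector, dimension_vectors):
        """Return one cluster id for the comprehensive model and one per dimension."""
        result = {"comprehensive": nearest(comprehensive_vector, self.comprehensive_centers)}
        for name in DIMENSIONS:
            result[name] = nearest(dimension_vectors[name], self.dimension_centers[name])
        return result
```

Keeping the six models separate inside one object means each can be retrained or re-tuned for K independently while callers still make a single `predict` call per enterprise.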
Generating an enterprise portrait tag:
The enterprise portrait labels comprise enterprise own-data labels and clustering model analysis labels. The original enterprise data fields stored in the standard database undergo data preprocessing and data quantization to form a standardized label format, which serves as the own-data labels of the enterprise portrait to be generated. The indexes required by the enterprise comprehensive clustering model are acquired, data preprocessing and feature quantization are performed on them to generate training vectors, the comprehensive cluster analysis model is called to assign the enterprise to a classification cluster, and the enterprise comprehensive cluster analysis model labels are formed according to that cluster; the label information includes, for example, whether the enterprise credit condition is good, average, or high-quality, the comprehensive condition of the enterprise, and the proportion accounted for by that comprehensive condition. The enterprise dimension indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) are acquired, dimension training vectors are generated after data preprocessing and feature quantification, each dimension cluster analysis model is called to assign the enterprise to a dimension classification cluster, and the enterprise dimension cluster analysis model labels are formed according to those clusters; these labels mainly include, for example, an enterprise of a certain scale, an enterprise in its start-up stage, a relatively stable enterprise, and strong technological innovation capability.
An automatic enterprise portrait label generation module is established based on the own-data labels, the comprehensive clustering model labels, and the dimension clustering model labels. Enterprise information is input to obtain the enterprise's own data, comprehensive evaluation indexes, and dimension evaluation indexes; the fusion clustering model is called to generate each model label; and the enterprise portrait is generated automatically.
The above specific embodiments are merely specific examples of the present invention. The scope of protection of the present invention includes, but is not limited to, these embodiments; any suitable modification or replacement made by one of ordinary skill in the art that accords with the claims of the small and micro enterprise portrait construction method of the present invention shall fall within the scope of protection of the present invention.
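The final assembly step can be sketched as a lookup from cluster ids to label text, merged with the enterprise's own-data labels. Because k-means cluster numbering is arbitrary, the mapping from cluster id to label would have to be fixed after inspecting a fitted model; the function name `build_portrait`, the `label_maps` structure, and the example field names and values are all hypothetical.

```python
def build_portrait(own_data, cluster_ids, label_maps):
    """Combine own-data labels with model labels looked up from cluster ids.

    own_data:    dict of standardized own-data fields for the enterprise
    cluster_ids: dict of model name -> cluster id returned by that model
    label_maps:  dict of model name -> {cluster id -> label text}
    """
    portrait = {"own_labels": dict(own_data), "model_labels": {}}
    for model_name, cluster_id in cluster_ids.items():
        portrait["model_labels"][model_name] = label_maps[model_name][cluster_id]
    return portrait

# Illustrative call with made-up values:
example = build_portrait(
    own_data={"registered_capital": "500,000"},
    cluster_ids={"comprehensive": 1, "stability": 0},
    label_maps={
        "comprehensive": {0: "average credit condition", 1: "good credit condition"},
        "stability": {0: "relatively stable enterprise", 1: "less stable enterprise"},
    },
)
```

In this arrangement the clustering models only ever emit integers, and all human-readable wording lives in `label_maps`, so labels can be reworded without retraining.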
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (1)
1. A portrait construction method for small and micro enterprises, characterized by comprising the following steps:
S1, establishing a standard database through data convergence and fusion;
Multi-source data covering multiple government departments and third parties are fused and converged through big data ETL technology, and are stored in the standard database after noise removal, data alignment, and data redundancy removal;
S2, establishing an enterprise portrait tag system;
The enterprise portrait labels comprise enterprise own-data labels and enterprise model labels, wherein the own-data labels are derived from the enterprise's own data in the standard database, and the model labels are mainly generated by a cluster analysis method: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data, and the comprehensive cluster analysis model is called to generate a comprehensive model prediction label; indexes of the five dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and technological innovation capability are generated from the enterprise multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction labels;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
Extracting enterprise indexes from the enterprise standard data tables in the enterprise standard database, wherein the enterprise comprehensive evaluation indexes comprise the five primary dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability;
S4, forming the clustering model-entry indexes through feature engineering;
Based on the indexes formed in step S3, exploratory data analysis and data cleaning are performed on the enterprise multi-source data, finally forming the model-entry features required for training the fusion clustering model;
The exploratory data analysis computes simple descriptive statistics on the generated indexes and performs simple statistical analysis on the data, then segments the data of specific indexes, analyzes in depth the dynamic changes of the data and its values under specific conditions, and visually analyzes the model-entry sample indexes by plotting histogram curves of single variables and relationship curves between single variables and the target variable;
The data cleaning comprises: first processing invalid values in the indexes and converting the quantifiable indexes to numerical values; then computing missing-value statistics for the modeling indexes and removing training indexes whose missing-value rate exceeds 80%; computing the same-value rate of the remaining indexes, removing features whose attribute takes only one value, and removing indexes whose same-value rate exceeds 85%; performing VIF collinearity analysis on the evaluation indexes that pass the missing-value and same-value filtering, a plurality of model-entry indexes remaining after the correlated features are removed; and filling missing values in the plurality of model-entry indexes with 0 by default and performing Z-Score standardization on the cleaned and filled training samples to form standardized training vectors;
S5, establishing a fusion cluster analysis model;
Cluster modeling is performed on the enterprise comprehensive evaluation indexes using the kmeans cluster analysis method, the K value is determined by the Calinski-Harabasz measurement method, the clustering effect is evaluated by stability analysis and clustering-effect analysis, and a comprehensive cluster analysis model is established;
Based on the dimension indexes of the five enterprise dimensions, enterprise cluster analysis is performed for each dimension using the kmeans clustering method to form a cluster analysis model for each dimension;
When the K value is determined by the Calinski-Harabasz measurement method, a larger CHI score indicates a better clustering effect; kmeans cluster analysis is performed for K values in the interval 1 to 10, a cluster analysis result graph is drawn, the CH index is calculated in turn for each K value, and the K value with the best clustering effect is selected by combining the visualized cluster analysis results with the CH values;
After the kmeans clustering algorithm is selected as giving the best clustering effect, the clustering parameter random_state is left unset in the modeling process, the optimal K value is determined from the CH value, kmeans clustering is run 3 to 10 times in succession, and after each run the distribution of samples across the clusters is checked for large fluctuations; across the 3 to 10 runs, the order in which the clusters appear is random, but the proportion of samples in each cluster is relatively fixed, indicating that, given the existing features, the training samples, and the determined K value, the kmeans clustering algorithm is suitable for cluster analysis of the current data set;
All indexes of the five primary dimensions of the enterprise undergo feature preprocessing and feature screening, in which indexes whose missing-value rate or same-value rate exceeds the threshold, together with other useless indexes, are removed, leaving a plurality of model-entry indexes in total; these model-entry indexes serve as the enterprise comprehensive evaluation indexes, a kmeans cluster analysis model is established based on the enterprise comprehensive evaluation indexes, the optimal K value is determined by the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters;
The screened indexes of each of the five dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability undergo feature preprocessing and feature quantification to generate dimension training vectors; a kmeans cluster analysis model is established separately on the enterprise evaluation indexes of each dimension, the optimal K value is determined by the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, yielding five dimension cluster analysis models in total; the enterprise comprehensive evaluation model is fused with the five dimension cluster analysis models to form the enterprise portrait fusion cluster analysis model;
The enterprise dimension indexes of the five dimensions are acquired, dimension training vectors are generated after data preprocessing and feature quantization, each dimension cluster analysis model is called to assign the enterprise to a dimension classification cluster, and the enterprise dimension cluster analysis model labels are formed according to those clusters; an automatic enterprise portrait label generation module is established based on the own-data labels, the comprehensive clustering model labels, and the dimension clustering model labels; enterprise information is input to obtain the enterprise's own data, comprehensive evaluation indexes, and dimension evaluation indexes; the fusion clustering model is called to generate each model label; and the enterprise portrait is generated automatically.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110979314.0A CN113837859B (en) | 2021-08-25 | 2021-08-25 | Image construction method for small and micro enterprises |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113837859A (en) | 2021-12-24 |
CN113837859B (en) | 2024-05-14 |
Family
ID=78961216
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110979314.0A Active CN113837859B (en) | 2021-08-25 | 2021-08-25 | Image construction method for small and micro enterprises |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113837859B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113988726A (en) * | 2021-12-28 | 2022-01-28 | 江苏荣泽信息科技股份有限公司 | Enterprise industry credit evaluation management system based on block chain |
CN114462516B (en) * | 2022-01-21 | 2024-04-16 | 天元大数据信用管理有限公司 | Enterprise credit scoring sample labeling method and device |
CN116304974B (en) * | 2023-02-17 | 2023-09-29 | 国网浙江省电力有限公司营销服务中心 | Multi-channel data fusion method and system |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103294828A (en) * | 2013-06-25 | 2013-09-11 | 厦门市美亚柏科信息股份有限公司 | Verification method and verification device of data mining model dimension |
CN107563929A (en) * | 2017-07-27 | 2018-01-09 | 杭州中奥科技有限公司 | A kind of various dimensions siren based on personage's specificity analysis |
CN107993143A (en) * | 2017-11-23 | 2018-05-04 | 深圳大管加软件与技术服务有限公司 | A kind of Credit Risk Assessment method and system |
CN110322089A (en) * | 2018-03-30 | 2019-10-11 | 宗略投资(上海)有限公司 | Enterprise Credit Risk Evaluation method and its system |
CN110990474A (en) * | 2019-11-28 | 2020-04-10 | 泰华智慧产业集团股份有限公司 | Regional industry image analysis method and device |
CN111047122A (en) * | 2018-10-11 | 2020-04-21 | 北京国双科技有限公司 | Enterprise data maturity evaluation method and device and computer equipment |
CN111680073A (en) * | 2020-06-11 | 2020-09-18 | 天元大数据信用管理有限公司 | Financial service platform policy information recommendation method based on user data |
CN111737477A (en) * | 2020-08-07 | 2020-10-02 | 杭州六棱镜知识产权科技有限公司 | Intellectual property big data-based intelligence investigation method, system and storage medium |
CN111754116A (en) * | 2020-06-24 | 2020-10-09 | 国家电网有限公司大数据中心 | Credit assessment method and device based on label portrait technology |
CN111861262A (en) * | 2020-07-30 | 2020-10-30 | 国网山东省电力公司寿光市供电公司 | Enterprise perspective portrait method and terminal based on energy big data |
CN112396430A (en) * | 2020-11-09 | 2021-02-23 | 中国南方电网有限责任公司 | Processing method and system for enterprise evaluation |
CN112395500A (en) * | 2020-11-17 | 2021-02-23 | 平安科技(深圳)有限公司 | Content data recommendation method and device, computer equipment and storage medium |
CN112435152A (en) * | 2020-12-04 | 2021-03-02 | 北京师范大学 | Online learning investment dynamic evaluation method and system |
CN112668945A (en) * | 2021-01-27 | 2021-04-16 | 天元大数据信用管理有限公司 | Enterprise credit risk assessment method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100205108A1 (en) * | 2009-02-11 | 2010-08-12 | Mun Johnathan C | Credit and market risk evaluation method |
Non-Patent Citations (3)
Title |
---|
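Enterprise modelling techniques to help manufacturing firms develop product service activities; T. Alix et al.; IFAC Proceedings Volumes; Vol. 42, No. 4; 1637-1642 * |
Grey clustering evaluation model of enterprise informatization and its application (企业信息化的灰聚类评价模型及应用); Lin Wei (林伟) et al.; Science & Technology Progress and Policy (科技进步与对策); Vol. 20, No. 06; 129-130 * |
Construction of enterprise competitor portraits fusing multi-source data (融合多源数据的企业竞争对手画像构建); Huang Xiaobin (黄晓斌) et al.; Journal of Modern Information (现代情报); Vol. 40, No. 11; 13-21, 33 * |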
Also Published As
Publication number | Publication date |
---|---|
CN113837859A (en) | 2021-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||