CN113837859A - Small and micro enterprise portrait construction method - Google Patents
- Publication number
- CN113837859A (application CN202110979314.0A)
- Authority
- CN
- China
- Prior art keywords
- enterprise
- clustering
- data
- indexes
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Abstract
The invention relates to the field of financial credit, and in particular provides a method for constructing a small and micro enterprise portrait, comprising the following steps: S1, establishing a standard database through data aggregation and fusion; S2, establishing an enterprise portrait label system; S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system; S4, forming the model-entry indexes of the clustering model through feature engineering; and S5, establishing a fusion clustering analysis model. Compared with the prior art, the method is based on the fusion of enterprise multi-source data: operations such as data merging, data alignment and data fusion are carried out on the multi-source data, and an enterprise portrait label system and an enterprise comprehensive evaluation and dimension evaluation index system are established on that basis, so that the enterprise portrait dimensions are richer and the evaluation indexes more comprehensive, overcoming the defect that a single data source covers the portrait evaluation dimensions only one-sidedly.
Description
Technical Field
The invention relates to the field of financial credit, and particularly provides a method for constructing a small and micro enterprise portrait.
Background
With the application of technologies such as big data, machine learning and artificial intelligence, the service modes, service forms and management and operation modes of traditional financial institutions have changed fundamentally, and financial technology has developed rapidly; big data and artificial intelligence are among its most important applied technologies. To meet the short-term, small-amount, frequent and urgent financing needs of small and micro enterprises, establishing an intelligent risk control system that runs through the whole credit process (before, during and after the loan), based on the multi-source data covering small and micro enterprises, is one of the mainstream business modes.
Before the loan, a comprehensive interpretation of the enterprise helps the bank form a preliminary understanding of it; during the loan, enterprise-related risks are surfaced promptly, so that the bank keeps track of the risk points in the enterprise's operation and development, can reduce or adjust the interest on the loan products granted to the enterprise in time, and can control the risk promptly.
However, intelligent risk control systems in the prior art cannot accurately interpret the behavior characteristics of enterprises that apply for loans maliciously, and their evaluation indexes are few and incomplete.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a highly practical method for constructing a small and micro enterprise portrait.
The technical scheme adopted by the invention to solve the technical problem is as follows:
A small and micro enterprise portrait construction method comprises the following steps:
S1, establishing a standard database through data aggregation and fusion;
S2, establishing an enterprise portrait label system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
S4, forming the model-entry indexes of the clustering model through feature engineering;
and S5, establishing a fusion clustering analysis model.
Further, in step S1, the multi-source data covering multiple departments of the government and the third party are fused and aggregated by the big data ETL technology, and the data are stored in the standard database after being subjected to noise removal, data alignment and data redundancy removal.
Further, in step S2, the enterprise portrait labels include enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database, and the model labels are mainly generated by a cluster analysis method: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data and the comprehensive cluster analysis model is called to generate a comprehensive model prediction label; dimension indexes for enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability are generated from the enterprise multi-source data and the dimension cluster analysis models are called to generate dimension model prediction labels.
Further, in step S3, enterprise indexes are extracted from the enterprise standard data table in the enterprise standard database, and the enterprise comprehensive evaluation indexes comprise five primary dimensions in total: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability.
Further, in step S4, the indexes formed in step S3 from the enterprise multi-source data are subjected to exploratory data analysis and data cleaning, finally forming the model-entry features required for training the fusion clustering model.
Furthermore, the exploratory data analysis carries out simple descriptive statistics on the generated indexes; after this simple statistical analysis, specific index data are segmented, and the dynamic change of the data and its values under specific conditions are analysed in depth; the indexes of the model-entry samples are analysed visually by plotting the histogram of each single variable and the relation curve between each single variable and the target variable.
The data cleaning first handles invalid values in the indexes and numerically quantizes the indexes that can be quantized; missing-value statistics are then computed for the model-entry indexes, and training indexes with a missing rate greater than 80% are removed; the same-value rate of the remaining indexes is computed, features whose attribute takes only one value are removed, and indexes whose same-value rate is greater than 85% are removed; VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filtering, and correlated features are removed to obtain the remaining model-entry indexes; missing values remaining in these model-entry indexes are filled with 0 by default, and the training samples after data cleaning and missing-value filling are subjected to Z-Score standardization to form standardized training vectors.
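For reference, the two standard quantities named in this cleaning step can be written as follows. The symbols are notation introduced here and do not appear in the patent text: $x_{ij}$ is the value of index $j$ for enterprise $i$, $\mu_j$ and $\sigma_j$ are the mean and standard deviation of index $j$ over the training samples, and $R_j^2$ is the coefficient of determination obtained by regressing index $j$ on all the other indexes. The text does not state which VIF cutoff is used to decide that a feature is collinear.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}},
\qquad
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}
```

A large $\mathrm{VIF}_j$ means that index $j$ is almost a linear combination of the others, which is why such indexes are candidates for removal before clustering.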
Further, in step S5, cluster modeling is performed on the enterprise comprehensive evaluation indexes using the k-means cluster analysis method, the K value is determined by the Calinski-Harabasz method, the clustering effect is evaluated by stability analysis and clustering effect analysis, and a comprehensive cluster analysis model is established;
based on the dimension indexes of the five dimensions of the enterprise, enterprise cluster analysis is carried out for each dimension separately using the k-means clustering method, forming a cluster analysis model for each dimension.
Further, when the K value is determined by the Calinski-Harabasz method, the larger the Calinski-Harabasz index (CHI) score, the better the clustering effect. K takes values in the interval 1-10 for k-means cluster analysis; a cluster analysis result graph is drawn, the CH index values under the different K values are calculated in turn, and the K value with the best clustering effect is selected by combining the visual result graph of the cluster analysis with the CH values.
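For reference, the Calinski-Harabasz index in its standard form is given below. The notation is introduced here and is not part of the patent text: $N$ is the number of samples, $K$ the number of clusters, $C_k$ the $k$-th cluster with $n_k$ members and centroid $c_k$, and $c$ the centroid of the whole data set.

```latex
\mathrm{CH}(K) = \frac{\operatorname{tr}(B_K)/(K-1)}{\operatorname{tr}(W_K)/(N-K)},
\qquad
\operatorname{tr}(B_K) = \sum_{k=1}^{K} n_k \,\lVert c_k - c \rVert^{2},
\qquad
\operatorname{tr}(W_K) = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^{2}
```

The index is the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom, so a larger value indicates compact, well-separated clusters. Because of the factor $K-1$ the index is not defined for $K=1$, so in practice the comparison starts at $K=2$.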
Further, after the k-means clustering algorithm is selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is determined according to the CH value, k-means clustering is executed 3-10 times in succession, and the distribution of samples across the clusters is observed after each run to see whether it fluctuates greatly. If, over the 3-10 runs, the order of the cluster sizes is random but the proportion of samples in each cluster remains relatively fixed, this indicates that k-means cluster analysis with the existing features, training samples and determined K value is suitable for the current data set.
Further, all indexes of the five primary dimensions of the enterprise (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability) are subjected to feature preprocessing and feature screening, in which indexes whose missing rate or same-value rate exceeds the threshold and other useless indexes are removed; the remaining model-entry indexes serve as the comprehensive evaluation indexes of the enterprise; a k-means cluster analysis model is established based on the comprehensive evaluation indexes, the optimal K value is determined through the CH measure, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters;
the screened indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability) are subjected to feature preprocessing and feature quantization to generate dimension training vectors; based on the enterprise evaluation indexes of each dimension, a k-means cluster analysis model is established for each dimension, the optimal K value is determined through the CH measure, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, generating dimension cluster analysis models for the five dimensions; the enterprise portrait fusion cluster analysis model is formed by fusing the enterprise comprehensive evaluation model with the five dimension cluster analysis models;
the enterprise dimension indexes of the five dimensions are obtained, and dimension training vectors are generated after data preprocessing and feature quantization; each dimension cluster analysis model is called to assign the enterprise to a cluster in that dimension, and the enterprise dimension cluster analysis model labels are formed from these clusters; an automatic enterprise portrait label generation module is established based on the own-data labels, the comprehensive clustering model label and the dimension clustering model labels: enterprise information is input to obtain the enterprise's own data, comprehensive evaluation indexes and dimension evaluation indexes, the fusion clustering model is called to generate each model label, and the enterprise portrait is generated automatically.
Compared with the prior art, the small and micro enterprise portrait construction method has the following outstanding beneficial effects:
1. Compared with enterprise portrait evaluation methods based on a single data source, the method is based on the fusion of enterprise multi-source data: operations such as data merging, data alignment and data fusion are carried out on the multi-source data, and an enterprise portrait label system and an enterprise comprehensive evaluation and dimension evaluation index system are established on that basis, so that the enterprise portrait dimensions are richer and the evaluation indexes more comprehensive, overcoming the defect that a single data source covers the portrait evaluation dimensions only one-sidedly.
2. Compared with enterprise portrait modeling methods based on supervised classification, the cluster analysis method can still analyse the distribution of enterprises in depth when enterprise labels are missing or inaccurate, and divides small and micro enterprises into groups; it extends the scope and scenarios in which enterprise portraits can be built from high-dimensional features and massive training samples, and therefore has a wider range of application.
3. The plain k-means cluster analysis method is improved: enterprise portrait labels are constructed with a fusion clustering method, which supports enterprise credit services, extends the application scenarios of financial technology in the credit field, and enriches the content of financial technology.
4. With the aggregation of massive enterprise data, the introduction of artificial intelligence risk control modeling methods, the continuous enrichment of enterprise portrait construction indexes, the growth of labeled training sample scenarios and the fusion of multiple algorithms, the method provided by the invention is well suited to risk control modeling on massive enterprise data, and especially to building risk control models when no labels are available, and has a very broad application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for constructing a small and micro enterprise portrait;
FIG. 2 is a schematic diagram of the enterprise portrait label system and the comprehensive evaluation and dimension evaluation index system established in the small and micro enterprise portrait construction method;
FIG. 3 is an example diagram of an application scenario of the small and micro enterprise portrait construction method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
As shown in FIGS. 1 to 3, in the small and micro enterprise portrait construction method of this embodiment, the enterprise portrait is constructed with a cluster analysis method from unsupervised learning. Multi-source data covering multiple government departments and third parties are fused and aggregated with big data ETL technology, and the data are stored in a standard database after noise removal, data alignment, redundancy removal and similar processing. The small and micro enterprise data in the standard database are screened and sorted, and an enterprise portrait label system is established, in which the enterprise portrait labels mainly comprise enterprise own-data labels and enterprise model labels. The enterprise data in the standard database undergo data cleaning, feature preprocessing and the like; one part is used directly as the enterprise's own-data labels, and the other part, after feature preprocessing and standardization, is used as unsupervised training samples for the subsequent cluster modeling. A comprehensive feature model and grouped (per-dimension) clustering models are established based on the k-means clustering method, and the final clustering models are formed after determining the K value with the Calinski-Harabasz method and analysing the clustering effect, including stability analysis. The fusion clustering model, which fuses the comprehensive clustering and the grouped clustering, predicts the clusters to which an enterprise belongs and so forms the enterprise clustering model labels. The small and micro enterprise portrait labels are established from the model labels and the own-data labels: enterprise information is input externally, the original data of the enterprise are read, processed and preprocessed, and the fusion clustering model is called to predict the enterprise's clusters, automatically generating the enterprise portrait labels.
The method comprises the following specific steps:
S1, establishing a standard database through data aggregation and fusion
The multi-source data of an enterprise cover three kinds of source. Government data include enterprise, housing provident fund, social security, Development and Reform Commission, banking and insurance regulator, administrative penalty and similar information; internet data include e-commerce data, marketing information, affirmation information, online store information, legal proceedings, dishonest judgment debtor information, bidding information and the like; third-party data include enterprise business information, personnel information, person-enterprise relationship data and the like. Firstly, a uniform data standard specification is established for the standardized management of the multi-source data loaded into the warehouse; secondly, the multi-source data are governed and processed with ETL and other data governance tools: data that can be stored, such as internet data, are pulled periodically, real-time interface data are processed in memory, and, combining batch and stream processing, the data are processed, standardized, computed into indexes and subjected to lightweight feature mining; finally, the three kinds of multi-source data are fused and aggregated into a unified data warehouse through horizontal and vertical data fusion, and the data warehouse stores the standard library data, index library, feature library and other information obtained after the multi-source data fusion.
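The horizontal fusion described above can be illustrated with a minimal pandas sketch. It assumes every source table carries a shared enterprise identifier; the column name `credit_code`, the sample columns and the "keep the last record" redundancy rule are all hypothetical choices made for illustration, not details given in the patent.

```python
import pandas as pd


def standardize(df: pd.DataFrame, key: str = "credit_code") -> pd.DataFrame:
    """Align the join key and drop redundant rows (illustrative rules only)."""
    out = df.copy()
    # Data alignment: normalise the identifier so that sources agree on it.
    out[key] = out[key].astype(str).str.strip().str.upper()
    # Redundancy removal: keep the last record seen for each enterprise.
    return out.drop_duplicates(subset=key, keep="last")


def fuse_sources(sources: list[pd.DataFrame], key: str = "credit_code") -> pd.DataFrame:
    """Horizontally fuse several per-enterprise tables into one wide table."""
    fused = standardize(sources[0], key)
    for src in sources[1:]:
        fused = fused.merge(standardize(src, key), on=key, how="outer")
    return fused.sort_values(key).reset_index(drop=True)


# Hypothetical sample tables for the three kinds of source.
government = pd.DataFrame(
    {"credit_code": [" a1", "B2", "b2"], "social_security_headcount": [12, 40, 42]}
)
internet = pd.DataFrame({"credit_code": ["A1", "C3"], "online_store_count": [2, 1]})
third_party = pd.DataFrame({"credit_code": ["b2 "], "registered_capital": [500.0]})

standard_table = fuse_sources([government, internet, third_party])
print(standard_table)
```

An outer join is used so that an enterprise present in only one source still appears in the fused table, with missing values left for the later cleaning step to handle.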
S2, establishing an enterprise portrait label system
All data sources covering the enterprise in the standard database are sorted out, and an enterprise portrait label system is established, in which the enterprise portrait labels comprise enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database and mainly comprise: basic information of the enterprise, such as its years since establishment, registered capital, enterprise type and number of employees; reward and honor information, such as Mayor's Quality Award enterprise, famous-brand product enterprise, contract-abiding and creditworthy enterprise, specialized, refined, distinctive and innovative small and medium-sized enterprise, gazelle enterprise, and scientific and technological innovation enterprise; tax identification information, such as A-level taxpayer or latest tax credit level A; and negative information, such as latest tax credit level C or D, whether the enterprise has been revoked or deregistered, abnormal operation, listing as a seriously illegal enterprise, and serious tax violations. The enterprise model labels are mainly generated by a cluster analysis method: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data and the comprehensive cluster analysis model is called to generate a comprehensive model prediction label; indexes for the 5 dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability are generated, and the dimension cluster analysis models are called to generate dimension model prediction labels.
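The two kinds of label can be pictured with a small data structure. The following is only a sketch: the field names (`tax_credit_level`, `is_gazelle`, `abnormal_operation`), the label strings and the cluster ids are hypothetical, and the rules cover just a few of the own-data labels listed above.

```python
from dataclasses import dataclass, field


@dataclass
class EnterprisePortrait:
    enterprise_id: str
    own_data_labels: list[str] = field(default_factory=list)   # read from the standard database
    model_labels: dict[str, int] = field(default_factory=dict)  # cluster id per clustering model


def own_data_labels(record: dict) -> list[str]:
    """Derive a few rule-based own-data labels from one enterprise record."""
    labels = []
    level = record.get("tax_credit_level")
    if level == "A":
        labels.append("A-level taxpayer")
    elif level in {"C", "D"}:
        labels.append("tax credit level C or D")
    if record.get("is_gazelle"):
        labels.append("gazelle enterprise")
    if record.get("abnormal_operation"):
        labels.append("abnormal operation")
    return labels


def build_portrait(record: dict, cluster_ids: dict[str, int]) -> EnterprisePortrait:
    """Combine own-data labels with cluster ids predicted by the clustering models."""
    return EnterprisePortrait(record["enterprise_id"], own_data_labels(record), dict(cluster_ids))


record = {"enterprise_id": "E001", "tax_credit_level": "A", "is_gazelle": True}
portrait = build_portrait(record, {"comprehensive": 2, "stability": 0})
print(portrait)
```

Keeping the model labels as a mapping from model name to cluster id lets the comprehensive model and the five dimension models contribute labels independently.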
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system
Enterprise indexes are extracted from the enterprise standard data table in the enterprise standard database. The enterprise comprehensive evaluation indexes comprise five primary dimensions in total: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability. The enterprise background mainly comprises more than 20 basic indexes, such as the establishment time, registered capital and number of employees of the enterprise; the enterprise stability covers more than 20 basic indexes, such as business registration changes, tax changes and changes of legal representative; the enterprise operation capability covers more than 200 indexes in six secondary dimensions, including management capability, repayment willingness, operation capability, profitability and enterprise qualification; the development potential of the enterprise comprises more than 50 indexes in two dimensions, development capability and innovation capability; and the technological innovation capability of the enterprise comprises more than 10 secondary dimensions, such as the number of patents, software copyrights and intellectual property. A total of 300 indexes across the five primary dimensions form the comprehensive evaluation indexes of the enterprise.
S4, forming the model-entry indexes of the clustering model through feature engineering
The 300 indexes extracted from the enterprise multi-source data go through several processes, such as exploratory data analysis and data cleaning, before they finally form the model-entry features required for training the fusion clustering model.
The exploratory data analysis mainly carries out simple descriptive statistics on the more than 300 generated indexes, analysing the variance, mean, median, data distribution and the like of each index; after this simple statistical analysis, specific index data are segmented, and the dynamic change of the data and its values under specific conditions are analysed in depth; the model-entry sample indexes are analysed visually by plotting the histogram of each single variable, the relation curve between each single variable and the target variable, and the like.
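The descriptive statistics, segmentation and histogram steps can be sketched with pandas and NumPy as below. The column names and sample values are hypothetical, and the histogram is returned as bin counts rather than plotted so that the example has no plotting dependency.

```python
import numpy as np
import pandas as pd


def describe_indexes(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, variance and missing rate of every index (one row per index)."""
    stats = df.agg(["mean", "median", "var"]).T
    stats["missing_rate"] = df.isna().mean()
    return stats


def histogram_counts(series: pd.Series, bins: int = 5):
    """Univariate histogram as (counts, bin_edges), ignoring missing values."""
    return np.histogram(series.dropna().to_numpy(), bins=bins)


def segment_mean(df: pd.DataFrame, by: str, target: str, bins: list) -> pd.Series:
    """Segment the data on one index and average another index inside each segment."""
    groups = pd.cut(df[by], bins=bins)
    return df.groupby(groups, observed=True)[target].mean()


# Hypothetical sample of two indexes for four enterprises.
indexes = pd.DataFrame(
    {
        "registered_capital": [100.0, 200.0, np.nan, 400.0],
        "employee_count": [5, 10, 15, 20],
    }
)

stats = describe_indexes(indexes)
counts, edges = histogram_counts(indexes["registered_capital"], bins=3)
by_size = segment_mean(indexes, by="employee_count", target="registered_capital", bins=[0, 10, 20])
print(stats)
print(by_size)
```

The per-index missing rate computed here is the same quantity that the subsequent cleaning step compares against its 80% threshold.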
The data cleaning first handles invalid values in the indexes and numerically quantizes the indexes that can be quantized; missing-value statistics are then computed for the model-entry indexes, and training indexes with a missing rate greater than 80% are removed; the same-value rate of the remaining indexes is computed, features whose attribute takes only one value are removed, and indexes whose same-value rate is greater than 85% are removed; VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filtering, and correlated features are removed to obtain the remaining 20 model-entry indexes; missing values remaining in the 20 model-entry indexes are filled with 0 by default, and the training samples after data cleaning and missing-value filling are subjected to Z-Score standardization to form standardized training vectors.
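The cleaning sequence can be sketched as follows. The 80% and 85% thresholds come from the text; everything else is an assumption made for illustration: the VIF cutoff of 10, computing VIF on a zero-filled copy, dropping the highest-VIF index one at a time, and defining the same-value rate as the share of rows taken by the most frequent value. VIF is computed directly with NumPy least squares rather than with a statistics library.

```python
import numpy as np
import pandas as pd


def same_value_rate(col: pd.Series) -> float:
    """Share of rows taken by the most frequent non-missing value."""
    counts = col.value_counts(dropna=True)
    return 0.0 if counts.empty else float(counts.iloc[0]) / len(col)


def vif(X: np.ndarray, j: int) -> float:
    """Variance inflation factor of column j: 1 / (1 - R^2) against the other columns."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ratio = float(resid @ resid) / float(((y - y.mean()) ** 2).sum())  # equals 1 - R^2
    return float("inf") if ratio < 1e-12 else 1.0 / ratio


def drop_collinear(df: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Repeatedly drop the index with the highest VIF until all are below the cutoff."""
    cols = list(df.columns)
    while len(cols) > 1:
        X = df[cols].fillna(0.0).to_numpy(dtype=float)
        vifs = [vif(X, j) for j in range(len(cols))]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        cols.pop(worst)
    return df[cols]


def clean_indexes(df, max_missing=0.80, max_same=0.85, vif_threshold=10.0):
    kept = df.loc[:, df.isna().mean() <= max_missing]       # missing-rate filter
    kept = kept.loc[:, kept.nunique(dropna=True) > 1]       # single-valued features
    same_ok = np.array([same_value_rate(kept[c]) <= max_same for c in kept.columns], dtype=bool)
    kept = kept.loc[:, same_ok]                             # same-value-rate filter
    kept = drop_collinear(kept, vif_threshold)              # VIF collinearity filter
    filled = kept.fillna(0.0)                               # default fill with 0
    return (filled - filled.mean()) / filled.std(ddof=0)    # Z-Score standardization


# Hypothetical indexes for 50 enterprises.
rng = np.random.default_rng(0)
a = rng.normal(size=50)
raw = pd.DataFrame(
    {
        "a": a,
        "b": rng.normal(size=50),
        "a_copy": 2 * a + 1,                                # perfectly collinear with "a"
        "mostly_missing": [1.0] * 5 + [np.nan] * 45,        # 90% missing
        "constant": 1.0,                                    # only one value
        "mostly_same": [0.0] * 45 + [1.0] * 5,              # 90% same value
    }
)
cleaned = clean_indexes(raw)
print(list(cleaned.columns))
```

In the sample data the three filters each remove one column, and the VIF step keeps only one member of the collinear pair, so two standardized indexes remain.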
S5, establishing a fusion clustering analysis model
Cluster modeling is performed on the enterprise comprehensive evaluation indexes using the k-means cluster analysis method, the K value is determined by the Calinski-Harabasz method, the clustering effect is evaluated by stability analysis and clustering effect analysis, and a comprehensive cluster analysis model is established; based on the dimension indexes of the five dimensions of the enterprise, enterprise cluster analysis is carried out for each dimension separately using the k-means clustering method, forming a cluster analysis model for each dimension.
The K value of the k-means clustering method is determined with the Calinski-Harabasz (CH) metric: the larger the CH score, the better the clustering effect. k-means clustering is run with K taking values in the interval 1-10, a clustering result graph is drawn for each K, and the CH metric is calculated for each K in turn. The K value with the best clustering effect is selected by combining the visual result graphs with the CH values obtained for the different K values.
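The K selection described above can be sketched as follows. This is an illustrative, standard-library-only sketch rather than the patented implementation: `kmeans`, `calinski_harabasz`, and `best_k` are hypothetical helper names, the k-means is plain Lloyd iteration with random initial centers, and the scan starts at K = 2 because the CH score divides by K - 1 and is therefore undefined for a single cluster. The sketch also assumes more samples than clusters and at least two distinct points per data set.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(c) / len(points) for c in zip(*points)]

def kmeans(points, k, rng, iters=100):
    """Plain Lloyd iteration; returns one cluster id per point."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = centroid(members)
    return labels

def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)]; larger means better separated clusters."""
    n = len(points)
    ids = sorted(set(labels))
    k = len(ids)
    overall = centroid(points)
    between = within = 0.0
    for j in ids:
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)   # between-cluster dispersion
        within += sum(sq_dist(p, c) for p in members)   # within-cluster dispersion
    return (between / (k - 1)) / (within / (n - k))

def best_k(points, k_values=range(2, 11), seed=0):
    """Score each candidate K and return (K with the highest CH score, all scores)."""
    rng = random.Random(seed)
    scores = {k: calinski_harabasz(points, kmeans(points, k, rng)) for k in k_values}
    return max(scores, key=scores.get), scores
```

The fixed `seed` is used here only to make the scan repeatable; as the text notes, the CH values are meant to be read together with the clustering result graphs rather than on their own.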
After the k-means clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is determined from the CH value, and k-means clustering is executed 5 times in succession to observe whether the distribution of samples across the clusters fluctuates greatly from run to run. Across the 5 runs, the order in which the clusters are numbered is random, but the proportion of samples in each cluster remains relatively fixed. This indicates that, given the existing features, training samples, and chosen K value, k-means clustering is suitable for the current data set.
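The stability check can be sketched as a comparison of sorted cluster shares across repeated runs. The sketch is an assumption-laden illustration: `cluster_shares` and `stability_check` are hypothetical names, `fit` stands for any clustering routine that returns one cluster id per sample and is not seeded, and `tolerance` is an assumed threshold, since the text gives no numeric definition of a large fluctuation.

```python
from collections import Counter

def cluster_shares(labels):
    """Sorted sample shares per cluster, so arbitrary cluster numbering does not matter."""
    n = len(labels)
    return sorted(count / n for count in Counter(labels).values())

def stability_check(points, k, fit, runs=5, tolerance=0.05):
    """Run fit(points, k) several times and compare the cluster shares of each run.

    Returns (is_stable, shares_per_run). The first run is used as the reference.
    """
    shares_per_run = [cluster_shares(fit(points, k)) for _ in range(runs)]
    reference = shares_per_run[0]
    is_stable = all(
        len(shares) == len(reference)
        and max(abs(a - b) for a, b in zip(shares, reference)) <= tolerance
        for shares in shares_per_run[1:]
    )
    return is_stable, shares_per_run
```

Sorting the shares removes the effect of the random cluster numbering noted in the text, so only changes in the proportions themselves are flagged.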
Establishing a fusion clustering model:
All indexes of the five primary dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity) undergo feature preprocessing and feature screening, in which indexes whose missing-value or same-value rate exceeds the threshold are removed, leaving 25 model-input indexes. These 25 indexes serve as the enterprise comprehensive evaluation indexes. A k-means cluster analysis model is built on them, the optimal K value is determined with the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters.
The screened indexes of each of the five dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity) are turned into dimension training vectors through feature preprocessing and feature quantization. A k-means cluster analysis model is built separately on the evaluation indexes of each dimension, the optimal K value is determined with the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, producing five dimension cluster analysis models in total. The enterprise comprehensive evaluation model is then fused with the five dimension cluster analysis models to form the enterprise portrait fusion clustering analysis model.
Generating an enterprise portrait label:
The enterprise portrait labels comprise enterprise self-owned data labels and clustering model analysis labels. The original enterprise data fields stored in the standard database undergo data preprocessing and data quantization to form a standardized label format, which produces the self-owned labels of the enterprise portrait. For the comprehensive labels, the indexes corresponding to the enterprise comprehensive clustering model are obtained, preprocessed, and quantized into a training vector; the comprehensive cluster analysis model is called to assign the enterprise to a cluster; and the enterprise comprehensive clustering model labels are formed from that cluster. These include label information such as whether the enterprise's credit condition is good, general, or high-quality, the enterprise's comprehensive condition, and the comprehensive condition ratio. For the dimension labels, the enterprise dimension indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity) are obtained, preprocessed, and quantized into dimension training vectors; each dimension cluster analysis model is called to assign the enterprise to a dimension cluster; and the enterprise dimension clustering model labels are formed from those clusters. These mainly include labels such as: the enterprise has a certain scale, the enterprise is in its initial development stage, the enterprise is stable, and the enterprise has strong technological innovation capacity.
An automatic enterprise portrait label generation module is built on the self-owned labels, the comprehensive clustering model labels, and the dimension clustering model labels. Given input enterprise information, the module obtains the enterprise's self-owned data, comprehensive evaluation indexes, and dimension evaluation indexes, calls the fusion clustering model to generate each model label, and automatically generates the enterprise portrait.
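The label generation step can be sketched as a lookup against already trained models. Everything structural here is an assumption made for illustration: each model is represented only by its cluster centroids plus one label text per cluster, an enterprise is assigned to the nearest centroid, and the mapping from cluster to label text is supplied by hand, since the text does not say how a cluster is given its label. The names `nearest_cluster` and `generate_portrait` are hypothetical.

```python
def nearest_cluster(vector, centroids):
    """Index of the centroid closest to the vector (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(vector, centroids[j])),
    )

def generate_portrait(own_labels, vectors, models):
    """Combine self-owned labels with one cluster label per model.

    own_labels: label name -> value taken from the standard database
    vectors:    model name -> standardized training vector of the enterprise
    models:     model name -> {"centroids": [...], "labels": [...]}, where
                labels[i] is the label text assumed for cluster i
    """
    portrait = {"own": dict(own_labels), "model": {}}
    for name, model in models.items():
        cluster = nearest_cluster(vectors[name], model["centroids"])
        portrait["model"][name] = model["labels"][cluster]
    return portrait
```

In this layout the fusion clustering model is simply the dictionary holding the comprehensive model and the five dimension models, so one call produces the comprehensive label and all dimension labels together.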
The above embodiments are only specific cases of the present invention. The scope of protection of the present invention includes but is not limited to these embodiments; any suitable change or substitution that is made by a person of ordinary skill in the art and is consistent with the claims of the small and micro enterprise portrait construction method of the present invention shall fall within the scope of protection of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A small and micro enterprise portrait construction method, characterized by comprising the following steps:
S1, establishing a standard database through data aggregation and fusion;
S2, establishing an enterprise portrait label system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
S4, forming the model-input indexes of the clustering model through feature engineering;
S5, establishing a fusion clustering analysis model.
2. The small and micro enterprise portrait construction method according to claim 1, wherein in step S1, multi-source data covering multiple government departments and third parties are aggregated and fused by big data ETL technology, and the data are stored in the standard database after noise removal, data alignment, and data redundancy removal.
3. The small and micro enterprise portrait construction method according to claim 2, wherein in step S2, the enterprise portrait labels include enterprise self-owned data labels and enterprise model labels; the enterprise self-owned labels are derived from the self-owned data in the standard database, and the enterprise model labels are generated mainly by cluster analysis: enterprise comprehensive evaluation indexes are generated from the enterprise multi-source data, and the comprehensive cluster analysis model is called to generate comprehensive model prediction labels; and indexes of five dimensions in total, namely enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and technological innovation capacity, are generated from the enterprise multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction labels.
4. The small and micro enterprise portrait construction method according to claim 3, wherein in step S3, enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database, and the enterprise comprehensive evaluation indexes cover five primary dimensions in total, namely enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity.
5. The small and micro enterprise portrait construction method according to claim 4, wherein in step S4, exploratory data analysis and data cleaning are performed on the enterprise multi-source data based on the indexes formed in step S3, finally forming the model-input features required for training the fusion clustering model.
6. The small and micro enterprise portrait construction method according to claim 5, wherein the exploratory data analysis computes simple descriptive statistics for the generated indexes, segments the data of specific indexes, analyzes in depth how the data change over time and what values they take under specific conditions, and visually analyzes the model-input sample indexes by plotting a histogram of each single variable and the relationship curve between each single variable and the target variable;
the data cleaning first handles invalid values in the indexes and converts the quantifiable indexes to numeric form, then computes missing-value statistics for the model-input indexes and removes training indexes with a missing-value rate above 80%, computes the same-value rate of the remaining indexes, removes features whose attribute takes only one value, and removes indexes whose same-value rate exceeds 85%; VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filters, and correlated features are removed, leaving a plurality of model-input indexes; missing values in these model-input indexes are filled with 0, and the cleaned and filled training samples are Z-Score standardized to form standardized training vectors.
7. The small and micro enterprise portrait construction method according to claim 6, wherein in step S5, a k-means cluster analysis method is used to build a clustering model on the enterprise comprehensive evaluation indexes, the K value is determined with the Calinski-Harabasz metric, the clustering effect is evaluated through stability analysis and clustering-effect analysis, and a comprehensive cluster analysis model is established;
based on the indexes of each of the enterprise's five dimensions, k-means clustering is performed separately for each dimension to form a cluster analysis model per dimension.
8. The small and micro enterprise portrait construction method according to claim 7, wherein when the K value is determined with the Calinski-Harabasz (CH) metric, the larger the CH score, the better the clustering effect; k-means clustering is run with K taking values in the interval 1-10, a clustering result graph is drawn for each K, the CH metric is calculated for each K in turn, and the K value with the best clustering effect is selected by combining the visual result graphs with the CH values obtained for the different K values.
9. The small and micro enterprise portrait construction method according to claim 8, wherein after the k-means clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is determined from the CH value, and k-means clustering is executed 3-10 times in succession to observe whether the distribution of samples across the clusters fluctuates greatly from run to run; across the 3-10 runs, the order in which the clusters are numbered is random, but the proportion of samples in each cluster remains relatively fixed, which indicates that, given the existing features, training samples, and chosen K value, k-means clustering is suitable for the current data set.
10. The small and micro enterprise portrait construction method according to claim 8, wherein all indexes of the five primary dimensions in total, namely enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity, undergo feature preprocessing and feature screening, in which indexes whose missing-value or same-value rate exceeds a threshold and useless indexes are removed, leaving a plurality of model-input indexes; the plurality of model-input indexes serve as the enterprise comprehensive evaluation indexes, a k-means cluster analysis model is built on the enterprise comprehensive evaluation indexes, the optimal K value is determined with the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters;
the screened indexes of each of the five dimensions, namely enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity, and enterprise technological innovation capacity, are turned into dimension training vectors through feature preprocessing and feature quantization; a k-means cluster analysis model is built separately on the evaluation indexes of each dimension, the optimal K value is determined with the CH metric, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, producing five dimension cluster analysis models in total; and the enterprise comprehensive evaluation model is fused with the five dimension cluster analysis models to form the enterprise portrait fusion clustering analysis model;
the enterprise dimension indexes of the five dimensions are obtained, preprocessed, and quantized into dimension training vectors; each dimension cluster analysis model is called to assign the enterprise to a dimension cluster, and the enterprise dimension clustering model labels are formed from those clusters; an automatic enterprise portrait label generation module is built on the self-owned labels, the comprehensive clustering model labels, and the dimension clustering model labels; and given input enterprise information, the enterprise's self-owned data, comprehensive evaluation indexes, and dimension evaluation indexes are obtained, the fusion clustering model is called to generate each model label, and the enterprise portrait is automatically generated.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110979314.0A CN113837859B (en) | 2021-08-25 | 2021-08-25 | Image construction method for small and micro enterprises |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113837859A (en) | 2021-12-24 |
CN113837859B (en) | 2024-05-14 |
Family
ID=78961216
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110979314.0A Active CN113837859B (en) | 2021-08-25 | 2021-08-25 | Image construction method for small and micro enterprises |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113837859B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||