CN113837859A - Small and micro enterprise portrait construction method - Google Patents

Info

Publication number
CN113837859A
Authority
CN
China
Prior art keywords
enterprise
clustering
data
indexes
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110979314.0A
Other languages
Chinese (zh)
Other versions
CN113837859B (en)
Inventor
尹盼盼
边松华
崔乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyuan Big Data Credit Management Co Ltd
Original Assignee
Tianyuan Big Data Credit Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyuan Big Data Credit Management Co Ltd filed Critical Tianyuan Big Data Credit Management Co Ltd
Priority to CN202110979314.0A priority Critical patent/CN113837859B/en
Publication of CN113837859A publication Critical patent/CN113837859A/en
Application granted granted Critical
Publication of CN113837859B publication Critical patent/CN113837859B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0639Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of financial credit and provides a method for constructing a portrait (profile) of small and micro enterprises, comprising the following steps: S1, establishing a standard database through data aggregation and fusion; S2, establishing an enterprise portrait label system; S3, establishing an enterprise comprehensive evaluation and dimension evaluation indicator system; S4, deriving the input indicators of the clustering model through feature engineering; and S5, establishing a fused clustering analysis model. Compared with the prior art, the method builds on the fusion of multi-source enterprise data: the multi-source data undergo merging, alignment, fusion and similar operations, and the enterprise portrait label system and the comprehensive and dimension evaluation indicator systems are established on that basis. The enterprise portrait therefore has richer dimensions and more comprehensive evaluation indicators, which overcomes the one-sided coverage of portrait evaluation dimensions offered by a single data source.

Description

Small and micro enterprise portrait construction method
Technical Field
The invention relates to the field of financial credit, and particularly provides a method for constructing a small and micro enterprise portrait.
Background
With the application of technologies such as big data, machine learning and artificial intelligence, the service models, service forms and management and operation models of traditional financial institutions have changed fundamentally, and financial technology has developed rapidly; big data and artificial intelligence are among its most important enabling technologies. To meet the financing needs of small and micro enterprises, which are short-term, small in amount, frequent and urgent, one mainstream business model is to build an intelligent risk control system on the multi-source data available for these enterprises, spanning the whole credit process before, during and after the loan.
Before the loan, a comprehensive interpretation of the enterprise helps the bank form a preliminary understanding of it. During the loan, timely reporting of enterprise-related risks lets the bank keep track of risk points in the enterprise's operation and development, so that it can promptly lower or adjust the interest rate on the loan products granted to the enterprise and keep risk under control.
However, intelligent risk control systems in the prior art cannot accurately interpret the behavioral characteristics of enterprises that apply for loans maliciously, and their evaluation features cover too few indicators and are incomplete.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a highly practical method for constructing a small and micro enterprise portrait.
The technical solution adopted by the invention to solve the technical problem is as follows:
A small and micro enterprise portrait construction method comprises the following steps:
S1, establishing a standard database through data aggregation and fusion;
S2, establishing an enterprise portrait label system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation indicator system;
S4, deriving the input indicators of the clustering model through feature engineering;
and S5, establishing a fused clustering analysis model.
Further, in step S1, multi-source data covering multiple government departments and third parties are fused and aggregated using big-data ETL technology, and are stored in the standard database after noise removal, data alignment and redundancy removal.
Further, in step S2, the enterprise portrait labels comprise enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database. The model labels are mainly generated by cluster analysis: enterprise comprehensive evaluation indicators are generated from the multi-source data and the comprehensive clustering analysis model is called to generate a comprehensive model prediction label; indicators for the enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability dimensions are generated from the multi-source data and the dimension clustering analysis models are called to generate dimension model prediction labels.
Further, in step S3, enterprise indicators are extracted from the enterprise standard data tables in the enterprise standard database, and the enterprise comprehensive evaluation indicators cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability.
Further, in step S4, the indicators formed in step S3 from the enterprise multi-source data undergo exploratory data analysis and data cleaning, finally yielding the input features required for training the fused clustering model.
Further, the exploratory data analysis computes simple descriptive statistics for the generated indicators; after this simple statistical analysis, the data of specific indicators are segmented, and the dynamic changes of the data and the values taken under specific conditions are analyzed in depth. The indicators of the model-input samples are also analyzed visually by plotting the histogram of each single variable and the relationship curve between each single variable and the target variable.
The data cleaning first handles invalid values in the indicators and converts the quantifiable indicators into numerical form. Missing-value statistics are then computed for the candidate indicators, and indicators with a missing rate above 80% are removed. The same-value rate of the remaining indicators is computed; features that take only one value are removed, as are indicators whose same-value rate exceeds 85%. VIF (variance inflation factor) collinearity analysis is performed on the indicators that pass the missing-rate and same-value filters, and correlated features are removed, leaving a number of model-input indicators. Missing values in these indicators are filled with 0 by default, and the cleaned and filled training samples are Z-score standardized to form standardized training vectors.
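For reference, the two quantities named in the cleaning step have the following standard definitions. The notation is an assumption introduced here and does not appear in the source: x_{ij} is the value of indicator j for enterprise i, \mu_j and \sigma_j are the mean and standard deviation of indicator j over the training samples, and R_j^2 is the coefficient of determination obtained by regressing indicator j on all other indicators.

```latex
z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A large VIF_j means indicator j is nearly a linear combination of the others. The source does not state the VIF cut-off it uses; values of 5 or 10 are common rules of thumb.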
Further, in step S5, the enterprise comprehensive evaluation indicators are cluster-modeled with the kmeans clustering analysis method; the K value is determined with the Calinski-Harabasz measure, the clustering effect is evaluated with stability analysis and clustering-effect analysis, and a comprehensive clustering analysis model is established;
based on the dimension indicators of the five enterprise dimensions, cluster analysis of each dimension is carried out separately with the kmeans clustering method, forming a clustering analysis model for each dimension.
Further, when the K value is determined with the Calinski-Harabasz measure, a larger CH index (CHI) score indicates a better clustering. kmeans clustering is run with K taking values in the interval 1-10, a plot of each clustering result is drawn, and the CH index is computed for each K in turn; the K value with the best clustering effect is selected by combining the visualized clustering results with the CH values.
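The Calinski-Harabasz index referred to here has the following standard definition. The notation is an assumption introduced for illustration: n is the number of samples, C_q is cluster q with n_q members and centroid c_q, and c is the centroid of the whole data set.

```latex
\mathrm{CH}(K) = \frac{\operatorname{tr}(B_K) / (K - 1)}{\operatorname{tr}(W_K) / (n - K)}, \qquad
\operatorname{tr}(B_K) = \sum_{q=1}^{K} n_q \lVert c_q - c \rVert^2, \qquad
\operatorname{tr}(W_K) = \sum_{q=1}^{K} \sum_{x \in C_q} \lVert x - c_q \rVert^2
```

The numerator measures between-cluster dispersion and the denominator within-cluster dispersion, which is why a larger score indicates better-separated clusters. Because of the factor K - 1, the index is defined only for K of at least 2, so the K = 1 end of the scanned interval yields no score.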
Further, once the kmeans clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is determined from the CH value, and kmeans clustering is executed 3-10 times in succession to observe whether the distribution of samples across the clusters fluctuates greatly from run to run. Across these 3-10 runs, the order of the clusters is random but the proportion of samples in each cluster stays relatively fixed, which indicates that, given the existing features, training samples and chosen K value, kmeans clustering is suitable for the current data set.
Further, all indicators of the five primary dimensions, namely enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability, undergo feature preprocessing and feature screening, in which indicators whose missing rate or same-value rate exceeds the threshold and other useless indicators are removed, leaving a number of model-input indicators. These model-input indicators serve as the comprehensive evaluation indicators of the enterprise. A kmeans clustering analysis model is established on the comprehensive evaluation indicators, the optimal K value is determined with the CH measure, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters;
the screened indicators of the five dimensions, namely enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability, are turned into dimension training vectors after feature preprocessing and feature quantization; a kmeans clustering analysis model is established for each dimension on that dimension's evaluation indicators, the optimal K value is determined with the CH measure, and the clustering effect is evaluated through clustering stability analysis and visual analysis of the resulting clusters, producing dimension clustering analysis models for all five dimensions; the enterprise comprehensive evaluation model and the five dimension clustering analysis models are fused to form the enterprise portrait fused clustering analysis model;
the enterprise dimension indicators of the five dimensions are obtained and, after data preprocessing and feature quantization, turned into dimension training vectors; each dimension clustering analysis model is called to assign the enterprise to a dimension cluster, and enterprise dimension clustering model labels are formed from these clusters. An automatic enterprise portrait label generation module is established on the own-data labels, the comprehensive clustering model labels and the dimension clustering model labels: enterprise information is input to obtain the enterprise's own data, comprehensive evaluation indicators and dimension evaluation indicators, the fused clustering model is called to generate each model label, and the enterprise portrait is generated automatically.
Compared with the prior art, the small and micro enterprise portrait construction method has the following notable beneficial effects:
1. Compared with enterprise portrait evaluation methods based on a single data source, the method builds on the fusion of multi-source enterprise data: the multi-source data undergo merging, alignment, fusion and similar operations, and the enterprise portrait label system and the comprehensive and dimension evaluation indicator systems are established on that basis. The enterprise portrait therefore has richer dimensions and more comprehensive evaluation indicators, which overcomes the one-sided coverage of portrait evaluation dimensions offered by a single data source.
2. Compared with enterprise portrait modeling based on supervised classification, cluster analysis can still analyze the distribution of enterprises in depth when enterprise labels are missing or inaccurate, and can divide small and micro enterprises into groups. This widens the scope and the scenarios in which enterprise portraits can be built from high-dimensional features and massive training samples, giving the method a broader range of application.
3. The plain kmeans clustering analysis method is improved and the fused clustering method is applied to construct enterprise portrait labels, which supports enterprise credit services, expands the application scenarios of financial technology in the credit field and enriches the content of financial technology.
4. With the aggregation of massive enterprise data, the introduction of artificial-intelligence risk control modeling methods, the continuous enrichment of portrait construction indicators, the growth of labeled-sample scenarios and the fusion of multiple algorithms, the method is well suited to risk control modeling on massive enterprise data, and especially to building risk control models when no labels are available; it has very broad application prospects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the small and micro enterprise portrait construction method;
FIG. 2 is a schematic diagram of the enterprise portrait label system and the comprehensive evaluation and dimension evaluation indicator system established in the small and micro enterprise portrait construction method;
FIG. 3 is an example diagram of an application scenario of the small and micro enterprise portrait construction method.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to better understand the technical solutions of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A preferred embodiment is given below:
As shown in figs. 1 to 3, the method for constructing a small and micro enterprise portrait in this embodiment builds the enterprise portrait with cluster analysis, an unsupervised learning method. Multi-source data covering multiple government departments and third parties are fused and aggregated using big-data ETL technology and stored in the standard database after noise removal, data alignment, redundancy removal and similar processing. The small and micro enterprise data in the standard database are screened and organized, and an enterprise portrait label system is established, in which the labels mainly comprise enterprise own-data labels and enterprise model labels. The enterprise data in the standard database undergo data cleaning, feature preprocessing and the like; one part is used directly as the enterprise's own-data labels, and the other part, after feature preprocessing and standardization, serves as unsupervised training samples for the subsequent cluster modeling. A comprehensive feature model and grouped (per-dimension) clustering models are established with the kmeans clustering method, and the K value is determined and the clustering effect analyzed through stability analysis and the Calinski-Harabasz measure to form the final clustering models. Enterprise clusters are predicted with the fused clustering model, which combines the comprehensive clustering and the grouped clustering, to form the enterprise clustering model labels. The small and micro enterprise portrait labels are built from the model labels and the own-data labels: given externally input enterprise information, the enterprise's raw data are read, processed and preprocessed, and the fused clustering model is called to predict the enterprise's clusters, automatically generating the enterprise portrait labels.
The specific steps are as follows:
S1, establishing a standard database through data aggregation and fusion
The multi-source data of an enterprise comprise government data, internet data and third-party data. The government data include enterprise information and information from sources such as the housing provident fund, social security, the Development and Reform Commission, the banking and insurance regulator and administrative penalties. The internet data include e-commerce data, marketing information, certification information, online shop information, litigation, dishonest judgment debtor records, bidding and tendering, and the like. The third-party data include the enterprise's business registration information, personnel information, person-enterprise relationship data and the like. First, a unified data standard specification is established so that the multi-source data entering the warehouse are managed in a standardized way. Second, the multi-source data are governed and processed with ETL and other data governance tools: storable data such as internet data are pulled on a regular schedule, real-time interface data are processed in memory, and a combination of batch and stream processing is used for data processing, standardization, indicator calculation, lightweight feature mining and so on. Finally, the multi-source data from the three kinds of sources are fused and aggregated into a unified data warehouse through horizontal and vertical data fusion; the data warehouse stores the standard-library data, indicator library, feature library and other information obtained after the multi-source data fusion.
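The alignment and redundancy-removal part of step S1 can be pictured with a small sketch. It is not the ETL pipeline of the patent: the join key `credit_code`, the field names and the sample records are all hypothetical, and the rule "first non-blank value wins" is an assumption chosen only to keep the example short.

```python
from collections import defaultdict

def fuse_sources(*sources):
    """Merge records from several sources into one record per enterprise.

    Each source is a list of dicts sharing the hypothetical key "credit_code".
    Blank values are treated as noise and dropped; the first non-blank value
    seen for a field wins, so repeated records add no redundancy.
    """
    fused = defaultdict(dict)
    for records in sources:
        for record in records:
            # Align the join key: trim whitespace and normalize the case.
            code = str(record["credit_code"]).strip().upper()
            for field, value in record.items():
                if field == "credit_code" or value in (None, ""):
                    continue
                fused[code].setdefault(field, value)
    return dict(fused)

# Made-up records from the three kinds of sources.
government = [{"credit_code": "91370100abc", "social_security_headcount": 12}]
internet = [{"credit_code": " 91370100ABC ", "online_shops": 2, "bids": None}]
third_party = [
    {"credit_code": "91370100ABC", "registered_capital": 500.0},
    {"credit_code": "91370100ABC", "registered_capital": 500.0},  # duplicate
]

standard_db = fuse_sources(government, internet, third_party)
```

All three sources collapse into a single record keyed by the normalized code, with the empty `bids` field and the duplicate third-party row discarded.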
S2, establishing an enterprise portrait label system
All data sources covering the enterprise in the standard database are sorted out and an enterprise portrait label system is established, in which the enterprise portrait labels comprise enterprise own-data labels and enterprise model labels. The own-data labels are derived from the enterprise's own data in the standard database and mainly comprise: basic information, such as the years since establishment, registered capital, enterprise type and number of employees; reward and punishment information, such as Mayor's Quality Award enterprise, famous-brand product enterprise, contract-abiding and creditworthy enterprise, "specialized, refined, distinctive and innovative" small and medium-sized enterprise, gazelle enterprise and technology-innovation enterprise; tax recognition information, such as A-level taxpayer or a latest tax credit rating of A; and negative information, such as a latest tax credit rating of C or D, a revoked or deregistered business licence, abnormal operation, or inclusion in the list of seriously illegal enterprises or of enterprises with serious tax violations. The enterprise model labels are mainly generated by cluster analysis: enterprise comprehensive evaluation indicators are generated from the multi-source data and the comprehensive clustering analysis model is called to generate a comprehensive model prediction label; indicators for the 5 dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability are generated, and the dimension clustering analysis models are called to generate dimension model prediction labels.
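Own-data labels are essentially rule lookups on fields of the standard database. A minimal sketch follows; the field names (`tax_credit_level`, `abnormal_operation`, `awards`) and the label wording are hypothetical and only mirror a few of the label kinds listed above.

```python
def own_data_labels(record):
    """Derive own-data labels from one standard-database record (a dict)."""
    labels = []
    level = record.get("tax_credit_level")
    if level == "A":
        labels.append("A-level taxpayer")           # tax recognition label
    if level in ("C", "D"):
        labels.append("low tax credit level")       # negative label
    if record.get("abnormal_operation"):
        labels.append("abnormal operation")         # negative label
    labels.extend(record.get("awards", []))         # reward labels, passed through
    return labels

example = {"tax_credit_level": "A", "awards": ["gazelle enterprise"]}
labels = own_data_labels(example)
```

Because each rule reads one field, new label kinds can be added without touching the clustering models that produce the model labels.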
S3, establishing an enterprise comprehensive evaluation and dimension evaluation indicator system
Enterprise indicators are extracted from the enterprise standard data tables in the enterprise standard database. The enterprise comprehensive evaluation indicators cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability. The enterprise background comprises more than 20 basic indicators such as the establishment date, registered capital and number of employees; the enterprise stability covers more than 20 basic indicators such as changes in business registration, tax registration and legal representative; the enterprise operation capability covers more than 200 indicators in six secondary dimensions, including management capability, willingness to repay, operating capability, profitability and enterprise qualifications; the enterprise development capability comprises more than 50 indicators in two dimensions, development capability and innovation capability; and the enterprise technological innovation capability comprises more than 10 indicators in secondary dimensions such as the number of patents, software copyrights and intellectual property. A total of 300 indicators across the five primary dimensions form the comprehensive evaluation indicators of the enterprise.
S4, deriving the input indicators of the clustering model through feature engineering
The total of 300 indicators extracted from the enterprise multi-source data pass through several processes, such as exploratory data analysis and data cleaning, before they finally form the input features required for training the fused clustering model.
The exploratory data analysis mainly computes simple descriptive statistics for the more than 300 generated indicators, analyzing the variance, mean, median, data distribution and so on of each indicator. After this simple statistical analysis, the data of specific indicators are segmented, and the dynamic changes of the data and the values taken under specific conditions are analyzed in depth. The indicators of the model-input samples are also analyzed visually by plotting the histogram of each single variable, the relationship curve between each single variable and the target variable, and the like.
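The descriptive statistics mentioned here can be sketched with the standard library alone. The helper below is an illustration, not the patent's tooling: the function name, the equal-width binning and the sample values are assumptions, and the histogram is returned as counts rather than plotted.

```python
import statistics

def describe(values, bins=4):
    """Descriptive statistics plus equal-width histogram counts for one indicator."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant columns
    counts = [0] * bins
    for v in present:
        index = min(int((v - low) / width), bins - 1)  # the maximum goes in the last bin
        counts[index] += 1
    return {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),  # population variance
        "missing_rate": 1 - len(present) / len(values),
        "histogram": counts,
    }

# Made-up values of one indicator; None marks a missing value.
registered_capital = [100, 200, 200, None, 500]
summary = describe(registered_capital)
```

The missing rate reported here is the same quantity that the later cleaning step compares against its 80% threshold.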
Data cleaning first handles invalid values in the indicators and converts the quantifiable indicators into numerical form. Missing-value statistics are then computed for the candidate indicators, and indicators with a missing rate above 80% are removed. The same-value rate of the remaining indicators is computed; features that take only one value are removed, as are indicators whose same-value rate exceeds 85%. VIF (variance inflation factor) collinearity analysis is performed on the indicators that pass the missing-rate and same-value filters, and correlated features are removed, leaving 20 model-input indicators. Missing values in these 20 indicators are filled with 0 by default, and the cleaned and filled training samples are Z-score standardized to form standardized training vectors.
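The threshold filters, zero filling and Z-score standardization can be sketched as follows. Several points are assumptions made for this illustration: the same-value rate is computed over the non-missing values only, the population standard deviation is used, the VIF collinearity filter is left out for brevity, and the column names and data are invented.

```python
import statistics
from collections import Counter

def clean_and_standardize(columns, max_missing=0.80, max_same=0.85):
    """Filter indicators by missing rate and same-value rate, fill with 0, Z-score.

    `columns` maps an indicator name to a list of numbers, with None for missing.
    """
    result = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values)
        if missing_rate > max_missing:
            continue  # too sparse to be useful
        same_rate = Counter(present).most_common(1)[0][1] / len(present)
        if same_rate > max_same:
            continue  # (near-)constant, including single-valued indicators
        filled = [0.0 if v is None else float(v) for v in values]
        mean = statistics.mean(filled)
        std = statistics.pstdev(filled)
        result[name] = [(v - mean) / std for v in filled]
    return result

raw = {
    "employees": [1, 2, 3, 4, 5, 6, 7, 8, 9, None],  # kept
    "patents": [None] * 9 + [3],                     # 90% missing: removed
    "is_registered": [1] * 10,                       # single value: removed
}
cleaned = clean_and_standardize(raw)
```

Only `employees` survives, and its standardized values have zero mean and unit standard deviation, which keeps indicators on very different scales from dominating the Euclidean distances used by kmeans.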
S5, establishing a fused clustering analysis model
The enterprise comprehensive evaluation indicators are cluster-modeled with the kmeans clustering analysis method; the K value is determined with the Calinski-Harabasz measure, the clustering effect is evaluated with stability analysis and clustering-effect analysis, and a comprehensive clustering analysis model is established. Based on the dimension indicators of the five enterprise dimensions, cluster analysis of each dimension is carried out separately with the kmeans clustering method, forming a clustering analysis model for each dimension.
The K value of the kmeans clustering method is determined with the Calinski-Harabasz measure: the larger the CH index (CHI) score, the better the clustering effect. kmeans clustering is run with K taking values in the interval 1-10, a plot of each clustering result is drawn, and the CH index is computed for each K in turn; the K value with the best clustering effect is selected by combining the visualized clustering results with the CH values.
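The K selection can be illustrated with a dependency-free sketch. Everything below is an assumption made for the example rather than the patent's implementation: kmeans is a plain Lloyd's algorithm, the seeding is a deterministic farthest-first rule (the source leaves random_state unset, so its runs are randomly seeded), the scan covers K = 2 to 5 because the CH index is undefined for K = 1, and the twelve points are invented.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = centroid(members)
    return labels

def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)], defined for 2 <= k < n."""
    n, overall = len(points), centroid(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    between = sum(len(m) * sq_dist(centroid(m), overall) for m in clusters.values())
    within = sum(sq_dist(p, centroid(m)) for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))

# Three made-up, well-separated groups of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 0), (5, 9)]
          for dx in (0, 1) for dy in (0, 1)]

scores = {k: calinski_harabasz(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
```

On this toy data the score peaks at K = 3, matching the three groups. On real indicator vectors the peak is usually less sharp, which is presumably why the source combines the CH values with a visual check of the clustering plots.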
Once the kmeans clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is determined from the CH value, and kmeans clustering is executed 5 times in succession to observe whether the distribution of samples across the clusters fluctuates greatly from run to run. Across these 5 runs, the order of the clusters is random but the proportion of samples in each cluster stays relatively fixed, which indicates that, given the existing features, training samples and chosen K value, kmeans clustering is suitable for the current data set.
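The stability check compares cluster proportions across runs while ignoring the arbitrary numbering of the clusters. A minimal sketch, assuming the label lists of several unseeded runs are already available; the function names, the 0.05 tolerance and the three label lists are made up for illustration.

```python
from collections import Counter

def cluster_proportions(labels):
    """Sorted share of samples per cluster; sorting removes the arbitrary cluster order."""
    counts = Counter(labels)
    return sorted(count / len(labels) for count in counts.values())

def is_stable(runs, tolerance=0.05):
    """True if every run's sorted proportions stay within `tolerance` of the first run's."""
    reference = cluster_proportions(runs[0])
    for labels in runs[1:]:
        current = cluster_proportions(labels)
        if len(current) != len(reference):
            return False  # a different number of non-empty clusters
        if any(abs(a - b) > tolerance for a, b in zip(reference, current)):
            return False
    return True

runs = [
    [0, 0, 0, 1, 1, 2, 2, 2, 2, 2],  # shares 0.3 / 0.2 / 0.5
    [2, 2, 2, 0, 0, 1, 1, 1, 1, 1],  # same partition, cluster IDs permuted
    [1, 1, 1, 1, 2, 0, 0, 0, 0, 0],  # one sample moved: shares 0.4 / 0.1 / 0.5
]
stable_first_two = is_stable(runs[:2])
stable_all = is_stable(runs)
```

The first two runs count as stable even though their cluster IDs differ, which mirrors the observation that the cluster order is random while the proportions stay fixed. Note that equal proportions are a necessary sign of stability, not proof that the same enterprises land in the same clusters.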
Establishing a fusion clustering model:
all indexes of five first-dimension enterprises which cover the enterprise background, the enterprise stability, the enterprise operation capacity, the enterprise development capacity and the enterprise technological innovation capacity are subjected to feature preprocessing, feature screening, deletion identical value exceeding a threshold value is removed, 25 remaining mould-entering indexes are removed, the 25 mould-entering indexes are used as comprehensive evaluation indexes of the enterprises, a kmeans cluster analysis model is established based on the comprehensive evaluation indexes of the enterprises, the optimal K value is determined through CH measurement, and the cluster effect is evaluated through cluster effect stability analysis and cluster visual analysis of cluster results.
The screened indexes of each of the five dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity and enterprise technological innovation capacity) are turned into dimension training vectors after feature preprocessing and feature quantization. A kmeans cluster analysis model is established separately on the evaluation indexes of each dimension; the optimal K value is determined by the CH measure, and the clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters, yielding five dimension cluster analysis models in total. The enterprise comprehensive evaluation model and the five dimension cluster analysis models are fused to form the enterprise portrait fusion cluster analysis model.
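The text does not spell out how the models are combined beyond being called together, so the sketch below simply bundles one comprehensive model with the per-dimension models and queries them all for a single enterprise. The classes `ClusterModel` and `FusionModel`, the feature names and the centroid values are hypothetical; each model is reduced to its fitted centroids, and inputs are assumed to be already standardized. Only three of the five dimensions are shown.

```python
from dataclasses import dataclass


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class ClusterModel:
    """A fitted k-means model reduced to what prediction needs: its centroids."""
    feature_names: list
    centroids: list

    def predict(self, record):
        x = [record[name] for name in self.feature_names]
        return min(range(len(self.centroids)), key=lambda j: sq_dist(x, self.centroids[j]))


@dataclass
class FusionModel:
    """Comprehensive model plus one model per dimension, queried together."""
    comprehensive: ClusterModel
    dimensions: dict  # dimension name -> ClusterModel

    def predict(self, record):
        result = {"comprehensive": self.comprehensive.predict(record)}
        for name, model in self.dimensions.items():
            result[name] = model.predict(record)
        return result


fusion = FusionModel(
    comprehensive=ClusterModel(
        ["capital_z", "years_z", "patents_z"],
        [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]],
    ),
    dimensions={
        "background": ClusterModel(["capital_z"], [[-1.0], [1.0]]),
        "stability": ClusterModel(["years_z"], [[-1.0], [1.0]]),
        "innovation": ClusterModel(["patents_z"], [[-1.0], [1.0]]),
    },
)
record = {"capital_z": 0.8, "years_z": 1.2, "patents_z": -0.9}
clusters = fusion.predict(record)
print(clusters)
```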
Generating an enterprise portrait label:
The enterprise portrait labels comprise enterprise own-data labels and clustering model analysis labels. Original enterprise data fields stored in the standard database undergo data preprocessing and data quantization to form a standardized label format, which serves as the own-data labels of the enterprise portrait. For the comprehensive model, the indexes corresponding to the enterprise comprehensive clustering model are obtained, data preprocessing and feature quantization are applied, and the resulting training vector is passed to the comprehensive cluster analysis model, which assigns the enterprise to a cluster. The comprehensive clustering model labels are formed from that cluster and include label information such as whether the enterprise credit condition is good, general or high-quality, the comprehensive condition of the enterprise, and the comprehensive condition ratio. For the dimension models, the indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity and enterprise technological innovation capacity) are obtained, dimension training vectors are generated after data preprocessing and feature quantization, and each dimension cluster analysis model is called to assign the enterprise to a dimension cluster. The dimension clustering model labels are formed from those clusters and mainly include labels such as "the enterprise has a certain scale", "the enterprise is in its start-up development period", "the enterprise is stable" and "the enterprise has strong technological innovation capability".
An automatic enterprise portrait label generation module is established on the basis of the own-data labels, the comprehensive clustering model labels and the dimension clustering model labels. Given input enterprise information, the module obtains the enterprise's own data, comprehensive evaluation indexes and dimension evaluation indexes, calls the fusion clustering model to generate each model label, and automatically generates the enterprise portrait.
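The label generation step can be illustrated with a small sketch that merges own-data labels with model labels looked up from cluster assignments. Everything concrete here is an assumption for illustration: the function `build_portrait`, the field names, the cluster ids, and the mapping from cluster id to label text, which in practice would have to be decided by inspecting what each cluster represents.

```python
def build_portrait(own_data, cluster_ids, label_maps):
    """Combine own-data labels with one label per clustering model."""
    labels = [f"{field}: {value}" for field, value in own_data.items()]
    for model_name, cluster_id in cluster_ids.items():
        labels.append(label_maps[model_name][cluster_id])
    return labels


# Hypothetical own data read from the standard database.
own_data = {"industry": "manufacturing", "years_in_operation": 6}

# Hypothetical cluster assignments returned by the fusion clustering model.
cluster_ids = {"comprehensive": 1, "stability": 1, "innovation": 0}

# Hypothetical mapping from cluster id to label text for each model.
label_maps = {
    "comprehensive": {0: "enterprise credit condition: general",
                      1: "enterprise credit condition: good"},
    "stability": {0: "enterprise in start-up development period",
                  1: "enterprise is stable"},
    "innovation": {0: "limited technological innovation capability",
                   1: "strong technological innovation capability"},
}

portrait = build_portrait(own_data, cluster_ids, label_maps)
for label in portrait:
    print(label)
```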
The above embodiments are only specific cases of the present invention. The scope of protection of the present invention includes but is not limited to these embodiments; any suitable change or substitution made by a person of ordinary skill in the art that is consistent with the claims of the small and micro enterprise portrait construction method of the present invention shall fall within the scope of protection of the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for constructing a small and micro enterprise portrait, characterized by comprising the following steps:
S1, establishing a standard database through data aggregation and fusion;
S2, establishing an enterprise portrait label system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
S4, forming the clustering model input indexes through feature engineering;
S5, establishing a fusion cluster analysis model.
2. The method for constructing a small and micro enterprise portrait according to claim 1, wherein in step S1, multi-source data covering multiple government departments and third parties are aggregated and fused by big data ETL technology, and the data are stored in the standard database after noise removal, data alignment and redundancy removal.
3. The method for constructing a small and micro enterprise portrait according to claim 2, wherein in step S2, the enterprise portrait labels include enterprise own-data labels and enterprise model labels; the own-data labels are derived from the enterprise's own data in the standard database, and the enterprise model labels are generated mainly by cluster analysis: enterprise comprehensive evaluation indexes are generated from enterprise multi-source data and the comprehensive cluster analysis model is called to generate comprehensive model prediction labels; indexes of five dimensions in total (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity and technological innovation capacity) are generated from enterprise multi-source data and the dimension cluster analysis models are called to generate dimension model prediction labels.
4. The method for constructing a small and micro enterprise portrait according to claim 3, wherein in step S3, enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database, and the enterprise comprehensive evaluation indexes cover five primary dimensions in total: enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity and enterprise technological innovation capacity.
5. The method for constructing a small and micro enterprise portrait according to claim 4, wherein in step S4, based on the indexes formed in step S3, exploratory data analysis and data cleaning are performed on the enterprise multi-source data to form the model-input features required for training the fusion clustering model.
6. The method for constructing a small and micro enterprise portrait according to claim 5, wherein the exploratory data analysis computes simple descriptive statistics on the generated indexes, segments specific index data to analyze in depth how the data change over time and what values they take under particular conditions, and visually analyzes the model-input sample indexes by plotting a histogram for each single variable and a curve of each single variable against the target variable;
the data cleaning first handles invalid values in the indexes and numerically quantizes those indexes that can be quantized; it then computes missing-value statistics for the model-input indexes and removes training indexes with more than 80% missing values; it computes the same-value rate of the remaining indexes, removes features that take only one value, and removes indexes whose same-value rate exceeds 85%; VIF (variance inflation factor) collinearity analysis is performed on the evaluation indexes that pass the missing-value and same-value filtering, and correlated features are removed to obtain the remaining model-input indexes; missing values in these indexes are filled with 0 by default, and the cleaned and filled training samples are Z-Score standardized to form standardized training vectors.
7. The method for constructing a small and micro enterprise portrait according to claim 6, wherein in step S5, kmeans cluster analysis is used to build a clustering model on the enterprise comprehensive evaluation indexes, the Calinski-Harabasz measurement method is used to determine the K value, and stability analysis and clustering effect analysis are used to evaluate the clustering effect, thereby establishing the comprehensive cluster analysis model;
based on the indexes of the five enterprise dimensions, enterprise cluster analysis is performed for each dimension separately using the kmeans clustering method, forming a cluster analysis model for each dimension.
8. The method for constructing a small and micro enterprise portrait according to claim 7, wherein when the K value is determined by the Calinski-Harabasz measurement method, a larger CH index (CHI) score indicates a better clustering effect; K takes values in the interval 1 to 10, kmeans clustering is run for each value, a clustering result graph is drawn, the CH index is calculated for each K value in turn, and the K value with the best clustering effect is selected by combining the visualized clustering results with the CH values obtained for the different K values.
9. The method for constructing a small and micro enterprise portrait according to claim 8, wherein after the kmeans clustering algorithm has been selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling, the optimal K value is fixed according to the CH value, and kmeans clustering is executed 3 to 10 times in succession to observe whether the distribution of samples across the clusters fluctuates greatly between runs; over the 3 to 10 runs, the order in which the clusters are numbered is random, but the proportion of samples in each cluster remains relatively fixed, which indicates that, given the existing features, training samples and chosen K value, the kmeans clustering algorithm is suitable for cluster analysis on the current data set.
10. The method for constructing a small and micro enterprise portrait according to claim 8, wherein all indexes of the five primary dimensions (enterprise background, enterprise stability, enterprise operation capacity, enterprise development capacity and enterprise technological innovation capacity) undergo feature preprocessing and feature screening; indexes whose missing rate or same-value rate exceeds a threshold and useless indexes are removed, leaving a plurality of model-input indexes; the plurality of model-input indexes serve as the comprehensive evaluation indexes of the enterprise, a kmeans cluster analysis model is established based on the comprehensive evaluation indexes, the optimal K value is determined by the CH measure, and the clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters;
the screened indexes of each of the five dimensions are turned into dimension training vectors after feature preprocessing and feature quantization; a kmeans cluster analysis model is established separately on the evaluation indexes of each dimension, the optimal K value is determined by the CH measure, and the clustering effect is evaluated through stability analysis and visual analysis of the resulting clusters, yielding five dimension cluster analysis models in total; the enterprise comprehensive evaluation model and the five dimension cluster analysis models are fused to form the enterprise portrait fusion cluster analysis model;
the enterprise dimension indexes of the five dimensions are obtained, dimension training vectors are generated after data preprocessing and feature quantization, each dimension cluster analysis model is called to assign the enterprise to a dimension cluster, and the dimension clustering model labels are formed from those clusters; an automatic enterprise portrait label generation module is established on the basis of the own-data labels, the comprehensive clustering model labels and the dimension clustering model labels; given input enterprise information, the module obtains the enterprise's own data, comprehensive evaluation indexes and dimension evaluation indexes, calls the fusion clustering model to generate each model label, and automatically generates the enterprise portrait.
CN202110979314.0A 2021-08-25 2021-08-25 Image construction method for small and micro enterprises Active CN113837859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110979314.0A CN113837859B (en) 2021-08-25 2021-08-25 Image construction method for small and micro enterprises

Publications (2)

Publication Number Publication Date
CN113837859A true CN113837859A (en) 2021-12-24
CN113837859B CN113837859B (en) 2024-05-14

Family

ID=78961216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110979314.0A Active CN113837859B (en) 2021-08-25 2021-08-25 Image construction method for small and micro enterprises

Country Status (1)

Country Link
CN (1) CN113837859B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100205108A1 (en) * 2009-02-11 2010-08-12 Mun Johnathan C Credit and market risk evaluation method
CN103294828A (en) * 2013-06-25 2013-09-11 厦门市美亚柏科信息股份有限公司 Verification method and verification device of data mining model dimension
CN107563929A (en) * 2017-07-27 2018-01-09 杭州中奥科技有限公司 A kind of various dimensions siren based on personage's specificity analysis
CN107993143A (en) * 2017-11-23 2018-05-04 深圳大管加软件与技术服务有限公司 A kind of Credit Risk Assessment method and system
CN110322089A (en) * 2018-03-30 2019-10-11 宗略投资(上海)有限公司 Enterprise Credit Risk Evaluation method and its system
CN110990474A (en) * 2019-11-28 2020-04-10 泰华智慧产业集团股份有限公司 Regional industry image analysis method and device
CN111047122A (en) * 2018-10-11 2020-04-21 北京国双科技有限公司 Enterprise data maturity evaluation method and device and computer equipment
CN111680073A (en) * 2020-06-11 2020-09-18 天元大数据信用管理有限公司 Financial service platform policy information recommendation method based on user data
CN111737477A (en) * 2020-08-07 2020-10-02 杭州六棱镜知识产权科技有限公司 Intellectual property big data-based intelligence investigation method, system and storage medium
CN111754116A (en) * 2020-06-24 2020-10-09 国家电网有限公司大数据中心 Credit assessment method and device based on label portrait technology
CN111861262A (en) * 2020-07-30 2020-10-30 国网山东省电力公司寿光市供电公司 Enterprise perspective portrait method and terminal based on energy big data
CN112396430A (en) * 2020-11-09 2021-02-23 中国南方电网有限责任公司 Processing method and system for enterprise evaluation
CN112395500A (en) * 2020-11-17 2021-02-23 平安科技(深圳)有限公司 Content data recommendation method and device, computer equipment and storage medium
CN112435152A (en) * 2020-12-04 2021-03-02 北京师范大学 Online learning investment dynamic evaluation method and system
CN112668945A (en) * 2021-01-27 2021-04-16 天元大数据信用管理有限公司 Enterprise credit risk assessment method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JIE DU等: "Research on Accurate Marketing Modeling of User Portrait Based on Bid Data", 2018 INTERNATIONAL COMPUTERS, SIGNALS AND SYSTEMS CONFERENCE (ICOMSSC) *
T. ALIX等: "Enterprise modelling techniques to help manufacturing firms develop product service activities", IFAC PROCEEDINGS VOLUMES, vol. 42, no. 4, pages 1637 - 1642 *
林伟等: "企业信息化的灰聚类评价模型及应用", 科技进步与对策, vol. 20, no. 06, pages 129 - 130 *
黄晓斌等: "融合多源数据的企业竞争对手画像构建", 现代情报, vol. 40, no. 11, pages 13 - 21 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988726A (en) * 2021-12-28 2022-01-28 江苏荣泽信息科技股份有限公司 Enterprise industry credit evaluation management system based on block chain
CN114462516A (en) * 2022-01-21 2022-05-10 天元大数据信用管理有限公司 Enterprise credit score sample labeling method and device
CN114462516B (en) * 2022-01-21 2024-04-16 天元大数据信用管理有限公司 Enterprise credit scoring sample labeling method and device
CN116304974A (en) * 2023-02-17 2023-06-23 国网浙江省电力有限公司营销服务中心 Multi-channel data fusion method and system
CN116304974B (en) * 2023-02-17 2023-09-29 国网浙江省电力有限公司营销服务中心 Multi-channel data fusion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant