CN113837859B - Image construction method for small and micro enterprises


Publication number
CN113837859B
Authority
CN
China
Legal status
Active
Application number
CN202110979314.0A
Other languages
Chinese (zh)
Other versions
CN113837859A
Inventor
尹盼盼
边松华
崔乐乐
Current Assignee
Tianyuan Big Data Credit Management Co Ltd
Original Assignee
Tianyuan Big Data Credit Management Co Ltd
Application filed by Tianyuan Big Data Credit Management Co Ltd
Priority to CN202110979314.0A
Publication of CN113837859A
Application granted
Publication of CN113837859B


Classifications

    • G06Q40/03 Credit; Loans; Processing thereof
    • G06F16/245 Query processing
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis


Abstract

The invention relates to the field of financial credit, and in particular to a portrait construction method for small and micro enterprises, comprising the following steps: S1, aggregating and fusing data to establish a standard database; S2, establishing an enterprise portrait tag system; S3, establishing an index system for enterprise comprehensive evaluation and per-dimension evaluation; S4, applying feature engineering to form the input indexes of the clustering models; S5, establishing a fusion cluster analysis model. Compared with the prior art, the method performs data merging, data alignment and data fusion on enterprise multi-source data, and builds the enterprise portrait tag system and the comprehensive and dimension evaluation index systems on top of the fused data. The resulting portraits cover more dimensions and use more comprehensive evaluation indexes, overcoming the one-sided coverage of portrait evaluation dimensions that results from relying on a single data source.

Description

Image construction method for small and micro enterprises
Technical Field
The invention relates to the field of financial credit, and in particular provides a portrait construction method for small and micro enterprises.
Background
With the application of technologies such as big data, machine learning and artificial intelligence, the service models, service forms and management and operation modes of traditional financial institutions have changed fundamentally, and financial technology has developed rapidly; big data and artificial intelligence are among its most important application technologies. Small and micro enterprises typically need financing that is "short, small, frequent and urgent". Building an intelligent risk-control system on the multi-source data that covers these enterprises, spanning the whole credit process before, during and after the loan, is one of the mainstream business models.
Before the loan, a comprehensive interpretation of the enterprise gives the bank a preliminary understanding of it. During the loan, the system promptly reflects the enterprise's risks, so that the bank can keep track of risk points in the enterprise's operation and development and can promptly take measures such as reducing interest and adjusting the enterprise's loan products in order to control risk.
However, existing intelligent risk-control systems cannot accurately identify the behavioural characteristics of enterprises that apply for loans maliciously, and their evaluation indexes are few and incomplete.
Disclosure of Invention
Addressing these shortcomings of the prior art, the invention provides a highly practical portrait construction method for small and micro enterprises.
The technical solution adopted to solve the technical problem is as follows:
A portrait construction method for small and micro enterprises comprises the following steps:
S1, establishing a standard database through data aggregation and fusion;
S2, establishing an enterprise portrait tag system;
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system;
S4, forming the clustering-model input indexes through feature engineering;
S5, establishing a fusion cluster analysis model.
Further, in step S1, multi-source data from multiple government departments and third parties is aggregated and fused using big-data ETL technology, and is stored in a standard database after noise removal, data alignment and redundancy removal.
Further, in step S2, the enterprise portrait tags include enterprise owned-data tags and enterprise model tags. The owned-data tags are derived from the enterprise's own data in the standard database. The model tags are mainly generated by cluster analysis: comprehensive evaluation indexes are generated from the enterprise's multi-source data and passed to a comprehensive cluster analysis model, which produces a comprehensive-model prediction tag; indexes for five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability) are likewise generated from the multi-source data and passed to dimension cluster analysis models, which produce dimension-model prediction tags.
Further, in step S3, enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database. The enterprise comprehensive evaluation indexes cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability.
Further, in step S4, exploratory data analysis and data cleaning are applied to the enterprise multi-source data based on the indexes formed in step S3, finally producing the model input features required to train the fusion cluster model.
Furthermore, the exploratory data analysis computes simple descriptive statistics for the generated indexes and performs basic statistical analysis of the data. It then segments specific index data to analyse in depth how the data changes and what values it takes under specific conditions, and analyses the model input indexes visually by plotting a histogram of each variable and a curve relating each variable to the target variable.
The data cleaning first handles invalid values in the indexes and numerically quantizes indexes that can be quantified. Missing-value statistics are then computed for the modeling indexes, and training indexes with more than 80% missing values are removed. For the remaining indexes, the same-value rate is computed: features whose attribute takes only one value are removed, as are indexes whose same-value rate exceeds 85%. VIF (variance inflation factor) collinearity analysis is then performed on the indexes that passed the missing-value and same-value filters, and correlated features are removed, leaving a number of modeling indexes. Missing values in these model input indexes are filled with 0 by default, and the filled training samples are Z-score standardized to form standardized training vectors.
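The two threshold filters described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: each index is assumed to be a list of values across enterprises with `None` marking a missing value, and computing the same-value rate over non-missing values only is an assumption, since the text does not state the denominator. The 80% and 85% thresholds come from the text.

```python
from collections import Counter

MISSING_THRESHOLD = 0.80     # drop indexes with more than 80% missing values
SAME_VALUE_THRESHOLD = 0.85  # drop indexes whose most frequent value exceeds 85%


def filter_indexes(columns):
    """Apply the missing-value and same-value filters.

    columns maps an index name to its values across enterprises (None = missing).
    Returns the names of the indexes that survive, in their original order.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > MISSING_THRESHOLD:
            continue  # too many missing values
        counts = Counter(present)
        if len(counts) <= 1:
            continue  # the attribute takes only one value
        # Assumed definition: share of the most common value among non-missing values.
        same_value_rate = counts.most_common(1)[0][1] / len(present)
        if same_value_rate > SAME_VALUE_THRESHOLD:
            continue  # one value dominates the index
        kept.append(name)
    return kept
```

Invalid-value handling and numeric quantization are assumed to have happened before this step, so every non-missing value is comparable.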
Further, in step S5, the k-means cluster analysis method is used to build a clustering model on the enterprise comprehensive evaluation indexes: the K value is determined with the Calinski-Harabasz (CH) metric, the clustering is evaluated through stability analysis and clustering-effect analysis, and a comprehensive cluster analysis model is established.
In addition, the k-means clustering method is applied separately to the indexes of each of the five enterprise dimensions, forming one cluster analysis model per dimension.
Further, when K is determined with the Calinski-Harabasz metric, a larger CH score indicates a better clustering. The k-means clustering is run for each value of K from 1 to 10 (in practice from K = 2, because the CH index is undefined for a single cluster), the clustering results are plotted, the CH value is calculated for each K in turn, and the K that gives the best clustering is selected by combining the visualized clustering results with the CH values.
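For reference, the Calinski-Harabasz index has the standard definition below, for n samples split into K clusters C_1, ..., C_K, where cluster C_q has n_q members and centroid c_q, and c is the centroid of all samples. It is defined only for 2 ≤ K ≤ n − 1.

$$
\mathrm{CH}(K)=\frac{\operatorname{tr}(B_K)/(K-1)}{\operatorname{tr}(W_K)/(n-K)},\qquad
\operatorname{tr}(B_K)=\sum_{q=1}^{K} n_q\,\lVert c_q-c\rVert^2,\qquad
\operatorname{tr}(W_K)=\sum_{q=1}^{K}\sum_{x\in C_q}\lVert x-c_q\rVert^2
$$

A large between-cluster dispersion tr(B_K) relative to the within-cluster dispersion tr(W_K) yields a high score, which is why a larger CH value is read as a better clustering.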
Further, once k-means has been chosen as giving the best clustering, the clustering parameter random_state is left unset during modeling, the optimal K is fixed according to the CH value, and k-means clustering is run 3-10 times in a row to check whether the distribution of samples across clusters fluctuates substantially between runs. Across these 3-10 runs, the order in which clusters are numbered is random, but the proportion of samples in each cluster stays relatively fixed, which indicates that k-means, with the existing features, training samples and chosen K, is suitable for cluster analysis of the current data set.
Further, all indexes of the five primary dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and enterprise technological innovation capability) are pooled. After feature preprocessing removes indexes whose missing-value or same-value rate exceeds the threshold, together with other useless indexes, the remaining modeling indexes are used as the enterprise comprehensive evaluation indexes. A k-means cluster analysis model is built on these indexes, the optimal K is determined by the CH metric, and the clustering is evaluated through cluster-stability analysis and visualization of the resulting clusters;
the screened indexes of each of the five dimensions undergo feature preprocessing and feature quantization to generate dimension training vectors, and a k-means cluster analysis model is built on the evaluation indexes of each dimension; for each model, the optimal K is determined by the CH metric and the clustering is evaluated through cluster-stability analysis and cluster visualization, yielding five dimension cluster analysis models in total. The enterprise comprehensive evaluation model and the five dimension cluster analysis models are then fused into the enterprise portrait fusion cluster analysis model;
the enterprise's indexes for the five dimensions are acquired, preprocessed and feature-quantized into dimension training vectors, and each dimension cluster analysis model is called to assign the enterprise to a cluster of that dimension; the assigned clusters form the dimension clustering-model tags. An automatic enterprise portrait tag generation module is built on the owned tags, the comprehensive clustering-model tags and the dimension clustering-model tags: given enterprise information as input, it retrieves the enterprise's owned data, comprehensive evaluation indexes and dimension evaluation indexes, calls the fusion clustering model to generate each model tag, and automatically generates the enterprise portrait.
Compared with the prior art, the portrait construction method for small and micro enterprises has the following notable beneficial effects:
1. Compared with enterprise portrait evaluation methods based on a single data source, the method fuses enterprise multi-source data through data merging, data alignment and data fusion, and builds the enterprise portrait tag system and the comprehensive and dimension evaluation index systems on the fused data. The portraits therefore cover more dimensions with more comprehensive evaluation indexes, overcoming the limited coverage of portrait evaluation dimensions inherent in a single data source.
2. Compared with enterprise portrait modeling based on supervised classification, cluster analysis can analyse in depth how enterprises are distributed even when enterprise labels are missing or inaccurate, enabling small and micro enterprises to be divided into groups. This extends the categories and scenarios in which enterprise portraits can be built from high-dimensional features and large numbers of training samples, so the method has a wider range of application.
3. The method improves on plain k-means cluster analysis by applying fusion clustering to enterprise portrait tag construction; this supports enterprise credit services, extends the application scenarios of financial technology in the credit field and enriches its content.
4. As massive enterprise data converges, artificial-intelligence risk-control modeling methods are introduced, portrait construction indexes keep growing, scenarios lacking training-sample labels become more common and multiple algorithms are fused, the method is well suited to risk-control modeling on massive enterprise big data, particularly to building risk-control models when no labels are available, and therefore has very broad application prospects.
Drawings
To illustrate the embodiments of the invention and the prior-art technical solutions more clearly, the drawings used in the embodiments are briefly described below. The drawings described below are only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow diagram of the portrait construction method for small and micro enterprises;
FIG. 2 is a schematic diagram of establishing the enterprise portrait tag system and the comprehensive and dimension evaluation index systems in the method;
FIG. 3 is a diagram of an example application scenario of the method.
Detailed Description
To help those skilled in the art better understand the invention, it is described in further detail below with reference to specific embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the invention.
A preferred embodiment is given below:
As shown in FIGS. 1-3, the method of this embodiment builds the enterprise portrait with cluster analysis, an unsupervised learning method. Multi-source data from multiple government departments and third parties is aggregated and fused with big-data ETL technology and stored in a standard database after noise removal, data alignment, redundancy removal and similar processing. Small and micro enterprise data in the standard database is screened and organised to establish the portrait tag system, whose tags mainly consist of enterprise owned-data tags and enterprise model tags. After data cleaning and feature preprocessing, part of the enterprise data is used directly as the enterprise's owned tags, and the rest, after feature preprocessing and standardization, serves as unsupervised training samples for the subsequent clustering. A comprehensive feature model and per-group (dimension) clustering models are each built with k-means, with the K value and the clustering quality determined through stability analysis and the Calinski-Harabasz metric, giving the final clustering models. The fusion cluster model, which combines the comprehensive clustering with the grouped clustering, predicts each enterprise's clusters, and these clusters form the enterprise clustering-model tags. Small and micro enterprise portrait tags are built from the model tags and the owned-data tags: when enterprise information is supplied externally, the enterprise's raw data is read and preprocessed, and the fusion clustering model is called to predict its clusters and automatically generate its portrait tags.
The specific steps are as follows:
S1, establishing a standard database through data aggregation and fusion
The enterprise multi-source data comprises three groups. Government data includes business-registration information, housing provident fund, social security, Development and Reform Commission, banking and insurance regulatory, administrative penalty and similar information. Internet data includes e-commerce data, marketing information, identification information, online-store information, legal litigation, records of dishonest judgment debtors, bidding and similar information. Third-party data includes enterprise business information, personnel relationship data and similar information. First, a unified data standard specification is established so that the stored multi-source data is managed in a standardized way. Second, the multi-source data is processed with ETL and other data-processing tools: data that can be stored, such as Internet data, is pulled periodically, real-time interface data is processed in memory, and, combined with batch processing, the data undergoes processing, standardization, index calculation, light feature mining and so on. Finally, horizontal and vertical data fusion brings the data from the three sources into a unified data warehouse, which stores the standard-library data obtained after multi-source fusion as well as the derived index library, feature library and other information.
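Horizontal fusion, i.e. merging fields about the same enterprise from different sources into one row, can be sketched as below. This is a minimal illustration under loudly stated assumptions, not the patent's ETL pipeline: the field names, the `credit_code` join key and the source precedence order are all hypothetical, and a production system would normally do this in an ETL or SQL engine rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical mapping that stands in for the "unified data standard specification":
# each source's raw field name is renamed to one standard field name.
FIELD_STANDARD = {
    "gov": {"reg_capital": "registered_capital", "est_date": "established"},
    "internet": {"shop_count": "online_shops"},
    "third_party": {"capital": "registered_capital", "staff": "employees"},
}
# Assumed precedence when sources disagree on the same standard field (lower value wins).
SOURCE_PRIORITY = {"gov": 0, "third_party": 1, "internet": 2}


def standardize(source, record):
    """Rename fields to standard names and drop blank values (a simple form of noise removal)."""
    mapping = FIELD_STANDARD[source]
    out = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        out[mapping.get(key, key)] = value
    return out


def fuse(sources):
    """Align records on a hypothetical 'credit_code' key and merge them into one row per enterprise.

    Conflicting values are resolved by SOURCE_PRIORITY, which also drops redundant duplicates.
    """
    fused = defaultdict(dict)
    chosen = defaultdict(dict)  # priority of the source each stored value came from
    for source, records in sources.items():
        prio = SOURCE_PRIORITY[source]
        for record in records:
            row = standardize(source, record)
            key = row.pop("credit_code")
            for field, value in row.items():
                if field not in fused[key] or prio < chosen[key][field]:
                    fused[key][field] = value
                    chosen[key][field] = prio
    return {key: dict(row) for key, row in fused.items()}
```

Vertical fusion (appending further records of the same schema, e.g. new monthly pulls) would simply feed more records through the same `fuse` call.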
S2, establishing an enterprise portrait tag system
All data sources covering the enterprise in the standard database are sorted out to establish the enterprise portrait tag system, whose tags comprise enterprise owned-data tags and enterprise model tags. The owned tags are derived from the enterprise's own data in the standard database and mainly include: basic information, such as the enterprise's establishment date, registered capital, enterprise type and number of current employees; honours, such as Mayor's Quality Award enterprises, famous-brand product enterprises, contract-abiding and creditworthy enterprises, specialized, refined, distinctive and innovative small and medium-sized enterprises, gazelle enterprises and science and technology innovation enterprises; tax recognition information, such as whether the enterprise is a class A taxpayer or whether its latest tax credit rating is class A; and negative information, such as a latest tax credit rating of grade C or D, revocation of the enterprise's business licence, abnormal operation, listing as a seriously illegal enterprise, or listing as a serious tax-violation enterprise. The model tags are mainly generated by cluster analysis: comprehensive evaluation indexes generated from the enterprise's multi-source data are passed to the comprehensive cluster analysis model to produce a comprehensive-model prediction tag, and indexes for five dimensions generated from the multi-source data (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability and technological innovation capability) are passed to the dimension cluster analysis models to produce dimension-model prediction tags.
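Owned-data tags are rule-based rather than learned, so they can be illustrated as a small function over one standardized enterprise row. This is a sketch only: every field name and tag wording below is a hypothetical placeholder; only the tag categories (basic information, honours, tax recognition, negative information) and the grade A / grade C or D rules come from the text.

```python
# Hypothetical boolean honour fields and the tag text each one produces.
HONOR_FIELDS = {
    "mayor_quality_award": "Mayor's Quality Award enterprise",
    "famous_brand_product": "Famous-brand product enterprise",
    "specialized_innovative_sme": "Specialized, refined, distinctive and innovative SME",
    "gazelle": "Gazelle enterprise",
}


def owned_tags(row):
    """Derive owned-data tags from one standardized enterprise row (all field names are hypothetical)."""
    tags = {"basic": [], "honours": [], "tax": [], "negative": []}
    # Basic information is copied into readable tags.
    if "established" in row:
        tags["basic"].append(f"Established {row['established']}")
    if "registered_capital" in row:
        tags["basic"].append(f"Registered capital {row['registered_capital']}")
    # Honours: one tag per flag that is set.
    for field, label in HONOR_FIELDS.items():
        if row.get(field):
            tags["honours"].append(label)
    # Tax credit grade: A is a positive tag, C or D is negative information.
    grade = row.get("tax_credit_grade")
    if grade == "A":
        tags["tax"].append("Latest tax credit grade A")
    elif grade in ("C", "D"):
        tags["negative"].append(f"Latest tax credit grade {grade}")
    # Other negative-information flags.
    if row.get("licence_revoked"):
        tags["negative"].append("Business licence revoked")
    if row.get("abnormal_operation"):
        tags["negative"].append("Listed for abnormal operation")
    return tags
```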
S3, establishing an enterprise comprehensive evaluation and dimension evaluation index system
Enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database. The enterprise comprehensive evaluation indexes cover five primary dimensions. Enterprise background comprises 20 basic indexes in total, such as the enterprise's establishment time, registered capital and number of employees. Enterprise stability covers business-registration changes, tax changes, legal-representative changes and the like, 20 basic indexes in total. Enterprise operation capability comprises 200 indexes in total across six secondary dimensions: management capability, debt-repayment capability, repayment willingness, operating capability, profitability and enterprise qualifications. Enterprise development capability comprises 50 indexes in total across two dimensions, development capability and innovation capability. Enterprise technological innovation capability covers 10 secondary dimensions in total, such as the enterprise's patents, software copyrights and intellectual property. Together, these 300 indexes across the five primary dimensions form the enterprise comprehensive evaluation indexes.
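The index hierarchy above can be held as a mapping from each primary dimension to its index names, so that the comprehensive model uses every index while each dimension model uses only its own slice. In this sketch the index names are hypothetical placeholders; only the per-dimension counts come from the text, and the technological-innovation count assumes one index per secondary dimension, which is what makes the total 300.

```python
# Index counts per primary dimension as stated in the embodiment. The technological
# innovation figure assumes one index per secondary dimension, which makes the total 300.
INDEX_COUNTS = {
    "background": 20,
    "stability": 20,
    "operation": 200,
    "development": 50,
    "tech_innovation": 10,
}


def build_index_catalog(counts):
    """Generate hypothetical index names such as 'background_001' for every dimension."""
    return {dim: [f"{dim}_{i:03d}" for i in range(1, n + 1)] for dim, n in counts.items()}


def dimension_vector(row, catalog, dimension):
    """Select one dimension's index values from a full index row; absent indexes become None."""
    return [row.get(name) for name in catalog[dimension]]


catalog = build_index_catalog(INDEX_COUNTS)
# The comprehensive evaluation index set is the union of all five dimensions.
comprehensive = [name for names in catalog.values() for name in names]
```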
S4, forming the clustering-model input indexes through feature engineering
The 300 indexes extracted from the enterprise multi-source data go through several stages, including exploratory data analysis and data cleaning, to finally form the model input features required to train the fusion cluster model.
Exploratory data analysis mainly consists of computing simple descriptive statistics for the approximately 300 indexes (variance, mean, median, data distribution and so on), performing basic statistical analysis of the data, segmenting specific index data, and analysing in depth how the data changes and what values it takes under specific conditions. The model input indexes are analysed visually by plotting univariate histograms, curves relating each variable to the target variable, and the like.
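The descriptive part of the exploratory analysis can be sketched with Python's standard `statistics` module. This is a minimal sketch, assuming each index is held as a list of numbers with `None` for missing values; plotting is omitted, `histogram` only returns the bin counts a plotting library would draw, and the segment labels in `segment_means` (for example an industry code) are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive statistics for one index; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing_rate": 1 - len(present) / len(values),
        "mean": statistics.fmean(present),
        "median": statistics.median(present),
        "variance": statistics.pvariance(present),
        "min": min(present),
        "max": max(present),
    }


def histogram(values, bins=5):
    """Equal-width bin counts, i.e. the data behind a univariate histogram."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in present:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def segment_means(values, segments):
    """Mean of an index within each segment, skipping missing values."""
    groups = {}
    for v, s in zip(values, segments):
        if v is not None:
            groups.setdefault(s, []).append(v)
    return {s: statistics.fmean(vs) for s, vs in groups.items()}
```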
First, invalid values in the indexes are handled and indexes that can be quantified are converted to numeric values. Missing-value statistics are computed for the modeling indexes, and training indexes with more than 80% missing values are removed. The same-value rate is computed for the remaining indexes: features whose attribute takes only one value are removed, as are indexes whose same-value rate exceeds 85%. VIF collinearity analysis is performed on the indexes that passed the missing-value and same-value filters, and after correlated features are removed, 20 modeling indexes remain. Missing values in these 20 model input indexes are filled with 0 by default, and the filled training samples are Z-score standardized to form standardized training vectors.
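The zero-filling, collinearity pruning and standardization steps can be sketched in plain Python as below. Several details are assumptions, not statements from the text: VIF is computed on the zero-filled data (the text lists VIF before filling but does not say how missing values are treated during VIF), the removal threshold is 10 (a common rule of thumb), and the index with the highest VIF is dropped one at a time. Libraries such as statsmodels provide a ready-made `variance_inflation_factor`; this version avoids dependencies.

```python
import math


def zero_fill(columns):
    """Fill missing values (None) with 0, the default stated in the text."""
    return {name: [0.0 if v is None else float(v) for v in values] for name, values in columns.items()}


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def vif(target, others, eps=1e-12):
    """VIF = 1 / (1 - R^2) from regressing `target` on `others`; all inputs must be centered.

    The regression projects onto an orthonormal basis (Gram-Schmidt) of the other columns,
    so exactly collinear columns give an infinite VIF instead of a singular-matrix error.
    """
    basis = []
    for col in others:
        r = list(col)
        for q in basis:
            c = _dot(r, q)
            r = [x - c * y for x, y in zip(r, q)]
        norm = math.sqrt(_dot(r, r))
        if norm > eps:
            basis.append([x / norm for x in r])
    resid = list(target)
    for q in basis:
        c = _dot(resid, q)
        resid = [x - c * y for x, y in zip(resid, q)]
    total, ss_res = _dot(target, target), _dot(resid, resid)
    return math.inf if ss_res <= eps * max(total, 1.0) else total / ss_res


def prune_collinear(columns, threshold=10.0):
    """Repeatedly drop the index with the largest VIF until every VIF is <= threshold (assumed 10)."""
    cols = {name: _center(values) for name, values in columns.items()}
    while len(cols) > 1:
        scores = {n: vif(v, [o for m, o in cols.items() if m != n]) for n, v in cols.items()}
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del cols[worst]
    return list(cols)


def zscore(columns):
    """Z-score standardization using the population standard deviation."""
    out = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        out[name] = [(v - mean) / std if std > 0 else 0.0 for v in values]
    return out
```

A typical chain would be `filled = zero_fill(raw)`, `kept = prune_collinear(filled)`, then `zscore({n: filled[n] for n in kept})` to obtain the standardized training vectors.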
S5, establishing a fusion cluster analysis model
The k-means cluster analysis method is used to build a clustering model on the enterprise comprehensive evaluation indexes: the K value is determined with the Calinski-Harabasz metric, the clustering is evaluated through stability analysis and clustering-effect analysis, and a comprehensive cluster analysis model is established. The k-means clustering method is also applied separately to the indexes of each of the five enterprise dimensions, forming one cluster analysis model per dimension.
Among the various ways of determining K for k-means, this embodiment uses the Calinski-Harabasz metric, for which a larger CH score indicates a better clustering. The k-means clustering is run for each value of K from 1 to 10 (effectively from K = 2, since the CH index is undefined for a single cluster), the clustering results are plotted, the CH value is calculated for each K in turn, and the K that gives the best clustering is selected by combining the visualized clustering results with the CH values.
Once k-means has been chosen as giving the best clustering, the clustering parameter random_state is left unset during modeling, the optimal K is fixed according to the CH value, and k-means clustering is run 5 times in a row to check whether the distribution of samples across clusters fluctuates substantially between runs. Over the 5 runs, the order in which clusters are numbered is random, but the proportion of samples in each cluster stays relatively fixed, indicating that k-means, with the existing features, training samples and chosen K, is suitable for cluster analysis of the current data set.
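The K search and the stability check can be sketched together in plain Python. This is an illustrative re-implementation, not the patent's code: it uses Lloyd's k-means with k-means++ seeding and a few restarts (`n_init`), whereas the text does not specify the k-means variant, and `seed=None` plays the role of leaving `random_state` unset. In scikit-learn the corresponding pieces are `sklearn.cluster.KMeans(n_clusters=k, random_state=None)` and `sklearn.metrics.calinski_harabasz_score(X, labels)`.

```python
import math
import random
from collections import Counter


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def _init_plus_plus(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability proportional to D^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(_sqdist(p, c) for c in centers) for p in points]
        pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
        centers.append(list(pick))
    return centers


def kmeans(points, k, seed=None, n_init=3, max_iter=100):
    """Lloyd's k-means, best of n_init runs by inertia; seed=None mirrors an unset random_state."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centers = _init_plus_plus(points, k, rng)
        for _ in range(max_iter):
            # Assignment step: nearest center for every point.
            labels = [min(range(k), key=lambda j: _sqdist(p, centers[j])) for p in points]
            # Update step: move each center to the mean of its members (keep it if empty).
            new = [_mean([p for p, lab in zip(points, labels) if lab == j]) if j in labels
                   else centers[j] for j in range(k)]
            if new == centers:
                break
            centers = new
        inertia = sum(_sqdist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels)
    return best[1]


def calinski_harabasz(points, labels):
    """CH index = [tr(B) / (k - 1)] / [tr(W) / (n - k)]; larger is better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    if not 2 <= k <= n - 1:
        raise ValueError("the CH index needs 2 <= number of clusters <= n - 1")
    overall = _mean(points)
    centroids = {lab: _mean(m) for lab, m in clusters.items()}
    between = sum(len(m) * _sqdist(centroids[lab], overall) for lab, m in clusters.items())
    within = sum(_sqdist(p, centroids[lab]) for lab, m in clusters.items() for p in m)
    return math.inf if within == 0 else (between / (k - 1)) / (within / (n - k))


def choose_k(points, k_values=range(2, 11)):
    """Return the K with the largest CH score and all scores (K = 1 is skipped: CH is undefined)."""
    scores = {k: calinski_harabasz(points, kmeans(points, k, seed=0)) for k in k_values}
    return max(scores, key=scores.get), scores


def stability_check(points, k, runs=5):
    """Re-run k-means with no fixed seed and compare the sorted share of samples per cluster.

    Cluster numbering can change between runs, so shares are sorted before comparing.
    """
    profiles = []
    for _ in range(runs):
        counts = Counter(kmeans(points, k))
        profiles.append(sorted(counts.get(j, 0) / len(points) for j in range(k)))
    spread = max(max(p[i] for p in profiles) - min(p[i] for p in profiles) for i in range(k))
    return profiles, spread
```

Comparing sorted cluster shares rather than cluster ids mirrors the observation in the text that the numbering is random while the shares stay fixed; a small `spread` is the "no large fluctuation" condition.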
Establishing a fusion clustering model:
all indexes of the enterprise with five primary dimensions, which are covered by the enterprise background, the enterprise stability, the enterprise operation capability, the enterprise development capability and the enterprise technological innovation capability, are subjected to feature pretreatment, feature screening, deletion and same value removal exceeding a threshold value, and useless indexes are removed, and then the total number of the indexes is 25, wherein the 25 indexes are used as comprehensive evaluation indexes of the enterprise, a kmeans cluster analysis model is established based on the comprehensive evaluation indexes of the enterprise, an optimal K value is determined through CH measurement, and the clustering effect is evaluated through cluster effect stability analysis and cluster result cluster visualization analysis.
The screened indexes of each of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) undergo feature preprocessing and feature quantification to generate per-dimension training vectors. A separate k-means cluster analysis model is built on the evaluation indexes of each dimension. Its optimal K is determined by the CH measure, and its clustering effect is evaluated through stability analysis and cluster visualization, yielding five dimension cluster analysis models in total. The enterprise comprehensive evaluation model and the five dimension cluster analysis models are then fused to form the enterprise portrait fusion cluster analysis model.
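The text does not say how the six models are "fused" in software. One plausible reading is a container that holds the comprehensive model and the five dimension models side by side, and a minimal sketch of that reading follows. The class name `FusionClusterModel`, its column-index arguments, and the use of scikit-learn's `KMeans` are all assumptions. A fixed `random_state` is set here only to make fitting reproducible. The stability check described earlier deliberately leaves it unset.

```python
import numpy as np
from sklearn.cluster import KMeans


class FusionClusterModel:
    """Hypothetical container for the fusion model: one k-means model on the
    comprehensive evaluation indexes plus one k-means model per dimension."""

    def __init__(self, comprehensive_cols, dimension_cols, comprehensive_k, dimension_k, random_state=0):
        self.columns = {"comprehensive": list(comprehensive_cols)}
        self.columns.update({name: list(cols) for name, cols in dimension_cols.items()})
        self.models = {"comprehensive": KMeans(n_clusters=comprehensive_k, n_init=10, random_state=random_state)}
        for name in dimension_cols:
            # Each dimension keeps its own K, chosen separately with the CH measure.
            self.models[name] = KMeans(n_clusters=dimension_k[name], n_init=10, random_state=random_state)

    def fit(self, X):
        """Fit every sub-model on its own column subset of the standardized matrix X."""
        for name, model in self.models.items():
            model.fit(X[:, self.columns[name]])
        return self

    def predict(self, X):
        """Return {model name: cluster IDs}, one raw cluster assignment per sub-model."""
        return {name: model.predict(X[:, self.columns[name]]) for name, model in self.models.items()}
```

Keeping the sub-models separate lets each dimension use its own K and its own index subset, as the text requires.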
Generating an enterprise portrait tag:
Enterprise portrait tags consist of the enterprise's own data tags and clustering model analysis tags. The enterprise's original data fields stored in the standard database undergo data preprocessing and data quantization to form a standardized tag format. These serve as the owned tags of the enterprise portrait to be generated.
For the comprehensive clustering model, the corresponding indexes of the enterprise are acquired, preprocessed, and feature-quantized to generate a training vector. The comprehensive cluster analysis model is then called to assign the enterprise to a classification cluster, and the comprehensive clustering model tags are formed from that cluster. These tags include information such as the enterprise's credit condition (high-quality, good, or general), its comprehensive condition, and the proportion of enterprises in that comprehensive condition.
For the dimension clustering models, the enterprise's indexes for the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) are acquired, preprocessed, and feature-quantized to generate dimension training vectors. Each dimension cluster analysis model is called to assign the enterprise to a dimension cluster, and the dimension clustering model tags are formed from those clusters. They mainly include tags such as "enterprise of a certain scale", "enterprise in its start-up period", "relatively stable enterprise", and "strong technological innovation capability".
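K-means itself only returns integer cluster IDs, so turning a cluster into a tag such as "high-quality" or "general" requires a mapping that analysts define after inspecting each cluster. This sketch assumes such a mapping is supplied. It also reads the "proportion of enterprises in that comprehensive condition" as the share of all enterprises in the same cluster, which is an interpretation rather than something the text specifies. The function name `cluster_ids_to_tags` is hypothetical.

```python
from collections import Counter


def cluster_ids_to_tags(cluster_ids, tag_names):
    """Turn raw k-means cluster IDs into portrait tags.

    tag_names is a hypothetical analyst-supplied mapping {cluster ID: tag text};
    cluster_share is the fraction of all enterprises that fall in the same cluster.
    """
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [{"tag": tag_names[c], "cluster_share": counts[c] / total} for c in cluster_ids]
```

For example, `cluster_ids_to_tags([0, 0, 1, 2], {0: "high-quality", 1: "good", 2: "general"})` tags the first enterprise as "high-quality" with a cluster share of 0.5.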
An automatic enterprise portrait tag generation module is built on the owned tags, the comprehensive clustering model tags, and the dimension clustering model tags. Given the input enterprise information, the module acquires the enterprise's owned data, comprehensive evaluation indexes, and dimension evaluation indexes. It then calls the fusion clustering model to generate each model tag and automatically generates the enterprise portrait.
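The generation module's assembly step might look like the following minimal sketch. `generate_portrait` is a hypothetical name. The `predict_model_tags` callable stands in for the fusion clustering model plus its cluster-to-tag mapping, which the text does not specify at code level. Converting owned fields to trimmed strings is an assumed stand-in for the text's "standardized tag format".

```python
from typing import Callable, Dict, List


def generate_portrait(
    enterprise: Dict[str, object],
    owned_fields: List[str],
    predict_model_tags: Callable[[Dict[str, object]], Dict[str, str]],
) -> Dict[str, object]:
    """Assemble one enterprise portrait from owned tags and model tags.

    predict_model_tags is a hypothetical stand-in for the fusion clustering
    model plus its cluster-to-tag mapping.
    """
    # Owned tags: raw fields from the standard database, normalized to
    # trimmed strings; empty fields produce no tag.
    owned_tags = {
        name: str(enterprise[name]).strip()
        for name in owned_fields
        if enterprise.get(name) not in (None, "")
    }
    return {
        "enterprise_id": enterprise["enterprise_id"],
        "owned_tags": owned_tags,
        "model_tags": predict_model_tags(enterprise),
    }
```

Passing the model in as a callable keeps the assembly step independent of how the fusion model is stored or served.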
The above specific embodiments merely illustrate particular cases of the present invention. The scope of the present invention includes, but is not limited to, these embodiments. Any suitable modification or replacement made by a person of ordinary skill in the art that is consistent with the claims of the small and micro enterprise portrait construction method of the present invention shall fall within the scope of protection of the present invention.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (1)

1. A method for constructing a portrait of small and micro enterprises, characterized by comprising the following steps:
S1. Establishing a standard database through data convergence and fusion;
Multi-source data from multiple government departments and third parties is fused and converged using big data ETL technology. After noise removal, data alignment, and redundancy removal, the data is stored in the standard database;
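The claim names the ETL operations but not their implementation. This minimal pandas sketch shows one way to realize them. It assumes each source is a DataFrame keyed by a hypothetical `credit_code` column (for example, a unified social credit code). "Alignment" is read as normalizing that key, "noise removal" as dropping rows with no usable key, and "redundancy removal" as de-duplication. All of these readings are assumptions.

```python
import pandas as pd


def fuse_sources(sources, key="credit_code"):
    """Hypothetical ETL sketch: align, clean and merge per-source DataFrames
    into one row per enterprise."""
    cleaned = []
    for df in sources:
        # Noise removal: rows without an enterprise key cannot be aligned.
        df = df.dropna(subset=[key]).copy()
        # Data alignment: normalize the join key across departments.
        df[key] = df[key].astype(str).str.strip().str.upper()
        df = df[df[key] != ""]
        # Redundancy removal within a source: drop exact duplicate rows.
        cleaned.append(df.drop_duplicates())
    fused = cleaned[0]
    for df in cleaned[1:]:
        # Outer join keeps enterprises that appear in only some sources;
        # overlapping non-key columns would receive pandas' _x/_y suffixes.
        fused = fused.merge(df, on=key, how="outer")
    # Redundancy removal across sources: keep one row per enterprise.
    return fused.drop_duplicates(subset=key).reset_index(drop=True)
```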
S2. Establishing an enterprise portrait tag system;
The enterprise portrait tags comprise enterprise owned data tags and enterprise model tags. The owned tags are derived from the owned data in the standard database. The model tags are mainly generated by cluster analysis. Enterprise comprehensive evaluation indexes are generated from the enterprise's multi-source data, and the comprehensive cluster analysis model is called to generate a comprehensive model prediction tag. Indexes for the five dimensions of enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and technological innovation capability are likewise generated from the multi-source data, and the dimension cluster analysis models are called to generate dimension model prediction tags;
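The tag system can be pictured as a small record type with one slot per tag source. In the following sketch, the class name `EnterprisePortrait` and the short dimension identifiers are hypothetical labels for the five dimensions named in the claim. Rejecting unknown dimension names is an added safeguard that the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical identifiers for the five dimensions named in the claim.
DIMENSIONS = ("background", "stability", "operation", "development", "innovation")


@dataclass
class EnterprisePortrait:
    enterprise_id: str
    owned_tags: Dict[str, str] = field(default_factory=dict)      # from the standard database
    comprehensive_tag: Optional[str] = None                        # comprehensive model prediction
    dimension_tags: Dict[str, str] = field(default_factory=dict)  # one per dimension model

    def __post_init__(self):
        unknown = set(self.dimension_tags) - set(DIMENSIONS)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    def model_tags(self) -> List[str]:
        """Comprehensive tag first, then dimension tags in a fixed dimension order."""
        tags = [self.comprehensive_tag] if self.comprehensive_tag else []
        tags += [self.dimension_tags[d] for d in DIMENSIONS if d in self.dimension_tags]
        return tags
```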
S3. Establishing an enterprise comprehensive evaluation and dimension evaluation index system;
Enterprise indexes are extracted from the enterprise standard data tables in the enterprise standard database. The enterprise comprehensive evaluation indexes cover five primary dimensions: enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability;
S4. Forming the clustering model input indexes through feature engineering;
Based on the indexes formed in step S3, the enterprise multi-source data undergoes exploratory data analysis and data cleaning, which yields the model input features required to train the fusion clustering model;
The exploratory data analysis computes simple descriptive statistics for the generated indexes and performs basic statistical analysis of the data. Specific index data are then segmented to analyze in depth how the data change dynamically and how they behave under specific conditions. Finally, the model input sample indexes are analyzed visually by plotting a histogram curve for each single variable and a curve relating each single variable to the target variable;
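The descriptive and segmented parts of this analysis map directly onto pandas. This minimal sketch assumes the indexes sit in a DataFrame with an optional categorical column to segment by. The function name `describe_indexes` is hypothetical. It computes histogram counts with NumPy instead of drawing the curves, which could be plotted with any charting library.

```python
import numpy as np
import pandas as pd


def describe_indexes(df, segment_col=None, bins=10):
    """Hypothetical EDA helper: descriptive statistics, per-segment means and
    per-index histogram counts for the numeric columns of df."""
    numeric = df.select_dtypes("number")
    # Descriptive statistics (count, mean, std, min, quartiles, max) per index.
    summary = numeric.describe().T
    summary["missing_rate"] = numeric.isna().mean()
    # Segmented view: how each index behaves under a specific condition.
    segments = numeric.groupby(df[segment_col]).mean() if segment_col else None
    # Histogram counts and bin edges for each single variable.
    histograms = {col: np.histogram(numeric[col].dropna(), bins=bins) for col in numeric.columns}
    return summary, segments, histograms
```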
The data cleaning first handles invalid values in the indexes and numerically quantifies the quantifiable indexes. Missing-value statistics are then computed for the modeling indexes, and training indexes with more than 80% missing values are removed. For the remaining indexes, the same-value rate is computed. Features whose attribute takes only one value are removed, as are indexes whose same-value rate exceeds 85%. VIF collinearity analysis is performed on the evaluation indexes that survive the missing-value and same-value filtering, and correlated features are removed, leaving a set of modeling indexes. Missing values in these model input indexes are filled with 0 by default, and the filled training samples are Z-Score standardized to form standardized training vectors;
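The cleaning sequence can be sketched in pandas and NumPy. The 80% missing-rate and 85% same-value thresholds, the 0-fill, and the Z-Score step come from the text. Everything else is an assumption:

- **VIF cutoff.** The value of 10 is a common rule of thumb, not a figure from the text.
- **Dropping order.** Indexes are dropped one at a time, starting with the highest VIF.
- **VIF input.** VIF is computed on a 0-filled copy because it cannot handle missing values.
- **Skipped steps.** Invalid-value handling and quantification are assumed to have been done already.

The function name `clean_indexes` is hypothetical.

```python
import numpy as np
import pandas as pd


def _vif(X):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the others plus an intercept."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - (resid @ resid) / ss_tot if ss_tot > 0 else 1.0
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def clean_indexes(df, missing_max=0.8, same_value_max=0.85, vif_max=10.0):
    # 1. Drop indexes whose missing rate exceeds missing_max (80% in the text).
    df = df.loc[:, df.isna().mean() <= missing_max]
    # 2. Drop single-valued indexes and those whose most frequent value
    #    covers more than same_value_max of the rows (85% in the text).
    keep = []
    for col in df.columns:
        top_share = df[col].value_counts(dropna=False, normalize=True).iloc[0]
        if df[col].nunique(dropna=False) > 1 and top_share <= same_value_max:
            keep.append(col)
    df = df[keep]
    # 3. Repeatedly drop the index with the largest VIF until all are <= vif_max
    #    (vif_max = 10 is an assumed threshold).
    while df.shape[1] > 1:
        vifs = _vif(df.fillna(0).to_numpy(dtype=float))
        worst = int(np.argmax(vifs))
        if vifs[worst] <= vif_max:
            break
        df = df.drop(columns=df.columns[worst])
    # 4. Fill missing values with 0, then Z-Score standardize each index.
    filled = df.fillna(0)
    std = filled.std(ddof=0).replace(0, 1.0)  # guard against constant columns
    return (filled - filled.mean()) / std
```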
S5. Establishing a fusion cluster analysis model;
Cluster modeling is performed on the enterprise comprehensive evaluation indexes using k-means cluster analysis. K is determined by the Calinski-Harabasz measure, and the clustering effect is evaluated by stability analysis and clustering effect analysis, establishing the comprehensive cluster analysis model;
Based on the dimension indexes of the five enterprise dimensions, enterprise cluster analysis is performed for each dimension using k-means clustering, forming a cluster analysis model for each dimension;
When K is determined by the Calinski-Harabasz measure, a larger CH score indicates a better clustering effect. K-means cluster analysis is performed for K values in the interval 1-10, and a plot of the cluster analysis results is drawn. The CH index is calculated for each K in turn, and the K with the best clustering effect is selected by combining the visualized cluster analysis results with the CH values;
After the k-means algorithm is selected as giving the best clustering effect, the clustering parameter random_state is left unset during modeling. The optimal K is determined from the CH value, and k-means clustering is run 3-10 times in succession to check whether the distribution of samples across clusters fluctuates significantly between runs. Over the 3-10 runs, the numbering order of the clusters is random, but the proportion of samples in each cluster remains relatively fixed. This shows that k-means clustering is suitable for cluster analysis of the current data set, given the existing features, the training samples, and the chosen K;
All indexes under the five primary dimensions undergo feature preprocessing and feature screening. Indexes whose missing rate or same-value rate exceeds the threshold are removed, along with other useless indexes, leaving a set of model input indexes that serve as the enterprise comprehensive evaluation indexes. A k-means cluster analysis model is built on these indexes, the optimal K is determined by the CH measure, and the clustering effect is evaluated through clustering stability analysis and cluster visualization;
The screened indexes of the five dimensions (enterprise background, enterprise stability, enterprise operation capability, enterprise development capability, and enterprise technological innovation capability) undergo feature preprocessing and feature quantification to generate dimension training vectors. A separate k-means cluster analysis model is built on the evaluation indexes of each dimension. Its optimal K is determined by the CH measure, and its clustering effect is evaluated through clustering stability analysis and cluster visualization, yielding five dimension cluster analysis models in total. The enterprise comprehensive evaluation model and the five dimension cluster analysis models are then fused to form the enterprise portrait fusion cluster analysis model;
The enterprise dimension indexes of the five dimensions are acquired, preprocessed, and feature-quantized to generate dimension training vectors. Each dimension cluster analysis model is called to assign the enterprise to dimension classification clusters, and dimension clustering model tags are formed from those clusters. An automatic enterprise portrait tag generation module is established based on the owned tags, the comprehensive clustering model tags, and the dimension clustering model tags. Given the input enterprise information, the module acquires the enterprise's owned data, comprehensive evaluation indexes, and dimension evaluation indexes. It then calls the fusion clustering model to generate each model tag and automatically generates the enterprise portrait.
CN202110979314.0A 2021-08-25 2021-08-25 Image construction method for small and micro enterprises Active CN113837859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110979314.0A CN113837859B (en) 2021-08-25 2021-08-25 Image construction method for small and micro enterprises

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110979314.0A CN113837859B (en) 2021-08-25 2021-08-25 Image construction method for small and micro enterprises

Publications (2)

Publication Number Publication Date
CN113837859A CN113837859A (en) 2021-12-24
CN113837859B CN113837859B (en) 2024-05-14

Family

ID=78961216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110979314.0A Active CN113837859B (en) 2021-08-25 2021-08-25 Image construction method for small and micro enterprises

Country Status (1)

Country Link
CN (1) CN113837859B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113988726A (en) * 2021-12-28 2022-01-28 Jiangsu Rongze Information Technology Co., Ltd. Enterprise industry credit evaluation management system based on block chain
CN114462516B (en) * 2022-01-21 2024-04-16 Tianyuan Big Data Credit Management Co., Ltd. Enterprise credit scoring sample labeling method and device
CN116304974B (en) * 2023-02-17 2023-09-29 State Grid Zhejiang Electric Power Co., Ltd. Marketing Service Center Multi-channel data fusion method and system

Citations (14)

Publication number Priority date Publication date Assignee Title
CN103294828A (en) * 2013-06-25 2013-09-11 Xiamen Meiya Pico Information Co., Ltd. Verification method and verification device of data mining model dimension
CN107563929A (en) * 2017-07-27 2018-01-09 Hangzhou Zhongao Technology Co., Ltd. A kind of various dimensions siren based on personage's specificity analysis
CN107993143A (en) * 2017-11-23 2018-05-04 Shenzhen Daguanjia Software and Technology Service Co., Ltd. A kind of Credit Risk Assessment method and system
CN110322089A (en) * 2018-03-30 2019-10-11 Zonglue Investment (Shanghai) Co., Ltd. Enterprise Credit Risk Evaluation method and its system
CN110990474A (en) * 2019-11-28 2020-04-10 Taihua Wisdom Industry Group Co., Ltd. Regional industry image analysis method and device
CN111047122A (en) * 2018-10-11 2020-04-21 Beijing Gridsum Technology Co., Ltd. Enterprise data maturity evaluation method and device and computer equipment
CN111680073A (en) * 2020-06-11 2020-09-18 Tianyuan Big Data Credit Management Co., Ltd. Financial service platform policy information recommendation method based on user data
CN111737477A (en) * 2020-08-07 2020-10-02 Hangzhou Liulengjing Intellectual Property Technology Co., Ltd. Intellectual property big data-based intelligence investigation method, system and storage medium
CN111754116A (en) * 2020-06-24 2020-10-09 State Grid Corporation of China Big Data Center Credit assessment method and device based on label portrait technology
CN111861262A (en) * 2020-07-30 2020-10-30 State Grid Shandong Electric Power Company Shouguang Power Supply Company Enterprise perspective portrait method and terminal based on energy big data
CN112396430A (en) * 2020-11-09 2021-02-23 China Southern Power Grid Co., Ltd. Processing method and system for enterprise evaluation
CN112395500A (en) * 2020-11-17 2021-02-23 Ping An Technology (Shenzhen) Co., Ltd. Content data recommendation method and device, computer equipment and storage medium
CN112435152A (en) * 2020-12-04 2021-03-02 Beijing Normal University Online learning investment dynamic evaluation method and system
CN112668945A (en) * 2021-01-27 2021-04-16 Tianyuan Big Data Credit Management Co., Ltd. Enterprise credit risk assessment method and device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20100205108A1 (en) * 2009-02-11 2010-08-12 Mun Johnathan C Credit and market risk evaluation method

Non-Patent Citations (3)

Title
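Enterprise modelling techniques to help manufacturing firms develop product service activities; T. Alix et al.; IFAC Proceedings Volumes; Vol. 42 (No. 4); 1637-1642 *
A grey clustering evaluation model of enterprise informatization and its application; Lin Wei et al.; Science & Technology Progress and Policy; Vol. 20 (No. 06); 129-130 *
Construction of enterprise competitor portraits integrating multi-source data; Huang Xiaobin et al.; Journal of Modern Information; Vol. 40 (No. 11); 13-21, 33 *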

Also Published As

Publication number Publication date
CN113837859A (en) 2021-12-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant