CN114511250A - Enterprise external migration risk early warning method and system based on machine learning



Publication number
CN114511250A
Authority
CN
China
Prior art keywords
enterprise
migration
model
data
database
Legal status
Pending
Application number
CN202210258025.6A
Other languages
Chinese (zh)
Inventor
王慧
韩丽俊
何正兴
杨梦茜
宋娟娟
曹金虎
宋红见
Current Assignee
Suzhou Industrial Park Surveying Mapping And Geoinformation Co ltd
Original Assignee
Suzhou Industrial Park Surveying Mapping And Geoinformation Co ltd
Application filed by Suzhou Industrial Park Surveying Mapping And Geoinformation Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375 Prediction of business process outcome or impact based on a proposed change
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P80/00 Climate change mitigation technologies for sector-wide applications
    • Y02P80/10 Efficient use of energy, e.g. using compressed air or pressurized fluid as energy carrier

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Operations Research (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Educational Administration (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Optimization (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Mathematical Analysis (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Algebra (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a machine-learning-based early warning method for enterprise external migration risk, comprising the following steps: constructing feature variables to form an enterprise external migration prediction feature database; with the data in that database forming a balanced data set, constructing a feature selection model and building a model database; building and validating an enterprise external migration prediction model from the data in the model database; and inputting a new sample into the constructed prediction model to obtain the external migration probability of the corresponding enterprise. A machine-learning-based early warning system implementing the method is also disclosed. The method keeps data acquisition feasible and convenient, is broadly applicable and general-purpose, outputs the external migration probability of an enterprise directly, and achieves high accuracy.

Description

Enterprise external migration risk early warning method and system based on machine learning
Technical Field
The invention belongs to the field of economic situation prediction and early warning, and particularly relates to an enterprise external migration risk early warning method and system based on machine learning.
Background
As market entities, enterprises are affected by factors such as rising factor costs, capacity adjustments, changes in market demand, and policy incentives offered by other regions. An enterprise may relocate part of its operations through outbound investment, or move out of its original place of registration entirely by changing its registration authority. In essence, enterprise migration is a process of re-selecting an enterprise's location. The migration of important enterprises, such as industrial enterprises above designated size, unicorn enterprises, gazelle enterprises, and specialized, refined, distinctive and innovative small and medium-sized enterprises, directly erodes the tax base of the region they leave, significantly reduces the tax revenue collected there, and hinders stable local socioeconomic development. It is therefore necessary to identify enterprises with a tendency to migrate by technical means and to issue timely early warnings of likely migrations.
Chinese invention patent CN109377058A discloses an enterprise external migration risk assessment method based on a logistic regression model. The method collects records of enterprise external migration cases together with desensitized telecommunications data provided by a carrier; removes indexes with low predictive power and highly correlated indexes through data binning and Pearson correlation coefficients; builds an enterprise external migration prediction model with logistic regression; and finally outputs the migration probability of each enterprise. The method fills a gap in modelling methods for quantitatively predicting enterprise migration behavior and can output the migration probability of an enterprise. However, it relies on few data sources, and telecommunications data cannot represent all enterprises. Pearson correlation, being a linear analysis method, cannot effectively screen out variables with nonlinear effects. Prediction based on the regression algorithm is cumbersome to operate and has low accuracy (68%). Overall, the method is weak in scenario applicability, model generality and practical accuracy.
Disclosure of Invention
To overcome these shortcomings of the prior art, embodiments of the invention provide a machine-learning-based enterprise external migration risk early warning method and system.
To achieve this purpose, the invention adopts the following technical scheme: a machine-learning-based enterprise external migration risk early warning method, comprising the following steps:
constructing feature variables to form an enterprise external migration prediction feature database;
with the data in the enterprise external migration prediction feature database forming a balanced data set, constructing a feature selection model and building a model database;
building and validating an enterprise external migration prediction model from the data in the model database;
inputting a new sample into the constructed enterprise external migration prediction model to obtain the enterprise external migration probability corresponding to the new sample.
In the above technical solution, the method further includes:
classifying enterprises and issuing graded early warnings by a comprehensive scenario analysis method, based on the obtained enterprise external migration probability combined with preset evaluation items.
In the above technical solution, "constructing feature variables to form an enterprise external migration prediction feature database" comprises:
compiling lists of migrated and non-migrated enterprises, and obtaining multi-dimensional data samples for the enterprises in the lists;
constructing feature variables from the data samples;
calculating, for a preset time period, the values of the feature variables for all enterprises in the lists to form the enterprise external migration prediction feature database.
Further, "calculating the values of the feature variables for all enterprises in the lists to form the enterprise external migration prediction feature database" comprises:
calculating, for each enterprise in the lists, the month-on-month change in electricity consumption, the month-on-month change in water consumption, the month-on-month change in the number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions, the amount invested in other regions, the number of people recruited in other regions, the number of news reports on investment-promotion talks with other regions, and the number of reports of visits by officials from other regions;
converting the binary feature variables, namely whether the enterprise has migrated out, whether it has purchased land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, from character type to numeric type, with "yes" converted to 1 and "no" converted to 0.
In the above technical solution, the "state that the sample set in the enterprise external migration prediction feature database is a balanced data set" includes:
judging whether the ex-business and the non-ex-business in the enterprise ex-business forecast feature database are balanced or not;
and under-sampling the data in the enterprise ex-migration prediction feature database when the ex-migration enterprise and the non-ex-migration enterprise are in an unbalanced state, so that the ratio of the ex-migration enterprise to the non-ex-migration enterprise is 1:1, and forming the balanced data set.
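A minimal sketch of the balancing step, assuming each enterprise is a Python dict whose hypothetical `migrated` field already holds 1 or 0. It applies plain random undersampling of the majority class with the standard library's `random.sample`; the document does not name a specific undersampling variant, so this is an illustration rather than the patented procedure.

```python
import random


def undersample_to_balance(records, label_key="migrated", seed=0):
    """Randomly drop majority-class records until both classes are the same size.

    `records` is a list of dicts; `label_key` is a hypothetical field name that
    holds 1 for a migrated enterprise and 0 for a non-migrated one.
    """
    positives = [r for r in records if r[label_key] == 1]
    negatives = [r for r in records if r[label_key] == 0]
    if len(positives) == len(negatives):
        return list(records)  # already balanced (1:1), nothing to drop

    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, len(minority))  # random subset, no repeats
    balanced = minority + kept_majority
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [{"id": i, "migrated": 1 if i < 3 else 0} for i in range(10)]
    balanced = undersample_to_balance(data)
    print(sum(r["migrated"] for r in balanced), len(balanced))  # 3 6
```

Fixing `seed` makes the drawn subset reproducible, which helps when the balanced set has to be rebuilt for later steps.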
In the above technical solution, "constructing a feature selection model and building a model database" comprises:
building a feature selection model on the balanced data set with the random forest algorithm;
calculating the relative importance value of each feature variable with the feature selection model;
removing the data of any feature variable whose relative importance value is smaller than a preset importance value, thereby forming the model database.
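A sketch of this feature selection step with scikit-learn, assuming the balanced data set is a NumPy feature matrix `X` with a 0/1 label vector `y`. Note the substitution: `RandomForestClassifier.feature_importances_` is scikit-learn's impurity-based (mean decrease in impurity) importance normalised to sum to 1, which is analogous to, but not the same computation as, formulas (1) and (2) given later in this description. The default threshold of 0.05 is the value used in the worked example; the number of trees is an arbitrary assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_features(X, y, feature_names, min_importance=0.05, seed=0):
    """Fit a random forest and drop features whose relative importance is too low.

    Returns the reduced matrix, the names of the retained features and the full
    importance vector (scikit-learn normalises it to sum to 1).
    """
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    keep = importances >= min_importance  # remove features below the preset value
    kept_names = [name for name, k in zip(feature_names, keep) if k]
    return X[:, keep], kept_names, importances
```

On synthetic data where only one column determines the label, that column is retained while most pure-noise columns typically fall near or below the threshold.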
In the above technical solution, "building and validating the enterprise external migration prediction model from the data in the model database" comprises:
dividing the data in the model database into a training set and a test set according to a preset ratio;
building the enterprise external migration prediction model with a machine learning classification algorithm and setting the model parameters;
using the training set as training data for the prediction model and optimizing the model parameters;
validating the trained prediction model on the test set until a preset accuracy is reached.
Further, the data in the model database are divided into a training set and a test set at a ratio of 7:3.
Further, validating the trained enterprise external migration prediction model on the test set uses two metrics: model accuracy and model recall.
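One way to realise this train-and-validate loop, sketched with scikit-learn under stated assumptions: a random forest stands in for "a machine learning classification algorithm" (the worked example reports a random forest), the parameter candidates in `PARAM_CANDIDATES` are hypothetical, stratified splitting is an added choice, and the 0.75 target is the preset accuracy mentioned in step S35. Class 1 denotes a migrated enterprise, so column 1 of `predict_proba` is the external migration probability.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical parameter candidates tried in turn until the target is met.
PARAM_CANDIDATES = (
    {"n_estimators": 100, "max_depth": None},
    {"n_estimators": 300, "max_depth": 8},
)


def train_and_validate(X, y, target=0.75, seed=0):
    """Split 7:3, fit candidates, and return the first model meeting the target."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    acc = rec = 0.0
    for params in PARAM_CANDIDATES:
        model = RandomForestClassifier(random_state=seed, **params)
        model.fit(X_train, y_train)
        predicted = model.predict(X_test)
        acc = accuracy_score(y_test, predicted)
        rec = recall_score(y_test, predicted)  # recall of class 1 (migrated)
        if acc >= target and rec >= target:
            return model, acc, rec
    return None, acc, rec  # no candidate reached the preset accuracy


def migration_probability(model, X_new):
    """Probability of class 1 (external migration) for each new sample."""
    return model.predict_proba(np.atleast_2d(X_new))[:, 1]
```

Returning `None` when no candidate passes leaves the decision to widen the parameter search to the caller, mirroring the "return to optimize the parameters" loop.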
The early warning system of the invention comprises at least one processor and a memory storing instructions which, when executed by the at least one processor, implement the above early warning method.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
1. The enterprise external migration prediction feature database draws on energy data and internet big data and constructs feature variables from multi-dimensional data samples. This widens the data sources, avoids dependence on data from any specific field, keeps data acquisition feasible and convenient, and suits a wide range of industrial and commercial enterprises, giving strong scenario applicability.
2. The invention relies on statistical methods for data analysis throughout. By building a random-forest-based feature selection model to measure the importance of feature variables, it can effectively screen out nonlinear influencing factors, overcoming the limitation of traditional methods that can only measure linear relationships; it is therefore less restricted in application and more general than traditional methods.
3. The enterprise external migration prediction model is built from the data in the model database with a machine learning classification algorithm. The construction steps are simple, the model outputs the enterprise external migration probability directly, and its accuracy is high.
4. Based on the obtained enterprise external migration probability, the severity of a migration's impact on local economic development is evaluated by a comprehensive scenario analysis method combined with preset evaluation items. The method is simple and convenient, the data are intuitive, and automatic graded early warning of enterprise migration is achieved, which facilitates monitoring.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for early warning of enterprise migration risk based on machine learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of constructing a feature selection model in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1: Referring to FIGS. 1 and 2, a machine-learning-based enterprise external migration risk early warning method comprises the following steps:
Step S01: constructing feature variables to form an enterprise external migration prediction feature database;
specifically, this comprises the following steps:
S11: compiling lists of migrated and non-migrated enterprises; the lists can be obtained by collecting the list of migrated enterprises and the local enterprise directory published by the market supervision and administration bureau;
S12: acquiring multi-dimensional data samples for the enterprises in the lists; the data samples cover at least the energy use, employment, land use, outbound investment, policy, intellectual property and industrial chain categories. Because the data span many dimensions, the data source does not depend on any specific field, which keeps data acquisition feasible and convenient;
S13: constructing feature variables from the data samples;
S14: calculating, for the preset time period, the month-on-month change in electricity consumption, the month-on-month change in water consumption, the month-on-month change in the number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions, the amount invested in other regions, the number of people recruited in other regions, the number of news reports on investment-promotion talks with other regions, and the number of reports of visits by officials from other regions, for each enterprise in the lists;
S15: converting the binary feature variables, namely whether the enterprise has migrated out, whether it has purchased land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, from character type to numeric type, with "yes" converted to 1 and "no" converted to 0.
These data form the enterprise external migration prediction feature database, which is used to build the feature selection model.
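A minimal sketch of steps S14 and S15 for one enterprise, assuming "month-on-month change" means the relative change between consecutive periods, (x_t - x_{t-1}) / x_{t-1}; the document does not spell out the exact formula. The field names in the example (`migrated`, `enjoys_policy`, `power_mom`) are hypothetical.

```python
import math


def month_on_month_change(values):
    """Relative change between consecutive periods; NaN when the base period is 0."""
    return [
        (cur - prev) / prev if prev != 0 else math.nan
        for prev, cur in zip(values, values[1:])
    ]


YES_NO = {"yes": 1, "no": 0}


def encode_binary(record, binary_fields):
    """Return a copy of `record` with yes/no text fields converted to 1/0."""
    encoded = dict(record)
    for field in binary_fields:
        encoded[field] = YES_NO[str(record[field]).strip().lower()]
    return encoded


if __name__ == "__main__":
    print(month_on_month_change([100, 110, 99]))  # [0.1, -0.1]
    row = {"migrated": "Yes", "enjoys_policy": "no", "power_mom": 0.1}
    print(encode_binary(row, ["migrated", "enjoys_policy"]))
```

An unexpected text value (anything other than yes/no) raises `KeyError`, which surfaces dirty source data instead of silently encoding it.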
Step S02: with the sample set in the enterprise external migration prediction feature database forming a balanced data set, constructing a feature selection model and building a model database;
specifically, this comprises the following steps:
S21: judging whether migrated and non-migrated enterprises in the enterprise external migration prediction feature database are balanced;
S22: if they are imbalanced, undersampling the data in the database so that the ratio of migrated to non-migrated enterprises is 1:1, forming a balanced data set; undersampling here means removing part of the non-migrated enterprises so that the number of non-migrated samples is close to the number of migrated samples;
S23: building a feature selection model with the random forest algorithm, taking whether each enterprise in the balanced data set has migrated out as the dependent variable and the remaining data as independent variables;
S24: calculating the relative importance value of each feature variable with the feature selection model,
using the following formulas:
Formula (1):
$$VI_A = \min_{s}\left[\sum_{x_i \in D_1(A,s)} (y_i - c_1)^2 + \sum_{x_i \in D_2(A,s)} (y_i - c_2)^2\right]$$
Formula (1) outputs the importance measure of each independent variable. In formula (1), VI_A is the minimum variance of feature variable A; D_1 and D_2 are the two data sets obtained by splitting on feature variable A at an arbitrary split point s; c_1 and c_2 are the sample means of D_1 and D_2, respectively; and y_i is the value of the dependent variable (whether the enterprise has migrated out) for sample x_i.
Formula (2):
$$VIM_i = \frac{VI_i}{\sum_{j=1}^{c} VI_j}$$
Formula (2) converts the minimum variance of each feature variable into a relative importance value. In formula (2), VIM_i and VI_i are the relative importance value and the minimum variance of index i, respectively, and c is the number of features.
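To make formulas (1) and (2) concrete, the sketch below evaluates them literally with NumPy for a 0/1 dependent variable: for each feature it scans every split point between distinct sorted values, keeps the smallest summed squared deviation from the two side means (the least-squares split criterion used by CART regression trees), and then normalises across the c features. It illustrates the formulas as written for a single split, not a full random forest, which would aggregate such quantities over many trees. Read literally, a feature whose best split separates the classes cleanly receives a smaller VI_A.

```python
import numpy as np


def min_split_variance(x, y):
    """Formula (1): smallest summed squared deviation over all split points s."""
    order = np.argsort(x, kind="stable")
    xs = np.asarray(x, dtype=float)[order]
    ys = np.asarray(y, dtype=float)[order]
    best = ((ys - ys.mean()) ** 2).sum()  # no-split baseline; a split never exceeds it
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no split point exists between equal feature values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        best = min(best, sse)
    return best


def relative_importance(X, y):
    """Formula (2): divide each feature's VI by the sum over all c features."""
    vi = np.array([min_split_variance(X[:, j], y) for j in range(X.shape[1])])
    return vi / vi.sum()


if __name__ == "__main__":
    y = np.array([0, 0, 1, 1])
    X = np.column_stack([[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]])
    print(min_split_variance(X[:, 0], y))  # 0.0
    print(relative_importance(X, y))  # [0. 1.]
```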
S25: removing the data of any feature variable whose relative importance value is smaller than a preset importance value, thereby forming the model database.
Step S03: building and validating the enterprise external migration prediction model from the data in the model database;
specifically, this comprises the following steps:
S31: dividing the data in the model database into a training set and a test set according to a preset ratio, which can be set to 7:3;
S32: using the training set as training data, taking whether the enterprise has migrated out as the dependent variable and the remaining data in the training set as independent variables, selecting one or more machine learning classification algorithms, building the enterprise external migration prediction model, and setting the model parameters;
S33: using the test set as input data and outputting prediction results with the trained enterprise external migration prediction model;
S34: comparing, on the test set, the predictions of the models built with the various machine learning classification algorithms against the enterprises that actually migrated, and calculating the accuracy and recall of each model;
Formula (3):
$$\text{Accuracy} = \frac{TP + TN}{N}$$
Formula (3) calculates the model accuracy, i.e. the proportion of correctly predicted samples among all samples. In formula (3), Accuracy is the model accuracy; N is the number of enterprises; TP is the number of migrated enterprises predicted correctly, and TN is the number of non-migrated enterprises predicted correctly.
Formula (4):
$$\text{Recall} = \frac{TP}{TP + FN}$$
Formula (4) calculates the model recall, i.e. the proportion of actually migrated samples that are predicted correctly. In formula (4), Recall is the model recall; TP is the number of migrated enterprises predicted correctly, and FN is the number of migrated enterprises wrongly predicted as non-migrated.
S35: judging whether the model accuracy and the model recall both reach a preset accuracy, which can be set to 75%; if both do, the enterprise external migration prediction model is complete; otherwise, returning to step S32 to optimize the model parameters.
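Formulas (3) and (4), together with the 75% acceptance check of step S35, reduce to a few counts over the test set. A plain-Python sketch, with 1 meaning a migrated enterprise; the function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN and FN for labels where 1 = migrated, 0 = not migrated."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fn


def accuracy(actual, predicted):
    tp, tn, _ = confusion_counts(actual, predicted)
    return (tp + tn) / len(actual)  # formula (3): (TP + TN) / N


def recall(actual, predicted):
    tp, _, fn = confusion_counts(actual, predicted)
    return tp / (tp + fn) if tp + fn else 0.0  # formula (4): TP / (TP + FN)


def meets_target(actual, predicted, target=0.75):
    """Step S35: both accuracy and recall must reach the preset value."""
    return accuracy(actual, predicted) >= target and recall(actual, predicted) >= target


if __name__ == "__main__":
    actual = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 1]
    print(accuracy(actual, predicted), recall(actual, predicted))  # 0.75 0.75
    print(meets_target(actual, predicted))  # True
```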
Step S04: inputting a new sample into the constructed enterprise external migration prediction model to obtain the enterprise external migration probability corresponding to the new sample.
In a preferred embodiment, the method further comprises:
Step S05: issuing graded early warnings for enterprises by a comprehensive scenario analysis method, based on the enterprise external migration probability combined with preset evaluation items.
The comprehensive scenario analysis method evaluates the impact of an enterprise's future migration by determining the evaluation items for migration early warning, taking the enterprise external migration probability as the weight, and applying weighted accumulation.
Formula (5):
$$\text{Total score} = p \times \sum_{i} \text{score}_i$$
Formula (5) calculates the total evaluation score for an enterprise's migration. In formula (5), Total score is the enterprise's total evaluation score, p is the enterprise external migration probability, and score_i is the enterprise's score on the i-th evaluation item.
The preset evaluation items include the enterprise type, the enterprise's tax payment scale, and so on. A total evaluation score for the enterprise's migration is obtained by weighting the preset evaluation items with the enterprise external migration probability; enterprises are then classified by this score into enterprises of key concern, enterprises of significant concern and enterprises of general concern for graded early warning. An enterprise's external migration directly erodes the tax base of its location, significantly reduces the tax revenue collected there, and hinders stable local socioeconomic development. Combining the migration probability with the evaluation items into a total score therefore allows the severity of a migration's impact on local economic development to be judged directly; the approach is simple, convenient and intuitive, and supports graded early warning and focused attention on particular enterprises.
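Formula (5) and the three-tier grading can be sketched as follows. The cut-off scores separating the three attention levels are not given in the document, so the thresholds below, like the example item scores, are hypothetical placeholders.

```python
def total_score(p, item_scores):
    """Formula (5): migration probability p times the sum of the item scores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p * sum(item_scores)


def warning_level(score, key_cutoff=80.0, significant_cutoff=50.0):
    """Map a total score to one of three attention levels (cut-offs hypothetical)."""
    if score >= key_cutoff:
        return "key concern"
    if score >= significant_cutoff:
        return "significant concern"
    return "general concern"


if __name__ == "__main__":
    # Hypothetical item scores: enterprise type = 60, tax payment scale = 50.
    for p in (0.8, 0.5, 0.3):
        score = total_score(p, [60, 50])
        print(p, round(score, 2), warning_level(score))
```

Because p scales the whole sum, an enterprise with a high item score but a low migration probability can still land in the general tier, which is the intended weighting.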
Taking data from January to August for a certain region as an example, external migration prediction is performed for the enterprises in the data as follows:
Step S01: constructing feature variables to form an enterprise external migration prediction feature database.
The list of migrated enterprises published by the market supervision and administration bureau is collected and merged with the local enterprise lists. Electricity and water consumption data for the corresponding enterprises are obtained; data on the enterprises' outbound investment, online recruitment in other regions, local policies, land transfers by tender, auction and listing, intellectual property and industry type are captured; and news reports about the enterprises' investment-promotion talks with other regions and about visits by officials from other regions are crawled, amounting to 12 thousand records in total.
These data are used to construct feature variables, i.e. indexes that describe the characteristics of the sample population. The feature variables include the month-on-month change in the enterprise's electricity consumption, the month-on-month change in water consumption, the month-on-month change in the number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions (by the enterprise and by its legal representative), the amount invested in other regions (by the enterprise and by its legal representative), the number of people recruited in other regions (by functional department), the number of news reports on investment-promotion talks with other regions, and the number of reports of visits by officials from other regions. The binary feature variables, namely whether the enterprise has migrated out, whether it has purchased land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, are converted from character type to numeric type, with "yes" converted to 1 and "no" converted to 0.
The specific feature variables are listed in Table 1.
Table 1 Feature variable index system
Step S02: with the sample set in the enterprise external migration prediction feature database forming a balanced data set, constructing a feature selection model and building a model database.
First, it is judged whether migrated and non-migrated enterprises in the database are balanced. If they are imbalanced, the data are undersampled so that the ratio of migrated to non-migrated enterprises is 1:1, forming a balanced data set. A feature selection model is then built with the random forest algorithm, taking whether each enterprise in the balanced data set has migrated out as the dependent variable and the remaining data as independent variables, and the model is used to select feature variables from the balanced data set: M feature variables that represent the overall characteristics of the sample are selected from the N existing feature variables. In this embodiment, there are 2151 migrated enterprises, far fewer than the non-migrated enterprises, so the non-migrated enterprises in the database are randomly undersampled to 2151 to form the balanced data set.
Formula (1): [equation image not reproduced]
Formula (1) gives the importance measure of each independent variable.
Formula (2): [equation image not reproduced]
Formula (2) converts the minimum variance of each feature variable into a relative importance value.
Feature variables whose computed relative importance is less than 0.05 are eliminated. The retained feature variables are: the month-over-month change in electricity consumption, the number of recruits in other regions (functional departments), whether premises are purchased or used in other regions, the frequency of investment in other regions by the enterprise's legal representative, and the number of reports of contact with investment promotion from other regions. The data consisting of these feature variables form the model database.
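Formulas (1) and (2) are not reproduced in this text, so the sketch below covers only the normalisation and 0.05 cut-off step, under the assumption that relative importance means each raw importance divided by the sum over all features. For reference, scikit-learn's `RandomForestClassifier.feature_importances_` (impurity-based) is already normalised to sum to 1, which may differ from the patent's own measure. The raw scores and feature names are invented for illustration.

```python
from typing import Dict, List


def relative_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Normalise raw importance measures so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importance measures must have a positive sum")
    return {name: value / total for name, value in raw.items()}


def select_features(raw: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Keep features whose relative importance is at least `threshold`,
    ordered from most to least important."""
    rel = relative_importance(raw)
    ranked = sorted(rel.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ranked if value >= threshold]


# Hypothetical raw importance scores, e.g. mean impurity decrease per feature.
raw_scores = {
    "electricity_mom_change": 5.0,
    "remote_recruits": 3.0,
    "premises_elsewhere": 1.6,
    "patent_mom_change": 0.4,
}
print(select_features(raw_scores))
```

Here `patent_mom_change` has a relative importance of 0.04 and is dropped, mirroring the "less than 0.05" rule.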
Step S03: Establish and verify an enterprise external migration prediction model from the data in the model database.
The data in the model database are divided into a training set and a test set in a 7:3 ratio, giving 3011 training samples and 1281 test samples. Using the 3011 training samples as model training data, with whether the enterprise migrates out as the dependent variable and the remaining training data as independent variables, an enterprise external migration prediction model is established based on a machine learning classification algorithm and its parameters are set. The 1281 test samples are then used as input, and the trained model outputs prediction results. The model's predictions on the test set are compared with the actual migrated-out enterprises, and the model accuracy (Accuracy) and recall (Recall) are output using Formulas (3) and (4).
Formula (3): [equation image not reproduced]
Formula (4): [equation image not reproduced]
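Formulas (3) and (4) are not reproduced here. Assuming they are the standard definitions, Accuracy = (TP + TN) / (TP + TN + FP + FN) and Recall = TP / (TP + FN) with "migrated out" as the positive class, the 7:3 split and both metrics can be sketched in plain Python as below. The function names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def split_7_3(rows: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split rows into 70% training and 30% test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of all predictions that are correct: (TP + TN) / N."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Share of actual positives the model finds: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined without positive samples")
    return tp / (tp + fn)


train, test = split_7_3(range(10))
print(len(train), len(test))  # → 7 3

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
print(accuracy(y_true, y_pred), recall(y_true, y_pred))
```

Recall is the natural companion to accuracy here: a missed migrated-out enterprise (false negative) is the costly error for an early warning system.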
On testing, the random forest prediction model achieves an accuracy of 82.1% and a recall of 78.2%, which meets the precision requirement, so the enterprise external migration prediction model is output.
Step S04: Input new samples into the constructed enterprise external migration prediction model to obtain the corresponding enterprise external migration probabilities.
The remaining 11 thousand samples are input, and the constructed enterprise external migration prediction model outputs the external migration probability of each enterprise.
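Scoring new samples amounts to asking the trained classifier for class probabilities. A minimal sketch, assuming a scikit-learn-style `predict_proba` interface trained on 0/1 labels; the enterprise IDs and the stub model are hypothetical and only stand in for the trained random forest so the example runs.

```python
from typing import Dict, Sequence


def migration_probabilities(model, X: Sequence, ids: Sequence[str]) -> Dict[str, float]:
    """Map each enterprise ID to its predicted probability of migrating out.

    Assumes a scikit-learn-style classifier trained on 0/1 labels, whose
    predict_proba() columns follow the sorted classes [0, 1], so column 1
    is the probability of label 1 (migrated out).
    """
    proba = model.predict_proba(X)
    return {eid: float(row[1]) for eid, row in zip(ids, proba)}


class _StubModel:
    """Stand-in for a trained model, used only to make the sketch runnable."""

    def predict_proba(self, X):
        return [[0.3, 0.7] if x[0] > 0 else [0.9, 0.1] for x in X]


probs = migration_probabilities(_StubModel(), [[1.0], [-1.0]], ["E001", "E002"])
print(probs)  # → {'E001': 0.7, 'E002': 0.1}
```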
Step S05: Analyze the characteristics of the 2151 migrated-out enterprise cases and, based on the obtained external migration probabilities combined with preset evaluation items, apply a comprehensive scenario analysis method to issue graded early warnings for enterprises.
Specifically, statistics on the 2151 migrated-out enterprise cases show that 41 of them paid more than 10 million yuan in tax, accounting for 69.87% of the total tax paid by all migrated-out enterprises; the enterprise's tax payment scale can therefore serve as an important early warning indicator. In addition, the external migration of key enterprises directly reduces the tax base of their locality, noticeably affects the tax revenue paid into the local treasury, and hinders stable local socioeconomic development. Therefore, on the basis of the enterprise external migration probability, and combined with tax payment scale and enterprise type, the total score Total_score_i of Formula (5) is used to assess the severity of the consequences of an enterprise's external migration. The specific scores of the enterprise evaluation items are shown in Table 2:
TABLE 2 Enterprise evaluation item scores
No. | Evaluation item | Score
1 | "Four-category" enterprise* | 35
2 | Annual tax paid exceeds 10 million yuan | 25
3 | Enterprise's industry is a regional key development industry | 15
4 | Listed enterprise | 5
5 | Headquarters enterprise | 5
6 | Gazelle enterprise | 5
7 | Unicorn enterprise | 5
8 | Specialized, refined, distinctive and innovative "little giant" enterprise | 5
Note: *"Four-category" enterprises is the collective term for four types of enterprises: industrial enterprises above designated size, construction enterprises with qualification grades, wholesale, retail, accommodation and catering enterprises above the designated quota, and national key service enterprises.
Based on the total evaluation score, enterprises with a tendency to migrate out are divided into three categories for graded early warning: special concern (score ≥ 40), key concern (20 ≤ score < 40) and general concern (5 ≤ score < 20).
The principles and implementation of the invention have been explained above using specific embodiments; the description of the embodiments is intended only to help in understanding the method and core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and scope of application. In summary, the content of this specification should not be construed as limiting the invention.
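The scoring and grading in Step S05 can be sketched as follows. Formula (5) is not reproduced in this text, so the sketch assumes Total_score_i is simply the sum of the Table 2 scores for the items enterprise i satisfies. The dictionary keys are hypothetical identifiers, and `min_probability` is a hypothetical cut-off for "tendency to migrate out", since the description does not give a value.

```python
from typing import Dict, List, Optional

# Item scores from Table 2; the dictionary keys are hypothetical identifiers.
ITEM_SCORES: Dict[str, int] = {
    "four_category": 35,
    "annual_tax_over_10m_yuan": 25,
    "key_development_industry": 15,
    "listed": 5,
    "headquarters": 5,
    "gazelle": 5,
    "unicorn": 5,
    "little_giant": 5,
}


def total_score(flags: Dict[str, bool]) -> int:
    """Assumed form of Formula (5): sum of the scores of the items that apply."""
    return sum(score for item, score in ITEM_SCORES.items() if flags.get(item, False))


def warning_level(score: int) -> Optional[str]:
    """Map a total score to the three warning grades (None = no warning)."""
    if score >= 40:
        return "special concern"
    if score >= 20:
        return "key concern"
    if score >= 5:
        return "general concern"
    return None


def grade(enterprises: List[dict], min_probability: float = 0.5) -> Dict[str, str]:
    """Grade enterprises whose migration probability reaches min_probability.

    min_probability is a hypothetical cut-off, not taken from the patent.
    """
    result = {}
    for ent in enterprises:
        if ent["probability"] < min_probability:
            continue
        level = warning_level(total_score(ent["flags"]))
        if level is not None:
            result[ent["id"]] = level
    return result


sample = [
    {"id": "A", "probability": 0.8, "flags": {"four_category": True, "listed": True}},
    {"id": "B", "probability": 0.7, "flags": {"annual_tax_over_10m_yuan": True}},
    {"id": "C", "probability": 0.9, "flags": {"gazelle": True}},
    {"id": "D", "probability": 0.2, "flags": {"four_category": True}},
]
print(grade(sample))
```

Enterprise D scores 35 but is skipped because its predicted probability falls below the assumed cut-off; how the patent actually combines probability and score is not specified beyond the description above.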

Claims (10)

1. A machine-learning-based enterprise external migration risk early warning method, characterized by comprising the following steps:
constructing feature variables to form an enterprise external migration prediction feature database;
when the sample set in the enterprise external migration prediction feature database is a balanced data set, constructing a feature selection model and establishing a model database;
establishing and verifying an enterprise external migration prediction model according to the data in the model database;
inputting a new sample into the constructed enterprise external migration prediction model to obtain the enterprise external migration probability corresponding to the new sample.
2. The machine-learning-based enterprise external migration risk early warning method according to claim 1, further comprising:
based on the obtained enterprise external migration probability, combined with preset evaluation items, performing graded early warning for enterprises using a comprehensive scenario analysis method.
3. The machine-learning-based enterprise external migration risk early warning method according to claim 1, wherein constructing feature variables to form an enterprise external migration prediction feature database comprises:
compiling lists of migrated-out and non-migrated enterprises to obtain multi-dimensional data samples of the enterprises in the lists;
constructing feature variables according to the data samples;
calculating, for a preset time period, the values of the different feature variables of all enterprises in the lists to form the enterprise external migration prediction feature database.
4. The machine-learning-based enterprise external migration risk early warning method according to claim 3, wherein calculating the values of the different feature variables of all enterprises in the lists to form the enterprise external migration prediction feature database comprises:
calculating, for the enterprises in the lists, the month-over-month change in electricity consumption, the month-over-month change in water consumption, the month-over-month change in the number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions, the amount of investment in other regions, the number of recruits in other regions, the number of reports of contact with investment promotion from other regions, and the number of officials from other regions received;
converting the values of the yes/no feature variables, namely whether the enterprise has migrated out, whether it purchases or uses premises in other regions, whether it enjoys local policies, and whether there is an official report on it, from text to numbers, with "yes" converted to 1 and "no" to 0.
5. The machine-learning-based enterprise external migration risk early warning method according to claim 1, wherein "the sample set in the enterprise external migration prediction feature database is a balanced data set" comprises:
determining whether the migrated-out and non-migrated enterprises in the enterprise external migration prediction feature database are balanced;
when the migrated-out and non-migrated enterprises are unbalanced, undersampling the data in the enterprise external migration prediction feature database so that the ratio of migrated-out to non-migrated enterprises is 1:1, thereby forming the balanced data set.
6. The machine-learning-based enterprise external migration risk early warning method according to claim 1, wherein constructing a feature selection model and establishing a model database comprises:
establishing a feature selection model using a random forest algorithm according to the balanced data set;
calculating a relative importance value of each feature variable through the feature selection model;
when a relative importance value is smaller than a preset importance value, removing the data corresponding to that feature variable to form the model database.
7. The machine-learning-based enterprise external migration risk early warning method according to claim 1, wherein establishing and verifying an enterprise external migration prediction model according to the data in the model database comprises:
dividing the data in the model database into a training set and a test set according to a preset ratio;
establishing the enterprise external migration prediction model based on a machine learning classification algorithm and setting model parameters;
optimizing the model parameters using the training set as training data for the enterprise external migration prediction model;
verifying the model precision of the trained enterprise external migration prediction model with the test set until a preset precision is reached.
8. The machine-learning-based enterprise external migration risk early warning method according to claim 7, wherein the data in the model database are divided into a training set and a test set in a 7:3 ratio.
9. The machine-learning-based enterprise external migration risk early warning method according to claim 7, wherein verifying the model precision of the trained enterprise external migration prediction model with the test set comprises verifying it using the model accuracy and the model recall.
10. A machine-learning-based enterprise external migration risk early warning system, comprising at least one data processor and a memory, the memory storing instructions which, when executed by the at least one processor, carry out the method according to any one of claims 1 to 9.
CN202210258025.6A 2022-03-16 2022-03-16 Enterprise external migration risk early warning method and system based on machine learning Pending CN114511250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210258025.6A CN114511250A (en) 2022-03-16 2022-03-16 Enterprise external migration risk early warning method and system based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210258025.6A CN114511250A (en) 2022-03-16 2022-03-16 Enterprise external migration risk early warning method and system based on machine learning

Publications (1)

Publication Number Publication Date
CN114511250A 2022-05-17

Family

ID=81552968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210258025.6A Pending CN114511250A (en) 2022-03-16 2022-03-16 Enterprise external migration risk early warning method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN114511250A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660796A (en) * 2022-12-09 2023-01-31 北京中科闻歌科技股份有限公司 Tax fund management method, device, equipment and storage medium for migration risk enterprise

Similar Documents

Publication Publication Date Title
CN111104981A (en) Hydrological prediction precision evaluation method and system based on machine learning
CN111178611B (en) Method for predicting daily electric quantity
Moghaddam et al. An appropriate multiple criteria decision making method for solving electricity planning problems, addressing sustainability issue
CN111080356A (en) Method for calculating residence price influence factors by using machine learning regression model
CN110930250A (en) Enterprise credit risk prediction method and system, storage medium and electronic equipment
CN110942171A (en) Enterprise labor and resource dispute risk prediction method based on machine learning
CN113537807A (en) Enterprise intelligent wind control method and device
CN114429245A (en) Analysis display method of engineering cost data
CN107256461B (en) Charging facility construction address evaluation method and system
CN113642922A (en) Small and medium-sized micro enterprise credit evaluation method and device
CN108805471A (en) Evaluation method for water resources carrying capacity based on the analysis of hybrid system interactively
CN115471000A (en) Method for evaluating uncertainty of deterministic graded rainfall forecast
CN114511250A (en) Enterprise external migration risk early warning method and system based on machine learning
CN114118793A (en) Local exchange risk early warning method, device and equipment
CN117495094A (en) Comprehensive evaluation and early warning method and system for safety risk of industrial chain
CN116911994A (en) External trade risk early warning system
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit wind control model
CN110866696A (en) Method and device for training shop falling risk assessment model
CN113688506B (en) Potential atmospheric pollution source identification method based on multi-dimensional data such as micro-station and the like
CN113222255B (en) Method and device for quantifying contract performance and predicting short-term violations
CN114510405A (en) Index data evaluation method, index data evaluation device, index data evaluation apparatus, storage medium, and program product
CN115393148A (en) Data monitoring system, monitoring method, device, medium and terminal for natural resources
CN115204501A (en) Enterprise evaluation method and device, computer equipment and storage medium
CN114418450A (en) Data processing method and device
CN114092216A (en) Enterprise credit rating method, apparatus, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215000 No. 101, Suhong Middle Road, Suzhou Industrial Park, Jiangsu Province

Applicant after: Yuance Information Technology Co.,Ltd.

Address before: 215000 No. 101, Suhong Middle Road, Suzhou Industrial Park, Jiangsu Province

Applicant before: SUZHOU INDUSTRIAL PARK SURVEYING MAPPING AND GEOINFORMATION Co.,Ltd.
