CN114511250A - Enterprise external migration risk early warning method and system based on machine learning - Google Patents
- Publication number
- CN114511250A (application CN202210258025.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06Q10/0635: Risk analysis of enterprise or organisation activities
- G06F16/212: Schema design and management with details for data modelling support
- G06F17/18: Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
- G06F18/211: Selection of the most significant subset of features
- G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N20/20: Ensemble learning
- G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
- G06Q10/06375: Prediction of business process outcome or impact based on a proposed change
- Y02P80/10: Efficient use of energy, e.g. using compressed air or pressurized fluid as energy carrier
Abstract
The invention discloses a machine-learning-based early warning method for enterprise external migration risk, comprising the following steps: constructing feature variables to form an enterprise external migration prediction feature database; once the data in that database form a balanced data set, constructing a feature selection model and building a model database; building and validating an enterprise external migration prediction model from the data in the model database; and inputting a new sample into the constructed model to obtain the corresponding enterprise external migration probability. A machine-learning-based early warning system that implements the method is also disclosed. The method keeps data acquisition feasible and convenient, is widely applicable and general, outputs the external migration probability of an enterprise directly, and has high accuracy.
Description
Technical Field
The invention belongs to the field of economic situation prediction and early warning, and particularly relates to an enterprise external migration risk early warning method and system based on machine learning.
Background
As market entities, enterprises are affected by factors such as rising factor costs, adjustments to their own capacity, changes in market demand and shifts in external policy. They may move part of their business elsewhere through outbound investment, or even leave their original place of registration entirely by changing their registration authority. In essence, enterprise migration is a process of re-selecting a location. The migration of important enterprises, such as industrial enterprises above designated size, unicorn enterprises, gazelle enterprises, and specialized, refined, distinctive and innovative small and medium-sized enterprises, directly erodes the tax base of their localities, significantly reduces the tax revenue paid into the local treasury, and hinders stable local socio-economic development. It is therefore necessary to identify enterprises with a tendency to migrate by technical means and to issue timely early warnings of possible migration.
Chinese invention patent CN109377058A discloses an enterprise external migration risk assessment method based on a logistic regression model. The method collects cases of enterprise external migration together with desensitized telecom data provided by an operator; removes indices with low predictive power and highly correlated indices using data binning and Pearson correlation coefficients; and then builds an enterprise migration prediction model with logistic regression that outputs the migration probability of each enterprise. The method fills a gap in quantitative modeling of enterprise migration behavior and can output an enterprise migration probability. However, it draws on few data sources, and telecom data cannot represent all enterprises. As a linear measure, Pearson correlation cannot effectively screen out variables with nonlinear effects. The regression-based prediction is cumbersome to operate and has low accuracy (68%). Overall, the method is weak in scenario applicability, model generality and accuracy in application.
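The criticism of Pearson correlation can be checked with a toy example: below, y is fully determined by x through y = x², yet the Pearson coefficient is exactly zero on a symmetric range of x, while a linear relation scores 1. The data are illustrative only and are not taken from either patent.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# y depends on x completely, but only through a nonlinear (quadratic) relation.
xs = [-3, -2, -1, 0, 1, 2, 3]
quadratic = [x * x for x in xs]
linear = [2 * x + 1 for x in xs]

print(round(pearson(xs, quadratic), 6))  # 0.0: no linear association detected
print(round(pearson(xs, linear), 6))     # 1.0: perfect linear association
```

A split-based importance measure, such as the random forest approach used below, does not rely on the relation being linear, which is the point the comparison is making.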
Disclosure of Invention
To overcome the shortcomings of the prior art, embodiments of the invention provide a machine-learning-based method and system for early warning of enterprise external migration risk.
To achieve this purpose, the invention adopts the following technical solution: a machine-learning-based early warning method for enterprise external migration risk, comprising the following steps:
constructing feature variables to form an enterprise external migration prediction feature database;
once the data in the enterprise external migration prediction feature database form a balanced data set, constructing a feature selection model and building a model database;
building and validating an enterprise external migration prediction model from the data in the model database; and
inputting a new sample into the constructed enterprise external migration prediction model to obtain the enterprise external migration probability corresponding to the new sample.
In the above technical solution, the method further includes:
based on the obtained enterprise external migration probability and preset evaluation items, classifying the enterprises by a comprehensive scenario analysis method and issuing graded early warnings.
In the above technical solution, "constructing feature variables to form an enterprise external migration prediction feature database" comprises:
compiling lists of migrated and non-migrated enterprises and obtaining multi-dimensional data samples for the enterprises in the lists;
constructing feature variables from the data samples; and
computing, over a preset time period, the values of the different feature variables for all enterprises in the lists to form the enterprise external migration prediction feature database.
Further, "computing the values of the different feature variables for all enterprises in the lists to form the enterprise external migration prediction feature database" comprises:
computing, for the enterprises in the lists, the period-on-period change in electricity consumption, the period-on-period change in water consumption, the period-on-period change in the number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions, the amount of investment in other regions, the number of people recruited in other regions, the number of news reports on investment-attraction contacts with other regions, and the number of visits received from officials of other regions; and
converting the yes/no feature variables, namely whether the enterprise has migrated, whether it has acquired land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, from character type to numeric type, where "yes" is converted to 1 and "no" to 0.
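A minimal sketch of the two transformations in this step, assuming a period-on-period change is computed as (current − previous) / previous and that a zero previous value yields 0.0; the text defines neither, and the field names and figures below are hypothetical.

```python
def period_change(previous: float, current: float) -> float:
    """Period-on-period relative change, e.g. month-on-month electricity use.

    Assumption: (current - previous) / previous, with 0.0 returned when the
    previous value is zero so that missing baselines do not crash the pipeline.
    """
    if previous == 0:
        return 0.0
    return (current - previous) / previous


def encode_yes_no(answer: str) -> int:
    """Convert a character-type yes/no answer to 1/0, as the method describes."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"unexpected answer: {answer!r}")


# Hypothetical raw record for one enterprise over two consecutive periods.
raw = {
    "electricity_prev": 12000.0, "electricity_curr": 9000.0,
    "water_prev": 500.0, "water_curr": 450.0,
    "migrated": "no", "land_elsewhere": "yes",
}

features = {
    "electricity_change": period_change(raw["electricity_prev"], raw["electricity_curr"]),
    "water_change": period_change(raw["water_prev"], raw["water_curr"]),
    "migrated": encode_yes_no(raw["migrated"]),
    "land_elsewhere": encode_yes_no(raw["land_elsewhere"]),
}
print(features)
# {'electricity_change': -0.25, 'water_change': -0.1, 'migrated': 0, 'land_elsewhere': 1}
```

Raising on unexpected answers, rather than silently mapping them to 0, keeps data-quality problems visible before they reach the feature database.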
In the above technical solution, "the sample set in the enterprise external migration prediction feature database being a balanced data set" comprises:
determining whether migrated and non-migrated enterprises in the enterprise external migration prediction feature database are balanced; and
when they are unbalanced, undersampling the data in the database so that the ratio of migrated to non-migrated enterprises is 1:1, thereby forming the balanced data set.
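Random undersampling to 1:1 can be sketched as follows. This is an illustrative implementation, not the patent's code: it drops rows from whichever class is larger (the non-migrated enterprises in the patent's setting), and `undersample_to_balance` and the `migrated` field are hypothetical names.

```python
import random


def undersample_to_balance(rows, label_key="migrated", seed=0):
    """Randomly drop majority-class rows until both classes are equal in size (1:1).

    rows: list of dicts whose label_key is 1 (migrated) or 0 (not migrated).
    A fixed seed makes the sample reproducible.
    """
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted((positives, negatives), key=len)
    rng = random.Random(seed)
    kept_majority = rng.sample(majority, len(minority))  # sampling without replacement
    balanced = minority + kept_majority
    rng.shuffle(balanced)  # avoid all positives appearing first
    return balanced


# Hypothetical data: 3 migrated and 7 non-migrated enterprises.
rows = [{"id": i, "migrated": 1 if i < 3 else 0} for i in range(10)]
balanced = undersample_to_balance(rows)
print(sum(r["migrated"] for r in balanced), len(balanced))  # 3 6
```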
In the above technical solution, "constructing a feature selection model and building a model database" includes:
establishing a feature selection model by utilizing a random forest algorithm according to the balanced data set;
calculating a relative importance value of each of the feature variables through the feature selection model;
when the relative importance value of a feature variable is smaller than a preset importance value, removing the data corresponding to that feature variable to form the model database.
In the above technical solution, "building and validating the enterprise external migration prediction model from the data in the model database" comprises:
dividing the data in the model database into a training set and a test set according to a preset ratio;
building the enterprise external migration prediction model based on a machine learning classification algorithm and setting the model parameters;
optimizing the model parameters using the training set as training data for the model; and
validating the accuracy of the trained model on the test set until a preset accuracy is reached.
Further, the data in the model database are divided into a training set and a test set at a ratio of 7:3.
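A minimal sketch of the 7:3 split, assuming a simple random shuffle; the text does not say whether the split is stratified by class, and `split_train_test` is a hypothetical helper.

```python
import random


def split_train_test(rows, train_ratio=0.7, seed=0):
    """Shuffle a copy of rows and split it into training and test sets (default 7:3)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(range(10))
print(len(train), len(test))  # 7 3
```

On a balanced data set a plain shuffle usually keeps the class ratio close to 1:1 in both parts; a stratified split would guarantee it.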
Further, validating the trained enterprise external migration prediction model on the test set comprises evaluating its model accuracy and model recall.
The early warning system comprises at least one data processor and a memory storing instructions which, when executed by the at least one processor, implement the early warning method.
Compared with the prior art, the technical solution of the invention has the following advantages:
1. The enterprise external migration prediction feature database uses energy data and internet big data as data sources and builds feature variables from multi-dimensional data samples. This broadens the data sources, removes dependence on data from any specific field, keeps data acquisition feasible and convenient, and makes the method applicable to all kinds of industrial and commercial enterprises.
2. The invention relies on statistical methods throughout its data analysis. A random-forest-based feature selection model measures the importance of each feature variable, which can screen out factors with nonlinear effects and overcomes the limitation of traditional methods that can only measure linear relationships; the method is therefore less restricted in application and more general.
3. The enterprise external migration prediction model is built with a machine learning classification algorithm on the data in the model database. Its construction is simple, it outputs the enterprise external migration probability directly, and its accuracy is high.
4. Based on the obtained external migration probability, the severity of a migration's impact on local economic development is assessed with a comprehensive scenario analysis method combined with preset evaluation items. The method is simple, its results are intuitive, and it enables automatic graded early warning of enterprise migration, which facilitates monitoring.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a method for early warning of enterprise migration risk based on machine learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of constructing a feature selection model in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment 1: referring to Figs. 1 and 2, a machine-learning-based early warning method for enterprise external migration risk comprises the following steps:
Step S01: constructing feature variables to form an enterprise external migration prediction feature database;
specifically, this step comprises:
S11: compiling lists of migrated and non-migrated enterprises; the lists can be obtained by collecting the list of migrated enterprises and the local enterprise directory published by the market supervision and administration bureau;
S12: acquiring multi-dimensional data samples for the enterprises in the lists; the data samples cover at least energy use, employment, land use, outbound investment, policy, intellectual property and industrial chain categories; because the data span many dimensions and do not depend on any specific field, data acquisition remains feasible and convenient;
S13: constructing feature variables from the data samples;
S14: computing, over the preset time period, for the enterprises in the lists: the period-on-period changes in electricity consumption, water consumption and number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency and amount of investment in other regions, the number of people recruited in other regions, the number of news reports on investment-attraction contacts with other regions, and the number of visits received from officials of other regions;
S15: converting the yes/no feature variables, namely whether the enterprise has migrated, whether it has acquired land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, from character type to numeric type, where "yes" is converted to 1 and "no" to 0.
These data form the enterprise external migration prediction feature database, which is used to compute the feature selection model.
Step S02: once the sample set in the enterprise external migration prediction feature database is a balanced data set, constructing a feature selection model and building a model database;
specifically, this step comprises:
S21: determining whether migrated and non-migrated enterprises in the feature database are balanced;
S22: when they are unbalanced, undersampling the data in the feature database so that the ratio of migrated to non-migrated enterprises is 1:1, forming a balanced data set; undersampling here means removing some non-migrated enterprises so that their number approaches that of the migrated enterprises;
S23: building a feature selection model with a random forest algorithm, taking whether each enterprise in the balanced data set migrated as the dependent variable and the remaining data as independent variables;
S24: computing the relative importance value of each feature variable with the feature selection model,
using the following formulas:
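Equations (1) and (2) did not survive in this text. A reconstruction consistent with the definitions that follow, assuming Equation (1) is the least-squares split criterion of a CART regression tree, with $D_1(A,s) = \{x \mid x_A \le s\}$ and $D_2(A,s) = \{x \mid x_A > s\}$, and that Equation (2) normalizes over all $c$ features, would read:

```latex
VI_A = \min_{s} \Big[ \sum_{x_i \in D_1(A,s)} (y_i - c_1)^2 + \sum_{x_i \in D_2(A,s)} (y_i - c_2)^2 \Big] \qquad (1)

VIM_i = \frac{VI_i}{\sum_{j=1}^{c} VI_j} \qquad (2)
```

Under this normalization the relative importance values sum to 1, which is what lets a single fixed cut-off (0.05 in the worked example) be applied to every feature.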
Equation (1) outputs the importance measure of each independent variable. In Equation (1), $VI$ is the minimum variance of feature variable $A$; $D_1$ and $D_2$ are the data sets on either side of an arbitrary split point $s$ of feature variable $A$; and $c_1$ and $c_2$ are the sample means of $D_1$ and $D_2$, respectively.
Equation (2) converts the minimum variance of each feature variable into a relative importance value. In Equation (2), $VIM_i$ and $VI_i$ are the relative importance value and the minimum variance of index $i$, respectively, and $c$ is the number of features.
S25: when the relative importance value of a feature variable is smaller than a preset importance value, removing the data corresponding to that feature variable to form the model database.
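The sketch below illustrates Equations (1) and (2) and the S25 cut-off in dependency-free Python. It is a simplification, not a random forest: it scores each feature by its single best split on the whole data set, whereas a random forest averages such split-based impurity reductions over many bootstrapped trees with random feature subsets. It also assumes that "importance" means the reduction from the unsplit sum of squares to the minimum in Equation (1), so that larger values mark more informative features, as the removal rule in S25 requires. The function names and sample rows are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (the bracketed terms in Equation (1))."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_sse(xs, ys):
    """Minimum over split points s of SSE(D1) + SSE(D2), i.e. VI_A in Equation (1)."""
    best = sse(ys)  # value with no split, used as the starting point
    for s in sorted(set(xs))[:-1]:  # every split that leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        best = min(best, sse(left) + sse(right))
    return best


def relative_importances(rows, features, target):
    """Variance reduction per feature, normalized to sum to 1 (cf. Equation (2)).

    Assumption: importance = SSE(all) - VI_A, so larger means more informative.
    """
    ys = [r[target] for r in rows]
    total = sse(ys)
    gains = {f: total - best_split_sse([r[f] for r in rows], ys) for f in features}
    norm = sum(gains.values()) or 1.0  # avoid division by zero if no feature helps
    return {f: g / norm for f, g in gains.items()}


def select_features(rows, features, target, threshold=0.05):
    """S25: keep features whose relative importance is not below the threshold."""
    importances = relative_importances(rows, features, target)
    kept = [f for f in features if importances[f] >= threshold]
    return kept, importances


# Hypothetical balanced sample: 1 = migrated, 0 = not migrated.
rows = [
    {"electricity_change": -0.4, "has_policy": 1, "migrated": 1},
    {"electricity_change": -0.3, "has_policy": 0, "migrated": 1},
    {"electricity_change": 0.1, "has_policy": 1, "migrated": 0},
    {"electricity_change": 0.2, "has_policy": 0, "migrated": 0},
]
kept, importances = select_features(rows, ["electricity_change", "has_policy"], "migrated")
print(kept)  # ['electricity_change']
```

In this toy sample the electricity feature separates the two classes perfectly, so it receives all of the importance, while `has_policy` explains none of the variance and falls below the 0.05 cut-off.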
Step S03: building and validating an enterprise external migration prediction model from the data in the model database;
specifically, this step comprises:
S31: dividing the data in the model database into a training set and a test set according to a preset ratio, which can be set to 7:3;
S32: using the training set as training data for the enterprise external migration prediction model, with whether the enterprise migrated as the dependent variable and the remaining training-set data as independent variables; selecting machine learning classification algorithms, building the enterprise external migration prediction model, and setting the model parameters;
S33: using the test set as input data and outputting prediction results with the trained enterprise external migration prediction model;
S34: comparing the test-set predictions of the models built with the various machine learning classification algorithms against the enterprises that actually migrated, and computing the model accuracy and model recall of each model;
Equation (3), $\text{Accuracy} = \frac{TP + TN}{N}$, gives the model accuracy, i.e. the proportion of correctly predicted samples among all samples. In Equation (3), Accuracy is the model accuracy, $N$ is the number of enterprises, $TP$ is the number of migrated enterprises correctly predicted, and $TN$ is the number of non-migrated enterprises correctly predicted.
Equation (4), $\text{Recall} = \frac{TP}{TP + FN}$, gives the model recall, i.e. the proportion of actually migrated enterprises that are correctly predicted as migrated. In Equation (4), Recall is the model recall, $TP$ is as above, and $FN$ is the number of migrated enterprises incorrectly predicted as non-migrated.
S35: determining whether both the model accuracy and the model recall reach a preset accuracy, which can be set to 75%; if both do, the enterprise external migration prediction model is complete; otherwise, returning to step S32 to optimize the model parameters.
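Steps S34 and S35 can be sketched as below, assuming label 1 means "migrated". The accuracy and recall follow Equations (3) and (4). The tuning loop is hypothetical: it uses a decision threshold on predicted migration probabilities as a stand-in for "model parameters", whereas the method itself returns to S32 and retrains the classifier.

```python
def accuracy_and_recall(y_true, y_pred):
    """Equation (3): (TP + TN) / N. Equation (4): TP / (TP + FN). Label 1 = migrated."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


def tune_until_target(probs, y_true, thresholds, target=0.75):
    """Try candidate settings in order; return the first meeting both targets (S35).

    Here the only 'parameter' is a hypothetical decision threshold applied to
    predicted migration probabilities.
    """
    for th in thresholds:
        preds = [1 if p >= th else 0 for p in probs]
        acc, rec = accuracy_and_recall(y_true, preds)
        if acc >= target and rec >= target:
            return th, acc, rec
    return None  # no candidate reached the preset accuracy


# Hypothetical test-set labels and model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.55, 0.4, 0.6, 0.3, 0.2, 0.1]
print(tune_until_target(probs, y_true, [0.7, 0.5, 0.35]))  # (0.5, 0.75, 0.75)
```

At a threshold of 0.7 the accuracy is already 0.75 but the recall is only 0.5, which shows why S35 checks both metrics: accuracy alone can hide missed migrations.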
Step S04: inputting a new sample into the constructed enterprise external migration prediction model to obtain the enterprise external migration probability corresponding to the new sample.
In a preferred embodiment, the method further comprises:
Step S05: based on the enterprise external migration probability and preset evaluation items, issuing graded early warnings for the enterprises using a comprehensive scenario analysis method.
The comprehensive scenario analysis method determines the evaluation items for migration early warning and assesses the impact of an enterprise's possible future migration by weighted accumulation, with the enterprise's migration probability as the weight.
Equation (5): $\text{Total score} = p \times \sum_{i} \text{score}_i$
Equation (5) computes the total evaluation score for an enterprise's external migration. In Equation (5), Total score is the enterprise's total evaluation score, $p$ is the enterprise's external migration probability, and $\text{score}_i$ is the enterprise's score on the $i$-th evaluation item.
The preset evaluation items include enterprise type, tax payment scale, and the like. The total evaluation score is obtained by weighting the scores on the preset evaluation items with the enterprise's external migration probability, and enterprises are then classified by total score into key-attention, close-attention and general-attention enterprises for graded early warning. Because an enterprise's external migration directly erodes the tax base of its locality, significantly reduces the tax revenue paid into the local treasury and hinders stable local socio-economic development, the total score, which combines migration probability with the evaluation items, gives a direct measure of how severely a migration would affect local economic development. The approach is simple, convenient and intuitive, and it supports graded early warning and focused attention on the enterprises that matter most.
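A sketch of Equation (5) and the three-tier classification, assuming numeric item scores; the tier cutoffs of 60 and 30 are hypothetical, since the text gives none.

```python
def total_score(p, item_scores):
    """Equation (5): migration probability p times the sum of evaluation-item scores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    return p * sum(item_scores)


def attention_level(score, cutoffs=(60.0, 30.0)):
    """Map a total score to an attention tier (hypothetical cutoffs, highest first)."""
    high, medium = cutoffs
    if score >= high:
        return "key attention"
    if score >= medium:
        return "close attention"
    return "general attention"


# Hypothetical enterprise: 50% migration probability, item scores for
# enterprise type (40) and tax payment scale (30).
score = total_score(0.5, [40, 30])
print(score, attention_level(score))  # 35.0 close attention
```

Using the probability as a multiplier means a large taxpayer with a low migration probability and a small taxpayer with a high one can land in the same tier, which matches the method's aim of ranking by expected impact.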
Taking data from January to August for a certain region as an example, external migration was predicted for the enterprises in the data as follows:
Step S01: constructing feature variables to form the enterprise external migration prediction feature database.
The list of migrated enterprises published by the market supervision and administration bureau was collected and integrated with the local enterprise list. Electricity and water consumption data for the corresponding enterprises were obtained; data on the enterprises' outbound investment, online recruitment in other regions, local policies, land acquisitions by auction or listing, intellectual property and industry type were captured; and news reports on the enterprises' investment-attraction contacts with other regions and on visits to the enterprises by officials from other regions were crawled, giving 12 thousand records in total.
These data are used to construct feature variables, i.e. indices that describe the characteristics of the population sample. The feature variables include the period-on-period changes in the enterprise's electricity consumption, water consumption and number of patents, the number of local policies enjoyed, the number of enterprises in the same industry at the place of registration, the frequency of investment in other regions (by the enterprise and by its legal representative), the amount of investment in other regions (by the enterprise and by its legal representative), the number of people recruited in other regions (by functional department), the number of news reports on investment-attraction contacts with other regions, and the number of visits received from officials of other regions. The yes/no feature variables, namely whether the enterprise has migrated, whether it has acquired land in another region, whether it enjoys local policies, and whether there are reports of visits by officials, are converted from character type to numeric type, with "yes" converted to 1 and "no" to 0.
The specific feature variables are shown in Table 1:
Table 1 Feature variable index system
Step S02: construct a feature selection model and establish a model database, with the sample set in the enterprise out-migration prediction feature database being a balanced data set.
First, determine whether the migrated and non-migrated enterprises in the enterprise out-migration prediction feature database are balanced. If they are imbalanced, the data in the database are undersampled so that the ratio of migrated to non-migrated enterprises is 1:1, forming a balanced data set. Then, taking whether each enterprise in the balanced data set has migrated out as the dependent variable and the remaining data as independent variables, a feature selection model is built with the random forest algorithm and used to select feature variables from the balanced data set; that is, M feature variables that represent the overall characteristics of the sample are selected from the N existing feature variables. Specifically, in this embodiment the 2151 migrated enterprises are far fewer than the non-migrated enterprises, so the non-migrated enterprises in the database are randomly undersampled to 2151 enterprises, forming a balanced data set.
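Random undersampling of the majority class to a 1:1 ratio, as described above, can be sketched with the standard library alone. The row layout (dicts with a 0/1 `migrated` key) and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Dict, List


def undersample_to_balance(rows: List[Dict], label_key: str = "migrated",
                           seed: int = 0) -> List[Dict]:
    """Randomly drop majority-class rows until both classes have equal counts."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    if len(positives) <= len(negatives):
        minority, majority = positives, negatives
    else:
        minority, majority = negatives, positives
    rng = random.Random(seed)  # seeded so the sample is reproducible
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


# Toy data: 3 migrated vs. 10 non-migrated enterprises.
rows = [{"id": i, "migrated": 1} for i in range(3)] + \
       [{"id": 100 + i, "migrated": 0} for i in range(10)]
balanced = undersample_to_balance(rows)
print(sum(r["migrated"] for r in balanced), len(balanced))  # 3 6
```

Applied to the embodiment's data, the same procedure would keep all 2151 migrated enterprises and a random 2151 of the non-migrated ones.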
Formula (1) outputs an importance measure for each independent variable.
Formula (2) then converts the minimum variance of each feature variable into a relative importance value.
Feature variables whose calculated relative importance value is less than 0.05 are eliminated. The retained feature variables are: the period-over-period change in electricity consumption, the number of recruiters in other regions (by functional department), whether land has been purchased in another region, the frequency of investment in other regions by the enterprise's legal representative, and the number of reports on investment promotion by other regions. The data consisting of these feature variables form the model database.
Step S03: establish and verify an enterprise out-migration prediction model according to the data in the model database.
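Because formulas (1) and (2) are not reproduced in this text, the sketch below assumes that a feature's relative importance is its raw importance divided by the sum over all features, and then drops features below the 0.05 threshold. The raw scores are made-up inputs; in practice they might come from a random forest, for example scikit-learn's `RandomForestClassifier.feature_importances_`, which is already normalized.

```python
from typing import Dict, List


def relative_importance(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale raw importance scores so they sum to 1 (assumed form of formula (2))."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: score / total for name, score in raw.items()}


def select_features(raw: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Keep features whose relative importance is not less than the threshold."""
    rel = relative_importance(raw)
    return [name for name, value in rel.items() if value >= threshold]


# Hypothetical raw importances (not results from the patent).
raw_scores = {
    "electricity_change": 40.0,
    "remote_recruiter_count": 30.0,
    "bought_land_elsewhere": 26.0,
    "water_change": 3.0,
    "patent_change": 1.0,
}
print(select_features(raw_scores))
# ['electricity_change', 'remote_recruiter_count', 'bought_land_elsewhere']
```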
The data in the model database are divided into a training set and a test set at a ratio of 7:3, giving 3011 training samples and 1281 test samples. The 3011 training samples are used as model training data: taking whether the enterprise has migrated out as the dependent variable and the remaining training-set data as independent variables, an enterprise out-migration prediction model is built on a machine learning classification algorithm and its parameters are set. The 1281 test samples are then input to the trained model, which outputs the prediction results. Comparing the model's predictions on the test set with the enterprises that actually migrated gives the model accuracy (Accuracy) and recall (Recall):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

where TP is the number of migrated enterprises predicted as migrated, TN the number of non-migrated enterprises predicted as non-migrated, FP the number of non-migrated enterprises predicted as migrated, and FN the number of migrated enterprises predicted as non-migrated. On testing, the random forest prediction model achieves an accuracy of 82.1% and a recall of 78.2%, which meets the precision requirement, and the enterprise out-migration prediction model is output.
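The 7:3 split and the two evaluation metrics can be sketched as follows, using the standard definitions accuracy = (TP + TN) / (TP + TN + FP + FN) and recall = TP / (TP + FN), with "migrated" (label 1) as the positive class. The shuffling seed and the rounding of the cut point are assumptions; the patent does not say how the split was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], train_ratio: float = 0.7,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle, then cut at round(len * train_ratio) to form train/test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def accuracy_and_recall(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and recall with label 1 (migrated) as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


train, test = train_test_split(range(10))
print(len(train), len(test))  # 7 3
print(accuracy_and_recall([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))  # (0.6, 0.6666666666666666)
```

Recall is reported alongside accuracy because, for an early-warning use, a missed migrating enterprise (FN) is usually the costlier error.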
Step S04: input a new sample into the constructed enterprise out-migration prediction model to obtain the out-migration probability corresponding to the new sample.
The remaining approximately 110,000 samples are input, and the constructed enterprise out-migration prediction model outputs the out-migration probability of each enterprise.
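A random forest usually turns its trees into a probability by averaging the per-tree estimates (scikit-learn's `RandomForestClassifier.predict_proba`, for example, returns the mean of the trees' predicted class probabilities). The toy sketch below shows only that averaging step; the three decision stumps, their thresholds and their output probabilities are invented for illustration and are not a trained model.

```python
from typing import Callable, Dict, List

# A "tree" here is any function mapping a feature dict to P(migrate).
Tree = Callable[[Dict[str, float]], float]


def forest_probability(trees: List[Tree], sample: Dict[str, float]) -> float:
    """Average the per-tree out-migration probabilities for one sample."""
    if not trees:
        raise ValueError("the forest needs at least one tree")
    return sum(tree(sample) for tree in trees) / len(trees)


# Hypothetical single-split trees on three of the retained features.
def electricity_stump(s: Dict[str, float]) -> float:
    return 0.8 if s["electricity_change"] < -0.3 else 0.1


def recruiter_stump(s: Dict[str, float]) -> float:
    return 0.7 if s["remote_recruiter_count"] >= 5 else 0.2


def land_stump(s: Dict[str, float]) -> float:
    return 0.9 if s["bought_land_elsewhere"] == 1 else 0.15


new_sample = {"electricity_change": -0.4, "remote_recruiter_count": 6,
              "bought_land_elsewhere": 0}
p = forest_probability([electricity_stump, recruiter_stump, land_stump], new_sample)
print(round(p, 2))  # 0.55
```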
Step S05: analyze the characteristics of the 2151 out-migration cases and, based on the obtained out-migration probabilities combined with preset evaluation items, carry out graded early warning of enterprises using a comprehensive scenario analysis method.
Specifically, statistics on the 2151 out-migration cases show that 41 of the migrated enterprises each paid more than 10 million yuan in tax, together accounting for 69.87% of the total tax paid by all migrated enterprises; the tax-payment scale of an enterprise can therefore serve as an important early-warning indicator. In addition, the out-migration of key enterprises directly erodes the tax base of their locality, noticeably reduces the tax revenue collected there, and hinders stable local socioeconomic development. Therefore, on the basis of the enterprise out-migration probability, the total score Total_i of formula (5), which combines tax-payment scale and enterprise type, is used to assess how severe the consequences of each enterprise's out-migration would be. The scores of the enterprise evaluation items are shown in Table 2:
Table 2 Enterprise evaluation item scores
Serial number | Evaluation item | Score |
---|---|---|
1 | "Four-above" enterprise* | 35 |
2 | Annual tax payment of the enterprise exceeds 10 million yuan | 25 |
3 | The enterprise's industry is a key development industry of the region | 15 |
4 | Listed enterprise | 5 |
5 | Headquarters enterprise | 5 |
6 | Gazelle enterprise | 5 |
7 | Unicorn enterprise | 5 |
8 | Specialized, refined, distinctive and innovative "little giant" enterprise | 5 |
Note: *"Four-above" enterprises is the collective term for four types of enterprises: above-scale industrial enterprises, qualified (graded) construction enterprises, above-quota wholesale, retail, accommodation and catering enterprises, and national key service enterprises.
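Formula (5) itself is not reproduced in this text; one plausible reading, assumed here, is that an enterprise's total score Total_i is the sum of the Table 2 scores for the evaluation items it satisfies. The item keys below are hypothetical labels for the eight rows of the table.

```python
from typing import Dict

# Scores from Table 2; the dictionary keys are illustrative names only.
ITEM_SCORES: Dict[str, int] = {
    "four_above_enterprise": 35,
    "annual_tax_over_10m_yuan": 25,
    "regional_key_industry": 15,
    "listed": 5,
    "headquarters": 5,
    "gazelle": 5,
    "unicorn": 5,
    "little_giant": 5,
}


def total_score(flags: Dict[str, bool]) -> int:
    """Sum the scores of the evaluation items an enterprise meets (assumed formula (5))."""
    unknown = set(flags) - set(ITEM_SCORES)
    if unknown:
        raise ValueError(f"unknown evaluation items: {sorted(unknown)}")
    return sum(ITEM_SCORES[item] for item, met in flags.items() if met)


print(total_score({"four_above_enterprise": True, "annual_tax_over_10m_yuan": True}))  # 60
print(total_score({"listed": True, "gazelle": True, "unicorn": False}))  # 10
```

Under this additive reading the maximum possible score is 100 (35 + 25 + 15 + 5 × 5).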
Based on the total enterprise evaluation score, enterprises with a tendency to migrate are divided into three categories, major concern (score ≥ 40), key concern (score ≥ 20) and general concern (score ≥ 5), and graded early warnings are issued accordingly.

The principle and implementation of the invention have been explained above by means of a specific embodiment; the description of the embodiment is intended only to help understand the method and core idea of the invention. A person skilled in the art may, following the idea of the invention, vary the specific embodiment and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.
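The graded warning can then be sketched as a lookup on the total score using the 40 / 20 / 5 thresholds. How the out-migration probability gates this step is not specified in the text, so the 0.5 cut-off for "has a tendency to migrate" is an assumption, as are the labels used for the three tiers (major, key and general concern).

```python
from typing import Optional


def warning_tier(migration_probability: float, score: int,
                 probability_threshold: float = 0.5) -> Optional[str]:
    """Map an enterprise to a warning tier; None means no graded warning."""
    if not 0.0 <= migration_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if migration_probability < probability_threshold:
        return None  # assumed: not treated as having a migration tendency
    if score >= 40:
        return "major concern"
    if score >= 20:
        return "key concern"
    if score >= 5:
        return "general concern"
    return None  # below every tier threshold


for prob, score in [(0.8, 60), (0.8, 25), (0.8, 5), (0.8, 0), (0.3, 60)]:
    print(prob, score, warning_tier(prob, score))
```

Checking the thresholds from highest to lowest means each enterprise lands in exactly one tier.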
Claims (10)
1. A machine learning-based early warning method for enterprise out-migration risk, characterized by comprising the following steps:
constructing feature variables to form an enterprise out-migration prediction feature database;
constructing a feature selection model and establishing a model database, with the sample set in the enterprise out-migration prediction feature database being a balanced data set;
establishing and verifying an enterprise out-migration prediction model according to the data in the model database;
inputting a new sample into the constructed enterprise out-migration prediction model to obtain the enterprise out-migration probability corresponding to the new sample.
2. The machine learning-based enterprise out-migration risk early warning method according to claim 1, further comprising:
based on the obtained enterprise out-migration probability and in combination with preset evaluation items, carrying out graded early warning on enterprises using a comprehensive scenario analysis method.
3. The machine learning-based enterprise out-migration risk early warning method according to claim 1, wherein constructing feature variables to form the enterprise out-migration prediction feature database comprises:
compiling lists of migrated and non-migrated enterprises to obtain multi-dimensional data samples of the enterprises in the lists;
constructing feature variables according to the data samples;
calculating, for a preset time period, the values of the different feature variables of all enterprises in the lists to form the enterprise out-migration prediction feature database.
4. The machine learning-based enterprise out-migration risk early warning method according to claim 3, wherein calculating the values of the different feature variables of all enterprises in the lists to form the enterprise out-migration prediction feature database comprises:
calculating, for the enterprises in the lists, the period-over-period change in electricity consumption, the period-over-period change in water consumption, the period-over-period change in patent count, the number of local policies enjoyed, the number of enterprises of the same industry at the place of registration, the frequency of investment in other regions, the amount of investment in other regions, the number of recruiters in other regions, the number of reports on investment promotion by other regions, and the number of visits by officials from other regions;
converting the results of the feature variables whether the enterprise has migrated out, whether it has purchased land in another region, whether it enjoys local policies, and whether there are reports of official visits from character type to numeric type, wherein "yes" is converted to 1 and "no" to 0.
5. The machine learning-based enterprise out-migration risk early warning method according to claim 1, wherein "the sample set in the enterprise out-migration prediction feature database being a balanced data set" comprises:
determining whether the migrated and non-migrated enterprises in the enterprise out-migration prediction feature database are balanced;
when the migrated and non-migrated enterprises are imbalanced, undersampling the data in the enterprise out-migration prediction feature database so that the ratio of migrated to non-migrated enterprises is 1:1, thereby forming the balanced data set.
6. The machine learning-based enterprise out-migration risk early warning method according to claim 1, wherein constructing a feature selection model and establishing a model database comprises:
establishing a feature selection model using a random forest algorithm according to the balanced data set;
calculating a relative importance value of each feature variable through the feature selection model;
when the relative importance value is less than a preset importance value, removing the data corresponding to that feature variable to form the model database.
7. The machine learning-based enterprise out-migration risk early warning method according to claim 1, wherein establishing and verifying an enterprise out-migration prediction model according to the data in the model database comprises:
dividing the data in the model database into a training set and a test set at a preset ratio;
establishing the enterprise out-migration prediction model based on a machine learning classification algorithm, and setting model parameters;
optimizing the model parameters using the training set as training data for the enterprise out-migration prediction model;
verifying the model accuracy of the trained enterprise out-migration prediction model using the test set until a preset accuracy is reached.
8. The machine learning-based enterprise out-migration risk early warning method according to claim 7, wherein the data in the model database are divided into a training set and a test set at a ratio of 7:3.
9. The machine learning-based enterprise out-migration risk early warning method according to claim 7, wherein verifying the model accuracy of the trained enterprise out-migration prediction model using the test set comprises verifying the model accuracy using a model accuracy rate and a model recall rate.
10. A machine learning-based enterprise out-migration risk early warning system, comprising at least one data processor and a memory, the memory storing instructions which, when executed by the at least one data processor, carry out the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210258025.6A CN114511250A (en) | 2022-03-16 | 2022-03-16 | Enterprise external migration risk early warning method and system based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114511250A (en) | 2022-05-17 |
Family
ID=81552968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210258025.6A Pending CN114511250A (en) | 2022-03-16 | 2022-03-16 | Enterprise external migration risk early warning method and system based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114511250A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115660796A (en) * | 2022-12-09 | 2023-01-31 | 北京中科闻歌科技股份有限公司 | Tax fund management method, device, equipment and storage medium for migration risk enterprise |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111104981A (en) | Hydrological prediction precision evaluation method and system based on machine learning | |
CN111178611B (en) | Method for predicting daily electric quantity | |
Moghaddam et al. | An appropriate multiple criteria decision making method for solving electricity planning problems, addressing sustainability issue | |
CN111080356A (en) | Method for calculating residence price influence factors by using machine learning regression model | |
CN110930250A (en) | Enterprise credit risk prediction method and system, storage medium and electronic equipment | |
CN110942171A (en) | Enterprise labor and resource dispute risk prediction method based on machine learning | |
CN113537807A (en) | Enterprise intelligent wind control method and device | |
CN114429245A (en) | Analysis display method of engineering cost data | |
CN107256461B (en) | Charging facility construction address evaluation method and system | |
CN113642922A (en) | Small and medium-sized micro enterprise credit evaluation method and device | |
CN108805471A (en) | Evaluation method for water resources carrying capacity based on the analysis of hybrid system interactively | |
CN115471000A (en) | Method for evaluating uncertainty of deterministic graded rainfall forecast | |
CN114511250A (en) | Enterprise external migration risk early warning method and system based on machine learning | |
CN114118793A (en) | Local exchange risk early warning method, device and equipment | |
CN117495094A (en) | Comprehensive evaluation and early warning method and system for safety risk of industrial chain | |
CN116911994A (en) | External trade risk early warning system | |
CN116739742A (en) | Monitoring method, device, equipment and storage medium of credit wind control model | |
CN110866696A (en) | Method and device for training shop falling risk assessment model | |
CN113688506B (en) | Potential atmospheric pollution source identification method based on multi-dimensional data such as micro-station and the like | |
CN113222255B (en) | Method and device for quantifying contract performance and predicting short-term violations | |
CN114510405A (en) | Index data evaluation method, index data evaluation device, index data evaluation apparatus, storage medium, and program product | |
CN115393148A (en) | Data monitoring system, monitoring method, device, medium and terminal for natural resources | |
CN115204501A (en) | Enterprise evaluation method and device, computer equipment and storage medium | |
CN114418450A (en) | Data processing method and device | |
CN114092216A (en) | Enterprise credit rating method, apparatus, computer device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 215000 No. 101, Suhong Middle Road, Suzhou Industrial Park, Jiangsu Province; Applicant after: Yuance Information Technology Co.,Ltd. Address before: 215000 No. 101, Suhong Middle Road, Suzhou Industrial Park, Jiangsu Province; Applicant before: SUZHOU INDUSTRIAL PARK SURVEYING MAPPING AND GEOINFORMATION Co.,Ltd. |
CB02 | Change of applicant information |