CN115358481A - Early warning and identification method, system and device for enterprise ex-situ migration

Info

Publication number: CN115358481A
Application number: CN202211083680.9A
Authority: CN
Inventors: 谢国城; 陈业强; 徐少强; 桂进军; 曾庆发; 廖小文
Original assignee: Guangdong Eshore Technology Co Ltd
Current assignee: Guangdong Eshore Technology Co Ltd
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2022-11-18

Abstract

The invention discloses a method for early warning and identification of enterprise outward migration, belonging to the technical field of enterprise risk control in big data and AI. The method quantitatively predicts and evaluates enterprise migration objectively and accurately, and offers high accuracy, good robustness and strong generalization. The method comprises the following steps: using a feature collector to acquire, according to specified requirements, qualifying enterprise industrial and commercial registration data, operator (telecom carrier) data and index data on enterprise support policies, and matching, splicing, cleaning and standardizing these data to obtain an enterprise feature data set; using a feature selector to select important feature items from all feature items of the enterprise feature data set to generate an important feature item subset; using a feature fusion device to fuse the important feature item subsets into a fused feature data set; and randomly dividing the fused feature data set into an enterprise feature data set for training and an enterprise feature data set for testing. The invention also discloses a system and a device for the early warning and identification.

Description

Early warning and identification method, system and device for enterprise outward migration

Technical Field

The invention relates to the technical field of enterprise risk control in big data and AI, and in particular to a method, a system and a device for early warning and identification of enterprise outward migration.

Background

With socio-economic development and institutional reform, enterprises relocate out of a region for reasons such as business positioning, rising costs, environment and policy. Although outward migration is normal market behavior, for a local government it reduces tax revenue, harms employment, and affects the stable and healthy development of the regional economy and hence local GDP growth; in high and new technology industries in particular, it hinders the transformation and upgrading of regional industry. At present, local governments learn of enterprise migration with a certain lag, so they cannot anticipate it or propose targeted countermeasures. Local governments therefore urgently need a method for effectively identifying enterprises at risk of migrating.

Traditional machine learning methods are already used in the industry to mine potential migrating enterprises. These methods mainly extract enterprise industrial and commercial data, apply simple preprocessing, and train a mining model with a single conventional machine learning algorithm and training mode, for example logistic regression or random forest, which finally outputs the probability that an enterprise is about to leave. The prediction results are not ideal. As a result, the relevant government departments cannot identify likely migrating enterprises quickly and accurately in advance, and cannot prepare countermeasures or formulate scientific and reasonable policies ahead of time.

A method for early warning and identification of enterprise outward migration is therefore urgently needed to give local governments more accurate advance judgments.

Disclosure of Invention

The invention aims to provide an early warning and identifying method for enterprise migrations, which can objectively and accurately carry out quantitative prediction and evaluation on enterprise migrations and has the characteristics of high accuracy, good robustness and excellent comprehensive generalization.

The second object of the invention is to provide a system for early warning and identification of enterprise outward migration, which effectively supports early warning and identification of enterprise migration risk. The third object of the invention is to provide a device for implementing the early warning and identification.

The first technical scheme adopted by the invention is as follows:

A method for early warning and identification of enterprise outward migration comprises the following steps:

(1) Using a feature collector to acquire, according to specified requirements, qualifying enterprise industrial and commercial registration data, operator data of the enterprise, and index data on the enterprise support policies of the enterprise's location; matching, splicing, cleaning and standardizing these data to obtain an enterprise feature data set; and matching known examples of migrated enterprises against the enterprise feature data set;

(2) Using a feature selector to select important feature items from all feature items of the enterprise feature data set to generate an important feature item subset;

(3) Using a feature fusion device to fuse the important feature item subsets into a fused feature data set;

(4) Randomly dividing the fused feature data set into an enterprise feature data set for training and an enterprise feature data set for testing;

(5) Inputting the enterprise feature data set for training into an algorithm strategy module to obtain an early warning identification model, inputting the enterprise feature data set for testing into the early warning identification model to obtain the set of migration probabilities of the test enterprises, and calculating the evaluation value of the resulting early warning identification model;

(6) Selecting a trained early warning identification model with a good evaluation value, and inputting the feature data of the enterprise requiring early warning identification into the selected model to obtain an output value, namely the migration probability of the enterprise.

Further, the step (1) comprises the following steps:

(1.1) starting the feature collector, calling a multi-source data interface, and acquiring from the enterprise industrial and commercial registration data dimension index data of enterprises nationwide, such as enterprise code, enterprise name, unified social credit code, enterprise type, latest industry type, registered capital, paid-in capital, operating duration, business scope, number of employees covered by social insurance, number of outbound investments, and A-level taxpayer status;

(1.2) acquiring from the operator data dimension index data of existing enterprise customers, such as certificate number, number of fixed telephones installed, number of broadband lines installed, last month's fixed telephone installations, last month's broadband installations, newly installed fixed telephones, newly installed broadband lines, newly removed fixed telephones, removed broadband lines, relocated fixed telephones, and relocated broadband lines;

(1.3) acquiring from a policy data source index data on the enterprise support policies of each enterprise's location; matching the industrial and commercial enterprise profile data with the operator enterprise profile data by unified social credit code; and matching in the support policy index data by enterprise location, to obtain a spliced original feature data set;

(1.4) cleaning and standardizing the spliced original feature data set to obtain an enterprise feature data set;

and (1.5) matching known examples of migrated enterprises against the enterprise feature data set, marking successfully matched enterprises as positive examples and unmatched enterprises as negative examples.

Further, the step (1) further comprises the following steps:

(1.6) using a feature generator to apply to the original feature data set time processing based on manual experience, data binning, combined classification processing, and automated feature generation by calling the FeatureTools module;

and (1.7) using a feature extractor to apply One-Hot Encoding, LDA processing, cosine similarity processing or neural network processing to the original feature data set.

Further, the step (2) comprises the following steps:

(2.1) selecting an optimal feature item from all feature items of the enterprise feature data set as a selected set;

(2.2) selecting an optimal feature item from the feature items in the enterprise feature data set except the selected set, adding the feature item into the selected set, and calculating the gain value of the selected set at the moment;

(2.3) evaluating the gain value of the selected set, and if the gain value of the selected set is not the maximum, repeating the step (2.2);

and (2.4) combining the data in the enterprise feature data set that correspond to the feature items in the selected set to generate an important feature item subset.

Further, in the step (2.2), the gain value is calculated according to the formula:

$$\mathrm{Gain}(A) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$$

wherein Gain(A) is the gain value of the selected set A, Ent(D) is the information entropy of the enterprise feature data set, D is the enterprise feature data set, V is the number of subsets into which D is divided, and $D^v$ is the v-th subset of the enterprise feature data set;

the formula of the information entropy Ent(D) is:

$$\mathrm{Ent}(D) = -\sum_{k=1}^{n} p_k \log_2 p_k$$

wherein $p_k$ is the proportion of samples of the k-th class in the enterprise feature data set, and n is the number of sample classes in the enterprise feature data set.

Further, in the step (5), the evaluation value of the early warning identification model is calculated as:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

wherein F1 is the evaluation value, Precision is the proportion of true positive samples among the samples predicted positive by the early warning identification model, and Recall is the proportion of true positive samples among the actual positive samples.

Further, in the step (6), the output value ranges from 0 to 1: the enterprise migration risk is low when the output value is less than 0.25, medium when it is greater than or equal to 0.25 and less than 0.5, medium-high when it is greater than or equal to 0.5 and less than 0.75, and extremely high when it is greater than or equal to 0.75 and less than 1.

The second technical scheme adopted by the invention is as follows:

a system for early warning identification of enterprise migrations, comprising:

a feature collector: used for acquiring, cleaning and standardizing the enterprise industrial and commercial registration data, the operator enterprise profile data, and the index data on enterprise support policies of each enterprise's location, for the enterprises requiring early warning identification;

a feature selector: used for selecting features from the features output by the feature collector;

a feature fusion device: used for fusing the features output by the feature collector and the features output by the feature selector;

an algorithm strategy module: used for training a plurality of weak learners and constructing the early warning identification model by fusing the output results of the weak learners;

the output end of the feature collector is connected with the input end of the feature selector, the output end of the feature selector is connected with the input end of the feature fusion device, and the output end of the feature fusion device is connected with the input end of the algorithm strategy module.

Further, the system also comprises:

a feature generator: used for creating new data features from the data output by the feature collector by way of model label prediction;

a feature extractor: used for extracting features from the text data in the data output by the feature generator;

the output end of the feature collector is connected with the input end of the feature generator and the input end of the feature extractor respectively, and the output end of the feature generator and the output end of the feature extractor are both connected with the input end of the feature selector.

The third technical scheme adopted by the invention is as follows:

an apparatus for early warning of enterprise migration, comprising a memory storing a computer program and a processor, wherein the processor implements the steps of the method according to any one of claims 1-7 when executing the computer program.

Compared with the prior art, the invention has the following beneficial effects: high accuracy, good robustness and strong generalization.

1. In the method for early warning and identification of enterprise outward migration, features suited to the early warning identification model are screened from complex enterprise samples by an objective method based on correlation analysis, which avoids the subjectivity and limitations of manual screening and yields high accuracy, good robustness and strong generalization;

the feature selector obtains the most distinctive information from multiple related original feature sets and eliminates redundant information caused by correlation between feature sets, improving model performance; it also further increases feature diversity, and the system remains robust when a system component is not called or is missing;

the feature fusion device keeps the mapped sample set well separable while shortening the time gradient descent needs to reach the optimal solution, so that subsequent computation improves model precision and reduces precision loss;

predicting enterprise outward migration with a model fusion method gives higher accuracy, better model robustness and better generalization;

the feature collector, feature generator, feature extractor, feature fusion device and algorithm strategy module are system components that work independently and run without interfering with one another.

2. In the system for early warning and identification of enterprise outward migration, the feature collector, feature selector, feature fusion device and algorithm strategy module cooperate with one another to form a complete, portable system that effectively supports early warning and identification of enterprise migration risk.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a schematic diagram of the present system;

FIG. 2 is a schematic structural diagram of a feature collector in the system;

FIG. 3 is a schematic diagram of the structure of a feature generator in the present system;

FIG. 4 is a schematic diagram of the structure of the feature extractor in the present system;

FIG. 5 is a schematic diagram of the structure of the feature selector in the present system;

FIG. 6 is a schematic diagram of the structure of the feature fuser in the present system;

FIG. 7 is a schematic diagram of the structure of the algorithm strategy module in the present system.

Detailed Description

The technical solution of the present invention will be further described in detail with reference to the following embodiments, but the present invention is not limited thereto.

The method for early warning and identification of enterprise outward migration of the invention comprises the following steps:

(1) A feature collector acquires, according to specified requirements, qualifying enterprise industrial and commercial registration data, operator data of the enterprise, and index data on the enterprise support policies of the enterprise's location. These data are matched, spliced, cleaned and standardized to obtain an enterprise feature data set, and known examples of migrated enterprises are matched against the enterprise feature data set.

Wherein, the step (1) comprises the following steps:

(1.1) The feature collector is started and a multi-source data interface is called to acquire, from the enterprise industrial and commercial registration data, dimension index data of enterprises nationwide, such as enterprise code, enterprise name, unified social credit code, enterprise type, latest industry type, registered capital, paid-in capital, operating duration, business scope, number of employees covered by social insurance, number of outbound investments, and A-level taxpayer status.

(1.2) Dimension index data of enterprise customers are acquired from the operator data, such as certificate number, number of fixed telephones installed, number of broadband lines installed, last month's fixed telephone installations, last month's broadband installations, newly installed fixed telephones, newly installed broadband lines, newly removed fixed telephones, removed broadband lines, relocated fixed telephones, and relocated broadband lines.

(1.3) Index data on the enterprise support policies of each enterprise's location are acquired from the policy data source. The industrial and commercial enterprise profile data are matched with the operator enterprise profile data by unified social credit code, and the support policy index data are matched in by enterprise location, to obtain a spliced original feature data set.

(1.4) The spliced original feature data set is cleaned and standardized to obtain the enterprise feature data set. The feature preprocessing interface of the feature collector is called to handle missing values, abnormal values, discrete features and numerical features in the spliced original feature data set. Missing value handling includes, but is not limited to, mean filling, median filling and sample deletion. Abnormal value handling detects values that deviate from the data distribution and then applies, without limitation, mean filling, median filling or sample deletion. Discrete feature handling includes, but is not limited to, Label Encoding, One-Hot Encoding and mean encoding. Numerical feature handling includes, but is not limited to, interval scaling, binarization and normalization.
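As a minimal sketch of steps (1.3) and (1.4), the following joins the three sources, fills missing values with the median, applies interval scaling, and labels known migrated enterprises. The dictionary layout, the field names (`region`, `registered_capital`, `broadband_installs`, `policy_count`) and the function name are hypothetical and chosen only for illustration; the patent does not specify a data format.

```python
from statistics import median

def build_feature_set(business, operator, policy_by_region, migrated_codes, numeric_cols):
    """Join the three sources, fill gaps, min-max scale, and label each enterprise.

    business, operator: dicts keyed by unified social credit code (assumed layout).
    policy_by_region:   dict keyed by enterprise location.
    migrated_codes:     set of credit codes of known migrated enterprises.
    """
    rows = []
    for code, biz in business.items():
        if code not in operator:  # inner join on the unified social credit code
            continue
        row = {"code": code, **biz, **operator[code]}
        row.update(policy_by_region.get(biz["region"], {}))  # join policy data on location
        row["label"] = 1 if code in migrated_codes else 0    # positive = known migrated
        rows.append(row)

    for col in numeric_cols:
        known = [r[col] for r in rows if r.get(col) is not None]
        fill = median(known)  # median filling of missing values
        for r in rows:
            if r.get(col) is None:
                r[col] = fill
        lo = min(r[col] for r in rows)
        hi = max(r[col] for r in rows)
        for r in rows:  # interval (min-max) scaling to [0, 1]
            r[col] = 0.0 if hi == lo else (r[col] - lo) / (hi - lo)
    return rows
```

Median filling and min-max scaling are only two of the options the text lists; mean filling, sample deletion or binarization would slot into the same loop.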

Further, the method also comprises the following step:

(1.6) A feature generator applies to the original feature data set time processing based on manual experience, data binning, combined classification processing, and automated feature generation by calling the FeatureTools module.

When the feature generator is called, a parameter specifies whether to use the feature generation mode based on manual experience or the automated feature generation mode. Time processing based on manual experience classifies times to generate features such as weekday or weekend, or morning, afternoon or evening. Data binning performs binning analysis on selected feature data to generate corresponding categorical feature data. For example, for the registered capital and operating duration fields of enterprises, the minimum observed value (lower edge), 25% quantile (Q1), median, 75% quantile (Q3) and maximum observed value are calculated, and the value range is divided according to the data distribution into the intervals [minimum observed value, 25% quantile], [25% quantile, median], [median, 75% quantile] and [75% quantile, maximum observed value], which correspond to the four classes 1, 2, 3 and 4 respectively. Combined classification processing performs feature crossing between fields, including but not limited to generating data features by addition, subtraction, multiplication, division or weighting. The automated FeatureTools module automatically generates new features through Transformation and Aggregation operations according to the relationships between data.
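The quartile binning described above can be sketched as follows. The function name is hypothetical, and two details are assumptions the text leaves open: boundary values are assigned to the lower bin, and quartiles are computed with the "inclusive" interpolation of Python's `statistics.quantiles`.

```python
from statistics import quantiles

def quartile_bins(values):
    """Map each value to class 1-4 according to the quartile interval it falls in."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")

    def bin_of(v):
        if v <= q1:
            return 1   # [minimum, Q1]
        if v <= med:
            return 2   # (Q1, median]
        if v <= q3:
            return 3   # (median, Q3]
        return 4       # (Q3, maximum]

    return [bin_of(v) for v in values]

# Example with a hypothetical "operating duration" column (in years).
print(quartile_bins([1, 2, 3, 4, 5, 6, 7, 8]))  # [1, 1, 2, 2, 3, 3, 4, 4]
```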

The feature extractor encapsulates an LDA topic model for LDA processing. It extracts the topic type of the enterprise support policy text of each enterprise's location and applies One-Hot Encoding to that type to obtain the enterprise support policy label. The LDA topic model holds that a topic can be represented by a distribution over words and a document by a distribution over topics. An LDA document is generated as follows: sample the topic distribution $\theta_i$ of document $i$ from the Dirichlet distribution with parameter $\alpha$; sample the topic $z_{ij}$ of the $j$-th word of document $i$ from the multinomial topic distribution $\theta_i$; sample the word distribution $\phi_{z_{ij}}$ of topic $z_{ij}$ from the Dirichlet distribution with parameter $\beta$; and finally sample the word $w_{ij}$ from the multinomial word distribution $\phi_{z_{ij}}$. The support policy of the enterprise's location obtained in this way is then matched against the enterprise's type: where they match, the enterprise policy label is assigned the corresponding positive probability value; otherwise its value is 0.
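To make the generative process concrete, here is a small sketch that samples one synthetic document. It illustrates the generative story only and is not a topic-inference routine; the vocabulary, hyperparameter values and function names are hypothetical. As in the usual formulation of LDA, the sketch draws one word distribution per topic up front rather than once per word.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(n_words, alpha, beta, vocab, rng):
    """Sample (theta, topics, words) for one document under the LDA generative story."""
    n_topics = len(alpha)
    # Word distribution of each topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet(beta, rng) for _ in range(n_topics)]
    # Topic distribution of this document, drawn from Dirichlet(alpha).
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = rng.choices(range(n_topics), weights=theta)[0]  # topic of this word
        w = rng.choices(vocab, weights=phi[z])[0]           # word given its topic
        topics.append(z)
        words.append(w)
    return theta, topics, words

vocab = ["subsidy", "tax", "rent", "talent", "loan"]  # hypothetical policy vocabulary
theta, topics, words = generate_document(
    n_words=8, alpha=[0.5, 0.5], beta=[0.1] * len(vocab), vocab=vocab, rng=random.Random(0)
)
```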

The feature extractor encapsulates a cosine similarity method for cosine similarity processing, which calculates the correlation between the words of the enterprise's business category and the high-frequency words of the enterprise support policies of the enterprise's location, and uses this correlation as a new feature. The correlation is the text similarity given by the Jaccard similarity:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

wherein A is the set of words of the enterprise's business category and B is the set of high-frequency words of the enterprise support policies of the enterprise's location. In general, a word-level approximation can be used for short texts, and word vectors based on natural language understanding can also be used for the measurement.
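A word-level Jaccard similarity between the two word sets can be sketched as below. The example words are made up, and returning 0.0 for two empty sets is a convention assumed here, not something the text specifies.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of words."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumed convention for two empty word sets
    return len(a & b) / len(a | b)

business_words = ["software", "development", "consulting"]   # hypothetical category words
policy_words = ["software", "subsidy", "development", "tax"]  # hypothetical policy words
print(jaccard(business_words, policy_words))  # 0.4
```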

(2) A feature selector selects important feature items from all feature items of the enterprise feature data set to generate an important feature item subset.

This step comprises the following steps:

(2.1) Select an optimal feature item from all feature items of the enterprise feature data set as the selected set.

(2.2) Select an optimal feature item from the feature items of the enterprise feature data set outside the selected set, add it to the selected set, and calculate the gain value of the selected set at that point.

(2.3) Evaluate the gain value of the selected set; if it is not the maximum, repeat step (2.2).

Given a feature set $\{a_1, a_2, \ldots, a_n\}$, an optimal single feature item is selected first, for example $\{a_2\}$, as the first-round selected set. One more feature item is then added to it to construct candidate subsets containing two feature items, for example $\{a_2, a_4\}$, and the optimal two-feature subset becomes the second-round selected set, and so on until no better feature subset can be found. Under this search strategy, for a data set D in which the proportion of samples of the i-th class is $p_i$ (i = 1, 2, …, n), the information entropy is defined as:

$$\mathrm{Ent}(D) = -\sum_{i=1}^{n} p_i \log_2 p_i$$

For an attribute subset A, suppose D is divided according to the values of A into V subsets $\{D^1, D^2, \ldots, D^V\}$, where the samples in each subset take the same value on A. The information gain of attribute subset A is then:

$$\mathrm{Gain}(A) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$$

The larger the information gain Gain(A), the more information useful for classification the feature subset A contains. For each candidate feature subset, its information gain is therefore computed on the training data set D and used as the evaluation criterion. Combining this feature subset search mechanism with the subset evaluation mechanism is the basic principle of the feature selector.
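The search-plus-evaluation loop can be sketched as greedy forward selection scored by information gain. The sketch assumes discrete feature values stored in per-sample dictionaries and stops when adding a feature no longer raises the gain; the function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Ent(D) = -sum_k p_k log2 p_k over the class proportions of `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, subset):
    """Gain(A): partition D by the joint value of the features in `subset`."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[f] for f in subset)].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def forward_select(rows, labels, features):
    """Add the best feature each round until the gain stops improving."""
    selected, best = [], 0.0
    while True:
        scored = [(info_gain(rows, labels, selected + [f]), f)
                  for f in features if f not in selected]
        if not scored:
            break
        gain, feat = max(scored)
        if gain <= best + 1e-12:  # no improvement: stop the search
            break
        selected.append(feat)
        best = gain
    return selected, best

rows = [{"a": 0, "b": 0, "c": 1}, {"a": 0, "b": 1, "c": 0},
        {"a": 1, "b": 0, "c": 0}, {"a": 1, "b": 1, "c": 1}]
labels = [0, 0, 1, 1]  # here the label is fully determined by feature "a"
print(forward_select(rows, labels, ["a", "b", "c"]))  # (['a'], 1.0)
```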

The feature selector encapsulates calls to various feature selection methods, including but not limited to the filter method (Filter Method), the wrapper method (Wrapper Method) and the embedded method (Embedded Method).

S21. The filter method first performs feature selection on the data set and then uses the selected features to train the model. The Relief (Relevant Features) method designs a "correlation statistic" to measure the importance of features. The statistic is a vector, each component of which corresponds to one initial feature, and the importance of a feature subset is determined by the sum of the components of the correlation statistic corresponding to the features in the subset. Given a training set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, for each instance $x_i$, Relief finds its nearest neighbor $x_{i,nh}$ among the samples of the same class, called the "near-hit", and its nearest neighbor $x_{i,nm}$ among the samples of a different class, called the "near-miss". The component of the correlation statistic corresponding to attribute j is

$$\delta^j = \sum_{i} \left[ -\mathrm{diff}\left(x_i^j, x_{i,nh}^j\right)^2 + \mathrm{diff}\left(x_i^j, x_{i,nm}^j\right)^2 \right]$$

where $x_a^j$ denotes the value of sample $x_a$ on attribute j, and $\mathrm{diff}(x_a^j, x_b^j)$ depends on the type of attribute j: if attribute j is discrete, $\mathrm{diff}(x_a^j, x_b^j) = 0$ when $x_a^j = x_b^j$ and 1 otherwise; if attribute j is continuous, $\mathrm{diff}(x_a^j, x_b^j) = |x_a^j - x_b^j|$, where $x_a^j$ and $x_b^j$ have been normalized to the interval [0, 1].
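A compact sketch of the Relief correlation statistic for a two-class problem follows. It assumes continuous attributes are already scaled to [0, 1] and uses the squared per-attribute difference summed over attributes as the neighbor distance, which is one common choice the text does not fix; the function names are hypothetical.

```python
def diff(a, b, discrete):
    """Per-attribute difference: 0/1 for discrete values, |a - b| for continuous ones."""
    if discrete:
        return 0.0 if a == b else 1.0
    return abs(a - b)

def relief(X, y, discrete_flags):
    """Return one correlation-statistic component per attribute."""
    d = len(X[0])

    def dist(u, v):
        return sum(diff(u[j], v[j], discrete_flags[j]) ** 2 for j in range(d))

    delta = [0.0] * d
    for i, (xi, yi) in enumerate(zip(X, y)):
        same = [X[k] for k in range(len(X)) if k != i and y[k] == yi]
        other = [X[k] for k in range(len(X)) if y[k] != yi]
        if not same or not other:
            continue  # no near-hit or near-miss available for this sample
        near_hit = min(same, key=lambda v: dist(xi, v))
        near_miss = min(other, key=lambda v: dist(xi, v))
        for j in range(d):
            # Reward attributes that differ on the near-miss, penalize those that differ on the near-hit.
            delta[j] += (-diff(xi[j], near_hit[j], discrete_flags[j]) ** 2
                         + diff(xi[j], near_miss[j], discrete_flags[j]) ** 2)
    return delta

# Attribute 0 separates the classes, attribute 1 does not.
X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
scores = relief(X, y, discrete_flags=[False, False])
```

In this toy data the first component comes out positive and the second negative, so a filter that keeps the highest-scoring attributes would retain attribute 0.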

S22. The wrapper method directly uses the performance of the learner that will finally be used as the evaluation criterion for features; that is, wrapper feature selection aims to select the feature subset most beneficial to the performance of a given learner. The wrapper feature selection method of the invention performs subset search with a random strategy under the Las Vegas method framework and uses the error of the final classifier as the feature subset evaluation criterion. The Las Vegas algorithm searches the feature subsets randomly and sets a stopping condition control parameter, so that the search does not run for an excessively long time without stopping, as can happen when the parameter is too large.
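The random subset search with a stopping parameter can be sketched as follows. The learner here is a leave-one-out 1-nearest-neighbor classifier, chosen only to keep the example self-contained; the patent does not name the learner used inside the wrapper. `max_stall` is a hypothetical name for the stopping condition control parameter: the search ends after that many consecutive candidates fail to improve on the best subset.

```python
import random

def loo_error(X, y, subset):
    """Leave-one-out error of a 1-nearest-neighbor classifier restricted to `subset`."""
    errors = 0
    for i in range(len(X)):
        best_k, best_d = None, float("inf")
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in subset)
            if d < best_d:
                best_k, best_d = k, d
        errors += y[best_k] != y[i]
    return errors / len(X)

def lvw(X, y, max_stall, rng):
    """Las Vegas wrapper: keep the random subset with the lowest error, preferring fewer features."""
    d = len(X[0])
    best_subset = list(range(d))
    best_err = loo_error(X, y, best_subset)
    stall = 0
    while stall < max_stall:
        candidate = [j for j in range(d) if rng.random() < 0.5]
        if not candidate:
            stall += 1
            continue
        err = loo_error(X, y, candidate)
        if err < best_err or (err == best_err and len(candidate) < len(best_subset)):
            best_subset, best_err, stall = candidate, err, 0
        else:
            stall += 1
    return best_subset, best_err

X = [[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
subset, err = lvw(X, y, max_stall=20, rng=random.Random(0))
```

Because every candidate requires retraining and re-evaluating the learner, the cost grows quickly with the number of features, which is why a bounded `max_stall` matters in practice.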

S23. In the embedded method, the feature selection process and the learner training process are fused and completed in the same optimization process; that is, feature selection is performed automatically during learner training. For a given data set

$$D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$$

where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$, in one embodiment of the regression model of the invention the optimization objective is

$$\min_{w} \sum_{i=1}^{m} \left(y_i - w^{\mathsf T} x_i\right)^2 + \lambda \lVert w \rVert_1$$

where the regularization parameter $\lambda > 0$ and $\lVert w \rVert_1$ is the $L_1$ norm. The invention uses the $L_1$ norm because it yields a sparse solution more readily than the $L_2$ norm; that is, the resulting w has fewer non-zero components.
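One way to see the sparsity effect is to minimize a squared-error objective with an L1 penalty, sum_i (y_i - w·x_i)^2 + lam * ||w||_1, by coordinate descent with soft thresholding. The solver choice is an assumption made for illustration; the patent does not say how the objective is optimized. For this objective the per-coordinate threshold is `lam / 2`.

```python
def soft_threshold(a, t):
    """Shrink `a` toward zero by `t`; values within [-t, t] become exactly 0."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for sum_i (y_i - w.x_i)^2 + lam * ||w||_1."""
    m, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(d) if k != j))
                for i in range(m)
            )
            z = sum(X[i][j] ** 2 for i in range(m))
            w[j] = soft_threshold(rho, lam / 2) / z if z > 0 else 0.0
    return w

# Feature 0 carries signal, feature 1 only a weak one that the penalty removes.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [2, 0.1, 2, 0.1]
print(lasso_cd(X, y, lam=1.0))  # [1.75, 0.0]
```

The second weight is driven exactly to zero, which is the automatic feature selection the embedded method relies on.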

(3) A feature fusion device fuses the important feature item subsets into a fused feature data set.

The feature fusion device encapsulates the add fusion method and the concat fusion method. The add fusion method is a parallel strategy that combines two feature vectors into one complex vector: for input features $x_1$ and $x_2$, $z = x_1 + i x_2$, where i is the imaginary unit. The concat fusion method directly concatenates two features: if the input features x and y have dimensions p and q, the output feature z has dimension p + q.

Further, in addition to front-end fusion, the feature fusion device may also encapsulate mid-level fusion and back-end fusion.
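The two fusion strategies can be sketched in a few lines. The function names are hypothetical; the add fusion shown here follows the complex-vector definition given in the text and therefore requires both inputs to have the same length.

```python
def add_fusion(x1, x2):
    """Parallel fusion: combine two equal-length vectors into one complex vector z = x1 + i*x2."""
    if len(x1) != len(x2):
        raise ValueError("add fusion needs vectors of equal dimension")
    return [complex(a, b) for a, b in zip(x1, x2)]

def concat_fusion(x, y):
    """Serial fusion: concatenate a p-dimensional and a q-dimensional vector into p + q dimensions."""
    return list(x) + list(y)

print(add_fusion([1.0, 2.0], [3.0, 4.0]))      # [(1+3j), (2+4j)]
print(concat_fusion([1.0, 2.0, 3.0], [4.0, 5.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```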

(4) The fused feature data set is randomly divided into an enterprise feature data set for training and an enterprise feature data set for testing, with the ratio of data items in the training set to the test set between 1:1 and 4:1.
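A random split at a given train-to-test ratio can be sketched as below. The default of 4:1 is the upper end of the range stated in the text; the function name and the fixed seed are illustrative assumptions.

```python
import random

def random_split(rows, train_parts=4, test_parts=1, seed=0):
    """Shuffle the rows and cut them at train_parts : test_parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]

train, test = random_split(range(10))          # 8 training rows, 2 test rows
half_a, half_b = random_split(range(10), 1, 1)  # 1:1 split, 5 rows each
```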

(5) The enterprise feature data set for training is input into the algorithm strategy module to obtain an early warning identification model; the enterprise feature data set for testing is input into the early warning identification model to obtain the set of migration probabilities of the test enterprises; and the evaluation value of the resulting early warning identification model is calculated.

The algorithm strategy module trains a plurality of base learning models using K-fold cross validation and then fuses their output results; the fusion result is the predicted enterprise migration probability. Specifically, the algorithm strategy module divides the received feature data set into K equal parts, preferably K = 10. Each weak learner is trained on K-1 parts with the remaining part used as the test set, and the prediction results of all weak learners form the training set that is input to the fusion model. The weak learner may be a random forest model, a decision tree model, a support vector machine model, or a deep neural network model. In the early warning identification method, the weak learners (logistic regression, LightGBM, random forest, neural network and other models) are each trained on every choice of K-1 parts under K-fold cross validation; logistic regression is selected in the fusion learning layer, and the prediction results of the trained weak learner models are used as the input of this second-layer fusion model. In general, K is a positive integer with K ≥ 3.
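The two-layer fusion described above can be sketched as follows. To stay self-contained, the weak learners here are hypothetical one-feature threshold "stumps", standing in for the logistic regression, LightGBM, random forest and neural network models named in the text; the second layer is a plain logistic regression trained by stochastic gradient descent. Fold assignment, learning rate and epoch count are illustrative assumptions.

```python
import math

def fit_stump(X, y, j):
    """Stand-in weak learner: split feature j at its mean, predict the positive rate per side."""
    cut = sum(row[j] for row in X) / len(X)

    def rate(side):
        ys = [yi for row, yi in zip(X, y) if (row[j] > cut) == side]
        return sum(ys) / len(ys) if ys else 0.5

    lo, hi = rate(False), rate(True)
    return lambda row: hi if row[j] > cut else lo

def fit_logistic(Z, y, lr=0.5, epochs=200):
    """Second-layer fusion model: logistic regression fitted by stochastic gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, yi in zip(Z, y):
            p = 1 / (1 + math.exp(-(b + sum(wk * zk for wk, zk in zip(w, z)))))
            g = p - yi
            w = [wk - lr * g * zk for wk, zk in zip(w, z)]
            b -= lr * g
    return lambda z: 1 / (1 + math.exp(-(b + sum(wk * zk for wk, zk in zip(w, z)))))

def stack(X, y, k=3):
    """Train weak learners on K-1 folds, collect out-of-fold predictions, fit the fusion model."""
    n, d = len(X), len(X[0])
    folds = [list(range(i, n, k)) for i in range(k)]
    oof = [[0.0] * d for _ in range(n)]  # one column per weak learner
    for held_out in folds:
        train = [i for i in range(n) if i not in held_out]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for j in range(d):
            model = fit_stump(Xt, yt, j)
            for i in held_out:
                oof[i][j] = model(X[i])
    meta = fit_logistic(oof, y)
    base = [fit_stump(X, y, j) for j in range(d)]  # refit weak learners on all data
    return lambda row: meta([m(row) for m in base])

X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.8, 2], [0.9, 5], [1.0, 1]]
y = [0, 0, 0, 1, 1, 1]
predict = stack(X, y, k=3)
p_high, p_low = predict([0.95, 3]), predict([0.05, 3])
```

Feeding the fusion model only out-of-fold predictions keeps it from being trained on outputs the weak learners produced for samples they had already seen.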

The evaluation value of the early warning identification model is calculated as:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

wherein F1 is the evaluation value, the harmonic mean of Precision and Recall; Precision is the proportion of true positive samples among the samples predicted positive by the early warning identification model; and Recall is the proportion of true positive samples among the actual positive samples.
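As a worked example of the evaluation value, the sketch below computes Precision, Recall and F1 from binary labels. The function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def f1_value(y_true, y_pred):
    """F1 = 2 * Precision * Recall / (Precision + Recall) for binary labels (1 = migrated)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 2 true positives, 1 false positive, 1 false negative:
# Precision = 2/3, Recall = 2/3, so F1 = 2/3.
score = f1_value([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```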

(6) An early warning identification model with a good evaluation value is selected, and the feature data of the enterprise requiring early warning identification are input into it to obtain an output value, namely the enterprise's migration probability.

The output value ranges from 0 to 1: the enterprise migration risk is low when the output value is less than 0.25, medium when it is greater than or equal to 0.25 and less than 0.5, medium-high when it is greater than or equal to 0.5 and less than 0.75, and extremely high when it is greater than or equal to 0.75 and less than 1.

Example output values are as follows:

Enterprise code	Migration probability	Non-migration probability	Label
10**21	0.121	0.879	Low migration risk
10**32	0.622	0.378	Medium-high migration risk
10**13	0.11	0.89	Low migration risk
10**44	0.21	0.79	Low migration risk
10**15	0.961	0.039	Very high migration risk
10**75	0.101	0.899	Low migration risk

The method for early warning and identification of enterprise migration uses correlation analysis to objectively screen, from complex enterprise samples, the features suited to the early warning identification model. This avoids the subjectivity and limitations of manual screening and yields high accuracy, good robustness, and excellent overall generalization performance;

by providing the feature selector, the most discriminative information is obtained from the related original feature sets, and redundant information caused by correlation among different feature sets is eliminated, which improves model performance; feature diversity can also be further improved, and the system remains robust even when a system component is not invoked or is missing;

by providing the feature fusion device, the mapped sample set retains good separability while the time needed for gradient descent to reach the optimal solution is shortened, so that subsequent computation improves model precision and reduces precision loss;

by using a model fusion method to predict enterprise migration, accuracy is higher, model robustness is better, and overall generalization performance is better;

the feature collector, feature generator, feature extractor, feature fusion device, and algorithm strategy module are independently operating system components that run without interfering with one another.

Referring to Figs. 1 to 7, the system for early warning and identification of enterprise migration of the present invention includes:

a feature selector: selects important features from the features output by the feature collector, in order to eliminate the curse of dimensionality, improve model training efficiency, and reduce the risk of overfitting.

A feature fusion device: fuses the features output by the feature collector with the features output by the feature selector; it is constructed so that multiple features complement one another, giving the early warning identification model more robust and accurate identification results.

An algorithm strategy module: trains a plurality of weak learners and fuses their output results to construct the early warning identification model; it is constructed to improve the accuracy and precision of the algorithm at the model level.

The output end of the feature collector is connected with the input end of the feature selector, the output end of the feature selector is connected with the input end of the feature fusion device, and the output end of the feature fusion device is connected with the input end of the algorithm strategy module.
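The chained connection of the components can be pictured as function composition. The sketch below is purely illustrative: the `connect` helper, the feature names `revenue_growth`, `policy_score`, and `noise`, and the body of every stage are hypothetical placeholders rather than the logic of the actual components; only the wiring order (collector, selector, fusion device, algorithm strategy module) comes from the description.

```python
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, float]
Stage = Callable[[List[Record]], List[Record]]


def connect(*stages: Stage) -> Stage:
    """Wire stages so that each stage's output is the next stage's input."""
    def pipeline(records: List[Record]) -> List[Record]:
        return reduce(lambda data, stage: stage(data), stages, records)
    return pipeline


def feature_collector(records: List[Record]) -> List[Record]:
    # Placeholder: a real collector would match, splice, clean, and standardize data.
    return [dict(r) for r in records]


def feature_selector(records: List[Record]) -> List[Record]:
    # Placeholder: keep two hypothetical "important" features.
    keep = ("revenue_growth", "policy_score")
    return [{k: r[k] for k in keep} for r in records]


def feature_fusion(records: List[Record]) -> List[Record]:
    # Placeholder: add one combined feature.
    return [{**r, "fused": r["revenue_growth"] * r["policy_score"]} for r in records]


def algorithm_strategy(records: List[Record]) -> List[Record]:
    # Placeholder scoring rule, not a trained model.
    return [{**r, "migration_probability": min(1.0, max(0.0, 0.5 - r["fused"]))}
            for r in records]


system = connect(feature_collector, feature_selector, feature_fusion, algorithm_strategy)
result = system([{"revenue_growth": 0.2, "policy_score": 0.5, "noise": 9.0}])
```

Because each stage only depends on the output of the previous one, a stage can be replaced or tested in isolation without touching the others.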

In the system for early warning and identification of enterprise migration, the feature collector, the feature selector, the feature fusion device, and the algorithm strategy module cooperate with one another to form a complete, portable system that can effectively support early warning and identification of enterprise migration risk.

Further, the system also includes:

a feature generator: creates new data features from the data output by the feature collector by means of model label prediction, and can perform deep processing on the feature data set output by the feature collector.

A feature extractor: extracts features from the text data in the spliced original feature data set.

The device for early warning and identification of enterprise migration comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the steps of the method.

The above description is only exemplary of the invention; any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for early warning and identification of enterprise migration, characterized by comprising the following steps:

(5) inputting the enterprise feature data set for training into an algorithm strategy module to obtain an early warning identification model, inputting the enterprise feature data set for testing into the early warning identification model to obtain the migration probability of each test enterprise, and calculating an evaluation value of the obtained early warning identification model;

(6) selecting a trained early warning identification model with a good evaluation value, and inputting the enterprise feature data requiring early warning identification into the selected early warning identification model to obtain an output value, namely the enterprise migration probability.

2. The method for early warning and identification of enterprise migration according to claim 1, wherein the step (1) comprises the following steps:

(1.3) acquiring index data related to the enterprise support policies of the location of each enterprise from a policy data source, matching the enterprise portrait data of the enterprises and the enterprise portrait data of the operators according to unified social credit codes, and matching the index data related to the enterprise support policies according to the location of each enterprise to obtain a spliced original feature data set;

(1.5) matching known examples of migrated enterprises against the enterprise feature data set, wherein successfully matched enterprises are marked as positive examples and unmatched enterprises are marked as negative examples.

3. The method for early warning and identification of enterprise migration according to claim 2, wherein the step (1) further comprises the following steps:

4. The method for early warning and identification of enterprise migration according to claim 1, wherein the step (2) comprises the following steps:

5. The method for early warning and identification of enterprise migration according to claim 4, wherein in the step (2.2), the gain value is calculated according to the formula:

the formula of the information entropy Ent (D) is:

6. The method for early warning and identification of enterprise migration according to claim 1, wherein in the step (5), the evaluation value of the early warning identification model is calculated as:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

7. The method for early warning and identification of enterprise migration according to claim 1, wherein in the step (6), the output value ranges from 0 to 1; when the output value is less than 0.25, the enterprise migration risk is low; when the output value is greater than or equal to 0.25 and less than 0.5, the enterprise migration risk is medium; when the output value is greater than or equal to 0.5 and less than 0.75, the enterprise migration risk is medium-high; and when the output value is greater than or equal to 0.75 and less than 1, the enterprise migration risk is very high.

8. The system for early warning and identification of enterprise migration as claimed in claim 1, comprising:

9. The system for early warning and identification of enterprise migration as claimed in claim 8, further comprising:

10. A device for early warning and identification of enterprise migration according to claims 8-9, comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor implements the steps of the method according to any of claims 1-7 when executing the computer program.