CN114676961A - Enterprise external migration risk prediction method and device and computer readable storage medium - Google Patents
Enterprise external migration risk prediction method and device and computer readable storage medium
- Publication number
- CN114676961A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
Abstract
The invention discloses a method and a device for predicting enterprise external migration risk, and a computer-readable storage medium. The method comprises the following steps: acquiring an enterprise external migration information data set, wherein the data set comprises a plurality of pieces of enterprise data, and each piece of enterprise data comprises the external migration state and a plurality of enterprise features of the corresponding enterprise; vectorizing the enterprise external migration information data set to obtain a plurality of sample data in one-to-one correspondence with the enterprise data; constructing an enterprise external migration decision tree from the sample data; and, when evaluating whether a target enterprise has an external migration risk, inputting the enterprise features of the target enterprise into the enterprise external migration decision tree to obtain an external migration risk prediction result for the target enterprise. Compared with the prior art, the invention predicts enterprise external migration risk through decision-tree machine learning, which greatly reduces the workload of manual identification, avoids manual misjudgment, and, by learning from large amounts of data, greatly improves the accuracy of the prediction result.
Description
Technical Field
The invention relates to the technical field of risk assessment, and in particular to a method and a device for predicting enterprise external migration risk and a computer-readable storage medium.
Background
Enterprise external migration is a comprehensive choice that an enterprise makes about resources, production factors, markets, environment and the like as it develops and grows, and it is an inevitable product of a market economy that has developed to a certain stage. Manufacturing enterprises in particular are prone to move out under the influence of factors such as expansion needs, rising overall operating costs, and increasingly attractive investment-promotion offers from other cities. For a given region, once some enterprises move out, enterprises can easily move out as a whole; moreover, when a leading enterprise moves out, its supply chain is at risk of following it, which endangers the integrity of the industrial chain and brings a serious crisis to the development of related small and medium-sized enterprises.
Timely identification of enterprise external migration risk is therefore crucial to regional development. At present there is no effective early-warning mechanism for enterprise external migration risk on the market; judgment mostly relies on manual investigation, which involves a heavy workload and a high misjudgment rate.
Disclosure of Invention
In view of this, it is necessary to provide a method and a device for predicting enterprise external migration risk, and a computer-readable storage medium, so as to solve the technical problems of heavy workload and low accuracy when enterprise external migration risk is identified manually in the prior art.
To achieve the above object, an embodiment of the present invention provides a method for predicting enterprise external migration risk, comprising the following steps:
acquiring an enterprise external migration information data set, wherein the enterprise external migration information data set comprises a plurality of pieces of enterprise data, and each piece of enterprise data comprises the external migration state and a plurality of enterprise features of the corresponding enterprise;
vectorizing the enterprise external migration information data set to obtain a plurality of sample data in one-to-one correspondence with the enterprise data;
constructing an enterprise external migration decision tree from the sample data;
when evaluating whether a target enterprise has an external migration risk, inputting the enterprise features of the target enterprise into the enterprise external migration decision tree to obtain an external migration risk prediction result for the target enterprise.
Optionally, the step of constructing an enterprise external migration decision tree from the sample data includes:
selecting Z groups of sample data that are not identical from the sample data, wherein Z is a natural number greater than or equal to 2;
constructing Z corresponding enterprise external migration decision trees from the Z groups of sample data;
correspondingly, the step of inputting the enterprise features of the target enterprise into the enterprise external migration decision tree to obtain the external migration risk prediction result of the target enterprise includes:
inputting the enterprise features of the target enterprise into each of the Z enterprise external migration decision trees;
and determining the external migration risk prediction result of the target enterprise from the Z enterprise external migration risk prediction results output by the Z enterprise external migration decision trees.
Optionally, the step of determining the external migration risk prediction result of the target enterprise from the Z enterprise external migration risk prediction results output by the Z enterprise external migration decision trees includes:
according to an absolute majority voting method, taking the prediction result that occurs more than Z/2 times among the Z enterprise external migration risk prediction results as the external migration risk prediction result of the target enterprise.
Optionally, when the number of sample data is N and the number of enterprise features is M, the step of selecting Z groups of sample data that are not identical from the sample data includes:
performing Z rounds of random sampling with replacement on the sample data, wherein each round randomly selects n sample data from the N sample data and randomly selects m enterprise features from the M enterprise features to form one group of sample data, where m and n are natural numbers greater than or equal to 2, n is smaller than N, and m is smaller than M.
Optionally, the step of constructing an enterprise external migration decision tree from the sample data further includes:
judging whether the enterprise external migration risk prediction results obtained with the Z constructed enterprise external migration decision trees meet the accuracy index;
when the accuracy index is not met, adding one enterprise external migration decision tree;
and updating Z to Z + 1 and repeating the judging step until the judgment result is yes.
Optionally, the step of constructing an enterprise external migration decision tree from the sample data includes:
constructing the enterprise external migration decision tree from the sample data through a CART classification tree algorithm.
Optionally, the step of acquiring the enterprise external migration information data set includes:
acquiring an enterprise external migration data set and an enterprise information data set;
and fusing the enterprise external migration data set and the enterprise information data set to obtain the enterprise external migration information data set.
Optionally, before the step of fusing the enterprise external migration data set and the enterprise information data set, the method further includes:
performing data cleaning on the enterprise external migration data set and the enterprise information data set, wherein the data cleaning comprises deduplication and/or missing-value processing.
Another embodiment of the present invention provides an enterprise external migration risk prediction device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, causes the processor to implement the enterprise external migration risk prediction method described above.
Another embodiment of the present invention provides a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the enterprise external migration risk prediction method described above.
Compared with the prior art, the enterprise external migration risk prediction method provided by the embodiment of the invention first acquires an enterprise external migration information data set, then vectorizes the data set to obtain a plurality of sample data in one-to-one correspondence with the enterprise data, and then constructs an enterprise external migration decision tree from the sample data. When it is necessary to evaluate whether a target enterprise has an external migration risk, the enterprise features of the target enterprise are input into the enterprise external migration decision tree to obtain the external migration risk prediction result of the target enterprise.
Drawings
Fig. 1 is a flowchart of an embodiment of the enterprise external migration risk prediction method according to the present invention.
Fig. 2 is a block diagram of an embodiment of the enterprise external migration risk prediction device according to the present invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Example 1
Referring to Fig. 1, Fig. 1 is a flowchart of an enterprise external migration risk prediction method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: acquire an enterprise external migration information data set, wherein the data set comprises a plurality of pieces of enterprise data, and each piece of enterprise data comprises the external migration state and a plurality of enterprise features of the corresponding enterprise.
Specifically, the step of acquiring the enterprise external migration information data set includes:
(1) Acquire an enterprise external migration data set and an enterprise information data set.
The enterprise external migration data set contains the migration information of enterprises, such as enterprise name, migration time and migration state. It can be collected from relevant government or policy websites, by web crawler, manually, or by other means. The enterprise information data set contains the basic information of enterprises, such as enterprise name, enterprise type, business information and update time. It can be collected from the enterprises' own websites or from third-party enterprise query websites, again by crawler, manually, or by other means. Note that, when collecting data, the time dimensions of the two data sets must be kept consistent: if the migration time is the current month, the last update time of the enterprise information should also be the current month, so that the two types of data do not disagree in the time dimension. In addition, both data sets are usually structured data; once collection is complete, they are stored as Excel files or in a database.
(2) Fuse the enterprise external migration data set and the enterprise information data set to obtain the enterprise external migration information data set.
Specifically, before the two data sets are fused, the method further includes:
performing data cleaning on the enterprise external migration data set and the enterprise information data set, wherein the data cleaning includes deduplication and missing-value processing. Data cleaning is of course not limited to these and may include other cleaning steps.
Deduplication removes duplicate records from the two data sets. For example, duplicates can be removed with SQL statements, either by selected fields (such as the enterprise name) or by whole rows. This ensures that the information about any enterprise (both migration information and basic information) is unique in the two data sets, and avoids errors caused by duplicate records when the data are later used to build the decision tree.
Missing-value processing fills in the empty data items in the two data sets. A text-type data item is filled with the default value "unclassified"; a numerical data item is filled with the mean of that data item over the data set.
In addition to deduplication and missing-value processing, other cleaning operations may be included, such as removing irrelevant data items (for example system time and modification time) from the data sets.
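The cleaning rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the field names (`name`, `type`, `revenue`), the use of `None` to mark a missing value, and the helper names `deduplicate` and `fill_missing` are all hypothetical.

```python
from statistics import mean

def deduplicate(records, key="name"):
    """Keep the first record seen for each value of `key` (assumed: enterprise name)."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

def fill_missing(records, text_default="unclassified"):
    """Fill None values: text fields get a default label, numeric fields the column mean."""
    fields = {f for rec in records for f in rec}
    filled = [dict(rec) for rec in records]
    for field in fields:
        present = [rec[field] for rec in records if rec.get(field) is not None]
        numeric = bool(present) and all(isinstance(v, (int, float)) for v in present)
        fill = mean(present) if numeric else text_default
        for rec in filled:
            if rec.get(field) is None:
                rec[field] = fill
    return filled

# Hypothetical enterprise information records
raw = [
    {"name": "Company A", "type": "manufacturing", "revenue": 120.0},
    {"name": "Company A", "type": "manufacturing", "revenue": 120.0},  # duplicate row
    {"name": "Company B", "type": None, "revenue": None},              # missing values
    {"name": "Company C", "type": "retail", "revenue": 80.0},
]
clean = fill_missing(deduplicate(raw))
```

Here Company B ends up with type "unclassified" and a revenue equal to the mean of the two known revenues.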
After data cleaning, the cleaned enterprise external migration data set and enterprise information data set are fused, that is, the two data sets are merged. For example, if the enterprise external migration data set contains the migration information of Company A, the basic information of Company A in the enterprise information data set is appended to it, yielding a data set that contains both the migration information and the basic information of Company A, namely the enterprise external migration information data set. The merge can be performed with an SQL JOIN. In other embodiments, a new data set may instead be created, and the migration information of each company from the enterprise external migration data set and the basic information of each company from the enterprise information data set are added to it; the resulting data set contains a plurality of pieces of enterprise data, each comprising the migration information of one company and its corresponding basic information, and this data set is the enterprise external migration information data set. In the enterprise external migration information data set, the migration information includes the external migration state, while the enterprise type, business information and other contents of the basic information are the enterprise features. Analyzing the enterprise features of a plurality of enterprises with known external migration states provides the basis for external migration prediction.
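As an illustration of the SQL JOIN mentioned above, the following sketch merges two small tables with Python's built-in `sqlite3` module. The table names, column names and rows are hypothetical; the patent does not specify a schema or a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas for the two cleaned data sets
    CREATE TABLE migration (name TEXT, migration_time TEXT, migrated TEXT);
    CREATE TABLE info (name TEXT, type TEXT, revenue REAL);
    INSERT INTO migration VALUES ('Company A', '2021-06', 'yes'),
                                 ('Company B', '2021-06', 'no');
    INSERT INTO info VALUES ('Company A', 'manufacturing', 120.0),
                            ('Company B', 'retail', 80.0),
                            ('Company C', 'software', 50.0);
""")

# Inner join on the enterprise name: each result row carries the migration
# state plus the enterprise features of the same company.
rows = conn.execute("""
    SELECT m.name, m.migrated, i.type, i.revenue
    FROM migration AS m
    JOIN info AS i ON i.name = m.name
    ORDER BY m.name
""").fetchall()
conn.close()
```

Company C has no migration record, so the inner join leaves it out; whether such enterprises should be kept (for example with a LEFT JOIN from the information table) is a design choice the text does not settle.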
Step S102: vectorize the enterprise external migration information data set to obtain a plurality of sample data in one-to-one correspondence with the enterprise data.
Specifically, so that the decision tree can be generated later, the enterprise external migration information data set must be vectorized, that is, converted into a numerical form that a computer can recognize. In a specific implementation, either of two schemes can be adopted:
Scheme 1: use an automatic tool, such as DictVectorizer from sklearn, to vectorize the text data. For example, the data item [external migration state] takes two values, yes and no; after vectorization, yes is converted into [1, 0] and no into [0, 1], a format in which the computer can recognize and process the data item.
Scheme 2: manually label and vectorize each data item. For example, manually modify the [external migration state] data item of the enterprise external migration information data set, marking "yes" as [1, 0] and "no" as [0, 1].
The external migration state and each enterprise feature in one piece of enterprise data are vectorized, and the resulting vectorized data serve as the sample data of that enterprise. Correspondingly, vectorizing the plurality of pieces of enterprise data in the enterprise external migration information data set converts them into a format a computer can process and yields a plurality of sample data in one-to-one correspondence with the enterprise data, which are used to construct the enterprise external migration decision tree.
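The encoding used in both schemes is a one-hot encoding, which can be sketched in plain Python as follows. The `schema` dictionary, its field names and the helper names are assumptions made for illustration; a library tool such as DictVectorizer chooses its own column order, so its output columns may be arranged differently from the [1, 0] / [0, 1] convention shown in the text.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]

def vectorize(record, schema):
    """Concatenate the one-hot codes of every field listed in `schema`."""
    vec = []
    for field, categories in schema.items():
        vec.extend(one_hot(record[field], categories))
    return vec

# Hypothetical schema: the category order fixes the column order,
# so "yes" maps to [1, 0] and "no" to [0, 1] as in the text.
schema = {
    "migrated": ["yes", "no"],
    "type": ["manufacturing", "retail", "unclassified"],
}

sample_yes = vectorize({"migrated": "yes", "type": "retail"}, schema)
sample_no = vectorize({"migrated": "no", "type": "manufacturing"}, schema)
```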
Step S103: construct an enterprise external migration decision tree from the sample data.
In this embodiment, the enterprise external migration decision tree is constructed through the CART classification tree algorithm. Specifically, the construction process comprises the following steps:
The sample data are divided into a test set and a training set in a certain proportion, a threshold for the Gini coefficient is set, and training then proceeds as follows:
Starting from the root node, the algorithm recursively builds a CART classification tree (namely the enterprise external migration decision tree) from the training set.
(1) Let the data set of the current node be Z1. If the samples have no features, return the decision subtree and stop recursion at the current node.
(2) Calculate the Gini coefficient of the sample set. If it is smaller than the threshold, return the decision subtree and stop recursion at the current node.
(3) For each existing feature of the current node, calculate the Gini coefficient of the data set Z1 with respect to each value of that feature.
(4) Among the calculated Gini coefficients, select the feature A and corresponding feature value a with the smallest Gini coefficient. Using this optimal feature and optimal feature value, divide the data set into two parts, Z11 and Z12, and create the left and right child nodes of the current node, where the data set of the left node is Z11 and that of the right node is Z12.
(5) Recursively apply steps (1) to (4) to the left and right child nodes to generate the decision tree.
The principle of constructing a decision tree with the CART classification tree algorithm described above is prior art; in this embodiment, sample data on enterprise external migration are used to construct an enterprise external migration decision tree for predicting enterprise external migration risk.
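Steps (1) to (5) can be sketched as a small recursive routine. This is a simplified illustration rather than the patented implementation: it assumes categorical features encoded as integers, splits on "equals value a" versus "does not equal a", omits pruning and the train/test split, and the toy features and the default threshold of 0.05 are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini coefficient of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Steps (3)-(4): return (feature, value, weighted Gini) of the best binary split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for v in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] == v]
            right = [y for row, y in zip(rows, labels) if row[f] != v]
            if not left or not right:
                continue  # this value does not separate the data
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (f, v, score)
    return best

def build_tree(rows, labels, threshold=0.05):
    """Steps (1), (2) and (5): stop on low Gini or no usable split, else recurse."""
    split = best_split(rows, labels) if gini(labels) >= threshold else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, v, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    return {
        "feature": f,
        "value": v,
        "left": build_tree([r for r, _ in left], [y for _, y in left], threshold),
        "right": build_tree([r for r, _ in right], [y for _, y in right], threshold),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] == tree["value"] else tree["right"]
    return tree

# Toy data: features = [is_manufacturing, costs_rising]; label 1 = migration tendency
rows = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
labels = [1, 1, 0, 0, 0, 0]
tree = build_tree(rows, labels)
```

On this toy data the tree predicts a migration tendency only for manufacturing enterprises with rising costs, for example `predict(tree, [1, 1])`.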
Step S104: when evaluating whether a target enterprise has an external migration risk, input the enterprise features of the target enterprise into the enterprise external migration decision tree to obtain the external migration risk prediction result of the target enterprise.
After the decision tree has been constructed from the existing enterprise external migration information data set, it can be used to predict whether an enterprise has an external migration risk. To evaluate whether an enterprise (denoted the target enterprise) has an external migration risk, a plurality of enterprise features of the target enterprise are acquired and input into the enterprise external migration decision tree; the output of the tree is the external migration risk prediction result of the target enterprise. Because a CART classification tree using the Gini coefficient performs only binary classification, the output falls into one of two categories: external migration tendency or no external migration tendency.
Compared with the prior art, the enterprise external migration risk prediction method first acquires an enterprise external migration information data set, vectorizes it to obtain a plurality of sample data in one-to-one correspondence with the enterprise data, and then constructs an enterprise external migration decision tree from the sample data. When it is necessary to evaluate whether a target enterprise has an external migration risk, the enterprise features of the target enterprise are input into the enterprise external migration decision tree to obtain the external migration risk prediction result of the target enterprise.
In some embodiments, to further improve the accuracy of the prediction result, a plurality of enterprise external migration decision trees are constructed from the sample data, and the final prediction result is determined from the prediction results of these trees by absolute majority voting. Specifically, in some embodiments, the number of enterprise external migration decision trees is Z, where Z is a natural number greater than or equal to 2, and the step of constructing the enterprise external migration decision tree from the sample data in step S103 includes:
Step S1031: select Z groups of sample data that are not identical from the sample data, each group being used to construct one enterprise external migration decision tree. Specifically, when the number of sample data is N and the number of enterprise features in each sample data is M, the step of selecting Z groups of sample data that are not identical includes: performing Z rounds of random sampling with replacement on the sample data, wherein each round randomly selects n sample data from the N sample data and randomly selects m enterprise features from the M enterprise features to form one group of sample data, where m and n are natural numbers greater than or equal to 2, n is less than N, and m is less than M. That is, each group of sample data includes n sample data, and each sample data includes m enterprise features.
Step S1032: construct Z corresponding enterprise external migration decision trees from the Z groups of sample data. That is, for each group of sample data, one enterprise external migration decision tree is constructed from its n sample data and m enterprise features. It should be noted that, since the Z groups are obtained by random sampling with replacement, any one group of sample data may be the same as or different from the other groups, and the enterprise features selected for each group may likewise be the same or different.
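The sampling in step S1031 can be sketched as follows. The function name `draw_groups`, the representation of a sample as a (feature vector, label) pair, and the fixed random seed are assumptions for illustration. Rows are drawn with replacement; the m features of each group are drawn as distinct indices, which is one reasonable reading of the text.

```python
import random

def draw_groups(samples, num_features, z, n, m, seed=0):
    """Draw z groups; each has n rows sampled with replacement and m feature indices."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    groups = []
    for _ in range(z):
        rows = [rng.choice(samples) for _ in range(n)]       # n of N, with replacement
        feats = sorted(rng.sample(range(num_features), m))   # m of M distinct features
        groups.append({
            "features": feats,
            # keep only the selected feature columns, plus the label
            "rows": [([x[i] for i in feats], y) for x, y in rows],
        })
    return groups

# Hypothetical sample data: N = 6 samples, M = 4 features, label 1 = migration tendency
samples = [
    ([1, 0, 1, 0], 1), ([1, 1, 0, 0], 1), ([0, 0, 1, 1], 0),
    ([0, 1, 0, 1], 0), ([1, 0, 0, 1], 0), ([0, 1, 1, 0], 1),
]
groups = draw_groups(samples, num_features=4, z=3, n=4, m=2)
```

Each group records which feature indices it used, so that the same m features can later be selected from a target enterprise before it is passed to the corresponding tree.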
It should be noted that the value of Z can be adjusted according to the accuracy requirement: if the accuracy does not meet the expected requirement, one decision tree is added to the existing ones each time and the accuracy is recorded, until the accuracy falls below a previously recorded value. Therefore, after the Z enterprise external migration decision trees have been constructed, in some embodiments, in order to reach the expected accuracy index, the step of constructing the enterprise external migration decision tree from the sample data in step S103 further includes: (1) judging whether the enterprise external migration risk prediction results obtained with the Z constructed enterprise external migration decision trees meet the accuracy index; (2) when the accuracy index is not met, adding one enterprise external migration decision tree; (3) updating Z to Z + 1 and repeating the judging step until the judgment result is yes, that is, until the enterprise external migration risk prediction results meet the accuracy index.
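The "add one tree and re-check" loop can be sketched as below. `train_tree` and `evaluate` are hypothetical callbacks standing in for tree construction and for accuracy measurement on the test set, and the `z_max` cap is an added safety assumption that the text does not mention.

```python
def grow_until_accurate(train_tree, evaluate, z_start, target, z_max=100):
    """Start with z_start trees and add one at a time until accuracy >= target."""
    trees = [train_tree() for _ in range(z_start)]
    while evaluate(trees) < target and len(trees) < z_max:
        trees.append(train_tree())  # Z becomes Z + 1
    return trees

# Stub callbacks for demonstration only: accuracy here simply grows with the
# number of trees, which real ensembles do not guarantee.
def stub_train_tree():
    return object()

def stub_evaluate(trees):
    return min(1.0, len(trees) / 5)

trees = grow_until_accurate(stub_train_tree, stub_evaluate, z_start=2, target=0.8)
```

With these stubs the loop stops at four trees, the first ensemble size whose stub accuracy reaches the target.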
Correspondingly, the step in step S104 of inputting the enterprise features of the target enterprise into the enterprise external migration decision tree to obtain the external migration risk prediction result of the target enterprise includes:
(1) Input the enterprise features of the target enterprise into each of the Z enterprise external migration decision trees. Since the m enterprise features used by each enterprise external migration decision tree may differ, each tree is given the m enterprise features that correspond to it.
(2) Determine the external migration risk prediction result of the target enterprise from the Z enterprise external migration risk prediction results output by the Z enterprise external migration decision trees. Specifically, this step includes: according to an absolute majority voting method, taking the prediction result that occurs more than Z/2 times among the Z enterprise external migration risk prediction results as the external migration risk prediction result of the target enterprise. Since the prediction result has only two possible values, external migration tendency and no external migration tendency, if more than half of the Z prediction results are "external migration tendency", the target enterprise is marked as having an external migration tendency; if more than half are "no external migration tendency", the target enterprise is marked as having no external migration tendency; and if the two counts are equal, a preset rule may be used to mark the target enterprise as one of the two.
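The absolute majority vote, including the preset rule for ties, can be sketched as follows. The label strings and the choice of "migration tendency" as the tie result are assumptions; the text only says that a rule may be preset.

```python
from collections import Counter

def majority_vote(predictions, tie_result="migration tendency"):
    """Return the label predicted by more than half of the trees, else the preset tie result."""
    label, count = Counter(predictions).most_common(1)[0]
    if count > len(predictions) / 2:
        return label
    return tie_result  # no label exceeds Z/2: fall back to the preset rule

# Z = 5 trees, three of which predict a migration tendency
result = majority_vote([
    "migration tendency", "no migration tendency", "migration tendency",
    "migration tendency", "no migration tendency",
])

# Z = 4 trees with a 2-2 tie: the preset rule decides
tie = majority_vote(
    ["migration tendency", "no migration tendency"] * 2,
    tie_result="no migration tendency",
)
```

Choosing an odd Z avoids ties altogether in the two-class case, which may be simpler than maintaining a tie rule.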
In the process of constructing the decision trees, the closer the selected value of m is to M, the greater the correlation between the decision trees and the higher the error rate in prediction; the further the selected value of m is from M, the less fully the factors influencing enterprise migration (namely the enterprise features) are used, which also reduces the accuracy of the prediction result. Based on this, on the one hand, the value of m can be adjusted according to the prediction effect, specifically by referring to the adjustment method for the number Z of enterprise migration decision trees. On the other hand, in some embodiments, the final prediction result is determined by generating a plurality of enterprise migration decision trees and applying the absolute majority voting method, and different enterprise migration decision trees may use sets of enterprise features that are not identical, so that together the decision trees cover more enterprise features than a single decision tree; that is, the enterprise features in the sample data are used more fully to predict the enterprise migration risk, which can improve the accuracy of the prediction result to a greater extent.
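The coverage argument above can be made concrete with a small calculation. The sketch below assumes, as a simplification not stated in the description, that each decision tree draws its m features uniformly at random without repetition from the M features and independently of the other trees. Under that assumption a given feature is left out of one tree with probability 1 - m/M and out of all Z trees with probability (1 - m/M)^Z.

```python
def prob_feature_unused(m: int, M: int, Z: int) -> float:
    """Probability that one given feature is used by none of the Z trees."""
    if not (0 < m <= M) or Z < 1:
        raise ValueError("require 0 < m <= M and Z >= 1")
    return (1 - m / M) ** Z


def expected_features_covered(m: int, M: int, Z: int) -> float:
    """Expected number of the M features used by at least one tree."""
    return M * (1 - prob_feature_unused(m, M, Z))
```

With m = 5 and M = 20, a single tree leaves any given feature unused with probability 0.75, whereas ten trees leave it unused with probability of roughly 0.056, which illustrates why several trees cover the features more fully than one.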
Example 2
The embodiment of the invention also provides an enterprise migration risk prediction device, which comprises a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is enabled to realize the enterprise migration risk prediction method as in the embodiment 1.
The principle of the enterprise migration risk prediction apparatus according to this embodiment is the same as that of the enterprise migration risk prediction method according to embodiment 1, and will not be described in detail here.
Example 3
Referring to fig. 2, an embodiment of the present invention further provides an enterprise migration risk prediction apparatus 100, which includes:
a data acquisition module 10, configured to acquire an enterprise external migration information data set, wherein the enterprise external migration information data set comprises a plurality of enterprise data, and each enterprise data comprises an external migration state and a plurality of enterprise characteristics of a corresponding enterprise;
a data vectorization module 11, configured to perform vectorization processing on the enterprise external migration information data set to obtain a plurality of sample data corresponding to the enterprise data one to one;
a decision tree generating module 12, configured to construct an enterprise external migration decision tree according to the sample data;
and a risk prediction module 13, configured to, when evaluating whether a target enterprise has an external migration risk, input the enterprise characteristics of the target enterprise into the enterprise external migration decision tree to obtain an external migration risk prediction result of the target enterprise.
For specific implementations of the data acquisition module 10, the data vectorization module 11, the decision tree generating module 12 and the risk prediction module 13, refer to the corresponding descriptions of the embodiment shown in fig. 1; they are not described in detail here.
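As an illustration only, the four modules of the apparatus 100 could be wired together as in the following Python sketch. The class name, the callable signatures and the record fields are hypothetical; the description does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

EnterpriseRecord = Dict[str, float]   # hypothetical: one enterprise data item
Sample = Tuple[List[float], int]      # (feature vector, external migration state)
Model = Callable[[List[float]], int]  # maps a feature vector to a predicted state


@dataclass
class MigrationRiskPredictor:
    acquire: Callable[[], Sequence[EnterpriseRecord]]  # data acquisition module 10
    vectorize: Callable[[EnterpriseRecord], Sample]    # data vectorization module 11
    build_tree: Callable[[Sequence[Sample]], Model]    # decision tree generating module 12
    model: Optional[Model] = None

    def fit(self) -> None:
        # One sample per enterprise data item, then build the decision tree.
        samples = [self.vectorize(record) for record in self.acquire()]
        self.model = self.build_tree(samples)

    def predict(self, features: List[float]) -> int:   # risk prediction module 13
        if self.model is None:
            raise RuntimeError("call fit() before predict()")
        return self.model(features)
```

Keeping each module as an injected callable means the tree-building step can be swapped (for example, a single tree versus Z trees with voting) without touching acquisition or vectorization.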
Example 4
This embodiment provides a computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the enterprise migration risk prediction method set forth above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method for predicting the risk of enterprise migration is characterized by comprising the following steps:
acquiring an enterprise ex-migration information data set, wherein the enterprise ex-migration information data set comprises a plurality of enterprise data, and each enterprise data comprises an ex-migration state and a plurality of enterprise characteristics of a corresponding enterprise;
vectorizing the enterprise external migration information data set to obtain a plurality of sample data corresponding to the enterprise data one by one;
constructing an enterprise external migration decision tree according to the sample data;
when evaluating whether a target enterprise has an external migration risk, inputting the enterprise characteristics of the target enterprise into the enterprise external migration decision tree to obtain an external migration risk prediction result of the target enterprise.
2. The method according to claim 1, wherein the step of constructing an enterprise migration decision tree according to the sample data comprises:
selecting Z groups of sample data which are not identical from the sample data, wherein Z is a natural number greater than or equal to 2;
constructing corresponding Z enterprise external migration decision trees according to the Z groups of sample data;
correspondingly, the step of inputting the enterprise characteristics of the target enterprise into the enterprise migratory decision tree to obtain the migratory risk prediction result of the target enterprise includes:
Respectively inputting the enterprise characteristics of the target enterprise into the Z enterprise migration decision trees;
and determining the external migration risk prediction result of the target enterprise according to the Z enterprise external migration risk prediction results output by the Z enterprise external migration decision trees.
3. The method for predicting the risk of migrating outside the enterprise according to claim 2, wherein the step of determining the risk prediction result of migrating outside the enterprise of the target enterprise according to the Z risk prediction results of migrating outside the enterprise outputted by the Z decision trees for migrating outside the enterprise comprises:
and according to an absolute majority voting method, taking the enterprise ex-migration risk prediction results with the quantity exceeding Z/2 in the Z enterprise ex-migration risk prediction results as the ex-migration risk prediction results of the target enterprise.
4. The method according to claim 2, wherein when the number of the sample data is N and the number of the enterprise features is M, the step of selecting Z groups of sample data which are not identical from the sample data comprises:
and performing Z rounds of random sampling on the sample data by random sampling with replacement, wherein each round randomly selects n sample data from the N sample data and randomly selects m enterprise features from the M enterprise features to form one group of sample data, wherein m and n are natural numbers greater than or equal to 2, n is less than N, and m is less than M.
5. The method according to claim 2, wherein the step of constructing an enterprise migration decision tree according to the sample data further comprises:
judging whether the enterprise migration risk prediction results obtained when Z enterprise migration decision trees are constructed meet the accuracy index;
when the accuracy index is not met, adding an enterprise external migration decision tree;
and updating Z to Z+1 and repeating the step of judging whether the enterprise migration risk prediction results obtained when the Z enterprise migration decision trees are constructed meet the accuracy index, until the judgment result is yes.
6. The method according to claim 1, wherein the step of constructing an enterprise migration decision tree according to the sample data comprises:
and constructing the enterprise external migration decision tree through a CART classification tree algorithm according to the sample data.
7. The method for predicting the risk of enterprise migration according to claim 1, wherein the step of obtaining the data set of the enterprise migration information comprises:
acquiring an enterprise external migration data set and an enterprise information data set;
and carrying out fusion processing on the enterprise external migration data set and the enterprise information data set to obtain the enterprise external migration information data set.
8. The method for predicting the risk of migrating outside the enterprise according to claim 7, wherein the step of fusing the migrating outside the enterprise data set and the enterprise information data set further comprises:
and performing data cleaning on the enterprise external migration data set and the enterprise information data set, wherein the data cleaning comprises deduplication processing and/or missing value processing.
9. An enterprise migration risk prediction device comprising a memory and a processor, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to implement the enterprise migration risk prediction method according to any one of claims 1 to 8.
10. A computer readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor to perform the steps of the method of enterprise migration risk prediction according to any of claims 1-8.
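For illustration, the selection of Z groups of sample data recited in claim 4 can be sketched in Python as follows. This is a sketch under stated assumptions rather than the claimed implementation: the function name and return layout are hypothetical, rows are drawn with replacement as the claim states, and the m features of each group are assumed to be distinct, which the claim does not specify.

```python
import random
from typing import List, Optional, Sequence, Tuple


def draw_sample_groups(
    samples: Sequence[Sequence[float]],
    z: int,
    n: int,
    m: int,
    seed: Optional[int] = None,
) -> List[Tuple[List[int], List[int]]]:
    """Return Z groups, each as (row indices, feature indices)."""
    total_rows = len(samples)         # N in the claim
    total_features = len(samples[0])  # M in the claim
    if not (2 <= n < total_rows and 2 <= m < total_features):
        raise ValueError("require 2 <= n < N and 2 <= m < M")
    rng = random.Random(seed)
    groups = []
    for _ in range(z):
        # Rows: sampling with replacement, so the same row may repeat.
        rows = [rng.randrange(total_rows) for _ in range(n)]
        # Features: assumed distinct within one group.
        cols = sorted(rng.sample(range(total_features), m))
        groups.append((rows, cols))
    return groups
```

Each returned pair identifies the rows and feature columns from which one decision tree would be built.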
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210165505.8A CN114676961A (en) | 2022-02-23 | 2022-02-23 | Enterprise external migration risk prediction method and device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114676961A (en) | 2022-06-28 |
Family
ID=82071885
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210165505.8A Pending CN114676961A (en) | 2022-02-23 | 2022-02-23 | Enterprise external migration risk prediction method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114676961A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115660796A (en) * | 2022-12-09 | 2023-01-31 | 北京中科闻歌科技股份有限公司 | Tax fund management method, device, equipment and storage medium for migration risk enterprise |
CN116739395A (en) * | 2023-08-15 | 2023-09-12 | 浙江同信企业征信服务有限公司 | Enterprise outward migration prediction method, device, equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377058A (en) * | 2018-10-26 | 2019-02-22 | 中电科新型智慧城市研究院有限公司 | The enterprise of logic-based regression model moves outside methods of risk assessment |
CN109657978A (en) * | 2018-12-19 | 2019-04-19 | 重庆誉存大数据科技有限公司 | A kind of Risk Identification Method and system |
CN112527958A (en) * | 2020-12-11 | 2021-03-19 | 平安科技(深圳)有限公司 | User behavior tendency identification method, device, equipment and storage medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
JP7169369B2 (en) | Method, system for generating data for machine learning algorithms | |
US10678810B2 (en) | System for data management in a large scale data repository | |
CN109446341A (en) | The construction method and device of knowledge mapping | |
WO2018051097A1 (en) | System for analysing data relationships to support query execution | |
CN114676961A (en) | Enterprise external migration risk prediction method and device and computer readable storage medium | |
CN112463774B (en) | Text data duplication eliminating method, equipment and storage medium | |
CN107004141A (en) | To the efficient mark of large sample group | |
CN112434024A (en) | Relational database-oriented data dictionary generation method, device, equipment and medium | |
CN116171453A (en) | Method and apparatus for generating and utilizing knowledge patterns for manufacturing simulation models | |
CN111190968A (en) | Data preprocessing and content recommendation method based on knowledge graph | |
CN101546290A (en) | Method for improving accuracy of quality forecast of class hierarchy in object-oriented software | |
CN112416904A (en) | Electric power data standardization processing method and device | |
CN116244333A (en) | Database query performance prediction method and system based on cost factor calibration | |
Chu et al. | Automatic data extraction of websites using data path matching and alignment | |
Riesener et al. | Methodology for Automated Master Data Management using Artificial Intelligence | |
Glake et al. | Data management in multi-agent simulation systems | |
CN111143356B (en) | Report retrieval method and device | |
CN116414872B (en) | Data searching method and system based on natural language identification and knowledge graph | |
CN116260866A (en) | Government information pushing method and device based on machine learning and computer equipment | |
Listl et al. | An Architecture for Knowledge Graph based Simulation Support | |
CN117501275A (en) | Method, computer program product and computer system for analyzing data consisting of a large number of individual messages | |
Ndung'u | Data Preparation for Machine Learning Modelling | |
KR102693831B1 (en) | System and method for generating synthetic data automatically for detecting constraints | |
CN117251605B (en) | Multi-source data query method and system based on deep learning | |
Rabe et al. | A procedure model for the credible measurability of data warehouse metrics on discrete-event simulation models of logistics systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||