CN116342141A - Method, device and equipment for identifying empty shell enterprises - Google Patents


Info

Publication number
CN116342141A
Authority
CN
China
Legal status
Pending
Application number
CN202211623210.7A
Other languages
Chinese (zh)
Inventor
杨岱川
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211623210.7A
Publication of CN116342141A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/018 Certifying business or products
    • G06Q30/0185 Product, service or business identity fraud
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The embodiments of this specification disclose a method, an apparatus and a device for identifying a shell enterprise. The method may include: acquiring multidimensional raw data of an enterprise to be identified; extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data; inputting the plurality of first features of the enterprise to be identified into a scorecard model to obtain a score of the enterprise to be identified, where the scorecard model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features and shell enterprise labels; inputting the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is a shell enterprise, where the at least one classification model is obtained based on the plurality of second features of the sample enterprises and the shell enterprise labels; and obtaining a shell identification result of the enterprise to be identified based on the score and the at least one identification result.

Description

Method, device and equipment for identifying empty shell enterprises
Technical Field
This document relates to the field of computer technologies, and in particular, to a method, an apparatus and a device for identifying a shell enterprise.
Background
A shell company (Shell Corporation), also known as a top company or paper company, generally refers to a legal entity that is registered through regular channels but does not actually conduct business, or lacks the assets necessary for actual operations.
Shell enterprises pose significant risks to financial institutions. However, limited by their identification means, financial institutions cannot achieve both interpretability and accuracy in their identification results.
Therefore, an intelligent shell enterprise identification scheme is needed that achieves both the interpretability and the accuracy of the identification result.
Disclosure of Invention
The embodiments of this specification provide a method, an apparatus and a device for identifying a shell enterprise, so as to achieve both the interpretability and the accuracy of the identification result.
To solve the above technical problems, the embodiments of this specification are implemented as follows:
In a first aspect, a method for identifying a shell enterprise is provided, including:
acquiring multidimensional raw data of an enterprise to be identified;
extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
inputting the plurality of first features of the enterprise to be identified into a scorecard model to obtain a score of the enterprise to be identified, where the scorecard model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features and shell enterprise labels;
inputting the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is a shell enterprise, where the at least one classification model is obtained based on the plurality of second features of the sample enterprises and the shell enterprise labels; and
obtaining a shell identification result of the enterprise to be identified based on the score and the at least one identification result.
In a second aspect, an apparatus for identifying a shell enterprise is provided, including:
a first acquisition module, configured to acquire multidimensional raw data of an enterprise to be identified;
a feature extraction module, configured to extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
a first input module, configured to input the plurality of first features of the enterprise to be identified into a scorecard model to obtain a score of the enterprise to be identified, where the scorecard model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features and shell enterprise labels;
a second input module, configured to input the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is a shell enterprise, where the at least one classification model is obtained based on the plurality of second features of the sample enterprises and the shell enterprise labels; and
a first determining module, configured to obtain a shell identification result of the enterprise to be identified based on the score and the at least one identification result.
In a third aspect, an electronic device is provided, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquire multidimensional raw data of an enterprise to be identified;
extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
input the plurality of first features of the enterprise to be identified into a scorecard model to obtain a score of the enterprise to be identified, where the scorecard model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features and shell enterprise labels;
input the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is a shell enterprise, where the at least one classification model is obtained based on the plurality of second features of the sample enterprises and the shell enterprise labels; and
obtain a shell identification result of the enterprise to be identified based on the score and the at least one identification result.
In a fourth aspect, a computer-readable storage medium is provided, storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
acquire multidimensional raw data of an enterprise to be identified;
extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
input the plurality of first features of the enterprise to be identified into a scorecard model to obtain a score of the enterprise to be identified, where the scorecard model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features and shell enterprise labels;
input the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is a shell enterprise, where the at least one classification model is obtained based on the plurality of second features of the sample enterprises and the shell enterprise labels; and
obtain a shell identification result of the enterprise to be identified based on the score and the at least one identification result.
According to at least one technical solution provided by the embodiments of this specification, whether the enterprise to be identified is a shell enterprise is determined by fusing a scorecard model with at least one classification model. Because the scorecard model offers good interpretability and the classification models offer high accuracy, this identification approach achieves both the interpretability and the accuracy of the identification result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
Fig. 1 is a schematic flow chart of a method for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a method for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 3 is a detailed flow chart of step 102 in the flow chart shown in Fig. 1.
Fig. 4 is a detailed flow chart of step 104 in the flow chart shown in Fig. 1.
Fig. 5 is another flow chart of a method for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 6 is yet another flow chart of a method for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of an apparatus for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 9 is another schematic structural diagram of an apparatus for identifying a shell enterprise according to an embodiment of the present disclosure.
Fig. 10 is yet another schematic structural diagram of an apparatus for identifying a shell enterprise according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
To achieve both the interpretability and the accuracy of the identification result, the embodiments of the present disclosure provide a method and an apparatus for identifying a shell enterprise. The method and the apparatus may be executed by an electronic device, or by software or a hardware apparatus installed in the electronic device. The electronic device here includes, but is not limited to, a terminal device or a server. The terminal device includes, but is not limited to, any smart terminal device such as a smartphone, a personal computer (PC), a notebook computer, a tablet computer, an e-reader, a web TV or a wearable device; the server includes, but is not limited to, any of a single server, multiple servers, a server cluster and a cloud server.
The method for identifying a shell enterprise provided in the embodiments of the present disclosure may include two parts: training the models, and identifying shell enterprises with the trained models. Both parts are described below.
First, the model training section will be described.
In the method for identifying a shell enterprise provided by the embodiments of this specification, a scorecard model is introduced to make the identification result interpretable, and at least one classification model is introduced to ensure the accuracy, recall and other metrics of the identification result.
On this basis, as shown in Fig. 1, the method for identifying a shell enterprise provided in the embodiments of this specification may include:
Step 102, training the scorecard model based on the multidimensional raw data, the plurality of first features and the shell enterprise labels of sample enterprises.
Step 104, training at least one classification model based on the plurality of second features and the shell enterprise labels of the sample enterprises.
There may be multiple sample enterprises, including white samples and black samples. A white sample is a sample whose shell enterprise label is "no", i.e., it is not a shell enterprise; a black sample is a sample whose shell enterprise label is "yes", i.e., it is a shell enterprise.
The plurality of first features and the plurality of second features of the sample enterprises are both derived from the multidimensional raw data of the sample enterprises.
In this specification, the terms "first feature" and "second feature" distinguish the features used by the scorecard model from those used by the classification models; the first features may be the same as or different from the second features. Specifically, the scorecard model is trained with the first features of the sample enterprises and predicts with the first features of the enterprise to be identified, while the classification models are trained with the second features of the sample enterprises and predict with the second features of the enterprise to be identified.
It should further be noted that different classification models among the at least one classification model may use the same or different second features.
Fig. 2 is a schematic diagram of a method for identifying a shell enterprise according to an embodiment of the present disclosure. As shown in Fig. 2, the model training part of the method may include data preparation, feature processing, feature screening, model training, online deployment and feedback adjustment. Data preparation acquires the multidimensional raw data of the sample enterprises; feature processing performs feature engineering on the acquired multidimensional raw data to obtain a feature set; feature screening deletes invalid features from the feature set; model training trains models with the features in the feature set, where the trained models include a logistic regression model, from which the scorecard model is generated, and at least one classification model; online deployment deploys the trained scorecard model and the at least one classification model to a platform that needs to identify shell enterprises; and feedback adjustment optimizes and adjusts the scorecard model and the at least one classification model according to the platform's feedback on how well these models perform.
The above step 102 and the above step 104 are described in detail below with reference to fig. 3 and 4, respectively.
As shown in Fig. 3, step 102 may specifically include:
Step 302, acquiring multidimensional raw data of the sample enterprises.
In the embodiments of this specification, the multidimensional raw data of an enterprise may include, but is not limited to, one or more of the following: business registration information, penalty information, court judgment document information, brand information, product information, intellectual property information, bond rating information, customs registration information and self-certified operation information of the enterprise.
The business registration information may include, but is not limited to, one or more of the following: enterprise name, enterprise type (individual business or enterprise), legal representative, operating status, registered capital, date of establishment, registered address, registration authority, registered industry, etc.
The enterprise penalty information may include, but is not limited to, one or more of the following: administrative penalty information, information on being listed as a dishonest judgment debtor, tax penalty information, abnormal business operation information (e.g., the enterprise cannot be reached at its registered address, or its annual report was not submitted on time), tax arrears and enforced tax collection, serious violations of law and integrity, etc.
The court judgment document information may include, but is not limited to, litigation in which the enterprise is involved as a plaintiff, defendant, third party, etc.
The enterprise brand information may include, but is not limited to, tangible or intangible asset information such as brands registered by the enterprise.
The enterprise product information may include, but is not limited to, one or more of the following: the names, types and sales channels of products produced or sold by the enterprise.
The enterprise intellectual property information may include, but is not limited to, tangible or intangible asset information such as registered goods, registered trademarks and patent applications of the enterprise.
The bond rating information of the enterprise is credit rating information for the bonds issued by the enterprise.
The customs registration information of the enterprise may include a customs registration code, such as a customs registration number.
The self-certified operation information of the enterprise may include, but is not limited to, data provided by the enterprise owner to prove that the enterprise is operating normally, such as transaction flows and invoice information.
Step 304, preprocessing the multidimensional raw data of the sample enterprises to obtain a feature set of the sample enterprises.
Because the method for identifying a shell enterprise provided by the embodiments of this specification involves multiple models, the multidimensional raw data of enterprises needs to be cleaned, processed and mined to obtain more data features. In a specific implementation, the raw data may be preprocessed on a big data platform framework such as Hive.
The preprocessing may include, but is not limited to, one or more of the following:
1) Missing numerical features may be filled with the median, mean, mode, logarithmic values or the like.
2) Missing non-numerical features cannot be filled with numerical methods such as the mean, so they are filled with a predetermined string representing a null value, for example "NULL".
3) Splittable strings are split; for example, an address is split into province, city, county and street, and the number of enterprises registered at each address is counted.
4) Character-type features are mapped to numerical codes or vector codes; for example, an address may be mapped to a numerical code similar to a postal code, or text may be converted into a vector through embedding, so that a computer can perform the corresponding matrix operations.
5) Features that have an association relationship are combined to obtain associated features. For example, a time feature is associated with a non-time feature, such as the time elapsed since the enterprise was established, or the number of changes, administrative penalties and lawsuits as a defendant of the enterprise in the last month, half year and year; as another example, an address feature is associated with a non-address feature, and so on.
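The preprocessing rules above can be sketched in plain Python on a few in-memory records. This is a minimal illustration under stated assumptions, not the specification's implementation: the field names, the `|`-separated address format and the helper `preprocess` are hypothetical, and a production pipeline would more likely express the same logic as queries on a Hive-like platform.

```python
import statistics
from collections import Counter
from datetime import date

# Hypothetical raw records; field names and the "|"-separated address format
# are illustrative assumptions, not the schema used in the specification.
RAW = [
    {"name": "E1", "registered_capital": 100.0, "legal_rep_gender": "male",
     "address": "Zhejiang|Hangzhou|Xihu|Wensan Rd 1", "established": date(2015, 3, 1)},
    {"name": "E2", "registered_capital": None, "legal_rep_gender": None,
     "address": "Zhejiang|Hangzhou|Xihu|Wensan Rd 1", "established": date(2022, 6, 15)},
    {"name": "E3", "registered_capital": 500.0, "legal_rep_gender": "female",
     "address": "Jiangsu|Nanjing|Gulou|Zhongshan Rd 8", "established": date(2019, 1, 10)},
]


def preprocess(records, today):
    """Apply sketches of preprocessing rules 1) to 5) to raw enterprise records."""
    # 1) Fill missing numerical values with the median of the observed values.
    observed = [r["registered_capital"] for r in records if r["registered_capital"] is not None]
    median_capital = statistics.median(observed)
    # 3) Count how many enterprises are registered at each full address.
    per_address = Counter(r["address"] for r in records)
    # 4) Map each province name to an integer code (stand-in for postal-code-like encodings).
    provinces = sorted({r["address"].split("|")[0] for r in records})
    province_codes = {p: i for i, p in enumerate(provinces)}
    rows = []
    for r in records:
        # 3) Split the address string into province, city, county and street.
        province, city, county, street = r["address"].split("|")
        capital = r["registered_capital"]
        rows.append({
            "name": r["name"],
            "registered_capital": median_capital if capital is None else capital,
            # 2) Fill missing non-numerical values with a predefined NULL marker string.
            "legal_rep_gender": r["legal_rep_gender"] or "NULL",
            "province": province, "city": city, "county": county, "street": street,
            "province_code": province_codes[province],
            "enterprises_at_address": per_address[r["address"]],
            # 5) Associate a time feature with the reference date: days since establishment.
            "days_since_established": (today - r["established"]).days,
        })
    return rows
```

For example, `preprocess(RAW, date(2023, 1, 1))` fills E2's missing registered capital with the median 300.0, marks its missing gender as "NULL", and reports two enterprises at the shared Hangzhou address.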
After the feature set of the sample enterprises is obtained through the above preprocessing, the features in the feature set may be further categorized, so that the corresponding reasons can be output when the identification result is interpreted later.
In one example, the features in the feature set may be classified into the following categories:
1) Basic enterprise information;
2) Enterprise address, including, but not limited to, the registration scope of the enterprise and the number of enterprises at the registered address (e.g., multiple enterprises registered at one address);
3) Operational anomalies, including, but not limited to, various administrative penalties, abnormal business operation information and tax anomaly information;
4) Business information changes, including, but not limited to, change types and change times;
5) Ongoing operation, including, but not limited to, the enterprise's patent count, ICP filings, brands and asset information.
Then, according to the shell enterprise labels of the sample enterprises, a good/bad label can be attached to the features in the feature set: "good" indicates that the sample enterprise corresponding to the features is not a shell enterprise, and "bad" indicates that it is a shell enterprise.
After the feature set is prepared, a training set and a test set may be split from the labeled feature set; for example, 70% of the samples are randomly selected as the training set and the remaining 30% as the test set. The feature set used in steps 306 and 318 below is this training set, which is not repeated hereinafter.
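A reproducible 70/30 random split might look like the following sketch. The function name, the default seed and the use of `random.Random` are illustrative choices, not taken from the specification.

```python
import random


def split_train_test(samples, train_ratio=0.7, seed=42):
    """Randomly shuffle labeled samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Stratifying the split by the good/bad label is a common variation when black samples are rare; the specification itself only mentions random selection.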
Step 306, binning the feature set to obtain binning results for a plurality of features.
Binning follows the scorecard principle: each feature is divided into several ranges, and each range corresponds to one bin. Table 1 shows an example scorecard. According to Table 1, the feature "legal representative age" can be divided into four bins: "18 ≤ age < 25", "25 ≤ age < 35", "35 ≤ age < 55" and "55 ≤ age"; similarly, the feature "legal representative gender" can be divided into two bins, "male" and "female", and so on for other features.
Table 1 Scorecard
[Table 1 image not reproduced]
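The range-based binning described above can be sketched with the standard `bisect` module. The edges reproduce the example age ranges from the text; the function names and the (bad, good) counting helper are hypothetical additions for illustration. Categorical features such as gender simply use their category values as bins.

```python
import bisect

# Bin edges for the "legal representative age" feature, following the example ranges in the text:
# [18, 25), [25, 35), [35, 55), [55, +inf)
AGE_EDGES = [18, 25, 35, 55]
AGE_LABELS = ["18<=age<25", "25<=age<35", "35<=age<55", "55<=age"]


def assign_bin(value, edges, labels):
    """Return the label of the left-closed, right-open bin that contains value."""
    if value < edges[0]:
        raise ValueError(f"value {value} is below the lowest bin edge {edges[0]}")
    # bisect_right counts the edges <= value, so subtracting 1 gives the bin index.
    return labels[bisect.bisect_right(edges, value) - 1]


def bin_counts(values, is_shell, edges, labels):
    """Count (bad, good) samples per bin; is_shell[i] is 1 for a shell enterprise, else 0."""
    counts = {label: [0, 0] for label in labels}
    for value, bad in zip(values, is_shell):
        counts[assign_bin(value, edges, labels)][0 if bad else 1] += 1
    return {label: tuple(c) for label, c in counts.items()}
```

The per-bin (bad, good) counts produced here are the inputs needed for the IV and WOE calculations in the following steps.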
Step 308, determining the predictive power of the plurality of features, and screening the plurality of features based on their predictive power to obtain the plurality of first features of the sample enterprises.
As an example, the predictive power of a feature can be characterized by its information value (IV), calculated as follows:
$$IV = \sum_{i=1}^{n} IV_i$$
$$IV_i = \left(\frac{Bad_i}{Bad_T} - \frac{Good_i}{Good_T}\right) \times \ln\left(\frac{Bad_i / Bad_T}{Good_i / Good_T}\right)$$
where, for a feature divided into n bins, IV denotes the IV value of the feature, IV_i denotes the IV value of its i-th bin, Bad_i/Bad_T denotes the proportion of all shell enterprises (Bad_T) that fall into the i-th bin, and Good_i/Good_T denotes the proportion of all normal (non-shell) enterprises (Good_T) that fall into the i-th bin.
In the embodiments of this specification, a feature whose IV value is below a certain threshold may be regarded as invalid and deleted from the feature set, while a feature whose IV value is greater than or equal to the threshold is regarded as valid and retained in the feature set as a first feature, yielding the final training set.
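Assuming each bin's (bad, good) counts are already available, the IV formula above and threshold-based screening can be sketched as follows. The 0.02 default threshold is an assumption (a commonly used rule of thumb), since the specification does not give a value; bins with zero bad or zero good counts would need smoothing, which this sketch does not attempt.

```python
import math


def woe_iv(bin_counts):
    """Compute per-bin WOE and the feature's total IV.

    bin_counts maps bin label -> (bad_count, good_count). Every bin must contain
    at least one bad and one good sample; smoothing for empty bins is not handled.
    """
    bad_total = sum(bad for bad, _ in bin_counts.values())
    good_total = sum(good for _, good in bin_counts.values())
    woe, iv = {}, 0.0
    for label, (bad, good) in bin_counts.items():
        if bad == 0 or good == 0:
            raise ValueError(f"bin {label!r} needs both bad and good samples")
        bad_share = bad / bad_total     # Bad_i / Bad_T
        good_share = good / good_total  # Good_i / Good_T
        woe[label] = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe[label]  # IV_i summed into IV
    return woe, iv


def screen_features(feature_bins, threshold=0.02):
    """Keep the features whose IV is at least the (assumed) threshold."""
    return [name for name, counts in feature_bins.items() if woe_iv(counts)[1] >= threshold]
```

A feature whose bad and good samples are spread identically across bins has IV 0 and is dropped, while a feature that concentrates shell enterprises in particular bins is retained as a first feature.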
It can be understood that the plurality of first features of the sample enterprises may be all or some of the feature types of the sample enterprises in the feature set. Likewise, the plurality of second features of the sample enterprises described below may be all or some of the feature types of the sample enterprises in the feature set.
Optionally, the plurality of first features and/or the plurality of second features of the sample enterprises may include associated features of the sample enterprises (e.g., features obtained by associating time features with non-time features, or address features with non-address features). Training the scorecard model and the classification models with these associated features makes it possible to analyze suspicious behaviors such as suspicious gangs and batch registration.
Step 310, determining the weight of evidence (WOE) value of each bin under the plurality of first features of the sample enterprises.
From the above formula for IV, the IV value of the i-th bin can be written as:
$$IV_i = \left(\frac{Bad_i}{Bad_T} - \frac{Good_i}{Good_T}\right) \times WOE_i, \qquad WOE_i = \ln\left(\frac{Bad_i / Bad_T}{Good_i / Good_T}\right)$$
where WOE_i denotes the weight of evidence of the i-th bin, which characterizes the importance of the bin, and
$$\frac{Bad_i}{Bad_T} - \frac{Good_i}{Good_T}$$
represents the difference between the proportion of shell enterprises falling into the current bin and the proportion of normal enterprises falling into the current bin.
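Step 312 builds the logistic regression on the WOE values of the bins. One common way to do this, assumed here, is to replace each first feature's bin label with the WOE of that bin before fitting. The WOE table below holds made-up numbers, and the feature names and helper are hypothetical.

```python
# Hypothetical WOE tables for two first features (values are illustrative only).
WOE_TABLE = {
    "legal_rep_age": {"18<=age<25": 0.9, "25<=age<35": 0.2, "35<=age<55": -0.4, "55<=age": -0.1},
    "enterprises_at_address": {"1": -0.5, "2-5": 0.3, ">5": 1.6},
}
FEATURES = list(WOE_TABLE)  # fixed feature order for the model's input vector


def woe_encode(binned_record, woe_table=WOE_TABLE, features=FEATURES):
    """Replace each feature's bin label with that bin's WOE value, in a fixed feature order."""
    return [woe_table[f][binned_record[f]] for f in features]
```

For example, an enterprise whose legal representative is 22 and whose registered address hosts more than five enterprises is encoded as `[0.9, 1.6]` under these illustrative values.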
Step 312, constructing a logistic regression model based on the bins and the WOE values of the bins under the plurality of first features of the sample enterprises.
Because a logistic regression model is intuitive, highly interpretable and easy to understand, this specification uses it as the precursor of the scorecard model. The logistic regression model may be implemented in, for example, Python or R.
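As a dependency-free sketch of such a logistic regression, the following fits P(shell | x) by batch gradient descent on the log-loss. The learning rate, the epoch count and the function names are assumptions; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit P(shell=1 | x) = sigmoid(b0 + sum_j b_j * x_j) by batch gradient descent on log-loss."""
    n, n_features = len(X), len(X[0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * n_features
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear term is (prediction - label).
            err = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += err
            for j, xj in enumerate(xi):
                g[j] += err * xj
        b0 -= lr * g0 / n
        b = [bj - lr * gj / n for bj, gj in zip(b, g)]
    return b0, b


def predict_proba(b0, b, x):
    """Probability that the enterprise with WOE-encoded features x is a shell enterprise."""
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))
```

Because WOE as defined above grows with the share of shell enterprises in a bin, the fitted coefficients on WOE inputs are expected to be positive.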
Step 314, training the logistic regression model based on the plurality of first features and the shell enterprise labels of the sample enterprises.
Specifically, the logistic regression model is trained on the final training set containing the first features of the sample enterprises and is then tested on the test set. The test result is evaluated with the AUC (Area Under the Curve) value or the KS (Kolmogorov-Smirnov) value. A model is generally considered qualified when its AUC value exceeds 0.75, and the higher the AUC value, the better.
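The two evaluation metrics can be sketched directly from their definitions, with scores taken as predicted shell probabilities (higher means more likely a shell enterprise). The function names are illustrative; library routines such as scikit-learn's `roc_auc_score` compute the same AUC.

```python
def auc(scores, labels):
    """AUC: probability that a random shell enterprise (label 1) outscores a random normal one (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0  # ties count as half
    return wins / (len(pos) * len(neg))


def ks(scores, labels):
    """KS: largest gap between cumulative positive and negative shares as the threshold moves down."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Move all samples that share this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
    return best
```

A model would then be accepted if, for example, `auc(test_scores, test_labels) > 0.75`, following the rule of thumb stated above.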
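Step 316, performing score conversion on the trained logistic regression model and setting a benchmark score and a PDO value to obtain the scorecard model, where the PDO value represents the change in score when the good-to-bad odds double.
The scorecard model uses Odds, the ratio of the probability of default (here, of being a shell enterprise) to the probability of being normal:
$$Odds = \frac{p}{1 - p}$$
$$\ln(Odds) = \beta_0 + \sum_{j} \beta_j x_j$$
where p is the probability, output by the trained logistic regression model, that the enterprise is a shell enterprise, β_0 and β_j are the model's coefficients, and x_j are its input features.
Substituting Odds into the score gives:
$$Score = A - B \ln(Odds)$$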
wherein A and B are constants, and both A and B are greater than or equal to zero.
To determine the values of A and B in this formula, two conditions need to be defined:
1) A benchmark score P_0, which is the score when Odds equals θ_0 (e.g., a benchmark score of 500 when Odds is 1:50).
2) PDO (Points to Double the Odds), which is the number of points by which the score must increase for the good-to-bad odds to double. For example, if PDO is set to 50 points and the benchmark score to 500 points, an enterprise scoring 550 points has twice the good-to-bad odds of an enterprise scoring 500 points.
$$P_0 = A - B \ln(\theta_0)$$
$$P_0 - PDO = A - B \ln(2\theta_0)$$
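Subtracting the second condition from the first gives PDO = B·ln 2, so B = PDO / ln 2, and substituting back gives A = P_0 + B·ln θ_0. The sketch below computes these constants and converts odds into a score; the function names are hypothetical, and the example values (P_0 = 500 at θ_0 = 1/50, PDO = 50) are the ones used in the text.

```python
import math


def scorecard_constants(p0, theta0, pdo):
    """Solve P0 = A - B*ln(theta0) and P0 - PDO = A - B*ln(2*theta0) for A and B."""
    b = pdo / math.log(2)
    a = p0 + b * math.log(theta0)
    return a, b


def odds_to_score(odds, a, b):
    """Score = A - B*ln(Odds), where Odds = P(shell) / P(normal)."""
    return a - b * math.log(odds)
```

With these values B ≈ 72.13 and A ≈ 217.81; odds of 1:50 map to 500 points, 2:50 to 450 points and 1:100 to 550 points.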
After A and B are obtained, the scorecard model can be implemented in code according to the scorecard principle. The scorecard model assigns a score to each bin of each first feature, so in actual prediction it can score each feature of each test object (in this specification, the test object is the enterprise to be identified).
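If the logistic regression takes WOE-encoded first features (an assumption consistent with step 312), then ln(Odds) = β_0 + Σ_j β_j·WOE_ij, so the score A − B·ln(Odds) splits into a base score A − B·β_0 plus per-bin points −B·β_j·WOE_ij. The sketch below derives such a points table; the names and example inputs are hypothetical.

```python
def bin_points(a, b, intercept, coefs, woe_table):
    """Split Score = A - B*(b0 + sum_j beta_j*WOE_j) into a base score and per-bin points."""
    base = a - b * intercept
    points = {
        feature: {label: -b * coefs[feature] * woe for label, woe in bins.items()}
        for feature, bins in woe_table.items()
    }
    return base, points


def score_enterprise(base, points, binned_record):
    """Total score: base score plus the points of the bin each feature falls into."""
    return base + sum(points[f][binned_record[f]] for f in points)
```

Summing the base score and the matched bin points reproduces A − B·ln(Odds) exactly, while the per-bin points show which features raised or lowered an enterprise's score.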
The scoring card model can then be deployed on the real-time system of a platform that needs to identify empty-shell enterprises, so that any enterprise queried by a user can be assessed in real time.
Optionally, for ease of use, the embodiments of this specification also convert the score into a grade. The number of grades can be chosen according to actual needs. For example, scores can be mapped from high to low onto five grades, A, B, C, D, and E, where A is excellent and E is the worst. A risk description can be attached to some or all of the grades, so that whenever a grade is output, the corresponding risk description is output with it. This makes the result easier to use and improves usability.
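The following minimal sketch shows a score-to-grade mapping with optional risk descriptions. The cut-off values and description texts are hypothetical, because the specification leaves the number of grades and their ranges to actual needs. Only some grades carry a description here, which reflects the "some or all of the grades" option.

```python
from bisect import bisect_right
from typing import Optional, Tuple

# Hypothetical cut-offs: ascending lower bounds of grades D, C, B and A.
CUTOFFS = [450, 500, 550, 600]
GRADES = ["E", "D", "C", "B", "A"]  # E is the worst grade, A is excellent
# Risk descriptions may cover only some of the grades.
RISK_TEXT = {
    "E": "High risk: profile strongly resembles an empty-shell enterprise.",
    "D": "Elevated risk: several empty-shell indicators are present.",
}


def to_grade(score: float) -> Tuple[str, Optional[str]]:
    """Map a score to a grade and its risk description (None if the grade has none)."""
    grade = GRADES[bisect_right(CUTOFFS, score)]
    return grade, RISK_TEXT.get(grade)


print(to_grade(520))  # ('C', None)
```

`bisect_right` places a score equal to a cut-off in the higher grade. For example, 450 maps to D rather than E.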
The training of the scoring card model is described above and the training of the at least one classification model is briefly described below with reference to fig. 4.
As shown in fig. 4, the step 102 may specifically include:
Step 302, acquiring multidimensional raw data of the sample enterprises.
Step 304, preprocessing the multidimensional raw data of the sample enterprise to obtain a feature set of the sample enterprise.
It should be noted that steps 302 and 304 are implemented in the same way as in the embodiment shown in fig. 3. Refer to the description of that embodiment above; it is not repeated here.
Step 318, training at least one classification model based on the plurality of second features of the sample enterprises in the feature set (training set) and the empty-shell enterprise labels.
The at least one classification model includes, but is not limited to, at least one of a tree model and a neural network model. The tree model includes, but is not limited to, at least one of XGBoost, random forest, and LightGBM.
The specific training process for the at least one classification model may refer to the prior art and will not be described in detail herein.
It will be appreciated that once the scoring card model and the at least one classification model have been trained, empty-shell enterprises can be identified with these models. The process of identifying an empty-shell enterprise with the trained models is described below.
As shown in fig. 5, a method for identifying an empty-shell enterprise provided in an embodiment of the present disclosure may include:
Step 106, acquiring multidimensional raw data of the enterprise to be identified.
The multidimensional raw data acquired for the enterprise to be identified cover the same dimensions as the multidimensional raw data acquired for the sample enterprises during model training.
Step 108, extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data.
The first features of the enterprise to be identified correspond one-to-one to the first features of the sample enterprises used to train the scoring card model. The second features of the enterprise to be identified correspond one-to-one to the second features of the sample enterprises used to train the at least one classification model.
Step 110, inputting the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, where the scoring card model is constructed based on the multidimensional raw data of the sample enterprises, the plurality of first features, and the empty-shell enterprise labels.
For the training of the scoring card model, refer to the description above; it is not repeated here.
Step 112, inputting the plurality of second features of the enterprise to be identified into each of the at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is an empty-shell enterprise, where the at least one classification model is trained based on the plurality of second features of the sample enterprises and the empty-shell enterprise labels.
For the training of the at least one classification model, refer to the description above; it is not repeated here.
Each of the at least one identification result may be either the probability that the enterprise to be identified is an empty-shell enterprise, or a verdict that it is or is not an empty-shell enterprise.
Step 114, obtaining an empty-shell identification result of the enterprise to be identified based on the score and the at least one identification result.
As an example, in step 114, the score and the at least one identification result may each be output as part of the empty-shell identification result of the enterprise to be identified. Alternatively, one result may be selected from the score and the at least one identification result by voting and then output.
As another example, in step 114, a target grade of the enterprise to be identified may be determined based on the score and a preset correspondence. The preset correspondence maps a plurality of score ranges to a plurality of grades, including the target grade. The empty-shell identification result of the enterprise to be identified is then obtained based on the target grade and the at least one identification result. For example, scores may be mapped from high to low onto five grades, A to E, where A is excellent and E is the worst. The target grade and the at least one identification result can then either be output together for the user's reference or be fused into a single output.
The fusion may work as follows. Suppose the empty-shell verdict implied by the target grade is inconsistent with the verdict given by the at least one identification result; for example, the target grade of the enterprise is excellent, but the at least one identification result says it is an empty-shell enterprise. In that case, the result of the at least one classification model may be output, because the classification models are more accurate. Alternatively, the empty-shell identification result may be determined by voting (e.g., majority wins). If the two verdicts are consistent, either one of them may be output.
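The two fusion strategies just described can be sketched as follows. This is an illustration, not the specification's implementation. Several details are assumptions: which grades count as an "empty-shell" verdict (D and E here), that classifier outputs have already been thresholded into booleans, and that a tied vote resolves toward "empty shell".

```python
from typing import Sequence

# Hypothetical: grades whose verdict is "empty-shell enterprise".
SHELL_GRADES = {"D", "E"}


def majority(votes: Sequence[bool]) -> bool:
    """Majority vote; a tie is resolved toward 'empty shell' (a conservative assumption)."""
    return 2 * sum(votes) >= len(votes)


def fuse(target_grade: str, model_results: Sequence[bool], strategy: str = "prefer_models") -> bool:
    """Fuse the scorecard grade with classifier verdicts (True = empty-shell enterprise)."""
    if not model_results:
        raise ValueError("need at least one classification result")
    grade_verdict = target_grade in SHELL_GRADES
    model_verdict = majority(model_results)
    if grade_verdict == model_verdict:
        return grade_verdict  # consistent: either result can be output
    if strategy == "prefer_models":
        return model_verdict  # classifiers are taken to be more accurate
    if strategy == "vote":
        return majority([grade_verdict, *model_results])  # the grade counts as one more vote
    raise ValueError(f"unknown strategy: {strategy}")


print(fuse("A", [True, True]))                          # True: models override an excellent grade
print(fuse("E", [False, False, False], strategy="vote"))  # False: three votes against one
```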
Optionally, if the grades have corresponding risk descriptions and the target grade of the enterprise to be identified is output, the risk description corresponding to the target grade may be output as well.
Optionally, the empty-shell identification result of the enterprise to be identified may agree with its target grade and indicate that the enterprise is an empty-shell enterprise. In that case, target features may be determined among the plurality of first features of the enterprise, namely the features whose scores in the scoring card model are lower than a preset score. The feature type of each target feature is then determined, and the reasons why the enterprise is an empty-shell enterprise are output based on those feature types. As described above, the feature types may include an enterprise basic information class, an enterprise address class, an enterprise business exception class, an enterprise business information change class, and an enterprise positive operation class.
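A minimal sketch of the target-feature step follows: features whose points fall below the preset score are collected and grouped by feature type. The feature names and the name-to-type mapping are hypothetical, and the caller is assumed to invoke this only when the fused result and the target grade agree that the enterprise is an empty-shell enterprise.

```python
from typing import Dict, List

# Hypothetical mapping from first-feature name to the feature types listed above.
FEATURE_TYPE = {
    "same_day_similar_registrations": "enterprise basic information",
    "shared_address_count": "enterprise address",
    "listed_abnormal_operation": "enterprise business exception",
    "recent_legal_rep_changes": "enterprise business information change",
    "purchase_invoice_count": "enterprise positive operation",
}


def explain(feature_points: Dict[str, float], preset_score: float) -> Dict[str, List[str]]:
    """Group target features (points below preset_score) by feature type, lowest points first."""
    reasons: Dict[str, List[str]] = {}
    for name, pts in sorted(feature_points.items(), key=lambda kv: kv[1]):
        if pts < preset_score:
            reasons.setdefault(FEATURE_TYPE.get(name, "other"), []).append(name)
    return reasons


points = {"shared_address_count": -35.0, "purchase_invoice_count": -20.0,
          "listed_abnormal_operation": 5.0}
print(explain(points, preset_score=0.0))
```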
Specifically, when outputting the reasons why the enterprise to be identified is an empty-shell enterprise, the target features it hits can be interpreted and then output as text in a format such as JSON. For example, if an enterprise cannot be contacted at its registered address, the output may read: "Abnormal business operation: the enterprise may be unreachable at its registered address." The interpretation categories may include, but are not limited to, batch registration, address anomaly, business anomaly (annual report not submitted), business anomaly (address unreachable), suspected positions held in other enterprises, and abnormal change alerts. Related values, dates, legal representative names, and similar details may also be included. An example explanation is shown below:
Guangzhou XXX Investment Information Co., Ltd.
Empty-shell enterprise grade: D
{"key": "batch registration", "description": "The registration information of this enterprise is similar to that of {$numbers} other enterprises; suspected batch registration.", "numbers": "20"}
{"key": "address cannot be contacted", "description": "On <in_date>, the enterprise was listed in the directory of abnormal business operations because it could not be contacted at its registered domicile or place of business.", "in_date": "January 26, 2022"}
{"key": "legal representative holds positions elsewhere", "description": "The legal representative {$core_name} also holds positions in {$person_num} other enterprises; suspected batch registration.", "person_name": "von XX", "person_num": 26}
{"key": "one address, multiple enterprises", "description": "The enterprise is suspected of registering through hosting, an agent, self-declared premises, or residence declaration; please verify its actual business address.", "address": "No. X, XXX Road, Tianhe District, Guangzhou"}
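One way to emit explanations like the example above with Python's standard `json` module is sketched below. The wrapper field names (`enterprise`, `grade`, `reasons`, `text`) and the `{$name}` placeholder-filling convention are assumptions modeled on the example entries, not a documented output schema.

```python
import json
from typing import Any, Dict, List


def render(hit: Dict[str, Any]) -> str:
    """Fill {$name} placeholders in a description with the hit's own fields."""
    text = hit["description"]
    for key, value in hit.items():
        text = text.replace("{$" + key + "}", str(value))
    return text


def build_explanation(name: str, grade: str, hits: List[Dict[str, Any]]) -> str:
    """Serialize enterprise name, grade and rendered reasons as one JSON document."""
    reasons = [{"key": h["key"], "text": render(h)} for h in hits]
    return json.dumps({"enterprise": name, "grade": grade, "reasons": reasons},
                      ensure_ascii=False, indent=2)


hit = {"key": "batch registration",
       "description": "The registration information of this enterprise is similar to that of "
                      "{$numbers} other enterprises; suspected batch registration.",
       "numbers": "20"}
print(build_explanation("Guangzhou XXX Investment Information Co., Ltd.", "D", [hit]))
```

`ensure_ascii=False` keeps non-ASCII text (e.g. Chinese enterprise names) readable in the output instead of escaping it.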
In the method for identifying empty-shell enterprises provided by the embodiments of this specification, whether an enterprise to be identified is an empty-shell enterprise is determined by fusing the scoring card model with at least one classification model. The scoring card model is highly interpretable and the classification models are highly accurate, so this approach balances the interpretability of the identification result against its accuracy, recall, and similar metrics.
In addition, in terms of data feature dimensions, the method is not limited to enterprise business registration information. It expands the data range by also using credit information such as enterprise products and trademarks, which empty-shell enterprises generally lack. It also uses self-certified operating information uploaded by enterprises, such as invoices and transaction flows; most empty-shell enterprises have no purchase invoices. The scheme therefore draws on richer data dimensions than conventional schemes, which means it judges from more angles and produces more accurate results.
Moreover, the method uses association features, for example features obtained by associating time features with non-time features, or address features with non-address features. These allow it to analyze suspicious behavior such as suspected gang registration and batch registration. Other similar schemes mostly rely on single-point judgments, and their judgment of batch registration considers only the legal representative's name, so their identification rate is lower than that of this scheme.
Optionally, building on the association features, the method for identifying empty-shell enterprises provided in the embodiments of the present disclosure may further include:
determining the persons (such as legal representatives) associated with a plurality of empty-shell enterprises registered in the same region within a preset time period;
determining a plurality of target empty shell enterprises in the plurality of empty shell enterprises based on preset information of the plurality of empty shell enterprises, wherein the preset information comprises at least one of an IP address and a MAC address, and the plurality of target empty shell enterprises have the same preset information;
and determining, among the plurality of persons, the registrants associated with the plurality of target empty-shell enterprises as gang registrants.
The preset time period may be set empirically and is typically short, for example within 1 hour, within one day, or within one week.
Further, the region may also be determined to be a high-risk region for empty-shell enterprise registration.
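The steps above can be sketched as a grouping pass over registrations of already-identified empty-shell enterprises. The record layout, the sliding-window interpretation of "within a preset time period", and the minimum group size of two enterprises are all assumptions made for illustration. The region and person identifiers are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Registration(NamedTuple):
    enterprise: str
    person: str          # e.g. the legal representative
    region: str
    registered_at: datetime
    ip: str
    mac: str


def find_gang_registrants(regs: Iterable[Registration], window: timedelta,
                          min_enterprises: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return (gang registrants, high-risk regions) among known empty-shell registrations."""
    by_region: Dict[str, List[Registration]] = defaultdict(list)
    for r in regs:
        by_region[r.region].append(r)
    gang: Set[str] = set()
    risky_regions: Set[str] = set()
    for region, items in by_region.items():
        items.sort(key=lambda r: r.registered_at)
        for i, first in enumerate(items):
            # registrations in this region within `window` of `first`
            batch = [r for r in items[i:] if r.registered_at - first.registered_at <= window]
            for attr in ("ip", "mac"):
                groups: Dict[str, List[Registration]] = defaultdict(list)
                for r in batch:
                    groups[getattr(r, attr)].append(r)
                for same in groups.values():
                    if len({r.enterprise for r in same}) >= min_enterprises:
                        gang.update(r.person for r in same)
                        risky_regions.add(region)
    return gang, risky_regions


t0 = datetime(2022, 1, 1, 9, 0)
regs = [
    Registration("E1", "P1", "R1", t0, "10.0.0.1", "aa"),
    Registration("E2", "P2", "R1", t0 + timedelta(minutes=20), "10.0.0.1", "bb"),
    Registration("E3", "P3", "R1", t0 + timedelta(hours=5), "10.0.0.1", "cc"),
    Registration("E4", "P4", "R2", t0, "10.0.0.2", "dd"),
]
print(find_gang_registrants(regs, window=timedelta(hours=1)))
```

With a one-hour window, only E1 and E2 share an IP address inside the window, so P1 and P2 are flagged and region R1 is marked high-risk.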
Optionally, expert experience (an expert rule system) may be used as a fallback judgment to catch cases the models miss. Specifically, the rule check may be performed before any of steps 106 to 114. As shown in fig. 6, taking the case where it is performed before step 106 as an example, the method for identifying an empty-shell enterprise provided in the embodiments of the present disclosure may further include:
Step 116, determining whether the enterprise to be identified hits a preset rule; if so, go to step 118; otherwise, go to step 106.
The preset rule is a judging rule set according to expert experience.
Step 118, determining the empty-shell identification result of the enterprise to be identified based on the preset rule.
For example, the preset rules may include at least one of:
1) Enterprises with a first attribute are passed even though they may also carry some risk. For example, state-owned enterprises, central state-owned enterprises, state-capital-controlled enterprises, industry-leading enterprises, military-industry enterprises, banks, and whitelisted enterprises can be directly identified as not being empty-shell enterprises. The whitelist is configurable by a customer or operator.
2) Enterprises with a second attribute, such as sanctioned enterprises, enterprises with tax violations, and enterprises with serious violations, are directly identified as empty-shell enterprises.
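A minimal sketch of steps 116 and 118 as a tag-based rule check follows. The tag names are hypothetical encodings of the attributes listed above. The source does not say which rule wins when an enterprise carries both kinds of attribute, so giving second-attribute rules precedence is an assumption.

```python
from typing import Optional, Set

# Hypothetical attribute tags for the two rule families described above.
PASS_TAGS = {"state_owned", "central_state_owned", "state_capital_controlled",
             "industry_leader", "military_industry", "bank", "whitelist"}
SHELL_TAGS = {"sanctioned", "tax_violation", "serious_violation"}


def apply_expert_rules(tags: Set[str]) -> Optional[bool]:
    """Steps 116/118: True = empty shell, False = not empty shell, None = no rule hit."""
    if tags & SHELL_TAGS:
        return True   # assumption: second-attribute rules take precedence
    if tags & PASS_TAGS:
        return False
    return None       # no rule hit: continue with the model pipeline (step 106)


for tags in ({"bank"}, {"sanctioned"}, set()):
    print(sorted(tags), apply_expert_rules(tags))
```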
The method provided by the present specification is described above, and the electronic device provided by the present specification is described below.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 7, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include volatile memory such as random-access memory (RAM), and may further include non-volatile memory such as at least one disk storage. Of course, the electronic device may also include hardware required by other services.
The processor, network interface, and memory may be interconnected by the internal bus. The internal bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified into address buses, data buses, control buses, and so on. For ease of illustration, only one bidirectional arrow is shown in fig. 7, but this does not mean that there is only one bus or one type of bus.
The memory is configured to store programs. Specifically, a program may include program code, and the program code includes computer operating instructions. The memory may include volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile storage into memory and runs it, forming the empty-shell enterprise identification apparatus at the logical level. The processor executes the programs stored in the memory and is specifically configured to perform the following operations:
acquiring multidimensional raw data of an enterprise to be identified;
extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
inputting the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, where the scoring card model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features, and empty-shell enterprise labels;
inputting the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is an empty-shell enterprise, where the at least one classification model is trained based on the plurality of second features of the sample enterprises and the empty-shell enterprise labels;
and obtaining an empty-shell identification result of the enterprise to be identified based on the score and the at least one identification result.
The method disclosed in the embodiments shown in fig. 1, fig. 5 or fig. 6 of the present specification may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The various methods, steps, and logic blocks disclosed in one or more embodiments of the present description may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with one or more embodiments of the present disclosure may be embodied directly in a hardware decoding processor or in a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The electronic device may further execute the method provided in the embodiment shown in fig. 1, fig. 5, or fig. 6, which is not described in detail herein.
Of course, in addition to the software implementation, the electronic device in this specification does not exclude other implementations, such as a logic device or a combination of software and hardware, that is, the execution subject of the following process is not limited to each logic unit, but may also be hardware or a logic device.
The present description also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiments of fig. 1, 5 or 6, and in particular to perform the operations of:
acquiring multidimensional raw data of an enterprise to be identified;
extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
inputting the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, where the scoring card model is constructed based on the multidimensional raw data of sample enterprises, the plurality of first features, and empty-shell enterprise labels;
inputting the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is an empty-shell enterprise, where the at least one classification model is trained based on the plurality of second features of the sample enterprises and the empty-shell enterprise labels;
and obtaining an empty-shell identification result of the enterprise to be identified based on the score and the at least one identification result.
As shown in fig. 8, an embodiment of the present disclosure provides an empty-shell enterprise identification device 800. In a software implementation, the device 800 may include a first training module 801 and a second training module 802.
The first training module 801 trains the scoring card model based on the multidimensional raw data of the sample enterprises, the plurality of first features, and the empty-shell enterprise labels.
As an example, the first training module 801 may be configured to:
acquiring multidimensional raw data of sample enterprises;
preprocessing the multidimensional raw data of the sample enterprises to obtain a feature set of the sample enterprises;
binning the feature set to obtain binning results for a plurality of features;
determining the predictive power of the plurality of features and screening the features based on their predictive power to obtain the plurality of first features of the sample enterprises;
determining the weight of evidence (WOE) value of each bin of the sample enterprises under the plurality of first features;
constructing a logistic regression model based on the bins of the plurality of first features of the sample enterprises and their WOE values;
training the logistic regression model based on the plurality of first features of the sample enterprises and the empty-shell enterprise labels;
and performing score conversion on the trained logistic regression model and setting a base score and a PDO value to obtain the scoring card model, where the PDO value is the change in score when the good-to-bad odds ratio doubles.
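The WOE and screening steps in the list above are commonly implemented with the weight of evidence per bin and the information value (IV) as the measure of predictive power. The sketch below follows that common approach. Four points are assumptions: the use of IV as the predictive-power measure, the convention WOE = ln(bad share / good share) with empty-shell enterprises as "bad", the additive smoothing `eps`, and the 0.02 IV threshold, which is a rule of thumb rather than a value from this specification.

```python
import math
from typing import Dict, List, Sequence, Tuple


def woe_iv(bins: Sequence[str], labels: Sequence[int],
           eps: float = 0.5) -> Tuple[Dict[str, float], float]:
    """WOE of each bin and the IV of one feature.

    labels: 1 = empty-shell enterprise (bad), 0 = normal (good).
    WOE = ln(bad share / good share); IV = sum((bad share - good share) * WOE).
    eps is additive smoothing so that empty cells do not make the log undefined.
    """
    counts: Dict[str, List[int]] = {}
    for b, y in zip(bins, labels):
        c = counts.setdefault(b, [0, 0])  # [bad, good]
        c[0 if y == 1 else 1] += 1
    k = len(counts)
    bad_total = sum(c[0] for c in counts.values())
    good_total = sum(c[1] for c in counts.values())
    woe: Dict[str, float] = {}
    iv = 0.0
    for b, (bad, good) in counts.items():
        bad_share = (bad + eps) / (bad_total + eps * k)
        good_share = (good + eps) / (good_total + eps * k)
        woe[b] = math.log(bad_share / good_share)
        iv += (bad_share - good_share) * woe[b]
    return woe, iv


def select_features(iv_by_feature: Dict[str, float], min_iv: float = 0.02) -> List[str]:
    """Keep features whose IV reaches min_iv, strongest first."""
    ranked = sorted(iv_by_feature.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, value in ranked if value >= min_iv]


woe, iv = woe_iv(["a", "a", "a", "b", "b", "b"], [1, 1, 0, 0, 0, 1])
print(woe, round(iv, 4))
print(select_features({"shared_address_count": 0.31, "industry_code": 0.01, "capital_bin": 0.05}))
# ['shared_address_count', 'capital_bin']
```

The IV does not depend on which class is placed in the numerator of the WOE. Flipping the convention only flips the sign of every WOE value, and the logistic regression coefficients absorb that change.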
The scoring card model can then be deployed on the real-time system of a platform that needs to identify empty-shell enterprises, so that any enterprise queried by a user can be assessed in real time.
Optionally, for ease of use, the embodiments of this specification also convert the score into a grade. The number of grades can be chosen according to actual needs. For example, scores can be mapped from high to low onto five grades, A, B, C, D, and E, where A is excellent and E is the worst. A risk description can be attached to some or all of the grades, so that whenever a grade is output, the corresponding risk description is output with it. This makes the result easier to use and improves usability.
The second training module 802 trains at least one classification model based on the plurality of second features of the sample enterprises and the empty-shell enterprise labels.
As an example, the second training module 802 may be configured to:
acquiring multidimensional raw data of sample enterprises;
preprocessing the multidimensional raw data of the sample enterprises to obtain a feature set of the sample enterprises;
training at least one classification model based on the plurality of second features of the sample enterprises in the feature set (training set) and the empty-shell enterprise labels;
where the at least one classification model includes, but is not limited to, at least one of a tree model and a neural network model, and the tree model includes, but is not limited to, at least one of XGBoost, random forest, and LightGBM.
It will be appreciated that once the scoring card model and the at least one classification model have been trained, empty-shell enterprises can be identified with these models. The process of identifying an empty-shell enterprise with the trained models is described below.
As shown in fig. 9, an embodiment of the present disclosure provides a device 800 for identifying empty-shell enterprises. In a software implementation, the device 800 may further include a first acquisition module 803, a feature extraction module 804, a first input module 805, a second input module 806, and a first determining module 807.
The first acquisition module 803 acquires multidimensional raw data of the enterprise to be identified.
The feature extraction module 804 extracts a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data.
The first input module 805 inputs the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, where the scoring card model is constructed based on the multidimensional raw data of the sample enterprises, the plurality of first features, and the empty-shell enterprise labels.
The second input module 806 inputs the plurality of second features of the enterprise to be identified into each of at least one classification model to obtain at least one identification result indicating whether the enterprise to be identified is an empty-shell enterprise, where the at least one classification model is trained based on the plurality of second features of the sample enterprises and the empty-shell enterprise labels.
The first determining module 807 obtains the empty-shell identification result of the enterprise to be identified based on the score and the at least one identification result.
As an example, the first determining module 807 may output the score and the at least one identification result each as part of the empty-shell identification result of the enterprise to be identified. Alternatively, it may select one result from the score and the at least one identification result by voting and output that result.
As another example, the first determining module 807 may determine a target grade of the enterprise to be identified based on the score and a preset correspondence. The preset correspondence maps a plurality of score ranges to a plurality of grades, including the target grade. The module then obtains the empty-shell identification result based on the target grade and the at least one identification result. For example, scores may be mapped from high to low onto five grades, A to E, where A is excellent and E is the worst. The target grade and the at least one identification result can then either be output together for the user's reference or be fused into a single output.
The fusion may work as follows. Suppose the empty-shell verdict implied by the target grade is inconsistent with the verdict given by the at least one identification result; for example, the target grade of the enterprise is excellent, but the at least one identification result says it is an empty-shell enterprise. In that case, the result of the at least one classification model may be output, because the classification models are more accurate. Alternatively, the empty-shell identification result may be determined by voting (e.g., majority wins). If the two verdicts are consistent, either one of them may be output.
Optionally, if the grades have corresponding risk descriptions and the target grade of the enterprise to be identified is output, the risk description corresponding to the target grade may be output as well.
Optionally, the empty-shell identification result of the enterprise to be identified may agree with its target grade and indicate that the enterprise is an empty-shell enterprise. In that case, target features may be determined among the plurality of first features of the enterprise, namely the features whose scores in the scoring card model are lower than a preset score. The feature type of each target feature is then determined, and the reasons why the enterprise is an empty-shell enterprise are output based on those feature types. Specifically, when outputting these reasons, the target features hit by the enterprise can be interpreted and then output as text in a format such as JSON.
The empty-shell enterprise identification device provided by the embodiments of this specification determines whether an enterprise to be identified is an empty-shell enterprise by fusing the scoring card model with at least one classification model. The scoring card model is highly interpretable and the classification models are highly accurate, so this approach balances the interpretability of the identification result against its accuracy, recall, and similar metrics.
In addition, in terms of data feature dimensions, the device is not limited to enterprise business registration information. It expands the data range by also using credit information such as enterprise products and trademarks, which empty-shell enterprises generally lack. It also uses self-certified operating information uploaded by enterprises, such as invoices and transaction flows; most empty-shell enterprises have no purchase invoices. The scheme therefore draws on richer data dimensions than conventional schemes, which means it judges from more angles and produces more accurate results.
Moreover, the device uses association features, for example features obtained by associating time features with non-time features, or address features with non-address features. These allow it to analyze suspicious behavior such as suspected gang registration and batch registration. Other similar schemes mostly rely on single-point judgments, and their judgment of batch registration considers only the legal representative's name, so their identification rate is lower than that of this scheme.
Optionally, on the basis of adopting the association feature, the apparatus 800 for identifying a vacant shell enterprise provided in the embodiments of the present disclosure may further include:
A third determining module for determining a plurality of persons (such as legal persons) of a plurality of empty enterprises registered in the same region within a preset duration;
a fourth determining module, configured to determine a plurality of target empty-shell enterprises among the plurality of empty-shell enterprises based on preset information of the plurality of empty-shell enterprises, where the preset information includes at least one of an IP address and a MAC address, and the plurality of target empty-shell enterprises have the same preset information;
and a fifth determining module, configured to determine a registrant related to the target empty enterprises among the plurality of persons as a group registrant.
And a sixth determining module for determining the region as a high risk region registered by the empty shell enterprise.
Optionally, expert experience (an expert rule system) can be used as a safety-net judgment to catch omissions and fill gaps. Specifically, as shown in fig. 10, an embodiment of this specification provides a device 800 for identifying empty shell enterprises. In a software implementation, the device 800 may further include a judging module 808 and a second determining module 809, in addition to the first acquisition module 803, the feature extraction module 804, the first input module 805, the second input module 806, and the first determining module 807.
A judging module 808, configured to determine whether the enterprise to be identified hits a preset rule. If it does, the second determining module 809 is triggered; otherwise, the first acquisition module 803 is triggered.
The preset rules are judgment rules formulated from expert experience.
A second determining module 809, configured to determine the empty shell recognition result of the enterprise to be identified based on the preset rule.
For example, the preset rules may include at least one of:
1) Enterprises of a first attribute pass directly, even if they show risk signals. For example, the following can be directly identified as not being empty shell enterprises: state-owned enterprises, central state-owned enterprises, state-controlled enterprises, industry-leading enterprises, military-industry enterprises, banks, and whitelisted enterprises. The whitelist is configurable by the customer or the operator.
2) Enterprises of a second attribute are directly identified as empty shell enterprises. Examples include sanctioned enterprises, enterprises with tax violations, and enterprises with serious violations.
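The rule check performed by the judging module 808 can be sketched as a three-way outcome:
- not an empty shell;
- an empty shell;
- no rule hit, in which case processing continues to the model path.

In this minimal Python sketch, the attribute tags are hypothetical placeholders for whatever attributes a deployment uses. Checking first-attribute tags before second-attribute tags is an assumption. It follows the statement that first-attribute enterprises pass even when they show risk.

```python
from typing import Optional

# Hypothetical attribute tags; a real rule set is configured from expert experience.
PASS_TAGS = {"state_owned", "central_soe", "state_controlled",
             "industry_leader", "military", "bank", "whitelist"}
SHELL_TAGS = {"sanctioned", "tax_violation", "serious_violation"}


def apply_preset_rules(tags: set) -> Optional[bool]:
    """Return False if a first-attribute (pass) rule hits, True if a
    second-attribute (shell) rule hits, or None when no rule hits and the
    model path should run instead."""
    if tags & PASS_TAGS:  # pass rules are checked first (assumption)
        return False
    if tags & SHELL_TAGS:
        return True
    return None
```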
It should be noted that the empty shell enterprise identification device 1000 can implement the empty shell enterprise identification method provided in fig. 5 and achieve the same technical effects. For details, refer to the method embodiments above; they are not repeated here.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant parts, refer to the description of the method embodiments.
In summary, the foregoing description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, or the like, which is within the spirit and principles of one or more embodiments of the present disclosure, is intended to be included within the scope of one or more embodiments of the present disclosure.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM);
- static random access memory (SRAM);
- dynamic random access memory (DRAM);
- other types of random access memory (RAM);
- read-only memory (ROM);
- electrically erasable programmable read-only memory (EEPROM);
- flash memory or other memory technology;
- compact disc read-only memory (CD-ROM);
- digital versatile discs (DVD) or other optical storage;
- magnetic cassettes;
- magnetic tape or magnetic disk storage or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements, but also other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner. For identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for relevant parts, refer to the description of the method embodiments.

Claims (17)

1. A method for identifying an empty shell enterprise, comprising:
acquiring multidimensional raw data of an enterprise to be identified;
extracting a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
inputting the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, wherein the scoring card model is constructed based on the multidimensional raw data, the plurality of first features, and empty shell enterprise labels of a sample enterprise;
inputting the plurality of second features of the enterprise to be identified into at least one classification model respectively to obtain at least one recognition result of whether the enterprise to be identified is an empty shell enterprise, wherein the at least one classification model is obtained based on the plurality of second features and empty shell enterprise labels of a sample enterprise;
and obtaining an empty shell recognition result of the enterprise to be identified based on the score and the at least one recognition result.
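The overall flow of claim 1 can be expressed as a small orchestration function. In this hedged Python sketch, every component is a hypothetical callable passed in by the caller: the two feature extractors, the scoring card, the classifiers, and the `combine` step. The sketch only shows how the two feature sets feed the two model paths and how their outputs are merged. It does not show how any model is built.

```python
from typing import Callable, List, Mapping, Sequence

Features = Mapping[str, float]
RawData = Mapping[str, object]


def identify_empty_shell(
    raw: RawData,
    extract_first: Callable[[RawData], Features],       # hypothetical extractor
    extract_second: Callable[[RawData], Features],      # hypothetical extractor
    scorecard: Callable[[Features], float],             # hypothetical scoring card model
    classifiers: Sequence[Callable[[Features], bool]],  # hypothetical classifiers
    combine: Callable[[float, List[bool]], bool],       # hypothetical fusion rule
) -> bool:
    """Return True if the enterprise is recognized as an empty shell."""
    first = extract_first(raw)                      # features for the scoring card
    second = extract_second(raw)                    # features for the classifiers
    score = scorecard(first)
    results = [clf(second) for clf in classifiers]  # one result per classifier
    return combine(score, results)
```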
2. The method of claim 1, further comprising, prior to inputting the plurality of first features of the enterprise to be identified into the scoring card model to obtain the score of the enterprise to be identified:
constructing the scoring card model based on the multidimensional raw data, the plurality of first features, and the empty shell enterprise labels of the sample enterprise.
3. The method of claim 2, wherein constructing the scoring card model based on the multidimensional raw data, the plurality of first features, and the empty shell enterprise labels of the sample enterprise comprises:
acquiring the multidimensional raw data of the sample enterprise;
preprocessing the multidimensional raw data of the sample enterprise to obtain a feature set of the sample enterprise;
binning the feature set to obtain binning results for a plurality of features;
determining the predictive power of the plurality of features, and screening the plurality of features based on their predictive power to obtain the plurality of first features of the sample enterprise;
constructing a logistic regression model based on the bins of the plurality of first features of the sample enterprise and the WOE values of those bins;
training the logistic regression model based on the plurality of first features and the empty shell enterprise labels of the sample enterprise;
and performing score conversion on the trained logistic regression model and setting a base score and a PDO value to obtain the scoring card model, wherein the PDO value represents the change in score when the good-to-bad odds double.
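The binning, screening, and score-conversion steps of claim 3 rest on standard scorecard formulas. This hedged Python sketch makes three assumptions:
- bins are already assigned to each sample;
- predictive power is measured by information value (IV), a common choice that the specification does not name;
- the conversion is Score = offset + factor × ln(good:bad odds), with factor = PDO / ln 2, so the score rises by PDO each time the odds double.

Fitting the logistic regression itself is left to any standard library; `intercept` and `coefs` stand for its fitted parameters.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def woe_iv(bins: Sequence[Hashable], labels: Sequence[int], eps: float = 0.5):
    """WOE per bin and information value (IV) of one binned feature.

    labels: 1 = empty shell ("bad"), 0 = normal ("good").
    WOE_i = ln(good_share_i / bad_share_i); IV = sum((good_share_i - bad_share_i) * WOE_i).
    eps is additive smoothing (an assumption) so empty cells do not give log(0).
    """
    bin_ids = list(dict.fromkeys(bins))  # distinct bins, first-seen order
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    k = len(bin_ids)
    total_good = sum(good.values()) + eps * k
    total_bad = sum(bad.values()) + eps * k
    woe, iv = {}, 0.0
    for b in bin_ids:
        g = (good[b] + eps) / total_good
        d = (bad[b] + eps) / total_bad
        woe[b] = math.log(g / d)
        iv += (g - d) * woe[b]
    return woe, iv


def score_params(base_score: float, base_odds: float, pdo: float):
    """Return (offset, factor) so that score = base_score at good:bad odds
    base_odds, and the score rises by pdo whenever the odds double."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def scorecard_score(woe_values, coefs, intercept, offset, factor):
    """Logistic regression on WOE inputs gives ln(bad/good); negate it to get
    ln(good/bad) and map it linearly to points."""
    logit_bad = intercept + sum(c * w for c, w in zip(coefs, woe_values))
    return offset - factor * logit_bad
```

Features whose IV falls below a chosen threshold would be dropped in the screening step. The threshold is a modeling choice that the specification does not state.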
4. The method according to claim 3, wherein the preprocessing comprises at least one of:
filling missing numerical values;
filling missing non-numerical features with a preset character string representing a null value;
splitting splittable character strings;
mapping character-type features to numerical codes or vector encodings;
and associating correlated features to obtain association features.
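A minimal Python sketch of the first four preprocessing options of claim 4 follows. Association features are handled separately. The following inputs are hypothetical and supplied by the caller:
- the fill values, which would typically come from training data;
- the `__NULL__` placeholder;
- the separators;
- the category code tables.

The sketch also assumes each field is either split or encoded, never both.

```python
import math
from typing import Any, Dict

NULL_TOKEN = "__NULL__"  # hypothetical placeholder string for missing non-numeric values


def preprocess(record: Dict[str, Any],
               numeric_fill: Dict[str, float],
               split_fields: Dict[str, str],
               category_codes: Dict[str, Dict[str, int]]) -> Dict[str, Any]:
    """Apply the fill, split, and encode steps to one enterprise record."""
    out: Dict[str, Any] = {}
    for key, value in record.items():
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if key in numeric_fill:                 # numeric feature
            out[key] = numeric_fill[key] if missing else float(value)
        elif missing:                           # missing non-numeric feature
            out[key] = NULL_TOKEN
        else:
            out[key] = value
    for key, sep in split_fields.items():       # e.g. "a;b" -> ["a", "b"]
        if key in out and out[key] != NULL_TOKEN:
            out[key] = [part.strip() for part in str(out[key]).split(sep) if part.strip()]
    for key, codes in category_codes.items():   # string category -> integer code
        if key in out:
            out[key] = codes.get(out[key], 0)   # 0 = unseen or null (assumption)
    return out
```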
5. The method of any one of claims 1-4, wherein the multidimensional raw data comprises at least one of:
business registration information of the enterprise;
penalty information of the enterprise;
court judgment document information of the enterprise;
brand information of the enterprise;
product information of the enterprise;
intellectual property information of the enterprise;
bond rating information of the enterprise;
customs registration information of the enterprise;
and self-certified operating information of the enterprise.
6. The method of any one of claims 1-4, wherein prior to acquiring the multidimensional raw data of the enterprise to be identified, the method further comprises:
determining whether the enterprise to be identified hits a preset rule;
if so, determining the empty shell recognition result of the enterprise to be identified based on the preset rule;
and if not, performing the step of acquiring the multidimensional raw data of the enterprise to be identified.
7. The method according to any one of claims 1-4, wherein obtaining the empty shell recognition result of the enterprise to be identified based on the score and the at least one recognition result comprises:
determining a target grade of the enterprise to be identified based on the score and a preset correspondence, wherein the preset correspondence maps a plurality of score ranges to a plurality of grades, and the plurality of grades include the target grade;
and obtaining the empty shell recognition result of the enterprise to be identified based on the target grade and the at least one recognition result.
8. The method of claim 7, wherein each of the plurality of grades has a corresponding risk description, the method further comprising:
outputting the risk description corresponding to the target grade.
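The preset correspondence of claims 7 and 8 can be held as a sorted list of score boundaries. This hedged Python sketch uses hypothetical values for the bands, grade letters, and risk descriptions. Which grades count as an empty shell is also a hypothetical choice; the specification discloses none of these values. Lower scores are treated as riskier, matching a score that rises with the good:bad odds.

```python
import bisect

# Hypothetical score bands: ascending lower bounds of grades D, C, B, A.
BAND_LOWER_BOUNDS = [500, 550, 600, 650]
GRADES = ["E", "D", "C", "B", "A"]  # "E" covers scores below 500
RISK_DESCRIPTIONS = {               # hypothetical risk descriptions
    "E": "very high empty shell risk",
    "D": "high empty shell risk",
    "C": "moderate empty shell risk",
    "B": "low empty shell risk",
    "A": "very low empty shell risk",
}
SHELL_GRADES = {"E", "D"}  # assumption: grades treated as "empty shell"


def score_to_grade(score: float) -> str:
    # bisect_right counts how many lower bounds are <= score
    return GRADES[bisect.bisect_right(BAND_LOWER_BOUNDS, score)]


def grade_indicates_shell(grade: str) -> bool:
    return grade in SHELL_GRADES
```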
9. The method of claim 7, further comprising:
and, if the empty shell recognition result of the enterprise to be identified obtained based on the target grade is inconsistent with the at least one recognition result, determining the empty shell recognition result of the enterprise to be identified based on a voting principle.
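One reading of the voting principle in claim 9 is a simple majority over the grade-based result and each classifier's result. In this minimal Python sketch, the grade-based result counts as one vote and ties are resolved toward "empty shell". Both choices are assumptions; the specification does not define the vote weights or the tie rule.

```python
from typing import Sequence


def fuse_by_vote(grade_result: bool, model_results: Sequence[bool]) -> bool:
    """Fuse the grade-based result with the classifier results (True = empty shell)."""
    if all(r == grade_result for r in model_results):
        return grade_result               # consistent: no vote needed
    votes = [grade_result, *model_results]
    shell_votes = sum(votes)              # bools count as 0/1
    return shell_votes * 2 >= len(votes)  # majority; a tie counts as empty shell
```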
10. The method of claim 7, further comprising:
determining target features among the plurality of first features of the enterprise to be identified when the empty shell recognition result of the enterprise to be identified is consistent with the target grade and indicates an empty shell enterprise, wherein the scores obtained by the target features in the scoring card model are lower than a preset score;
determining the feature types of the target features;
and outputting, based on the feature types of the target features, the reasons why the enterprise to be identified is an empty shell enterprise.
11. The method of claim 10, wherein the feature type comprises at least one of:
an enterprise basic information class;
an enterprise address class;
an enterprise abnormal operation class;
an enterprise business registration change class;
and an enterprise positive operation class.
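The explanation step of claims 10 and 11 can be sketched as follows. This hedged Python sketch assumes the per-feature points from the scoring card model are already available. In a WOE-based scorecard, they are commonly computed as -factor × coefficient × WOE for each feature. The feature names and their mapping to the feature types above are hypothetical.

```python
from typing import Dict, List, Mapping

# Hypothetical first-feature names mapped to the feature types of claim 11.
FEATURE_TYPE: Dict[str, str] = {
    "company_age_days": "enterprise basic information class",
    "shared_address_count": "enterprise address class",
    "abnormal_operation_flag": "enterprise abnormal operation class",
    "legal_rep_changes_1y": "enterprise business registration change class",
    "invoice_count": "enterprise positive operation class",
}


def explain_empty_shell(feature_points: Mapping[str, float], preset_score: float,
                        is_shell: bool, consistent_with_grade: bool) -> List[str]:
    """List feature types behind an empty shell verdict, lowest-scoring first."""
    if not (is_shell and consistent_with_grade):
        return []  # explanation is produced only for consistent empty shell verdicts
    # target features: points below the preset score, ascending by points
    targets = sorted((pts, name) for name, pts in feature_points.items()
                     if pts < preset_score)
    reasons: List[str] = []
    for _, name in targets:
        ftype = FEATURE_TYPE.get(name, "other")
        if ftype not in reasons:  # report each feature type once
            reasons.append(ftype)
    return reasons
```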
12. The method of any one of claims 1-4, 8-11, further comprising:
determining a plurality of persons of a plurality of empty shell enterprises registered in the same region within a preset duration;
determining a plurality of target empty shell enterprises among the plurality of empty shell enterprises based on preset information of the plurality of empty shell enterprises, wherein the preset information comprises at least one of an IP address and a MAC address, and the plurality of target empty shell enterprises have the same preset information;
and determining, among the plurality of persons, the registrants related to the plurality of target empty shell enterprises as group registrants.
13. The method of claim 12, further comprising:
determining the region as a high-risk region for empty shell enterprise registration.
14. The method according to any one of claims 1 to 4, 8 to 1 and 13, wherein
the at least one classification model comprises at least one of a tree model and a neural network model, and the tree model comprises at least one of XGBoost, random forest, and LightGBM.
15. An empty shell enterprise identification device, comprising:
a first acquisition module, configured to acquire multidimensional raw data of an enterprise to be identified;
a feature extraction module, configured to extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
a first input module, configured to input the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, wherein the scoring card model is constructed based on the multidimensional raw data, the plurality of first features, and empty shell enterprise labels of a sample enterprise;
a second input module, configured to input the plurality of second features of the enterprise to be identified into at least one classification model respectively to obtain at least one recognition result of whether the enterprise to be identified is an empty shell enterprise, wherein the at least one classification model is obtained based on the plurality of second features and empty shell enterprise labels of a sample enterprise;
and a first determining module, configured to obtain an empty shell recognition result of the enterprise to be identified based on the score and the at least one recognition result.
16. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquire multidimensional raw data of an enterprise to be identified;
extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
input the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, wherein the scoring card model is constructed based on the multidimensional raw data, the plurality of first features, and empty shell enterprise labels of a sample enterprise;
input the plurality of second features of the enterprise to be identified into at least one classification model respectively to obtain at least one recognition result of whether the enterprise to be identified is an empty shell enterprise, wherein the at least one classification model is obtained based on the plurality of second features and empty shell enterprise labels of a sample enterprise;
and obtain an empty shell recognition result of the enterprise to be identified based on the score and the at least one recognition result.
17. A computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to:
acquire multidimensional raw data of an enterprise to be identified;
extract a plurality of first features and a plurality of second features of the enterprise to be identified from the multidimensional raw data;
input the plurality of first features of the enterprise to be identified into a scoring card model to obtain a score of the enterprise to be identified, wherein the scoring card model is constructed based on the multidimensional raw data, the plurality of first features, and empty shell enterprise labels of a sample enterprise;
input the plurality of second features of the enterprise to be identified into at least one classification model respectively to obtain at least one recognition result of whether the enterprise to be identified is an empty shell enterprise, wherein the at least one classification model is obtained based on the plurality of second features and empty shell enterprise labels of a sample enterprise;
and obtain an empty shell recognition result of the enterprise to be identified based on the score and the at least one recognition result.
CN202211623210.7A 2022-12-16 2022-12-16 Method, device and equipment for identifying empty shell enterprises Pending CN116342141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211623210.7A CN116342141A (en) 2022-12-16 2022-12-16 Method, device and equipment for identifying empty shell enterprises

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211623210.7A CN116342141A (en) 2022-12-16 2022-12-16 Method, device and equipment for identifying empty shell enterprises

Publications (1)

Publication Number Publication Date
CN116342141A true CN116342141A (en) 2023-06-27

Family

ID=86877995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211623210.7A Pending CN116342141A (en) 2022-12-16 2022-12-16 Method, device and equipment for identifying empty shell enterprises

Country Status (1)

Country Link
CN (1) CN116342141A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681358A (en) * 2023-08-04 2023-09-01 深圳中科闻歌科技有限公司 XGBoost model-based new registration abnormal enterprise detection method

Similar Documents

Publication Publication Date Title
EP3985578A1 (en) Method and system for automatically training machine learning model
Finlay Predictive analytics, data mining and big data: Myths, misconceptions and methods
US20190164015A1 (en) Machine learning techniques for evaluating entities
US11818163B2 (en) Automatic machine learning vulnerability identification and retraining
WO2020177478A1 (en) Credit-based qualification information auditing method, apparatus and device
CN112507936A (en) Image information auditing method and device, electronic equipment and readable storage medium
CN110782158B (en) Object evaluation method and device
CN113011646A (en) Data processing method and device and readable storage medium
CN112712429A (en) Remittance service auditing method, remittance service auditing device, computer equipment and storage medium
CN111476653A (en) Risk information identification, determination and model training method and device
CN112734161A (en) Method, equipment and storage medium for accurately identifying empty-shell enterprises
CN116342141A (en) Method, device and equipment for identifying empty shell enterprises
CN116821759A (en) Identification prediction method and device for category labels, processor and electronic equipment
CN113112323B (en) Abnormal order identification method, device, equipment and medium based on data analysis
CN110570301B (en) Risk identification method, device, equipment and medium
CN112258315B (en) Method and device for checking vehicle credit pre-credit data based on identity tag
CN116881687B (en) Power grid sensitive data identification method and device based on feature extraction
CN113065739B (en) Method and device for evaluating performance capability of executed person and electronic equipment
CN115953248B (en) Wind control method, device, equipment and medium based on saprolitic additivity interpretation
CN115713399B (en) User credit evaluation system combined with third-party data source
CN112581042B (en) Performance capability evaluation system and method and electronic equipment
US20240144294A1 (en) Methods and Systems for Managing Transactions Associated with Vehicles Using a Distributed Ledger
CN117113154A (en) Method and system for identifying partner of fake plate blank enterprise
KR20230073542A (en) System and method for detecting online living crime capable of risk assessment
KR20220141462A (en) AI platform that judges the authenticity of financial-related information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination