CN114493619A - Enterprise credit investigation label construction method based on electric power data - Google Patents

Enterprise credit investigation label construction method based on electric power data

Info

Publication number
CN114493619A
Authority
CN
China
Prior art keywords
data
label
enterprise
credit investigation
electric power
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111488010.0A
Other languages
Chinese (zh)
Inventor
赵莹莹
郭乃网
苏运
田英杰
张国庆
李凡
吴裔
沈泉江
王彬彬
刘畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Shanghai Electric Power Co Ltd
Transwarp Technology Shanghai Co Ltd
Original Assignee
State Grid Shanghai Electric Power Co Ltd
Transwarp Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Shanghai Electric Power Co Ltd and Transwarp Technology Shanghai Co Ltd
Priority to CN202111488010.0A
Publication of CN114493619A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/018: Certifying business or products
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Strategic Management (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Tourism & Hospitality (AREA)
  • Probability & Statistics with Applications (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for constructing enterprise credit investigation labels based on electric power data, comprising the following steps: acquiring enterprise power data, and auditing and analyzing the data; preprocessing the acquired data; determining the label content of the credit investigation labels according to business scenario requirements; determining the value of each label based on machine learning, classification and clustering, and text mining algorithms; verifying the validity of the labels; and labeling the enterprises and the corresponding business scenarios. Compared with the prior art, the method characterizes enterprise credit along multiple dimensions, updates labels quickly, and helps complete the enterprise credit investigation system.

Description

Enterprise credit investigation label construction method based on electric power data
Technical Field
The invention relates to the technical field of electric power credit investigation, in particular to a method for constructing an enterprise credit investigation label based on electric power data.
Background
At the present stage, data has been explicitly recognized as a new factor of production alongside traditional factors such as land, labor, capital, and technology, and requirements have been put forward for "enhancing the value of social data resources, fostering new industries, new business forms, and new models of the digital economy, and supporting the construction of standardized data development and utilization scenarios in fields such as agriculture, industry, transportation, education, security, city management, and public resource trading."
With the construction of the digital power grid and its digital transformation being comprehensively advanced, there is an urgent need to standardize the open sharing and trading of data assets, promote their rapid circulation, and further release the value of electric power data as a new factor of production. The prior art offers no method for evaluating and labeling the electric power credit of an enterprise.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing an electric-power-data-based method for constructing enterprise credit investigation labels that characterizes enterprise credit along multiple dimensions, updates labels quickly, and helps complete the enterprise credit investigation system.
The purpose of the invention can be achieved by the following technical solution:
An enterprise credit investigation label construction method based on electric power data comprises the following steps:
step 1: acquiring enterprise power data, and auditing and analyzing the data;
step 2: preprocessing the data acquired in step 1;
step 3: determining the label content of the credit investigation labels according to business scenario requirements;
step 4: determining the value of each label based on machine learning, classification and clustering, and text mining algorithms;
step 5: verifying the validity of the labels;
step 6: labeling the enterprises and the corresponding business scenarios.
Preferably, the data auditing and analysis in step 1 includes:
total data volume check: checking the integrity of the data files to ensure that the extracted data is, on the whole, consistent with the data extraction requirements and the business data;
comprehensive data analysis: examining the characteristics of each variable, including data type, missing values, value range, and distribution;
data dictionary verification: determining the characteristics of the data according to the data dictionary, and verifying how the actual data has been updated against the data dictionary;
duplication check: checking key values for duplicates and recording the results;
association key check: checking the association keys according to the data dictionary;
logical check: checking the logical reconciliation relationships that exist in the data.
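The audit checks listed above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `audit_records`, the record layout (a list of dicts), and the `rules` argument are all assumptions introduced here, and the data dictionary and association key checks are omitted because the source does not describe their inputs.

```python
from collections import Counter

def audit_records(records, key_field, expected_fields, rules=None):
    """Return a small audit report for a list of dict records (hypothetical layout)."""
    total = len(records)
    # Profile check: share of missing (None or absent) values per expected field.
    missing_ratio = {
        field: (sum(1 for r in records if r.get(field) is None) / total) if total else 0.0
        for field in expected_fields
    }
    # Duplication check: key values that occur more than once.
    counts = Counter(r.get(key_field) for r in records)
    duplicate_keys = sorted(k for k, c in counts.items() if c > 1 and k is not None)
    # Logical check: count records that violate each named rule.
    violations = {
        name: sum(1 for r in records if not check(r))
        for name, check in (rules or {}).items()
    }
    return {
        "total": total,
        "missing_ratio": missing_ratio,
        "duplicate_keys": duplicate_keys,
        "violations": violations,
    }
```

Passing the rules as named predicates keeps the logical check open-ended, since the source does not enumerate which relationships are checked.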
Preferably, step 2 specifically comprises:
cleaning the data acquired in step 1, wherein the data cleaning comprises:
missing value processing: repairing intermittent gaps in the power data with a moving average method, and repairing continuous gaps in the electricity consumption data with a KNN algorithm based on cosine similarity;
abnormal and extreme value processing: correcting data containing extreme values by methods such as truncation and deletion;
outlier filtering: filtering outliers using a quartile test and a 3-sigma standard deviation test.
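Two of the cleaning steps above can be sketched as follows, under stated assumptions. The window size, the 1.5 x IQR fence, and the choice to flag a value when either test fails are assumptions made here; the source names the tests but gives no parameters or combination rule. The function names are hypothetical.

```python
import statistics

def fill_isolated_gaps(series, window=2):
    """Fill None values with the mean of the known readings within `window` positions."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            neighbours = [v for v in series[max(0, i - window):i + window + 1]
                          if v is not None]
            if neighbours:  # leave the gap if nothing nearby is known
                filled[i] = sum(neighbours) / len(neighbours)
    return filled

def outlier_flags(values, k_iqr=1.5, k_sigma=3.0):
    """Flag a value when it fails the quartile (IQR) test or the 3-sigma test."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    flags = []
    for v in values:
        iqr_out = v < q1 - k_iqr * iqr or v > q3 + k_iqr * iqr
        sigma_out = sigma > 0 and abs(v - mean) > k_sigma * sigma
        flags.append(iqr_out or sigma_out)
    return flags
```

On short series a single extreme reading inflates the standard deviation, so the quartile test often catches values that the 3-sigma test misses, which is one reason to run both.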
Preferably, step 2 further comprises:
cleaning abnormal users after data cleaning, specifically:
after data cleaning, calculating the proportion of missing values for each user, and if the proportion exceeds 50%, filtering out all data of that user.
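The 50% rule above amounts to a simple per-user filter. In the sketch below, the name `drop_sparse_users` and the input layout (a dict mapping each user to a list of readings, with `None` for a missing reading) are assumptions; dropping users with no readings at all is also an assumption.

```python
def drop_sparse_users(usage_by_user, max_missing_ratio=0.5):
    """Keep only users whose share of missing readings does not exceed the threshold."""
    kept = {}
    for user, readings in usage_by_user.items():
        if not readings:
            continue  # assumption: a user with no readings at all is also dropped
        ratio = sum(1 for r in readings if r is None) / len(readings)
        if ratio <= max_missing_ratio:
            kept[user] = readings
    return kept
```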
More preferably, the preprocessing in step 2 further includes:
normalizing the data, converting data types, and reducing data dimensionality.
Preferably, the credit investigation labels comprise:
fact labels, which reflect the basic attributes of the user;
rule labels, which determine the user's electricity consumption characteristics from electricity consumption and consumption change data, according to time window and user type;
model labels, whose values are obtained from prediction models built with machine learning, classification and clustering, and text mining algorithms for specific business scenario requirements.
More preferably, the fact labels include the enterprise name, electricity service address, geographic coordinates, industry, industry code, and station area.
More preferably, the rule labels include electricity consumption period preference, month-over-month electricity consumption ratio, abnormal electricity consumption characteristics, electricity consumption level, electricity consumption trend, industry characteristics, recent electricity consumption, electricity consumption fluctuation, and customer group characteristics.
More preferably, the model labels include: overdue payment prediction, operating condition prediction, electricity theft prediction, default probability, and near-term electricity consumption prediction.
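As an illustration of how two of the rule labels named above might be computed from a monthly consumption series, consider the sketch below. The formulas are assumptions: the source lists the labels but does not define them, so the month-over-month ratio is taken here as relative change against the previous month and fluctuation as the coefficient of variation.

```python
import statistics

def month_over_month(monthly_kwh):
    """Relative change versus the previous month; None where the previous month is zero."""
    changes = []
    for prev, cur in zip(monthly_kwh, monthly_kwh[1:]):
        changes.append((cur - prev) / prev if prev else None)
    return changes

def fluctuation(monthly_kwh):
    """Coefficient of variation (population std / mean) as one possible fluctuation measure."""
    mean = statistics.fmean(monthly_kwh)
    return statistics.pstdev(monthly_kwh) / mean if mean else None
```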
Preferably, step 5 specifically comprises:
verifying the significance of the labels and the accuracy of the models using statistical and machine learning metrics, respectively, to ensure that labels of all types are usable.
Compared with the prior art, the invention has the following beneficial effects:
First, multi-dimensional characterization of enterprise credit: the electric power data used by the method carries great value, including commercial, business, and user information, and can supplement the credit data available in the current credit investigation market so as to characterize credit subjects along multiple dimensions.
Second, fast label updating and a more complete enterprise credit investigation system: the method uses the power sector's own data to provide supplementary data on credit subjects and thereby strengthen judgments of enterprise credit status. It is an effective path in the early stage of putting the value-added monetization of electric power big data into practice, its labels are updated quickly, and it greatly improves the enterprise credit investigation system.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing an enterprise credit investigation label according to the present invention;
FIG. 2 is a label design framework for a scenario of "an event affects an enterprise" in an embodiment of the present invention;
FIG. 3 is a clustering inflection point plot in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a clustering effect according to an embodiment of the present invention;
FIG. 5 is a time series chart of monthly electricity consumption of an enterprise according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
An enterprise credit investigation label construction method based on electric power data is provided; its flow is shown in FIG. 1, and it comprises the following steps:
step 1: acquiring enterprise power data, and auditing and analyzing the data;
the data auditing and analysis comprises:
total data volume check: checking the integrity of the data files and, by analyzing totals such as the time span of the business period, the number of fields, and the number of records, ensuring that the extracted data is on the whole consistent with the data extraction requirements and the business data;
comprehensive data analysis: examining the characteristics of each variable, including data type, missing values, value range, and distribution, so as to form a general preliminary understanding of the users' electricity consumption data;
data dictionary verification: determining the characteristics of the data according to the data dictionary and verifying how the actual data has been updated against it; to avoid problems caused by misunderstanding the data, the values, distribution, and other properties of the actual data are examined in addition to reading the data dictionary;
duplication check: checking key values for duplicates and recording the results;
association key check: checking the association keys according to the data dictionary to ensure that records match when the data is joined;
logical check: checking the logical reconciliation relationships that exist in the data;
step 2: preprocessing the data acquired in step 1;
the preprocessing includes data cleaning, data normalization, data type conversion, data dimension reduction, and the like, wherein the data cleaning comprises:
missing value processing: repairing intermittent gaps in the power data with a moving average method, and repairing continuous gaps in the electricity consumption data with a KNN algorithm based on cosine similarity;
abnormal and extreme value processing: correcting data containing extreme values by truncation, deletion, and the like. To improve the robustness of the model and its accuracy on the overall sample, fields can be truncated to a value range defined by quantiles. Where records with extreme values occur with negligibly low frequency, those records can be deleted directly to reduce noise;
outlier filtering: filtering outliers using a quartile test and a 3-sigma standard deviation test;
abnormal user cleaning is also included, specifically: after outliers are filtered out, the proportion of missing values is calculated for each user (number of missing user data points / total number of user data points); if the proportion exceeds 50%, all data of that user, i.e. all types of electricity consumption data, is filtered out;
step 3: determining the label content of the credit investigation labels according to business scenario requirements;
the credit investigation labels comprise:
fact labels, which reflect the basic attributes of the user and are obtained by screening enterprise-related fields from the data dictionary, including the enterprise name, electricity service address, geographic coordinates, industry, industry code, station area, and similar information;
rule labels, which determine the user's electricity consumption characteristics from electricity consumption and consumption change data according to a time window (day/week/month/year) and a user type (industry/customer group/region), including electricity consumption period preference, month-over-month electricity consumption ratio, abnormal electricity consumption characteristics, electricity consumption level, electricity consumption trend, industry characteristics, recent electricity consumption, electricity consumption fluctuation, customer group characteristics, and the like;
model labels, whose values are obtained from prediction models built with machine learning, classification and clustering, and text mining algorithms for specific business scenario requirements;
step 4: determining the value of each label based on machine learning, classification and clustering, and text mining algorithms;
step 5: verifying the validity of the labels, specifically:
verifying the significance of the labels and the accuracy of the models using statistical and machine learning metrics, respectively, to ensure that labels of all types are usable;
step 6: labeling the enterprises and the corresponding business scenarios.
By constructing the label system, a customer's electricity consumption preferences, abnormal electricity consumption, and similar conditions can be understood. Highly refined feature identifiers are obtained by applying abstraction, induction, reasoning, and similar methods to the static and dynamic characteristics of the target object, and are used for differentiated management and decision making. A credit label system architecture based on electric power data is built from the three label types: fact labels, rule labels, and model labels.
For the construction results of some important labels, it is verified that the conclusions or phenomena the labels reflect agree with common sense and empirical knowledge, which demonstrates that the constructed labels are reasonable and practically meaningful. The correlation between some constructed labels and changes in user credit is verified, and selected user cases are analyzed to confirm that the constructed labels reflect, to a certain extent, abnormal changes or trends in user credit. Finally, the correlation between the labels and enterprise credit status is verified by a method based on group statistics. As the data is continuously updated, the labels and models are iteratively upgraded.
In this embodiment, model labels are generally constructed for a specific business demand scenario: using the relevant user data, models are built for the different demand types (regression, classification, clustering) with data mining techniques such as machine learning algorithms (logistic regression, support vector machine, XGBoost, etc.) and deep learning algorithms (BP neural network, DNN, etc.), yielding model labels suited to the current business scenario. To explore the influence of a social emergency on the production electricity consumption of enterprise users, this embodiment uses actual data from the event and a clustering method to construct a label for the type of influence of the event on an enterprise. The label design structure is shown in FIG. 2, and the label construction process is as follows:
(1) Overall implementation process of clustering model label development
The label "type of influence of an event on an enterprise" describes the direction and degree of the event's influence on each enterprise's production and operation. Based on users' electricity consumption data in different periods before and after the event, customers are grouped by their electricity consumption changes using a clustering algorithm, which yields the categories of influence of the event on enterprise production and the specific users in each category; finally, each user is labeled with its group.
First, a business problem is defined from information such as the user portrait depicted by the label system. The basic information needed is then determined from the problem, and the relevant features are determined from that information. Data preprocessing follows (data standardization, dimension reduction of high-dimensional data, abnormal value processing, missing value processing, etc.), then algorithm selection and analysis of the clustering result (assessing whether it has practical significance). Finally, the model is obtained and enterprises are labeled with the different influence types, solving the business problem.
(2) Defining business problems
The business problem is the direction and degree of the event's influence on enterprise production and operation; each enterprise is given a label according to the degree and direction of influence.
(3) Determining basic information and service related characteristics
It is preliminarily determined that at least the basic information and electricity consumption information of the enterprise are required, such as the enterprise name and electricity consumption; the enterprise screening feature involves labels such as "short-term electricity consumption trend".
In terms of time span, October 2019 to April 2020 is taken as the time window for observing enterprise electricity consumption behavior. October, November, and December 2019 are regarded as the "pre-event period", during which enterprise production was stable; January, February, and March 2020 are regarded as the "event period", during which enterprise production was seriously affected; April 2020 is regarded as the "post-event period", by which time most enterprises had resumed work.
(4) Data pre-processing
To eliminate the influence of other factors, such as enterprises that were already struggling before the event, enterprises are screened by their electricity consumption behavior before the event: enterprises whose electricity consumption declined from June 2019 to December 2019 are screened out (screening condition: enterprise short-term electricity consumption trend ≥ 3), and enterprises whose pre-event behavior was "stable development", "stable growth", "rapid growth" and "rapid growth" are retained. These enterprises are treated as approximately healthy in production and operation before the event.
The enterprise's "average monthly total electricity consumption" is selected as the clustering variable. After the approximately healthy enterprises with stable or increasing pre-event consumption are obtained, the average monthly total electricity consumption of each enterprise is calculated for the three periods ("pre-event", "during the event", "post-event") and used as the basis for clustering.
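The clustering variable described above can be sketched as a three-element vector per enterprise. The period boundaries follow the time window given in this embodiment; the names `PERIODS` and `period_averages` and the `'YYYY-MM'` key format are assumptions introduced for illustration.

```python
# Months per period, following the time window stated in this embodiment.
PERIODS = {
    "pre_event": ["2019-10", "2019-11", "2019-12"],
    "during_event": ["2020-01", "2020-02", "2020-03"],
    "post_event": ["2020-04"],
}

def period_averages(monthly_kwh):
    """Map {'YYYY-MM': kWh} to [pre, during, post] average monthly consumption."""
    averages = []
    for months in PERIODS.values():
        values = [monthly_kwh[m] for m in months]
        averages.append(sum(values) / len(values))
    return averages
```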
(5) Selective clustering algorithm
The main purpose of clustering is to group enterprises with similar electricity consumption curve shapes before, during, and after the event into one category, so as to identify the different ways the event affected enterprises. If two enterprises differ in consumption magnitude but show similar increases and decreases across the three periods, they should still be treated as the same category. To remove the effect of differences in consumption level between enterprises on the clustering result, each enterprise's consumption is normalized row-wise; the enterprises are then clustered on the normalized data with the K-means algorithm, and the optimal number of clusters (K) is determined by the elbow method.
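Row normalization followed by K-means can be sketched with the standard library alone. Two assumptions are made: row normalization is taken to be per-enterprise min-max scaling (consistent with the min-max normalized cluster centers reported in this embodiment), and the initial centers are chosen by deterministic farthest-point seeding, which is a substitution made here for reproducibility since the source does not state an initialization. All function names are hypothetical.

```python
def row_min_max(row):
    """Scale one enterprise's period values to [0, 1] so only the curve shape remains."""
    lo, hi = min(row), max(row)
    if hi == lo:
        return [0.0 for _ in row]
    return [(v - lo) / (hi - lo) for v in row]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers

def k_means(points, k, iterations=100):
    """Plain Lloyd iteration; returns (centers, assignment)."""
    centers = farthest_point_init(points, k)
    assignment = []
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment
```

Because each row is scaled on its own, an enterprise consuming 100 kWh and one consuming 1,000 kWh end up close together whenever their pre-event, event, and post-event pattern is the same.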
To further determine the optimal number of clusters, the clustering "inflection point plot" shown in FIG. 3 is drawn, and the optimal number is determined by observing the SSE (within-cluster sum of squared errors) of the data set, which is given by:
SSE = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2
where C_i denotes the i-th cluster, p is a sample point in C_i, m_i is the centroid of C_i (the mean of all samples in C_i), and SSE is the clustering error over all samples, which indicates the quality of the clustering.
As the number of clusters k increases, the samples are partitioned more finely and each cluster becomes more compact, so the sum of squared errors SSE naturally decreases. While k is smaller than the true number of clusters, increasing k greatly increases the compactness of each cluster, so SSE drops sharply. Once k reaches the true number of clusters, the gain in compactness from further increasing k falls off quickly, so the drop in SSE shrinks abruptly and then levels off as k continues to grow. The plot of SSE against k therefore has the shape of an elbow, and the k value at the elbow is taken as the true number of clusters in the data.
As can be seen from FIG. 3, SSE decreases quickly from 1 to 3 clusters and slowly thereafter, so the optimal number of clusters is chosen as 3.
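The SSE definition and the elbow reading described above can be sketched as follows. The embodiment reads the elbow off the plot by eye; picking the k with the largest second difference of the SSE curve is an assumption made here to automate that reading, and the function names are hypothetical.

```python
def sse(points, centers, assignment):
    """Sum of squared distances from each point to the centroid of its assigned cluster."""
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centers[a]))
        for p, a in zip(points, assignment)
    )

def elbow_k(sse_by_k):
    """Pick the k where the SSE curve bends most (largest second difference).

    sse_by_k[0] is the SSE for k = 1, sse_by_k[1] for k = 2, and so on.
    Returns None when fewer than three values are given.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(sse_by_k) - 1):
        drop_before = sse_by_k[i - 1] - sse_by_k[i]
        drop_after = sse_by_k[i] - sse_by_k[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k
```

A second-difference rule is sensitive to noise in the curve, so in practice the result is worth checking against the plot, as the embodiment does.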
(6) Clustering effect analysis
The input enterprise data is divided into 3 classes by the K-means algorithm, and the electricity consumption of the typical enterprise (cluster center) of each class in the pre-event, event, and post-event periods is shown in the clustering result of FIG. 4. The enterprises fall into three distinct classes (the clustering result is significant), denoted cluster1, cluster2, and cluster3. The cluster1 center has min-max normalized average monthly electricity consumption of (0.909820, 0.724460, 0.000404) in the pre-event, event, and post-event periods: average monthly consumption fell after the event occurred, and post-event consumption is still lower than during the event. That is, these enterprises were negatively impacted, and their average monthly consumption remained low in the post-event period.
The cluster2 center has min-max normalized average monthly electricity consumption of (0.059747, 0.541984, 0.910289) in the pre-event, event, and post-event periods. In terms of trend, average monthly consumption rose gradually over time and reached its maximum in the post-event period.
The cluster3 center has min-max normalized average monthly electricity consumption of (0.998007, 0.110126, 0.113928) in the pre-event, event, and post-event periods. In terms of trend, monthly consumption fell after the event occurred, but post-event consumption is slightly higher than during the event. That is, these enterprises were negatively impacted by the event, but their post-event monthly consumption indicates that production was gradually recovering.
(7) Obtaining model tags
The enterprises are divided into 3 classes according to the clustering result and given different model label values. The label "type of influence of an event on an enterprise" reflects the direction and degree of the event's influence on enterprise production and takes the values 0, 1, and 2. A value of 0 indicates that the enterprise was negatively impacted by the event and, judging by average monthly electricity consumption, production did not recover quickly after work resumed. A value of 1 indicates that the event had a positive effect on production: average monthly consumption rose gradually as the event unfolded and continued to rise after work and production resumed. A value of 2 indicates that the enterprise was negatively impacted by the event but, judging by average monthly consumption, its electricity consumption rebounded quickly in the post-event period, i.e. production was gradually recovering.
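Once the cluster centers are known, labeling a new enterprise reduces to a nearest-center lookup. The sketch below uses the three cluster centers reported in this embodiment; the mapping of label values 0, 1, 2 to cluster1, cluster2, cluster3 is inferred here by matching the label descriptions to the cluster descriptions and is not stated explicitly in the source. The function name `impact_label` is hypothetical.

```python
# Cluster centers reported in the embodiment (pre-event, event, post-event).
# Assumption: label 0 -> cluster1, label 1 -> cluster2, label 2 -> cluster3.
CLUSTER_CENTERS = {
    0: (0.909820, 0.724460, 0.000404),  # negative impact, not yet recovered
    1: (0.059747, 0.541984, 0.910289),  # positive impact, consumption rising
    2: (0.998007, 0.110126, 0.113928),  # negative impact, recovering
}

def impact_label(pre, during, post):
    """Return the label of the nearest cluster center after row min-max scaling."""
    row = (pre, during, post)
    lo, hi = min(row), max(row)
    shape = tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v in row)
    return min(
        CLUSTER_CENTERS,
        key=lambda label: sum(
            (s - c) ** 2 for s, c in zip(shape, CLUSTER_CENTERS[label])
        ),
    )
```

For example, an enterprise averaging 1000, 150, and 160 kWh per month across the three periods has a shape close to the third center and would receive the value 2.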
(8) Label application
Take as an example an enterprise whose "type of influence of an event on an enterprise" label has the value 2: its electricity consumption fell rapidly when the event occurred but recovered steadily in the post-event period. That is, although the enterprise's production and operation were hit hard by the event, its ability to recover is strong.
A representative enterprise is selected according to the label result; its monthly and daily electricity consumption time series is shown in FIG. 5. The enterprise's consumption dropped sharply but rose gradually from around April as work and production resumed. Although the industry was severely affected by the event, this enterprise shows relatively strong risk resistance and recovery capability after an emergency.
With the model label, this conclusion can be read off directly, which provides a basis for subsequent applications.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An enterprise credit investigation label construction method based on electric power data is characterized by comprising the following steps:
step 1: acquiring enterprise power data, and carrying out data auditing and analysis on the data;
step 2: preprocessing the data acquired in the step 1;
step 3: determining the label content of the credit investigation label according to the business scenario requirements;
step 4: determining the value of each label based on a machine learning algorithm, a classification clustering algorithm and a text mining algorithm;
step 5: verifying the validity of the label;
step 6: performing the labeling operation on the enterprise, and labeling the corresponding business scenario.
2. The method for constructing the enterprise credit investigation label based on the electric power data as claimed in claim 1, wherein the data auditing and analyzing in the step 1 comprises:
total data volume check: carrying out an integrity check on the data file to ensure that the extracted data is consistent overall with the data extraction requirement and with the business data;
comprehensive data analysis: examining the characteristics of each variable, including data type, missing values, value range and distribution;
data dictionary verification: determining the characteristics of the data according to the data dictionary, and verifying whether the actual data is up to date against the data dictionary;
duplicate check: checking key values for duplicates and recording the results;
association key check: checking the association keys according to the data dictionary;
logical check: checking the logical consistency relations that exist in the data.
3. The method for constructing the enterprise credit investigation label based on the electric power data according to claim 1, wherein the step 2 specifically comprises:
performing data cleaning on the data acquired in step 1, wherein the data cleaning comprises:
missing value processing: repairing intermittently missing power data by a moving average method, and repairing continuously missing power consumption data by a KNN algorithm based on cosine similarity;
abnormal and extreme value processing: correcting data containing extreme values by truncation, deletion and similar methods;
outlier filtering: filtering outliers using quartile detection and 3σ standard deviation detection.
4. The method for constructing the enterprise credit investigation label based on the electric power data as claimed in claim 1, wherein the step 2 further comprises:
after data cleaning, cleaning abnormal users, specifically:
calculating the proportion of missing values for each user and, if the proportion exceeds 50%, filtering out all data of that user.
5. The method for constructing the enterprise credit investigation label based on the electric power data as claimed in claim 2, wherein the preprocessing of the data in the step 2 further comprises:
carrying out normalization processing, data type conversion and data dimension reduction processing on the data.
6. The method for constructing the enterprise credit investigation label based on the electric power data as claimed in claim 1, wherein the credit investigation label comprises:
a fact label, used to reflect the basic attribute characteristics of the user;
a rule label, used to determine the electricity consumption characteristics of the user according to the time window and the user type, in combination with electricity consumption and electricity consumption change data;
and a model label, whose value is obtained by building a prediction model with a machine learning algorithm, a classification clustering algorithm and a text mining algorithm according to the requirements of a specific business scenario.
7. The method as claimed in claim 6, wherein the fact label includes a business name, a power consumption address, geographic coordinates, an industry code and a region.
8. The method as claimed in claim 6, wherein the rule label includes electricity consumption period preference, month-on-month electricity consumption ratio, electricity consumption abnormity characteristic, electricity consumption grade, electricity consumption trend registration, industry characteristic, recent electricity consumption, electricity consumption fluctuation and customer group characteristic.
9. The method as claimed in claim 6, wherein the model label comprises: payment overdue prediction, operation condition prediction, electricity stealing prediction, default probability and near-term electricity prediction.
10. The method for constructing the enterprise credit investigation label based on the electric power data according to claim 1, wherein the step 5 specifically comprises:
verifying the significance of the label and the precision of the model based on statistical indexes and machine learning indexes respectively, to ensure the usability of the different types of labels.
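The cleaning operations named in claims 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: missing readings are represented as `None`, the function names, the window size and the IQR multiplier `k=1.5` are hypothetical, and the cosine-similarity KNN repair for continuously missing data is not shown.

```python
from statistics import mean, pstdev, quantiles


def fill_isolated_gaps(series, window=1):
    """Fill each missing point with the mean of the known values that lie
    within `window` positions on either side (a centered moving average).
    Points with no known neighbor in the window stay None."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is None:
            nearby = series[max(0, i - window): i + window + 1]
            known = [v for v in nearby if v is not None]
            if known:
                filled[i] = mean(known)
    return filled


def missing_ratio(series):
    """Share of missing (None) readings in one user's series."""
    return sum(v is None for v in series) / len(series)


def drop_sparse_users(data, max_ratio=0.5):
    """Keep only users whose missing ratio does not exceed max_ratio (50%)."""
    return {user: s for user, s in data.items() if missing_ratio(s) <= max_ratio}


def iqr_outliers(values, k=1.5):
    """Quartile detection: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def three_sigma_outliers(values):
    """3-sigma detection: values more than 3 standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]


if __name__ == "__main__":
    print(fill_isolated_gaps([10, None, 14, 12]))
    print(drop_sparse_users({"a": [1, None, None, None], "b": [1, 2, None, None]}))
    print(iqr_outliers([10, 11, 12, 11, 10, 12, 11, 100]))
    print(three_sigma_outliers([10] * 20 + [100]))
```

Note that the 3σ rule can miss an extreme value in a very short series, because the extreme value itself inflates the standard deviation; this is one reason to apply it together with quartile detection, as the claim does.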
CN202111488010.0A 2021-12-08 2021-12-08 Enterprise credit investigation label construction method based on electric power data Pending CN114493619A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111488010.0A CN114493619A (en) 2021-12-08 2021-12-08 Enterprise credit investigation label construction method based on electric power data

Publications (1)

Publication Number Publication Date
CN114493619A true CN114493619A (en) 2022-05-13

Family

ID=81492519

Country Status (1)

Country Link
CN (1) CN114493619A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010182287A (en) * 2008-07-17 2010-08-19 Steven C Kays Intelligent adaptive design
CN113487448A (en) * 2021-05-31 2021-10-08 国网上海市电力公司 Power credit labeling method and system based on power big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
潘旭: "智能配电网多维数据质量评价方法", 中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑, 15 September 2019 (2019-09-15), pages 1 - 3 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897097A (en) * 2022-06-06 2022-08-12 国网北京市电力公司 Power consumer portrait method, device, equipment and medium
CN117217568A (en) * 2023-07-24 2023-12-12 广东省投资和信用中心(广东省发展和改革事务中心) Economic monitoring method and system based on market subject information resource library
CN117787572A (en) * 2024-02-27 2024-03-29 国网山西省电力公司临汾供电公司 Abnormal electricity utilization user identification method and device, storage medium and electronic equipment
CN117787572B (en) * 2024-02-27 2024-05-17 国网山西省电力公司临汾供电公司 Abnormal electricity utilization user identification method and device, storage medium and electronic equipment
CN118228947A (en) * 2024-05-27 2024-06-21 山东政信大数据科技有限责任公司 Intelligent analysis method and system for enterprise digital transformation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination