CN113360652B - Enterprise-level power user intelligent classification method and device - Google Patents
- Publication number
- CN113360652B (application number CN202110629382.4)
- Authority
- CN
- China
- Prior art keywords
- industry
- enterprise
- classification
- level power
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an intelligent classification method and device for enterprise-level power users. The method comprises: first, correcting an industry classification standard to match the actual industrial development of the region, forming a primary classification standard; second, using a set of enterprise-level power users of known industries to train, through natural language processing, a first comparison word stock based on a name-industry mapping relation and a second comparison word stock based on a business scope-industry mapping relation, and then performing primary classification of the enterprise-level power users to be classified in the target area; and finally, determining the optimal cluster number from the daily load sequences of the users in each primarily classified industry to obtain the final classification. The invention classifies enterprise-level power users by industry attribute and load characteristic in a way that fits the actual industry layout of the target area and balances the number of users across classes, thereby giving power grid companies or electricity retailers technical support for providing similar operation and maintenance management and value-added services to similar users.
Description
Technical Field
The invention relates to the technical field of power consumer service management, in particular to an enterprise-level power consumer intelligent classification method and device.
Background
Since the second industrial revolution ushered in the electric era, electric energy has become an indispensable form of energy. It plays a major role in daily life and is used in every aspect of social production and everyday living. The wide use of electric energy means that the number of power users keeps growing, and managing this huge user base efficiently and in an orderly way is important for reducing the workload of power supply bureaus and for responding quickly to user demands.
At present, power supply bureaus mainly manage power users by class according to electricity-related information such as declared capacity and electric energy sensitivity. On the one hand, the existing classification may place users from different industries in one class even though their service requirements have little in common, which defeats the purpose of intelligent classification; on the other hand, power big data has potential value for exploring industry dynamics, which the existing classification cannot mine in depth. Reclassifying power users by industry attribute can effectively address both problems.
However, as social and economic structures grow more complex, regional industrial transformation keeps advancing and emerging industries keep appearing, while the existing industry classification is uniform and coarse and adapts poorly to the actual industry distribution of each region. To make the numbers of users in the resulting industry categories comparable, the existing industry classification standard needs to be corrected to fit the actual situation of each region.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent classification method and device for enterprise-level power users, so as to reduce the differences in user numbers across classes and achieve unified management of similar users.
In order to solve the above technical problems, an embodiment of the present invention provides an enterprise-level power consumer intelligent classification method, including:
step S1, correcting an industry classification standard to match the actual situation of a target area, forming a primary classification standard;
step S2, generating a set of enterprise-level power users of known industries according to the primary classification standard, and using this set to train, through natural language processing, a first comparison word stock based on a name-industry mapping relation and a second comparison word stock based on a business scope-industry mapping relation;
step S3, obtaining the names of the enterprise-level power users to be classified in the target area;
step S4, judging whether an enterprise-level power user to be classified can be primarily classified by industry according to its name; if so, primarily classifying the user according to the first comparison word stock; if not, primarily classifying the user by business scope according to the second comparison word stock;
step S5, determining the optimal cluster number according to the daily load sequences of the users in each primarily classified industry, and thereby obtaining the final classification.
Further, step S1 specifically includes: on the basis of the national economy industry classification standard, and taking into account the actual number of enterprises in each industry in the target area, supplementing emerging industries not covered by the standard, splitting industry entries with many enterprises, deleting industry entries with no or very few related enterprises, and merging industry entries with few enterprises, to obtain m industry classification entries $\mathrm{trade}_1, \mathrm{trade}_2, \dots, \mathrm{trade}_m$.
Further, in step S2, the first comparison word stock List 1 based on the name-industry mapping relation is specifically built as follows: after the names of the enterprise-level power users of known industries are segmented into words, the words with obvious industry attributes, together with their industries, form the first comparison word stock List 1, namely:
where $word_{ij}$ must appear many times in the names of enterprise-level power users of the i-th industry and rarely in the names of enterprise-level power users outside the i-th industry, namely:
In the above, $n_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of the enterprise-level power user names of the i-th industry; $N_i$ is the total number of words in the word segmentation results of the enterprise-level power user names of the i-th industry; $\alpha$ is a sensitivity coefficient, with $\alpha \ge 1$; $n'_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of enterprise-level power user names outside the i-th industry; and $\sigma$ is an accuracy coefficient, with $0 \le \sigma \le 1$.
Further, in step S2, the second comparison word stock List 2 based on the business scope-industry mapping relation is specifically built as follows: after the business scopes of the enterprise-level power users of known industries are segmented into words, the business scope keywords of each industry are extracted with an improved TF-IDF algorithm and sorted by their $Q_{ij}$ values; after meaningless keywords are filtered out, the top-ranked keywords and their industries form the second comparison word stock List 2, namely:
In the above, $W_i$ is the total number of words in the business scope word segmentation results of the i-th industry; $w_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of the enterprise-level power users of the i-th industry; and $w'_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of enterprise-level power users outside the i-th industry.
Further, step S4 specifically includes: primarily classifying the user according to the first comparison word stock List 1, which comprises segmenting the name of the enterprise-level power user to be classified into words; if the word segmentation result contains a word $word_{ij}$ from List 1, the user is classified into the i-th industry; if it contains no word from List 1, the user cannot be classified by name. In that case the user is primarily classified according to the second comparison word stock List 2, which comprises: crawling the business scope of the user from the internet and segmenting it into words; matching the word segmentation result against List 2; counting, for each industry, the number and the sequence numbers (ranks) of that industry's List 2 words contained in the word segmentation result; and comparing the match counts of the industries. The industry with the largest match count is determined to be the industry of the user. When several industries have the same match count, the sums of the products of sequence number and count are compared, and the industry with the smallest product sum is determined to be the industry of the user.
Further, step S5 specifically includes: after the industry classification of the enterprise-level power users is completed, clustering the daily load sequences of the users in each industry with the K-Means clustering method, and increasing the cluster number step by step until, in every cluster under the current cluster number, at least a certain proportion of users each have a daily load sequence whose Pearson correlation coefficient with the daily load sequence of at least one other user in the same cluster exceeds a given value; that cluster number is determined as the optimal cluster number, and the clustering result is taken as the final classification result.
Further, step S5 specifically includes: after the industry classification of the enterprise-level power users is completed, the K-Means clustering method is used, with the cluster number initially set to t = 1, to cluster the daily load sequences $X_i$ of the users in each industry; the cluster number t is then increased step by step until, in every cluster $X_{i,k}$ (k = 1, 2, …, t) under the current cluster number, no less than a proportion $\mu$ of the user daily load sequences $X_{i,k,p}$ each have a Pearson correlation coefficient greater than a given value M with the daily load sequence $X_{i,k,q}$ of at least one other user in the same cluster, i.e.

$$\rho\left(X_{i,k,p}, X_{i,k,q}\right) = \frac{\operatorname{cov}\left(X_{i,k,p}, X_{i,k,q}\right)}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} = \frac{E\left[\left(X_{i,k,p} - E(X_{i,k,p})\right)\left(X_{i,k,q} - E(X_{i,k,q})\right)\right]}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} > M$$

In the above formula, $\rho$ denotes the Pearson correlation coefficient; cov(·,·) denotes covariance; $\sigma$ denotes standard deviation; E(·) denotes the sample expectation;
the cluster number t is then determined as the optimal cluster number, and the clustering result is taken as the final classification result.
Further, after the industry of a user to be classified has been determined, the user is incorporated into the set of enterprise-level power users of known industries from step S2 and used to correct the first comparison word stock List 1 and the second comparison word stock List 2.
Further, the set of enterprise-level power users of known industries contains the name and business scope of each user, where the name is obtained from the user files of the power supply bureau and the business scope is obtained in batches from the National Enterprise Credit Information Publicity System website by means of a web crawler.
The invention also provides an intelligent classification device for enterprise-level power users, comprising:
a correction module, used for correcting the industry classification standard to match the actual situation of the target area, forming a primary classification standard;
a training module, used for generating a set of enterprise-level power users of known industries according to the primary classification standard, and for using this set to train, through natural language processing, a first comparison word stock based on a name-industry mapping relation and a second comparison word stock based on a business scope-industry mapping relation;
an acquisition module, used for obtaining the names of the enterprise-level power users to be classified in the target area;
a primary classification module, used for judging whether an enterprise-level power user to be classified can be primarily classified by industry according to its name; if so, the user is primarily classified according to the first comparison word stock; if not, the user is primarily classified by business scope according to the second comparison word stock;
a final classification module, used for determining the optimal cluster number according to the daily load sequences of the users in each primarily classified industry and thereby obtaining the final classification.
The embodiments of the invention have the following beneficial effects: by acquiring the names, business scopes and daily load sequences of the enterprise-level power users in the target area, the industry classification standard is corrected to fit the actual industry layout of the target area. This enables intelligent classification of enterprise-level power users based on industry attribute and load characteristic, reduces the differences in user numbers across classes, improves the working efficiency of business personnel, achieves unified management of similar users, and facilitates the effective promotion of value-added services. The invention can also identify the industry attribute of a user automatically and classify users efficiently and accurately according to the re-formulated industry classification standard, avoiding a large investment of manpower.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an intelligent classification method for enterprise-level power users according to an embodiment of the invention.
Fig. 2 is a schematic flow chart of an intelligent classification method for enterprise-level power users according to an embodiment of the invention.
Detailed Description
The following description of embodiments refers to the accompanying drawings, which illustrate specific embodiments in which the invention may be practiced.
Referring to fig. 1, a first embodiment of the present invention provides an intelligent classification method for enterprise-level power users, including:
step S1, correcting an industry classification standard to match the actual situation of a target area, forming a primary classification standard;
step S2, generating a set of enterprise-level power users of known industries according to the primary classification standard, and using this set to train, through natural language processing, a first comparison word stock based on a name-industry mapping relation and a second comparison word stock based on a business scope-industry mapping relation;
step S3, obtaining the names of the enterprise-level power users to be classified in the target area;
step S4, judging whether an enterprise-level power user to be classified can be primarily classified by industry according to its name; if so, primarily classifying the user according to the first comparison word stock; if not, primarily classifying the user by business scope according to the second comparison word stock;
step S5, determining the optimal cluster number according to the daily load sequences of the users in each primarily classified industry, and thereby obtaining the final classification.
Specifically, referring to fig. 2, in step S1 the statistical yearbook of the target area can be consulted on the website of the region's statistics bureau. The industry classification standard is corrected to match the actual situation of the region as follows: on the basis of the national economy industry classification standard, and taking into account the actual number of enterprises in each industry in the target area, the industry classification entries are supplemented, split, deleted and merged. That is, emerging industries not covered by the standard are supplemented, industry entries with many enterprises are split, industry entries with no or very few related enterprises are deleted, and industry entries with few enterprises are merged, so that the industries have comparable numbers of related enterprises. This yields m industry classification entries $\mathrm{trade}_1, \mathrm{trade}_2, \dots, \mathrm{trade}_m$.
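The entry-adjustment rule of step S1 can be sketched as a simple triage on enterprise counts. The text gives no numeric thresholds, so `split_above`, `merge_below` and `delete_below` are hypothetical parameters, and the counts in the usage lines are made-up illustration values. Supplementing emerging industries is a manual step and is not modeled.

```python
from typing import Dict


def suggest_entry_actions(
    counts: Dict[str, int],
    split_above: int,
    merge_below: int,
    delete_below: int,
) -> Dict[str, str]:
    """Suggest 'split', 'merge', 'delete' or 'keep' for each industry entry.

    All three thresholds are assumptions; the source only says "many",
    "few" and "no or very few" enterprises.
    """
    actions: Dict[str, str] = {}
    for entry, n in counts.items():
        if n < delete_below:
            actions[entry] = "delete"   # no or very few related enterprises
        elif n < merge_below:
            actions[entry] = "merge"    # few enterprises: merge with a related entry
        elif n > split_above:
            actions[entry] = "split"    # many enterprises: split into finer entries
        else:
            actions[entry] = "keep"
    return actions


# Made-up enterprise counts per industry entry, for illustration only.
counts = {"manufacturing": 5200, "mining": 0, "fishing": 40, "retail": 900}
actions = suggest_entry_actions(counts, split_above=2000, merge_below=100, delete_below=5)
```

Which entries a "merge" candidate is merged into, and how a "split" entry is subdivided, still require domain judgment; the sketch only flags candidates.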
In step S2, a set of enterprise-level power users of known industries is generated according to the above standard, and this set is used to train, through natural language processing, a first comparison word stock List 1 based on the name-industry mapping relation and a second comparison word stock List 2 based on the business scope-industry mapping relation. The set of enterprise-level power users of known industries contains the name and business scope of each user, where the name is obtained from the user files of the power supply bureau and the business scope can be obtained in batches from the National Enterprise Credit Information Publicity System website by means of a web crawler.
The first comparison word stock List 1 based on the name-industry mapping relation is specifically built as follows: after the names of the enterprise-level power users of known industries are segmented into words, the words with obvious industry attributes, together with their industries, form the comparison word stock List 1, namely:
where $word_{ij}$ must appear many times in the names of enterprise-level power users of the i-th industry and rarely in the names of enterprise-level power users outside the i-th industry, namely:
In the above, $n_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of the enterprise-level power user names of the i-th industry; $N_i$ is the total number of words in the word segmentation results of the enterprise-level power user names of the i-th industry; $\alpha$ is a sensitivity coefficient, with $\alpha \ge 1$; $n'_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of enterprise-level power user names outside the i-th industry; and $\sigma$ is an accuracy coefficient, with $0 \le \sigma \le 1$.
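The three counting quantities defined above can be computed directly from segmented names. The sketch below assumes the names have already been segmented into word lists by some external tool, and it stops at the counts: the selection condition involving the sensitivity and accuracy coefficients is not reproduced in this text, so it is left to the caller. The function name and the sample names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def name_word_counts(
    segmented: Dict[str, List[List[str]]],
) -> Tuple[Dict[str, Counter], Dict[str, int], Dict[str, Dict[str, int]]]:
    """Compute n_ij, N_i and n'_ij from segmented enterprise names.

    segmented maps an industry to its list of names, each name being a
    list of words. Returns (n, N, n_prime) where
      n[i][w]       = occurrences of word w in names of industry i   (n_ij)
      N[i]          = total word count of names of industry i        (N_i)
      n_prime[i][w] = occurrences of w in names outside industry i   (n'_ij)
    """
    n = {i: Counter(w for name in names for w in name) for i, names in segmented.items()}
    N = {i: sum(c.values()) for i, c in n.items()}
    total: Counter = Counter()
    for c in n.values():
        total.update(c)
    n_prime = {i: {w: total[w] - c[w] for w in c} for i, c in n.items()}
    return n, N, n_prime


# Made-up segmented names, for illustration only.
segmented = {
    "catering": [["Shenzhen", "Happy", "Restaurant"], ["Good", "Taste", "Restaurant"]],
    "logistics": [["Shenzhen", "Fast", "Logistics"]],
}
n, N, n_prime = name_word_counts(segmented)
```

Here "Restaurant" occurs twice in catering names and never elsewhere, while "Shenzhen" occurs in both industries, which is the kind of contrast the two coefficients are meant to separate.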
The second comparison word stock List 2 based on the business scope-industry mapping relation is specifically built as follows: after the business scopes of the enterprise-level power users of known industries are segmented into words, the business scope keywords of each industry are extracted with an improved TF-IDF algorithm and sorted by their $Q_{ij}$ values; after meaningless keywords are filtered out, the top-ranked keywords and their industries form the comparison word stock List 2, namely:
In the above, $W_i$ is the total number of words in the business scope word segmentation results of the i-th industry; $w_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of the enterprise-level power users of the i-th industry; and $w'_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of enterprise-level power users outside the i-th industry.
In step S4, the user is primarily classified according to the first comparison word stock List 1 as follows: the name of the enterprise-level power user to be classified is segmented into words; if the word segmentation result contains a word $word_{ij}$ from List 1, the user is classified into the i-th industry; if it contains no word from List 1, the user cannot be classified by name. The user is then primarily classified according to the second comparison word stock List 2 as follows: the business scope of the user is crawled from the internet and segmented into words; the word segmentation result is matched against List 2; for each industry, the number and the sequence numbers (ranks) of that industry's List 2 words contained in the word segmentation result are counted; and the match counts of the industries are compared. The industry with the largest match count is determined to be the industry of the user. When several industries have the same match count, the sums of the products of sequence number and count are compared, and the industry with the smallest product sum is determined to be the industry of the user.
Finally, after the industry of a user to be classified has been determined, the user is incorporated into the set of enterprise-level power users of known industries from step S2 and used to correct List 1 and List 2.
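The two-stage matching in step S4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: word segmentation and web crawling are assumed to happen elsewhere (the business scope is supplied by a hypothetical callable `get_scope_words`), List 1 is modeled as a word-to-industry dictionary, List 2 as an industry-to-ranked-keyword-list dictionary, the "match count" is read as the total number of keyword occurrences, and the "product sum" as the sum of rank times occurrences. When a name contains words from several industries, the first matching word wins, which the text does not specify.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence


def classify_by_name(name_words: Sequence[str], list1: Dict[str, str]) -> Optional[str]:
    """Return the industry of the first name word found in List 1, else None."""
    for w in name_words:
        if w in list1:
            return list1[w]
    return None


def classify_by_scope(scope_words: Sequence[str], list2: Dict[str, List[str]]) -> Optional[str]:
    """Pick the industry with the most keyword matches; break ties by the
    smallest sum of (keyword rank * occurrences)."""
    occ = Counter(scope_words)
    best_key, best_industry = None, None
    for industry, keywords in list2.items():
        matches = sum(occ[kw] for kw in keywords)
        if matches == 0:
            continue
        product_sum = sum(rank * occ[kw] for rank, kw in enumerate(keywords, start=1))
        key = (-matches, product_sum)  # more matches first, then smaller product sum
        if best_key is None or key < best_key:
            best_key, best_industry = key, industry
    return best_industry


def classify(
    name_words: Sequence[str],
    get_scope_words: Callable[[], Sequence[str]],
    list1: Dict[str, str],
    list2: Dict[str, List[str]],
) -> Optional[str]:
    """Classify by name if possible; otherwise fall back to business scope."""
    industry = classify_by_name(name_words, list1)
    if industry is not None:
        return industry
    return classify_by_scope(get_scope_words(), list2)


# Made-up word stocks, for illustration only.
list1 = {"Restaurant": "catering", "Logistics": "logistics"}
list2 = {
    "catering": ["food", "beverage", "dining"],
    "logistics": ["freight", "warehousing", "transport"],
}
by_name = classify(["Good", "Taste", "Restaurant"], lambda: [], list1, list2)
by_scope = classify(["Alpha", "Company"], lambda: ["freight", "transport", "food"], list1, list2)
tie = classify(["Alpha", "Company"], lambda: ["food", "transport"], list1, list2)
```

Passing the scope lookup as a callable means the business scope is fetched only for users that cannot be classified by name, mirroring the order of operations in the text.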
In step S5, the optimal cluster number is determined according to the daily load sequences of the users in each primarily classified industry, specifically: after the industry classification of the enterprise-level power users is completed, the K-Means clustering method is first used, with the cluster number set to t = 1, to cluster the daily load sequences $X_i$ of the users in each industry; the cluster number t is then increased step by step until, in every cluster $X_{i,k}$ (k = 1, 2, …, t) under the current cluster number, no less than a proportion $\mu$ of the user daily load sequences $X_{i,k,p}$ each have a Pearson correlation coefficient greater than a given value M with the daily load sequence $X_{i,k,q}$ of at least one other user in the same cluster, i.e.

$$\rho\left(X_{i,k,p}, X_{i,k,q}\right) = \frac{\operatorname{cov}\left(X_{i,k,p}, X_{i,k,q}\right)}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} = \frac{E\left[\left(X_{i,k,p} - E(X_{i,k,p})\right)\left(X_{i,k,q} - E(X_{i,k,q})\right)\right]}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} > M$$

In the above formula, $\rho$ denotes the Pearson correlation coefficient; cov(·,·) denotes covariance; $\sigma$ denotes standard deviation; E(·) denotes the sample expectation.
The cluster number t is then determined as the optimal cluster number, and the clustering result is taken as the final classification result.
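The search for the optimal cluster number can be sketched in plain Python as below. Several points are assumptions made for the sake of a runnable example: the K-Means routine is a minimal version with deterministic initialization (the first t sequences serve as initial centers) rather than any particular library implementation, a cluster with a single user is treated as passing the criterion, and the load values in the usage lines are invented.

```python
import math
from typing import List, Sequence, Tuple

Seq = Sequence[float]


def pearson(x: Seq, y: Seq) -> float:
    """Pearson correlation cov(x, y) / (std(x) * std(y)); 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def kmeans(seqs: List[Seq], t: int, max_iter: int = 100) -> List[int]:
    """Minimal K-Means; initial centers are the first t sequences (assumption)."""
    centers = [list(s) for s in seqs[:t]]
    labels = [0] * len(seqs)
    for it in range(max_iter):
        new_labels = [
            min(range(t), key=lambda k: sum((a - c) ** 2 for a, c in zip(s, centers[k])))
            for s in seqs
        ]
        if it > 0 and new_labels == labels:
            break
        labels = new_labels
        for k in range(t):
            members = [s for s, lab in zip(seqs, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_ok(members: List[Seq], mu: float, M: float) -> bool:
    """True if at least a fraction mu of members correlate above M with another member."""
    if len(members) < 2:
        return True  # assumption: a single-user cluster passes trivially
    good = sum(
        1
        for p, x in enumerate(members)
        if any(pearson(x, y) > M for q, y in enumerate(members) if q != p)
    )
    return good / len(members) >= mu


def optimal_clusters(seqs: List[Seq], mu: float, M: float) -> Tuple[int, List[int]]:
    """Increase t from 1 until every non-empty cluster satisfies the criterion."""
    for t in range(1, len(seqs) + 1):
        labels = kmeans(seqs, t)
        clusters = [[s for s, lab in zip(seqs, labels) if lab == k] for k in range(t)]
        if all(cluster_ok(c, mu, M) for c in clusters if c):
            return t, labels
    return len(seqs), list(range(len(seqs)))


# Invented daily load sequences: three similar daytime-peak users and one outlier.
seqs = [
    [1, 2, 8, 9, 2, 1],
    [5, 1, 5, 1, 5, 1],
    [1, 3, 9, 10, 3, 1],
    [2, 2, 7, 9, 2, 2],
]
t, labels = optimal_clusters(seqs, mu=0.8, M=0.9)
```

With one cluster only three of the four users have a well-correlated partner, which is below the proportion 0.8, so the search moves on to two clusters, where the outlier is isolated and the remaining cluster passes.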
The intelligent classification method for enterprise-level power users based on industry attribute and load characteristic is described below, taking actual enterprise-level power users in Shenzhen as the target objects.
1) Industry classification criteria are modified to match the regional reality.
Taking Shenzhen as an example, the statistical yearbook published on the website of the Shenzhen statistics bureau shows that Shenzhen enterprises are mainly in high-tech manufacturing. Therefore, on the basis of the national economy industry classification standard, and taking into account the actual number of enterprises in each industry in Shenzhen, the industry classification entries are supplemented, split, deleted and merged. That is, emerging industries not covered by the standard are supplemented, industry entries with many enterprises are split, industry entries with no or very few related enterprises are deleted, and industry entries with few enterprises are merged, so that the industries have comparable numbers of related enterprises. The re-formulated Shenzhen industry classification entries are shown in the following table:
Table 1 Industry classification entries (adapted to the Shenzhen region)
This yields 36 industry classification entries $\mathrm{trade}_1, \mathrm{trade}_2, \dots, \mathrm{trade}_{36}$.
2) A set of enterprise-level power users of known industries is generated according to the above standard, and this set is used to train, through natural language processing, a comparison word stock List 1 based on the name-industry mapping relation and a comparison word stock List 2 based on the business scope-industry mapping relation.
According to the re-formulated industry classification standard, 500 enterprise-level power users are manually classified by name and business scope, forming the set of enterprise-level power users of known industries.
Using this set, after the user names are segmented into words, the words with obvious industry attributes, together with their industries, form the comparison word stock List 1, namely:
where $word_{ij}$ must appear many times in the names of enterprise-level power users of the i-th industry and rarely in the names of enterprise-level power users outside the i-th industry, namely:
In the above, $n_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of the enterprise-level power user names of the i-th industry; $N_i$ is the total number of words in the word segmentation results of the enterprise-level power user names of the i-th industry; $\alpha$ is a sensitivity coefficient, with $\alpha \ge 1$, set to 2 here; $n'_{ij}$ is the number of times $word_{ij}$ occurs in the word segmentation results of enterprise-level power user names outside the i-th industry; and $\sigma$ is an accuracy coefficient, with $0 \le \sigma \le 1$, set to 0.9 here.
In addition, after the business scopes of the users are segmented into words, the business scope keywords of each industry are extracted with an improved TF-IDF algorithm and sorted by their $Q_{ij}$ values; after meaningless keywords are filtered out, the top-ranked keywords and their industries form the comparison word stock List 2, namely:
In the above, $W_i$ is the total number of words in the business scope word segmentation results of the i-th industry; $w_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of the enterprise-level power users of the i-th industry; and $w'_{ij}$ is the number of times keyword j occurs in the business scope word segmentation results of enterprise-level power users outside the i-th industry.
List 1 and List 2 were thus obtained as:
3) Acquire the names of the enterprise-level power users to be classified in the target area.
4) Judge whether each enterprise-level power user to be classified can be assigned to an industry by name. If so, classify the user according to the comparison word stock List 1; if not, use a web crawler to obtain the user's business scope from the National Enterprise Credit Information Publicity System website, and then classify the user according to the comparison word stock List 2.
For example, 50 enterprise-level power users are given a primary classification in the manner described above, and part of the results are shown in Table 2:
Table 2 Partial results of industry classification
In this embodiment, 43 of the 50 users are assigned to the industry they actually belong to, an accuracy of 86%. The accuracy depends strongly on the size of the set of enterprise-level power users of known industry used to train the model. This embodiment uses only 500 such users, so reaching 86% under this condition indicates that the enterprise-level power user intelligent classification method based on industry attributes and load characteristics is practical. In practical applications, the set of known-industry users used for training can be expanded.
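The two-stage lookup of step 4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: names and business scopes are assumed to be already segmented into tokens, List 1 is assumed to be a dictionary from word to industry, List 2 a dictionary from industry to its ranked keyword list, and `fetch_scope_tokens` is a hypothetical callable standing in for the web crawler and word segmentation. The tie-break follows the rule stated in claim 1 (largest matching number, then smallest sum of products of sequence number and number); counting every occurrence of a keyword as a match and taking the first matching name token are assumptions of this sketch.

```python
from collections import Counter


def classify_by_name(name_tokens, list1):
    """Return the industry of the first name token found in List 1, else None.

    list1 is assumed to map a word to its industry.
    """
    for token in name_tokens:
        if token in list1:
            return list1[token]
    return None


def classify_by_scope(scope_tokens, list2):
    """Pick the industry whose List 2 keywords best match the business scope.

    list2 is assumed to map an industry to its keywords in ranked order
    (sequence number 1 is the top-ranked keyword).
    """
    best_key, best_industry = None, None
    for industry, keywords in list2.items():
        rank = {kw: pos for pos, kw in enumerate(keywords, start=1)}
        hits = Counter(tok for tok in scope_tokens if tok in rank)
        matches = sum(hits.values())
        if matches == 0:
            continue
        # Sum of (sequence number x number of occurrences), used to break ties.
        product_sum = sum(rank[kw] * n for kw, n in hits.items())
        key = (-matches, product_sum)  # more matches first, then smaller sum
        if best_key is None or key < best_key:
            best_key, best_industry = key, industry
    return best_industry


def classify_user(name_tokens, list1, list2, fetch_scope_tokens):
    """Classify by name if possible, otherwise fall back to business scope."""
    industry = classify_by_name(name_tokens, list1)
    if industry is not None:
        return industry
    # Hypothetical hook: the crawler is only invoked when the name is not enough.
    return classify_by_scope(fetch_scope_tokens(), list2)
```

Passing the crawler as a callable keeps the network access out of the matching logic, so the business scope is only fetched for users whose names cannot be classified.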
5) Determine the optimal cluster number from the daily load sequences of the users in each primary classification industry, and thereby obtain the final classification.
Take as an example the 15 enterprise-level power users assigned to the display device manufacturing industry by the primary classification. First, the K-Means clustering method is applied with cluster number t = 1 to the daily load sequences $X_i$ of the users in each industry. The cluster number t is then increased step by step until, in every cluster $X_{i,k}$ (k = 1, 2, …, t) under the current cluster number, no less than a proportion μ (here set to 90%) of the user daily load sequences $X_{i,k,p}$ have a Pearson correlation coefficient $\rho_{X_{i,k,p},X_{i,k,q}}$ with the daily load sequence $X_{i,k,q}$ of at least 1 other user in the same cluster that is no less than a given value M (here set to 0.5), i.e.

$$\rho_{X_{i,k,p},X_{i,k,q}} = \frac{\operatorname{cov}(X_{i,k,p}, X_{i,k,q})}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} = \frac{E\big[(X_{i,k,p} - E(X_{i,k,p}))\,(X_{i,k,q} - E(X_{i,k,q}))\big]}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} \ge M$$

In the above formula, cov(·,·) denotes covariance; σ denotes standard deviation; E(·) denotes the sample expectation.
The cluster number t at which this condition is first met is determined as the optimal cluster number, and the corresponding clustering result is taken as the final classification result.
In this embodiment, with the cluster number set to 1, the Pearson correlation coefficients between the daily load sequences of every pair of users are shown in Table 3. Only the user with serial number 9 has Pearson correlation coefficients below 0.5 with the daily load sequences of all other users, so 14 of the 15 users (about 93%) meet the requirement, which is above μ = 90%, and the condition is satisfied. Therefore 1 is determined as the optimal cluster number and the clustering result is taken as the final classification result; that is, the enterprise-level power users in the display device manufacturing industry all belong to the same class of users.
Table 3 Pearson correlation coefficients between the daily load sequences of each pair of users in the display device manufacturing industry
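The stopping rule of step 5) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: `cluster_fn` is a hypothetical callable standing in for any K-Means routine that returns a list of clusters (each a list of daily load sequences) for a given cluster number t, the threshold test uses "greater than or equal to M" as in claim 3, a constant sequence is given a correlation of 0, and a cluster with fewer than two users is treated as trivially satisfying the condition. The last two choices are assumptions of this sketch that the text does not specify.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # assumption: a constant sequence correlates with nothing
    return cov / (dev_x * dev_y)


def cluster_is_coherent(cluster, mu=0.9, m=0.5):
    """True if at least a share mu of users has a partner with correlation >= m."""
    if len(cluster) < 2:
        return True  # assumption: a single-user cluster passes trivially
    with_partner = 0
    for p, seq_p in enumerate(cluster):
        if any(pearson(seq_p, seq_q) >= m
               for q, seq_q in enumerate(cluster) if q != p):
            with_partner += 1
    return with_partner / len(cluster) >= mu


def choose_cluster_count(sequences, cluster_fn, mu=0.9, m=0.5):
    """Increase t from 1 until every cluster passes; return (t, clusters) or None."""
    for t in range(1, len(sequences) + 1):
        clusters = cluster_fn(sequences, t)
        if all(cluster_is_coherent(c, mu, m) for c in clusters):
            return t, clusters
    return None
```

With the values used in this embodiment (μ = 0.9, M = 0.5), a single cluster of 15 users in which 14 have at least one sufficiently correlated partner passes at t = 1, since 14/15 is above 0.9.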
The embodiment verifies that the enterprise-level power user intelligent classification method based on industry attributes and load characteristics is feasible and effective, and that it can bring considerable economic benefit to power grid companies and society once popularized.
Corresponding to the enterprise-level power user intelligent classification method of the first embodiment of the invention, a second embodiment of the invention provides an enterprise-level power user intelligent classification device, comprising:
a correction module, configured to correct the industry classification standard to match the actual situation of the target area, so as to form a primary classification standard;
a training module, configured to generate a set of enterprise-level power users of known industry according to the primary classification standard, and to use that set to train, through natural language processing, a first comparison word stock based on the name-industry mapping relation and a second comparison word stock based on the business scope-industry mapping relation;
an acquisition module, configured to acquire the names of the enterprise-level power users to be classified in the target area;
a primary classification module, configured to judge whether primary industry classification can be performed on the enterprise-level power users to be classified by name, and if so, to perform primary industry classification on the users according to the first comparison word stock, and if not, to perform primary industry classification on the users by business scope according to the second comparison word stock;
and a final classification module, configured to determine the optimal cluster number from the daily load sequences of the users in each primary classification industry, and thereby obtain the final classification.
For the working principle and process of this embodiment, refer to the description of the first embodiment; they are not repeated here.
In summary, compared with the prior art, the embodiments of the invention have the following beneficial effects: by acquiring the names, business scopes and other information of the enterprise-level power users in the target area together with their daily load value sequences, the industry classification standard is corrected to fit the actual industry layout of the target area, and intelligent classification of enterprise-level power users based on industry attributes and load characteristics is achieved. This reduces the disparity in the number of users across classes, improves the working efficiency of business personnel, allows similar users to be managed in a unified way, and facilitates the effective promotion of value-added services. The invention can also automatically identify the industry attribute of a user and classify users efficiently and accurately according to the revised industry classification standard, avoiding the investment of a large amount of human resources.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
Claims (6)
1. An enterprise-level power user intelligent classification method, comprising:
step S1, correcting an industry classification standard to match the actual situation of a target area, so as to form a primary classification standard;
step S2, generating a set of enterprise-level power users of known industry according to the primary classification standard, and using that set to train, through natural language processing, a first comparison word stock based on the name-industry mapping relation and a second comparison word stock based on the business scope-industry mapping relation;
step S3, acquiring the names of the enterprise-level power users to be classified in the target area;
step S4, judging whether primary industry classification can be performed on the enterprise-level power users to be classified by name; if so, performing primary industry classification on the users according to the first comparison word stock, and if not, performing primary industry classification on the users by business scope according to the second comparison word stock;
step S5, determining the optimal cluster number from the daily load sequences of the users in each primary classification industry, and thereby obtaining the final classification;
the step S1 specifically includes: on the basis of the national economy industry classification standard and in combination with the actual number of enterprises in each industry in the target area, supplementing emerging industries not covered by the standard, splitting industry entries containing many enterprises, deleting industry entries with no or very few related enterprises, and merging industry entries containing few enterprises, to obtain m industry classification entries $trade_1, trade_2, \ldots, trade_m$;
in the step S2, the first comparison word stock List 1 based on the name-industry mapping relation is specifically obtained as follows: after the names of the enterprise-level power users of known industry are segmented into words, the words with obvious industry attributes are paired with their industries to form the first comparison word stock List 1, namely:
where $word_{ij}$ must appear many times in the names of enterprise-level power users of the i-th industry and rarely in the names of enterprise-level power users outside the i-th industry, namely:
in the above, $n_{ij}$ denotes the number of occurrences of $word_{ij}$, keyword j of the i-th industry, in the word segmentation results of the enterprise-level power user names of the i-th industry; $\alpha$ is a sensitivity coefficient, $\alpha \ge 1$; $n'_{ij}$ denotes the number of occurrences of $word_{ij}$ in the word segmentation results of enterprise-level power user names outside the i-th industry; $\sigma$ is an accuracy coefficient, $0 \le \sigma \le 1$;
in the step S2, the second comparison word stock List 2 based on the business scope-industry mapping relation is specifically obtained as follows: after the business scopes of the enterprise-level power users of known industry are segmented into words, the business scope keywords of each industry are extracted with an improved TF-IDF algorithm and sorted by their $Q_{ij}$ values; after meaningless keywords are filtered out, the top-ranked keywords and their corresponding industries form the second comparison word stock List 2, namely:
in the above, $W_i$ denotes the total number of words in the business scope word segmentation results of the i-th industry; $w_{ij}$ denotes the number of times keyword j appears in the business scope word segmentation results of the enterprise-level power users of the i-th industry; $w'_{ij}$ denotes the number of times keyword j appears in the business scope word segmentation results of enterprise-level power users outside the i-th industry;
the step S4 specifically includes: performing primary industry classification on the user according to the first comparison word stock List 1, including: segmenting the name of the enterprise-level power user to be classified into words; if the word segmentation result contains a $word_{ij}$ in List 1, classifying the user into the i-th industry; if it contains no $word_{ij}$ in List 1, the user cannot be classified by name, and primary industry classification is then performed on the user according to the second comparison word stock List 2, including: crawling the user's business scope from the Internet and segmenting it into words, matching the word segmentation result against the second comparison word stock List 2, counting the number and sequence numbers of the words of each industry in List 2 that are contained in the word segmentation result, comparing the matching numbers of the industries, and determining the industry with the largest matching number as the industry to which the user belongs; when several industries have the same matching number, comparing their sums of products of sequence number and number, and determining the industry with the smallest sum of products as the industry to which the user belongs.
2. The enterprise-level power user intelligent classification method as claimed in claim 1, wherein the step S5 specifically comprises: after industry classification of the enterprise-level power users is completed, clustering the daily load sequences of the users in each industry with the K-Means clustering method and increasing the cluster number step by step until, in each cluster under the current cluster number, no less than a certain proportion of the users have a Pearson correlation coefficient with the daily load sequence of at least 1 other user in the same cluster that is greater than a certain given value; determining that cluster number as the optimal cluster number, and taking the clustering result as the final classification result.
3. The enterprise-level power user intelligent classification method as claimed in claim 2, wherein the step S5 specifically comprises: after industry classification of the enterprise-level power users is completed, applying the K-Means clustering method with cluster number t = 1 to the daily load sequences $X_i$ of the users in each industry, and then increasing the cluster number t step by step until, in every cluster $X_{i,k}$ (k = 1, 2, …, t) under the current cluster number, no less than a proportion μ of the user daily load sequences $X_{i,k,p}$ have a Pearson correlation coefficient $\rho_{X_{i,k,p},X_{i,k,q}}$ with the daily load sequence $X_{i,k,q}$ of at least 1 other user in the same cluster that is greater than or equal to a given value M, i.e.

$$\rho_{X_{i,k,p},X_{i,k,q}} = \frac{\operatorname{cov}(X_{i,k,p}, X_{i,k,q})}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} = \frac{E\big[(X_{i,k,p} - E(X_{i,k,p}))\,(X_{i,k,q} - E(X_{i,k,q}))\big]}{\sigma_{X_{i,k,p}}\,\sigma_{X_{i,k,q}}} \ge M$$

in the above formula, cov(·,·) denotes covariance; $\sigma_{X_{i,k,p}}$ denotes the standard deviation of the user daily load sequence $X_{i,k,p}$; $\sigma_{X_{i,k,q}}$ denotes the standard deviation of the user daily load sequence $X_{i,k,q}$; E(·) denotes the sample expectation;
the cluster number t is determined as the optimal cluster number, and the clustering result is taken as the final classification result.
4. The enterprise-level power user intelligent classification method as claimed in claim 1, wherein, after the industry to which a user to be classified belongs is determined, the user is incorporated into the set of enterprise-level power users of known industry of step S2 and used to correct the first comparison word stock List 1 and the second comparison word stock List 2.
5. The enterprise-level power user intelligent classification method as claimed in claim 1, wherein the set of enterprise-level power users of known industry contains the names and business scope information of the users, the names being obtained from the user archives of the power supply bureau and the business scopes being obtained in batches with a web crawler from the National Enterprise Credit Information Publicity System website.
6. An enterprise-level power user intelligent classification device, comprising:
a correction module, configured to correct the industry classification standard to match the actual situation of the target area, so as to form a primary classification standard;
a training module, configured to generate a set of enterprise-level power users of known industry according to the primary classification standard, and to use that set to train, through natural language processing, a first comparison word stock based on the name-industry mapping relation and a second comparison word stock based on the business scope-industry mapping relation;
an acquisition module, configured to acquire the names of the enterprise-level power users to be classified in the target area;
a primary classification module, configured to judge whether primary industry classification can be performed on the enterprise-level power users to be classified by name, and if so, to perform primary industry classification on the users according to the first comparison word stock, and if not, to perform primary industry classification on the users by business scope according to the second comparison word stock;
a final classification module, configured to determine the optimal cluster number from the daily load sequences of the users in each primary classification industry, and thereby obtain the final classification;
the correction module is specifically configured to: on the basis of the national economy industry classification standard and in combination with the actual number of enterprises in each industry in the target area, supplement emerging industries not covered by the standard, split industry entries containing many enterprises, delete industry entries with no or very few related enterprises, and merge industry entries containing few enterprises, to obtain m industry classification entries $trade_1, trade_2, \ldots, trade_m$;
the training module obtains the first comparison word stock List 1 based on the name-industry mapping relation as follows: after the names of the enterprise-level power users of known industry are segmented into words, the words with obvious industry attributes are paired with their industries to form the first comparison word stock List 1, namely:
where $word_{ij}$ must appear many times in the names of enterprise-level power users of the i-th industry and rarely in the names of enterprise-level power users outside the i-th industry, namely:
in the above, $n_{ij}$ denotes the number of occurrences of $word_{ij}$, keyword j of the i-th industry, in the word segmentation results of the enterprise-level power user names of the i-th industry; $\alpha$ is a sensitivity coefficient, $\alpha \ge 1$; $n'_{ij}$ denotes the number of occurrences of $word_{ij}$ in the word segmentation results of enterprise-level power user names outside the i-th industry; $\sigma$ is an accuracy coefficient, $0 \le \sigma \le 1$;
the training module obtains the second comparison word stock List 2 based on the business scope-industry mapping relation as follows: after the business scopes of the enterprise-level power users of known industry are segmented into words, the business scope keywords of each industry are extracted with an improved TF-IDF algorithm and sorted by their $Q_{ij}$ values; after meaningless keywords are filtered out, the top-ranked keywords and their corresponding industries form the second comparison word stock List 2, namely:
in the above, $W_i$ denotes the total number of words in the business scope word segmentation results of the i-th industry; $w_{ij}$ denotes the number of times keyword j appears in the business scope word segmentation results of the enterprise-level power users of the i-th industry; $w'_{ij}$ denotes the number of times keyword j appears in the business scope word segmentation results of enterprise-level power users outside the i-th industry;
the primary classification module is specifically configured to: perform primary industry classification on the user according to the first comparison word stock List 1, including: segmenting the name of the enterprise-level power user to be classified into words; if the word segmentation result contains a $word_{ij}$ in List 1, classifying the user into the i-th industry; if it contains no $word_{ij}$ in List 1, the user cannot be classified by name, and primary industry classification is then performed on the user according to the second comparison word stock List 2, including: crawling the user's business scope from the Internet and segmenting it into words, matching the word segmentation result against the second comparison word stock List 2, counting the number and sequence numbers of the words of each industry in List 2 that are contained in the word segmentation result, comparing the matching numbers of the industries, and determining the industry with the largest matching number as the industry to which the user belongs; when several industries have the same matching number, comparing their sums of products of sequence number and number, and determining the industry with the smallest sum of products as the industry to which the user belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110629382.4A CN113360652B (en) | 2021-06-07 | 2021-06-07 | Enterprise-level power user intelligent classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113360652A (en) | 2021-09-07 |
CN113360652B (en) | 2024-03-01 |
Family
ID=77532581
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110629382.4A Active CN113360652B (en) | 2021-06-07 | 2021-06-07 | Enterprise-level power user intelligent classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113360652B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN113360652A (en) | 2021-09-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||