CN112035715B - User label design method and device - Google Patents
User label design method and device Download PDFInfo
- Publication number
- CN112035715B (application CN202010663731.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- clustering
- data
- correlation
- average temperature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a user label design method and device. The method comprises: acquiring the electricity consumption data of all users and screening it based on the basic attributes of the users to obtain screened user electricity consumption data; calculating the correlation between the screened user electricity consumption data and the average temperature to obtain correlation characteristic data between the user electricity consumption data and the average temperature; and forming a corresponding user label from that correlation characteristic data and the basic attributes of the user. In the embodiments of the invention, the user label is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy based on the user label, meets the electricity demand of the user, and improves the user's electricity consumption experience.
Description
Technical Field
The invention relates to the technical field of power grid user power supply, in particular to a user tag design method and device.
Background
With the steady improvement in the practical use of information systems, a large amount of basic information about all aspects of customers has been accumulated, providing data support for all kinds of work. However, existing data analysis and support approaches cannot produce a multi-dimensional, rounded portrait of customer characteristics, and cannot capture the relationship between electricity consumption and temperature for users with different user profile basic attributes. When the weather changes suddenly, there is no way to adjust the power supply strategy through corresponding tags, so the electricity demand of users cannot be effectively guaranteed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a user tag design method and device that construct a user tag according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy based on the user tag, meets the electricity demand of the user, and improves the user's electricity consumption experience.
In order to solve the above technical problem, an embodiment of the present invention provides a user tag design method, where the method includes:
acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
grouping and identifying the user tags based on a clustering algorithm, and determining the proportion of each user profile basic attribute in the grouped and identified user tags;
obtaining the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data;
and adjusting the user power supply service for the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the corresponding user electricity consumption data and the average temperature data, and the forecast average temperature of future weather.
Optionally, the user electricity consumption data comprises quarterly and/or monthly and/or daily electricity consumption data of the user in the whole year;
the basic attributes of the user comprise a user profile basic attribute and a user state basic attribute;
the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties;
the user state basic attributes comprise new customers, long-term electricity-free customers and batch electricity-using customers.
Optionally, the performing of correlation calculation between the power consumption data and the average temperature on the screened user power consumption data includes:
and performing correlation calculation based on the screened user electricity utilization data of the user in the preset time period and the average temperature of the preset time period.
Optionally, the calculation formula of the correlation calculation is as follows:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
where r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
Optionally, the clustering and identifying the user tags based on the clustering algorithm includes:
carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result;
clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result;
and clustering and identifying the user tags according to each clustering center in the clustering result.
Optionally, the performing preliminary clustering on the user tag by using a Canopy clustering algorithm to obtain a preliminary clustering result includes:
initializing the user tags as list data;
randomly selecting an object D from the list data as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data;
calculating the distances between all objects in the list data and C, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C when the distances are greater than a first preset distance, and deleting the objects in the list data when the distances are less than a second preset distance;
adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty;
and taking the clustering list as a preliminary clustering result.
Optionally, the clustering the preliminary clustering result by using a Kmeans clustering algorithm to obtain a clustering result includes:
taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, and assigning each user tag to its corresponding centroid;
calculating the distance from each user tag to each centroid, and assigning the user tag to the nearest cluster centroid;
calculating the mean of each cluster, and updating the cluster centroids according to the mean;
and calculating the variance error of all user tags with respect to their updated cluster centroids, and judging whether the variance error is greater than a preset threshold; if so, repeating the step of calculating the distance from each user tag to each centroid; otherwise, ending the clustering and obtaining the clustering result.
Optionally, the correlation strength includes: very strong correlation, strong correlation, moderate correlation, weak correlation, or very weak or no correlation.
In addition, an embodiment of the present invention further provides a user tag design apparatus, where the apparatus includes:
a data screening module: used for acquiring all user electricity consumption data, and screening all the user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
a correlation calculation module: used for performing correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data to obtain correlation characteristic data between the user electricity consumption data and the average temperature;
a tag generation module: used for forming a corresponding user tag according to the correlation characteristic data between the user's electricity consumption data and the average temperature and the basic attributes of the user;
a grouping and identification module: used for grouping and identifying the user tags based on a clustering algorithm, and determining the proportion of each user profile basic attribute in the grouped and identified user tags;
a correlation strength acquisition module: used for obtaining the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data;
a power supply adjustment module: used for adjusting the user power supply service for the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the corresponding user electricity consumption data and the average temperature data, and the forecast average temperature of future weather.
In the embodiments of the invention, the user tag is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy based on the user tag, meets the electricity demand of the user, and improves the user's electricity consumption experience.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a user tag design method in an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a user tag design device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a flow chart illustrating a user tag design method according to an embodiment of the present invention.
As shown in fig. 1, a user tag design method includes:
S11: acquiring all user electricity consumption data, and screening all the user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data.
In the specific implementation of the invention, the user electricity consumption data comprises the quarterly and/or monthly and/or daily electricity consumption data of the user over the whole year. The basic attributes of the user comprise user profile basic attributes and user state basic attributes.
Specifically, the user profile basic attributes are: social security attributes (subsistence-allowance households, five-guarantee households, etc.), electricity utilization categories, user categories, importance levels (important customers, key-attention customers, etc.), regional characteristics (urban areas, towns, etc.), electricity price types (single-part tariff, two-part tariff), load properties, etc. The user state basic attributes are: new customers, long-term electricity-free customers and batch electricity-using customers.
The user electricity consumption data is screened according to the basic attributes of the user: the electricity consumption data of long-term electricity-free customers (a user state basic attribute) is removed, and operations such as simple user classification are performed according to the basic data of the user, after which the screened user electricity consumption data is obtained.
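The screening step can be illustrated with a minimal Python sketch. The record layout (`UserRecord`), the state value `"long_term_no_use"` and the sample data are hypothetical names chosen for illustration; the patent does not specify a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    user_id: str
    state: str               # hypothetical encoding of the user state basic attribute
    usage_kwh: List[float]   # electricity consumption per period (e.g. per day)


def screen_users(records: List[UserRecord]) -> List[UserRecord]:
    """Remove long-term electricity-free customers before further analysis."""
    return [r for r in records if r.state != "long_term_no_use"]


records = [
    UserRecord("u1", "new", [3.2, 4.1]),
    UserRecord("u2", "long_term_no_use", [0.0, 0.0]),
    UserRecord("u3", "batch", [9.5, 8.7]),
]
screened = screen_users(records)
```

Only the removal of long-term electricity-free customers is shown; the "simple user classification" mentioned in the text is left out because its rules are not described.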
S12: performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
In the specific implementation of the invention, performing the correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data includes: performing the correlation calculation based on the screened user electricity consumption data of the user in a preset time period and the average temperature of that preset time period.
Specifically, the correlation is calculated between the user's electricity consumption data in the preset time period, taken from the screened user electricity consumption data, and the average temperature of the preset time period, to obtain the correlation characteristic data between the user electricity consumption data and the average temperature. The calculation formula of the correlation calculation is as follows:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
where r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
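As a rough illustration, the correlation between a user's electricity consumption X and the average temperature Y over a preset period can be computed as covariance divided by the square root of the product of the variances. The function name `pearson`, the sample values, and the choice to return 0.0 when either series is constant are assumptions of this sketch.

```python
import math


def pearson(x, y):
    """Correlation r(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily data: consumption rises in step with temperature.
usage_kwh = [10, 12, 15, 19, 24]
avg_temp_c = [20, 22, 25, 29, 34]
r = pearson(usage_kwh, avg_temp_c)
```

Because the sample consumption is an exact linear function of the temperature, `r` here comes out at (numerically) 1; real data would give a value somewhere in [-1, 1].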
S13: forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
In the specific implementation of the invention, each user is marked according to the correlation characteristic data between the user's electricity consumption data and the average temperature and according to the basic attributes of the user, thereby forming the corresponding user tag.
S14: clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of the basic attribute of each user profile in the user labels after clustering and identification;
In the specific implementation of the invention, grouping and identifying the user tags based on the clustering algorithm includes: performing preliminary clustering on the user tags using the Canopy clustering algorithm to obtain a preliminary clustering result; clustering the preliminary clustering result using the Kmeans clustering algorithm to obtain a clustering result; and grouping and identifying the user tags according to each cluster center in the clustering result.
Further, performing preliminary clustering on the user tags using the Canopy clustering algorithm to obtain a preliminary clustering result includes: initializing the user tags as list data; randomly selecting an object D from the list data as a clustering center of the Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distances between all objects in the list data and C, adding an object to the clustering center of the Canopy clustering algorithm marked as C when its distance is greater than a first preset distance, and deleting an object from the list data when its distance is less than a second preset distance; adding the clustering center of the Canopy clustering algorithm to a clustering list, and repeating until the list data is empty; and taking the clustering list as the preliminary clustering result.
Further, clustering the preliminary clustering result using the Kmeans clustering algorithm to obtain a clustering result includes: taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, and assigning each user tag to its corresponding centroid; calculating the distance from each user tag to each centroid, and assigning the user tag to the nearest cluster centroid; calculating the mean of each cluster, and updating the cluster centroids according to the mean; and calculating the variance error of all user tags with respect to their updated cluster centroids, and judging whether the variance error is greater than a preset threshold; if so, repeating the step of calculating the distance from each user tag to each centroid; otherwise, ending the clustering and obtaining the clustering result.
Specifically, the user tags are clustered with a Canopy-Kmeans clustering algorithm and are grouped and identified according to the clustering result; once the grouping and identification are finished, the proportion of each user profile basic attribute in the grouped and identified user tags is determined.
To group and identify the user tags, the user tags are first clustered: the Canopy clustering algorithm performs a preliminary clustering, the Kmeans clustering algorithm then performs a second clustering to obtain the clustering result, and the user tags are grouped and identified according to each cluster center in the clustering result.
The preliminary clustering with the Canopy clustering algorithm proceeds as follows: initializing the user tags as list data, and presetting two thresholds, namely a first preset distance and a second preset distance; randomly selecting an object D from the list data as a clustering center of the Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distances between all objects in the list data and C, adding an object to the clustering center of the Canopy clustering algorithm marked as C when its distance is greater than the first preset distance, and deleting an object from the list data when its distance is less than the second preset distance; adding the clustering center of the Canopy clustering algorithm to a clustering list, and repeating until the list data is empty; and taking the clustering list as the preliminary clustering result.
Denoting any object in the list data as A and the cluster center object as C, the distance between A and C is calculated with the cosine distance formula, specifically:
$$\cos(A, C) = \frac{\sum_{i=1}^{n} a_i c_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} c_i^2}}$$
where A = (a_1, a_2, …, a_n), C = (c_1, c_2, …, c_n), and i = 1, 2, …, n.
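The preliminary clustering can be sketched in Python as below. Two points are assumptions of this sketch rather than statements of the patent: the cosine distance is taken as 1 minus the cosine similarity, and the thresholds follow the commonly used Canopy formulation (an object joins the canopy when its distance to the center is below the looser threshold `t1`, and is removed from the list when it is below the tighter threshold `t2`), whereas the text above speaks of a distance "greater than" the first preset distance. The names `canopy`, `t1`, `t2` and the sample vectors are hypothetical.

```python
import math
import random


def cosine_distance(a, c):
    """Assumed definition: 1 - cosine similarity of vectors a and c."""
    dot = sum(x * y for x, y in zip(a, c))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in c))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def canopy(points, t1, t2, seed=0):
    """Return a list of (center, members) canopies; requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 (loose threshold) must exceed t2 (tight threshold)")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick a random object as the canopy center and remove it from the list.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_unassigned = []
        for p in remaining:
            d = cosine_distance(p, center)
            if d < t1:
                members.append(p)           # loosely close: joins this canopy
            if d >= t2:
                still_unassigned.append(p)  # not tightly close: stays in the list
        remaining = still_unassigned
        canopies.append((center, members))
    return canopies


# Hypothetical 2-D tag feature vectors forming two obvious groups.
points = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.1, 0.99)]
canopies = canopy(points, t1=0.3, t2=0.1)
```

The canopy centers produced this way are what a subsequent Kmeans pass would use as its initial centroids, so the number of clusters does not have to be fixed in advance.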
After the Canopy clustering algorithm completes, the preliminary clustering result is obtained and is then clustered with the Kmeans clustering algorithm. Kmeans classifies around k objects in the space taken as centers: each object in the object space is assigned to the class of its nearest center, and the value of each cluster centroid is recalculated and updated over multiple iterations until the cluster centroids are stable and no longer change.
Clustering with the Kmeans clustering algorithm proceeds as follows: taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, and assigning each user tag to its corresponding centroid; that is, the Canopy centers generated by the Canopy clustering algorithm are used as the initial centroids of the Kmeans algorithm, and each tag has already been assigned to its corresponding centroid; calculating the distance from each user tag to each centroid, and assigning the user tag to the nearest cluster centroid, where the distance calculation still uses the cosine distance used in the Canopy clustering algorithm; calculating the mean of each cluster, and updating the cluster centroids according to the mean; and calculating the variance error of all user tags with respect to their updated cluster centroids, and judging whether the variance error is greater than a preset threshold; if so, repeating the step of calculating the distance from each user tag to each centroid; if not, ending the clustering and obtaining the clustering result.
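A minimal Python sketch of this second stage follows. It assumes that the cosine distance is 1 minus the cosine similarity, that the "variance error" is the sum of squared distances from each tag to its updated centroid, and it adds a `max_iter` cap as a safeguard that the text does not mention. The function name `kmeans_from_seeds` and the sample data are hypothetical.

```python
import math


def cosine_distance(a, c):
    """Assumed definition: 1 - cosine similarity of vectors a and c."""
    dot = sum(x * y for x, y in zip(a, c))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in c))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def kmeans_from_seeds(points, seeds, threshold, max_iter=100):
    """K-means whose initial centroids are the Canopy centers (seeds)."""
    centroids = [tuple(s) for s in seeds]
    assignment = []
    for _ in range(max_iter):
        # Assign every tag vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: cosine_distance(p, centroids[k]))
            for p in points
        ]
        # Update each centroid to the mean of its members (empty clusters keep theirs).
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
        # Assumed "variance error": sum of squared distances to the updated centroids.
        error = sum(cosine_distance(p, centroids[a]) ** 2
                    for p, a in zip(points, assignment))
        if error <= threshold:
            break  # error no longer greater than the preset threshold: stop
    return assignment, centroids


points = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.1, 0.99)]
seeds = [(1.0, 0.0), (0.0, 1.0)]  # hypothetical Canopy centers
labels, centers = kmeans_from_seeds(points, seeds, threshold=0.01)
```

Seeding with the Canopy centers replaces the usual random initialization, which is the main point of combining the two algorithms.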
After the clustering result is obtained, clustering and identifying the user labels according to each clustering center in the clustering result; and then calculating the proportion of the basic attribute of each user profile in the user tags after the grouping and the identification.
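The proportion calculation can be sketched as a per-cluster frequency count. The function name `attribute_proportions` and the example attribute values (taken here to be an electricity utilization category) are hypothetical.

```python
from collections import Counter, defaultdict


def attribute_proportions(cluster_ids, attribute_values):
    """For each cluster, return the share of each attribute value among its members."""
    counts = defaultdict(Counter)
    for cid, value in zip(cluster_ids, attribute_values):
        counts[cid][value] += 1
    return {
        cid: {value: n / sum(counter.values()) for value, n in counter.items()}
        for cid, counter in counts.items()
    }


# Hypothetical cluster assignment and one profile attribute per user.
clusters = [0, 0, 0, 1]
categories = ["residential", "residential", "commercial", "industrial"]
shares = attribute_proportions(clusters, categories)
```

Running the same function once per user profile basic attribute (electricity price type, regional characteristics, and so on) would give the full set of proportions for each group.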
S15: obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
In the specific implementation of the invention, the correlation strength includes: very strong correlation, strong correlation, moderate correlation, weak correlation, or very weak or no correlation.
The correlation coefficient between the electricity consumption data of the user and the temperature is divided into numerical ranges, and the correlation strength of each range is labeled, as shown in the following table:
| Magnitude of correlation coefficient | Correlation strength |
| 0.8~1.0 | Very strong correlation |
| 0.6~0.8 | Strong correlation |
| 0.4~0.6 | Moderate correlation |
| 0.2~0.4 | Weak correlation |
| 0~0.2 | Very weak or no correlation |
The correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data is obtained according to this table.
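The table lookup can be sketched as a simple threshold function. The short keys returned (`"very_strong"`, `"negligible"`, and so on) are hypothetical identifiers for the five rows of the table; taking the absolute value of the coefficient and assigning each boundary value to the higher band are assumptions, since the table's ranges share their endpoints.

```python
def correlation_strength(r):
    """Map a correlation coefficient to one of the five strength bands."""
    magnitude = abs(r)  # assumption: negative correlations are graded by magnitude
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.8:
        return "very_strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "negligible"  # the 0~0.2 row of the table


strengths = [correlation_strength(r) for r in (0.85, 0.65, -0.5, 0.3, 0.1)]
```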
S16: adjusting the user power supply service for the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data, and the forecast average temperature of future weather.
In the specific implementation of the invention, the user power supply service for the relevant user profile basic attributes in the corresponding area is adjusted according to these three inputs: the attribute proportions, the correlation strength, and the forecast average temperature.
In the embodiments of the invention, the user tag is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy based on the user tag, meets the electricity demand of the user, and improves the user's electricity consumption experience.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a user tag designing apparatus according to an embodiment of the present invention.
As shown in fig. 2, a user tag designing apparatus, the apparatus comprising:
the data screening module 21: the system comprises a data acquisition module, a data processing module and a data processing module, wherein the data acquisition module is used for acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of users to acquire screened user electricity consumption data;
In a specific implementation of the invention, the user electricity consumption data comprises the user's quarterly and/or monthly and/or daily electricity consumption data over a whole year; the basic attributes of the user comprise user profile basic attributes and user state basic attributes; the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties; the user state basic attributes comprise new customers, long-term non-consuming customers and batch electricity-using customers.
Specifically, the user profile basic attributes are: social security attributes (subsistence-allowance households, five-guarantee households, etc.), electricity utilization categories, user categories, importance degrees (important customers, key-attention customers, etc.), regional characteristics (urban areas, towns, etc.), electricity price types (single-part tariff, two-part tariff), load properties, etc. The user state basic attributes are: new customers, long-term non-consuming customers and batch electricity-using customers.
The user electricity consumption data is screened according to the basic attributes of the users: the data of long-term non-consuming customers, as identified by the user state basic attributes, is removed, and operations such as a simple classification of users according to their basic data are performed, yielding the screened user electricity consumption data.
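The screening step can be illustrated with a minimal sketch. The record layout, the field names and the state value "long_term_no_usage" are hypothetical and chosen only for illustration; the patent does not specify a data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    user_id: str
    state: str                # hypothetical values: "new", "long_term_no_usage", "batch"
    monthly_kwh: List[float]  # twelve monthly readings for one year


def screen_users(records: List[UserRecord]) -> List[UserRecord]:
    """Remove long-term non-consuming customers; keep all other records."""
    return [r for r in records if r.state != "long_term_no_usage"]


records = [
    UserRecord("u1", "new", [120.0] * 12),
    UserRecord("u2", "long_term_no_usage", [0.0] * 12),
    UserRecord("u3", "batch", [300.0] * 12),
]
screened = screen_users(records)
```

Only the removal of long-term non-consuming customers is shown; the "simple user classification" mentioned in the text is left out because its rules are not described.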
The correlation calculation module 22: configured to calculate, for the screened user electricity consumption data, the correlation between the electricity consumption data and the average temperature, obtaining correlation characteristic data between the user electricity consumption data and the average temperature;
In a specific implementation of the invention, calculating the correlation between the electricity consumption data and the average temperature for the screened user electricity consumption data comprises: performing the correlation calculation on the user's screened electricity consumption data within a preset time period and the average temperature of that preset time period.
Further, the correlation is calculated as follows:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
where r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
Specifically, the correlation is calculated between the user's electricity consumption data within a preset time period, taken from the screened user electricity consumption data, and the average temperature of that preset time period, obtaining the correlation characteristic data between the user electricity consumption data and the average temperature. The correlation is calculated as follows:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
where r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
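As an illustration, the correlation defined by Cov(X, Y), Var(X) and Var(Y) above can be computed as in the following sketch. The function name and the choice to return 0.0 when either series is constant are assumptions made for this example, not part of the patent.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """r(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y)) using population moments."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        # Assumption: a constant series is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily consumption (kWh) and average temperature (deg C).
r_up = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```

Here X would be a user's consumption series over the preset time period and Y the average temperature series over the same period.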
The tag generation module 23: configured to form a corresponding user tag according to the correlation characteristic data between the user's electricity consumption data and the average temperature and the basic attributes of the user;
In a specific implementation of the invention, the user is labeled according to the correlation characteristic data between the electricity consumption data and the average temperature and according to the basic attributes of the user, thereby forming the corresponding user tag.
The grouping and identification module 24: configured to group and identify the user tags based on a clustering algorithm, and to determine the proportion of each user profile basic attribute in the grouped and identified user tags;
In a specific implementation of the invention, grouping and identifying the user tags based on the clustering algorithm comprises: performing preliminary clustering on the user tags using the Canopy clustering algorithm to obtain a preliminary clustering result; clustering the preliminary clustering result using the Kmeans clustering algorithm to obtain a clustering result; and grouping and identifying the user tags according to each cluster center in the clustering result.
Further, performing preliminary clustering on the user tags using the Canopy clustering algorithm to obtain a preliminary clustering result comprises: initializing the user tags as list data; randomly selecting an object D from the list data as a cluster center of the Canopy clustering algorithm, marking it as C, and deleting D from the list data; calculating the distance between every object in the list data and C; when the distance is greater than a first preset distance, adding the object to the cluster center of the Canopy clustering algorithm and marking it as C, and when the distance is less than a second preset distance, deleting the object from the list data; adding the cluster center of the Canopy clustering algorithm to a cluster list, and repeating until the list data is empty; and taking the cluster list as the preliminary clustering result.
Further, clustering the preliminary clustering result using the Kmeans clustering algorithm to obtain a clustering result comprises: taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, with each user tag assigned to its corresponding centroid; calculating the distance from each user tag to each centroid, and assigning the user tag to the nearest cluster centroid; calculating the mean of each cluster, and updating the cluster centroids according to the calculated means; and calculating the variance error of all user tags with respect to their updated cluster centroids and judging whether the variance error is greater than a preset threshold; if so, repeating from the step of calculating the distance from each user tag to each centroid; if not, ending the clustering and obtaining the clustering result.
Specifically, the user tags are clustered with the Canopy-Kmeans clustering algorithm and are grouped and identified according to the clustering result; once grouping and identification are complete, the proportion of each user profile basic attribute in the grouped and identified user tags is determined.
To group and identify the user tags, the user tags are first clustered: the Canopy clustering algorithm performs a preliminary clustering, the Kmeans clustering algorithm then performs a second clustering to obtain the clustering result, and the user tags are grouped and identified according to each cluster center in the clustering result.
When the Canopy clustering algorithm is used for preliminary clustering, the process is as follows: initialize the user tags as list data and preset two thresholds, a first preset distance and a second preset distance; randomly select an object D from the list data as a cluster center of the Canopy clustering algorithm, mark it as C, and delete D from the list data; calculate the distance between every object in the list data and C; when the distance is greater than the first preset distance, add the object to the cluster center of the Canopy clustering algorithm and mark it as C, and when the distance is less than the second preset distance, delete the object from the list data; add the cluster center of the Canopy clustering algorithm to a cluster list, and repeat until the list data is empty; the cluster list is taken as the preliminary clustering result.
Let A denote any object in the list data and C the cluster center object; the distance between A and C is calculated with the cosine distance formula, specifically:
$$\cos(A, C) = \frac{\sum_{i=1}^{n} a_i c_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}}\,\sqrt{\sum_{i=1}^{n} c_i^{2}}}$$
where A = (a_1, a_2, …, a_n), C = (c_1, c_2, …, c_n), and i = 1, 2, …, n.
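The preliminary clustering can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented procedure verbatim: it uses the conventional Canopy formulation, in which a loose threshold t1 is larger than a tight threshold t2, objects closer to the center than t1 join the canopy, and objects closer than t2 are removed from the list. It also takes the distance to be 1 minus the cosine of the angle between the two vectors. The names canopy, cosine_distance, t1, t2 and seed are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, c: Vector) -> float:
    """1 - cos(a, c); zero vectors are treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_c)


def canopy(points: List[Vector], t1: float, t2: float,
           seed: int = 0) -> List[Tuple[Vector, List[Vector]]]:
    """Return (center, members) pairs; t1 is the loose and t2 the tight threshold."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    rng = random.Random(seed)
    remaining = list(points)
    canopies = []
    while remaining:
        # Pick a random object as the canopy center and remove it from the list.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        kept = []
        for p in remaining:
            d = cosine_distance(p, center)
            if d < t1:
                members.append(p)   # loosely close: joins this canopy
            if d >= t2:
                kept.append(p)      # not tightly close: stays in the list
        remaining = kept
        canopies.append((center, members))
    return canopies


tags = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
canopies = canopy(tags, t1=0.3, t2=0.1)
```

The canopy centers returned here are what the subsequent Kmeans step would take as its initial centroids, so the number of clusters k does not have to be chosen by hand.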
After the Canopy clustering algorithm finishes, the preliminary clustering result is obtained and is then clustered with the Kmeans clustering algorithm. Kmeans takes k objects in the space as centers, assigns each object in the space to the class of its nearest center, and iteratively recomputes and updates each cluster centroid until the centroids are stable.
Clustering with the Kmeans clustering algorithm proceeds as follows: the preliminary clustering result is taken as the initial centroids of the Kmeans clustering algorithm, with each user tag assigned to its corresponding centroid; that is, the Canopy centers generated by the Canopy clustering algorithm serve as the initial centroids of the Kmeans algorithm, and each tag is already assigned to a centroid. The distance from each user tag to each centroid is calculated, and the user tag is assigned to the nearest cluster centroid; the distance is still the cosine distance used in the Canopy clustering algorithm. The mean of each cluster is calculated, and the cluster centroids are updated accordingly. The variance error of all user tags with respect to their updated cluster centroids is then calculated and compared with a preset threshold: if it is greater than the threshold, the process repeats from the step of calculating the distance from each user tag to each centroid; if not, the clustering ends and the clustering result is obtained.
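The Kmeans stage described above might look like the following sketch. The function names, the default threshold and the max_iter cap are assumptions added for this example; the cap in particular is not in the patent and only guards against a threshold that is never reached. The "variance error" is interpreted here as the mean squared cosine distance of the tags to their assigned centroids, which is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, c: Vector) -> float:
    """1 - cos(a, c); zero vectors are treated as maximally distant (assumption)."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_c)


def kmeans(points: List[Vector], init_centroids: List[Vector],
           threshold: float = 1e-3,
           max_iter: int = 100) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Kmeans seeded with given centroids (for example, Canopy centers)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every tag to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: cosine_distance(p, centroids[k]))
            for p in points
        ]
        # Update each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        # Mean squared distance of tags to their updated centroids.
        error = sum(cosine_distance(p, centroids[lab]) ** 2
                    for p, lab in zip(points, labels)) / len(points)
        if error <= threshold:
            break
    return centroids, labels


tags = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
centroids, labels = kmeans(tags, init_centroids=[(1.0, 0.0), (0.0, 1.0)])
```

Seeding with the Canopy centers fixes both the number of clusters and their starting positions, which is the stated purpose of running Canopy first.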
After the clustering result is obtained, the user tags are grouped and identified according to each cluster center in the clustering result; the proportion of each user profile basic attribute in the grouped and identified user tags is then calculated.
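A minimal sketch of the proportion calculation follows, assuming each user tag carries a cluster label and one value of a given user profile basic attribute. The function name and the sample attribute values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def attribute_shares(labels: Sequence[int],
                     attributes: Sequence[Hashable]) -> Dict[int, Dict[Hashable, float]]:
    """For every cluster, return the share of each attribute value among its members."""
    groups = defaultdict(Counter)
    for label, attr in zip(labels, attributes):
        groups[label][attr] += 1
    return {
        label: {attr: count / sum(counter.values())
                for attr, count in counter.items()}
        for label, counter in groups.items()
    }


# Hypothetical cluster labels and regional characteristics of four users.
shares = attribute_shares([0, 0, 0, 1], ["urban", "urban", "town", "town"])
```

The same function can be applied once per profile attribute (electricity price type, load property, and so on) to obtain the full set of proportions.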
The correlation strength acquisition module 25: configured to obtain the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags;
In a specific implementation of the invention, the correlation strength includes: very strong correlation, moderate correlation, weak correlation, or no correlation.
The correlation coefficient between the user's electricity consumption data and the temperature is divided into numerical ranges, and the correlation strength of each range is labeled, as shown in the following table:
Magnitude of correlation coefficient | General interpretation |
---|---|
0.8~1.0 | Very strong correlation |
0.6~0.8 | Strong correlation |
0.4~0.6 | Moderate correlation |
0.2~0.4 | Weak correlation |
0~0.2 | Very weak or no correlation |
The correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags is obtained from the table.
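The table lookup can be expressed as a small function. Two points are assumptions made for this sketch: the magnitude (absolute value) of the coefficient is used, and a value falling exactly on a boundary such as 0.8 is assigned to the stronger band, since the table's ranges share their endpoints. The function name and label strings are illustrative.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to a strength label by its magnitude."""
    magnitude = abs(r)
    if magnitude > 1.0 + 1e-9:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.8:
        return "very strong correlation"
    if magnitude >= 0.6:
        return "strong correlation"
    if magnitude >= 0.4:
        return "moderate correlation"
    if magnitude >= 0.2:
        return "weak correlation"
    return "very weak or no correlation"


strengths = [correlation_strength(r) for r in (0.85, -0.65, 0.1)]
```

A strongly negative coefficient, such as winter heating load rising as temperature falls, is labeled by magnitude in the same way as a positive one under this assumption.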
The power supply adjustment module 26: configured to adjust the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags, and the forecast average temperature.
In a specific implementation of the invention, the power supply service for users having the relevant user profile basic attributes in the corresponding area is adjusted according to the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags, and the forecast average temperature.
In the embodiment of the invention, the user tags are constructed according to the actual situation of the users, which facilitates subsequent adjustment of the power supply strategy according to the user tags, meets the users' electricity demand, and improves their electricity consumption experience.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic or optical disks, and the like.
The user tag design method and apparatus provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method and its core idea. A person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application; in summary, the content of this specification should not be construed as limiting the invention.
Claims (9)
1. A method for designing a user tag, the method comprising:
acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of the basic attribute of each user profile in the user labels after clustering and identification;
obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
and adjusting the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags, and the forecast average temperature.
2. The method of claim 1, wherein the user electricity usage data comprises user quarterly and/or monthly and/or daily electricity usage data throughout the year;
the basic attributes of the user comprise a user profile basic attribute and a user state basic attribute;
the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties;
the user state basic attributes comprise new customers, long-term electricity-unused customers and batch electricity-used customers.
3. The method for designing the user tag according to claim 1, wherein the calculating the correlation between the power consumption data and the average temperature of the screened power consumption data of the user includes:
and performing correlation calculation based on the screened user electricity utilization data of the user in the preset time period and the average temperature of the preset time period.
4. The method of claim 3, wherein the correlation is calculated as follows:
$$r(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
5. The method of claim 1, wherein the clustering and identifying the user tags based on a clustering algorithm comprises:
carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result;
clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result;
and clustering and identifying the user tags according to each clustering center in the clustering result.
6. The method according to claim 5, wherein the performing preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result comprises:
initializing the user tag as a list data;
randomly selecting an object D from the list data as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data;
calculating the distances between all objects in the list data and C, when the distances are larger than a first preset distance, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C, and when the distances are smaller than a second preset distance, deleting the objects from the list data;
adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty;
and taking the clustering list as a preliminary clustering result.
7. The method according to claim 5, wherein the clustering the preliminary clustering result by using a Kmeans clustering algorithm to obtain a clustering result comprises:
taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, with each user tag assigned to its corresponding centroid;
calculating the distance of each user label to each centroid, and distributing the user labels to the nearest clustering centroid;
carrying out mean value calculation on each clustering centroid, and updating the clustering centroids according to the mean value calculation;
and calculating variance error values of all the user tags to the corresponding updated clustering centroids, judging whether the variance error values are larger than a preset threshold value, if so, repeating the step of calculating the distance from each user tag to each centroid, otherwise, finishing clustering and obtaining a clustering result.
8. The user tag design method of claim 1, wherein the correlation strength comprises: very strong correlation, moderate correlation, weak correlation, or no correlation.
9. A user tag design apparatus, the apparatus comprising:
a data screening module: configured to acquire the electricity consumption data of all users and to screen that data based on the basic attributes of the users, obtaining screened user electricity consumption data;
a correlation calculation module: configured to calculate, for the screened user electricity consumption data, the correlation between the electricity consumption data and the average temperature, obtaining correlation characteristic data between the user electricity consumption data and the average temperature;
a tag generation module: configured to form a corresponding user tag according to the correlation characteristic data between the user's electricity consumption data and the average temperature and the basic attributes of the user;
a grouping and identification module: configured to group and identify the user tags based on a clustering algorithm, and to determine the proportion of each user profile basic attribute in the grouped and identified user tags;
a correlation strength acquisition module: configured to obtain the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags;
a power supply adjustment module: configured to adjust the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the average temperature data and the user electricity consumption data corresponding to the grouped and identified user tags, and the forecast average temperature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010663731.XA CN112035715B (en) | 2020-07-10 | 2020-07-10 | User label design method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112035715A CN112035715A (en) | 2020-12-04 |
CN112035715B true CN112035715B (en) | 2023-04-14 |
Family
ID=73579036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010663731.XA Active CN112035715B (en) | 2020-07-10 | 2020-07-10 | User label design method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112035715B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982489A (en) * | 2012-11-23 | 2013-03-20 | 广东电网公司电力科学研究院 | Power customer online grouping method based on mass measurement data |
CN103971296A (en) * | 2014-05-16 | 2014-08-06 | 国家电网公司 | Power purchase method for mathematic model based on electrical loads and temperature |
CN103995161A (en) * | 2014-06-03 | 2014-08-20 | 深圳市康拓普信息技术有限公司 | Method and system for discriminating electricity stealing and leaking users |
CN105069536A (en) * | 2015-08-19 | 2015-11-18 | 国网安徽省电力公司经济技术研究院 | Electricity demand predication method based on temperature and economic growth |
CN105787259A (en) * | 2016-02-17 | 2016-07-20 | 国网甘肃省电力公司武威供电公司 | Method for analyzing influence correlation of multiple meteorological factors and load changes |
CN106530132A (en) * | 2016-11-14 | 2017-03-22 | 国家电网公司 | Power load clustering method and device |
CN208384675U (en) * | 2018-08-02 | 2019-01-15 | 蒋婧 | A kind of power marketing electric charge pressing payment auxiliary device |
CN109284886A (en) * | 2018-02-02 | 2019-01-29 | 中领世能(天津)科技有限公司 | Electrical Safety management method and device based on artificial intelligence |
CN110728537A (en) * | 2019-09-24 | 2020-01-24 | 国网河北省电力有限公司信息通信分公司 | Prediction payment method based on power consumer behavior label |
CN111065901A (en) * | 2017-07-11 | 2020-04-24 | 因特瑞有限公司 | Time temperature indicating label |
CN111178957A (en) * | 2019-12-23 | 2020-05-19 | 广西电网有限责任公司 | Method for early warning sudden increase of electric quantity of electricity consumption customer |
CN111210159A (en) * | 2020-01-14 | 2020-05-29 | 国网上海市电力公司 | Air conditioner load power utilization right distribution and transaction system and method based on block chain under UEIOT |
Also Published As
Publication number | Publication date |
---|---|
CN112035715A (en) | 2020-12-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11043808B2 (en) | Method for identifying pattern of load cycle | |
CN111144468B (en) | Method and device for labeling power consumer information, electronic equipment and storage medium | |
CN106446967A (en) | Novel power system load curve clustering method | |
CN111724278A (en) | Fine classification method and system for power multi-load users | |
CN110674993A (en) | User load short-term prediction method and device | |
CN117690030B (en) | Multi-face flower identification method and system based on image processing | |
CN108664653A (en) | A kind of Medical Consumption client's automatic classification method based on K-means | |
Bakhshi et al. | Review and comparison between clustering algorithms with duplicate entities detection purpose | |
CN116644184A (en) | Human Resource Information Management System Based on Data Clustering | |
CN111324790A (en) | Load type identification method based on support vector machine classification | |
Calò et al. | A hierarchical modeling approach for clustering probability density functions | |
CN112035715B (en) | User label design method and device | |
CN112529712A (en) | Modeling method and system for user operation analysis RFM | |
CN110633337B (en) | Feature area determination method and device and electronic equipment | |
Yang et al. | Application Research of K-means Algorithm based on Big Data Background | |
CN114372835B (en) | Comprehensive energy service potential customer identification method, system and computer equipment | |
CN111667152B (en) | Automatic auditing method for text data calibration task based on crowdsourcing | |
CN114638284A (en) | Power utilization behavior characterization method considering external influence factors | |
Li | Cluster Analysis of Students' Consumption Behavior Based on K-means++ Algorithm | |
CN110278189B (en) | Intrusion detection method based on network flow characteristic weight map | |
Xu et al. | Network user interest pattern mining based on entropy clustering algorithm | |
CN114417972A (en) | User electricity consumption behavior analysis method based on principal component analysis and density peak clustering | |
CN113989676A (en) | Terminal area meteorological scene identification method for improving deep convolutional self-coding embedded clustering | |
CN109919811B (en) | Insurance agent culture scheme generation method based on big data and related equipment | |
Kohan et al. | Comparison of modified k-means and hierarchical algorithms in customers load curves clustering for designing suitable tariffs in electricity market |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||