CN112035715B - User label design method and device - Google Patents


Publication number
CN112035715B
Authority
CN
China
Prior art keywords
user
clustering
data
correlation
average temperature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010663731.XA
Other languages
Chinese (zh)
Other versions
CN112035715A (en)
Inventor
洪莹
王凯
吴思思
黄玉珊
韦国惠
黄绪荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Power Grid Co Ltd
Original Assignee
Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Power Grid Co Ltd filed Critical Guangxi Power Grid Co Ltd
Priority to CN202010663731.XA priority Critical patent/CN112035715B/en
Publication of CN112035715A publication Critical patent/CN112035715A/en
Application granted granted Critical
Publication of CN112035715B publication Critical patent/CN112035715B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user label design method and device. The method comprises: acquiring all user electricity consumption data, and screening all the user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data; performing correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data to obtain correlation characteristic data between the user electricity consumption data and the average temperature; and forming a corresponding user label according to the correlation characteristic data between the user electricity consumption data and the average temperature and the basic attributes of the user. In the embodiment of the invention, the user label is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy according to the user label, meets the electricity demand of the user, and improves the user's electricity consumption experience.

Description

User label design method and device
Technical Field
The invention relates to the technical field of power grid user power supply, in particular to a user tag design method and device.
Background
As information systems have become more widely used in practice, a large amount of basic information about all aspects of customers has been accumulated, providing data support for all kinds of work. However, existing data analysis and support approaches cannot depict customer characteristics in a multi-dimensional, comprehensive way, and cannot capture the relationship between electricity consumption and temperature for users with different user profile basic attributes. As a result, when the weather changes suddenly, the power supply strategy cannot be adjusted through corresponding tags, and the electricity demand of users cannot be effectively guaranteed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a user tag design method and device, in which a user tag is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy according to the user tag, meets the electricity demand of the user, and improves the user's electricity consumption experience.
In order to solve the above technical problem, an embodiment of the present invention provides a user tag design method, where the method includes:
acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of the basic attribute of each user profile in the user labels after clustering and identification;
obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
and adjusting the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the electricity consumption data of the users corresponding to the grouped and identified user tags and the average temperature data, and the predicted future average temperature information.
Optionally, the user electricity consumption data comprises quarterly and/or monthly and/or daily electricity consumption data of the user in the whole year;
the basic attributes of the user comprise a user profile basic attribute and a user state basic attribute;
the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties;
the user state basic attributes comprise new customers, long-term electricity-free customers and batch electricity-using customers.
Optionally, the performing of correlation calculation between the power consumption data and the average temperature on the screened user power consumption data includes:
and performing correlation calculation based on the screened user electricity utilization data of the user in the preset time period and the average temperature of the preset time period.
Optionally, the calculation formula of the correlation calculation is as follows:
$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} $$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
Optionally, the clustering and identifying the user tags based on the clustering algorithm includes:
carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result;
clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result;
and clustering and identifying the user tags according to each clustering center in the clustering result.
Optionally, the performing preliminary clustering on the user tag by using a Canopy clustering algorithm to obtain a preliminary clustering result includes:
initializing the user tag as a list data;
randomly selecting an object D from the list data as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data;
calculating the distances between all objects in the list data and C, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C when the distances are greater than a first preset distance, and deleting the objects in the list data when the distances are less than a second preset distance;
adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty;
and taking the clustering list as a preliminary clustering result.
Optionally, the clustering the preliminary clustering result by using a Kmeans clustering algorithm to obtain a clustering result includes:
taking the preliminary clustering result as the initialized centroids of the Kmeans clustering algorithm, and assigning each user label to the corresponding centroid;
calculating the distance of each user label to each centroid, and distributing the user labels to the nearest clustering centroid;
carrying out mean value calculation on each clustering centroid, and updating the clustering centroids according to the mean value calculation;
and calculating variance error values of all the user tags to the corresponding updated clustering centroids, judging whether the variance error values are larger than a preset threshold value, if so, repeating the step of calculating the distance from each user tag to each centroid, otherwise, finishing clustering and obtaining a clustering result.
Optionally, the correlation strength includes: very strong correlation, strong correlation, moderate correlation, weak correlation, or no correlation.
In addition, an embodiment of the present invention further provides a user tag design apparatus, where the apparatus includes:
a data screening module: used for acquiring all user electricity consumption data, and performing data screening on all the user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
a correlation calculation module: used for performing correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data to obtain correlation characteristic data between the user electricity consumption data and the average temperature;
a tag generation module: used for forming a corresponding user tag according to the correlation characteristic data between the electricity consumption data of the user and the average temperature and the basic attributes of the user;
a grouping and identification module: used for clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of each user profile basic attribute in the user labels after clustering and identification;
a correlation strength acquisition module: used for obtaining the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data;
a power supply adjustment module: used for adjusting the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the electricity consumption data of the users corresponding to the grouped and identified user tags and the average temperature data, and the predicted future average temperature information.
In the embodiment of the invention, the user label is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy according to the user label, meets the electricity demand of the user, and improves the user's electricity consumption experience.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a user tag design method in an embodiment of the invention;
FIG. 2 is a schematic structural diagram of a user tag design apparatus in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, fig. 1 is a flow chart illustrating a user tag design method according to an embodiment of the present invention.
As shown in fig. 1, a user tag design method includes:
S11: acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
in the implementation process of the invention, the electricity consumption data of the user comprises the electricity consumption data of the user in each quarter and/or each month and/or each day in the whole year; the basic attributes of the user comprise a user profile basic attribute and a user state basic attribute; the user file basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties; the user state basic attributes comprise new customers, long-term electricity-free customers and batch electricity-using customers.
Specifically, the user electricity consumption data comprises the quarterly and/or monthly and/or daily electricity consumption data of the user throughout the year. The basic attributes of the user comprise user profile basic attributes and user state basic attributes. User profile basic attributes: social security attributes (low-income guarantee households, five-guarantee households, etc.), electricity utilization categories, user categories, importance levels (important customers, key-attention customers, etc.), regional characteristics (urban areas, towns, etc.), electricity price types (single-part tariff, two-part tariff), load properties, etc. User state basic attributes: new customers, long-term electricity-free customers, and batch electricity-using customers.
The user electricity consumption data is screened according to the basic attributes of the user: the electricity consumption data of long-term electricity-free customers (a user state basic attribute) is removed, simple user classification and similar operations are performed according to the basic data of the user, and the screened user electricity consumption data is then obtained.
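The screening step can be illustrated with a minimal sketch. The record layout (`UserRecord`), the state code `"long_term_no_usage"`, and the function name `screen_users` are hypothetical names chosen for illustration; the source only states that data of long-term electricity-free customers is removed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserRecord:
    user_id: str
    state: str                # hypothetical state code, e.g. "new", "long_term_no_usage", "batch"
    monthly_kwh: List[float]  # electricity consumption per month


def screen_users(records: List[UserRecord]) -> List[UserRecord]:
    """Drop long-term electricity-free customers; keep all other users."""
    return [r for r in records if r.state != "long_term_no_usage"]


records = [
    UserRecord("u1", "new", [120.0, 135.5, 160.2]),
    UserRecord("u2", "long_term_no_usage", [0.0, 0.0, 0.0]),
    UserRecord("u3", "batch", [980.0, 1010.0, 1200.0]),
]
screened = screen_users(records)
```

Any further "simple user classification" mentioned in the text is left out here because the source does not specify it.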
S12: performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
in a specific implementation process of the present invention, the calculating a correlation between the power consumption data and the average temperature of the screened user power consumption data includes: and performing correlation calculation based on the screened user electricity utilization data of the user in the preset time period and the average temperature of the preset time period.
Further, the calculation formula of the correlation calculation is as follows:
$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} $$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
Specifically, correlation calculation is carried out on the user electricity utilization data of the user in the screened user electricity utilization data in a preset time period and the average temperature of the preset time period, and correlation characteristic data between the user electricity utilization data and the average temperature are obtained; and the calculation formula of the correlation calculation is as follows:
$$ r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}} $$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
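As an illustration, the correlation defined above (covariance divided by the square root of the product of the two variances) could be computed as in the following sketch. The function name `pearson_r` and the choice to return 0.0 for a constant series are assumptions made here, not part of the source.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation r(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        # Assumption: a constant series is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# X: electricity consumption in the period, Y: average temperature in the period
usage = [310.0, 295.0, 340.0, 420.0, 510.0]
temperature = [18.0, 17.0, 21.0, 27.0, 32.0]
r = pearson_r(usage, temperature)
```

Here X would be a user's consumption series over the preset time period and Y the average temperature series over the same period.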
S13: forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
in the specific implementation process of the invention, the marking is carried out according to the correlation characteristic data between the electricity consumption data of the user and the average temperature and the basic attribute of the user, and then a corresponding user label is formed.
S14: clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of the basic attribute of each user profile in the user labels after clustering and identification;
in the specific implementation process of the present invention, the clustering and identifying the user tags based on the clustering algorithm includes: carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result; clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result; and clustering and identifying the user tags according to each clustering center in the clustering result.
Further, the performing preliminary clustering on the user tag by using a Canopy clustering algorithm to obtain a preliminary clustering result includes: initializing the user tag as a list data; randomly selecting an object D from the list data as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distances between all objects in the list data and C, when the distances are larger than a first preset distance, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C, and when the distances are smaller than a second preset distance, deleting the objects from the list data; adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty; and taking the clustering list as a preliminary clustering result.
Further, the clustering the preliminary clustering result by using a Kmeans clustering algorithm to obtain a clustering result includes: taking the preliminary clustering result as the initialized centroids of the Kmeans clustering algorithm, and assigning each user label to the corresponding centroid; calculating the distance from each user label to each centroid, and assigning the user label to the nearest cluster centroid; calculating the mean of each cluster, and updating the cluster centroids accordingly; and calculating the variance error value of all the user tags with respect to their updated cluster centroids, judging whether the variance error value is larger than a preset threshold, and if so, repeating the step of calculating the distance from each user tag to each centroid; otherwise, finishing clustering and obtaining a clustering result.
Specifically, clustering is carried out on the user labels through a Canopy-Kmeans clustering algorithm, clustering and identification are carried out according to clustering results, and after the clustering and identification are finished, the proportion of the basic attribute of each user profile in the user labels after the clustering and identification is determined.
When the user tags need to be grouped and identified, the user tags need to be clustered firstly, specifically, a Canopy clustering algorithm is adopted for primary clustering, then a Kmeans clustering algorithm is utilized for secondary clustering to obtain a clustering result, and then the user tags are grouped and identified according to each clustering center in the clustering result.
When the Canopy clustering algorithm is used for preliminary clustering, the clustering process comprises the following steps: initializing a user tag into a data list, and presetting two threshold values comprising a first preset distance and a second preset distance; randomly selecting an object D in the data list as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distances between all objects in the list data and C, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C when the distances are greater than a first preset distance, and deleting the objects in the list data when the distances are less than a second preset distance; adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty; the cluster list is taken as a preliminary clustering result.
Let A denote any object in the list data and C denote the cluster center object. The distance between A and C is calculated using the cosine distance formula, specifically:
$$ \cos(A, C) = \frac{\sum_{i=1}^{n} a_i c_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} c_i^2}} $$
wherein A = (a_1, a_2, …, a_n), C = (c_1, c_2, …, c_n), i = 1, 2, …, n.
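The Canopy pre-clustering step can be sketched as follows, under stated assumptions. The sketch follows the commonly used Canopy convention (an object joins the canopy when its distance to the center is below the looser threshold `t1`, and is removed from the list when the distance is below the tighter threshold `t2`), which is not necessarily the literal wording of the steps above. It also assumes the distance is taken as 1 minus the cosine of the angle between the vectors. The names `cosine_distance`, `canopy_centers`, `t1`, and `t2` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, c: Vector) -> float:
    """Assumed distance: 1 - cos(A, C); zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_c)


def canopy_centers(points: List[Vector], t1: float, t2: float,
                   rng: Optional[random.Random] = None) -> List[Tuple[Vector, List[Vector]]]:
    """Return a list of (center, members) canopies; requires t1 >= t2."""
    if t1 < t2:
        raise ValueError("t1 (loose threshold) must be >= t2 (tight threshold)")
    rng = rng or random.Random(0)
    remaining = list(points)
    canopies = []
    while remaining:
        # Randomly pick an object as the canopy center and remove it from the list.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = cosine_distance(p, center)
            if d < t1:
                members.append(p)          # close enough to belong to this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly bound, stays in the list
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


points = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
canopies = canopy_centers(points, t1=0.3, t2=0.1)
```

The canopy centers returned here are what a later Kmeans step would use as initial centroids.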
After the Canopy clustering algorithm completes, a preliminary clustering result is obtained, which is then clustered using the Kmeans clustering algorithm. Kmeans takes k objects in the space as centers, assigns each object in the object space to the class of its nearest center, and iteratively recomputes and updates the value of each cluster centroid until the centroids are stable and no longer change.
Clustering with the Kmeans clustering algorithm proceeds as follows: the preliminary clustering result is taken as the initialized centroids of the Kmeans clustering algorithm, and each user label is assigned to the corresponding centroid; that is, the Canopy centers generated by the Canopy clustering algorithm serve as the initialized centroids of the Kmeans algorithm, and each label has already been assigned to its corresponding centroid. The distance from each user label to each centroid is calculated, and the user label is assigned to the nearest cluster centroid; the distance calculation still uses the cosine distance used in the Canopy clustering algorithm. The mean of each cluster is calculated, and the cluster centroids are updated accordingly. The variance error value of all user tags with respect to their updated cluster centroids is calculated, and it is judged whether the variance error value is larger than a preset threshold; if so, the step of calculating the distance from each user tag to each centroid is repeated; otherwise, clustering is finished and the clustering result is obtained.
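A minimal sketch of the Kmeans stage seeded with Canopy centers is given below. It assumes the "variance error value" is the sum of squared cosine distances from each tag to its assigned, updated centroid, and it adds a `max_iter` guard that the source does not mention. The names `kmeans_from_seeds`, `threshold`, and `max_iter` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, c: Vector) -> float:
    """Assumed distance: 1 - cos(A, C)."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_c)


def kmeans_from_seeds(points: List[Vector], seeds: List[Vector],
                      threshold: float = 1e-3,
                      max_iter: int = 100) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Kmeans whose initial centroids are the given seeds (e.g. Canopy centers)."""
    centroids = [tuple(s) for s in seeds]
    assignment: List[int] = []
    for _ in range(max_iter):  # guard against non-termination (an addition of this sketch)
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: cosine_distance(p, centroids[k]))
            for p in points
        ]
        # Update each centroid to the mean of its members; empty clusters keep their centroid.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                dim = len(members[0])
                centroids[k] = tuple(sum(m[j] for m in members) / len(members)
                                     for j in range(dim))
        # Error of all points with respect to their updated centroids.
        error = sum(cosine_distance(p, centroids[a]) ** 2
                    for p, a in zip(points, assignment))
        if error <= threshold:
            break  # error no longer exceeds the preset threshold: clustering is finished
    return centroids, assignment


points = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
centroids, assignment = kmeans_from_seeds(points, seeds=[(1.0, 0.0), (0.0, 1.0)])
```

Seeding from the Canopy centers fixes the number of clusters k without a separate choice, which is the usual motivation for combining the two algorithms.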
After the clustering result is obtained, clustering and identifying the user labels according to each clustering center in the clustering result; and then calculating the proportion of the basic attribute of each user profile in the user tags after the grouping and the identification.
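The proportion calculation can be sketched as below, assuming each user carries a cluster id and a single value for one user profile basic attribute (here a hypothetical region attribute). The function name `attribute_proportions` and the sample values are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def attribute_proportions(cluster_ids: Sequence[int],
                          attributes: Sequence[Hashable]) -> Dict[int, Dict[Hashable, float]]:
    """For each cluster, return the share of users holding each attribute value."""
    grouped = defaultdict(Counter)
    for cid, attr in zip(cluster_ids, attributes):
        grouped[cid][attr] += 1
    return {
        cid: {attr: n / sum(counts.values()) for attr, n in counts.items()}
        for cid, counts in grouped.items()
    }


cluster_ids = [0, 0, 0, 1, 1]
regions = ["urban", "urban", "town", "town", "town"]
proportions = attribute_proportions(cluster_ids, regions)
```

The same function could be applied once per profile attribute (electricity price type, load property, and so on) to describe each cluster.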
S15: obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
In the implementation process of the present invention, the correlation strength includes: very strong correlation, strong correlation, moderate correlation, weak correlation, or no correlation.
The correlation coefficient between the power consumption data of the user and the temperature needs to be divided according to the numerical ranges, and the correlation strength in each numerical range is labeled, which is specifically shown in the following table:
Magnitude of correlation coefficient    General interpretation
0.8~1.0                                 Very strong correlation
0.6~0.8                                 Strong correlation
0.4~0.6                                 Moderate correlation
0.2~0.4                                 Weak correlation
0~0.2                                   Very weak or no correlation
And obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data according to the table.
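A lookup following the table might look like the sketch below. Two points are assumptions: the table is applied to the absolute value of the coefficient, and each range includes its lower bound (the table's ranges share endpoints and the source does not say which side they belong to). The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to the strength label of the table."""
    magnitude = abs(r)
    if magnitude > 1.0 + 1e-9:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    if magnitude >= 0.8:
        return "very strong correlation"
    if magnitude >= 0.6:
        return "strong correlation"
    if magnitude >= 0.4:
        return "moderate correlation"
    if magnitude >= 0.2:
        return "weak correlation"
    return "very weak or no correlation"


label = correlation_strength(0.85)
```

Using the magnitude means a strongly negative relationship (consumption rising as temperature falls) is also labeled as strong.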
S16: adjusting the power supply service for users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data, and the predicted future average temperature information.
In the specific implementation process of the invention, the power supply service for users having the relevant user profile basic attributes in the corresponding area is adjusted according to the proportion of each user profile basic attribute in the grouped and identified user tags, the correlation strength between the user electricity consumption data corresponding to the grouped and identified user tags and the average temperature data, and the predicted future average temperature information.
In the embodiment of the invention, the user label is constructed according to the actual situation of the user, which facilitates subsequent adjustment of the user power supply strategy according to the user label, meets the electricity demand of the user, and improves the user's electricity consumption experience.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a user tag designing apparatus according to an embodiment of the present invention.
As shown in fig. 2, a user tag designing apparatus, the apparatus comprising:
the data screening module 21: used for acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
in the specific implementation process of the invention, the user electricity utilization data comprises quarterly and/or monthly and/or daily electricity utilization data of the user in the whole year; the basic attributes of the user comprise user profile basic attributes and user state basic attributes; the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties; the user state basic attribute is new customers, long-term electricity-free customers and batch electricity-using customers.
Specifically, the user electricity consumption data comprises quarterly and/or monthly and/or daily electricity consumption data of the user in the whole year; basic attributes of the user; the basic attributes comprise user profile basic attributes and user state basic attributes; user profile base attributes: social security attributes (low security, five security, etc.), electricity utilization categories, user categories, importance levels (important customers, important attention customers, etc.), regional characteristics (urban areas, towns, etc.), electricity price types (single-system electricity price, two-system electricity price), load properties, etc.; user state base attributes: new customers, long-term electricity-free customers and batch electricity-using customers.
The user electricity consumption data is screened according to the basic attributes of the user: the electricity consumption data of customers marked as long-term non-electricity-using in the user state basic attributes is removed, operations such as a simple user classification are performed according to the basic data of the user, and the screened user electricity consumption data is then obtained.
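The screening step can be illustrated with a minimal Python sketch. The record layout, the field names (`status`, `region`, `monthly_kwh`) and the status code `"long_term_no_usage"` are hypothetical and are not taken from the specification; the sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UserRecord:
    """Hypothetical record: one user with a few basic attributes."""
    user_id: str
    status: str               # assumed user state attribute code
    region: str               # assumed user profile attribute
    monthly_kwh: List[float]  # electricity consumption per month


def screen_users(records: Sequence[UserRecord],
                 excluded_statuses=("long_term_no_usage",)) -> List[UserRecord]:
    """Keep only users whose state attribute is not in the excluded set."""
    return [r for r in records if r.status not in excluded_statuses]


# Illustrative, made-up data.
records = [
    UserRecord("u1", "new", "urban", [120.0, 135.5, 160.2]),
    UserRecord("u2", "long_term_no_usage", "town", [0.0, 0.0, 0.0]),
    UserRecord("u3", "batch", "urban", [300.1, 280.4, 310.9]),
]
screened = screen_users(records)
```

Here only the long-term non-using customer is dropped; any further simple classification by basic data would be applied to `screened`.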
The correlation calculation module 22: used for performing correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data, to obtain correlation characteristic data between the user electricity consumption data and the average temperature;
In the specific implementation process of the invention, performing the correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data includes: performing the correlation calculation based on the electricity consumption data of each user in the screened data within a preset time period and the average temperature of that preset time period.
Further, the calculation formula of the correlation calculation is as follows:
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
Specifically, correlation calculation is carried out on the user electricity utilization data of the user in the screened user electricity utilization data in a preset time period and the average temperature of the preset time period, and correlation characteristic data between the user electricity utilization data and the average temperature are obtained; and the calculation formula of the correlation calculation is as follows:
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
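As an illustration of the correlation calculation, the following Python sketch computes r(X, Y) as the covariance divided by the square root of the product of the variances. The function name `pearson_r` and the sample series are assumptions made for this example only; the values are made up and do not come from the specification.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation r(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: daily consumption (kWh) and daily average temperature (deg C)
# over the same preset time period.
consumption = [10.2, 11.5, 13.1, 15.8, 18.4]
temperature = [24.0, 26.5, 28.0, 31.0, 33.5]
r = pearson_r(consumption, temperature)
```

The population form (dividing by n) is used for both the covariance and the variances; the normalising factor cancels in the ratio, so the sample form gives the same coefficient.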
The label generation module 23: used for forming a corresponding user tag according to the correlation characteristic data between the electricity consumption data of the user and the average temperature and the basic attributes of the user;
In the specific implementation process of the invention, each user is labeled according to the correlation characteristic data between the user's electricity consumption data and the average temperature and according to the basic attributes of the user, thereby forming the corresponding user tag.
The grouping and identification module 24: used for grouping and identifying the user tags based on a clustering algorithm, and determining the proportion of each user profile basic attribute in the user tags after grouping and identification;
in the specific implementation process of the present invention, the clustering and identifying the user tags based on the clustering algorithm includes: carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result; clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result; and clustering and identifying the user tags according to each clustering center in the clustering result.
Further, performing preliminary clustering on the user tags by using the Canopy clustering algorithm to obtain a preliminary clustering result includes: initializing the user tags as list data; randomly selecting an object D from the list data as a clustering center of the Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distance between each object in the list data and C, adding the object to the clustering center of the Canopy clustering algorithm and marking it as C when the distance is greater than a first preset distance, and deleting the object from the list data when the distance is less than a second preset distance; adding the clustering center of the Canopy clustering algorithm to a clustering list, and repeating until the list data is empty; and taking the clustering list as the preliminary clustering result.
Further, clustering the preliminary clustering result by using the Kmeans clustering algorithm to obtain a clustering result includes: taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, and assigning each user tag to the corresponding centroid; calculating the distance from each user tag to each centroid, and assigning the user tag to the nearest clustering centroid; calculating the mean value for each clustering centroid, and updating the clustering centroids according to the mean values; and calculating the variance error value of all the user tags with respect to the corresponding updated clustering centroids, and judging whether the variance error value is larger than a preset threshold value; if so, repeating from the step of calculating the distance from each user tag to each centroid, and if not, finishing clustering to obtain the clustering result.
Specifically, clustering is carried out on the user labels through a Canopy-Kmeans clustering algorithm, clustering and identification are carried out according to clustering results, and after the clustering and identification are completed, the proportion of each user profile basic attribute in the user labels after clustering and identification is determined.
When the user tags need to be grouped and identified, the user tags need to be clustered firstly, specifically, a Canopy clustering algorithm is adopted for primary clustering, then a Kmeans clustering algorithm is utilized for secondary clustering to obtain a clustering result, and then the user tags are grouped and identified according to each clustering center in the clustering result.
When the Canopy clustering algorithm is used for preliminary clustering, the clustering process comprises the following steps: initializing the user tags as list data, and presetting two threshold values, namely a first preset distance and a second preset distance; randomly selecting an object D from the list data as a clustering center of the Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data; calculating the distance between each object in the list data and C, adding the object to the clustering center of the Canopy clustering algorithm and marking it as C when the distance is greater than the first preset distance, and deleting the object from the list data when the distance is less than the second preset distance; adding the clustering center of the Canopy clustering algorithm to a clustering list, and repeating until the list data is empty; the clustering list is used as the preliminary clustering result.
Denoting an object in the list data as A and the clustering center object as C, the distance between A and C is calculated using the cosine distance formula, specifically:
$$\cos(A, C) = \frac{\sum_{i=1}^{n} a_i c_i}{\sqrt{\sum_{i=1}^{n} a_i^{2}}\,\sqrt{\sum_{i=1}^{n} c_i^{2}}}$$
wherein A = (a_1, a_2, …, a_n), C = (c_1, c_2, …, c_n), and i = 1, 2, …, n.
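The preliminary clustering can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: it takes the cosine distance to be 1 minus the cosine of the angle between A and C, and it follows the conventional Canopy formulation, in which an object joins the canopy when its distance to the center is below a loose threshold `t1` and is removed from the list when its distance is below a tight threshold `t2`. The threshold comparisons described in the text may be defined differently. The names `cosine_distance`, `canopy`, `t1` and `t2` are hypothetical, and the sample vectors are made up.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine_distance(a: Sequence[float], c: Sequence[float]) -> float:
    """Assumed distance: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_c)


def canopy(points: Sequence[Vector], t1: float, t2: float,
           rng: random.Random = None) -> List[Tuple[Vector, List[Vector]]]:
    """Conventional Canopy clustering with loose threshold t1 > tight threshold t2."""
    if t1 <= t2:
        raise ValueError("t1 (loose) must be greater than t2 (tight)")
    rng = rng or random.Random(0)
    remaining = list(points)
    canopies = []
    while remaining:
        # Randomly pick an object as the canopy center and remove it from the list.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = cosine_distance(p, center)
            if d < t1:
                members.append(p)          # close enough to join this canopy
            if d >= t2:
                still_remaining.append(p)  # not close enough to be removed
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Made-up tag vectors: two pairs pointing in clearly different directions.
tags = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.1, 0.99)]
canopies = canopy(tags, t1=0.3, t2=0.1)
```

The canopy centers returned here are what a subsequent K-means step would use as its initial centroids, which also fixes the number of clusters without choosing k by hand.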
After the Canopy clustering algorithm is completed, a preliminary clustering result is obtained and is then clustered using the Kmeans clustering algorithm: k objects in the space are taken as centers, the objects in the object space closest to each center are classified into one class, and the value of each clustering centroid is calculated and updated step by step over multiple iterations until the clustering centroids become stable and no longer change.
Clustering by using the Kmeans clustering algorithm proceeds as follows: the preliminary clustering result is taken as the initial centroids of the Kmeans clustering algorithm, and each user tag is assigned to the corresponding centroid; that is, the Canopy centers generated by the Canopy clustering algorithm are used as the initial centroids of the Kmeans algorithm, and each tag has already been assigned to a corresponding centroid. The distance from each user tag to each centroid is calculated, and the user tag is assigned to the nearest clustering centroid; the distance calculation still uses the cosine distance used in the Canopy clustering algorithm. The mean value is calculated for each clustering centroid, and the clustering centroids are updated according to the mean values. The variance error value of all the user tags with respect to the corresponding updated clustering centroids is calculated, and it is judged whether the variance error value is larger than a preset threshold value; if so, the procedure repeats from the step of calculating the distance from each user tag to each centroid, and if not, clustering is finished and the clustering result is obtained.
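The K-means stage described above can be sketched in Python as below. Several points are assumptions made for illustration: the cosine distance is taken as 1 minus the cosine similarity, the "variance error value" is taken as the mean squared distance of the tags to their updated centroids, and a `max_iter` guard is added so the loop always terminates. The function name `kmeans_from_seeds` and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def cosine_distance(a: Sequence[float], c: Sequence[float]) -> float:
    """Assumed distance: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, c))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_c = math.sqrt(sum(y * y for y in c))
    if norm_a == 0 or norm_c == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_c)


def kmeans_from_seeds(tags: Sequence[Vector], seeds: Sequence[Vector],
                      error_threshold: float,
                      max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """K-means seeded with given centers (for example, Canopy centers)."""
    centroids = [tuple(s) for s in seeds]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each tag goes to the nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: cosine_distance(t, centroids[k]))
            for t in tags
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [t for t, lab in zip(tags, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        # Assumed error measure: mean squared distance to the updated centroids.
        error = sum(cosine_distance(t, centroids[lab]) ** 2
                    for t, lab in zip(tags, labels)) / len(tags)
        if error <= error_threshold:
            break  # error no longer exceeds the preset threshold
    return labels, centroids


# Made-up tag vectors and seed centers.
tags = [(1.0, 0.0), (0.99, 0.1), (0.0, 1.0), (0.1, 0.99)]
seeds = [(1.0, 0.0), (0.0, 1.0)]
labels, centroids = kmeans_from_seeds(tags, seeds, error_threshold=1e-3)
```

Stopping on an absolute error threshold mirrors the wording of the text; a common alternative is to stop when the centroids or the error stop changing between iterations.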
After the clustering result is obtained, clustering and identifying the user labels according to each clustering center in the clustering result; and then calculating the proportion of the basic attribute of each user profile in the user tags after the grouping and the identification.
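The proportion calculation can be illustrated with a short Python sketch. It assumes that each user carries one cluster label and one value of a given user profile basic attribute (for example, a regional characteristic); the function name `attribute_proportions` and the sample values are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def attribute_proportions(cluster_labels: Sequence[Hashable],
                          attribute_values: Sequence[Hashable]
                          ) -> Dict[Hashable, Dict[Hashable, float]]:
    """For each cluster, return the share of each attribute value among its users."""
    if len(cluster_labels) != len(attribute_values):
        raise ValueError("one attribute value is required per cluster label")
    counts = defaultdict(Counter)
    for label, value in zip(cluster_labels, attribute_values):
        counts[label][value] += 1
    return {
        label: {value: n / sum(counter.values()) for value, n in counter.items()}
        for label, counter in counts.items()
    }


# Made-up example: cluster label and regional characteristic per user.
cluster_labels = [0, 0, 0, 1]
regions = ["urban", "urban", "town", "town"]
proportions = attribute_proportions(cluster_labels, regions)
```

The same function can be called once per profile attribute (electricity price type, load property, and so on) to obtain the full set of proportions for each group.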
The correlation strength acquisition module 25: used for obtaining the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
In the specific implementation process of the invention, the correlation strength includes: very strong correlation, strong correlation, moderate correlation, weak correlation, and very weak or no correlation.
The correlation coefficient between the power consumption data of the user and the temperature needs to be divided according to the numerical ranges, and the correlation strength in each numerical range is labeled, which is specifically shown in the following table:
Magnitude of correlation coefficient    Interpretation
0.8~1.0                                 Very strong correlation
0.6~0.8                                 Strong correlation
0.4~0.6                                 Moderate correlation
0.2~0.4                                 Weak correlation
0~0.2                                   Very weak or no correlation
And obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data according to the table.
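The table lookup can be expressed as a small Python function. Two points are assumptions: the absolute value of the coefficient is used as its magnitude, and a coefficient lying exactly on a boundary (for example 0.8) is placed in the stronger category, since the table ranges share their endpoints. The function name `correlation_strength` is hypothetical.

```python
def correlation_strength(r: float) -> str:
    """Map a correlation coefficient to a strength label using the table ranges."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # assumption: strength depends on magnitude only
    if magnitude >= 0.8:
        return "very strong correlation"
    if magnitude >= 0.6:
        return "strong correlation"
    if magnitude >= 0.4:
        return "moderate correlation"
    if magnitude >= 0.2:
        return "weak correlation"
    return "very weak or no correlation"


# Label a few made-up coefficients.
labels = [correlation_strength(r) for r in (0.85, -0.65, 0.5, 0.3, 0.1)]
```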
The power supply adjustment module 26: used for adjusting the power supply service for the users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the user tags after grouping and identification, the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data, and the average temperature information of the predicted future weather.
In the specific implementation process of the invention, the power supply service for the users having the relevant user profile basic attributes in the corresponding area is adjusted according to the proportion of each user profile basic attribute in the user tags after grouping and identification, the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data, and the average temperature information of the predicted future weather.
In the embodiment of the invention, the user tags are constructed according to the actual situation of the users, which makes it easier to subsequently adjust the power supply strategy for the users based on the user tags, so that the electricity demand of the users is met and their electricity usage experience is improved.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include: Read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
In addition, the user tag design method and apparatus provided by the embodiment of the present invention are described in detail above. Specific examples are used herein to explain the principle and the implementation of the present invention, and the description of the above embodiment is only intended to help in understanding the method and the core idea of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims (9)

1. A method for designing a user tag, the method comprising:
acquiring all user electricity consumption data, and performing data screening on all user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
performing correlation calculation between the power consumption data and the average temperature on the screened user power consumption data to obtain correlation characteristic data between the user power consumption data and the average temperature;
forming a corresponding user label according to the correlation characteristic data between the electricity utilization data of the user and the average temperature and the basic attribute of the user;
clustering and identifying the user labels based on a clustering algorithm, and determining the proportion of the basic attribute of each user profile in the user labels after clustering and identification;
obtaining the correlation strength of the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
and adjusting the power supply service for the users having the relevant user profile basic attributes in the corresponding area based on the proportion of each user profile basic attribute in the user tags after grouping and identification, the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data, and the average temperature information of the predicted future weather.
2. The method of claim 1, wherein the user electricity usage data comprises user quarterly and/or monthly and/or daily electricity usage data throughout the year;
the basic attributes of the user comprise a user profile basic attribute and a user state basic attribute;
the user profile basic attributes comprise social security attributes, electricity utilization categories, user categories, importance degrees, regional characteristics, electricity price types and load properties;
the user state basic attributes comprise new customers, long-term electricity-unused customers and batch electricity-used customers.
3. The method for designing the user tag according to claim 1, wherein the calculating the correlation between the power consumption data and the average temperature of the screened power consumption data of the user includes:
and performing correlation calculation based on the screened user electricity utilization data of the user in the preset time period and the average temperature of the preset time period.
4. The method of claim 3, wherein the correlation calculation is calculated as follows:
$$r(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(Y)}}$$
wherein r(X, Y) represents the correlation between X and Y; Cov(X, Y) represents the covariance between X and Y; Var(X) and Var(Y) represent the variance of X and the variance of Y, respectively; X represents the electricity consumption data of the user in the preset time period, and Y represents the average temperature in the preset time period.
5. The method of claim 1, wherein the clustering and identifying the user tags based on a clustering algorithm comprises:
carrying out preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result;
clustering the primary clustering result by using a Kmeans clustering algorithm to obtain a clustering result;
and clustering and identifying the user tags according to each clustering center in the clustering result.
6. The method according to claim 5, wherein the performing preliminary clustering on the user tags by using a Canopy clustering algorithm to obtain a preliminary clustering result comprises:
initializing the user tags as list data;
randomly selecting an object D from the list data as a clustering center of a Canopy clustering algorithm, marking the object D as C, and deleting the object D from the list data;
calculating the distances between all objects in the list data and C, when the distances are larger than a first preset distance, adding the objects into a clustering center of a Canopy clustering algorithm and marking the objects as C, and when the distances are smaller than a second preset distance, deleting the objects from the list data;
adding the clustering center of the Canopy clustering algorithm into a clustering list, and repeating until the list data is empty;
and taking the clustering list as a preliminary clustering result.
7. The method according to claim 5, wherein the clustering the preliminary clustering result by using a Kmeans clustering algorithm to obtain a clustering result comprises:
taking the preliminary clustering result as the initial centroids of the Kmeans clustering algorithm, and assigning each user tag to the corresponding centroid;
calculating the distance of each user label to each centroid, and distributing the user labels to the nearest clustering centroid;
carrying out mean value calculation on each clustering centroid, and updating the clustering centroids according to the mean value calculation;
and calculating variance error values of all the user tags to the corresponding updated clustering centroids, judging whether the variance error values are larger than a preset threshold value, if so, repeating the step of calculating the distance from each user tag to each centroid, otherwise, finishing clustering and obtaining a clustering result.
8. The user tag design method of claim 1, wherein the correlation strength comprises: very strong correlation, strong correlation, moderate correlation, weak correlation, and very weak or no correlation.
9. A user tag design apparatus, the apparatus comprising:
a data screening module: used for acquiring all user electricity consumption data, and performing data screening on all the user electricity consumption data based on the basic attributes of the users to obtain screened user electricity consumption data;
a correlation calculation module: used for performing correlation calculation between the electricity consumption data and the average temperature on the screened user electricity consumption data, to obtain correlation characteristic data between the user electricity consumption data and the average temperature;
a tag generation module: used for forming a corresponding user tag according to the correlation characteristic data between the electricity consumption data of the user and the average temperature and the basic attributes of the user;
a grouping and identification module: used for grouping and identifying the user tags based on a clustering algorithm, and determining the proportion of each user profile basic attribute in the user tags after grouping and identification;
a correlation strength acquisition module: used for obtaining the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data;
a power supply adjustment module: used for adjusting the power supply service for the users having the relevant user profile basic attributes in the corresponding area, based on the proportion of each user profile basic attribute in the user tags after grouping and identification, the correlation strength between the user electricity consumption data corresponding to the user tags after grouping and identification and the average temperature data, and the average temperature information of the predicted future weather.
CN202010663731.XA 2020-07-10 2020-07-10 User label design method and device Active CN112035715B (en)

Publications (2)

Publication Number Publication Date
CN112035715A CN112035715A (en) 2020-12-04
CN112035715B true CN112035715B (en) 2023-04-14

Family

ID=73579036

Country Status (1)

Country Link
CN (1) CN112035715B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant