CN112801207A - Power user portrait construction method and device based on big data - Google Patents
- Publication number
- CN112801207A (application CN202110204028.7A)
- Authority
- CN
- China
- Prior art keywords
- power
- user
- data
- power consumer
- consumer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Economics (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the technical field of big data and discloses a big-data-based power user portrait construction method, comprising the following steps: acquiring power user data and mining it with a prefix-tree-based information mining algorithm to obtain feature vectors of power users; processing these feature vectors with a time-sequence-evolution feature vector processing algorithm to obtain time-evolution-based power user feature vectors; clustering the time-evolution-based feature vectors with an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait from the clustering result; and constructing a user social portrait from the constructed user portrait with a space-based user discovery method. The invention also provides a big-data-based power user portrait construction system. The invention thereby realizes the construction of power user portraits.
Description
Technical Field
The invention relates to the technical field of big data, in particular to a method and a device for constructing a power user portrait based on big data.
Background
At present, user portrait technology is developing rapidly and is being applied in fields such as social media, e-commerce and mobile services. In enterprise practice, user portraits must be presented step by step, from the overall picture down to domain-specific detail, according to actual business requirements. For power enterprises, constructing power user portraits supports decision making and has become a hot research topic.
Traditional user feature extraction algorithms have high computational complexity, and the K-means algorithm commonly used for feature clustering is computationally expensive and prone to converging to a local optimum. As a result, globally optimal user portrait features cannot be obtained quickly for constructing the user portrait.
In view of this, how to extract user portrait features more quickly and construct user portraits is an urgent problem for those skilled in the art.
Disclosure of Invention
The invention provides a big-data-based power user portrait construction method. A prefix-tree-based information mining algorithm mines power user data to obtain power user feature vectors; a time-sequence-evolution feature vector processing algorithm processes these vectors to obtain time-evolution-based power user features; an improved feature clustering algorithm clusters these features; the clustered features are assembled into a user portrait; and a space-based user discovery method constructs a user social portrait.
To achieve the above object, the present invention provides a big-data-based power user portrait construction method, comprising:
acquiring power user data, and mining the power user data with a prefix-tree-based information mining algorithm to obtain feature vectors of power users;
processing the power user feature vectors with a time-sequence-evolution feature vector processing algorithm to obtain time-evolution-based power user feature vectors;
clustering the time-evolution-based power user feature vectors with an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait from the clustering result;
constructing communication information relation matrices of different power users according to the constructed user portrait;
and constructing movement behavior matrices of different power users, and constructing the social portrait of the power users from the communication information relation matrices and the movement behavior matrices.
Optionally, mining the power user data with the prefix-tree-based information mining algorithm includes:
the power user data comprise the user's electricity consumption time series, the consumption time series of the user's electrical appliances, the consumption time series of the area where the user is located, consultation texts sent by the user to the power enterprise, and the like;
the prefix-tree-based information mining algorithm comprises the following process:
1) For power user data $B = \langle s_1, s_2, \dots, s_k \rangle$, where $s_i$ is the power event occurring at time $i$ and $k$ is the length of $B$, construct $B$ into a prefix tree whose root node is empty, placing the power data in child nodes in increasing time order;
2) Append a power event to the end of $B$: if the duration of $B$ does not change, the event is called a simultaneous event (SI); if the duration of $B$ increases by 1, it is called a sequential event (SE);
3) Compute the event set of the appended simultaneous events SI and sequential events SE:
$SI(B) = \frac{u(B) + u(SI) + u(SE)}{u(CES)}$
wherein:
$u(B)$ is the utility value of the power user data $B$, i.e., the total number of occurrences of $B$;
$u(SI)$ is the utility value of the simultaneous events;
$u(SE)$ is the utility value of the sequential events;
$u(CES)$ is the ratio of an event's own utility value to the total utility of the event sequence;
by this formula, a power event that occurs more often has a higher utility value;
4) Rank the power events in the event set by utility value, so that events with higher utility values come first;
5) For simultaneous events, place the higher-utility power events in the node for the corresponding time step of the prefix tree; for sequential events, place them in the next-level node below that time step. The events stored in each node of the prefix tree are the feature vectors of the power user.
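The prefix-tree mining steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each time step of $B$ holds a set of power events, takes an event's utility to be its occurrence count (the text equates $u(B)$ with an occurrence count), and omits the normalisation by $u(CES)$. The names `extend`, `event_utility`, `Node` and `build_prefix_tree`, and the event labels, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Power user data B = <s_1, ..., s_k>: s_i is the set of power events at time step i.
EventSeq = list[frozenset[str]]


def extend(seq: EventSeq, event: str, same_time: bool) -> tuple[EventSeq, str]:
    """Append an event: SI keeps the length of B, SE adds one time step."""
    if same_time and seq:
        return seq[:-1] + [seq[-1] | {event}], "SI"
    return seq + [frozenset({event})], "SE"


def event_utility(sequences: list[EventSeq]) -> Counter:
    """Hypothetical utility: how many times each event occurs across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        for itemset in seq:
            counts.update(itemset)
    return counts


@dataclass
class Node:
    events: tuple[str, ...] = ()                    # events held at this time step
    children: dict = field(default_factory=dict)    # next time step's events -> Node


def build_prefix_tree(sequences: list[EventSeq]) -> Node:
    util = event_utility(sequences)
    root = Node()  # the root node is empty
    for seq in sequences:
        node = root
        for itemset in seq:  # one tree level per time step, in increasing time order
            # order simultaneous events by descending utility (ties broken by name)
            key = tuple(sorted(itemset, key=lambda e: (-util[e], e)))
            node = node.children.setdefault(key, Node(events=key))
    return root


data = [
    [frozenset({"night_peak"}), frozenset({"night_peak", "ac_on"})],
    [frozenset({"night_peak"}), frozenset({"ev_charge"})],
]
tree = build_prefix_tree(data)
```

Both sequences share the first-level node `("night_peak",)`, and the higher-utility event is listed first inside each node; the events held in each node play the role of the power user feature vectors.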
Optionally, processing the power user feature vectors with the time-sequence-evolution feature vector processing algorithm includes:
in the prefix tree, feature vectors with longer time sequences lie closer to the root node, so the power user feature vectors are processed starting from the bottom layer of the prefix tree;
the time-sequence-evolution feature vector processing algorithm comprises the following steps:
obtaining the time node of power user feature vector $m$, and computing the absolute difference $|t_m|$ between that time node and the current time node;
computing the weight of each power user feature vector with the time-evolution feature weight formula, and assigning that weight to the corresponding feature vector to obtain the time-evolution-based power user feature vector, where the time-evolution feature weight formula is as follows:
wherein:
$w_m$ is the weight of power user feature vector $m$;
$\lambda$ is the attenuation factor, set to 0.4.
In an embodiment of the invention, this time-evolution weighting gives the most recent power user feature vectors higher weights.
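The time-evolution weighting can be sketched as below. The weight formula itself is not reproduced in this text, so the sketch assumes, purely for illustration, an exponential decay $w_m = e^{-\lambda |t_m|}$ with $\lambda = 0.4$; the actual formula may differ. The function names are hypothetical.

```python
import math

LAMBDA = 0.4  # attenuation factor lambda, as set in the text


def time_decay_weight(t_feature: float, t_now: float, lam: float = LAMBDA) -> float:
    """ASSUMED weight form w_m = exp(-lam * |t_m|), with |t_m| = |t_feature - t_now|."""
    return math.exp(-lam * abs(t_feature - t_now))


def weight_features(features: list[list[float]], times: list[float],
                    t_now: float, lam: float = LAMBDA) -> list[list[float]]:
    """Scale every feature vector by its time-evolution weight."""
    weighted = []
    for vec, t in zip(features, times):
        w = time_decay_weight(t, t_now, lam)
        weighted.append([w * x for x in vec])
    return weighted


# feature vectors that are 0, 1 and 2 time units old
weights = [time_decay_weight(t, 10) for t in (10, 9, 8)]
```

Under this assumed form the three vectors receive weights of about 1.0, 0.67 and 0.45, so the latest feature vectors dominate, as the text requires.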
Optionally, clustering the time-evolution-based power user feature vectors with the improved feature vector clustering algorithm includes:
1) For $m$ initially given time-evolution-based power user feature vectors $m_1, m_2, \dots, m_m$, convert them into a power user feature matrix $X_{m \times n}$ consisting of $m$ $n$-dimensional vectors, and compute its covariance matrix $S_{n \times n} = \mathrm{Cov}(X_{m \times n})$;
2) Compute the eigenvalues and eigenvectors of $S_{n \times n}$, select the eigenvectors corresponding to the $K$ largest eigenvalues to form a matrix $W_{n \times K}$, and reduce the dimensionality of the power user feature matrix with the following formula:
$Z = X_{m \times n} W_{n \times K}$
wherein:
$Z$ is the dimension-reduced power user feature matrix;
3) Compute the distance between every pair of vectors in $Z$, store the results in a matrix $D_{m \times m}$, and compute the average pairwise distance $T$;
4) From $D_{m \times m}$, find the maximum distance between any two feature vectors, and take the two feature vectors $v_1, v_2$ at its ends as the initial cluster centers;
5) Repeat step 4): a new cluster center is considered valid if its distance to the known cluster centers is greater than $T$; continue until $K$ cluster centers are obtained. The feature vectors of the cluster centers are the user portrait descriptions. In one specific embodiment of the invention, $K = 8$, and the keywords corresponding to each feature vector, such as "high night-time power consumption" or "low weekend power consumption", serve as the user portrait description.
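Steps 1) to 5) can be sketched with NumPy as follows. The sketch assumes Euclidean distance, computes the covariance over the $n$ feature columns so that $W$ is $n \times K$, and reads step 5) as farthest-point selection: the vector farthest from all current centers is accepted while that distance exceeds $T$. The text does not say what happens if no candidate qualifies before $K$ centers are found; here the loop simply stops. Function names and data are hypothetical.

```python
import numpy as np


def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Z = X W, with W the n x k eigenvectors of the k largest covariance eigenvalues."""
    S = np.cov(X, rowvar=False)                       # n x n covariance of the feature columns
    eigvals, eigvecs = np.linalg.eigh(S)              # S is symmetric, so eigh applies
    W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]     # n x k, largest eigenvalues first
    return X @ W                                      # m x k


def init_centers(Z: np.ndarray, n_centers: int) -> list[int]:
    """Row indices of cluster centers chosen by max-distance seeding with threshold T."""
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)   # m x m distance matrix
    T = D[np.triu_indices(len(Z), k=1)].mean()                   # average pairwise distance
    i, j = np.unravel_index(np.argmax(D), D.shape)               # the two most distant vectors
    centers = [int(i), int(j)]
    while len(centers) < n_centers:
        nearest = D[:, centers].min(axis=1)   # distance of each vector to its closest center
        cand = int(np.argmax(nearest))        # the vector farthest from all current centers
        if nearest[cand] <= T:                # not farther than T from the known centers
            break
        centers.append(cand)
    return centers


X = np.array([[0, 0, 0], [0.1, 0, 0], [5, 5, 0], [5, 5.1, 0], [10, 0, 0], [10, 0.1, 0]])
Z = pca_reduce(X, 2)
idx = init_centers(Z, 3)
```

In this toy data the six points form three tight pairs, and the seeding picks one point from each pair as a center.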
Optionally, constructing the user social portrait with the space-based user discovery method includes:
1) According to the power user communication tags in the user portrait, extract each user's short-message frequency, call duration and call frequency, and construct the communication interaction relationship between different power users with the following formula:
wherein:
$me_i$ denotes the communication weight of power user $i$ toward power user $j$;
$call_i$ is the call frequency from power user $i$ to power user $j$, and $\sum call_j$ is the total call frequency from power user $i$ to all power users;
$lon_i$ is the call duration from power user $i$ to power user $j$, and $\sum lon_j$ is the total call duration from power user $i$ to all power users;
$mess_i$ is the short-message frequency from power user $i$ to power user $j$, and $\sum mess_j$ is the total short-message frequency from power user $i$ to all power users;
2) Establish the communication information relation matrix $M$ of the different power users:
wherein:
$c_{i,j}$ denotes the communication weight $me_i$ with which power user $i$ contacts power user $j$;
3) Based on the Bluetooth connections of each power user's mobile phone, divide a day equally into 24 time windows and record the Bluetooth connection status in each window, marking a window 1 if a connection is detected and 0 otherwise; then build the user movement behavior distribution feature matrix $S$:
wherein:
$vec_N$ is the movement behavior vector of power user $N$, of the form $vec_N = [1, 0, \dots, 1]$, indicating the Bluetooth connection status in each time window;
the similarity of movement behavior between different power users is computed with the Jaccard similarity coefficient:
from these similarities, the movement behavior matrix $U$ of the power users is constructed:
4) Compute the social relationship weight $Y_i$ of the power users from the communication information relation matrix $M$ and the movement behavior matrix $U$:
$Y_i = \alpha M + (1 - \alpha) U$
wherein:
$\alpha$ is the matrix weight, set to 0.5;
in one embodiment of the invention, a power user whose social relationship weight exceeds 0.7 is considered highly influential, so power products are more likely to be recommended to other users through that user.
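The communication weights and the matrix $M$ of steps 1) and 2) can be sketched as follows. The formula combining the three ratios is not reproduced in this text; the sketch assumes, hypothetically, that $me_i$ is the unweighted mean of the call-frequency, call-duration and short-message ratios, each normalised by user $i$'s totals. The input layout (one $N \times N$ count matrix per channel) and the function name are assumptions.

```python
import numpy as np


def communication_matrix(calls, durations, messages) -> np.ndarray:
    """Build M with c[i, j] = me_i for power user i -> power user j.

    Each input is an N x N array whose row i holds user i's counts toward every user j.
    ASSUMPTION: me_i is the plain average of the three row-normalised ratios.
    """
    ratios = []
    for A in (calls, durations, messages):
        A = np.asarray(A, dtype=float)
        row_tot = A.sum(axis=1, keepdims=True)   # sum over j, e.g. sum of call_j for user i
        # divide by the row total; a user with no activity on this channel gets 0
        ratios.append(np.divide(A, row_tot, out=np.zeros_like(A), where=row_tot > 0))
    return sum(ratios) / 3.0


calls = [[0, 3, 1], [2, 0, 2], [0, 0, 0]]
durations = [[0, 30, 10], [5, 0, 15], [0, 0, 0]]
messages = [[0, 1, 1], [4, 0, 0], [0, 0, 0]]
M = communication_matrix(calls, durations, messages)
```

With this averaging, each row of $M$ sums to 1 for a user active on all three channels and to 0 for a user with no communication at all.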
In addition, to achieve the above object, the present invention further provides a big-data-based power user portrait construction system, the system comprising:
a power data acquisition device, configured to acquire power user data;
a power data processor, configured to mine the power user data with the prefix-tree-based information mining algorithm to obtain power user feature vectors, and to process these feature vectors with the time-sequence-evolution feature vector processing algorithm to obtain time-evolution-based power user feature vectors;
a power user portrait construction device, configured to cluster the time-evolution-based power user feature vectors with the improved feature vector clustering algorithm to obtain clustered power user features, construct a power user portrait from the clustering result, and construct a user social portrait from the constructed user portrait with the space-based user discovery method.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing power user portrait construction program instructions, which are executable by one or more processors to implement the steps of the big-data-based power user portrait construction method described above.
Compared with the prior art, the big-data-based power user portrait construction method provided by the invention has the following advantages:
First, the invention processes the power user feature vectors with a time-sequence-evolution feature vector processing algorithm. Because feature vectors with longer time sequences lie closer to the root node of the prefix tree, the feature vectors are processed starting from the bottom layer of the prefix tree. The algorithm comprises the following steps: obtaining the time node of power user feature vector $m$ and computing the absolute difference $|t_m|$ between that time node and the current time node; then computing the weight of the feature vector with the time-evolution feature weight formula and assigning it to the corresponding feature vector to obtain time-evolution-based power user features, where the time-evolution feature weight formula is as follows:
wherein: $w_m$ is the weight of power user feature vector $m$, and $\lambda$ is the attenuation factor. Weighting the power user feature vectors by time evolution gives the most recent feature vectors higher weights, so the constructed power user portrait better reflects the user's current characteristics.
Meanwhile, the invention provides a space-based user discovery method for constructing a user social portrait: according to the power user communication tags in the user portrait, it extracts each user's short-message frequency, call duration and call frequency, and constructs the communication interaction relationship between different power users with the following formula:
wherein: $me_i$ denotes the communication weight of power user $i$ toward power user $j$; $call_i$ is the call frequency from power user $i$ to power user $j$, and $\sum call_j$ is the total call frequency from power user $i$ to all power users; $lon_i$ is the call duration from power user $i$ to power user $j$, and $\sum lon_j$ is the total call duration from power user $i$ to all power users; $mess_i$ is the short-message frequency from power user $i$ to power user $j$, and $\sum mess_j$ is the total short-message frequency from power user $i$ to all power users. From these weights, the communication information relation matrix $M$ of the different power users is established:
wherein: $c_{i,j}$ denotes the communication weight $me_i$ with which power user $i$ contacts power user $j$. Meanwhile, based on the Bluetooth connections of each power user's mobile phone, the method divides a day equally into 24 time windows, records the Bluetooth connection status in each window (1 if a connection is detected, 0 otherwise), and builds the user movement behavior distribution feature matrix $S$:
wherein: $vec_N$ is the movement behavior vector of power user $N$, of the form $vec_N = [1, 0, \dots, 1]$, indicating the Bluetooth connection status in each time window. The similarity of movement behavior between different power users is computed with the Jaccard similarity coefficient:
from which the movement behavior matrix $U$ of the power users is constructed:
Finally, the social relationship weight $Y_i$ of the power users is computed from the communication information relation matrix $M$ and the movement behavior matrix $U$:
$Y_i = \alpha M + (1 - \alpha) U$
wherein: $\alpha$ is the matrix weight, set to 0.5. A power user whose social relationship weight exceeds 0.7 is considered highly influential, so power products are more likely to be recommended to other users through that user.
Drawings
FIG. 1 is a schematic flow chart of a big-data-based power user portrait construction method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a big-data-based power user portrait construction system according to an embodiment of the present invention;
The realization of the objects, the functional features and the advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the method, a prefix-tree-based information mining algorithm mines power user data to obtain power user feature vectors; a time-sequence-evolution feature vector processing algorithm processes them into time-evolution-based power user features; an improved feature clustering algorithm clusters these features; the clustered features are assembled into a user portrait; and a space-based user discovery method constructs a user social portrait. FIG. 1 is a schematic diagram of a big-data-based power user portrait construction method according to an embodiment of the present invention.
In this embodiment, the big-data-based power user portrait construction method includes:
S1: acquire power user data, and mine the power user data with the prefix-tree-based information mining algorithm to obtain the power user feature vectors.
First, power user data are acquired, including the user's electricity consumption time series, the consumption time series of the user's electrical appliances, the consumption time series of the area where the user is located, consultation texts sent by the user to the power enterprise, and the like;
the invention then mines the power user data with a prefix-tree-based information mining algorithm, which comprises the following process:
1) For power user data $B = \langle s_1, s_2, \dots, s_k \rangle$, where $s_i$ is the power event occurring at time $i$ and $k$ is the length of $B$, construct $B$ into a prefix tree whose root node is empty, placing the power data in child nodes in increasing time order;
2) Append a power event to the end of $B$: if the duration of $B$ does not change, the event is called a simultaneous event (SI); if the duration of $B$ increases by 1, it is called a sequential event (SE);
3) Compute the event set of the appended simultaneous events SI and sequential events SE:
$SI(B) = \frac{u(B) + u(SI) + u(SE)}{u(CES)}$
wherein:
$u(B)$ is the utility value of the power user data $B$, i.e., the total number of occurrences of $B$;
$u(SI)$ is the utility value of the simultaneous events;
$u(SE)$ is the utility value of the sequential events;
$u(CES)$ is the ratio of an event's own utility value to the total utility of the event sequence;
by this formula, a power event that occurs more often has a higher utility value;
4) Rank the power events in the event set by utility value, so that events with higher utility values come first;
5) For simultaneous events, place the higher-utility power events in the node for the corresponding time step of the prefix tree; for sequential events, place them in the next-level node below that time step. The events stored in each node of the prefix tree are the feature vectors of the power user.
S2: process the power user feature vectors with the time-sequence-evolution feature vector processing algorithm to obtain time-evolution-based power user feature vectors.
The power user feature vectors are then processed with the time-sequence-evolution feature vector processing algorithm. In the prefix tree, feature vectors with longer time sequences lie closer to the root node, so the power user feature vectors are processed starting from the bottom layer of the prefix tree;
the time-sequence-evolution feature vector processing algorithm comprises the following steps:
obtaining the time node of power user feature vector $m$, and computing the absolute difference $|t_m|$ between that time node and the current time node;
computing the weight of each power user feature vector with the time-evolution feature weight formula, and assigning that weight to the corresponding feature vector to obtain the time-evolution-based power user feature vector, where the time-evolution feature weight formula is as follows:
wherein:
$w_m$ is the weight of power user feature vector $m$;
$\lambda$ is the attenuation factor, set to 0.4.
In an embodiment of the invention, this time-evolution weighting gives the most recent power user feature vectors higher weights.
S3: cluster the time-evolution-based power user feature vectors with the improved feature vector clustering algorithm to obtain clustered power user features, and construct a power user portrait from the clustering result.
The invention then clusters the time-evolution-based power user feature vectors with an improved feature vector clustering algorithm, which comprises the following process:
1) For $m$ initially given time-evolution-based power user feature vectors $m_1, m_2, \dots, m_m$, convert them into a power user feature matrix $X_{m \times n}$ consisting of $m$ $n$-dimensional vectors, and compute its covariance matrix $S_{n \times n} = \mathrm{Cov}(X_{m \times n})$;
2) Compute the eigenvalues and eigenvectors of $S_{n \times n}$, select the eigenvectors corresponding to the $K$ largest eigenvalues to form a matrix $W_{n \times K}$, and reduce the dimensionality of the power user feature matrix with the following formula:
$Z = X_{m \times n} W_{n \times K}$
wherein:
$Z$ is the dimension-reduced power user feature matrix;
3) Compute the distance between every pair of vectors in $Z$, store the results in a matrix $D_{m \times m}$, and compute the average pairwise distance $T$;
4) From $D_{m \times m}$, find the maximum distance between any two feature vectors, and take the two feature vectors $v_1, v_2$ at its ends as the initial cluster centers;
5) Repeat step 4): a new cluster center is considered valid if its distance to the known cluster centers is greater than $T$; continue until $K$ cluster centers are obtained. The feature vectors of the cluster centers are the user portrait descriptions. In one specific embodiment of the invention, $K = 8$, and the keywords corresponding to each feature vector, such as "high night-time power consumption" or "low weekend power consumption", serve as the user portrait description.
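The keyword description of cluster centers could be sketched as below. The text gives only example keywords; the percentile rule, the feature columns and the function name `describe_centers` are hypothetical choices for illustration. Centers are passed as row indices into the original, unreduced feature matrix, so their raw feature values can be read and compared against the whole population.

```python
import numpy as np


def describe_centers(X, center_idx, feature_names, low_q=25.0, high_q=75.0):
    """Tag each center row with 'high <feature>' and 'low <feature>' keywords.

    Hypothetical rule: a feature is 'high' above the population's high_q percentile
    and 'low' below its low_q percentile.
    """
    X = np.asarray(X, dtype=float)
    lo = np.percentile(X, low_q, axis=0)    # per-feature lower cut-off
    hi = np.percentile(X, high_q, axis=0)   # per-feature upper cut-off
    tags = {}
    for c in center_idx:
        row = X[c]
        labels = [f"high {name}" for name, v, h in zip(feature_names, row, hi) if v > h]
        labels += [f"low {name}" for name, v, lo_v in zip(feature_names, row, lo) if v < lo_v]
        tags[c] = labels
    return tags


# illustrative columns: night-time consumption, weekend consumption
X = [[9, 1], [8, 2], [1, 9], [2, 8], [5, 5]]
tags = describe_centers(X, [0, 2], ["night consumption", "weekend consumption"])
```

Center 0 is tagged "high night consumption" and "low weekend consumption", mirroring the kind of keyword description given in the text.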
S4, constructing the user social portrait by using a space-based user discovery method according to the constructed user portrait.
Further, according to the constructed user portrait, the invention utilizes a space-based user discovery method to construct a user social portrait, wherein the space-based user discovery method comprises the following steps:
1) extracting the short message frequency, the call duration and the call frequency data of the user according to the power user communication label in the user figure, and constructing the communication interaction relationship among different power users by using the following formula:
wherein:
meirepresenting communication right of power consumer i to power consumer jWeighing;
callifor the frequency of the call from Utility i to Utility j, Σ calljThe communication frequency of the power consumer i to all the power consumers is set;
lonifor the duration of a call from Utility i to Utility j, Σ lonjThe call duration of the power consumer i to all the power consumers is set;
messishort message frequency, Σ mess, for power consumer i to power consumer jjShort message frequency of all power consumers for the power consumer i;
2) establishing a communication information relation matrix M of different power users:
wherein:
$c_{i,j}$ is the communication weight $me_i$ with which power consumer i contacts power consumer j;
3) According to the Bluetooth connection status of each power user's mobile phone, one day is divided equally into 24 time windows and the Bluetooth connection status is recorded for each window: a window is marked 1 if a connection occurred in it and 0 otherwise. This yields the user movement behavior distribution feature matrix S:
wherein:
$vec_N$ is the movement behavior vector of power consumer N, of the form $vec_N = [1, 0, \ldots, 1]$, indicating the Bluetooth connection status in each time window;
calculating the similarity of the movement behaviors among different power users by using the Jaccard similarity coefficient:
Further, the invention constructs the movement behavior matrix U of the power consumers:
4) calculating the social relationship weight $Y_i$ of the power users from the communication information relation matrix M and the movement behavior matrix U of the different power users:
$$Y_i = \alpha M + (1 - \alpha) U$$
Wherein:
α is the matrix weight, which is set to 0.5;
In one embodiment of the invention, a power consumer whose social relationship weight is greater than 0.7 is considered to have high influence and is therefore more likely to recommend power products to other users.
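A minimal sketch of step 4) and the 0.7 influence rule, assuming M and U are n × n matrices of the same size combined element by element with α = 0.5. The text does not say how a single per-user weight is read from Y, so `influential_users` adopts a hypothetical rule: user i is flagged when any off-diagonal entry in row i exceeds the threshold.

```python
def social_weight(M, U, alpha=0.5):
    """Element-wise Y = alpha * M + (1 - alpha) * U for two n x n matrices."""
    n = len(M)
    return [[alpha * M[i][j] + (1 - alpha) * U[i][j] for j in range(n)]
            for i in range(n)]


def influential_users(Y, threshold=0.7):
    """Hypothetical reading of the 0.7 rule: flag user i if any
    off-diagonal weight Y[i][j] exceeds the threshold."""
    return [i for i, row in enumerate(Y)
            if any(w > threshold for j, w in enumerate(row) if j != i)]


# Usage: user 0 has strong communication and movement overlap with user 1.
Y = social_weight([[0.0, 0.9], [0.2, 0.0]], [[1.0, 0.8], [0.8, 1.0]])
print(influential_users(Y))
```

Setting α to 0.5 weights communication and co-movement equally; other values of α shift the balance without changing the structure of the computation.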
The embodiments of the present invention are described below through an algorithm experiment that tests the proposed method. The test environment is as follows: the operating system is Ubuntu 16.04, the processor is an Intel i5-8500 CPU @ 3 GHz × 6, the memory is 16 GB, and the software versions are TensorFlow-gpu 1.18 and Keras 2.24. The comparison methods are a random forest-based power user portrait construction method, a decision tree-based power user portrait construction method and a principal component analysis-based power user portrait construction method.
In the algorithm experiment, the data set consists of the power usage data of 5000 power users. The power usage data are fed into the proposed method and the comparison methods, and the accuracy of user portrait construction is used as the index for evaluating algorithm performance.
According to the experimental results, the user portrait construction accuracy is 85.32% for the random forest-based method, 80.65% for the decision tree-based method, 91.32% for the principal component analysis-based method, and 94.68% for the method of the present invention.
The invention further provides a big data-based power consumer portrait construction system. Fig. 2 is a schematic diagram illustrating the internal structure of a big data-based power consumer portrait construction system according to an embodiment of the present invention.
In the present embodiment, the big data-based power consumer portrait construction system 1 comprises at least a power data acquisition device 11, a power data processor 12, a power consumer portrait construction device 13, a communication bus 14 and a network interface 15.
The power data acquisition device 11 may be a PC (Personal Computer), a terminal device such as a smartphone, tablet computer or portable computer, or a server.
The power data processor 12 includes at least one type of readable storage medium, such as flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks and optical disks. In some embodiments, the power data processor 12 may be an internal storage unit of the big data-based power consumer portrait construction system 1, such as its hard disk. In other embodiments, it may be an external storage device provided on the system 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card. Further, the power data processor 12 may include both an internal storage unit and an external storage device of the system 1. The power data processor 12 can be used not only to store application software installed in the system 1 and various types of data, but also to temporarily store data that has been output or is to be output.
In some embodiments, the power consumer portrait construction device 13 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code or processing data stored in the power data processor 12, such as the power consumer portrait construction program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used for establishing a communication connection between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the big data-based power consumer portrait construction system 1 and for displaying a visualized user interface.
Although FIG. 2 shows only the big data-based power consumer portrait construction system 1 with components 11-15, those skilled in the art will appreciate that the configuration shown in FIG. 2 does not limit the system 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in FIG. 2, the power data processor 12 stores power consumer portrait construction program instructions. The steps performed when the power consumer portrait construction device 13 executes these instructions are the same as those of the big data-based power consumer portrait construction method described above and are not repeated here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon power user portrait construction program instructions executable by one or more processors to implement the following operations:
acquiring power consumer data, and performing information mining on the power consumer data using a prefix tree-based information mining algorithm to obtain feature vectors of the power consumer;
processing the feature vectors of the power consumer using a time sequence evolution-based feature vector processing algorithm to obtain time sequence evolution-based power consumer feature vectors;
clustering the time sequence evolution-based power user feature vectors using an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait from the clustering result;
constructing communication information relation matrices of different power users according to the constructed user portrait;
and constructing movement behavior matrices of different power users, and constructing the power user social portrait from the communication information relation matrices and the movement behavior matrices of the different power users.
It should be noted that the serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (8)
1. A big data-based power user portrait construction method is characterized by comprising the following steps:
acquiring power consumer data, and performing information mining on the power consumer data using a prefix tree-based information mining algorithm to obtain feature vectors of the power consumer;
processing the feature vectors of the power consumer using a time sequence evolution-based feature vector processing algorithm to obtain time sequence evolution-based power consumer feature vectors;
clustering the time sequence evolution-based power user feature vectors using an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait from the clustering result;
constructing communication information relation matrices of different power users according to the constructed user portrait;
and constructing movement behavior matrices of different power users, and constructing the power user social portrait from the communication information relation matrices and the movement behavior matrices of the different power users.
2. The big data-based power user portrait construction method of claim 1, wherein performing information mining on the power user data using the prefix tree-based information mining algorithm comprises:
the information mining algorithm based on the prefix tree comprises the following processes:
1) for power consumer data $B = \langle s_1, s_2, \ldots, s_k \rangle$, where $s_i$ is the power event occurring at time i and k is the length of B, constructing B into a prefix tree whose root node is empty, with the power data placed in child nodes in increasing time order;
2) appending a power event to the end of B: if the duration of B does not change, the event is called a simultaneous event SI; if the duration of B increases by 1, the event is called a sequential event SE;
3) calculating the event set after the simultaneous events SI and sequential events SE are added:
$$SI(B) = \frac{u(B) + u(SI) + u(SE)}{u(CES)}$$
wherein:
u(B) is the utility value of the power user data B, i.e., the total number of occurrences of B;
u(SI) is the utility value of the simultaneous events;
u(SE) is the utility value of the sequential events;
u(CES) is the ratio of an event's own utility value to the total utility of the event sequence;
4) ranking the power events in the event set by utility value, with higher-utility power events placed in earlier positions;
5) for simultaneous events, placing the higher-utility power events in the node of the corresponding time step in the prefix tree; for sequential events, placing the higher-utility power events in the next-level node after the corresponding time step. The events stored in each node of the prefix tree are the power user feature vectors.
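The following is a simplified sketch of claim 2 for a single user's sequence B, in which the prefix tree degenerates to a chain of time-step nodes below an empty root. The (time, event, utility) record format, the `Node` class and the helper names are assumptions for illustration, and the utility values are taken as given rather than computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A prefix-tree node holding the power events of one time step."""
    time: int
    events: List[Tuple[str, float]] = field(default_factory=list)  # (event, utility)
    child: Optional["Node"] = None


def event_set_score(u_b, u_si, u_se, u_ces):
    """SI(B) = (u(B) + u(SI) + u(SE)) / u(CES), as written in step 3)."""
    return (u_b + u_si + u_se) / u_ces


def build_prefix_tree(records):
    """Build a chain-shaped prefix tree from time-sorted (time, event, utility) records.

    An event with the same time as the current node is treated as a simultaneous
    event (SI) and joins that node; an event at a later time is treated as a
    sequential event (SE) and opens a child node one level down.
    """
    root = Node(time=-1)  # empty root node, as in step 1)
    current = root
    for time, event, utility in records:
        if time < current.time:
            raise ValueError("records must be sorted by time")
        if time != current.time:  # SE: the duration of B grows
            current.child = Node(time=time)
            current = current.child
        current.events.append((event, utility))  # SI lands in the same node
    # Step 4): within each node, higher-utility events come first.
    node = root.child
    while node is not None:
        node.events.sort(key=lambda e: e[1], reverse=True)
        node = node.child
    return root


def feature_vectors(root):
    """Read the ranked event list stored in each node, level by level."""
    out, node = [], root.child
    while node is not None:
        out.append([event for event, _ in node.events])
        node = node.child
    return out
```

With several users, sequences that share the same leading events would share nodes, which is where a real prefix tree saves space; this sketch covers only the single-sequence case.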
3. The big data-based power user portrait construction method according to claim 2, wherein processing the feature vectors of the power user using the time sequence evolution-based feature vector processing algorithm comprises:
the time sequence evolution-based feature vector processing algorithm comprises the following steps:
obtaining the time node of power consumer feature vector m, and calculating the absolute value $|t_m|$ of the difference between that time node and the current time node;
calculating the weight of each power user feature vector using the time sequence evolution feature weight formula, and assigning this weight to the corresponding power user feature vector to obtain the time sequence evolution-based power user feature vectors, where the time sequence evolution feature weight formula is as follows:
wherein:
$w_m$ is the weight of power consumer feature vector m;
λ is the attenuation factor, which is set to 0.4.
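The weight formula itself is not reproduced above. A common way to apply a decay factor λ to an elapsed time $|t_m|$ is exponential decay, $w_m = e^{-\lambda |t_m|}$; the sketch below assumes that form with λ = 0.4, so it illustrates the idea rather than restating the claimed formula. The function names are hypothetical.

```python
import math


def decay_weight(t_node, t_now, lam=0.4):
    """Assumed form w_m = exp(-lambda * |t_m|), with |t_m| = |t_node - t_now|."""
    return math.exp(-lam * abs(t_node - t_now))


def weight_features(features, t_now, lam=0.4):
    """Scale each (time, vector) pair by its decay weight."""
    return [[decay_weight(t, t_now, lam) * x for x in vec] for t, vec in features]
```

Under this assumption a feature vector observed at the current time keeps weight 1, and one observed five time units earlier is scaled by $e^{-2} \approx 0.135$.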
4. The big data-based power user portrait construction method according to claim 3, wherein clustering the time sequence evolution-based power user feature vectors using the improved feature vector clustering algorithm comprises:
1) converting the m initially given time sequence evolution-based power user feature vectors $m_1, m_2, \ldots, m_m$ into a power user feature matrix $X_{m\times n}$ consisting of m n-dimensional vectors, and calculating its covariance matrix $S_{m\times m} = \mathrm{Cov}(X_{m\times n})$;
2) calculating the eigenvalues and eigenvectors of the covariance matrix $S_{m\times m}$, selecting the eigenvectors corresponding to the K largest eigenvalues to form a matrix $W_{n\times K}$, and reducing the dimensionality of the power user feature matrix using the following formula:
$$Z = X_{m\times n} W_{n\times K}$$
wherein:
Z is the power user feature matrix after dimensionality reduction;
3) calculating the distance between every pair of vectors in Z, storing the results in a matrix $D_{m\times m}$, and calculating the average distance T over all pairs of vectors;
4) finding, from the matrix $D_{m\times m}$, the maximum distance between any two feature vectors, and taking the feature vectors $v_1, v_2$ at the two ends of that maximum distance as the initial cluster centers;
5) repeating step 4), where a new cluster center is considered valid if its distance to the known cluster centers is greater than T, until K cluster centers are obtained; the feature vectors of the cluster centers are the user portrait description.
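Steps 1) and 2) amount to a principal component projection. The NumPy sketch below is one way to compute it. For $Z = X_{m\times n} W_{n\times K}$ to be defined, W must have n rows, so this sketch assumes the covariance is taken over the n feature columns (an n × n matrix). Like the claim, it projects X directly without first subtracting the column means; `reduce_features` is a hypothetical name.

```python
import numpy as np


def reduce_features(X, K):
    """Project an m x n feature matrix X onto its top-K principal directions.

    Assumption: the covariance is computed over the n feature columns, which
    makes W an n x K matrix so that Z = X W is well defined.
    """
    X = np.asarray(X, dtype=float)
    S = np.cov(X, rowvar=False)            # n x n covariance of the feature columns
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric; eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:K]  # indices of the K largest eigenvalues
    W = eigvecs[:, order]                  # n x K
    return X @ W                           # Z, m x K
```

For data that vary along only one axis, such as `[[0, 0], [1, 0], [2, 0], [3, 0]]` with K = 1, the single retained direction is that axis, and Z recovers the values 0 to 3 up to sign (eigenvector signs are arbitrary).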
5. The big data-based power user portrait construction method as claimed in claim 4, wherein constructing the communication information relation matrices of different power users comprises:
extracting the short message frequency, call duration and call frequency of each user from the power user communication labels in the user portrait, and constructing the communication interaction relationship between different power users using the following formula:
wherein:
$me_i$ represents the communication weight of power consumer i toward power consumer j;
$call_i$ is the call frequency from power consumer i to power consumer j, and $\sum call_j$ is the call frequency from power consumer i to all power consumers;
$lon_i$ is the call duration from power consumer i to power consumer j, and $\sum lon_j$ is the call duration from power consumer i to all power consumers;
$mess_i$ is the short message frequency from power consumer i to power consumer j, and $\sum mess_j$ is the short message frequency from power consumer i to all power consumers;
establishing a communication information relation matrix M of different power users:
wherein:
$c_{i,j}$ is the communication weight $me_i$ with which power consumer i contacts power consumer j.
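The formula for $me_i$ is not reproduced above; the definitions only say that each quantity is normalized by user i's total over all users. The sketch below assumes that the three normalized shares (call frequency, call duration and short message frequency) are averaged with equal weight to give $c_{i,j}$. Both that averaging and the function name `communication_matrix` are assumptions.

```python
def communication_matrix(calls, durations, messages):
    """Build M where M[i][j] is the communication weight of user i toward user j.

    Each input is an n x n matrix whose row i holds user i's activity toward
    every user j. Assumption: the three normalized shares are averaged equally.
    """
    n = len(calls)

    def share(mat, i, j):
        total = sum(mat[i])  # user i's total toward all users
        return mat[i][j] / total if total else 0.0

    return [[(share(calls, i, j) + share(durations, i, j) + share(messages, i, j)) / 3
             if i != j else 0.0
             for j in range(n)]
            for i in range(n)]
```

Because each share is a fraction of user i's own total, every non-empty row of M sums to 1 under this assumption, so heavy and light phone users are placed on the same scale.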
6. The big data-based power user portrait construction method as claimed in claim 5, wherein constructing the power user social portrait from the communication information relation matrices and the movement behavior matrices of different power users comprises:
according to the Bluetooth connection status of each power user's mobile phone, dividing one day equally into 24 time windows and recording the Bluetooth connection status for each window, where a window is marked 1 if a connection occurred in it and 0 otherwise, and establishing the user movement behavior distribution feature matrix S:
wherein:
$vec_N$ is the movement behavior vector of power consumer N, of the form $vec_N = [1, 0, \ldots, 1]$, indicating the Bluetooth connection status in each time window;
calculating the similarity of the movement behaviors among different power users by using the Jaccard similarity coefficient:
constructing the movement behavior matrix U of the power consumers:
calculating the social relationship weight $Y_i$ of the power users from the communication information relation matrix M and the movement behavior matrix U of the different power users:
$$Y_i = \alpha M + (1 - \alpha) U$$
Wherein:
α is the matrix weight, which is set to 0.5.
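The Bluetooth vectors and the Jaccard-based matrix U can be sketched as follows. Here the Jaccard coefficient of two binary vectors is the number of windows in which both users were connected divided by the number of windows in which either user was connected. Returning 0.0 when neither user has any connection is an assumption, since the coefficient is undefined in that case. The helper names are hypothetical.

```python
def hourly_vector(connected_hours):
    """24-window Bluetooth vector: 1 if a connection was seen in that hour, else 0."""
    return [1 if h in connected_hours else 0 for h in range(24)]


def jaccard(a, b):
    """Jaccard similarity of two binary vectors: |a AND b| / |a OR b|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0  # assumed value for two empty vectors


def movement_matrix(S):
    """U[i][j] = Jaccard similarity of the movement vectors of users i and j."""
    return [[jaccard(a, b) for b in S] for a in S]


# Usage: two users connected at 08-10h/18h and 09h/18h/22h share 2 of 4 active windows.
U = movement_matrix([hourly_vector({8, 9, 18}), hourly_vector({9, 18, 22})])
print(U[0][1])
```

Jaccard ignores windows where both users are disconnected, so two users are not rated as similar merely because they are both inactive for most of the day.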
7. A big data-based power consumer portrait construction system, the system comprising:
the power data acquisition device is used for acquiring power user data;
the power data processor is used for performing information mining on the power user data using a prefix tree-based information mining algorithm to obtain feature vectors of the power user, and for processing the feature vectors of the power user using a time sequence evolution-based feature vector processing algorithm to obtain time sequence evolution-based power user feature vectors;
the electric power user portrait construction device is used for clustering electric power user feature vectors evolved based on time sequence by using an improved feature vector clustering algorithm to obtain clustered electric power user features, constructing an electric power user portrait according to a clustering result, and constructing a user social portrait according to the constructed user portrait by using a space-based user discovery method.
8. A computer-readable storage medium having stored thereon power user portrait construction program instructions executable by one or more processors to implement the steps of the big data-based power user portrait construction method as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204028.7A CN112801207A (en) | 2021-02-24 | 2021-02-24 | Power user portrait construction method and device based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204028.7A CN112801207A (en) | 2021-02-24 | 2021-02-24 | Power user portrait construction method and device based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112801207A true CN112801207A (en) | 2021-05-14 |
Family
ID=75815433
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110204028.7A Withdrawn CN112801207A (en) | 2021-02-24 | 2021-02-24 | Power user portrait construction method and device based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112801207A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113269660A (en) * | 2021-06-08 | 2021-08-17 | 建投河北热力有限公司 | Heat supply control method and device, electronic equipment and computer readable storage medium |
CN113407705A (en) * | 2021-06-18 | 2021-09-17 | 广东电网有限责任公司广州供电局 | Power user portrait generation method and device, electronic equipment and storage medium |
2021-02-24 CN CN202110204028.7A patent/CN112801207A/en not_active Withdrawn
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10671936B2 (en) | Method for clustering nodes of a textual network taking into account textual content, computer-readable storage device and system implementing said method | |
CN110347835B (en) | Text clustering method, electronic device and storage medium | |
CN110866181B (en) | Resource recommendation method, device and storage medium | |
KR102122373B1 (en) | Method and apparatus for obtaining user portrait | |
CN107239993B (en) | Matrix decomposition recommendation method and system based on expansion label | |
WO2019062021A1 (en) | Method for pushing loan advertisement in application program, electronic device, and medium | |
CN106030571A (en) | Dynamically modifying elements of user interface based on knowledge graph | |
CN106250464B (en) | Training method and device of ranking model | |
CN110503459B (en) | User credibility assessment method and device based on big data and storage medium | |
CN105956011B (en) | Searching method and device | |
CN113220734A (en) | Course recommendation method and device, computer equipment and storage medium | |
CN110503506A (en) | Item recommendation method, device and medium based on score data | |
CN107911448A (en) | Content pushing method and device | |
CN105930390A (en) | Relation-type database expansion method and relation-type database expansion system | |
CN112801207A (en) | Power user portrait construction method and device based on big data | |
CN114065750A (en) | Commodity information matching and publishing method and device, equipment, medium and product thereof | |
EP2678809A1 (en) | Entity fingerprints | |
CN106776716A (en) | A kind of intelligent Matching marketing consultant and the method and apparatus of user | |
CN109885834A (en) | A kind of prediction technique and device of age of user gender | |
CN112307352B (en) | Content recommendation method, system, device and storage medium | |
CN110866042A (en) | Intelligent table query method and device and computer readable storage medium | |
CN113849748A (en) | Information display method and device, electronic equipment and readable storage medium | |
CN111858617A (en) | User searching method and device, computer readable storage medium and electronic equipment | |
Yang et al. | An adaptive automatic approach to filtering empty images from camera traps using a deep learning model | |
CN111177547A (en) | Scientific and technological achievement searching method and device based on big data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210514 |