CN112801207A - Power user portrait construction method and device based on big data - Google Patents

Power user portrait construction method and device based on big data

Info

Publication number
CN112801207A
Authority
CN
China
Prior art keywords
power
user
data
power consumer
consumer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110204028.7A
Other languages
Chinese (zh)
Inventor
汪礼君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202110204028.7A priority Critical patent/CN112801207A/en
Publication of CN112801207A publication Critical patent/CN112801207A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Economics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of big data and discloses a big data-based power user portrait construction method, which comprises the following steps: acquiring power user data, and performing information mining on the power user data by using a prefix tree-based information mining algorithm to obtain feature vectors of the power user; processing the feature vectors of the power user by using a feature vector processing algorithm based on time sequence evolution to obtain time-sequence-evolved power user feature vectors; clustering the time-sequence-evolved power user feature vectors by using an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait according to the clustering result; and constructing a user social portrait from the constructed user portrait by using a space-based user discovery method. The invention also provides a big data-based power user portrait construction system. The invention thereby realizes the construction of the power user portrait.

Description

Power user portrait construction method and device based on big data
Technical Field
The invention relates to the technical field of big data, in particular to a method and a device for constructing a power user portrait based on big data.
Background
At present, user portrait technology is developing rapidly and is being applied in fields such as social media, e-commerce and mobile services. When applied in enterprises, user portraits need to be presented step by step, from the whole to the details, within the professional field and according to actual business requirements. For power enterprises, constructing power user portraits to support enterprise decision making has therefore become a hot topic of current research.
Traditional user feature extraction algorithms have high computational complexity, and the K-means algorithm commonly used for feature clustering is computationally expensive and easily converges to a locally optimal solution, so globally optimal user portrait features cannot be obtained quickly for constructing the user portrait.
In view of this, how to extract user portrait features more quickly and thereby construct the user portrait is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a big data-based electric power user portrait construction method, which comprises the steps of carrying out information mining on electric power user data by using an information mining algorithm based on a prefix tree to obtain a characteristic vector of an electric power user, processing the characteristic vector of the electric power user by using a characteristic vector processing algorithm based on time sequence evolution to obtain electric power user characteristics based on time sequence evolution, carrying out clustering processing on the electric power user characteristics by using an improved characteristic clustering algorithm, constructing the clustered electric power user characteristics into a user portrait, and constructing a user social portrait by using a space-based user discovery method.
In order to achieve the above object, the present invention provides a method for constructing a portrait of a power consumer based on big data, comprising:
acquiring power consumer data, and performing information mining on the power consumer data by using an information mining algorithm based on a prefix tree to obtain a feature vector of a power consumer;
processing the characteristic vector of the power consumer by using a characteristic vector processing algorithm based on time sequence evolution to obtain a power consumer characteristic vector based on time sequence evolution;
clustering the power user characteristic vectors evolved based on the time sequence by using an improved characteristic vector clustering algorithm to obtain clustered power user characteristics, and constructing a power user portrait according to a clustering result;
constructing communication information relation matrixes of different power users according to the constructed user portrait;
and constructing mobile behavior matrixes of different power users, and constructing the social portrait of the power users according to the communication information relation matrixes and the mobile behavior matrixes of the different power users.
Optionally, the information mining on the power consumer data by using the prefix tree-based information mining algorithm includes:
the electric power user data comprises power consumption time sequence data of a user, power consumption time sequence data of an electric appliance of the user, power consumption time sequence data of an area where the user is located, a consultation text of the user to an electric power enterprise and the like;
the information mining algorithm based on the prefix tree comprises the following processes:
1) for power user data $B = \langle s_1, s_2, \ldots, s_k \rangle$, where $s_i$ is the power event occurring at time $i$ and $k$ is the length of the power user data B, constructing the power user data B into a prefix tree, in which the root node is empty and the power events are placed in child nodes in order of increasing time;
2) adding a power event at the end of the power consumer data B, if the duration of the power consumer data B does not change, the event is called a simultaneous event SI, and if the duration of the power consumer data B adds 1, the event is called a sequential event SE;
3) calculating an event set of the added simultaneous events SI and the sequential events SE:
SI(B)=(u(B)+u(SI)+u(SE))/u(CES)
Figure BDA0002949688870000021
wherein:
u(B) is the utility value of the power user data B, namely the total number of occurrences of the power user data B;
u(SI) is the utility value of the simultaneous event;
u(SE) is the utility value of the sequential event;
u(CES) is the ratio of the utility value of the event itself to the sum of the utility values of the event sequence;
according to the formula, a power event that occurs more often has a higher utility value;
4) ranking the power events in the event set by utility value, so that power events with higher utility values are ranked earlier;
5) for simultaneous events, placing the power events with higher utility values in the node of the corresponding time step in the prefix tree, and for sequential events, placing the power events with higher utility values in the next layer of nodes below the corresponding time step, wherein the events stored in the nodes of the prefix tree are the feature vectors of the power user.
Optionally, the processing the feature vector of the power consumer by using a feature vector processing algorithm based on time sequence evolution includes:
in the prefix tree, the characteristic vector with a longer time sequence is positioned at a position closer to a root node in the prefix tree, so that the characteristic vector of the power user is processed from the bottom layer of the prefix tree;
the feature vector processing algorithm based on time sequence evolution comprises the following steps:
obtaining the time node of a power user feature vector m, and calculating the absolute value $|t_m|$ of the difference between that time node and the current time node;
Calculating the weight of the power user feature vector by using a time sequence evolution feature weight calculation formula, assigning the time sequence evolution feature weight to the corresponding power user feature vector, and obtaining the power user feature vector based on time sequence evolution, wherein the time sequence evolution feature weight calculation formula is as follows:
Figure BDA0002949688870000031
wherein:
$w_m$ is the weight of the power user feature vector m;
λ is the attenuation factor, which is set to 0.4.
In an embodiment of the invention, the latest power consumer feature vector has a higher weight by performing feature weighting based on time sequence evolution on the power consumer feature vector.
Optionally, the clustering, by using an improved feature vector clustering algorithm, the power user feature vector based on time sequence evolution includes:
1) for the m initially given time-sequence-evolved power user feature vectors $m_1, m_2, \ldots, m_m$, converting them into a power user feature matrix $X_{m \times n}$ consisting of m n-dimensional vectors, and calculating the covariance matrix of this matrix, $S_{m \times m} = \mathrm{Cov}(X_{m \times n})$;
2) calculating the eigenvalues and eigenvectors of the covariance matrix $S_{m \times m}$, selecting the eigenvectors corresponding to the K largest eigenvalues to form a matrix $W_{n \times K}$, and performing dimensionality reduction on the power user feature matrix by using the following formula:
$Z = X_{m \times n} W_{n \times K}$
wherein:
Z is the power user feature matrix after dimensionality reduction;
3) calculating the distance between every two vectors in Z, storing the results in a matrix $D_{m \times m}$, and calculating the average distance T between any two vectors;
4) finding, from the matrix $D_{m \times m}$, the maximum distance between any two feature vectors, and taking the feature vectors $v_1, v_2$ at the two ends of the maximum distance as initial clustering centers;
5) repeating step 4), and accepting a new clustering center as valid if its distance from the known clustering centers is greater than T, until K clustering centers are obtained, wherein the feature vectors of the clustering centers are the user portrait description. In one specific embodiment of the invention, K is 8, and the keywords corresponding to the feature vectors, such as high night-time power consumption or low weekend power consumption, are taken as the user portrait description.
Optionally, the constructing a user social representation by using a space-based user discovery method includes:
1) extracting the short message frequency, call duration and call frequency data of the user according to the power user communication label in the user portrait, and constructing the communication interaction relationship among different power users by using the following formula:
Figure BDA0002949688870000041
wherein:
$me_i$ represents the communication weight of power user i toward power user j;
$call_i$ is the call frequency from power user i to power user j, and $\sum call_j$ is the call frequency from power user i to all power users;
$lon_i$ is the call duration from power user i to power user j, and $\sum lon_j$ is the call duration from power user i to all power users;
$mess_i$ is the short message frequency from power user i to power user j, and $\sum mess_j$ is the short message frequency from power user i to all power users;
2) establishing a communication information relation matrix M of different power users:
Figure BDA0002949688870000042
wherein:
$c_{i,j}$ is the communication weight $me_i$ with which power user i contacts power user j;
3) according to the Bluetooth connection status of the power user's mobile phone, dividing one day equally into 24 time windows and recording the Bluetooth connection status in each time window, marking a window as 1 if a connection is present and as 0 otherwise, and establishing a user movement behavior distribution feature matrix S as follows:
Figure BDA0002949688870000043
wherein:
$vec_N$ is the movement behavior vector of power user N, of the form $vec_N = [1, 0, \ldots, 1]$, which indicates the Bluetooth connection status in the different time windows;
calculating the similarity of the movement behaviors among different power users by using the Jaccard similarity coefficient:
Figure BDA0002949688870000044
further, the invention constructs a mobile behavior matrix U of the power consumer:
Figure BDA0002949688870000045
4) calculating the social relationship weight $Y_i$ of the power users according to the communication information relation matrix M and the mobile behavior matrix U of the different power users:
$Y_i = \alpha M + (1 - \alpha) U$
wherein:
α is the matrix weight, which is set to 0.5;
in one embodiment of the invention, when the social relationship weight of the power consumer is greater than 0.7, the power consumer is considered to have higher influence, so that the power product is more likely to be recommended to other users.
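The movement-behavior comparison in step 3) can be illustrated with a short sketch. The Jaccard formula itself survives only as an image reference in this text, so the code assumes the standard definition, intersection size divided by union size, applied to the 24-window 0/1 Bluetooth vectors. The three users and their connection windows are hypothetical, and returning 0.0 for two all-zero vectors is a convention chosen here, not something the source specifies.

```python
def jaccard(vec_a, vec_b):
    """Jaccard coefficient of two 0/1 vectors: |A and B| / |A or B|."""
    both = sum(1 for a, b in zip(vec_a, vec_b) if a and b)
    either = sum(1 for a, b in zip(vec_a, vec_b) if a or b)
    # Assumed convention: two vectors with no connections at all score 0.0.
    return both / either if either else 0.0


def movement_matrix(vectors):
    """Pairwise Jaccard similarities, playing the role of the matrix U."""
    n = len(vectors)
    return [[jaccard(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]


# Hypothetical users, 24 hourly windows each (1 = Bluetooth connection seen).
user_a = [0] * 8 + [1] * 10 + [0] * 6   # connected during hours 8-17
user_b = [0] * 9 + [1] * 9 + [0] * 6    # connected during hours 9-17
user_c = [1] * 6 + [0] * 18             # connected during hours 0-5

U = movement_matrix([user_a, user_b, user_c])
print(U[0][1], U[0][2])
```

Users a and b share 9 of the 10 windows in which either is connected, so their similarity is 0.9, while a and c never overlap and score 0.0.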
In addition, to achieve the above object, the present invention further provides a big data-based power user portrait construction system, the system comprising:
the power data acquisition device is used for acquiring power user data;
the electric power data processor is used for carrying out information mining on the electric power user data by using an information mining algorithm based on a prefix tree to obtain a characteristic vector of the electric power user; processing the characteristic vector of the power consumer by using a characteristic vector processing algorithm based on time sequence evolution to obtain a power consumer characteristic vector based on time sequence evolution;
the electric power user portrait construction device is used for clustering electric power user feature vectors evolved based on time sequence by using an improved feature vector clustering algorithm to obtain clustered electric power user features, constructing an electric power user portrait according to a clustering result, and constructing a user social portrait according to the constructed user portrait by using a space-based user discovery method.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having stored thereon power user portrait construction program instructions executable by one or more processors to implement the steps of the implementation method of big data based power user portrait construction as described above.
Compared with the prior art, the invention provides a power user portrait construction method based on big data, and the technology has the following advantages:
Firstly, the invention processes the feature vectors of the power user by using a feature vector processing algorithm based on time sequence evolution. Because feature vectors with longer time sequences lie closer to the root node of the prefix tree, the invention processes the feature vectors of the power user starting from the bottom layer of the prefix tree. The feature vector processing algorithm based on time sequence evolution comprises the following steps: obtaining the time node of a power user feature vector m, and calculating the absolute value $|t_m|$ of the difference between that time node and the current time node; calculating the weight of the power user feature vector by using a time sequence evolution feature weight calculation formula, and assigning this weight to the corresponding power user feature vector to obtain the power user features based on time sequence evolution, wherein the time sequence evolution feature weight calculation formula is as follows:
Figure BDA0002949688870000051
wherein: $w_m$ is the weight of the power user feature vector m; λ is the attenuation factor. By weighting the power user feature vectors according to time sequence evolution, the invention gives the most recent power user feature vectors higher weights, so that the constructed power user portrait better matches the current characteristics of the user.
Meanwhile, the invention provides a space-based user discovery method for constructing a user social portrait, extracting the short message frequency, the call duration and the call frequency data of a user according to the power user communication tag in the user portrait, and constructing the communication interaction relationship among different power users by using the following formula:
Figure BDA0002949688870000061
wherein: $me_i$ represents the communication weight of power user i toward power user j; $call_i$ is the call frequency from power user i to power user j, and $\sum call_j$ is the call frequency from power user i to all power users; $lon_i$ is the call duration from power user i to power user j, and $\sum lon_j$ is the call duration from power user i to all power users; $mess_i$ is the short message frequency from power user i to power user j, and $\sum mess_j$ is the short message frequency from power user i to all power users. A communication information relation matrix M of the different power users is thereby established:
Figure BDA0002949688870000062
wherein: $c_{i,j}$ is the communication weight $me_i$ with which power user i contacts power user j. Meanwhile, according to the Bluetooth connection status of the power user's mobile phone, the invention divides one day equally into 24 time windows and records the Bluetooth connection status in each time window, marking a window as 1 if a connection is present and as 0 otherwise, and establishes a user movement behavior distribution feature matrix S as follows:
Figure BDA0002949688870000063
wherein: $vec_N$ is the movement behavior vector of power user N, of the form $vec_N = [1, 0, \ldots, 1]$, which indicates the Bluetooth connection status in the different time windows; the similarity of the movement behaviors among different power users is calculated by using the Jaccard similarity coefficient:
Figure BDA0002949688870000064
thereby constructing a moving behavior matrix U of the power consumer:
Figure BDA0002949688870000065
finally, the social relationship weight $Y_i$ of the power users is calculated according to the communication information relation matrix M and the mobile behavior matrix U of the different power users:
$Y_i = \alpha M + (1 - \alpha) U$
wherein: α is the matrix weight, which is set to 0.5; when the social relationship weight of a power user is greater than 0.7, the power user is considered to have higher influence, so that power products are more likely to be recommended to other users.
Drawings
FIG. 1 is a schematic flow chart of a big data-based power consumer representation construction method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a big data-based electrical consumer representation construction system according to an embodiment of the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method comprises the steps of carrying out information mining on power user data by using an information mining algorithm based on a prefix tree to obtain a characteristic vector of a power user, processing the characteristic vector of the power user by using a characteristic vector processing algorithm based on time sequence evolution to obtain power user characteristics based on time sequence evolution, carrying out clustering processing on the power user characteristics by using an improved characteristic clustering algorithm, constructing the clustered power user characteristics into a user portrait, and constructing a user social portrait by using a user discovery method based on a space. Fig. 1 is a schematic diagram illustrating a method for constructing a user profile based on big data according to an embodiment of the present invention.
In this embodiment, the method for constructing the user profile of the power consumer based on big data includes:
and S1, acquiring the power consumer data, and performing information mining on the power consumer data by using an information mining algorithm based on the prefix tree to obtain the characteristic vector of the power consumer.
Firstly, acquiring power consumer data, wherein the power consumer data comprises power consumption time sequence data of a user, power consumption time sequence data of an electric appliance of the user, power consumption time sequence data of an area where the user is located, a consultation text of the user to a power enterprise and the like;
further, the invention utilizes an information mining algorithm based on a prefix tree to mine information of the power consumer data, and the information mining algorithm based on the prefix tree comprises the following processes:
1) for power user data $B = \langle s_1, s_2, \ldots, s_k \rangle$, where $s_i$ is the power event occurring at time $i$ and $k$ is the length of the power user data B, constructing the power user data B into a prefix tree, in which the root node is empty and the power events are placed in child nodes in order of increasing time;
2) adding a power event at the end of the power consumer data B, if the duration of the power consumer data B does not change, the event is called a simultaneous event SI, and if the duration of the power consumer data B adds 1, the event is called a sequential event SE;
3) calculating an event set of the added simultaneous events SI and the sequential events SE:
SI(B)=(u(B)+u(SI)+u(SE))/u(CES)
Figure BDA0002949688870000081
wherein:
u(B) is the utility value of the power user data B, namely the total number of occurrences of the power user data B;
u(SI) is the utility value of the simultaneous event;
u(SE) is the utility value of the sequential event;
u(CES) is the ratio of the utility value of the event itself to the sum of the utility values of the event sequence;
according to the formula, a power event that occurs more often has a higher utility value;
4) ranking the power events in the event set by utility value, so that power events with higher utility values are ranked earlier;
5) for simultaneous events, placing the power events with higher utility values in the node of the corresponding time step in the prefix tree, and for sequential events, placing the power events with higher utility values in the next layer of nodes below the corresponding time step, wherein the events stored in the nodes of the prefix tree are the feature vectors of the power user.
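The prefix tree construction in steps 1) to 5) can be sketched roughly as follows. This is a simplified illustration rather than the patented algorithm: it treats the utility value of an event as its occurrence count (which the text states for u(B)), ignores the SI/SE distinction and the utility formula whose second half is not reproduced in this text, and uses hypothetical names (`TrieNode`, `build_prefix_tree`, `ranked_extensions`) and hypothetical event labels.

```python
class TrieNode:
    """One prefix tree node; `count` is how often this prefix occurred."""

    def __init__(self):
        self.children = {}
        self.count = 0


def build_prefix_tree(sequences):
    """Insert each time-ordered event sequence; the root node stays empty."""
    root = TrieNode()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children.setdefault(event, TrieNode())
            node.count += 1
    return root


def ranked_extensions(root, prefix):
    """Events that extend `prefix`, ranked by descending occurrence count."""
    node = root
    for event in prefix:
        node = node.children.get(event)
        if node is None:
            return []
    return sorted(
        ((event, child.count) for event, child in node.children.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


# Hypothetical event sequences for one power user, ordered by time.
sequences = [
    ["night_peak", "ac_on", "bill_query"],
    ["night_peak", "ac_on"],
    ["night_peak", "heater_on"],
    ["weekend_low", "ac_on"],
]
tree = build_prefix_tree(sequences)
top = ranked_extensions(tree, ["night_peak"])
print(top)
```

Ranking by count mirrors step 4): events that occur more often after a given prefix come first, and ties are broken alphabetically only to keep the output deterministic.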
And S2, processing the feature vector of the power consumer by using a feature vector processing algorithm based on time sequence evolution to obtain the power consumer feature vector based on time sequence evolution.
Furthermore, the characteristic vector of the power consumer is processed by utilizing a characteristic vector processing algorithm based on time sequence evolution, in the prefix tree, the characteristic vector with longer time sequence is closer to the root node in the prefix tree, so that the characteristic vector of the power consumer is processed from the bottom layer of the prefix tree;
the feature vector processing algorithm based on time sequence evolution comprises the following steps:
obtaining the time node of a power user feature vector m, and calculating the absolute value $|t_m|$ of the difference between that time node and the current time node;
Calculating the weight of the power user feature vector by using a time sequence evolution feature weight calculation formula, assigning the time sequence evolution feature weight to the corresponding power user feature vector, and obtaining the power user feature vector based on time sequence evolution, wherein the time sequence evolution feature weight calculation formula is as follows:
Figure BDA0002949688870000082
wherein:
$w_m$ is the weight of the power user feature vector m;
λ is the attenuation factor, which is set to 0.4.
In an embodiment of the invention, the latest power consumer feature vector has a higher weight by performing feature weighting based on time sequence evolution on the power consumer feature vector.
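The weighting step can be illustrated with a small sketch. The weight formula itself is only an image reference in this text, so the code below assumes an exponential decay, $w_m = e^{-\lambda |t_m|}$, which is one common form consistent with the stated behavior (more recent vectors get higher weight) but is not confirmed by the source. The attenuation factor 0.4 is taken from the text; the time unit and the sample vectors are hypothetical.

```python
import math

DECAY = 0.4  # attenuation factor lambda, as stated in the text


def time_weight(feature_time, current_time, decay=DECAY):
    """Assumed form of the weight: exp(-decay * |t_m|)."""
    return math.exp(-decay * abs(current_time - feature_time))


def weight_features(features, current_time, decay=DECAY):
    """Scale each (time, vector) pair by its time-evolution weight."""
    return [
        [time_weight(t, current_time, decay) * x for x in vec]
        for t, vec in features
    ]


# Two identical feature vectors observed at different (hypothetical) times.
features = [(10, [1.0, 2.0]), (7, [1.0, 2.0])]
weighted = weight_features(features, current_time=10)
print(weighted)
```

A vector observed at the current time keeps its full value, while the one observed three time units earlier is scaled by exp(-1.2), roughly 0.30.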
And S3, clustering the power user feature vectors evolved based on the time sequence by using an improved feature vector clustering algorithm to obtain clustered power user features, and constructing a power user portrait according to a clustering result.
Further, the invention utilizes an improved feature vector clustering algorithm to cluster the power user feature vectors evolved based on time sequence, and the improved feature vector clustering algorithm comprises the following processes:
1) for the m initially given time-sequence-evolved power user feature vectors $m_1, m_2, \ldots, m_m$, converting them into a power user feature matrix $X_{m \times n}$ consisting of m n-dimensional vectors, and calculating the covariance matrix of this matrix, $S_{m \times m} = \mathrm{Cov}(X_{m \times n})$;
2) calculating the eigenvalues and eigenvectors of the covariance matrix $S_{m \times m}$, selecting the eigenvectors corresponding to the K largest eigenvalues to form a matrix $W_{n \times K}$, and performing dimensionality reduction on the power user feature matrix by using the following formula:
$Z = X_{m \times n} W_{n \times K}$
wherein:
Z is the power user feature matrix after dimensionality reduction;
3) calculating the distance between every two vectors in Z, storing the results in a matrix $D_{m \times m}$, and calculating the average distance T between any two vectors;
4) finding, from the matrix $D_{m \times m}$, the maximum distance between any two feature vectors, and taking the feature vectors $v_1, v_2$ at the two ends of the maximum distance as initial clustering centers;
5) repeating step 4), and accepting a new clustering center as valid if its distance from the known clustering centers is greater than T, until K clustering centers are obtained, wherein the feature vectors of the clustering centers are the user portrait description. In one specific embodiment of the invention, K is 8, and the keywords corresponding to the feature vectors, such as high night-time power consumption or low weekend power consumption, are taken as the user portrait description.
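Steps 3) to 5) of the center selection can be sketched as below. The sketch assumes the input vectors are already the rows of the reduced matrix Z (the eigen-decomposition of steps 1) and 2) is omitted), uses Euclidean distance, and reads step 5) as "repeatedly take the point farthest from all chosen centers and accept it while that distance exceeds T". That reading, the function names and the sample points are assumptions, since the source does not spell out how each new candidate is chosen.

```python
import math
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def initial_centers(vectors, k):
    """Pick up to k initial clustering centers; return (centers, T)."""
    m = len(vectors)
    # Step 3: pairwise distance matrix D and average pairwise distance T.
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(m)] for i in range(m)]
    pairs = list(combinations(range(m), 2))
    avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
    # Step 4: the two vectors at the ends of the maximum distance.
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    chosen = [first, second]
    # Step 5 (assumed reading): add the farthest remaining point while its
    # distance to every chosen center is greater than T.
    while len(chosen) < k:
        candidates = [i for i in range(m) if i not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda i: min(dist[i][c] for c in chosen))
        if min(dist[best][c] for c in chosen) <= avg:
            break
        chosen.append(best)
    return [vectors[i] for i in chosen], avg


# Hypothetical reduced feature vectors forming three loose groups.
points = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.0, 0.1], [5.0, 8.0]]
centers, threshold = initial_centers(points, k=3)
print(centers)
```

Under this reading the function may return fewer than K centers when no remaining point lies farther than T from the centers already chosen; with the sample data, asking for four centers still yields three.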
S4, constructing the user social portrait by using a space-based user discovery method according to the constructed user portrait.
Further, according to the constructed user portrait, the invention utilizes a space-based user discovery method to construct a user social portrait, wherein the space-based user discovery method comprises the following steps:
1) extracting the short message frequency, call duration and call frequency data of the user according to the power user communication label in the user portrait, and constructing the communication interaction relationship among different power users by using the following formula:
Figure BDA0002949688870000091
wherein:
$me_i$ represents the communication weight of power user i toward power user j;
$call_i$ is the call frequency from power user i to power user j, and $\sum call_j$ is the call frequency from power user i to all power users;
$lon_i$ is the call duration from power user i to power user j, and $\sum lon_j$ is the call duration from power user i to all power users;
$mess_i$ is the short message frequency from power user i to power user j, and $\sum mess_j$ is the short message frequency from power user i to all power users;
2) establishing a communication information relation matrix M of different power users:
Figure BDA0002949688870000101
wherein:
$c_{i,j}$ is the communication weight $me_i$ with which power user i contacts power user j;
3) according to the Bluetooth connection status of the power user's mobile phone, dividing one day equally into 24 time windows and recording the Bluetooth connection status in each time window, marking a window as 1 if a connection is present and as 0 otherwise, and establishing a user movement behavior distribution feature matrix S as follows:
Figure BDA0002949688870000102
wherein:
vec_N is the movement behavior vector of power consumer N, in the form vec_N = [1, 0, ..., 1], which indicates the Bluetooth connection status in the different time windows;
calculating the similarity of the movement behaviors among different power users by using the Jaccard similarity coefficient:
Figure BDA0002949688870000103
further, the invention constructs a mobile behavior matrix U of the power consumer:
Figure BDA0002949688870000104
4) calculating the social relationship weight Y_i of the power users according to the communication information relationship matrix M and the mobile behavior matrix U of different power users:
Y_i = α*M + (1-α)*U
wherein:
α is the matrix weight, which is set to 0.5;
In one embodiment of the invention, when the social relationship weight of a power consumer is greater than 0.7, that power consumer is considered to have higher influence, so that power products are more likely to be recommended to other users.
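Steps 3) and 4) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the communication matrix M is taken as given (its formula appears only as a figure), a Jaccard coefficient of 0 is assumed when two users have no connected window at all, and the combination is applied element by element, since the text does not say how a single per-user value Y_i is read off the resulting matrix.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 movement vectors of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two all-zero vectors are treated as having similarity 0.
    return both / either if either else 0.0


def movement_matrix(S):
    """Mobile behavior matrix U: pairwise Jaccard similarity of the rows of S."""
    return [[jaccard(a, b) for b in S] for a in S]


def social_weights(M, U, alpha=0.5):
    """Element-wise Y = alpha * M + (1 - alpha) * U."""
    return [
        [alpha * m + (1 - alpha) * u for m, u in zip(m_row, u_row)]
        for m_row, u_row in zip(M, U)
    ]


# Three hypothetical users, one 0/1 entry per hourly window (24 per day).
S = [
    [1] * 8 + [0] * 16,   # connected during the first 8 windows
    [1] * 4 + [0] * 20,   # connected during the first 4 windows
    [0] * 16 + [1] * 8,   # connected during the last 8 windows
]
# Hypothetical communication weights; row i holds the weights of user i.
M = [
    [0.0, 0.8, 0.2],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]
U = movement_matrix(S)
Y = social_weights(M, U, alpha=0.5)
# Entries above the 0.7 threshold mentioned in the embodiment.
influential_pairs = [
    (i, j) for i, row in enumerate(Y) for j, y in enumerate(row) if i != j and y > 0.7
]
```

Note that the diagonal of U is 1 for any user with at least one connected window, so a real implementation would probably exclude self-pairs before applying the threshold, as the last line does.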
The following describes embodiments of the present invention through an algorithm experiment that tests the processing method of the invention. The test environment of the algorithm is as follows: the operating system is Ubuntu 16.04, the processor is an Intel i5-8500 CPU @ 3 GHz × 6, the memory size is 16 GB, with tensorflow-gpu version 1.18 and Keras version 2.24; the comparison methods are a power user portrait construction method based on random forests, a power user portrait construction method based on decision trees and a power user portrait construction method based on principal component analysis.
In the algorithm experiment, the data set is the power use data of 5000 power users. In the experiment, the power use data is input into the method of the invention and into the comparison methods, and the accuracy of user portrait construction is used as the index for evaluating the performance of each algorithm.
According to the experimental results, the user portrait construction accuracy of the power user portrait construction method based on random forests is 85.32%, that of the method based on decision trees is 80.65%, that of the method based on principal component analysis is 91.32%, and that of the method of the invention is 94.68%.
The invention further provides a power consumer portrait construction system based on the big data. Fig. 2 is a schematic diagram illustrating an internal structure of a big data-based power consumer representation construction system according to an embodiment of the present invention.
In the present embodiment, the big data based power consumer representation construction system 1 at least comprises a power data acquisition device 11, a power data processor 12, a power consumer representation construction device 13, a communication bus 14, and a network interface 15.
The power data acquisition device 11 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server.
The power data processor 12 includes at least one type of readable storage medium including flash memory, hard disks, multi-media cards, card type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. The power data processor 12 may in some embodiments be an internal storage unit of the big data based power consumer representation construction system 1, such as a hard disk of the big data based power consumer representation construction system 1. The power data processor 12 may also be an external storage device of the big data-based power consumer representation constructing system 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), and the like, provided on the big data-based power consumer representation constructing system 1. Further, the power data processor 12 may also include both an internal storage unit and an external storage device of the power user representation construction system 1 based on big data. The power data processor 12 can be used not only to store application software installed in the power consumer representation construction system 1 based on large data and various types of data, but also to temporarily store data that has been output or is to be output.
Power consumer representation creation means 13 may in some embodiments be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code or processing data stored in power data processor 12, such as power consumer representation creation program instructions.
The communication bus 14 is used to enable connection communication between these components.
The network interface 15 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used for establishing a communication connection between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the big data based power user representation construction system 1 and for displaying a visualized user interface.
While FIG. 2 shows only the big data-based power consumer representation construction system 1 with components 11-15, those skilled in the art will appreciate that the configuration shown in FIG. 2 does not constitute a limitation of the big data-based power consumer representation construction system 1, which may include fewer or more components than shown, a combination of some components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in FIG. 2, the power data processor 12 stores power consumer representation construction program instructions; the steps performed when the power consumer representation construction device 13 executes the program instructions stored in the power data processor 12 are the same as those of the big data-based power consumer portrait construction method, and are not repeated here.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium having stored thereon power user portrait construction program instructions executable by one or more processors to implement the following operations:
acquiring power consumer data, and performing information mining on the power consumer data by using an information mining algorithm based on a prefix tree to obtain a feature vector of a power consumer;
processing the characteristic vector of the power consumer by using a characteristic vector processing algorithm based on time sequence evolution to obtain a power consumer characteristic vector based on time sequence evolution;
clustering the power user characteristic vectors evolved based on the time sequence by using an improved characteristic vector clustering algorithm to obtain clustered power user characteristics, and constructing a power user portrait according to a clustering result;
constructing communication information relation matrixes of different power users according to the constructed user portrait;
and constructing mobile behavior matrixes of different power users, and constructing the social portrait of the power users according to the communication information relation matrixes and the mobile behavior matrixes of the different power users.
It should be noted that the numbering of the above embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A big data-based power user portrait construction method is characterized by comprising the following steps:
acquiring power consumer data, and performing information mining on the power consumer data by using an information mining algorithm based on a prefix tree to obtain a feature vector of a power consumer;
processing the characteristic vector of the power consumer by using a characteristic vector processing algorithm based on time sequence evolution to obtain a power consumer characteristic vector based on time sequence evolution;
clustering the power user characteristic vectors evolved based on the time sequence by using an improved characteristic vector clustering algorithm to obtain clustered power user characteristics, and constructing a power user portrait according to a clustering result;
constructing communication information relation matrixes of different power users according to the constructed user portrait;
and constructing mobile behavior matrixes of different power users, and constructing the social portrait of the power users according to the communication information relation matrixes and the mobile behavior matrixes of the different power users.
2. The big-data-based electric power user portrait construction method of claim 1, wherein the information mining of the electric power user data by using the prefix-tree-based information mining algorithm comprises:
the information mining algorithm based on the prefix tree comprises the following processes:
1) for power consumer data B = <s_1, s_2, ..., s_k>, wherein s_i is the power event occurring at time i and the length of the power consumer data B is k, constructing the power consumer data B into a prefix tree whose root node is empty, with the power data placed in the child nodes in increasing time order;
2) adding a power event at the end of the power consumer data B, if the duration of the power consumer data B does not change, the event is called a simultaneous event SI, and if the duration of the power consumer data B adds 1, the event is called a sequential event SE;
3) calculating an event set of the added simultaneous events SI and the sequential events SE:
SI(B)=(u(B)+u(SI)+u(SE))/u(CES)
Figure FDA0002949688860000011
wherein:
u(B) is the utility value of the power user data B, namely the total occurrence number of the power user data B;
u(SI) is the utility value of a simultaneous event;
u(SE) is the utility value of a sequential event;
u(CES) is the ratio of the event's own utility value to the total sum of the utility values of the event sequence;
4) ranking the power events in the event set by their utility values, wherein power events with higher utility values will be in more advanced positions;
5) and for the simultaneous events, the power events with higher utility values are placed in the corresponding time sequence nodes in the prefix tree, and for the sequence events, the power events with higher utility values are placed in the next layer of nodes of the corresponding time sequence in the prefix tree, wherein the events stored in each node in the prefix tree are the feature vectors of the power user features.
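The data structure underlying steps 1), 4) and 5) can be illustrated with a plain prefix tree in Python. This is a simplified sketch, not the claimed algorithm: it assumes the utility value of an event is simply its occurrence count along a prefix (as stated for u(B)), it does not distinguish simultaneous from sequential events, and it does not evaluate the SI/SE scoring formulas. The names `Node`, `build_prefix_tree` and `ranked_children`, as well as the event labels, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                 # utility: occurrences of this prefix
    children: dict = field(default_factory=dict)   # event label -> Node


def build_prefix_tree(sequences):
    """Insert each time-ordered event sequence below an empty root node."""
    root = Node()
    for sequence in sequences:
        node = root
        for event in sequence:
            node = node.children.setdefault(event, Node())
            node.count += 1
    return root


def ranked_children(node):
    """Children sorted by decreasing utility, ties broken by event label."""
    return sorted(node.children.items(), key=lambda item: (-item[1].count, item[0]))


# Hypothetical event sequences of one power user, in increasing time order.
sequences = [
    ["night_peak", "weekend_low"],
    ["night_peak", "weekday_high"],
    ["night_peak", "weekend_low"],
    ["day_peak"],
]
root = build_prefix_tree(sequences)
top_level = [label for label, _ in ranked_children(root)]
after_night_peak = [label for label, _ in ranked_children(root.children["night_peak"])]
```

Ranking the children of each node by their counts mirrors the idea in step 4) that events with higher utility values take more advanced positions.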
3. The big-data-based electric power user portrait construction method according to claim 2, wherein the processing of the feature vector of the electric power user by using the time-series evolution-based feature vector processing algorithm includes:
the feature vector processing algorithm based on time sequence evolution comprises the following steps:
obtaining the time node of a power consumer feature vector m, and calculating the absolute value |t_m| of the difference between that time node and the current time node;
Calculating the weight of the power user feature vector by using a time sequence evolution feature weight calculation formula, assigning the time sequence evolution feature weight to the corresponding power user feature vector, and obtaining the power user feature vector based on time sequence evolution, wherein the time sequence evolution feature weight calculation formula is as follows:
Figure FDA0002949688860000021
wherein:
w_m is the weight of the power consumer feature vector m;
λ is the attenuation factor, which is set to 0.4.
4. The big-data-based electric power user portrait construction method according to claim 3, wherein the clustering processing of the electric power user feature vectors based on time sequence evolution by using the improved feature vector clustering algorithm includes:
1) for m initially given power user feature vectors based on time sequence evolution, m_1, m_2, ..., m_m, converting the vectors into a power user characteristic matrix X_{m×n} consisting of m n-dimensional vectors, and calculating the covariance matrix of that matrix, S_{m×m} = Cov(X_{m×n});
2) calculating the eigenvalues and eigenvectors of the covariance matrix S_{m×m}, selecting the eigenvectors corresponding to the largest K eigenvalues to form a matrix W_{n×K}, and performing dimensionality reduction processing on the power user characteristic matrix by using the following formula:
Z = X_{m×n} W_{n×K}
wherein:
Z is the power user characteristic matrix after dimensionality reduction;
3) calculating the distance between every two vectors in Z, storing the calculation results in a matrix D_{m×m}, and simultaneously calculating the average distance T between any two vectors;
4) according to the matrix D_{m×m}, finding the maximum distance between any two feature vectors, and taking the feature vectors v_1, v_2 at the two ends of the maximum distance as the initial clustering centers;
5) repeating step 4): if the distance between a new clustering center and the known clustering centers is greater than T, the new clustering center is considered valid, until K clustering centers are obtained, wherein the feature vectors of the clustering centers are the user portrait description.
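The dimensionality reduction in steps 1) and 2) can be sketched with NumPy as follows. This is a hedged illustration rather than the claimed implementation: the function name `reduce_dimensions` is hypothetical, and the covariance is computed over the n feature columns (an n×n matrix), which is an assumption made so that its eigenvectors can form the n×K matrix W used in Z = XW; the text itself writes the covariance as m×m. As in the formula, X is projected without being mean-centered first.

```python
import numpy as np


def reduce_dimensions(X, K):
    """Project the m x n matrix X onto its K leading principal directions.

    Requires n >= 2 feature columns. Returns (Z, W) with Z = X @ W.
    """
    X = np.asarray(X, dtype=float)
    # Covariance between feature columns (n x n); assumption, see lead-in.
    S = np.cov(X, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(S)
    order = np.argsort(eigenvalues)[::-1][:K]   # indices of the K largest eigenvalues
    W = eigenvectors[:, order]                  # n x K
    Z = X @ W                                   # m x K, as in Z = X W
    return Z, W


# Hypothetical feature matrix: 4 users, 2 features, most variance in feature 0.
X = [[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
Z, W = reduce_dimensions(X, 1)
```

The sign of each eigenvector is arbitrary, so the columns of Z are determined only up to sign; the pairwise distances used in step 3) are unaffected by this.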
5. The big data-based electric power user portrait construction method as claimed in claim 4, wherein the construction of the communication information relation matrix of different electric power users comprises:
extracting the user's short message frequency, call duration and call frequency data according to the power user communication label in the user portrait, and constructing the communication interaction relationship among different power users by using the following formula:
Figure FDA0002949688860000031
wherein:
me_i represents the communication weight of power consumer i to power consumer j;
call_i is the call frequency from power consumer i to power consumer j, and Σ call_j is the call frequency from power consumer i to all power consumers;
lon_i is the call duration from power consumer i to power consumer j, and Σ lon_j is the call duration from power consumer i to all power consumers;
mess_i is the short message frequency from power consumer i to power consumer j, and Σ mess_j is the short message frequency from power consumer i to all power consumers;
establishing a communication information relation matrix M of different power users:
Figure FDA0002949688860000032
wherein:
c_{i,j} represents the communication weight me_i with which power consumer i contacts power consumer j.
6. The big data-based electric power user portrait construction method as claimed in claim 5, wherein the construction of the electric power user social portrait according to the communication information relationship matrix and the movement behavior matrix of different electric power users comprises:
according to the Bluetooth connection status of each power user's mobile phone, dividing one day equally into 24 time windows and recording the Bluetooth connection status in each window, a window being marked 1 if Bluetooth is connected and 0 otherwise, and establishing a user movement behavior distribution characteristic matrix, wherein the user movement behavior distribution characteristic matrix S is as follows:
Figure FDA0002949688860000033
wherein:
vec_N is the movement behavior vector of power consumer N, in the form vec_N = [1, 0, ..., 1], which indicates the Bluetooth connection status in the different time windows;
calculating the similarity of the movement behaviors among different power users by using the Jaccard similarity coefficient:
Figure FDA0002949688860000034
constructing a moving behavior matrix U of the power consumer:
Figure FDA0002949688860000041
calculating the social relationship weight Y_i of the power users according to the communication information relationship matrix M and the mobile behavior matrix U of different power users:
Y_i = α*M + (1-α)*U
wherein:
α is the matrix weight, which is set to 0.5.
7. A big data-based power consumer representation construction system, the device comprising:
the power data acquisition device is used for acquiring power user data;
the electric power data processor is used for carrying out information mining on the electric power user data by using an information mining algorithm based on a prefix tree to obtain a characteristic vector of the electric power user; processing the characteristic vector of the power consumer by using a characteristic vector processing algorithm based on time sequence evolution to obtain a power consumer characteristic vector based on time sequence evolution;
the electric power user portrait construction device is used for clustering electric power user feature vectors evolved based on time sequence by using an improved feature vector clustering algorithm to obtain clustered electric power user features, constructing an electric power user portrait according to a clustering result, and constructing a user social portrait according to the constructed user portrait by using a space-based user discovery method.
8. A computer readable storage medium having stored thereon power user representation construction program instructions executable by one or more processors to implement the steps of a method of implementing big data based power user representation construction as claimed in any one of claims 1 to 6.
CN202110204028.7A 2021-02-24 2021-02-24 Power user portrait construction method and device based on big data Withdrawn CN112801207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204028.7A CN112801207A (en) 2021-02-24 2021-02-24 Power user portrait construction method and device based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110204028.7A CN112801207A (en) 2021-02-24 2021-02-24 Power user portrait construction method and device based on big data

Publications (1)

Publication Number Publication Date
CN112801207A true CN112801207A (en) 2021-05-14

Family

ID=75815433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204028.7A Withdrawn CN112801207A (en) 2021-02-24 2021-02-24 Power user portrait construction method and device based on big data

Country Status (1)

Country Link
CN (1) CN112801207A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269660A (en) * 2021-06-08 2021-08-17 建投河北热力有限公司 Heat supply control method and device, electronic equipment and computer readable storage medium
CN113407705A (en) * 2021-06-18 2021-09-17 广东电网有限责任公司广州供电局 Power user portrait generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10671936B2 (en) Method for clustering nodes of a textual network taking into account textual content, computer-readable storage device and system implementing said method
CN110347835B (en) Text clustering method, electronic device and storage medium
CN110866181B (en) Resource recommendation method, device and storage medium
KR102122373B1 (en) Method and apparatus for obtaining user portrait
CN107239993B (en) Matrix decomposition recommendation method and system based on expansion label
WO2019062021A1 (en) Method for pushing loan advertisement in application program, electronic device, and medium
CN106030571A (en) Dynamically modifying elements of user interface based on knowledge graph
CN106250464B (en) Training method and device of ranking model
CN110503459B (en) User credibility assessment method and device based on big data and storage medium
CN105956011B (en) Searching method and device
CN113220734A (en) Course recommendation method and device, computer equipment and storage medium
CN110503506A (en) Item recommendation method, device and medium based on score data
CN107911448A (en) Content pushing method and device
CN105930390A (en) Relation-type database expansion method and relation-type database expansion system
CN112801207A (en) Power user portrait construction method and device based on big data
CN114065750A (en) Commodity information matching and publishing method and device, equipment, medium and product thereof
EP2678809A1 (en) Entity fingerprints
CN106776716A (en) A kind of intelligent Matching marketing consultant and the method and apparatus of user
CN109885834A (en) A kind of prediction technique and device of age of user gender
CN112307352B (en) Content recommendation method, system, device and storage medium
CN110866042A (en) Intelligent table query method and device and computer readable storage medium
CN113849748A (en) Information display method and device, electronic equipment and readable storage medium
CN111858617A (en) User searching method and device, computer readable storage medium and electronic equipment
Yang et al. An adaptive automatic approach to filtering empty images from camera traps using a deep learning model
CN111177547A (en) Scientific and technological achievement searching method and device based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210514
