CN117786224A - User portrait establishment method based on various environment-friendly data sources - Google Patents

Info

Publication number
CN117786224A
Authority
CN
China
Prior art keywords
data
user
real
time
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202311850339.6A
Other languages
Chinese (zh)
Inventor
于振江
邵丹
石立彬
李坤
张斌
郭润洲
田真真
贺治国
宋慧敏
李征
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Xingtai Power Supply Co of State Grid Hebei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Xingtai Power Supply Co of State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Xingtai Power Supply Co of State Grid Hebei Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202311850339.6A priority Critical patent/CN117786224A/en
Publication of CN117786224A publication Critical patent/CN117786224A/en
Withdrawn legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user portrait establishment method based on various environment-friendly data sources, comprising the following steps. Step 1: collect real-time data from different power environmental-protection data sources and integrate the data from these sources into a standardized data model, the data including daily and monthly electricity quantities. Step 2: establish a real-time data processing pipeline, design the data stream processing logic, and use a stream processing engine to pass the real-time data stream to that logic for processing and analysis. Step 3: perform feature engineering on the real-time data stream, extracting key features from the environmental-protection data to describe users' environmental-protection behaviors and characteristics, and construct a user portrait model that updates user portraits in real time. Step 4: establish a real-time user portrait database. Step 5: implement an access control policy so that only authorized personnel can access the user portrait data. The method is suited to application scenarios that require real-time data processing.

Description

User portrait establishment method based on various environment-friendly data sources
Technical Field
The invention relates to the technical field of data processing, in particular to a user portrait establishing method based on various environment-friendly data sources.
Background
A user profile (user portrait) is a detailed description and analysis of a particular user or group of users, intended to help an enterprise better understand its target audience. Such a description typically includes information about the user's characteristics, interests, behavior, needs and preferences. User portraits help businesses and organizations position themselves in their markets, formulate more accurate marketing strategies, optimize products and services, provide personalized user experiences, and improve customer relationship management.
A user portrait can be built on environmental-protection data sources. Its construction need not be limited to traditional data such as basic personal information and consumption habits; it can also include data related to electric power and environmental protection, which helps enterprises better understand users' environmental awareness, behaviors and needs. Building a real-time user portrait therefore requires a system capable of processing real-time data streams.
Disclosure of Invention
In order to solve the problems, the invention provides a user portrait establishing method based on various environment-friendly data sources.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a user portrait establishment method based on various environment-friendly data sources comprises the following steps:
step 1: collecting real-time data from different power environment-friendly data sources, integrating the data of the different power data sources into a standardized data model, and ensuring the consistency and comparability of the data;
step 2: establishing a real-time data processing pipeline, designing data stream processing logic, and transmitting the real-time data stream to the processing logic by using a stream processing engine so as to process and analyze data;
step 3: in the real-time data stream, executing feature engineering, extracting key features from environment-friendly data to describe environment-friendly behaviors and features of users, and constructing a user portrait model to update user portraits in real time;
step 4: establishing a real-time user portrait database to store the characteristics, preferences and environmental protection behaviors of the user, and updating the user portrait database by using the output of the user portrait model;
step 5: an access control policy is implemented to ensure that only authorized personnel can access the user profile data.
Further: the step 1 comprises the following steps:
determining environmental protection data sources to be collected, and identifying the type and format of the data sources;
before integrating the data into the central store, the data is standardized to ensure consistency and comparability of the data;
performing data cleaning operation, including removing repeated data, filling missing values and checking the integrity of the data so as to ensure the quality of the data;
metadata of the data is recorded, including data source, data acquisition time and data type information.
Further: the step 2 comprises the following steps:
installing and configuring an Apache Spark Streaming environment and ensuring that the Spark cluster, comprising a master node and worker nodes, is correctly set up;
introducing the real-time data stream into a Spark Streaming application using a data stream source connector;
writing data processing logic in the Spark Streaming application;
starting the Spark Streaming application, which runs on the real-time data stream and continuously processes arriving data; the data stream is processed at a fixed time interval, thereby realizing real-time processing;
outputting the processed real-time data to different targets, ensuring that the data output meets the requirements of the user portrait;
the real-time processing process comprises the following steps:
extracting electricity meter information from the meter information table in the energy management system, the information comprising meter ID, user ID, meter type, metering point ID and meter coefficient;
extracting electricity meter information from the meter information table in the energy data warehouse and associating it with the meter information table in the energy management system;
extracting electricity meter data, including electricity quantity information, from the daily electricity price table in the energy data warehouse, and screening the data of the last two days;
extracting main metering point information from the metering point information table in the energy management system;
inserting the extracted data into a temporary table, the temporary table comprising meter ID, main metering point information and meter reading time;
meanwhile, extracting the collected meter base readings from the daily electric energy meter reading data table in the energy management system, associating them with the metering point information table and the user information table in the energy management system, and screening the data of the last two days;
creating a table for storing daily electricity quantity statistics and storing the calculated daily electricity quantity data;
calculating the difference between a meter's readings on two consecutive days and multiplying it by the meter's conversion coefficient to obtain each user's daily total electricity quantity and peak, valley and flat electricity quantities;
checking whether the current day's data exists in the meter base-reading data table, and if not, executing the following steps;
backing up the current day's data held in the meter base-reading data table;
deleting the current day's data from the meter base-reading data table;
updating the statistics in the summary result table that stores enterprise daily power data, calculating each user's daily electricity consumption and peak, valley and flat electricity quantities, and updating the corresponding proportions;
deleting the current day's data from the enterprise daily power data summary result table;
inserting the recalculated daily total electricity quantity and peak, valley and flat electricity quantities of each user;
updating the enterprise daily power data summary result table;
deleting the current day's data from the enterprise daily power data summary result table;
inserting the daily total electricity quantity and peak, valley and flat electricity quantities of each enterprise calculated according to the model;
updating the enterprise daily power data summary result table;
outputting the processed real-time data to different targets, ensuring that the data output meets the requirements of the user portrait.
Further: the step 3 comprises the following steps:
feature engineering is performed on the real-time data stream to extract key features describing the user's environmental protection behavior and characteristics, the feature extraction using the following formula:
$$\mathrm{Feature}_i = f(x_i)$$
where $x_i$ represents the feature value and $f(x_i)$ is the feature transfer function;
constructing a user portrait model by using an online learning algorithm to predict the environmental protection behavior or characteristics of a user from new data points;
initializing parameters of a linear regression model, including weights and intercepts;
as new data points arrive, the online learning algorithm is executed to update the model parameters; in online linear regression the update rule uses gradient descent, and the parameter update formula is:
$$\theta^{(t+1)} = \theta^{(t)} + \alpha \left( y^{(t)} - h(x^{(t)}) \right) x^{(t)}$$
where $\theta^{(t+1)}$ is the parameter at time step $t+1$, $\theta^{(t)}$ is the parameter at time step $t$, $\alpha$ is the learning rate, $y^{(t)}$ is the actual output, and $h(x^{(t)})$ is the model's prediction for input $x^{(t)}$;
as new data points continue to arrive, the model parameters are updated in real time by the online learning algorithm;
the output of the model, that is, the prediction results, is used to update the user portrait database in real time.
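The online update described in step 3 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the standard least-mean-squares form of the gradient step, in which each parameter moves by the learning rate times the prediction error times the corresponding input. The class name `OnlineLinearRegression`, the learning rate and the toy data are all hypothetical.

```python
class OnlineLinearRegression:
    """Linear model updated one data point at a time (assumed LMS rule)."""

    def __init__(self, n_features, learning_rate=0.01):
        # Initialize the weights and the intercept, as the text describes.
        self.weights = [0.0] * n_features
        self.intercept = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """h(x): the model's prediction for input x."""
        return self.intercept + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Apply one gradient step for the new data point (x, y)."""
        error = y - self.predict(x)  # y(t) - h(x(t))
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.intercept += self.learning_rate * error
        return error


# Toy stream generated by y = 2x + 1 (illustrative data only).
model = OnlineLinearRegression(n_features=1, learning_rate=0.05)
stream = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
for _ in range(2000):
    for x, y in stream:
        model.update(x, y)
```

On this noise-free toy stream the parameters approach a weight of 2 and an intercept of 1; with real meter data the learning rate would need tuning and the inputs would normally be scaled first.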
Further: the step 4 comprises the following steps:
establishing a user portrait database, wherein the database comprises user identification, characteristics, preferences and environmental protection behavior fields;
an interface for querying and updating the user portraits is implemented;
outputting a user portrait model;
when a new data point arrives and is processed by the model, the output of the model is used to update the portrait database record of the corresponding user, and the update logic uses the following formulas:
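$$\mathrm{Characteristics}^{(t+1)} = \mathrm{Characteristics}^{(t)} + \alpha \times \Delta\mathrm{Characteristics}$$
$$\mathrm{Preferences}^{(t+1)} = \mathrm{Preferences}^{(t)} + \alpha \times \Delta\mathrm{Preferences}$$
$$\mathrm{EnvironmentalBehavior}^{(t+1)} = \mathrm{EnvironmentalBehavior}^{(t)} + \alpha \times \Delta\mathrm{EnvironmentalBehavior}$$
where $\mathrm{Characteristics}^{(t+1)}$ represents the user characteristics updated at time $t+1$, $\alpha$ is the update rate parameter, and $\Delta\mathrm{Characteristics}$ represents the change in user characteristics output by the model; the preference and environmental behavior terms are defined analogously.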
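The incremental update of step 4 can be sketched as below, assuming each portrait field is stored as a single number and the model outputs one delta per field. The in-memory dictionary `profile_db`, the field names and the value of alpha are hypothetical stand-ins for the portrait database and its parameters.

```python
def update_profile(profile, deltas, alpha=0.1):
    """Return a copy of profile with field(t+1) = field(t) + alpha * delta."""
    updated = dict(profile)
    for field, delta in deltas.items():
        updated[field] = profile.get(field, 0.0) + alpha * delta
    return updated


# Hypothetical portrait record and model output for user "U1".
profile_db = {
    "U1": {"characteristics": 0.5, "preferences": 0.2, "environmental_behavior": 0.8}
}
model_output = {"characteristics": 1.0, "preferences": -0.5, "environmental_behavior": 0.0}
profile_db["U1"] = update_profile(profile_db["U1"], model_output, alpha=0.1)
```

A small alpha makes the portrait change slowly and smooths out noisy model outputs, while a larger alpha tracks recent behavior more closely.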
Further: the step 5 comprises the following steps:
a data transmission encryption technology is used to ensure the security in the data transmission process;
encrypting the sensitive field to ensure that only authorized users have access to the plaintext data;
implementing an access control policy to ensure that only authorized personnel can access the user profile data;
requiring the user to provide valid credentials to verify his identity;
allocating proper roles and authorities for different users to limit the access scope;
for user data that need not be exposed in full, data desensitization (masking) is performed to reduce privacy risks;
a monitoring and auditing mechanism is established to track access and use of user profile data.
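The role-based access, desensitization and audit measures of step 5 might look like the following sketch. It uses only the Python standard library, so transport and field encryption are left out because they would need a TLS layer or a dedicated cryptography library. The role table, the demo key and the field names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical role-to-field permissions; the patent does not define roles.
ROLE_FIELDS = {
    "analyst": {"user_id", "name", "environmental_behavior"},
    "admin": {"user_id", "name", "environmental_behavior"},
}
SECRET_KEY = b"demo-key-change-me"  # placeholder key for illustration only
audit_log = []


def pseudonymize(value):
    """Replace an identifier with a keyed hash so it cannot be read directly."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_name(name):
    """Keep the first character and mask the rest."""
    return name[0] + "*" * (len(name) - 1) if name else name


def read_profile(profile, role):
    """Return the view of a profile permitted for role, recording the access."""
    if role not in ROLE_FIELDS:
        audit_log.append((role, profile["user_id"], "denied"))
        raise PermissionError(f"role {role!r} is not authorized")
    view = {k: v for k, v in profile.items() if k in ROLE_FIELDS[role]}
    if role != "admin":
        # Non-admin roles only see desensitized identifiers.
        view["user_id"] = pseudonymize(view["user_id"])
        view["name"] = mask_name(view["name"])
    audit_log.append((role, profile["user_id"], "granted"))
    return view


profile = {"user_id": "U1", "name": "Zhang San", "environmental_behavior": 0.8}
analyst_view = read_profile(profile, "analyst")
admin_view = read_profile(profile, "admin")
```

Every call appends to `audit_log`, including denied ones, which gives a simple trace of who accessed which portrait.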
Compared with the prior art, the invention has the following technical progress:
aiming at the problem of real-time data processing, the method allows the construction of the real-time user portrait, ensures that the user portrait data is always in the latest state, and can capture the changes of user behaviors and environmental protection data more quickly compared with the traditional batch processing method. Because the data is updated in real time, the user portrait is more accurate and is closer to the current behavior and environmental protection characteristics of the user, which means that personalized advice and service are more accurate, thereby improving the satisfaction degree of the user. The method supports real-time decision making, so that enterprises can take urgent measures according to the latest user behaviors and environmental protection data, and the efficiency and effect of environmental protection decision making are improved. By providing real-time user portrayal data, the interaction between the user and the service provider is enhanced, the user can view his personal data, get real-time suggestions, and provide feedback, which promotes the user's active participation and interaction.
The method makes the data feedback loop faster and more effective: feedback provided by users is quickly reflected in the user portrait, improving data quality and user satisfaction. Real-time data processing allows enterprises to monitor environmental data promptly against compliance and regulatory requirements, ensuring adherence to laws and regulations. Enterprises with real-time user portraits gain a competitive advantage because they can provide more innovative, real-time environmental solutions that meet customer needs.
In a word, compared with the traditional batch processing method, the method is more suitable for application scenes in which the problem of real-time data processing is required to be solved, and emphasizes real-time performance, individuation, user interaction and decision support, so that the method has obvious advantages in aspects of user satisfaction, data quality, competitive advantage and the like.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
In the drawings:
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
As shown in FIG. 1, the invention discloses a user portrait establishment method based on various environment-friendly data sources, which comprises the following steps:
Step 1: Data collection and integration
Real-time data, such as sensor data, social media data and energy consumption data, is collected from different environmental-protection data sources by connecting to them with data collectors or APIs. The data is aggregated into a central data store, such as a data lake or data warehouse, and the data from the different sources is integrated into a standardized data model to ensure consistency and comparability.
Step 2: Real-time data stream processing
A real-time data processing pipeline is built using stream processing techniques. Data stream processing logic is designed, including data cleansing, real-time feature extraction and data conversion, and a stream processing engine passes the real-time data stream to this logic to process and analyze the data.
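The micro-batch idea behind such a pipeline can be sketched in plain Python. This is a simplified stand-in for a stream processing engine, not Spark Streaming code: batches are formed by event count rather than by a time interval, and the event fields `user_id` and `kwh` are hypothetical.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Group an iterator of events into consecutive fixed-size batches."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Clean one batch, then aggregate consumption per user."""
    # Cleaning: drop events with a missing or negative reading.
    valid = [e for e in batch if e.get("kwh") is not None and e["kwh"] >= 0]
    # Feature extraction and conversion: total kWh per user in this batch.
    totals = {}
    for e in valid:
        totals[e["user_id"]] = totals.get(e["user_id"], 0.0) + e["kwh"]
    return totals


events = [
    {"user_id": "U1", "kwh": 1.0},
    {"user_id": "U2", "kwh": 2.0},
    {"user_id": "U1", "kwh": None},
    {"user_id": "U1", "kwh": 3.0},
    {"user_id": "U2", "kwh": -5.0},
]
results = [process_batch(b) for b in micro_batches(events, batch_size=2)]
```

In a real deployment the list `events` would be replaced by a connector to the message source, and the engine would schedule the batches on a fixed time interval.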
Step 3: feature engineering and model training
Feature engineering is performed on the real-time data stream to extract key features from the environmental-protection data that describe the user's environmental-protection behavior and characteristics. Machine learning, deep learning or statistical analysis techniques are used to construct a user portrait model that can update the user portrait in real time, and the model is trained with suitable algorithms, such as online learning algorithms, to handle continually arriving new data.
Step 4: Real-time user portrait update
A real-time user portrait database is built to store the characteristics, preferences and environmental-protection behavior of the user, and it is updated using the output of the user portrait model. The database should support fast read and write operations so that the real-time user portrait can be queried and updated.
Step 5: Data security and privacy protection
When real-time data is processed, the security and privacy of the data must be ensured: data encryption protects the data in transit and at rest, and an access control policy ensures that only authorized personnel can access the user portrait data.
The method uses real-time data stream processing so that user portraits are updated in real time to reflect the user's latest environmental-protection behavior and characteristics. Such real-time user portraits help enterprises better understand users, provide personalized suggestions, promote environmentally friendly behavior and meet user needs.
Specifically, step 1 includes:
Data source identification: first, the environmental-protection data sources to be collected, such as sensors, social media platforms and energy monitoring systems, are determined, and the type and format of each source are identified.
Data connection and acquisition: a connection is established with each data source using an appropriate data collector or API; real-time data may be obtained using a programming language (e.g., Python, Java) or a specialized data collection tool. For example, social media data may be acquired through the Weibo API, and sensor data may be acquired by connecting sensors to an Internet of Things platform.
Data transmission: the acquired data is transferred to a central data storage location, which may be a data lake, a data warehouse or a real-time data streaming platform. The choice depends on the system requirements; for large-scale data, a real-time data streaming platform is generally better suited to real-time processing.
Data format normalization: different data sources may use different data formats and structures, so before the data is integrated into the central store it is standardized to ensure consistency and comparability, including conversion into a common data format such as JSON or Parquet.
Data cleaning and verification: data cleaning operations are performed, including removing duplicate data, filling in missing values and checking data integrity, to ensure data quality; data verification rules and logic are used to detect anomalies.
Data storage: the cleaned and standardized data is stored in an appropriate database or storage engine, such as Hadoop HDFS, Amazon S3 or a relational database.
Metadata recording: metadata is recorded, including the data source, acquisition time and data type, which aids data management and tracing the origin of the data.
Real-time requirements: the real-time requirements of the data are taken into account; if a real-time user portrait is required, data transmission and processing must be efficient and fast to minimize latency.
Through the steps, a data collection and integration pipeline can be established, and real-time data from different environment-friendly data sources are integrated into a central storage, so that subsequent real-time data processing and user portrait establishment are facilitated, the consistency, quality and usability of the data are ensured, and the accuracy and the instantaneity of the user portrait are supported.
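The normalization, cleaning and metadata steps above can be sketched as follows. The two source names, their raw field names and the default fill value are assumptions made for illustration; the patent does not specify source schemas.

```python
from datetime import datetime, timezone

# Hypothetical mappings from each source's raw field names to the common model.
FIELD_MAPS = {
    "energy_monitor": {"uid": "user_id", "kwh": "daily_kwh", "day": "date"},
    "sensor_platform": {"user": "user_id", "energy": "daily_kwh", "date": "date"},
}


def standardize(record, source):
    """Rename source-specific fields to the common model and attach metadata."""
    mapping = FIELD_MAPS[source]
    out = {std: record.get(raw) for raw, std in mapping.items()}
    out["_meta"] = {
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "data_type": "electricity_quantity",
    }
    return out


def clean(records, default_kwh=0.0):
    """Check key fields, drop duplicates per (user, date), fill missing values."""
    seen, cleaned = set(), []
    for rec in records:
        if rec["user_id"] is None or rec["date"] is None:
            continue  # integrity check: key fields must be present
        key = (rec["user_id"], rec["date"])
        if key in seen:
            continue  # duplicate record, keep the first occurrence
        seen.add(key)
        if rec["daily_kwh"] is None:
            rec["daily_kwh"] = default_kwh  # fill the missing value
        cleaned.append(rec)
    return cleaned


raw = [
    ("energy_monitor", {"uid": "U1", "kwh": 12.5, "day": "2023-12-01"}),
    ("sensor_platform", {"user": "U1", "energy": 12.5, "date": "2023-12-01"}),
    ("sensor_platform", {"user": "U2", "energy": None, "date": "2023-12-01"}),
]
unified = clean([standardize(rec, src) for src, rec in raw])
```

Filling a missing reading with a fixed default is the simplest choice; interpolating from neighboring days may be more appropriate for meter data.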
Specifically, step 2 includes:
A real-time data processing pipeline is established using Apache Spark Streaming:
Environment setup: first, the Apache Spark Streaming environment is installed and configured, ensuring that the Spark cluster, including the master node and worker nodes, is properly set up; an appropriate data source connector, such as a Kafka connector or a custom connector, is also selected.
Accessing the data stream source: the real-time data stream is introduced into the Spark Streaming application using a suitable data stream source connector, such as Kafka, Flume or a custom streaming data source; the source may be the integrated data store from the previous step.
Real-time data processing logic: the data processing logic is written in the Spark Streaming application, and the real-time processing comprises the following steps:
Step 2.1: Data extraction
Electricity meter information is extracted from the meter information table in the energy management system (SGPM). This table records basic information about each meter, including but not limited to the meter ID, user ID (ons_id), meter type, metering point ID (mp_id) and meter coefficient (t_factor). In a power system, meter information is key data for managing electricity consumption; it is used to track and record users' electricity consumption and to support electric energy metering and data analysis.
Meter information is also extracted from the meter information table in the energy data warehouse (DWH) and associated with the meter information table in the energy management system. This table stores information about the meters held in the energy data warehouse, which is commonly used to centrally store, manage and analyze electric energy data in support of power system operation and decision making. The DWH meter information table may contain the identification information, technical parameters and installation locations of the meters, so that the system can manage, monitor and analyze electric energy more effectively.
Meter data is extracted from the daily electricity price table in the energy data warehouse (DWH). This table provides daily electricity price information and is typically used to support electricity billing, cost analysis and the assessment of consumers' electricity usage behavior. It contains the following information:
Date (time): the date of the record.
Meter ID (meter_id): the unique identifier of the associated meter.
Electricity values for different time periods, such as the peak, flat and valley periods.
Through this table, a user can see the electricity price conditions for the different time periods of each day, which facilitates reasonable electricity use planning and cost control. The information is an important reference both for the economic dispatch of the power system and for the analysis of consumers' electricity usage behavior.
Main metering point information is extracted from the metering point information table in the energy management system (SGPM). This table records information about the metering points in the system, including but not limited to the metering point ID (mp_id), metering point name and metering point level (mp_level).
In a power system, a metering point is a specific location or device at which electric energy is measured. Metering points are used to monitor electric energy usage, collect data, and support energy metering and billing. Their information is stored in the metering point information table of the energy management system (SGPM), providing the basic data with which the system manages and analyzes electric energy usage.
Step 2.2: Insert into temporary table
The extracted data is inserted into a temporary table that holds meter reading data and the intermediate results of the collected meter base readings. The main purpose of this table is to store intermediate results during data processing so that subsequent steps can use them for statistics and analysis.
The fields contained in the table are:
User ID (ons_id): the unique identifier of the user to which the meter belongs.
Meter reading date (recode_date): the date of the meter reading.
Electric energy parameter values (pap1_value, pap2_value, etc.): values of various electric energy parameters, which may be used to calculate statistics such as electricity quantity.
Meter ID (meter_id): the unique identifier of the meter.
Conversion coefficient (t_factor): the coefficient used to convert the electric energy parameter value into the actual electricity quantity.
Data source system flag (from_sys): identifies the system from which the data originates.
Meanwhile, the collected meter base readings are extracted from the daily electric energy meter reading data table in the energy management system, associated with the metering point information table and the gy_ons table in the SGPM, and filtered to the data of the last two days. This table stores the daily electricity usage data of each meter, including power and electricity quantity information.
The fields included are:
User ID (ons_id): the unique identifier of the user to which the meter belongs.
Data date (data_date): the date of the meter reading.
Electricity quantity parameter values (pap_r, pap_r1, etc.): the electricity values for the different time periods of the day.
Data identification (id): the unique identifier of the data record.
Conversion coefficient (t_factor): the coefficient used to convert the electricity quantity parameter value into the actual electricity quantity.
Through this table, the system can obtain the daily electricity consumption of each meter, providing basic data for energy metering, cost calculation and electricity usage analysis.
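The two operations in this step, screening the last two days and associating readings with meter information, can be sketched in plain Python. The sketch assumes that "the last two days" means the current day and the day before, and that the association is an inner join on `meter_id`; both are interpretations, and the sample records are invented.

```python
from datetime import date, timedelta


def last_two_days(readings, today):
    """Keep readings dated today or yesterday (assumed meaning of 'last two days')."""
    cutoff = today - timedelta(days=1)
    return [r for r in readings if cutoff <= r["data_date"] <= today]


def attach_meter_info(readings, meters):
    """Inner-join readings to meter records on meter_id, adding ons_id and t_factor."""
    by_id = {m["meter_id"]: m for m in meters}
    joined = []
    for r in readings:
        meter = by_id.get(r["meter_id"])
        if meter is None:
            continue  # no matching meter record: drop, as an inner join would
        joined.append({**r, "ons_id": meter["ons_id"], "t_factor": meter["t_factor"]})
    return joined


# Invented sample data for illustration.
meters = [{"meter_id": "M1", "ons_id": "U1", "t_factor": 40}]
readings = [
    {"meter_id": "M1", "data_date": date(2023, 11, 29), "pap_r": 990.0},
    {"meter_id": "M1", "data_date": date(2023, 11, 30), "pap_r": 1000.0},
    {"meter_id": "M1", "data_date": date(2023, 12, 1), "pap_r": 1012.5},
    {"meter_id": "M9", "data_date": date(2023, 12, 1), "pap_r": 5.0},
]
recent = attach_meter_info(last_two_days(readings, date(2023, 12, 1)), meters)
```

Two consecutive days are the minimum needed to compute a daily difference between base readings, which is presumably why the screening window has this length.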
Step 2.3: statistics of daily electrical quantity data
A table named daily electricity amount statistics is created for storing the counted daily electricity amount data. This table is mainly used for recording daily power statistics of each user, including total power (t_pq), PEAK power (peak_pq), FLAT power (flat_pq), gu Dianliang (valley_pq), etc.
The fields included are:
user ID (conc_id): an identifier uniquely identifying the user.
User number (CONS_NO): a unique number in the power system for identifying the user.
Statistical DATE (record_date): the date of the statistics was recorded.
Total charge (t_pq): the total daily power consumption of the user.
PEAK charge (peak_pq): the peak hours of the user daily use electricity.
FLAT charge (flag_pq): the user uses electricity daily at ordinary times.
Gu Dianliang (valley_pq): the user uses electricity at the valley time of each day.
Data source system (from_sys): a system for identifying a source of data.
Through the meter, the system can conveniently inquire and analyze the daily electric quantity condition of each user, and support the monitoring of electricity consumption behavior and the electric energy management decision. These statistics have important reference values for both power system operation and customer service.
And calculating the data difference of the electric meter before and after the day, and multiplying the data difference by the conversion coefficient of the electric meter to obtain the statistical information such as the total electric quantity, the peak Gu Pingdian quantity and the like of each user every day.
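As a worked illustration of this calculation, the sketch below takes two consecutive base readings and a conversion coefficient and returns the daily quantities. The dictionary keys, the sample readings and the coefficient of 40 are invented for the example.

```python
def daily_quantities(prev_reading, curr_reading, t_factor):
    """Daily quantity = (today's base reading - yesterday's) * conversion coefficient."""
    result = {}
    for key in ("total", "peak", "flat", "valley"):
        result[key] = round((curr_reading[key] - prev_reading[key]) * t_factor, 4)
    return result


# Invented base readings for one meter on two consecutive days.
yesterday = {"total": 1000.0, "peak": 400.0, "flat": 350.0, "valley": 250.0}
today = {"total": 1012.5, "peak": 405.0, "flat": 354.5, "valley": 253.0}
usage = daily_quantities(yesterday, today, t_factor=40)
```

Here the total reading rises by 12.5 units, so with a coefficient of 40 the daily total is 500, and the peak, flat and valley quantities (200, 180 and 120) sum to that total.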
Step 2.4: rejecting abnormal data and data supplementation
Checking whether data in a table base data table exists on the same day, and if not, executing the following steps:
backing up the data of the current day in the table bottom data table to the table bottom data table;
deleting the data in the table base data table of the current day;
and updating statistical information in a summary result table for storing the daily power data of the enterprise, calculating daily power consumption, peak Gu Pingdian amount and the like of each user, and updating corresponding proportion.
The meter base-reading data table mainly stores the power data in enterprise electricity consumption information. The fields that this table may contain, and their meanings, are as follows:
User ID (ID): the unique identifier of the enterprise user.
Account number (ons_no): the unique identification number of the enterprise user in the power system.
Data time (data_time): the time at which the power data was recorded.
Forward active power indication (pap_e): the indicated value of forward active power.
Forward reactive power indication (pap_a): the indicated value of forward reactive power.
Reverse reactive power indication (pap_b): the indicated value of reverse reactive power.
Conversion coefficient (t_factor): the coefficient used to convert the indicated energy value into actual power.
Data source date (ds): the date of the data source.
Through this table, the system can store and manage the power indication values of enterprise users at each time point, providing basic data for energy metering, analysis and monitoring.
The enterprise daily power data summary result table contains the following fields:
Enterprise name (ons_name): the name or identifier of the enterprise user.
Statistics date (ds): the date of the summary statistics.
Data time (data_time): the specific statistical time point.
Total active power (pap_e): the total daily active power of the enterprise.
Data source system (from_sys): identifies the system from which the data originates.
By summarizing enterprise daily power data, this table provides total active power data for each time point. The summary helps the system monitor the overall electricity load of the enterprise and supports load management and statistical analysis.
Step 2.5: Aggregating data by enterprise
Delete the current day's data from the summary result table that stores enterprise daily power data.
Insert the recalculated statistics for each user, such as total daily electricity consumption and peak, valley, and flat period consumption.
Update the summary result table that stores enterprise daily power data.
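The delete-then-insert pattern described in this step makes the daily refresh repeatable: running it twice for the same date leaves one set of rows, not two. A minimal sketch follows, assuming a table named `daily_summary` with a subset of the listed fields; the function name and sample values are hypothetical.

```python
import sqlite3


def refresh_daily_summary(conn, ds, rows):
    """Replace all summary rows for date `ds` with freshly computed ones."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM daily_summary WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_summary (ons_name, ds, pap_e) VALUES (?, ?, ?)",
            [(name, ds, energy) for name, energy in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_summary (ons_name TEXT, ds TEXT, pap_e REAL)")

# First run, then a re-run with a corrected value for the same day.
refresh_daily_summary(conn, "2023-12-27", [("Enterprise A", 500.0), ("Enterprise B", 320.0)])
refresh_daily_summary(conn, "2023-12-27", [("Enterprise A", 510.0), ("Enterprise B", 320.0)])

result = conn.execute(
    "SELECT ons_name, pap_e FROM daily_summary WHERE ds = ? ORDER BY ons_name",
    ("2023-12-27",),
).fetchall()
```

Wrapping both statements in one transaction means a failed insert does not leave the day's rows deleted.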
Step 2.6: Building base data according to the model
Delete the current day's data from the summary result table that stores enterprise daily power data.
Insert the statistics calculated according to the model for each enterprise, such as total daily electricity consumption and peak, valley, and flat period consumption.
Update the summary result table that stores enterprise daily power data.
Through the above steps, the code comprehensively processes the electricity consumption information, including data extraction, statistics, and cleaning, and builds summary data tables of different dimensions.
The summary result table that stores enterprise daily power data contains the following fields:
Enterprise name (ons_name): the name or identifier of the enterprise user.
Statistical date (ds): the date for which the statistics are summarized.
Total active power (pap_e): total daily active power of the enterprise.
The table provides daily total active power data by summarizing the daily power data of the enterprise, and the summarization is helpful for the system to monitor the overall condition of the power load of the enterprise and support power load management and statistical analysis.
Data stream processing engine: the Spark Streaming application is started; it runs on the real-time data stream and continuously processes arriving data. Spark Streaming provides a micro-batching mechanism that processes the data stream at fixed time intervals (e.g., every second), thereby achieving real-time processing.
Real-time data output: the processed real-time data can be output to different targets, such as the user portrait database, a real-time data dashboard, a visualization tool, or other applications, ensuring that the data output meets the requirements of the user portrait.
With Apache Spark Streaming, such sliding-window calculations can be easily implemented to process the real-time data stream and extract information useful for constructing user portraits.
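To make the idea of micro-batches and sliding windows concrete, the following plain-Python stand-in mimics the behavior without using the Spark API. Each inner list plays the role of one micro-batch; the function name and parameters are hypothetical and chosen for the example.

```python
from collections import deque


def sliding_window_sums(batches, window_len, slide):
    """Yield the sum over the last `window_len` micro-batches, every `slide` batches."""
    window = deque(maxlen=window_len)  # oldest batch total drops out automatically
    for i, batch in enumerate(batches, start=1):
        window.append(sum(batch))
        if i % slide == 0:
            yield sum(window)


# Five micro-batches of energy readings (illustrative values).
batches = [[1.0, 2.0], [3.0], [4.0, 1.0], [2.0], [5.0]]
window_totals = list(sliding_window_sums(batches, window_len=3, slide=1))
```

The batch totals are 3, 3, 5, 2 and 5, so a window of three batches sliding by one yields 3, 6, 11, 10 and 12. In a real deployment the window length and slide interval would be expressed as durations that are multiples of the batch interval.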
Specifically, step 3 includes:
feature extraction: feature engineering is performed on the real-time data stream to extract key features describing the user's environmental protection behavior and characteristics. These may include the user's energy consumption, social media activity, participation in environmental protection activities, and so on. Feature extraction uses the following formula:
$\mathrm{Feature}_i = f(x_i)$
where $x_i$ represents the feature value and $f$ may be any of various feature transformation functions, such as normalization or feature scaling.
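Two common choices for the transformation function are min-max scaling and z-score standardization. The sketch below shows both under the assumption that a feature is a list of numeric values; the function names and sample data are illustrative.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale values linearly into [0, 1]; a constant feature maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


daily_kwh = [100.0, 150.0, 200.0]
scaled = min_max(daily_kwh)
standardized = z_score(daily_kwh)
```

In a streaming setting the minimum, maximum, mean, and deviation would have to be maintained incrementally or taken from a reference period, since the full value list is not available at once.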
Online learning algorithm selection: an appropriate online learning algorithm is selected to construct the user portrait model. Online linear regression can be used to build the model, predicting the user's environmental protection behavior or characteristics from new data points.
Model initialization: the parameters of the linear regression model are initialized, including the weights (coefficients) and the intercept.
Online learning: as new data points arrive, the online learning algorithm is executed to update the model parameters. In online linear regression, the update rule may use gradient descent.
Model parameter updating formula:
where $y^{(t)}$ is the actual output and $h(x^{(t)})$ is the model's prediction for the input $x^{(t)}$.
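The update formula itself is not reproduced in this text. The sketch below therefore assumes the standard least-mean-squares rule for online linear regression, $\theta^{(t+1)} = \theta^{(t)} + \alpha\,(y^{(t)} - h(x^{(t)}))\,x^{(t)}$, which is consistent with the quantities defined here but may differ from the formula in the original filing. The symbol $\theta$, the class name, and the sample stream are assumptions.

```python
class OnlineLinearRegression:
    """Linear model updated one sample at a time by stochastic gradient descent."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features  # model initialization: weights
        self.intercept = 0.0               # model initialization: intercept
        self.learning_rate = learning_rate

    def predict(self, x):
        """h(x): intercept plus the weighted sum of the features."""
        return self.intercept + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Apply one gradient step on the squared error for the sample (x, y)."""
        error = y - self.predict(x)
        self.weights = [
            w + self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.intercept += self.learning_rate * error
        return error


# Noise-free stream generated by y = 2x + 1, replayed many times.
model = OnlineLinearRegression(n_features=1, learning_rate=0.05)
stream = [([x], 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)] * 500
for x, y in stream:
    model.update(x, y)
```

On this noise-free stream the weight approaches 2 and the intercept approaches 1. With real, noisy data the learning rate would normally be smaller or decayed over time.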
Updating the real-time model: as new data points continue to arrive, the model parameters are updated in real time according to the online learning algorithm. This enables the user portrait model to adapt promptly to new data and user behavior.
Updating the user portrait database: the output of the model may be used to update the user portrait database in real time. For example, the model may predict the user's energy consumption or environmental protection behavior, and these predictions are then written to the user portrait database.
Through this step, a real-time user portrait model can be built that is updated as new data points arrive in the real-time data stream, so that it better describes the user's environmental protection behavior and characteristics. The online learning algorithm allows the model to remain efficient and up to date while processing continuously arriving data.
Specifically, step 4 includes:
database model: a user portrait database is created, which may use a relational database (e.g., MySQL) or a NoSQL database (e.g., MongoDB). The database should include fields for user identification, characteristics, preferences, environmental protection behavior, etc. The following is an example database model:
User ID (UserID)
User characteristics (Characteristics)
User preferences (Preferences)
Environmental protection behavior (EnvironmentalBehavior)
User portrayal query and update interface: interfaces for querying and updating user portraits are implemented, which should be able to support fast read and write operations to meet the needs of real-time user portraits.
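A query and update interface over the example model might look like the following sketch. The four field names come from the text; the table name `user_portrait`, the `PortraitStore` class, JSON-encoded columns, and the use of SQLite in place of MySQL or MongoDB are assumptions made to keep the example self-contained.

```python
import json
import sqlite3


class PortraitStore:
    """Minimal read/write interface keyed by UserID."""

    FIELDS = ("Characteristics", "Preferences", "EnvironmentalBehavior")

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            """CREATE TABLE IF NOT EXISTS user_portrait (
                   UserID TEXT PRIMARY KEY,
                   Characteristics TEXT NOT NULL,
                   Preferences TEXT NOT NULL,
                   EnvironmentalBehavior TEXT NOT NULL)"""
        )

    def upsert(self, user_id, characteristics, preferences, behavior):
        """Insert the portrait, or overwrite it if the user already has one."""
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO user_portrait VALUES (?, ?, ?, ?)",
                (user_id, json.dumps(characteristics),
                 json.dumps(preferences), json.dumps(behavior)),
            )

    def get(self, user_id):
        """Return the portrait as a dict of dicts, or None if the user is unknown."""
        row = self.conn.execute(
            "SELECT Characteristics, Preferences, EnvironmentalBehavior "
            "FROM user_portrait WHERE UserID = ?",
            (user_id,),
        ).fetchone()
        if row is None:
            return None
        return {key: json.loads(value) for key, value in zip(self.FIELDS, row)}


store = PortraitStore(sqlite3.connect(":memory:"))
store.upsert("u1", {"daily_kwh": 500.0}, {"tariff": "peak-valley"}, {"score": 0.4})
store.upsert("u1", {"daily_kwh": 480.0}, {"tariff": "peak-valley"}, {"score": 0.5})
portrait = store.get("u1")
```

The primary key on `UserID` keeps lookups by user fast, which is the access path the real-time query requirement depends on.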
Output of the user portrait model: the output of the user portrait model from step 3, such as the predicted environmental protection behavior of the user, may be used to update the user portrait database in real time.
Real-time update logic: when a new data point arrives and is processed by the model, the model's output is used to update the corresponding user's record in the portrait database. The update logic uses the following formulas:
$\mathrm{Characteristics}^{(t+1)} = \mathrm{Characteristics}^{(t)} + \alpha \times \Delta\mathrm{Characteristics}$
$\mathrm{Preferences}^{(t+1)} = \mathrm{Preferences}^{(t)} + \alpha \times \Delta\mathrm{Preferences}$
$\mathrm{EnvironmentalBehavior}^{(t+1)} = \mathrm{EnvironmentalBehavior}^{(t)} + \alpha \times \Delta\mathrm{EnvironmentalBehavior}$
where $\mathrm{Characteristics}^{(t+1)}$ represents the user characteristics updated at time $t+1$, $\alpha$ is the update rate parameter, and $\Delta\mathrm{Characteristics}$ represents the change in user characteristics output by the model.
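The same additive rule applies to each portrait field, so one helper covers all three. This sketch assumes each field is stored as a mapping from a feature name to a number and that the update rate lies between 0 and 1; both are assumptions made for the example.

```python
def update_portrait(current, delta, alpha):
    """Return current + alpha * delta, feature by feature.

    Features missing from either mapping are treated as 0.0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    keys = set(current) | set(delta)
    return {k: current.get(k, 0.0) + alpha * delta.get(k, 0.0) for k in keys}


characteristics = {"daily_kwh": 500.0, "green_score": 0.5}
model_delta = {"daily_kwh": -20.0, "green_score": 0.25}
updated = update_portrait(characteristics, model_delta, alpha=0.5)
```

With an update rate of 0.5, only half of each change reported by the model is applied, giving 490.0 and 0.625 here. A smaller rate makes the portrait change more slowly in response to each new data point.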
Real-time user portrait query: the user portrait database supports real-time queries that retrieve a user portrait by user ID, which allows other systems or applications to access user portrait data at any time.
Through this step, a real-time user portrait database can be established and updated in real time according to the user's environmental protection behavior, characteristics, and preferences. This keeps the user portrait up to date and supports personalized advice and the promotion of environmental protection behavior.
Specifically, step 5 includes:
Data transmission encryption: Transport Layer Security (TLS) or another suitable transmission encryption technique is used to ensure security during data transmission. TLS prevents data from being stolen or tampered with in transit by encrypting the transmission channel.
Data storage encryption: when user portrait data is stored in a database, the stored data is encrypted by adopting a data encryption technology. The data is protected against unauthorized access using an appropriate encryption algorithm, and the encrypted data storage may be implemented by:
data field level encryption: sensitive fields (e.g., user characteristics, environmental protection behavior) are encrypted to ensure that only authorized users can access the plaintext data.
Database-level encryption: the entire database may be encrypted to protect the entire user profile database so that even if the database file is stolen, sensitive information cannot be easily accessed.
Access control policy: an access control policy is implemented to ensure that only authorized personnel can access the user profile data. This may be achieved by using authentication and authorization mechanisms, in particular the following policies may be used:
user authentication: the user is required to provide valid credentials (username and password, API key, etc.) to verify their identity.
Role and rights management: appropriate roles and permissions are assigned to different users to limit their scope of access. For example, only an administrator can access and modify all data, whereas an average user can only access his own data.
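The role example in the policy above (an administrator can access all data, an ordinary user only their own) can be expressed as a small permission check. The role names, permission names, and function signatures are hypothetical.

```python
# Hypothetical role-to-permission mapping for the sketch.
ROLE_PERMISSIONS = {
    "admin": {"read_any", "write_any"},
    "user": {"read_own"},
}


def can_read(role, requester_id, target_id):
    """Administrators may read any portrait; ordinary users only their own."""
    permissions = ROLE_PERMISSIONS.get(role, set())  # unknown roles get nothing
    if "read_any" in permissions:
        return True
    return "read_own" in permissions and requester_id == target_id


def can_write(role):
    """Only roles holding write_any may modify portrait data."""
    return "write_any" in ROLE_PERMISSIONS.get(role, set())
```

For example, `can_read("user", "u1", "u2")` is `False`, while `can_read("admin", "a1", "u2")` is `True`. Authentication, meaning verification of the presented credentials, is assumed to have happened before these checks run.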
Data desensitization: for user data that does not need to be complete, data desensitization can be performed to reduce privacy risks. Data desensitization is a method to protect user privacy by removing or replacing portions of the content of sensitive data, e.g., desensitizing a real name to a simulated name.
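Desensitization can take several forms. The sketch below shows masking, which hides part of a value, and keyed pseudonymization, which replaces an identifier with a stable token. It does not generate simulated names as in the example above. The function names, the secret, and the sample values are assumptions.

```python
import hashlib
import hmac


def mask_name(name: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return name[:1] + "*" * max(len(name) - 1, 0)


def mask_phone(phone: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a phone number."""
    if len(phone) <= visible:
        return "*" * len(phone)
    return "*" * (len(phone) - visible) + phone[-visible:]


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash so records remain linkable."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


masked_name = mask_name("Zhang San")
masked_phone = mask_phone("13812345678")
token = pseudonymize("u1", b"demo-secret")
```

Because the pseudonym depends on a secret key, the same user always maps to the same token, but the token cannot be recomputed by someone who does not hold the key.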
Monitoring and auditing: monitoring and auditing mechanisms are established to track access and use of user profile data so that potential security threats or unauthorized access can be detected.
Compliance management: applicable privacy laws and regulations, such as the GDPR (the European Union's General Data Protection Regulation), are followed to ensure that data processing is compliant, taking user consent and the right to data deletion into account.
Through the steps, the security and privacy protection of the user portrait data during transmission, storage and access can be ensured, and the method is a key step for establishing user trust, compliance and data security.
Step 6 may also be included:
in step 6, a user interface or data visualization dashboard is built to present real-time user portrait data, letting users view their personal user portraits and environmental protection advice and provide feedback. This step is implemented as follows:
data visualization interface: a user-friendly data visualization interface or dashboard is developed to present real-time user portrait data. The interface should include the following elements:
1. User personal information: the user's identification information is displayed to ensure that the user is viewing his or her own user portrait.
2. User characteristics: the characteristics of the user, such as energy consumption and environmental protection behavior, are displayed in chart or table form.
3. Environmental protection advice: providing user personalized environment-friendly advice based on portrait data.
4. User feedback button: allowing the user to provide feedback, such as reporting inaccurate data or making improvement suggestions.
Real-time data updating: ensure that the data visualization interface can obtain and display the latest user portrait data in real time, which is achieved through real-time queries and a connection to the user portrait database of step 4.
Personalized environmental protection advice: the use of user portrait data to generate personalized environmental protection suggestions may be implemented based on an algorithm or rules engine. For example, if the user's energy consumption is high, the system may recommend energy saving measures be taken.
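A rules engine for this purpose can be as simple as a list of condition and message pairs evaluated against the portrait. In this sketch the portrait keys, the thresholds (20% above the peer average, a peak share above one half), and the advice wording are all illustrative assumptions.

```python
# Each rule pairs a condition on the portrait with the advice to show.
RULES = [
    (
        lambda p: p["daily_kwh"] > 1.2 * p["peer_avg_kwh"],
        "Daily consumption is well above comparable users; consider an energy audit.",
    ),
    (
        lambda p: p["peak_share"] > 0.5,
        "More than half of usage falls in peak hours; consider shifting load to valley hours.",
    ),
]


def advise(portrait):
    """Return the advice messages whose conditions hold for this portrait."""
    return [message for condition, message in RULES if condition(portrait)]


high_user = {"daily_kwh": 700.0, "peer_avg_kwh": 500.0, "peak_share": 0.6}
low_user = {"daily_kwh": 300.0, "peer_avg_kwh": 500.0, "peak_share": 0.3}
high_advice = advise(high_user)
low_advice = advise(low_user)
```

Keeping the rules in a data structure rather than in nested conditionals makes it easy to add or retune advice without touching the evaluation code.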
User feedback mechanism: the user feedback button is implemented so that users can provide feedback, for example:
1. Inaccurate data: if users find that the data is inaccurate, they can report the problem and provide the correct information.
2. Improvement advice: the user may make suggestions to improve the accuracy and relevance of the user profile data.
Feedback processing: feedback processing mechanisms are designed to ensure that received feedback is processed in time, which can be forwarded to a data management team for data modification or system improvement.
User notification: sending notifications to users to inform them that they can view their individual user portraits and provide feedback can be accomplished through in-application notifications, email or text messages.
User privacy protection: in the real-time data visualization interface, user privacy is guaranteed to be respected, and sensitive information such as a complete address or telephone number is not displayed so as to protect the user privacy.
Through the steps, a user-friendly data visualization interface can be established, so that a user can view personal user portrait data in real time, obtain environment-friendly suggestions and provide feedback to continuously improve the accuracy and the relevance of the user portrait. This helps the user to better understand their environmental behavior and needs while also promoting the user's active participation and providing feedback.
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and does not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in them or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (6)

1. A user portrait establishment method based on various environment-friendly data sources, characterized by comprising the following steps:
step 1: collecting real-time data from different power environment-friendly data sources, integrating the data of the different power data sources into a standardized data model, and ensuring the consistency and comparability of the data;
step 2: establishing a real-time data processing pipeline, designing data stream processing logic, and transmitting the real-time data stream to the processing logic by using a stream processing engine so as to process and analyze data;
step 3: in the real-time data stream, executing feature engineering, extracting key features from environment-friendly data to describe environment-friendly behaviors and features of users, and constructing a user portrait model to update user portraits in real time;
step 4: establishing a real-time user portrayal database to store characteristics, preferences and environmental protection behaviors of the user, and updating the user portrayal database by using the output of the user portrayal model;
step 5: an access control policy is implemented to ensure that only authorized personnel can access the user profile data.
2. The user portrait establishment method based on various environment-friendly data sources according to claim 1, wherein said step 1 comprises:
determining environmental protection data sources to be collected, and identifying the type and format of the data sources;
before integrating the data into the central store, the data is standardized to ensure consistency and comparability of the data;
performing data cleaning operation, including removing repeated data, filling missing values and checking the integrity of the data so as to ensure the quality of the data;
metadata of the data is recorded, including data source, data acquisition time and data type information.
3. The user portrait establishment method based on various environment-friendly data sources according to claim 2, wherein said step 2 comprises:
installing and configuring an Apache Spark Streaming environment and ensuring that the Spark cluster, comprising a master node and worker nodes, is correctly set up;
introducing a real-time data stream into the Spark Streaming application using a data stream source connector;
writing data processing logic in the Spark Streaming application;
starting the Spark Streaming application, which runs on the real-time data stream and continuously processes arriving data, processing the data stream at fixed time intervals and thereby realizing real-time processing;
outputting the processed real-time data to different targets, ensuring that the data output meets the requirements of the user portrait;
the real-time processing process comprises the following steps:
extracting ammeter information from an ammeter information table in the energy management system, wherein the ammeter information comprises ammeter IDs, user IDs, ammeter types, metering point IDs and ammeter coefficients;
extracting ammeter information from an ammeter information table in an energy data warehouse and associating the ammeter information with the ammeter information table in an energy management system;
extracting ammeter data including power information from a solar electricity price chart in an energy data warehouse, and screening data of the last two days;
extracting main metering point information from a metering point information table in an energy management system;
inserting the extracted data into a temporary table, wherein the temporary table comprises an ammeter ID, main metering point information and meter reading time;
meanwhile, extracting the collected meter base readings from the daily electric energy meter reading data table in the electric energy management system, associating them with the metering point information table and the user information table in the energy management system, and screening the data of the last two days;
creating a table for storing daily electricity quantity statistics results and storing the counted daily electricity quantity data;
calculating the difference between each meter's readings on two consecutive days and multiplying it by the meter's conversion coefficient to obtain each user's daily total electricity consumption and peak, valley, and flat period consumption statistics;
checking whether the current day's data exists in the meter base-reading data table, and if not, executing the following steps;
backing up the current day's data from the meter base-reading data table;
deleting the current day's data from the meter base-reading data table;
updating the statistics in the summary result table that stores enterprise daily power data, calculating each user's daily electricity consumption and peak, valley, and flat period consumption, and updating the corresponding proportions;
deleting the current day's data from the summary result table that stores enterprise daily power data;
inserting the recalculated daily total electricity consumption and peak, valley, and flat period consumption statistics for each user;
updating the summary result table that stores enterprise daily power data;
deleting the current day's data from the summary result table that stores enterprise daily power data;
inserting the daily total electricity consumption and peak, valley, and flat period consumption statistics, calculated according to the model, for each enterprise;
updating the summary result table that stores enterprise daily power data;
outputting the processed real-time data to different targets, ensuring that the data output meets the requirements of the user portrait.
4. The user portrait establishment method based on various environment-friendly data sources according to claim 3, wherein said step 3 comprises:
performing feature engineering on the real-time data stream to extract key features describing the user's environmental protection behavior and characteristics, the feature extraction using the following formula,
$\mathrm{Feature}_i = f(x_i)$
wherein $x_i$ represents the feature value and $f$ is a feature transformation function;
constructing a user portrait model by using an online learning algorithm to predict the environmental protection behavior or characteristics of a user from new data points;
initializing parameters of a linear regression model, including weights and an intercept;
as new data points arrive, executing the online learning algorithm to update the model parameters, wherein in online linear regression the update rule uses gradient descent, with the model parameter update formula:
wherein the formula relates the parameter at time step $t+1$ to the parameter at time step $t$, $\alpha$ is the learning rate, $y^{(t)}$ is the actual output, and $h(x^{(t)})$ is the model's prediction for the input $x^{(t)}$;
as new data points continue to arrive, updating the model parameters in real time according to the online learning algorithm;
using the output of the model to update the user portrait database in real time, the prediction results being written into the user portrait database.
5. The user portrait establishment method based on various environment-friendly data sources according to claim 4, wherein said step 4 comprises:
establishing a user portrait database, wherein the database comprises user identification, characteristics, preferences and environmental protection behavior fields;
implementing an interface for querying and updating user portraits;
outputting a user portrait model;
when a new data point arrives and is processed by the model, using the output of the model to update the corresponding user's record in the portrait database, the update logic using the following formulas:
$\mathrm{Characteristics}^{(t+1)} = \mathrm{Characteristics}^{(t)} + \alpha \times \Delta\mathrm{Characteristics}$
$\mathrm{Preferences}^{(t+1)} = \mathrm{Preferences}^{(t)} + \alpha \times \Delta\mathrm{Preferences}$
$\mathrm{EnvironmentalBehavior}^{(t+1)} = \mathrm{EnvironmentalBehavior}^{(t)} + \alpha \times \Delta\mathrm{EnvironmentalBehavior}$
wherein $\mathrm{Characteristics}^{(t+1)}$ represents the user characteristics updated at time $t+1$, $\alpha$ is the update rate parameter, and $\Delta\mathrm{Characteristics}$ represents the change in user characteristics output by the model.
6. The user portrait establishment method based on various environment-friendly data sources according to claim 5, wherein said step 5 comprises:
a data transmission encryption technology is used to ensure the security in the data transmission process;
encrypting the sensitive field to ensure that only authorized users have access to the plaintext data;
implementing an access control policy to ensure that only authorized personnel can access the user profile data;
requiring the user to provide valid credentials to verify his identity;
allocating proper roles and authorities for different users to limit the access scope;
for user data that does not need to be complete, data desensitization is performed to reduce privacy risks;
a monitoring and auditing mechanism is established to track access and use of user profile data.
CN202311850339.6A 2023-12-28 2023-12-28 User portrait establishment method based on various environment-friendly data sources Withdrawn CN117786224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311850339.6A CN117786224A (en) 2023-12-28 2023-12-28 User portrait establishment method based on various environment-friendly data sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311850339.6A CN117786224A (en) 2023-12-28 2023-12-28 User portrait establishment method based on various environment-friendly data sources

Publications (1)

Publication Number Publication Date
CN117786224A true CN117786224A (en) 2024-03-29

Family

ID=90392613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311850339.6A Withdrawn CN117786224A (en) 2023-12-28 2023-12-28 User portrait establishment method based on various environment-friendly data sources

Country Status (1)

Country Link
CN (1) CN117786224A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20240329