CN116522227A - Charging user image drawing method and device based on multi-label data, medium and equipment - Google Patents

Charging user image drawing method and device based on multi-label data, medium and equipment

Info

Publication number
CN116522227A
Authority
CN
China
Prior art keywords
data
charging
analyzed
user
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310483328.2A
Other languages
Chinese (zh)
Inventor
朱彬
胡晓锐
龙羿
黄会
胡文
邓雯玲
李顺
何珉
李智
徐婷婷
孙正凯
许珂
刘一畔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Marketing Service Center of State Grid Chongqing Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Chongqing Electric Power Co Ltd
Marketing Service Center of State Grid Chongqing Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Chongqing Electric Power Co Ltd, Marketing Service Center of State Grid Chongqing Electric Power Co Ltd
Priority to CN202310483328.2A
Publication of CN116522227A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application discloses a charging user portrait method and device, medium, and equipment based on multi-label data. The method includes: obtaining raw charging data of multiple charging users and preprocessing it to obtain data fields corresponding to multiple preset attributes; clustering the data fields of each preset attribute separately to obtain the cluster centers for that attribute and the portrait label corresponding to each cluster center; and obtaining the charging data of a user to be analyzed, determining that user's target labels among the portrait labels according to that data, and generating the user's portrait from the target labels. The method addresses two shortcomings of existing approaches: questionnaire results are not reliable, and hard clustering cannot resolve results to the individual user, so users are understood only superficially and their charging behavior cannot be fully characterized.

Description

Charging user portrait method and device, medium, and equipment based on multi-label data

Technical Field

The present application relates to the field of electric vehicles, and in particular to a charging user portrait method and device, medium, and equipment based on multi-label data.

Background

As the number of new energy vehicles continues to grow, a large number of electric vehicles (EVs) will be connected to the power system in the near future, which will introduce more uncertainty into the power system. Mining user charging behavior data to uncover the underlying patterns will therefore help optimize distribution network management. In user behavior research, a user portrait is a labeled user model abstracted from data that describes user attributes, preferences, and habits. User portraits are widely used in e-commerce, where they let enterprises understand each user's needs in more depth and tailor personalized services.

In the new energy vehicle field, user portraits are generally built from big data or from questionnaire surveys. Questionnaire questions tend to be leading, response rates are low, and the answers are not very reliable, so questionnaires cannot serve as a stable portrait method. Big-data portrait methods mainly classify users through partition-based hard clustering, usually combining several kinds of attributes into multi-dimensional data before clustering. The result is a set of user clusters and the number of members in each, but it is not resolved to the individual user: a single user's historical data may appear in several clusters.

Summary of the Invention

In view of this, the present application provides a charging user portrait method and device, medium, and equipment based on multi-label data. It addresses the problems that questionnaires in existing methods are not reliable and that hard clustering cannot resolve results to the individual user, which leave users insufficiently understood and their charging behavior not fully characterized.

According to one aspect of the present application, a charging user portrait method based on multi-label data is provided, including:

obtaining raw charging data of multiple charging users, and preprocessing the raw charging data to obtain data fields corresponding to multiple preset attributes;

performing a clustering operation separately on the data fields corresponding to each preset attribute, to obtain the cluster centers corresponding to the preset attribute and the portrait label corresponding to each cluster center;

obtaining the charging data to be analyzed of a user to be analyzed, determining the target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed, and generating the user portrait of the user to be analyzed according to the target label.

Optionally, determining the target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed includes:

extracting, from the charging data to be analyzed, a field group to be analyzed for each preset attribute, where the field group includes at least one field to be analyzed and the time at which the action corresponding to each such field occurred;

determining a time decay coefficient for each field to be analyzed according to the time at which the action occurred;

for each preset attribute, calculating, based on the time decay coefficients, the matching degree between the field group to be analyzed and each portrait label of that preset attribute, and taking the portrait label with the highest matching degree as the target label of the user to be analyzed under that preset attribute.

Optionally, preprocessing the raw charging data includes:

cleaning the raw charging data to obtain cleaned data, where the data cleaning includes at least one of the following: abnormal data cleaning, duplicate data cleaning, and null-value data cleaning;

extracting, from the cleaned data, the data fields corresponding to each preset attribute, where the preset attributes include a duration attribute, a battery-level anxiety attribute, and a charging power attribute.
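The three kinds of cleaning named above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the field names follow the database fields listed later in the description, while the specific abnormality rules (state of charge within 0 to 100 and not decreasing, positive energy and duration) are assumptions chosen only for the example.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Drop null, duplicate, and abnormal charging records (assumed rules)."""
    required = ("customer_id", "start_soc", "end_soc", "chargkwh", "usetime")
    seen = set()
    cleaned = []
    for rec in records:
        # Null-value cleaning: every required field must be present.
        if any(rec.get(key) is None for key in required):
            continue
        # Duplicate cleaning: an identical record is kept only once.
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Abnormal-data cleaning (assumption): SOC stays within 0..100 and
        # does not decrease; delivered energy and duration are positive.
        if not (0 <= rec["start_soc"] <= rec["end_soc"] <= 100):
            continue
        if rec["chargkwh"] <= 0 or rec["usetime"] <= 0:
            continue
        cleaned.append(rec)
    return cleaned
```

In practice the abnormality thresholds would be set from the charging service's own data quality rules rather than the placeholder checks used here.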

Optionally, extracting, from the cleaned data, the data fields corresponding to each preset attribute includes:

extracting the value of the charging duration field from the cleaned data as the data field corresponding to the duration attribute;

extracting the value of the starting remaining charge field and the value of the ending remaining charge field from the cleaned data as the data fields corresponding to the battery-level anxiety attribute;

calculating the average charging power per pile from the value of the single-charge energy field and the value of the single-charge duration field in the cleaned data, and using the average charging power per pile as the data field corresponding to the charging power attribute.
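As a worked illustration of the three extractions above, the sketch below builds one attribute dictionary per cleaned record. The units are assumptions (the source does not state them): `usetime` is taken to be in minutes and `chargkwh` in kWh, so the average power comes out in kW. The dictionary keys are hypothetical names for the three preset attributes.

```python
def extract_attributes(rec: dict) -> dict:
    """Map one cleaned record to the three preset attributes."""
    duration_min = rec["usetime"]                     # assumed minutes
    # Average power = energy / time; minutes are converted to hours.
    avg_power_kw = rec["chargkwh"] / (duration_min / 60.0)
    return {
        "duration": (duration_min,),
        "battery_anxiety": (rec["start_soc"], rec["end_soc"]),
        "charging_power": (avg_power_kw,),
    }
```

For example, a record with 30 kWh delivered over 90 minutes yields an average power of 20 kW.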

Optionally, before extracting the data fields corresponding to each preset attribute from the cleaned data, the method further includes:

converting each item of cleaned data into a preset data format;

extracting the time data from the cleaned data and converting it into a preset time format;

determining whether a user identifier is missing from the cleaned data and, if so, filling in the user identifier from verification data, where the verification data is provided by the charging service system.
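A possible shape for the time conversion and user-identifier completion is sketched below. The candidate input formats, the target format, and the `order_id` key used to look up the verification data are all hypothetical; the source only states that verification data comes from the charging service system.

```python
from datetime import datetime

# Assumed input formats and assumed preset output format.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M", "%Y%m%d%H%M%S")
PRESET_FORMAT = "%Y-%m-%d %H:%M:%S"

def normalize_time(raw: str) -> str:
    """Convert a time string in any known format to the preset format."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(raw, fmt).strftime(PRESET_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time value: {raw!r}")

def fill_user_id(rec: dict, verification: dict) -> dict:
    """Fill a missing customer_id from verification data.

    `verification` maps a hypothetical order_id to a customer_id.
    """
    if rec.get("customer_id"):
        return rec
    fixed = dict(rec)  # leave the input record unmodified
    fixed["customer_id"] = verification.get(rec.get("order_id"))
    return fixed
```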

Optionally, performing a clustering operation separately on the data fields corresponding to each preset attribute, to obtain the cluster centers corresponding to the preset attribute and the portrait label corresponding to each cluster center, includes:

taking each preset attribute in turn as the attribute to be clustered and its data fields as the fields to be clustered, and mapping each field to be clustered to a data point in a mathematical space;

setting the number of clusters K for the attribute to be clustered, selecting K of the data points as cluster centers, and treating each cluster center as a cluster;

calculating the distance between each data point and each cluster center, and assigning each data point to the cluster of its nearest cluster center;

updating the cluster center of each cluster from the coordinates of the data points in that cluster, and returning to the step of calculating the distance between each data point and each cluster center, until a preset stop condition is met;

determining the portrait label corresponding to each cluster center according to the data points in the cluster where that cluster center is located.
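The iteration described in the steps above corresponds to standard k-means. A minimal pure-Python sketch follows; the random initial selection, the Euclidean distance, and the stop condition (centers unchanged, or an iteration cap) are assumptions, since the source leaves the distance measure and the stop condition open.

```python
import math
import random

Point = tuple[float, ...]

def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Return (centers, assignment) after iterating assign/update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # pick K data points as centers
    assignment = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to the cluster of its nearest center.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update each center to the mean coordinates of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep center of empty cluster
        if new_centers == centers:           # assumed stop condition
            break
        centers = new_centers
    return centers, assignment
```

A portrait label could then be attached to each center by inspecting its cluster, for instance naming the cluster with the smallest duration center "short charging"; that naming step is left to the analyst here.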

Optionally, setting the number of clusters K for the attribute to be clustered includes:

presetting multiple candidate cluster counts, and calculating, for each candidate, the silhouette coefficient of the data points to be clustered;

taking the candidate cluster count with the largest silhouette coefficient as the number of clusters.
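The selection of K by silhouette coefficient can be sketched as below. The mean silhouette over all points is used as the score, with singleton clusters contributing 0, which is the usual convention; the `cluster_fn` parameter is a hypothetical hook standing in for whatever clustering routine produces the labels for a given K.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, ...]

def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient of a labeling (Euclidean distance)."""
    clusters: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # s(i) is taken as 0 for a singleton cluster
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)

def choose_k(
    points: Sequence[Point],
    candidates: Sequence[int],
    cluster_fn: Callable[[Sequence[Point], int], Sequence[int]],
) -> int:
    """Return the candidate K whose labeling has the largest silhouette."""
    return max(
        candidates,
        key=lambda k: silhouette_score(points, cluster_fn(points, k)),
    )
```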

According to another aspect of the present application, a charging user portrait device based on multi-label data is provided, the device including:

a preprocessing module, configured to obtain raw charging data of multiple charging users and preprocess the raw charging data to obtain data fields corresponding to multiple preset attributes;

a clustering module, configured to perform a clustering operation separately on the data fields corresponding to each preset attribute, to obtain the cluster centers corresponding to the preset attribute and the portrait label corresponding to each cluster center;

a portrait module, configured to obtain the charging data to be analyzed of a user to be analyzed, determine the target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed, and generate the user portrait of the user to be analyzed according to the target label.

Optionally, the portrait module is configured to:

extract, from the charging data to be analyzed, a field group to be analyzed for each preset attribute, where the field group includes at least one field to be analyzed and the time at which the action corresponding to each such field occurred;

determine a time decay coefficient for each field to be analyzed according to the time at which the action occurred;

for each preset attribute, calculate, based on the time decay coefficients, the matching degree between the field group to be analyzed and each portrait label of that preset attribute, and take the portrait label with the highest matching degree as the target label of the user to be analyzed under that preset attribute.

Optionally, the preprocessing module is configured to:

clean the raw charging data to obtain cleaned data, where the data cleaning includes at least one of the following: abnormal data cleaning, duplicate data cleaning, and null-value data cleaning;

extract, from the cleaned data, the data fields corresponding to each preset attribute, where the preset attributes include a duration attribute, a battery-level anxiety attribute, and a charging power attribute.

Optionally, the preprocessing module is configured to:

extract the value of the charging duration field from the cleaned data as the data field corresponding to the duration attribute;

extract the value of the starting remaining charge field and the value of the ending remaining charge field from the cleaned data as the data fields corresponding to the battery-level anxiety attribute;

calculate the average charging power per pile from the value of the single-charge energy field and the value of the single-charge duration field in the cleaned data, and use the average charging power per pile as the data field corresponding to the charging power attribute.

Optionally, the preprocessing module is configured to:

convert each item of cleaned data into a preset data format;

extract the time data from the cleaned data and convert it into a preset time format;

determine whether a user identifier is missing from the cleaned data and, if so, fill in the user identifier from verification data, where the verification data is provided by the charging service system.

Optionally, the clustering module is configured to:

take each preset attribute in turn as the attribute to be clustered and its data fields as the fields to be clustered, and map each field to be clustered to a data point in a mathematical space;

set the number of clusters K for the attribute to be clustered, select K of the data points as cluster centers, and treat each cluster center as a cluster;

calculate the distance between each data point and each cluster center, and assign each data point to the cluster of its nearest cluster center;

update the cluster center of each cluster from the coordinates of the data points in that cluster, and return to the step of calculating the distance between each data point and each cluster center, until a preset stop condition is met;

determine the portrait label corresponding to each cluster center according to the data points in the cluster where that cluster center is located.

Optionally, the clustering module is configured to:

preset multiple candidate cluster counts, and calculate, for each candidate, the silhouette coefficient of the data points to be clustered;

take the candidate cluster count with the largest silhouette coefficient as the number of clusters.

According to yet another aspect of the present application, a medium is provided on which a program or instructions are stored; when executed by a processor, the program or instructions implement the above charging user portrait method based on multi-label data.

According to a further aspect of the present application, a device is provided that includes a storage medium and a processor. The storage medium stores a computer program, and the processor implements the above charging user portrait method based on multi-label data when it executes the computer program.

With the above technical solution, the present application mines the data by multi-dimensional clustering, performs portrait analysis, and builds user portraits that help power companies optimize management and design better charging schemes. Compared with traditional questionnaire surveys, the raw charging data in this method comes from the public charging pile data collection system, so it is more reliable and far more plentiful, and the resulting portrait is more accurate. Compared with traditional user classification by partition-based hard clustering, the method profiles each user individually, down to each attribute of that user, so the portrait is finer-grained and more practical.

The above is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the application are easier to follow, specific embodiments of the application are set out below.

Brief Description of the Drawings

The drawings described here provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a schematic flowchart of a charging user portrait method based on multi-label data provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of another charging user portrait method based on multi-label data provided by an embodiment of the present application;

Fig. 3 is a schematic flowchart of the preprocessing scheme of another charging user portrait method based on multi-label data provided by an embodiment of the present application;

Fig. 4 is a schematic flowchart of the clustering scheme of another charging user portrait method based on multi-label data provided by an embodiment of the present application;

Fig. 5 is a structural block diagram of a charging user portrait device based on multi-label data provided by an embodiment of the present application.

Detailed Description

The present application is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments of the application and the features of those embodiments may be combined with one another.

This embodiment provides a charging user portrait method based on multi-label data. As shown in Fig. 1, the method includes:

Step 101: obtain raw charging data of multiple charging users, and preprocess the raw charging data to obtain data fields corresponding to multiple preset attributes;

Step 102: perform a clustering operation separately on the data fields corresponding to each preset attribute, to obtain the cluster centers corresponding to the preset attribute and the portrait label corresponding to each cluster center;

Step 103: obtain the charging data to be analyzed of a user to be analyzed, determine the target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed, and generate the user portrait of the user to be analyzed according to the target label.

The charging user portrait method based on multi-label data provided by this embodiment mines user charging behavior data, performs portrait analysis, and builds user portraits that help power companies optimize management and design better charging schemes.

Specifically, the raw charging data of the charging users is obtained first. The raw data comes from the back-end database of the public charging pile data collection system. For example, the following fields can be read from the database: customer_id (customer ID), start_soc (remaining charge at start), end_soc (remaining charge at end), chargkwh (energy delivered in a single charge), usetime (charging duration), chargstarttime (charging start time), chargendtime (charging end time), and station_id (charging station ID). The more raw data is available, the more accurate the portrait.
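The listed fields can be captured in a typed record for the later processing steps. The sketch below is illustrative only: the field names come from the paragraph above, but the Python types, the units noted in the comments, and the timestamp format are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed storage format of time fields

@dataclass(frozen=True)
class ChargingRecord:
    customer_id: str
    start_soc: float           # remaining charge at start, assumed percent
    end_soc: float             # remaining charge at end, assumed percent
    chargkwh: float            # energy of a single charge, assumed kWh
    usetime: float             # charging duration, assumed minutes
    chargstarttime: datetime
    chargendtime: datetime
    station_id: str

def record_from_row(row: dict) -> ChargingRecord:
    """Build a typed record from one database row of raw values."""
    return ChargingRecord(
        customer_id=str(row["customer_id"]),
        start_soc=float(row["start_soc"]),
        end_soc=float(row["end_soc"]),
        chargkwh=float(row["chargkwh"]),
        usetime=float(row["usetime"]),
        chargstarttime=datetime.strptime(row["chargstarttime"], TIME_FORMAT),
        chargendtime=datetime.strptime(row["chargendtime"], TIME_FORMAT),
        station_id=str(row["station_id"]),
    )
```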

The raw charging data is preprocessed to obtain the fields corresponding to each preset attribute, and each attribute is then clustered separately to achieve multi-dimensional clustering. Any general-purpose clustering method may be used; no limitation is imposed here. Multiple cluster centers are obtained for each attribute, and the portrait label of each cluster center is derived from the characteristics of that center. A portrait label describes one aspect of a user's behavior, for example a preference for charging during peak hours. The fields may be normalized before clustering to put them on a common scale, and denormalized after clustering to restore the original scale.
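The normalization and denormalization mentioned above can be illustrated with min-max scaling. This is an assumption: the source does not say which normalization is used, and other schemes such as z-score scaling would fit the description equally well.

```python
def fit_min_max(values: list[float]) -> tuple[float, float]:
    """Record the bounds needed to normalize and later denormalize."""
    return min(values), max(values)

def normalize(values: list[float], bounds: tuple[float, float]) -> list[float]:
    """Scale values into [0, 1] using the recorded bounds."""
    lo, hi = bounds
    span = (hi - lo) or 1.0  # avoid division by zero for constant fields
    return [(v - lo) / span for v in values]

def denormalize(values: list[float], bounds: tuple[float, float]) -> list[float]:
    """Map normalized values (e.g. cluster centers) back to original units."""
    lo, hi = bounds
    span = (hi - lo) or 1.0
    return [v * span + lo for v in values]
```

Denormalizing the cluster centers after clustering returns them to interpretable units, for example minutes of charging, which makes it easier to assign a portrait label to each center.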

Preferably, the field corresponding to each preset attribute is a numeric field, that is, its values are numbers rather than text. For example, if each entry in the station_id (charging station ID) field is expressed as text, such as "Dajiang Station" (大江站), the field is treated as a labeled field rather than a numeric field.

On this basis, when a user to be analyzed is profiled, the target label of that user under each preset attribute is determined from the charging data to be analyzed. The multiple target labels are then analyzed together to characterize the user along multiple dimensions, yielding the user portrait of the user to be analyzed.

This embodiment mines the data by multi-dimensional clustering, performs portrait analysis, and builds user portraits that help power companies optimize management and design better charging schemes. Compared with traditional questionnaire surveys, the raw charging data in this embodiment comes from the public charging pile data collection system, so it is more reliable and far more plentiful, and the resulting portrait is more accurate. Compared with traditional user classification by partition-based hard clustering, the method profiles each user individually, down to each attribute of that user, so the portrait is finer-grained and more practical.

Fig. 2 shows a flowchart of the technical solution of the charging user portrait method based on multi-label data according to one embodiment of the present application. As shown in the figure, the users' raw charging data is obtained first and then preprocessed to output suitable unlabeled fields, where preprocessing consists of two steps: data cleaning and attribute construction. The output fields then undergo normalization, k-means clustering, and denormalization in turn, yielding the cluster centers of four attributes and the portrait label corresponding to each cluster center. In the analysis itself, the data of the user to be analyzed is examined along four dimensions (charging period, charging duration, charging rate, and battery-level anxiety) to obtain a target label for each. The user portrait is thus drawn from four different dimensions, giving the characteristics of the user to be analyzed.

Further, as a refinement and extension of the above embodiment, and in order to describe its implementation in full, another charging user portrait method is provided. In this method, determining the target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed includes the following steps:

Step 201: extract, from the charging data to be analyzed, a field group to be analyzed for each preset attribute, where each field group includes at least one field to be analyzed and the action occurrence time of each such field;

Step 202: determine the time decay coefficient of each field to be analyzed according to its action occurrence time;

Step 203: for each preset attribute, calculate, based on the time decay coefficients, the matching degree between the field group to be analyzed and each portrait label of that attribute, and take the portrait label with the highest matching degree as the target label of the user to be analyzed under that attribute.

In steps 201-203, the user to be analyzed is examined across multiple preset attributes, that is, multiple dimensions. First, the field group to be analyzed for each preset attribute is extracted from the user's charging data to be analyzed; the extraction is similar to the attribute construction described above and is not repeated here. In a specific application scenario, the preset attributes can be analyzed one at a time. Because each preset attribute has several cluster centers and each cluster center corresponds to one portrait label, the analysis for an attribute amounts to deciding which portrait label the field group matches, and that label becomes the target label. When matching is finished, there is one target label per preset attribute, and the portrait of the user to be analyzed is generated from these target labels.

A user's preferences may change over time; for example, a user may have preferred charging at night last month but prefer charging during the day this month. To account for this, a time decay coefficient can be introduced into the matching process, and the target label is determined by the time decay method. Specifically, each field to be analyzed has an action occurrence time indicating when the corresponding action took place, and its time decay coefficient is set according to how early or late that time is: the earlier the action, the stronger the decay, while recent actions decay relatively little. During target label matching, the time decay coefficient serves as a weight. If last month's records show night charging and this month's records show daytime charging, last month's charging times carry a smaller weight and this month's carry a larger one, so the daytime charging label is selected as the target label for the charging period attribute.
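The time decay matching described above can be sketched as follows. The embodiment does not specify the decay function or the matching formula, so the exponential half-life decay, the nearest-center voting, and all names and values below are assumptions made for illustration only.

```python
from datetime import datetime


def decay_weight(action_time, now, half_life_days=30.0):
    """Assumed exponential decay: a record half_life_days old counts half."""
    age_days = (now - action_time).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)


def match_label(records, centers, now, half_life_days=30.0):
    """records: list of (value, action_time); centers: {label: center value}.

    Each record votes for its nearest cluster center, weighted by recency;
    the label with the highest accumulated weight is the target label.
    """
    scores = {label: 0.0 for label in centers}
    for value, action_time in records:
        nearest = min(centers, key=lambda label: abs(value - centers[label]))
        scores[nearest] += decay_weight(action_time, now, half_life_days)
    return max(scores, key=scores.get)


now = datetime(2023, 4, 30)
# Hypothetical centers: typical charging start hour for each label.
centers = {"night charging": 23.0, "daytime charging": 13.0}
records = [
    (22.5, datetime(2023, 3, 5)),   # last month, at night
    (23.0, datetime(2023, 3, 12)),
    (12.0, datetime(2023, 4, 20)),  # this month, during the day
    (14.0, datetime(2023, 4, 27)),
]
target = match_label(records, centers, now)
```

With these sample records the recent daytime sessions outweigh the older night sessions. The plain absolute difference ignores the wrap-around of hours at midnight, which a real implementation of the charging period attribute would need to handle.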

Optionally, preprocessing the raw charging data includes the following steps:

Step 301: perform data cleaning on the raw charging data to obtain cleaned data, where the data cleaning includes at least one of the following: abnormal data cleaning, duplicate data cleaning, and null-value data cleaning;

Step 302: extract from the cleaned data the data field corresponding to each preset attribute, where the preset attributes include a duration attribute, a battery anxiety attribute, and a charging power attribute.

In steps 301-302, the preprocessing consists of data cleaning and attribute construction. During data cleaning, data that would harm the portrait method is removed, such as abnormal data, duplicate data, and null-value data. Abnormal data includes hardware anomaly data, software anomaly data, and data from operations that violate the rules. Data cleaning improves the quality and accuracy of the portrait results.
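A minimal sketch of the three cleaning operations is given below, in plain Python so that it runs without extra packages (the embodiment itself mentions pandas and numpy). The field names follow those listed later in the description; the concrete rules for what counts as abnormal (SOC outside 0 to 100, non-positive energy or duration) are assumptions, since the embodiment does not define them.

```python
def clean_records(records, required=("start_soc", "end_soc", "chargkwh", "usetime")):
    """Drop null-value, abnormal, and duplicate charging records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Null-value cleaning: a required measurement is missing.
        if any(rec.get(name) is None for name in required):
            continue
        # Abnormal-data cleaning (assumed rules, see lead-in).
        if not (0 <= rec["start_soc"] <= 100 and 0 <= rec["end_soc"] <= 100):
            continue
        if rec["chargkwh"] <= 0 or rec["usetime"] <= 0:
            continue
        # Duplicate cleaning: identical records are kept only once.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


rows = [
    {"customer_id": "u1", "start_soc": 20, "end_soc": 90, "chargkwh": 30.0, "usetime": 1.5},
    {"customer_id": "u1", "start_soc": 20, "end_soc": 90, "chargkwh": 30.0, "usetime": 1.5},
    {"customer_id": "u2", "start_soc": None, "end_soc": 80, "chargkwh": 12.0, "usetime": 0.8},
    {"customer_id": "u3", "start_soc": 120, "end_soc": 90, "chargkwh": 10.0, "usetime": 0.5},
    {"customer_id": None, "start_soc": 10, "end_soc": 95, "chargkwh": 40.0, "usetime": 2.0},
]
cleaned_rows = clean_records(rows)
```

A record with a missing `customer_id` is deliberately kept here, because the later user ID completion step can still repair it.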

After data cleaning is complete, the data field corresponding to each preset attribute is extracted from the cleaned data. When each preset attribute is then clustered separately, the analysis can focus on the data field of that attribute and group similar users together, which further improves the multi-dimensional clustering result.

Optionally, extracting from the cleaned data the data field corresponding to each preset attribute includes the following steps:

Step 401: extract the value of the charging duration field from the cleaned data as the data field corresponding to the duration attribute;

Step 402: extract the value of the starting remaining charge field and the value of the ending remaining charge field from the cleaned data as the data field corresponding to the battery anxiety attribute;

Step 403: calculate the average charging power of a single pile from the value of the single-session energy field and the value of the single-session duration field in the cleaned data, and use this average power as the data field corresponding to the charging power attribute.

In steps 401-403, usetime (charging duration) is extracted as the data field corresponding to the duration attribute; start_soc (starting remaining charge) and end_soc (ending remaining charge) together form a two-dimensional data field corresponding to the battery anxiety attribute; and the average charging power of a single pile, obtained by dividing chargkwh (single-session energy) by usetime (charging duration), is the data field corresponding to the charging power attribute.

The charging power attribute is calculated as follows:

$$P_n = \frac{E_n}{T_n}$$

where $E_n$ is the energy delivered in the n-th charging session, $T_n$ is the duration of the n-th charging session, and $P_n$ is the average power over the n-th charging session.
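The attribute construction of steps 401-403 can be sketched as follows. The function name and output keys are hypothetical, and the sketch assumes `usetime` is expressed in hours so that the quotient is in kW; the embodiment does not state the unit.

```python
def build_attributes(record):
    """Return the three clustering inputs for one cleaned charging record."""
    if record["usetime"] <= 0:
        raise ValueError("usetime must be positive")
    return {
        # Step 401: duration attribute (one-dimensional).
        "duration": (record["usetime"],),
        # Step 402: battery anxiety attribute (two-dimensional).
        "battery_anxiety": (record["start_soc"], record["end_soc"]),
        # Step 403: average power P_n = E_n / T_n (assumes usetime in hours).
        "charging_power": (record["chargkwh"] / record["usetime"],),
    }


attrs = build_attributes(
    {"start_soc": 20, "end_soc": 90, "chargkwh": 30.0, "usetime": 1.5}
)
```

Each value is returned as a tuple so that one-dimensional and two-dimensional fields can be handled by the same distance computation during clustering.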

Optionally, before the data field corresponding to each preset attribute is extracted from the cleaned data, the method further includes the following steps:

Step 501: convert each item of cleaned data into a preset data format;

Step 502: extract the time data from the cleaned data and convert it into a preset time format;

Step 503: determine whether the user identifier is missing from the cleaned data and, if so, complete the user identifier using verification data, where the verification data is provided by the charging service system.

In steps 501-503, the format of the cleaned data is unified and the user ID is completed. Because the acquired data may contain problems such as mixed full-width and half-width input, stray spaces, and wrong field formats, the cleaned data is converted into a unified data format for subsequent computation. After format conversion, the time data is converted separately so that all time values share one format; for example, 2020.04.20 and 2020/04/20 are both converted to 20200420. Finally, the user ID is completed. Missing data can be handled according to its importance: low-importance missing data can simply be deleted, while high-importance missing data can be completed from other channels, computed from other fields, or filled in based on historical experience. Because the user ID is relatively important, it is completed by reading data from the charging service system, which improves data integrity.
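Steps 501-503 can be sketched with the Python standard library as follows. The use of Unicode NFKC folding for full-width characters, the `order_id` lookup key, and the function names are assumptions; the embodiment only states that the user ID is completed from data provided by the charging service system.

```python
import re
import unicodedata


def normalize_text(value):
    """Fold full-width characters to half-width and trim surrounding spaces."""
    return unicodedata.normalize("NFKC", value).strip()


def normalize_date(value):
    """Convert dates such as 2020.04.20 or 2020/4/20 to the form 20200420."""
    text = normalize_text(value)
    if re.fullmatch(r"\d{8}", text):
        return text
    parts = re.split(r"[./-]", text)
    if len(parts) != 3:
        raise ValueError(f"unrecognized date: {value!r}")
    year, month, day = (int(part) for part in parts)
    return f"{year:04d}{month:02d}{day:02d}"


def fill_customer_id(record, id_lookup):
    """Complete a missing customer_id from a lookup keyed by a hypothetical order_id."""
    if record.get("customer_id"):
        return record
    filled = dict(record)
    filled["customer_id"] = id_lookup.get(record.get("order_id"))
    return filled


dates = [normalize_date(d) for d in ("2020.04.20", "2020/04/20", "２０２０／４／２０ ")]
repaired = fill_customer_id({"order_id": "o42", "customer_id": None}, {"o42": "u7"})
```

The third sample date uses full-width digits and slashes with a trailing space, the kind of input the format conversion step is meant to absorb.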

Figure 3 shows a schematic diagram of the preprocessing flow of the charging user portrait method based on multi-label data according to an embodiment of the present application. As shown in the figure, preprocessing includes a data cleaning step and an attribute screening/construction step. In the data cleaning step, the multi-dimensional raw data undergoes abnormal data cleaning, duplicate data cleaning, and null-value data cleaning. In the attribute screening/construction step, format conversion, time conversion, and user ID completion are performed first, and attribute construction is then applied to the resulting data to obtain data for three different attributes, namely the data field corresponding to the duration attribute, the two-dimensional data field corresponding to the battery anxiety attribute, and the data field corresponding to the charging power attribute.

Optionally, clustering the data field corresponding to each preset attribute to obtain the cluster centers of the preset attribute and the portrait label corresponding to each cluster center includes the following steps:

Step 601: take each preset attribute in turn as the attribute to be clustered and its data field as the field to be clustered, and map each field to be clustered to a data point to be clustered in a mathematical space;

Step 602: set the number of clusters K for the attribute to be clustered, select K of the data points to be clustered as cluster centers, and treat each cluster center as one cluster;

Step 603: calculate the distance between each data point to be clustered and each cluster center, and assign each data point to the cluster of its nearest cluster center;

Step 604: update the cluster center of each cluster according to the coordinates of the data points in that cluster, and return to the step of calculating the distance between each data point to be clustered and each cluster center, until a preset stop condition is met;

Step 605: determine the portrait label corresponding to each cluster center according to the data points in the cluster to which that center belongs.

In steps 601-605, each preset attribute is taken as the attribute to be clustered, and k-means is applied to each such attribute separately. K-means is a clustering algorithm that is simple in principle, easy to understand, and fast. In the clustering operation, each field to be clustered is first mapped to a data point in a mathematical space, where each data point can be represented as a vector. The value of K is then determined; K is the number of clusters, that is, groups, that the clustering will produce. Once K is fixed, K data points are selected from the set of all data points to be clustered as cluster centers. For every data point in the set, its distance to each cluster center is calculated, and the point is assigned to the nearest center. When every data point has been assigned, the first round of clustering is complete and K clusters have formed, each consisting of a cluster center and a number of other data points.

After the first round, a new cluster center is computed inside each cluster from the coordinates of all data points in that cluster, and the next round of clustering is carried out in the same way as the first, so the details are not repeated here. These operations are repeated until the preset stop condition is met, at which point the clustering is considered complete. The result is K clusters, each with a cluster center and a number of other data points, and the portrait label of each cluster, that is, of its cluster center, is generated from the characteristics of the data points in that cluster.

The preset stop condition may be that the number of iterations reaches a maximum, or that the cluster centers have converged. For example, if over several iterations the distance between each new cluster center and the corresponding old cluster center stays below a preset threshold, the cluster centers are considered to have converged.
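Steps 601-604 and the stop condition can be sketched in plain Python as follows. This is an illustrative implementation rather than the one used in the embodiment, which mentions sklearn; the parameter names, the seeded random initialization, and the sample data are assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster points (tuples of equal length) into k clusters."""
    rng = random.Random(seed)
    # Step 602: pick K of the data points as the initial cluster centers.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):  # stop condition 1: maximum number of iterations
        # Step 603: assign every point to its nearest center (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Step 604: the new center is the coordinate mean of each cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop condition 2: the centers have converged
            break
    return centers, clusters


# Hypothetical (start_soc, end_soc) points forming two obvious groups.
soc_points = [(8.0, 96.0), (10.0, 94.0), (12.0, 95.0),
              (48.0, 90.0), (52.0, 88.0), (50.0, 92.0)]
soc_centers, soc_clusters = kmeans(soc_points, k=2)
```

Step 605 is not automated here: naming a cluster, for example calling the group centered near (10, 95) a depletion-type pattern, is an interpretation step based on the points in each cluster.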

Clustering each preset attribute yields multiple cluster centers per attribute, so that portrait analysis can be carried out from the perspective of each attribute to obtain the portrait labels and their interpretations.

Optionally, setting the number of clusters K for the attribute to be clustered includes the following steps:

Step 701: preset several candidate cluster numbers and calculate, for each candidate, the silhouette coefficient of the data points to be clustered;

Step 702: take the candidate with the largest silhouette coefficient as the number of clusters.

In steps 701-702, the number of clusters K is determined by the silhouette coefficient method. The silhouette coefficient describes how similar an object is to its own cluster compared with other clusters: the larger the value, the better the object matches its own cluster and the worse it matches other clusters, so the coefficient can be used to describe how compact the clusters are. In practice, the data points to be clustered are read first, the silhouette coefficient is calculated for each preset K value, and the K value with the largest silhouette coefficient is taken as the best, and final, K value.

This embodiment determines the best K value by the silhouette coefficient method, which gives a better clustering result than choosing K at random before clustering.
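The selection of K in steps 701-702 can be sketched as follows, using the standard definition of the silhouette coefficient. To keep the example self-contained, the candidate partitions for each K are written out by hand; in practice they would come from running the clustering once per candidate K. The sample points and labels are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a partition with at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # by convention a singleton contributes 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


durations = [(1.0,), (2.0,), (3.0,), (20.0,), (21.0,), (22.0,)]
candidates = {
    2: [0, 0, 0, 1, 1, 1],   # partition obtained for K = 2
    3: [0, 0, 1, 2, 2, 2],   # partition obtained for K = 3
}
scores = {k: silhouette(durations, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

For this data the two-cluster partition scores higher, because splitting the first group only separates points that are already close together.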

Figure 4 shows a flow chart of the clustering scheme of the charging user portrait method based on multi-label data according to an embodiment of the present application. As shown in the figure, the clustering scheme includes the following steps:

Step 1: input the normalized n-dimensional data samples to be clustered.

Step 2: obtain the best cluster number by the silhouette coefficient method and denote it K; randomly select K points in the mathematical space as the initial cluster centers of the sample set and initialize them.

Step 3: for each data point in the sample set, calculate its Euclidean distance to every cluster center and assign it to the nearest cluster center, forming the clusters.

Step 4: update the cluster center of each cluster, taking the coordinate mean of all data points in the cluster as the new cluster center.

Step 5: determine whether the new cluster centers have converged; if so, output the clustering result; if not, return to step 3 and continue.

The Euclidean distance used in this method is calculated as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

where $d$ is the distance in the mathematical space between two n-dimensional data points $x$ and $y$, and also indicates how close the two points are to each other.

In a specific application scenario, one year of charging records from the grid charging piles of a certain region was collected, and the following data fields were mined: customer_id (customer ID), start_soc (starting remaining charge), end_soc (ending remaining charge), chargkwh (single-session energy), usetime (charging duration), chargstarttime (charging start time), chargendtime (charging end time), station_id (charging station ID), and others. The portrait device was built in Python using the PyCharm integrated development environment. Data cleaning, data screening/construction, and data classification were done with pandas and numpy, and clustering was done with sklearn.

In the data cleaning step, abnormal data, duplicate data, and null-value data were cleaned, where the abnormal data includes hardware anomaly data, software anomaly data, and data from operations that violate the rules.

K-means clustering was applied to the SOC attribute, that is, the battery anxiety attribute (start_soc, end_soc), to the charging duration attribute, and to the charging power attribute. The silhouette coefficient method was used to output the best number of clusters, clustering was performed with that number, and the results were analyzed. The SOC cluster analysis output 5 types of behavior portrait labels, the charging duration cluster analysis output 3 types of portrait labels, and the charging power cluster analysis output 4 types of portrait labels. The results of the cluster portraits are shown in Table 1 below.

Table 1 Results of the charging behavior cluster portraits

After this, all portrait labels obtained by the clustering method and by the direct classification method go through final label matching with the time decay method. The clustering method may be the k-means method described above or another clustering method, and the direct classification method obtains portrait labels by classifying according to historical experience. From each type of behavior portrait label, the final label that best represents the user's recent behavior is selected.

The portrait analysis is as follows:

(1) The clustering result for the power attribute data is divided into four labels according to the relative size of the cluster centers: very low, medium, fast, and ultra-fast. Users with the ultra-fast rate account for a very small share, medium-rate users are the most numerous, and the low-rate and fast groups are of comparable size. To some extent this portrait reflects how well users are matched to charging piles and how fast they charge in relative terms.

(2) User charging duration falls into three categories: category 1, short; category 2, long; and category 3, typical. These labels are determined by the relative length of the users' charging time and the number of users in each cluster.

(3) SOC attributes fall into four categories:

Depletion-type heavy charging: these users recharge when the battery is nearly empty and stop when it is nearly full, so each session delivers a large amount of energy, hence the name. This group is the largest among all users and represents the conventional type.

Economical: these users mainly recharge a small amount when the battery is nearly empty. They probably know how much range their trip requires and charge only to a suitable level to save charging time, so they are defined as economical.

Severe range anxiety: these users show a distinct behavior pattern. The cluster center shows that they usually recharge while the SOC is still no lower than 50%, which suggests they are very likely planning long trips. They are defined as having severe range anxiety.

Mild range anxiety: these users recharge when about 30% of the battery remains. They may worry about how far the remaining charge will take them and that, if they skip this opportunity, it may not last until the next charging station, so they are defined as having mild range anxiety.

After this, any user to be analyzed can be examined attribute by attribute according to that user's data to be analyzed, to determine which portrait label of each attribute the user fits best, which yields a user portrait depicted from multiple angles.
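Assembling a multi-angle portrait for one user can be sketched as a nearest-center lookup per attribute. The label names follow the analysis above, but every center value below is a made-up placeholder rather than a result from the embodiment, and the sketch ignores normalization and time decay for brevity.

```python
import math


def nearest_label(point, centers):
    """Return the label whose cluster center is closest to the point."""
    return min(centers, key=lambda label: math.dist(point, centers[label]))


def build_portrait(user_features, label_centers):
    """Pick one target label per attribute for a single user."""
    return {
        attribute: nearest_label(user_features[attribute], centers)
        for attribute, centers in label_centers.items()
    }


# Hypothetical cluster centers (placeholders, not values from the patent).
label_centers = {
    "battery_anxiety": {           # (start_soc, end_soc) in percent
        "depletion-type heavy charging": (5.0, 95.0),
        "economical": (5.0, 60.0),
        "severe range anxiety": (55.0, 95.0),
        "mild range anxiety": (30.0, 90.0),
    },
    "duration": {                  # charging duration, assumed in hours
        "short": (0.5,),
        "typical": (2.0,),
        "long": (6.0,),
    },
}
user_features = {"battery_anxiety": (28.0, 88.0), "duration": (0.7,)}
portrait = build_portrait(user_features, label_centers)
```

The result is a dictionary with one label per attribute, which corresponds to the set of target labels from which the user portrait is generated.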

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and the numbering should not limit the implementation of the embodiments of the present invention in any way.

Further, as a specific implementation of the above charging user portrait method based on multi-label data, an embodiment of the present application provides a charging user portrait device based on multi-label data. As shown in Figure 5, the device includes a preprocessing module, a clustering module, and a portrait module.

The preprocessing module is configured to acquire the raw charging data of multiple charging users and preprocess the raw charging data to obtain the data fields corresponding to multiple preset attributes;

The clustering module is configured to cluster the data field corresponding to each preset attribute to obtain the cluster centers of the preset attribute and the portrait label corresponding to each cluster center;

The portrait module is configured to acquire the charging data to be analyzed of a user to be analyzed, determine the target labels of that user among the portrait labels according to the charging data to be analyzed, and generate the user portrait of that user from the target labels.

In a specific application scenario, optionally, the portrait module is configured to:

extract, from the charging data to be analyzed, a field group to be analyzed for each preset attribute, where each field group includes at least one field to be analyzed and the action occurrence time of each such field;

determine the time decay coefficient of each field to be analyzed according to its action occurrence time;

for each preset attribute, calculate, based on the time decay coefficients, the matching degree between the field group to be analyzed and each portrait label of that attribute, and take the portrait label with the highest matching degree as the target label of the user to be analyzed under that attribute.

In a specific application scenario, optionally, the preprocessing module is configured to:

perform data cleaning on the raw charging data to obtain cleaned data, where the data cleaning includes at least one of the following: abnormal data cleaning, duplicate data cleaning, and null-value data cleaning;

extract from the cleaned data the data field corresponding to each preset attribute, where the preset attributes include a duration attribute, a battery anxiety attribute, and a charging power attribute.

In a specific application scenario, optionally, the preprocessing module is configured to:

extract the value of the charging duration field from the cleaned data as the data field corresponding to the duration attribute;

extract the value of the starting remaining charge field and the value of the ending remaining charge field from the cleaned data as the data field corresponding to the battery anxiety attribute;

calculate the average charging power of a single pile from the value of the single-session energy field and the value of the single-session duration field in the cleaned data, and use this average power as the data field corresponding to the charging power attribute.

In a specific application scenario, optionally, the preprocessing module is configured to:

convert each item of cleaned data into a preset data format;

extract the time data from the cleaned data and convert it into a preset time format;

determine whether the user identifier is missing from the cleaned data and, if so, complete the user identifier using verification data, where the verification data is provided by the charging service system.

In a specific application scenario, the clustering module is optionally configured to:

take each preset attribute in turn as the attribute to be clustered, take the data field corresponding to that attribute as the field to be clustered, and map each field to be clustered to a data point to be clustered in a mathematical space;

set the number of clusters K for the attribute to be clustered, select K of the data points to be clustered as initial cluster centers, and treat each cluster center as a cluster;

calculate the distance between each data point to be clustered and each cluster center, and assign each data point to the cluster of its nearest cluster center;

update the cluster center of each cluster according to the coordinates of the data points in that cluster, and return to the step of calculating the distance between each data point to be clustered and each cluster center, until a preset stop condition is met;

determine the portrait label corresponding to each cluster center according to the data points in the cluster to which that cluster center belongs.
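
The iteration described above matches the standard K-means procedure, which can be sketched in plain Python as follows. Several details are assumptions, since the application leaves them open: Euclidean distance, seeded random selection of the initial centers, a stop condition based on center movement (`tol`) or an iteration cap (`max_iter`), and the hypothetical helper `label_clusters`, which names clusters by the rank of their centers.

```python
import math
import random


def nearest(point, centers):
    """Index of the cluster center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (centers, labels) for points given as equal-length tuples."""
    # Select K of the data points as the initial cluster centers.
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        # Assign every point to the cluster of its nearest center.
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New center = coordinate-wise mean of the cluster members.
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the center of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # assumed stop condition: centers no longer move
            break
    return centers, [nearest(p, centers) for p in points]


def label_clusters(centers, names):
    """Hypothetical labeling rule: name clusters in ascending order of center."""
    order = sorted(range(len(centers)), key=lambda i: centers[i])
    return {idx: names[rank] for rank, idx in enumerate(order)}


durations = [(0.5,), (0.6,), (0.4,), (5.0,), (5.2,), (4.8,)]
centers, labels = kmeans(durations, k=2)
portrait_labels = label_clusters(centers, ["short charging", "long charging"])
```

Here each duration value becomes a one-dimensional point; an attribute with two values, such as starting and ending remaining charge, would be passed as two-element tuples without any other change.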

In a specific application scenario, the clustering module is optionally configured to:

preset multiple candidate cluster counts and, for each candidate, calculate the silhouette coefficient of the data points to be clustered;

select the candidate cluster count with the largest silhouette coefficient as the number of clusters K.
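
Selecting K by silhouette coefficient can be sketched as below. The silhouette of a point is (b − a) / max(a, b), where a is its mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster; the coefficient for a clustering is the mean over all points. The compact `kmeans` included here uses a deterministic farthest-point initialization purely so the example is reproducible; that initialization, the Euclidean distance, and all function names are assumptions, not details taken from the application.

```python
import math


def nearest(point, centers):
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points, k, max_iter=100):
    """Compact K-means with farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            updated.append(tuple(sum(ax) / len(members) for ax in zip(*members))
                           if members else centers[i])
        if updated == centers:
            break
        centers = updated
    return [nearest(p, centers) for p in points]


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Return the candidate cluster count with the largest silhouette coefficient."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


data = [(1.0,), (1.1,), (0.9,), (5.0,), (5.1,), (4.9,), (9.0,), (9.1,), (8.9,)]
best_k, scores = choose_k(data, candidates=(2, 3, 4))
```

With three well-separated groups, as in `data`, the three-cluster partition keeps every point close to its own cluster and far from the others, so it scores higher than merging two groups (K = 2) or splitting one (K = 4).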

It should be noted that, for further description of the functional modules of the charging user portrait device based on multi-label data provided in the embodiments of the present application, reference may be made to the corresponding description of the method above; details are not repeated here.

Based on the above method, an embodiment of the present application correspondingly provides a storage medium on which a computer program is stored; when the program is executed by a processor, it implements the above charging user portrait method based on multi-label data.

Based on this understanding, the technical solution of the present application may be embodied as a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes a number of instructions that cause an electronic device (such as a personal computer, a server, or a network device) to execute the methods described in the implementation scenarios of the present application.

Based on the methods shown in Figures 1 to 4 and the virtual device embodiment shown in Figure 5, and in order to achieve the above purpose, an embodiment of the present application further provides an electronic device, which may be a personal computer, a server, a network device, or the like. The electronic device includes a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the charging user portrait method based on multi-label data shown in Figures 1 to 4.

Optionally, the electronic device may further include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and the like. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface and a wireless interface (such as a Bluetooth interface or a Wi-Fi interface).

Those skilled in the art will understand that the electronic device structure provided in this embodiment does not limit the electronic device, which may include more or fewer components, combine certain components, or arrange the components differently.

The storage medium may also include an operating system and a network communication module. The operating system is a program that manages and preserves the hardware and software resources of the electronic device and supports the running of information processing programs and other software and/or programs. The network communication module implements communication among the components inside the storage medium, as well as communication with other hardware and software in the physical device.

From the above description of the embodiments, those skilled in the art will clearly understand that the present application may be implemented by software together with a necessary general-purpose hardware platform, or by hardware.

Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of a preferred implementation scenario, and that the units or processes in the drawings are not necessarily required to implement the present application. Those skilled in the art will also understand that the units of a device in an implementation scenario may be distributed within that device as described, or may be changed accordingly and located in one or more devices different from those of the implementation scenario. The units of the above implementation scenarios may be combined into one unit or further split into multiple sub-units.

The serial numbers used in the present application are for description only and do not indicate the relative merits of the implementation scenarios. The above discloses only several specific implementation scenarios of the present application; the present application is not limited to them, and any variation that a person skilled in the art can conceive shall fall within the protection scope of the present application.

Claims (10)

1. A charging user portrait method based on multi-label data, the method comprising:
acquiring original charging data of a plurality of charging users, and preprocessing the original charging data to obtain data fields corresponding to a plurality of preset attributes;
performing a clustering operation on the data fields corresponding to each preset attribute respectively, to obtain cluster centers corresponding to the preset attribute and a portrait label corresponding to each cluster center;
and acquiring charging data to be analyzed of a user to be analyzed, determining a target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed, and generating a user portrait of the user to be analyzed according to the target label.
2. The method according to claim 1, wherein the determining a target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed comprises:
extracting a field group to be analyzed corresponding to each preset attribute from the charging data to be analyzed, wherein the field group to be analyzed comprises at least one field to be analyzed and an action occurrence time corresponding to each field to be analyzed;
determining a time decay coefficient of each field to be analyzed according to the action occurrence time;
and calculating, for each preset attribute and based on the time decay coefficients, the matching degree between the field group to be analyzed and each portrait label of the preset attribute, and determining the portrait label with the highest matching degree as the target label of the user to be analyzed under that preset attribute.
3. The method according to claim 1, wherein the preprocessing the original charging data comprises:
performing data cleaning on the original charging data to obtain cleaned data, wherein the data cleaning comprises at least one of the following: abnormal data cleaning, duplicate data cleaning, and null data cleaning;
and extracting the data fields corresponding to each preset attribute from the cleaned data, wherein the preset attributes comprise a duration attribute, a battery anxiety attribute, and a charging power attribute.
4. The method according to claim 3, wherein the extracting the data fields corresponding to each preset attribute from the cleaned data comprises:
extracting the value of a charging duration field from the cleaned data as the data field corresponding to the duration attribute;
extracting the value of a starting remaining-charge field and the value of an ending remaining-charge field from the cleaned data as the data fields corresponding to the battery anxiety attribute;
and calculating the average charging power of a single charging pile according to the value of a single-session charge amount field and the value of a single-session charging duration field in the cleaned data, and taking the average charging power as the data field corresponding to the charging power attribute.
5. The method according to claim 3, wherein before the extracting the data fields corresponding to each preset attribute from the cleaned data, the method further comprises:
converting each item of cleaned data into a preset data format;
extracting time data from the cleaned data, and converting the time data into a preset time format;
determining whether a user identifier is missing from the cleaned data and, if so, filling in the user identifier using verification data, wherein the verification data is data provided by a charging service system.
6. The method according to claim 1, wherein the performing a clustering operation on the data fields corresponding to each preset attribute respectively, to obtain cluster centers corresponding to the preset attribute and a portrait label corresponding to each cluster center, comprises:
taking each preset attribute in turn as an attribute to be clustered, taking the data field corresponding to the attribute to be clustered as a field to be clustered, and mapping each field to be clustered to a data point to be clustered in a mathematical space;
setting a number of clusters K corresponding to the attribute to be clustered, selecting K data points from the data points to be clustered as cluster centers, and taking each cluster center as a cluster;
calculating the distance between each data point to be clustered and each cluster center, and assigning each data point to be clustered to the cluster corresponding to the nearest cluster center;
updating the cluster center of each cluster according to the coordinates of the data points in that cluster, and returning to the step of calculating the distance between each data point to be clustered and each cluster center, until a preset stop condition is met;
and determining the portrait label corresponding to each cluster center according to the data points in the cluster to which that cluster center belongs.
7. The method according to claim 6, wherein the setting a number of clusters K corresponding to the attribute to be clustered comprises:
presetting a plurality of candidate cluster counts, and calculating the silhouette coefficient of the data points to be clustered under each candidate cluster count;
and determining the candidate cluster count with the largest silhouette coefficient as the number of clusters.
8. A charging user portrait device based on multi-label data, the device comprising:
a preprocessing module, configured to acquire original charging data of a plurality of charging users and preprocess the original charging data to obtain data fields corresponding to a plurality of preset attributes;
a clustering module, configured to perform a clustering operation on the data fields corresponding to each preset attribute respectively, to obtain cluster centers corresponding to the preset attribute and a portrait label corresponding to each cluster center;
and a portrait module, configured to acquire charging data to be analyzed of a user to be analyzed, determine a target label of the user to be analyzed among the portrait labels according to the charging data to be analyzed, and generate a user portrait of the user to be analyzed according to the target label.
9. A storage medium having stored thereon a program or instructions which, when executed by a processor, implement the method of any of claims 1 to 7.
10. An electronic device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.
CN202310483328.2A 2023-04-28 2023-04-28 Charging user image drawing method and device based on multi-label data, medium and equipment Pending CN116522227A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310483328.2A CN116522227A (en) 2023-04-28 2023-04-28 Charging user image drawing method and device based on multi-label data, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310483328.2A CN116522227A (en) 2023-04-28 2023-04-28 Charging user image drawing method and device based on multi-label data, medium and equipment

Publications (1)

Publication Number Publication Date
CN116522227A (en) 2023-08-01

Family

ID=87398894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310483328.2A Pending CN116522227A (en) 2023-04-28 2023-04-28 Charging user image drawing method and device based on multi-label data, medium and equipment

Country Status (1)

Country Link
CN (1) CN116522227A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination