CN116151854A - User type determining method, device, equipment and storage medium - Google Patents

User type determining method, device, equipment and storage medium

Info

Publication number
CN116151854A
Authority
CN
China
Prior art keywords
feature set
user
target
feature
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211516621.6A
Other languages
Chinese (zh)
Inventor
程辉
高若田
李高扬
张昊
尹泽楠
林晓静
赵晓龙
张海峰
米娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Application filed by Big Data Center Of State Grid Corp Of China
Priority to CN202211516621.6A
Publication of CN116151854A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Agronomy & Crop Science (AREA)
  • Mining & Mineral Resources (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Animal Husbandry (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a user type determining method, device, equipment and storage medium. The method comprises the following steps: acquiring electricity consumption data of a user to be classified; acquiring a first feature set corresponding to the electricity consumption data of the user to be classified; performing feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set; determining a target feature set according to the first feature set and the second feature set; and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type comprises: agricultural irrigation user or non-agricultural irrigation user. With the technical scheme of the invention, agricultural irrigation users can be accurately identified.

Description

User type determining method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a user type determining method, device, equipment and storage medium.
Background
Agricultural irrigation motor-pumped wells have now largely been electrified, which makes it possible to estimate a well's water consumption from its electricity consumption (hereinafter referred to as electricity-to-water conversion). A prerequisite for electricity-to-water conversion is accurate matching of water and electricity records; at present, the agricultural irrigation user records corresponding to user electricity data are generally compiled by manual statistics.
Compiling agricultural irrigation user records by manual statistics incurs high labor cost, places high demands on the statistics staff, and is subject to subjective factors, so the resulting agricultural irrigation user records have low accuracy.
Disclosure of Invention
The embodiment of the invention provides a user type determining method, device, equipment and storage medium, so as to realize accurate identification of agricultural irrigation users.
According to an aspect of the present invention, there is provided a user type determining method, including:
acquiring electricity consumption data of users to be classified;
acquiring a first feature set corresponding to the power consumption data of the user to be classified;
performing feature extraction on the electricity utilization data of the users to be classified based on a TSfresh tool to obtain a second feature set;
determining a target feature set according to the first feature set and the second feature set;
and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type comprises: agricultural irrigation users or non-agricultural irrigation users.
According to another aspect of the present invention, there is provided a user type determining apparatus including:
the first acquisition module is used for acquiring electricity utilization data of users to be classified;
the second acquisition module is used for acquiring a first feature set corresponding to the power consumption data of the user to be classified;
the feature extraction module is used for carrying out feature extraction on the power consumption data of the users to be classified based on a TSfresh tool to obtain a second feature set;
the feature set determining module is used for determining a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the user type determination method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a user type determining method according to any one of the embodiments of the present invention when executed.
The embodiment of the invention obtains the electricity consumption data of the users to be classified; acquiring a first feature set corresponding to the power consumption data of the user to be classified; performing feature extraction on the electricity utilization data of the users to be classified based on a TSfresh tool to obtain a second feature set; determining a target feature set according to the first feature set and the second feature set; and determining the user type corresponding to the user to be classified according to the target feature set, so that the agricultural irrigation user can be accurately identified.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a user type determination method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a user type determining apparatus in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed and authorized of the type, usage range, usage scenario, etc. of the personal information related to the present disclosure in an appropriate manner according to the relevant legal regulations.
Example 1
Fig. 1 is a flowchart of a method for determining a user type according to an embodiment of the present invention, where the method may be applied to a case of determining a user type, and the method may be performed by a user type determining device according to an embodiment of the present invention, where the device may be implemented in a software and/or hardware manner, as shown in fig. 1, and the method specifically includes the following steps:
S110, acquiring electricity consumption data of users to be classified.
The electricity consumption data of the user to be classified may be the user's annual electricity consumption data, or may be the user's electricity consumption data within a set time period; the embodiment of the invention does not limit this.
S120, acquiring a first feature set corresponding to the electricity consumption data of the users to be classified.
Wherein the first feature set includes: features characterizing the seasonal variation of electricity used for agricultural irrigation and drainage and for agricultural production, and features characterizing abrupt changes in power. For example, the first feature set may include: features of the 96-point daily curve for the day of the user's maximum annual electricity consumption, features of the monthly curve over the irrigation season, and 39-dimensional features such as annual average power, maximum power and coefficient of variation.
Specifically, the first feature set corresponding to the electricity consumption data of the user to be classified may be obtained by acquiring a manually labeled first feature set corresponding to that data. For example, taking into account the different types of agricultural irrigation users and aspects such as irrigation area, crop type and irrigation mode, manual labeling is completed for some agricultural irrigation users supplied at 380 V or 10 kV.
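The kind of manually designed features named above (annual average power, maximum power, coefficient of variation, monthly statistics) can be illustrated with a minimal sketch. The function name `manual_features`, the feature keys and the toy one-reading-per-month series are assumptions made for illustration; the patent does not specify how its 39 features are computed.

```python
from statistics import mean, pstdev

def manual_features(power, month_of_sample):
    """Compute a few hand-crafted features from a power series.

    power           -- list of power readings (hypothetical unit: kW)
    month_of_sample -- month number (1-12) of each reading
    """
    avg = mean(power)
    feats = {
        "avg_power": avg,
        "max_power": max(power),
        # Coefficient of variation: population std divided by the mean.
        "coef_variation": pstdev(power) / avg if avg else 0.0,
    }
    # Monthly average and maximum power capture the seasonal pattern.
    for m in range(1, 13):
        vals = [p for p, mm in zip(power, month_of_sample) if mm == m]
        feats[f"avg_power_m{m:02d}"] = mean(vals) if vals else 0.0
        feats[f"max_power_m{m:02d}"] = max(vals) if vals else 0.0
    return feats

# Toy series: one reading per month, pump active from April to September.
power = [0.0, 0.0, 0.0, 8.0, 10.0, 12.0, 12.0, 10.0, 8.0, 0.0, 0.0, 0.0]
months = list(range(1, 13))
feats = manual_features(power, months)
```

A series that is zero outside the irrigation season yields a large coefficient of variation, which is the sort of seasonal signal these features are meant to expose.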
S130, performing feature extraction on the electricity consumption data of the users to be classified based on the TSfresh tool to obtain a second feature set.
Because the power consumption curve of an agricultural irrigation user has time-series characteristics and covers a full year of data, the manually labeled first feature set cannot comprehensively capture the useful features. The automatic time-series feature generation tool TSfresh therefore needs to be introduced as an aid, extracting 75744 features in total. 26-dimensional feature variables (feature importance greater than or equal to 0.01) are then screened out using the RF Gini index.
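The random-forest importance used for screening is a Gini-based measure: a feature is important when splitting on it reduces class impurity. The following minimal sketch shows Gini impurity and the impurity decrease of a single threshold split. The function names and toy data are assumptions; a real random forest accumulates such decreases over many trees and normalizes them, which is not shown here.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted

# Toy feature that separates the two classes perfectly at threshold 5.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
decrease = gini_decrease(values, labels, threshold=5)
```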
And S140, determining a target feature set according to the first feature set and the second feature set.
Specifically, the manner of determining the target feature set according to the first feature set and the second feature set may be: acquiring the Gini index corresponding to each feature in the second feature set; screening the features in the second feature set according to the Gini index corresponding to each feature to obtain a screened second feature set; and determining the target feature set according to the first feature set and the screened second feature set. The manner of determining the target feature set according to the first feature set and the second feature set may alternatively be: acquiring a first weight corresponding to the first feature set and a second weight corresponding to the second feature set, screening the first feature set based on the first weight to obtain a screened first feature set, screening the second feature set based on the second weight to obtain a screened second feature set, and determining the union of the screened first feature set and the screened second feature set as the target feature set.
In a specific example, the target feature set is a 56-dimensional fusion feature comprising: the 30-dimensional manual features retained after screening and the 26-dimensional time-domain features generated by TSfresh.
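The screening and fusion step can be sketched as follows, assuming the importance of each automatically generated feature is already available as a number. The helper names, feature names and importance values are hypothetical; only the 0.01 threshold comes from the text.

```python
def screen_by_importance(importances, threshold=0.01):
    """Keep the names of features whose importance is >= threshold."""
    return [name for name, imp in importances.items() if imp >= threshold]

def build_target_features(manual, auto, keep_auto):
    """Concatenate manual features with the retained automatic features."""
    target = dict(manual)          # manual features come first
    for name in keep_auto:
        target[name] = auto[name]  # then the screened automatic features
    return target

# Hypothetical importances and feature values for one user.
importances = {"ts_a": 0.20, "ts_b": 0.005, "ts_c": 0.01}
manual = {"avg_power": 5.0, "max_power": 12.0}
auto = {"ts_a": 1.1, "ts_b": 2.2, "ts_c": 3.3}

kept = screen_by_importance(importances)
target = build_target_features(manual, auto, kept)
```

In the example from the text, the same concatenation would yield 30 + 26 = 56 dimensions per user.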
S150, determining the user type corresponding to the user to be classified according to the target feature set.
Wherein the user types include: agricultural irrigation users or non-agricultural irrigation users.
Specifically, the manner of determining the user type corresponding to the user to be classified according to the target feature set may be: inputting the target feature set into a target model to obtain the user type corresponding to the user to be classified. The manner may alternatively be: inputting the target feature set into at least two target models to obtain the user type output by each target model, and determining the final user type by majority voting.
In a specific example, the target feature set (56-dimensional fusion feature) corresponding to the electricity consumption data of the user to be classified is obtained, a feature matrix is constructed from the target feature set, and the feature matrix is input into a trained model. After calculation, the model outputs 0 or 1, where 0 indicates that the user is predicted to be a non-agricultural irrigation user and 1 indicates that the user is predicted to be an agricultural irrigation user.
Optionally, determining the user type corresponding to the user to be classified according to the target feature set includes:
inputting the target feature set into at least one target model to obtain a user type output by each target model, wherein the target model is obtained by iteratively training a classification model through a target sample set;
and determining the user type corresponding to the user to be classified according to the user type output by the at least one target model.
The target model may be plural, for example, there may be a first target model, a second target model, a third target model, and a fourth target model.
The classification model may be at least one of the RF, XGBoost, KNN and SVC classification models. Random Forest (RF) belongs to the bagging family of ensemble learning algorithms, in which the base learners are trained in parallel. It uses Bootstrap resampling to randomly draw data from the original sample and construct multiple samples, builds a decision tree on each resampled sample using random node splitting, and finally combines the decision trees and obtains the final prediction by voting.
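The two sources of randomness in a random forest, bootstrap resampling of rows and random selection of candidate features at a split, can be sketched as below. The square-root rule for the subset size is a common default and an assumption here, not something stated in the text; the function names are hypothetical.

```python
import math
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (Bootstrap resampling)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(feature_names, rng):
    """Pick about sqrt(n) candidate features without replacement."""
    k = max(1, round(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, k)

rng = random.Random(0)                      # fixed seed for repeatability
rows = list(range(10))                      # stand-in for 10 training rows
sample = bootstrap_sample(rows, rng)
subset = random_feature_subset([f"f{i}" for i in range(56)], rng)
```

Each tree would be grown on its own `sample`, drawing a fresh `subset` at each split, before the trees' predictions are combined by voting.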
Extreme Gradient Boosting (XGBoost) belongs to the boosting family of ensemble learning algorithms, in which the base learners are trained serially. Unlike traditional GBDT, which uses only first-derivative information, XGBoost performs a second-order Taylor expansion of the loss function and adds a regularization term to the objective function. The regularization term balances the fit of the objective function against the complexity of the model in seeking an overall optimal solution, thereby preventing overfitting.
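For reference, the second-order approximation described above is commonly written as follows. The notation is the standard one from the XGBoost formulation and is not taken from this document: $l$ is the loss, $f_t$ the tree added at round $t$, $g_i$ and $h_i$ the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ the number of leaves, and $w$ the leaf weights.

```latex
\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t),
\qquad
\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2
```

The term $\Omega(f_t)$ is the regularization term that penalizes model complexity.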
The K-Nearest Neighbor (KNN) method is a classification algorithm in supervised learning. The idea of the algorithm is simple and intuitive: if the majority of the K most similar (i.e., nearest) samples to a given sample in the feature space belong to a certain class, then the sample also belongs to that class.
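A minimal sketch of the nearest-neighbor rule, assuming Euclidean distance and toy two-dimensional data (both are assumptions; the text does not specify a distance metric or a value of K):

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority label among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query point.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest_labels = [y for _, y in dists[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data: class 0 clustered near the origin, class 1 near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_x, train_y, (0.5, 0.5))
pred_b = knn_predict(train_x, train_y, (5.5, 5.5))
```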
The Support Vector Machine (SVM) is a generalized linear classifier that performs binary classification of data by supervised learning; its decision boundary is the maximum-margin hyperplane solved for the learning samples. The SVM uses the hinge loss function to compute the empirical risk and adds a regularization term to the optimization problem to control the structural risk; it is a classifier with sparsity and robustness.
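The combination of hinge loss and a regularization term can be sketched for a linear model as follows. Labels are assumed to be encoded as -1 and +1, and the regularization strength `lam` and the exact scaling of the penalty are illustrative assumptions; solver-specific formulations differ.

```python
def svm_objective(w, b, xs, ys, lam=0.01):
    """Mean hinge loss plus an L2 penalty on the weights.

    w, b -- weights and bias of a linear decision function w.x + b
    xs   -- list of feature vectors
    ys   -- labels encoded as -1 or +1
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    # Hinge loss is zero once a sample is on the correct side with margin >= 1.
    hinge = sum(max(0.0, 1.0 - y * score(x)) for x, y in zip(xs, ys)) / len(xs)
    penalty = lam * sum(wi * wi for wi in w)
    return hinge + penalty

xs = [[2.0], [-2.0], [0.5]]
ys = [1, -1, 1]
value = svm_objective([1.0], 0.0, xs, ys)
```

Only the third sample, which lies inside the margin, contributes to the hinge term; this is the source of the sparsity mentioned above.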
Specifically, the manner of determining the user type corresponding to the user to be classified according to the user type output by the at least one target model may be: processing the user types output by the at least one target model based on a soft voting strategy to obtain the user type corresponding to the user to be classified. For example, the target feature set is input into a first target model to obtain a first user type, into a second target model to obtain a second user type, and into a third target model to obtain a third user type; if the first user type and the second user type are both agricultural irrigation user, the user type corresponding to the user to be classified is determined to be agricultural irrigation user.
It should be noted that the voting method is an ensemble learning model that follows the rule that the minority obeys the majority; integrating multiple models reduces variance and thereby improves the robustness of the model. Ideally, the predictive performance of the voting method should be better than that of any single base model. In practice, two conditions must hold for the voting method to give better results: the performance of the base models must not differ too much, and the base models should have low homogeneity. In view of this, the classification model in the embodiment of the invention uses RF, XGBoost, KNN and SVC as multiple diverse base classifiers, and the model integration uses a voting strategy, so as to improve the generalization performance of the overall model.
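The text refers to both majority voting and a soft voting strategy. The sketch below contrasts the two, assuming each base model can supply either a 0/1 label or a probability for the agricultural irrigation class; the function names and the 0.5 cut-off are assumptions.

```python
from collections import Counter

def hard_vote(labels):
    """Majority vote over the 0/1 labels output by the base models."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities, cutoff=0.5):
    """Average the models' class-1 probabilities, then apply the cut-off."""
    avg = sum(probabilities) / len(probabilities)
    return 1 if avg >= cutoff else 0

# Three hypothetical models: one very confident, two mildly against.
probs = [0.9, 0.4, 0.4]
labels = [1 if p >= 0.5 else 0 for p in probs]   # -> [1, 0, 0]

hard_result = hard_vote(labels)
soft_result = soft_vote(probs)
```

With these toy numbers the two strategies disagree: hard voting follows the two models that lean negative, while soft voting is swayed by the confident model. An odd number of base models avoids ties in hard voting.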
Optionally, iteratively training the classification model by the target sample set includes:
obtaining a target sample set, wherein the target sample set comprises: the feature set sample and the user type corresponding to the feature set sample;
inputting the feature set samples in the target sample set into a classification model to obtain a predicted user type;
training parameters of the classification model according to an objective function formed by the predicted user type and the user type corresponding to the feature set sample;
and returning to execute the operation of inputting the characteristic set samples in the target sample set into a classification model to obtain the predicted user type until the target model is obtained.
Wherein the target sample set comprises: positive and negative samples, the positive samples comprising: the feature set sample and the user type (agricultural irrigation user) corresponding to the feature set sample, and the negative sample comprises: the feature set sample and the user type (non-agricultural irrigation user) corresponding to the feature set sample.
In a specific example, the RF, XGBoost, KNN, SVC and Voting classifiers are each trained using grid search (mainly for parameter tuning) combined with 5-fold cross-validation (mainly to reduce overfitting of the model), and evaluation metrics such as accuracy, precision, recall and F1 score are calculated from the classification model's confusion matrix to complete the training and optimization of the classification model. Random Forest (RF) has the following advantages. First, RF models nonlinear features well, time-series data has many nonlinear characteristics, and RF is easy to implement. Second, the RF construction process uses two kinds of randomness, data sampling and feature sampling, so the algorithm generalizes well and the selected features are effective.
In another specific example, feature construction is performed on the labeled data set (electricity consumption data of 340 agricultural irrigation users and of 1320 non-agricultural irrigation users), and a 56-dimensional feature matrix is generated from each user's electricity consumption data. A target sample set is generated based on the 56-dimensional feature matrix generated from each user's electricity consumption data and the user type.
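The evaluation side of this procedure can be sketched as below: splitting sample indices into 5 folds and deriving accuracy, precision, recall and F1 from the confusion-matrix counts. The function names and toy labels are assumptions, and the grid search itself (looping over candidate parameters and keeping the best cross-validated score) is not shown.

```python
def kfold_indices(n, k=5):
    """Yield (train_indices, validation_indices) for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f1": f1}

splits = list(kfold_indices(10, k=5))
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                         [1, 1, 0, 1, 0, 0, 0, 0])
```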
Optionally, obtaining the target sample set includes:
acquiring electricity consumption data of agricultural irrigation users and electricity consumption data of non-agricultural irrigation users;
acquiring a third characteristic set corresponding to electricity consumption data of an agricultural irrigation user;
performing feature extraction on the electricity consumption data of the agricultural irrigation users based on a TSfresh tool to obtain a fourth feature set;
determining a fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the fourth feature set, and determining the fifth feature set corresponding to the agricultural irrigation user as a positive sample;
acquiring a sixth feature set corresponding to electricity consumption data of non-agricultural irrigation users;
performing feature extraction on the electricity consumption data of the non-agricultural irrigation users based on a TSfresh tool to obtain a seventh feature set;
determining an eighth feature set corresponding to a non-agricultural irrigation user according to the sixth feature set and the seventh feature set, and determining the eighth feature set corresponding to the non-agricultural irrigation user as a negative sample;
a target sample set is generated from the positive samples and the negative samples.
The third feature set corresponding to the electricity consumption data of the agricultural irrigation user includes features characterizing the seasonal variation of electricity used for agricultural irrigation and drainage and for agricultural production, and features characterizing abrupt changes in power. For example, the third feature set may include: features of the 96-point daily curve for the day of the agricultural irrigation user's maximum annual electricity consumption, features of the monthly curve over the irrigation season, and 39-dimensional features such as annual average power, maximum power and coefficient of variation.
Specifically, the third feature set corresponding to the electricity consumption data of the agricultural irrigation user may be obtained by acquiring a manually labeled third feature set corresponding to that data.
Because the power consumption curve of an agricultural irrigation user has time-series characteristics and covers a full year of data, a manually labeled feature set alone cannot comprehensively capture the useful features. The automatic time-series feature generation tool TSfresh therefore needs to be introduced as an aid, extracting 75744 features in total. 26-dimensional feature variables (feature importance greater than or equal to 0.01) are then screened out using the RF Gini index.
Specifically, the manner of determining the fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the fourth feature set may be: acquiring the Gini index corresponding to each feature in the fourth feature set; screening the features in the fourth feature set according to the Gini index corresponding to each feature to obtain a screened fourth feature set; and determining the fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the screened fourth feature set. The manner may alternatively be: acquiring a weight corresponding to the third feature set and a weight corresponding to the fourth feature set, screening the third feature set based on its weight to obtain a screened third feature set, screening the fourth feature set based on its weight to obtain a screened fourth feature set, and determining the union of the screened third feature set and the screened fourth feature set as the fifth feature set corresponding to the agricultural irrigation user.
Wherein the positive sample includes: the agricultural irrigation user and the fifth feature set corresponding to the agricultural irrigation user.
The sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation users is characterized by the seasonal variation of the electricity consumption of agricultural irrigation and drainage and agricultural production, and by abrupt changes in power. For example, the sixth feature set may include 39-dimensional features such as: features of the 96-point daily curve on the day of maximum annual electricity consumption of the non-agricultural irrigation user, features of the monthly curve during the irrigation season, and the annual average power, maximum power, coefficient of variation and the like.
Specifically, the sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation user may be acquired by obtaining a manually labelled sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation user.
Specifically, the eighth feature set corresponding to the non-agricultural irrigation user may be determined according to the sixth feature set and the seventh feature set as follows: acquiring a Gini index corresponding to each feature in the seventh feature set; screening the features in the seventh feature set according to the Gini index corresponding to each feature to obtain a screened seventh feature set; and determining the eighth feature set corresponding to the non-agricultural irrigation user according to the sixth feature set and the screened seventh feature set. Alternatively, the eighth feature set may be determined as follows: acquiring weights corresponding to the sixth feature set and weights corresponding to the seventh feature set; screening the sixth feature set based on its weights to obtain a screened sixth feature set; screening the seventh feature set based on its weights to obtain a screened seventh feature set; and determining the union of the screened sixth feature set and the screened seventh feature set as the eighth feature set corresponding to the non-agricultural irrigation user.
The negative samples include: the eighth feature set corresponding to the non-agricultural irrigation user, and the user type of non-agricultural irrigation user.
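The weight-based variant described above can be sketched as follows. This is only an illustration: the text does not state how the weights are used for screening, so the cut-off `min_weight`, the helper `screen_by_weight`, the feature names and all weight values are assumptions.

```python
def screen_by_weight(weights, min_weight):
    """Keep the names of the features whose weight is at least min_weight (assumed rule)."""
    return {name for name, weight in weights.items() if weight >= min_weight}


# Hypothetical weights for a few features of each set (illustrative values only).
sixth_weights = {"annual_mean_power": 0.9, "max_power": 0.7, "variation_coefficient": 0.2}
seventh_weights = {"approximate_entropy": 0.8, "lempel_ziv_complexity": 0.6, "fft_coefficient_1": 0.1}

screened_sixth = screen_by_weight(sixth_weights, min_weight=0.5)
screened_seventh = screen_by_weight(seventh_weights, min_weight=0.5)

# The eighth feature set is the union of the two screened sets.
eighth_feature_set = sorted(screened_sixth | screened_seventh)
print(eighth_feature_set)
```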
Optionally, generating a target sample set according to the positive sample and the negative sample includes:
and oversampling the positive sample and the negative sample to obtain a target sample set.
Because the numbers of agricultural irrigation users and non-agricultural irrigation users differ greatly, the positive samples and the negative samples need to be balanced using the SMOTE oversampling method, so that the imbalance between positive and negative samples does not affect the model training result.
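The idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. The sketch below is a simplified, standard-library illustration and not the implementation used in the embodiment; the function name `smote`, the parameter `k`, the toy data, and the choice of the positive class as the minority are assumptions.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Order the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy feature vectors: 3 positive samples against 6 negative samples.
positive = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
negative = [[5.0, 5.0], [5.5, 4.8], [6.0, 5.2], [5.2, 5.5], [4.8, 5.1], [5.7, 5.3]]

new_positive = smote(positive, n_new=len(negative) - len(positive), k=2)
positive_balanced = positive + new_positive
print(len(positive_balanced), len(negative))
```

Because every synthetic point lies between two real minority samples, the new samples stay inside the region already occupied by that class instead of duplicating existing rows.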
Optionally, determining a target feature set according to the first feature set and the second feature set includes:
acquiring a Gini index corresponding to each feature in the second feature set;
screening the features in the second feature set according to the Gini index corresponding to each feature in the second feature set to obtain a screened second feature set;
and determining a target feature set according to the first feature set and the screened second feature set.
Specifically, the screened 30-dimensional manually constructed features and the 26-dimensional Tsfresh time-domain features are spliced together to form the target feature set. Among the 20 highest-ranked features by importance, Tsfresh time-domain features account for 65%, including: the complexity estimate of the time series computed with the Lempel-Ziv compression algorithm, approximate entropy, Fourier coefficients of the discrete Fourier transform, autoregressive coefficients and the like; manually constructed features account for 35%, including: the maximum power in March, April, May and September, the average power in June, September and October, and the like. It can be seen that Tsfresh effectively extracts the hidden features in the power data of users of electricity for agricultural irrigation and drainage and agricultural production, and at the same time compensates for the shortcomings of the manually constructed features.
The basic idea of random forest (RF) feature importance assessment is to calculate the contribution of each feature on each tree in the RF, take the average, and then compare and rank the contributions of the features. The contribution of a feature on each tree is generally measured using either the Gini index or the out-of-bag (OOB) error rate as the evaluation index. Considering that the feature importance rankings obtained with the two indices are consistent, and that the Gini importance is efficient to compute and does not require a test set to evaluate the model's classification error, the embodiment of the invention adopts the Gini index as the evaluation index of feature importance.
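As a minimal illustration of the Gini index used as an importance measure, the sketch below computes the Gini impurity of a label set and the impurity decrease produced by one threshold split on a feature. In a full random forest these decreases would be accumulated over every node that splits on the feature and averaged over the trees; here a single split per feature stands in for that. The function names, thresholds and toy data are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting the samples on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


labels = [1, 1, 1, 0, 0, 0]                  # 1 = agricultural irrigation user
feature_a = [9.0, 8.5, 9.2, 1.0, 1.3, 0.8]   # separates the two classes cleanly
feature_b = [3.0, 1.0, 2.0, 3.1, 0.9, 2.1]   # carries little class information

scores = {
    "feature_a": gini_decrease(feature_a, labels, threshold=5.0),
    "feature_b": gini_decrease(feature_b, labels, threshold=2.05),
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The feature whose split leaves purer child nodes receives the larger score, which is what places it higher in the importance ranking.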
Optionally, determining a target feature set according to the first feature set and the screened second feature set includes:
screening the features in the first feature set to obtain a screened first feature set;
and generating a target feature set according to the first feature set after screening and the second feature set after screening.
Specifically, the features in the first feature set may be screened to obtain the screened first feature set as follows: acquiring a quantity threshold, and screening the features in the first feature set according to the quantity threshold to obtain the screened first feature set. For example, if the first feature set includes 39 features and the quantity threshold is 30, then 30 features are selected from the first feature set.
According to the technical scheme of this embodiment, electricity consumption data of the user to be classified are acquired; a first feature set corresponding to the electricity consumption data of the user to be classified is acquired; feature extraction is performed on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set; a target feature set is determined according to the first feature set and the second feature set; and the user type corresponding to the user to be classified is determined according to the target feature set, so that agricultural irrigation users can be accurately identified.
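The quantity-threshold screening and the subsequent splicing can be sketched as below. The text does not say by which criterion the 30 features are chosen, so ranking by an importance score is an assumption, as are the helper `top_k`, the feature names and the score values.

```python
def top_k(importance, k):
    """Return the k feature names with the highest importance score (assumed criterion)."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


# Hypothetical scores for 39 manual features, plus 26 already screened Tsfresh features.
manual_importance = {f"manual_{i}": 1.0 / (i + 1) for i in range(39)}
screened_tsfresh = [f"tsfresh_{i}" for i in range(26)]

quantity_threshold = 30
screened_manual = top_k(manual_importance, quantity_threshold)

# Splice the two screened sets into the target feature set.
target_feature_set = screened_manual + screened_tsfresh
print(len(screened_manual), len(target_feature_set))  # 30 56
```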
Example II
Fig. 2 is a schematic structural diagram of a user type determining apparatus according to an embodiment of the present invention. The present embodiment may be applied to the case of user type determination, and the apparatus may be implemented in software and/or hardware, and the apparatus may be integrated in any device that provides a user type determining function, as shown in fig. 2, where the user type determining apparatus specifically includes: the first acquisition module 210, the second acquisition module 220, the feature extraction module 230, the feature set determination module 240, and the user type determination module 250.
The first acquisition module is used for acquiring electricity utilization data of users to be classified;
the second acquisition module is used for acquiring a first feature set corresponding to the power consumption data of the user to be classified;
the feature extraction module is used for carrying out feature extraction on the power consumption data of the users to be classified based on a TSfresh tool to obtain a second feature set;
the feature set determining module is used for determining a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
The apparatus can execute the method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
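The way the five modules hand data to one another can be sketched as a small pipeline. This is a hypothetical illustration only: the class `UserTypeDeterminer`, its parameter names and the toy callables are assumptions, and simple lambdas stand in for the manual features, the TSfresh extraction and the trained target model.

```python
from typing import Callable, Dict, Sequence

FeatureSet = Dict[str, float]


class UserTypeDeterminer:
    """Chains the modules of the apparatus; every callable here is a stand-in."""

    def __init__(
        self,
        get_first: Callable[[Sequence[float]], FeatureSet],
        extract_second: Callable[[Sequence[float]], FeatureSet],
        determine_target: Callable[[FeatureSet, FeatureSet], FeatureSet],
        classify: Callable[[FeatureSet], str],
    ) -> None:
        self.get_first = get_first
        self.extract_second = extract_second
        self.determine_target = determine_target
        self.classify = classify

    def run(self, power_data: Sequence[float]) -> str:
        first = self.get_first(power_data)             # second acquisition module
        second = self.extract_second(power_data)       # feature extraction module
        target = self.determine_target(first, second)  # feature set determining module
        return self.classify(target)                   # user type determining module


determiner = UserTypeDeterminer(
    get_first=lambda d: {"max_power": max(d), "mean_power": sum(d) / len(d)},
    extract_second=lambda d: {"range": max(d) - min(d)},
    determine_target=lambda a, b: {**a, **b},
    classify=lambda f: (
        "agricultural irrigation user" if f["range"] > 5 else "non-agricultural irrigation user"
    ),
)

# The first acquisition module would supply the electricity consumption data.
result = determiner.run([0.2, 0.1, 8.4, 9.1, 0.3])
print(result)
```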
Example III
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the user type determination method.
In some embodiments, the user type determination method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the user type determination method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the user type determination method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for determining a user type, comprising:
acquiring electricity consumption data of users to be classified;
acquiring a first feature set corresponding to the power consumption data of the user to be classified;
performing feature extraction on the electricity utilization data of the users to be classified based on a TSfresh tool to obtain a second feature set;
determining a target feature set according to the first feature set and the second feature set;
and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type comprises: agricultural irrigation users or non-agricultural irrigation users.
2. The method of claim 1, wherein determining the user type corresponding to the user to be classified according to the target feature set comprises:
inputting the target feature set into at least one target model to obtain a user type output by each target model, wherein the target model is obtained by iteratively training a classification model through a target sample set;
and determining the user type corresponding to the user to be classified according to the user type output by the at least one target model.
3. The method of claim 2, wherein iteratively training the classification model through the set of target samples comprises:
obtaining a target sample set, wherein the target sample set comprises: the feature set sample and the user type corresponding to the feature set sample;
inputting the feature set samples in the target sample set into a classification model to obtain a predicted user type;
training parameters of the classification model according to an objective function formed by the predicted user type and the user type corresponding to the feature set sample;
and returning to execute the operation of inputting the characteristic set samples in the target sample set into a classification model to obtain the predicted user type until the target model is obtained.
4. A method according to claim 3, wherein obtaining a target sample set comprises:
acquiring electricity consumption data of agricultural irrigation users and electricity consumption data of non-agricultural irrigation users;
acquiring a third characteristic set corresponding to electricity consumption data of an agricultural irrigation user;
performing feature extraction on the electricity consumption data of the agricultural irrigation users based on a TSfresh tool to obtain a fourth feature set;
determining a fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the fourth feature set, and determining the fifth feature set corresponding to the agricultural irrigation user as a positive sample;
acquiring a sixth feature set corresponding to electricity consumption data of non-agricultural irrigation users;
performing feature extraction on the electricity consumption data of the non-agricultural irrigation users based on a TSfresh tool to obtain a seventh feature set;
determining an eighth feature set corresponding to a non-agricultural irrigation user according to the sixth feature set and the seventh feature set, and determining the eighth feature set corresponding to the non-agricultural irrigation user as a negative sample;
a target sample set is generated from the positive samples and the negative samples.
5. The method of claim 4, wherein generating a set of target samples from the positive samples and the negative samples comprises:
and oversampling the positive sample and the negative sample to obtain a target sample set.
6. The method of claim 1, wherein determining a target feature set from the first feature set and the second feature set comprises:
acquiring a Gini index corresponding to each feature in the second feature set;
screening the features in the second feature set according to the Gini index corresponding to each feature in the second feature set to obtain a screened second feature set;
and determining a target feature set according to the first feature set and the screened second feature set.
7. The method of claim 6, wherein determining a target feature set from the first feature set and the filtered second feature set comprises:
screening the features in the first feature set to obtain a screened first feature set;
and generating a target feature set according to the first feature set after screening and the second feature set after screening.
8. A user type determining apparatus, comprising:
the first acquisition module is used for acquiring electricity utilization data of users to be classified;
the second acquisition module is used for acquiring a first feature set corresponding to the power consumption data of the user to be classified;
the feature extraction module is used for carrying out feature extraction on the power consumption data of the users to be classified based on a TSfresh tool to obtain a second feature set;
the feature set determining module is used for determining a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the user type determination method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the user type determination method of any one of claims 1-7.
CN202211516621.6A 2022-11-29 2022-11-29 User type determining method, device, equipment and storage medium Pending CN116151854A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211516621.6A CN116151854A (en) 2022-11-29 2022-11-29 User type determining method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211516621.6A CN116151854A (en) 2022-11-29 2022-11-29 User type determining method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116151854A true CN116151854A (en) 2023-05-23

Family

ID=86353322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211516621.6A Pending CN116151854A (en) 2022-11-29 2022-11-29 User type determining method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116151854A (en)

Similar Documents

Publication Publication Date Title
Cheng et al. Enhanced state estimation and bad data identification in active power distribution networks using photovoltaic power forecasting
CN111178585A (en) Fault reporting amount prediction method based on multi-algorithm model fusion
CN112418476A (en) Ultra-short-term power load prediction method
CN113361785A (en) Power distribution network short-term load prediction method and device, terminal and storage medium
CN113408808B (en) Training method, data generation device, electronic equipment and storage medium
CN111027841A (en) Low-voltage transformer area line loss calculation method based on gradient lifting decision tree
CN113151842B (en) Method and device for determining conversion efficiency of wind-solar complementary water electrolysis hydrogen production
CN112528159B (en) Feature quality assessment method and device, electronic equipment and storage medium
CN116151854A (en) User type determining method, device, equipment and storage medium
CN115965160A (en) Data center energy consumption prediction method and device, storage medium and electronic equipment
CN106816871B (en) State similarity analysis method for power system
CN114722941A (en) Credit default identification method, apparatus, device and medium
CN114462447A (en) Voltage sag identification method and device, computer equipment and storage medium
CN114254828A (en) Power load prediction method based on hybrid convolution feature extractor and GRU
Zhang et al. Load prediction based on depthwise separable convolution model
Nie et al. Global Rényi index of the distance matrix
CN114066278B (en) Method, apparatus, medium, and program product for evaluating article recall
CN117934137A (en) Bad asset recovery prediction method, device and equipment based on model fusion
CN113723835B (en) Water consumption evaluation method and terminal equipment for thermal power plant
CN112365280B (en) Electric power demand prediction method and device
CN117828364A (en) Reduction degree determination method, device, equipment and storage medium
Chen et al. Smart Meter Fault Prediction Based on One-dimensional Convolution Neural Network Integrated Model
CN117649052A (en) Photovoltaic power station combined output typical scene generation method, device and computer equipment
CN118100151A (en) Power grid load prediction method, device, equipment and storage medium
Li et al. Research on Daily Load Curve Classification Based on Improved Fuzzy C-means Clustering Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination