CN116151854A - User type determining method, device, equipment and storage medium - Google Patents
- Publication number
- CN116151854A (application number CN202211516621.6A)
- Authority
- CN
- China
- Prior art keywords
- feature set
- user
- target
- feature
- determining
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Agronomy & Crop Science (AREA)
- Mining & Mineral Resources (AREA)
- Marine Sciences & Fisheries (AREA)
- Animal Husbandry (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a user type determining method, device, equipment and storage medium. The method comprises: acquiring electricity consumption data of a user to be classified; acquiring a first feature set corresponding to the electricity consumption data of the user to be classified; performing feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set; determining a target feature set according to the first feature set and the second feature set; and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type is either agricultural irrigation user or non-agricultural irrigation user. With the technical solution of the invention, agricultural irrigation users can be identified accurately.
Description
Technical Field
The embodiments of the invention relate to the field of computer technology, and in particular to a user type determining method, device, equipment and storage medium.
Background
Agricultural irrigation motor-pumped wells have now largely been electrified, so the water consumption of a motor-pumped well can be converted from its electricity consumption (hereinafter referred to as electricity-to-water conversion). A prerequisite for electricity-to-water conversion is accurate matching between water-use records and electricity-use records; at present, the agricultural irrigation user records corresponding to user electricity data are generally compiled by manual statistics.
Compiling agricultural irrigation user records by manual statistics has a high labor cost, places high demands on the statistical staff, and involves subjective factors, so the resulting records have low accuracy.
Disclosure of Invention
The embodiments of the invention provide a user type determining method, device, equipment and storage medium to accurately identify agricultural irrigation users.
According to an aspect of the present invention, there is provided a user type determining method, including:
acquiring electricity consumption data of a user to be classified;
acquiring a first feature set corresponding to the electricity consumption data of the user to be classified;
performing feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set;
determining a target feature set according to the first feature set and the second feature set;
and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type comprises: agricultural irrigation users or non-agricultural irrigation users.
According to another aspect of the present invention, there is provided a user type determining apparatus including:
the first acquisition module is configured to acquire electricity consumption data of a user to be classified;
the second acquisition module is configured to acquire a first feature set corresponding to the electricity consumption data of the user to be classified;
the feature extraction module is configured to perform feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set;
the feature set determining module is configured to determine a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the user type determination method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a user type determining method according to any one of the embodiments of the present invention when executed.
The embodiments of the invention acquire the electricity consumption data of a user to be classified; acquire a first feature set corresponding to that data; perform feature extraction on the data based on the TSfresh tool to obtain a second feature set; determine a target feature set according to the first feature set and the second feature set; and determine the user type of the user to be classified according to the target feature set, so that agricultural irrigation users can be identified accurately.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a user type determination method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a user type determining apparatus in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, scope of use, usage scenarios, etc. of the personal information involved in the present disclosure, and the user's authorization should be obtained.
Example 1
FIG. 1 is a flowchart of a user type determining method according to an embodiment of the present invention. The method is applicable to determining the type of a user and may be performed by the user type determining device according to an embodiment of the present invention, which may be implemented in software and/or hardware. As shown in FIG. 1, the method specifically includes the following steps:
S110, acquiring electricity consumption data of a user to be classified.
The electricity consumption data of the user to be classified may include the user's electricity consumption data for a full year, or the user's electricity consumption data within a set time period; the embodiments of the invention do not limit this.
S120, acquiring a first feature set corresponding to the electricity consumption data of the users to be classified.
The first feature set includes features characterizing the seasonal variation of electricity consumption for agricultural irrigation and drainage and for agricultural production, as well as features characterizing abrupt changes in power. For example, the first feature set may include features of the 96-point daily load curve on the day of the user's maximum annual electricity consumption, features of the monthly curve during the irrigation season, and 39-dimensional features such as annual average power, maximum power, and coefficient of variation.
Specifically, the first feature set corresponding to the electricity consumption data of the user to be classified may be obtained by acquiring a manually labeled first feature set for that data. For example, taking into account the different types of agricultural irrigation users and aspects such as irrigated area, crop type, and irrigation mode, manual labeling is completed for some agricultural irrigation users supplied at 380 V or 10 kV.
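As a minimal sketch of how a few of these manual features might be computed, the following Python function assumes one year of 15-minute power readings (96 per day) stored as a NumPy array of shape (n_days, 96). The function name, the feature keys, and the default irrigation-season months are hypothetical illustrations; the embodiment does not specify them, and its full 39-dimensional set is not reproduced here.

```python
import datetime as dt

import numpy as np


def handcrafted_features(load, start, irrigation_months=(3, 4, 5, 6, 7, 8)):
    """Compute a few manually designed load features for one user (sketch).

    load: array of shape (n_days, 96), one row per day of 15-minute power readings (kW).
    start: calendar date of the first row.
    irrigation_months: hypothetical irrigation-season months.
    """
    load = np.asarray(load, dtype=float)
    if load.ndim != 2 or load.shape[1] != 96:
        raise ValueError("expected an array of shape (n_days, 96)")
    flat = load.ravel()
    mean_power = flat.mean()
    max_power = flat.max()
    # Coefficient of variation: standard deviation relative to the mean.
    cv = flat.std() / mean_power if mean_power > 0 else 0.0
    # Daily energy in kWh: each 15-minute reading covers 0.25 h.
    daily_energy = load.sum(axis=1) * 0.25
    peak_day = int(np.argmax(daily_energy))
    peak_day_curve = load[peak_day]  # 96-point curve of the maximum-consumption day
    # Month of each row, used for the irrigation-season monthly curve.
    months = np.array([(start + dt.timedelta(days=i)).month for i in range(load.shape[0])])
    features = {"mean_power": float(mean_power), "max_power": float(max_power), "cv": float(cv)}
    features.update({f"peak_day_p{k:02d}": float(v) for k, v in enumerate(peak_day_curve)})
    features.update(
        {f"energy_month_{m:02d}": float(daily_energy[months == m].sum()) for m in irrigation_months}
    )
    return features
```

Each user's dictionary can then become one row of the first feature set.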
S130, performing feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set.
Because the electricity consumption curve of an agricultural irrigation user is a time series covering a full year of data, the manually labeled first feature set cannot capture all useful features. TSfresh, an automatic time-series feature generation tool, is therefore introduced as a supplement, extracting 75,744 features in total. Using the random forest (RF) Gini index, 26 feature variables with a feature importance of at least 0.01 are then selected.
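TSfresh is distributed as the Python package `tsfresh`, whose `extract_features` function accepts a long-format table with one row per (series id, time step, value). The sketch below, which assumes each user's load series is available as a sequence of readings, shows that reshaping and a call using the package's small `MinimalFCParameters` preset to keep it fast; the embodiment's 75,744 features would come from a much larger set of feature calculators, which this sketch does not attempt to reproduce. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def to_tsfresh_long_format(series_by_user):
    """Stack per-user load series into the long (id, time, value) layout tsfresh accepts."""
    frames = []
    for user_id, values in series_by_user.items():
        values = np.asarray(values, dtype=float)
        frames.append(pd.DataFrame({"id": user_id, "time": np.arange(len(values)), "value": values}))
    return pd.concat(frames, ignore_index=True)


def extract_tsfresh_features(long_df):
    """Extract one row of time-series features per user id (requires the tsfresh package)."""
    # Imported lazily so the reshaping helper above works without tsfresh installed.
    from tsfresh import extract_features
    from tsfresh.feature_extraction import MinimalFCParameters
    from tsfresh.utilities.dataframe_functions import impute

    features = extract_features(
        long_df,
        column_id="id",
        column_sort="time",
        column_value="value",
        # Small preset for speed; ComprehensiveFCParameters() yields far more features.
        default_fc_parameters=MinimalFCParameters(),
    )
    # Replace NaN/inf values produced by calculators that are undefined for some series.
    impute(features)
    return features
```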
S140, determining a target feature set according to the first feature set and the second feature set.
Specifically, the target feature set may be determined from the first feature set and the second feature set as follows: acquiring the Gini index corresponding to each feature in the second feature set; screening the features in the second feature set according to their Gini indices to obtain a screened second feature set; and determining the target feature set according to the first feature set and the screened second feature set. Alternatively, the target feature set may be determined by: acquiring a first weight corresponding to the first feature set and a second weight corresponding to the second feature set; screening the first feature set based on the first weight to obtain a screened first feature set; screening the second feature set based on the second weight to obtain a screened second feature set; and determining the union of the screened first feature set and the screened second feature set as the target feature set.
In a specific example, the target feature set is a 56-dimensional fused feature set comprising 30 screened manual features and 26 time-domain features generated by TSfresh.
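Both variants can be sketched with scikit-learn and pandas, assuming the first and second feature sets are DataFrames with one row per user and `y` holds the 0/1 user types. `RandomForestClassifier.feature_importances_` reports the mean decrease in Gini impurity, which stands in here for the per-feature Gini index; keeping every first-set column in the first variant, and reading the weights in the second variant as per-feature values, are assumptions. All function names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def screen_by_gini(X, y, threshold=0.01, random_state=0):
    """Return the columns of X whose random-forest Gini importance is >= threshold."""
    rf = RandomForestClassifier(n_estimators=200, criterion="gini", random_state=random_state)
    rf.fit(X, y)
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    return importances[importances >= threshold].index.tolist()


def build_target_feature_set(first, second, y, threshold=0.01):
    """Variant 1: all first-set columns plus the Gini-screened second-set columns."""
    kept = screen_by_gini(second, y, threshold)
    return pd.concat([first, second[kept]], axis=1)


def union_of_screened(first_weights, second_weights, threshold=0.01):
    """Variant 2: screen each set by its (per-feature) weights, then take the ordered union."""
    kept = [f for f, w in first_weights.items() if w >= threshold]
    kept += [f for f, w in second_weights.items() if w >= threshold and f not in kept]
    return kept
```

The 0.01 default mirrors the importance threshold stated for the 26 selected TSfresh features.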
S150, determining the user type corresponding to the user to be classified according to the target feature set.
The user type is either agricultural irrigation user or non-agricultural irrigation user.
Specifically, the user type corresponding to the user to be classified may be determined by inputting the target feature set into a target model to obtain the user type. Alternatively, the target feature set may be input into at least two target models to obtain the user type output by each model, and the final user type is then determined by majority voting.
In a specific example, the target feature set (the 56-dimensional fused features) corresponding to the electricity consumption data of the user to be classified is obtained, a feature matrix is constructed from it and input into the trained model, and the model outputs 0 or 1, where 0 indicates that the user is predicted to be a non-agricultural irrigation user and 1 indicates that the user is predicted to be an agricultural irrigation user.
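Majority voting over the 0/1 outputs of several target models can be sketched in plain Python. How a tie (possible with an even number of models, such as four) should be broken is not stated, so resolving ties to 0 (non-agricultural irrigation user) is a hypothetical choice in this sketch.

```python
NON_IRRIGATION, IRRIGATION = 0, 1  # 0/1 encoding of the model output


def majority_vote(predictions):
    """Hard majority vote over one user's 0/1 predictions from several target models.

    Ties are resolved to NON_IRRIGATION (a hypothetical choice).
    """
    predictions = list(predictions)
    if not predictions:
        raise ValueError("at least one model prediction is required")
    if any(p not in (NON_IRRIGATION, IRRIGATION) for p in predictions):
        raise ValueError("predictions must be 0 or 1")
    votes_for_irrigation = sum(1 for p in predictions if p == IRRIGATION)
    # Strict majority required: more than half of the models must output 1.
    return IRRIGATION if 2 * votes_for_irrigation > len(predictions) else NON_IRRIGATION
```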
Optionally, determining the user type corresponding to the user to be classified according to the target feature set includes:
inputting the target feature set into at least one target model to obtain a user type output by each target model, wherein the target model is obtained by iteratively training a classification model through a target sample set;
and determining the user type corresponding to the user to be classified according to the user type output by the at least one target model.
There may be multiple target models, for example a first target model, a second target model, a third target model, and a fourth target model.
The classification model may be at least one of the RF, XGBoost, KNN, and SVC classification models. Random forest (RF) is a bagging algorithm in ensemble learning, in which the base learners are trained in parallel. It uses bootstrap resampling to draw multiple samples at random from the original data, builds a decision tree on each resampled sample using random selection at each node split, and finally combines the trees, obtaining the final prediction by voting.
Extreme gradient boosting (eXtreme Gradient Boosting, XGBoost) is a boosting algorithm in ensemble learning, in which the base learners are trained sequentially. Unlike traditional GBDT, which uses only first-derivative information, XGBoost performs a second-order Taylor expansion of the loss function and adds a regularization term to the objective function. It seeks the overall optimum while balancing the reduction of the objective function against model complexity, which helps prevent overfitting.
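For reference, the standard XGBoost formulation (Chen and Guestrin, 2016) makes the second-order expansion explicit. At boosting round t, with g_i and h_i the first and second derivatives of the loss l with respect to the previous prediction, the objective for the new tree f_t is approximated as follows, where T is the number of leaves, w_j the weight of leaf j, I_j the set of samples in leaf j, and γ and λ regularization parameters:

```latex
\begin{aligned}
\mathcal{L}^{(t)} &\simeq \sum_{i=1}^{n}\Big[\, l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(\mathbf{x}_i) + \tfrac{1}{2} h_i f_t^{2}(\mathbf{x}_i) \Big] + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2}, \\
g_i &= \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \hat{y}_i^{(t-1)}}, \qquad
h_i = \frac{\partial^{2} l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^{2}}, \qquad
w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.
\end{aligned}
```

The λ in the denominator of the optimal leaf weight shrinks leaf values, and γT penalizes the number of leaves; together they are the complexity control that guards against overfitting.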
The K-Nearest Neighbor (KNN) method is a classification algorithm in supervised learning. Its idea is simple and intuitive: if the majority of the K samples most similar (i.e., nearest) to a given sample in feature space belong to a certain class, the sample is also assigned to that class.
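A plain-Python sketch of the KNN rule, assuming Euclidean distance over the feature vectors (the embodiment states neither the distance metric nor K). An odd K avoids vote ties in this binary setting.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Assign x the majority label among its k nearest training samples (Euclidean distance)."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort (distance, label) pairs so the nearest samples come first.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```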
The support vector machine (SVM) is a generalized linear classifier that performs binary classification of data in a supervised manner; its decision boundary is the maximum-margin hyperplane solved for the training samples. SVM computes the empirical risk with the hinge loss function and adds a regularization term to optimize the structural risk, making it a classifier with sparsity and robustness.
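The standard soft-margin linear SVM objective shows the two terms explicitly: the hinge loss sums the empirical risk over the n training samples, the squared norm of w is the regularization term, and C > 0 trades them off. Applying it to 0/1 user types assumes the usual relabeling of 0 as -1.

```latex
\min_{\mathbf{w},\, b}\; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
+ C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr),
\qquad y_i \in \{-1, +1\}
```

Samples with y_i(w^T x_i + b) ≥ 1 contribute zero loss, so only the support vectors shape the solution; this is the sparsity the paragraph refers to. Kernel SVC replaces w^T x with a kernel expansion but keeps the same structure.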
Specifically, the user type corresponding to the user to be classified may be determined from the user types output by the at least one target model by processing those outputs with a soft voting strategy. For example, the target feature set may be input into a first target model to obtain a first user type, into a second target model to obtain a second user type, and into a third target model to obtain a third user type; if the first user type and the second user type are both agricultural irrigation user, the user to be classified is determined to be an agricultural irrigation user.
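Soft voting, as commonly defined, averages each model's predicted probability of the positive class instead of counting hard labels. A minimal NumPy sketch, assuming each target model provides P(agricultural irrigation user) for each user; the optional per-model weights and the 0.5 threshold are hypothetical defaults.

```python
import numpy as np


def soft_vote(probabilities, weights=None, threshold=0.5):
    """Average per-model P(agricultural irrigation user) and threshold the mean.

    probabilities: array-like of shape (n_models, n_users).
    weights: optional per-model weights (hypothetical; equal weights by default).
    Returns an int array of 0/1 user types, one per user.
    """
    p = np.asarray(probabilities, dtype=float)
    mean_p = np.average(p, axis=0, weights=weights)
    return (mean_p >= threshold).astype(int)
```

Soft and hard voting can disagree: for probabilities 0.9, 0.45, and 0.45, the hard labels are 1, 0, 0 (majority 0), while the mean probability is 0.6, so soft voting returns 1.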
It should be noted that the voting method is an ensemble learning model that follows the majority rule: by combining multiple models it reduces variance and thereby improves model robustness. Ideally, the voting method should predict better than any single base model. In practice, two conditions help it do so: the performance of the base models should not differ too much, and the base models should have low homogeneity with one another. Accordingly, the classification model in the embodiment of the invention uses several diverse base classifiers (RF, XGBoost, KNN, and SVC), and a voting strategy is adopted for model integration to improve the generalization performance of the overall model.
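The four-classifier voting ensemble can be sketched with scikit-learn's `VotingClassifier`. This assumes scikit-learn is installed; `xgboost.XGBClassifier` is used if the `xgboost` package is available, and otherwise scikit-learn's `GradientBoostingClassifier` is swapped in as a substitute. KNN and SVC are wrapped with feature standardization because both depend on distances, and `SVC(probability=True)` is needed for soft voting. The hyperparameter values are placeholders, not the embodiment's tuned settings.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_boosting_model():
    """XGBoost if installed; otherwise sklearn's gradient boosting as a stand-in."""
    try:
        from xgboost import XGBClassifier

        return XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    except ImportError:
        return GradientBoostingClassifier(random_state=0)


def make_voting_ensemble(voting="soft"):
    """Combine diverse base classifiers (RF, boosting, KNN, SVC) by voting."""
    estimators = [
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("xgb", make_boosting_model()),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("svc", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ]
    return VotingClassifier(estimators=estimators, voting=voting)
```

Passing `voting="hard"` switches the same ensemble to majority voting over predicted labels.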
Optionally, iteratively training the classification model by the target sample set includes:
obtaining a target sample set, wherein the target sample set comprises: the feature set sample and the user type corresponding to the feature set sample;
inputting the feature set samples in the target sample set into a classification model to obtain a predicted user type;
training parameters of the classification model according to an objective function formed by the predicted user type and the user type corresponding to the feature set sample;
and returning to the operation of inputting the feature set samples in the target sample set into the classification model to obtain a predicted user type, until the target model is obtained.
The target sample set comprises positive samples and negative samples. A positive sample comprises a feature set sample and its corresponding user type (agricultural irrigation user); a negative sample comprises a feature set sample and its corresponding user type (non-agricultural irrigation user).
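The iterative training loop above can be illustrated with a deliberately simple classifier. This sketch swaps in logistic regression trained by batch gradient descent (not one of the embodiment's RF, XGBoost, KNN, or SVC models) so that each step is visible: predict user types, evaluate an objective (L2-regularized cross-entropy), update the parameters, and repeat until the objective stops improving. The learning rate, regularization strength, and stopping tolerance are hypothetical.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, l2=1e-3, max_iter=1000, tol=1e-6):
    """Iteratively fit logistic regression by gradient descent; returns (w, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    prev_loss = np.inf
    eps = 1e-12  # avoids log(0)
    for _ in range(max_iter):
        # Step 1: predicted probability of "agricultural irrigation user" for every sample.
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Step 2: objective built from predictions and true user types.
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)) + 0.5 * l2 * w @ w
        if prev_loss - loss < tol:
            break  # objective no longer improving: treat as converged
        prev_loss = loss
        # Step 3: gradient step on the parameters, then loop back to step 1.
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b


def predict_labels(w, b, X):
    """Map scores to 0/1 user types (1 = agricultural irrigation user)."""
    return (np.asarray(X, dtype=float) @ w + b >= 0).astype(int)
```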
In a specific example, classifiers including RF, XGBoost, KNN, SVC, and a voting ensemble are each trained using grid search (mainly for hyperparameter tuning) combined with 5-fold cross-validation (mainly to reduce model overfitting), and evaluation metrics such as accuracy, precision, recall, and F1 score are calculated from each model's confusion matrix to complete the training and optimization of the classification models. Random forest (RF) has two advantages here. First, RF models nonlinear features well, time-series data contain many nonlinear features, and RF is easy to implement. Second, RF introduces two kinds of randomness during construction, in data sampling and in feature sampling, so the algorithm generalizes well and the selected features are effective.
In another specific example, features are constructed from the labeled data set (electricity consumption data of 340 agricultural irrigation users and 1,320 non-agricultural irrigation users), and a 56-dimensional feature vector is generated from each user's electricity consumption data. The target sample set is generated from these 56-dimensional features and the corresponding user types.
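A sketch of grid search with 5-fold cross-validation and of the confusion-matrix metrics, assuming scikit-learn. The parameter grid and the use of F1 as the selection score are hypothetical; the embodiment does not list its search space.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV


def metrics_from_confusion(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = agricultural irrigation user)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def tune_random_forest(X, y):
    """Grid search over a small, hypothetical RF grid with 5-fold cross-validation."""
    grid = {"n_estimators": [50, 100], "max_depth": [None, 4]}
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=5, scoring="f1")
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```

For binary classifiers, `GridSearchCV` with `cv=5` uses stratified folds, which keeps the class ratio similar across folds.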
Optionally, obtaining the target sample set includes:
acquiring electricity consumption data of agricultural irrigation users and electricity consumption data of non-agricultural irrigation users;
acquiring a third feature set corresponding to the electricity consumption data of the agricultural irrigation users;
performing feature extraction on the electricity consumption data of the agricultural irrigation users based on the TSfresh tool to obtain a fourth feature set;
determining a fifth feature set corresponding to the agricultural irrigation users according to the third feature set and the fourth feature set, and taking the fifth feature set as a positive sample;
acquiring a sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation users;
performing feature extraction on the electricity consumption data of the non-agricultural irrigation users based on the TSfresh tool to obtain a seventh feature set;
determining an eighth feature set corresponding to the non-agricultural irrigation users according to the sixth feature set and the seventh feature set, and taking the eighth feature set as a negative sample;
generating a target sample set from the positive samples and the negative samples.
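Assembling the target sample set amounts to stacking positive and negative feature vectors with labels 1 and 0. A minimal NumPy sketch, assuming the fused feature sets have already been computed as arrays with the same number of columns; shuffling with a fixed seed is an added, hypothetical convenience. With the labeled set described in the embodiment, this would yield 340 positive and 1,320 negative rows of 56 features each.

```python
import numpy as np


def build_target_sample_set(positive_features, negative_features, seed=0):
    """Stack positive (label 1) and negative (label 0) feature sets and shuffle them.

    positive_features: array (n_pos, d), fused feature sets of agricultural irrigation users.
    negative_features: array (n_neg, d), fused feature sets of non-agricultural irrigation users.
    Returns (X, y) with y = 1 for agricultural irrigation users.
    """
    pos = np.asarray(positive_features, dtype=float)
    neg = np.asarray(negative_features, dtype=float)
    if pos.ndim != 2 or neg.ndim != 2 or pos.shape[1] != neg.shape[1]:
        raise ValueError("both feature sets must be 2-D with the same number of columns")
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos), dtype=int), np.zeros(len(neg), dtype=int)])
    # Shuffle rows and labels together so classes are interleaved.
    order = np.random.default_rng(seed).permutation(len(y))
    return X[order], y[order]
```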
The third characteristic set corresponding to the electricity consumption data of the agricultural irrigation users is characterized by representing seasonal variation conditions of agricultural irrigation and drainage and agricultural production electricity consumption and representing abrupt change conditions of power. For example, the third feature set may include: the characteristics of a 96-day curve of the annual maximum power consumption of the agricultural irrigation users, the characteristics of a month curve of the irrigation season, and 39-dimensional characteristics of annual average power, maximum power, variation coefficient and the like.
Specifically, the manner of obtaining the third feature set corresponding to the electricity consumption data of the agricultural irrigation user may be: and acquiring a third characteristic set of the manual annotation corresponding to the electricity consumption data of the agricultural irrigation user.
The first feature set, which is manually marked, cannot comprehensively extract useful features because the power consumption curve of the agricultural irrigation user has time series characteristics and covers annual data. The automatic time sequence feature generation tool Tswitch needs to be introduced as an aid to extract the features 75744 in total. 26-dimensional characteristic variables (the characteristic importance is more than or equal to 0.01) are screened out by adopting the RF base-Ni index.
Specifically, the determining, according to the third feature set and the fourth feature set, the fifth feature set corresponding to the agricultural irrigation user may be: acquiring a base index corresponding to each feature in the fourth feature set; screening the features in the fourth feature set according to the base index corresponding to each feature in the fourth feature set to obtain a screened fourth feature set; and determining a fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the filtered fourth feature set. The method for determining the fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the fourth feature set may further be: the method comprises the steps of obtaining weights corresponding to a third feature set and weights corresponding to a fourth feature set, screening the third feature set based on the weights corresponding to the third feature set to obtain a screened third feature set, screening the fourth feature set based on the weights corresponding to the fourth feature set to obtain a screened fourth feature set, and determining the union of the screened third feature set and the screened fourth feature set as a fifth feature set corresponding to a pesticide irrigation user.
The positive samples include the agricultural irrigation users and the fifth feature sets corresponding to them.
The sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation users characterizes the seasonal variation of electricity consumption for agricultural irrigation and drainage and agricultural production, as well as abrupt changes in power. For example, the sixth feature set may include 39 features in total, such as features of the 96-point daily load curve on the day of maximum annual electricity consumption of the non-agricultural irrigation user, features of the monthly curve during the irrigation season, and the annual average power, maximum power and coefficient of variation.
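To make the manually labeled features concrete, the following sketch computes a few of them from one user's 96-point daily readings. The input layout (one `(month, readings)` pair per day), summarising the maximum-consumption day by its peak and mean, the irrigation-season months passed in, and all feature names are assumptions for illustration, not the exact 39-dimensional definition.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

# One day of data: (month 1-12, 96 power readings at 15-minute intervals)
Day = Tuple[int, Sequence[float]]


def manual_features(days: List[Day], season_months: Sequence[int]) -> Dict[str, float]:
    all_points = [p for _, readings in days for p in readings]
    annual_mean = statistics.fmean(all_points)
    features = {
        "annual_mean_power": annual_mean,
        "annual_max_power": max(all_points),
        # Coefficient of variation = population std / mean (assumes mean != 0)
        "annual_cv": statistics.pstdev(all_points) / annual_mean,
    }
    # Day with the largest total consumption; summarise its 96-point curve
    _, peak_day = max(days, key=lambda day: sum(day[1]))
    features["max_day_peak_power"] = max(peak_day)
    features["max_day_mean_power"] = statistics.fmean(peak_day)
    # Average power for each assumed irrigation-season month (0.0 if no data)
    for m in season_months:
        points = [p for month, readings in days if month == m for p in readings]
        features[f"month_{m}_mean_power"] = statistics.fmean(points) if points else 0.0
    return features


days = [(3, [1.0] * 96), (6, [3.0] * 96)]  # two hypothetical days
feats = manual_features(days, season_months=[3, 6, 9])
print(feats)
```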
Specifically, the sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation user may be obtained as follows: a manually labeled sixth feature set corresponding to the electricity consumption data of the non-agricultural irrigation user is acquired.
Specifically, the eighth feature set corresponding to the non-agricultural irrigation user may be determined from the sixth feature set and the seventh feature set as follows: acquiring the Gini index corresponding to each feature in the seventh feature set; screening the features in the seventh feature set according to their Gini indices to obtain a screened seventh feature set; and determining the eighth feature set corresponding to the non-agricultural irrigation user according to the sixth feature set and the screened seventh feature set. Alternatively, the eighth feature set may be determined as follows: acquiring weights corresponding to the sixth feature set and weights corresponding to the seventh feature set; screening the sixth feature set based on its weights to obtain a screened sixth feature set; screening the seventh feature set based on its weights to obtain a screened seventh feature set; and determining the union of the screened sixth feature set and the screened seventh feature set as the eighth feature set corresponding to the non-agricultural irrigation user.
The negative samples include the non-agricultural irrigation users and the eighth feature sets corresponding to them.
Optionally, generating a target sample set according to the positive samples and the negative samples includes:
oversampling the positive samples and the negative samples to obtain the target sample set.
Because the numbers of agricultural irrigation users and non-agricultural irrigation users differ greatly, the positive and negative samples need to be balanced using the SMOTE oversampling method, so that the imbalance between positive and negative samples does not affect the model training result.
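The SMOTE balancing step can be illustrated with the minimal pure-Python sketch below: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The helper names, the automatic choice of the smaller class as the minority class, and the toy data are assumptions; a production pipeline would more likely use `imblearn.over_sampling.SMOTE` and its `fit_resample(X, y)` method from the imbalanced-learn library.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def smote(minority: List[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples by interpolating between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of base among the other minority samples (Euclidean)
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(base, s))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def balance_with_smote(X: List[List[float]], y: List[int], k: int = 5,
                       seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Oversample the smaller class of a binary data set until both classes are equal."""
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    extra = smote(minority, max(counts.values()) - counts[minority_label], k, seed)
    return X + extra, y + [minority_label] * len(extra)


# Hypothetical 2-D samples: label 1 (3 samples) is the minority class here
X = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0],
     [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_with_smote(X, y, k=2)
```

Because synthetic points are convex combinations of real minority samples, they stay inside the region spanned by that class rather than being exact copies, which is the main advantage over naive duplication.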
Optionally, determining a target feature set according to the first feature set and the second feature set includes:
acquiring a Gini index corresponding to each feature in the second feature set;
screening the features in the second feature set according to the Gini index corresponding to each feature in the second feature set to obtain a screened second feature set;
and determining a target feature set according to the first feature set and the screened second feature set.
Specifically, the 30 screened manual features and the 26 screened TSfresh time-domain features are spliced together to form the target feature set. Among the top 20 features in the importance ranking, TSfresh time-domain features account for 65%, including a complexity estimate of the time series based on the Lempel-Ziv compression algorithm, approximate entropy, Fourier coefficients of the discrete Fourier transform, and autoregressive coefficients; manual features account for 35%, including the maximum power in March, April, May and September and the average power in June, September and October. This shows that TSfresh effectively extracts hidden features from the power data of electricity users engaged in agricultural irrigation and drainage and agricultural production, and compensates for the shortcomings of the manual features.
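The splicing step and the top-20 share arithmetic (13 of 20 features from TSfresh gives 65%, 7 of 20 manual gives 35%) can be sketched as follows. The helper names, placeholder feature names and the hypothetical ranking are illustrative assumptions; only the counts 30, 26 and 20 come from the text.

```python
from typing import Dict, List, Set, Tuple


def splice_features(manual: Dict[str, float],
                    tsfresh: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Concatenate manual and TSfresh features into one named vector."""
    clash = manual.keys() & tsfresh.keys()
    if clash:
        raise ValueError(f"duplicate feature names: {sorted(clash)}")
    names = list(manual) + list(tsfresh)
    values = list(manual.values()) + list(tsfresh.values())
    return names, values


def top_n_share(ranked: List[str], group: Set[str], n: int = 20) -> float:
    """Fraction of the n highest-ranked features that belong to `group`."""
    top = ranked[:n]
    return sum(name in group for name in top) / len(top)


manual = {f"manual_{i}": 0.0 for i in range(30)}  # 30 screened manual features
auto = {f"tsfresh_{i}": 0.0 for i in range(26)}   # 26 screened TSfresh features
names, values = splice_features(manual, auto)
# Hypothetical ranking whose top 20 holds 13 TSfresh and 7 manual features
ranked = [f"tsfresh_{i}" for i in range(13)] + [f"manual_{i}" for i in range(7)]
share = top_n_share(ranked, set(auto))
print(len(names), share)
```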
The basic idea of RF feature importance assessment is to calculate the contribution of each feature to each tree in the RF, take the average, and then compare and rank the contributions of the features. The contribution of a feature to each tree can generally be measured using either the Gini index or the out-of-bag (OOB) data error rate as the evaluation index. Considering that the two measures yield consistent feature importance rankings, that Gini importance is computationally efficient, and that it does not require a test set to evaluate the model classification error, the embodiment of the invention adopts the Gini index as the evaluation index of feature importance.
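The "contribution per tree, then average" idea behind Gini importance can be illustrated with the helpers below: `gini` computes the Gini index of a node (1 minus the sum of squared class proportions), `split_decrease` gives the weighted impurity decrease contributed by one split, and `forest_importance` normalises each tree's per-feature totals and averages them over the trees. This is a simplified sketch under those assumptions; it does not grow trees, and the per-tree totals in the example are made up.

```python
from collections import Counter
from typing import Dict, List, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_decrease(parent: Sequence[int], left: Sequence[int],
                   right: Sequence[int], n_total: int) -> float:
    """Weighted Gini decrease contributed by one split of `parent`
    into `left` and `right`; n_total is the number of training samples."""
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return n / n_total * (gini(parent) - children)


def forest_importance(per_tree: List[Dict[str, float]]) -> Dict[str, float]:
    """Normalise each tree's per-feature decrease totals to sum to 1,
    then average over all trees (a feature absent from a tree counts as 0)."""
    totals: Dict[str, float] = {}
    for tree in per_tree:
        tree_sum = sum(tree.values())
        for feature, value in tree.items():
            share = value / tree_sum if tree_sum else 0.0
            totals[feature] = totals.get(feature, 0.0) + share
    return {feature: total / len(per_tree) for feature, total in totals.items()}


# A perfect split of a balanced node removes all of its impurity
print(gini([0, 0, 1, 1]), split_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=4))
# Hypothetical per-tree decrease totals for two features
print(forest_importance([{"a": 3.0, "b": 1.0}, {"a": 2.0}]))
```

Because the decreases are accumulated from the training splits themselves, no held-out test set is needed, which is the efficiency argument made above for preferring Gini importance over the OOB error rate.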
Optionally, determining a target feature set according to the first feature set and the screened second feature set includes:
screening the features in the first feature set to obtain a screened first feature set;
and generating the target feature set according to the screened first feature set and the screened second feature set.
Specifically, the features in the first feature set may be screened as follows: a quantity threshold is acquired, and the features in the first feature set are screened according to the quantity threshold to obtain the screened first feature set. For example, if the first feature set includes 39 features and the quantity threshold is 30, then 30 features are selected from the first feature set.
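The quantity-threshold screening can be sketched as a top-k selection. The text does not state which score is used to rank the manual features, so this sketch assumes an importance score per feature; `screen_top_k`, the feature names and the scores are hypothetical.

```python
from typing import Dict, List


def screen_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Keep the k highest-scoring features (ties keep their original order)."""
    return sorted(importances, key=importances.get, reverse=True)[:k]


# 39 hypothetical manual features with made-up importance scores
importances = {f"manual_{i}": i / 100 for i in range(39)}
kept = screen_top_k(importances, k=30)
print(len(kept), kept[0], kept[-1])
```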
According to the technical solution of this embodiment, electricity consumption data of a user to be classified is acquired; a first feature set corresponding to the electricity consumption data of the user to be classified is acquired; feature extraction is performed on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set; a target feature set is determined according to the first feature set and the second feature set; and the user type corresponding to the user to be classified is determined according to the target feature set, so that agricultural irrigation users can be accurately identified.
Example two
Fig. 2 is a schematic structural diagram of a user type determining apparatus according to an embodiment of the present invention. This embodiment is applicable to determining a user type. The apparatus may be implemented in software and/or hardware and may be integrated in any device that provides a user type determining function. As shown in Fig. 2, the user type determining apparatus specifically includes: a first acquisition module 210, a second acquisition module 220, a feature extraction module 230, a feature set determining module 240, and a user type determining module 250.
The first acquisition module is configured to acquire electricity consumption data of a user to be classified;
the second acquisition module is configured to acquire a first feature set corresponding to the electricity consumption data of the user to be classified;
the feature extraction module is configured to perform feature extraction on the electricity consumption data of the user to be classified based on the TSfresh tool to obtain a second feature set;
the feature set determining module is configured to determine a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
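The five modules can be wired together as in the following sketch, where each module is modelled as a plain callable. The class name `UserTypeDeterminer`, the toy monthly data and the peak-to-mean rule used as a stand-in classifier are all hypothetical; in the embodiment, the second feature set would come from TSfresh and the classifier would be the trained target model(s).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

FeatureSet = Dict[str, float]


@dataclass
class UserTypeDeterminer:
    acquire_data: Callable[[str], Sequence[float]]            # first acquisition module
    manual_features: Callable[[Sequence[float]], FeatureSet]  # second acquisition module
    auto_features: Callable[[Sequence[float]], FeatureSet]    # feature extraction module
    merge: Callable[[FeatureSet, FeatureSet], FeatureSet]     # feature set determining module
    classify: Callable[[FeatureSet], str]                     # user type determining module

    def determine(self, user_id: str) -> str:
        data = self.acquire_data(user_id)
        target = self.merge(self.manual_features(data), self.auto_features(data))
        return self.classify(target)


# Hypothetical monthly consumption for two users
readings = {"u1": [0.1] * 6 + [2.0] * 6, "u2": [1.0] * 12}
det = UserTypeDeterminer(
    acquire_data=lambda uid: readings[uid],
    manual_features=lambda d: {"max": max(d)},
    auto_features=lambda d: {"mean": sum(d) / len(d)},
    merge=lambda a, b: {**a, **b},
    # Stand-in rule: a strongly seasonal peak suggests irrigation load
    classify=lambda f: ("agricultural irrigation user" if f["max"] > 1.5 * f["mean"]
                        else "non-agricultural irrigation user"),
)
print(det.determine("u1"), "/", det.determine("u2"))
```

Injecting each module as a callable keeps the apparatus structure visible while letting any concrete feature extractor or trained classifier be swapped in without changing `determine`.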
The apparatus can execute the method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method.
Example three
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the user type determination method.
In some embodiments, the user type determination method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the user type determination method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the user type determination method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability in traditional physical hosts and virtual private server (VPS) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for determining a user type, comprising:
acquiring electricity consumption data of users to be classified;
acquiring a first feature set corresponding to the power consumption data of the user to be classified;
performing feature extraction on the electricity consumption data of the user to be classified based on a TSfresh tool to obtain a second feature set;
determining a target feature set according to the first feature set and the second feature set;
and determining a user type corresponding to the user to be classified according to the target feature set, wherein the user type comprises: agricultural irrigation users or non-agricultural irrigation users.
2. The method of claim 1, wherein determining the user type corresponding to the user to be classified according to the target feature set comprises:
inputting the target feature set into at least one target model to obtain a user type output by each target model, wherein the target model is obtained by iteratively training a classification model through a target sample set;
and determining the user type corresponding to the user to be classified according to the user type output by the at least one target model.
3. The method of claim 2, wherein iteratively training the classification model through the set of target samples comprises:
obtaining a target sample set, wherein the target sample set comprises: the feature set sample and the user type corresponding to the feature set sample;
inputting the feature set samples in the target sample set into a classification model to obtain a predicted user type;
training parameters of the classification model according to an objective function formed by the predicted user type and the user type corresponding to the feature set sample;
and returning to execute the operation of inputting the feature set samples in the target sample set into the classification model to obtain the predicted user type until the target model is obtained.
4. A method according to claim 3, wherein obtaining a target sample set comprises:
acquiring electricity consumption data of agricultural irrigation users and electricity consumption data of non-agricultural irrigation users;
acquiring a third feature set corresponding to electricity consumption data of an agricultural irrigation user;
performing feature extraction on the electricity consumption data of the agricultural irrigation users based on a TSfresh tool to obtain a fourth feature set;
determining a fifth feature set corresponding to the agricultural irrigation user according to the third feature set and the fourth feature set, and determining the fifth feature set corresponding to the agricultural irrigation user as a positive sample;
acquiring a sixth feature set corresponding to electricity consumption data of non-agricultural irrigation users;
performing feature extraction on the electricity consumption data of the non-agricultural irrigation users based on a TSfresh tool to obtain a seventh feature set;
determining an eighth feature set corresponding to a non-agricultural irrigation user according to the sixth feature set and the seventh feature set, and determining the eighth feature set corresponding to the non-agricultural irrigation user as a negative sample;
a target sample set is generated from the positive samples and the negative samples.
5. The method of claim 4, wherein generating a set of target samples from the positive samples and the negative samples comprises:
oversampling the positive samples and the negative samples to obtain the target sample set.
6. The method of claim 1, wherein determining a target feature set from the first feature set and the second feature set comprises:
acquiring a Gini index corresponding to each feature in the second feature set;
screening the features in the second feature set according to the Gini index corresponding to each feature in the second feature set to obtain a screened second feature set;
and determining a target feature set according to the first feature set and the screened second feature set.
7. The method of claim 6, wherein determining a target feature set from the first feature set and the filtered second feature set comprises:
screening the features in the first feature set to obtain a screened first feature set;
and generating a target feature set according to the first feature set after screening and the second feature set after screening.
8. A user type determining apparatus, comprising:
the first acquisition module is configured to acquire electricity consumption data of a user to be classified;
the second acquisition module is configured to acquire a first feature set corresponding to the electricity consumption data of the user to be classified;
the feature extraction module is configured to perform feature extraction on the electricity consumption data of the user to be classified based on a TSfresh tool to obtain a second feature set;
the feature set determining module is used for determining a target feature set according to the first feature set and the second feature set;
the user type determining module is configured to determine a user type corresponding to the user to be classified according to the target feature set, where the user type includes: agricultural irrigation users or non-agricultural irrigation users.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the user type determination method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the user type determination method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211516621.6A | 2022-11-29 | 2022-11-29 | User type determining method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116151854A | 2023-05-23 |
Family ID: 86353322
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202211516621.6A | Pending | 2022-11-29 | 2022-11-29 | User type determining method, device, equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |