CN115563193B - Big data analysis processing method for digital information - Google Patents
Big data analysis processing method for digital information
- Publication number: CN115563193B
- Application number: CN202211568255.9A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/174—Redundancy elimination performed by the file system
- G06F16/1744—Redundancy elimination performed by the file system using compression, e.g. sparse files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of electric digital data processing, and in particular to a big data analysis processing method for digital information, comprising: acquiring a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data; performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set; performing abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set; clustering the current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set; and classifying and storing the current electricity consumption big data in the current electricity consumption big data category set. The invention clusters, compresses, and stores the electricity consumption big data of different areas according to relative repeatability and relative abnormality degree, which solves the technical problem that subsequent analysis of the abnormality degree of electricity consumption big data is inefficient, improves the efficiency of that analysis, and is applicable to the storage of electricity consumption big data.
Description
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a big data analysis processing method for digital information.
Background
With the development of science and technology, many industries are undergoing digital transformation. After the transformation, the object resources of the fields involved are formed and invoked, and the digitalization process usually relies on a large amount of supporting information. For example, the digital transformation of the smart grid usually requires a large amount of power-related big data (e.g., electricity consumption big data), and this big data usually needs to be stored while it is analyzed and processed.
A conventional way to store electricity consumption big data is to store it according to the repeatability of the time series. A storage method based on the repeatability of time-series electricity consumption big data usually does not consider attribute clustering of the data, such as clustering by the abnormality degree of the data, when storing it. Because the stored data usually has to be analyzed later, a subsequent analysis of the abnormality degree of the electricity consumption big data usually has to retrieve a large amount of data and consume a large amount of computing resources, so the efficiency of analyzing the abnormality degree is usually low.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The invention provides a big data analysis processing method for digital information, which aims to solve the technical problem that analyzing the abnormality degree of electricity consumption big data is inefficient.
The invention provides a big data analysis processing method for digital information, which comprises the following steps:
acquiring a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, wherein the current electricity consumption big data in the current electricity consumption big data set are electricity consumption big data in a current time period, the historical electricity consumption big data in the historical electricity consumption big data sequence are electricity consumption big data in a historical time period, and the start time of the current time period is the end time of the historical time period;
according to the current electricity consumption big data set, performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain the corresponding relative repeatability of each current electricity consumption big data;
according to the current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, carrying out abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data;
clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set;
and classifying and storing the current electricity consumption big data in the current electricity consumption big data category set.
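The five steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the patented implementation: `process_current_set` and its callable parameters are hypothetical names, and the scoring, clustering, and storage steps are passed in as functions because this passage does not fix their formulas.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Record = TypeVar("Record")


def process_current_set(
    current_set: Sequence[Record],
    history: Sequence[Sequence[Record]],
    repeatability: Callable[[Record, Sequence[Record]], float],
    abnormality: Callable[[Record, Sequence[Record], Sequence[Record]], float],
    cluster: Callable[[List[Tuple[float, float]]], List[int]],
    store: Callable[[int, List[Record]], None],
) -> Dict[int, List[Record]]:
    """Run the steps in order; history[i] is the historical sequence of current_set[i]."""
    # Step 1 (acquisition) is done by the caller, who supplies current_set and history.
    # Step 2: relative repeatability of each record within the current set.
    rep = [repeatability(r, current_set) for r in current_set]
    # Step 3: relative abnormality from the current set and the record's own history.
    abn = [abnormality(r, current_set, h) for r, h in zip(current_set, history)]
    # Step 4: cluster on the (repeatability, abnormality) pairs.
    labels = cluster(list(zip(rep, abn)))
    categories: Dict[int, List[Record]] = {}
    for record, label in zip(current_set, labels):
        categories.setdefault(label, []).append(record)
    # Step 5: store each category separately.
    for label, members in categories.items():
        store(label, members)
    return categories
```

Passing the scoring functions in as arguments keeps the sketch independent of any particular definition of repeatability or abnormality.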
Further, according to the current electricity consumption big data set, performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain relative repeatability corresponding to each current electricity consumption big data, including:
performing repeated character extraction on each current electricity consumption big data in the current electricity consumption big data set to generate a repeated character space corresponding to each current electricity consumption big data;
determining a basic repeatability set corresponding to each current electricity consumption big data according to the repeated character space corresponding to each current electricity consumption big data in the current electricity consumption big data set;
and determining the relative repeatability corresponding to each current electricity consumption big data according to the basic repeatability set corresponding to each current electricity consumption big data.
Further, the determining a basic repeatability set corresponding to each current electricity consumption big data according to the repeated character space corresponding to each current electricity consumption big data in the current electricity consumption big data set includes:
performing repeated character extraction on the repeated character space corresponding to the current electricity consumption big data and the repeated character space corresponding to each other current electricity consumption big data to generate another repeated character space, thereby obtaining a set of other repeated character spaces corresponding to the current electricity consumption big data, wherein the other current electricity consumption big data corresponding to a current electricity consumption big data are the current electricity consumption big data in the current electricity consumption big data set other than that current electricity consumption big data;
and determining a basic repeatability according to the current electricity consumption big data set and each other repeated character space in the set of other repeated character spaces corresponding to each current electricity consumption big data, to obtain the basic repeatability set corresponding to each current electricity consumption big data.
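One possible reading of the repeated character extraction described above is sketched below. The definitions are assumptions made for illustration only: a "repeated character space" is taken to be the characters that occur more than once in a record's text, and the space shared by two records is taken to be the per-character minimum count. The function names are hypothetical, and the basic repeatability itself is not computed because this passage does not give its formula.

```python
from collections import Counter
from typing import List


def repeated_character_space(text: str) -> Counter:
    """Characters occurring more than once in text, with counts (assumed definition)."""
    counts = Counter(text)
    return Counter({ch: n for ch, n in counts.items() if n > 1})


def other_repeated_spaces(index: int, spaces: List[Counter]) -> List[Counter]:
    """Shared repeated characters between record `index` and every other record."""
    # Counter & Counter keeps each shared key with the smaller of the two counts.
    return [spaces[index] & other for j, other in enumerate(spaces) if j != index]
```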
Further, the current electricity consumption big data in the current electricity consumption big data set comprises: a current average electricity consumption and a current unit average electric quantity sequence, and the historical electricity consumption big data in the historical electricity consumption big data sequence comprises: a historical unit average electric quantity sequence;
the carrying out abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set according to the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data comprises:
determining a current first abnormality corresponding to each current electricity consumption big data according to the current average electricity consumption and the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set;
determining a current second abnormality corresponding to each current electricity consumption big data according to a historical unit average electric quantity sequence included in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set and a current unit average electric quantity sequence included in each current electricity consumption big data;
and determining the relative abnormality degree corresponding to each current electricity consumption big data according to the current first abnormality and the current second abnormality corresponding to each current electricity consumption big data in the current electricity consumption big data set.
Further, the determining, according to the current average electricity consumption and the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set, a current first abnormality corresponding to each current electricity consumption big data includes:
determining a current electricity consumption fluctuation parameter corresponding to the current time period according to the current average electricity consumption included in each current electricity consumption big data in the current electricity consumption big data set, the mean of the current average electricity consumption over the current electricity consumption big data in the current electricity consumption big data set, and the number of current electricity consumption big data in the current electricity consumption big data set;
determining a current unit electricity fluctuation parameter corresponding to each current unit time period included in the current time period according to each current unit average electric quantity in the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set and the number of current electricity consumption big data in the current electricity consumption big data set;
and determining the current first abnormality corresponding to each current electricity consumption big data according to the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set, the current electricity consumption fluctuation parameter, the current unit electricity fluctuation parameter corresponding to each current unit time period included in the current time period, and the number of current electricity consumption big data in the current electricity consumption big data set.
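The two fluctuation parameters are described only by their inputs (the values, their mean, and the number of records). One natural reading, assumed here and not confirmed by this passage, is a population standard deviation. The sketch below uses that assumption; the function names are hypothetical, and the first abnormality itself is not computed because its formula is not given here.

```python
from statistics import pstdev
from typing import List, Sequence


def consumption_fluctuation(current_averages: Sequence[float]) -> float:
    """Spread of the current average consumption across records (assumed: population std dev)."""
    return pstdev(current_averages)


def unit_fluctuations(unit_sequences: Sequence[Sequence[float]]) -> List[float]:
    """One spread value per current unit time period, taken across all records."""
    # zip(*...) regroups the values so that each tuple holds one unit time period.
    return [pstdev(period) for period in zip(*unit_sequences)]
```

Here `unit_sequences[i]` stands for the current unit average electric quantity sequence of record `i`, and all sequences are assumed to have the same length.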
Further, the determining, according to a history unit average electric quantity sequence included in the history electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set and a current unit average electric quantity sequence included in each current electricity consumption big data, a current second abnormality corresponding to each current electricity consumption big data includes:
determining the average value of historical unit average electric quantity in a historical unit average electric quantity sequence included by historical electric consumption big data in a historical electric consumption big data sequence corresponding to the current electric consumption big data as the current total electric quantity average value corresponding to the current electric consumption big data;
and determining the current second abnormality corresponding to each current electricity consumption big data according to the current total electric quantity average value corresponding to each current electricity consumption big data in the current electricity consumption big data set, the historical unit average electric quantity sequence included in the historical electricity consumption big data in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data, the current unit average electric quantity sequence included in each current electricity consumption big data, the current time period, and the historical time period.
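The current total electric quantity average value described above is a plain mean over all historical unit average electric quantities of one record, which can be sketched as follows. The function name and the nested-list layout are assumptions for illustration; the second abnormality is not computed because this passage does not state its formula.

```python
from statistics import fmean
from typing import Sequence


def current_total_average(history: Sequence[Sequence[float]]) -> float:
    """Mean of every historical unit average electric quantity of one record.

    history[d] is the historical unit average electric quantity sequence of
    the d-th historical electricity consumption big data (for example, one day).
    """
    values = [v for day in history for v in day]
    return fmean(values)
```

For example, `current_total_average([[1.0, 3.0], [2.0, 6.0]])` averages the four values 1.0, 3.0, 2.0, and 6.0.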
Further, the clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set includes:
determining the relative repeatability corresponding to each current electricity consumption big data in the current electricity consumption big data set as the abscissa corresponding to the current electricity consumption big data;
determining the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set as the ordinate corresponding to the current electricity consumption big data;
combining the abscissa and the ordinate corresponding to each current electricity consumption big data in the current electricity consumption big data set into the current coordinate corresponding to the current electricity consumption big data;
determining the Euclidean distance between each piece of current electricity consumption big data in the current electricity consumption big data set according to the current coordinate corresponding to each piece of current electricity consumption big data in the current electricity consumption big data set;
and clustering the current electricity consumption big data in the current electricity consumption big data set according to the Euclidean distance between each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set.
Further, the classifying and storing the current electricity consumption big data in the current electricity consumption big data category set includes:
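The coordinate construction and distance-based clustering above can be sketched as follows. This passage does not name a clustering algorithm, so the sketch swaps in a simple threshold-linkage grouping (points joined by a chain of pairs within `threshold` share a label). Both the algorithm and the `threshold` parameter are assumptions, and `cluster_by_distance` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (relative repeatability, relative abnormality degree)


def cluster_by_distance(points: Sequence[Point], threshold: float) -> List[int]:
    """Label points; points linked by pairs within `threshold` get the same label."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            # math.dist is the Euclidean distance between the two coordinates.
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(j)] = find(i)

    # Renumber the roots as 0, 1, 2, ... in order of first appearance.
    order: Dict[int, int] = {}
    return [order.setdefault(find(i), len(order)) for i in range(len(points))]
```

The pairwise loop is quadratic in the number of records, which is acceptable for a set of buildings in one community but would need a different approach at larger scale.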
compressing each current electricity consumption big data in each current electricity consumption big data category in the current electricity consumption big data category set to obtain a compressed file corresponding to the current electricity consumption big data category;
and classifying and storing the compressed files corresponding to each current electricity consumption big data category in the current electricity consumption big data category set.
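The per-category compression and storage can be sketched as follows. The compression algorithm is not named in this passage, so the sketch assumes zlib over a JSON serialization; the file naming scheme and the function names `store_categories` and `load_category` are hypothetical.

```python
import json
import zlib
from pathlib import Path
from typing import Dict, List


def store_categories(categories: Dict[int, List[dict]], out_dir: Path) -> Dict[int, Path]:
    """Compress each category into its own file and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: Dict[int, Path] = {}
    for label, records in categories.items():
        payload = json.dumps(records, ensure_ascii=False).encode("utf-8")
        path = out_dir / f"category_{label}.json.zlib"
        path.write_bytes(zlib.compress(payload, 9))  # 9 = strongest zlib level
        paths[label] = path
    return paths


def load_category(path: Path) -> List[dict]:
    """Read one category back without touching the files of other categories."""
    return json.loads(zlib.decompress(path.read_bytes()).decode("utf-8"))
```

Keeping one compressed file per category means a later abnormality analysis can decompress only the categories it needs.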
The invention has the following beneficial effects:
The big data analysis processing method for digital information clusters, compresses, and stores the electricity consumption big data of different areas according to relative repeatability and relative abnormality degree, which solves the technical problem that subsequent analysis of the abnormality degree of electricity consumption big data is inefficient and improves the efficiency of that analysis. First, a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set are acquired. In practice, acquiring the current electricity consumption big data set makes it convenient to subsequently compare the repeatability of the current electricity consumption big data within the set, and acquiring the historical electricity consumption big data sequence corresponding to each current electricity consumption big data makes it convenient to judge the abnormality of the current electricity consumption big data against that sequence. Next, according to the current electricity consumption big data set, repeatability analysis processing is performed on each current electricity consumption big data in the set to obtain the relative repeatability corresponding to each current electricity consumption big data. In practice, performing the repeatability analysis over the whole current electricity consumption big data set improves the accuracy of the relative repeatability determined for each current electricity consumption big data.
In addition, compressing the current electricity consumption big data on the basis of its relative repeatability is convenient and improves compression efficiency. Then, according to the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the set, abnormality analysis processing is performed on each current electricity consumption big data to obtain the relative abnormality degree corresponding to each current electricity consumption big data. In practice, considering both the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data improves the accuracy of the relative abnormality degree determined for the current electricity consumption big data. In addition, this makes it convenient to subsequently classify big data with different abnormality degrees and reduces the computation wasted when the digital transformation of the smart grid performs a second abnormality analysis of the data. Then, the current electricity consumption big data in the current electricity consumption big data set are clustered according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data to obtain a current electricity consumption big data category set. Finally, the current electricity consumption big data in the current electricity consumption big data category set are classified and stored.
Therefore, by clustering, compressing, and storing the electricity consumption big data of different areas according to relative repeatability and relative abnormality degree, the invention satisfies the data repeatability required for compression during compression or storage, classifies big data with different abnormality degrees, and reduces the computation wasted when the digital transformation of the smart grid performs a second abnormality analysis of the data. It thereby solves the technical problem that subsequent analysis of the abnormality degree of electricity consumption big data is inefficient and improves the efficiency of that analysis.
Drawings
In order to illustrate the embodiments of the present invention, or the technical solutions and advantages of the prior art, more clearly, the drawings used in the description of the embodiments or of the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a big data analysis processing method for digital information according to the present invention.
Detailed Description
To further explain the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects of the technical solutions according to the present invention will be given with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a big data analysis processing method for digital information, which comprises the following steps:
acquiring a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set;
according to the current electricity consumption big data set, performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain the corresponding relative repeatability of each current electricity consumption big data;
according to the current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, carrying out abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data;
clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set;
and classifying and storing the current electricity consumption big data in the current electricity consumption big data category set.
The following steps are detailed:
referring to FIG. 1, a flow diagram of some embodiments of a big data analytics processing method for digital information is shown, in accordance with the present invention. The big data analysis processing method for the digital information comprises the following steps:
step S1, acquiring a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set.
In some embodiments, a current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set may be obtained.
The current electricity consumption big data in the current electricity consumption big data set may be electricity consumption big data in the current time period. The electricity consumption big data may be big data that includes electricity-related data. The historical electricity consumption big data in the historical electricity consumption big data sequence may be electricity consumption big data in a historical time period. The start time of the current time period may be the end time of the historical time period. The duration of the current time period may be 1 day. For example, the start time of the current time period may be 00:00:00 on November 5, 2022, and the end time of the current time period may be 24:00:00 on November 5, 2022. The duration of the historical time period may be 7 days. For example, the start time of the historical time period may be 00:00:00 on October 29, 2022, and the end time of the historical time period may be 00:00:00 on November 5, 2022. The duration corresponding to each historical electricity consumption big data in the historical electricity consumption big data sequence may be equal to the duration corresponding to the current electricity consumption big data. The sum of the durations corresponding to the historical electricity consumption big data in the historical electricity consumption big data sequence is equal to the duration of the historical time period.
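The time periods in this example can be sketched with the standard `datetime` module. The helper name `build_windows` and its parameters are hypothetical. The defaults follow the example above: a 1-day current period, a 7-day historical period that ends where the current period starts, and historical records of the same duration as the current one. Note that an end time of 24:00:00 on November 5 is represented as 00:00:00 on November 6.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end)


def build_windows(current_start: datetime,
                  current_days: int = 1,
                  history_days: int = 7) -> Tuple[Window, List[Window]]:
    """Return the current window and the consecutive historical windows before it."""
    step = timedelta(days=current_days)
    current = (current_start, current_start + step)
    history_start = current_start - timedelta(days=history_days)
    # Each historical window has the same duration as the current window.
    count = history_days // current_days
    history = [(history_start + k * step, history_start + (k + 1) * step)
               for k in range(count)]
    return current, history
```

Calling `build_windows(datetime(2022, 11, 5))` yields seven one-day historical windows starting on October 29, 2022, the last of which ends exactly where the current window begins.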
For example, the electricity consumption big data may be residential electricity consumption big data, which can represent the electricity usage of residents in a certain region. For example, the residential electricity consumption big data may be residential electricity consumption log data. The residential electricity consumption big data may include, but is not limited to: the names, detailed addresses, and electricity consumption amounts of the residents living in the region. The current electricity consumption big data set can represent the electricity usage of residents in a certain region (e.g., a certain residential community). Each current electricity consumption big data in the current electricity consumption big data set can represent the electricity usage of residents in one area included in that region (e.g., one residential building in the residential community). The population size of each area may be the same.
The area corresponding to the historical electricity consumption big data sequence of a current electricity consumption big data may be the same as the area corresponding to that current electricity consumption big data. For example, the current electricity consumption big data may be the electricity consumption big data of the residents of the No. 5 residential building in a certain residential community for the whole day of November 5, 2022. When the number of historical electricity consumption big data in the historical electricity consumption big data sequence is 2, the historical electricity consumption big data sequence corresponding to the current electricity consumption big data may include: the electricity consumption big data of the residents of the No. 5 residential building in the residential community for the whole day of November 4, 2022, and the electricity consumption big data of the residents of the No. 5 residential building in the residential community for the whole day of November 3, 2022.
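The time windows in the example above can be sketched as follows. This is only an illustrative sketch: the function name `split_periods` and its parameters are hypothetical, and it assumes that the historical time period is divided into consecutive windows of the same length as the current time period.

```python
from datetime import datetime, timedelta

def split_periods(current_start, current_days=1, history_days=7):
    """Return the current window and the historical windows (oldest first).

    Hypothetical helper: assumes the historical period ends exactly where
    the current period starts and is split into equal-length windows.
    """
    step = timedelta(days=current_days)
    current = (current_start, current_start + step)
    count = history_days // current_days  # number of historical windows
    history = [
        (current_start - (k + 1) * step, current_start - k * step)
        for k in range(count)
    ]
    history.reverse()  # oldest window first
    return current, history

current, history = split_periods(datetime(2022, 11, 5))
```

With the dates used in the example, `history` holds seven daily windows, the first starting on October 29, 2022 and the last ending at the start of the current time period.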
Step S2, according to the current electricity consumption big data set, performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain the relative repeatability corresponding to each current electricity consumption big data.
In some embodiments, according to the current electricity consumption big data set, a repeatability analysis process may be performed on each current electricity consumption big data in the current electricity consumption big data set to obtain a relative repeatability corresponding to each current electricity consumption big data.
As an example, this step may include the steps of:
First, extracting the repeated characters of each current electricity consumption big data in the current electricity consumption big data set, and generating a repetitive character space corresponding to each current electricity consumption big data.
For example, repetitive character extraction may be performed on each current electricity consumption big data through an STC (space time series, deduplication) algorithm, so as to generate a repetitive character space corresponding to each current electricity consumption big data.
The repetitive character space corresponding to the current electricity consumption big data may be as follows:

$$S_n = \{(c_{n,1}, a_{n,1}),\ (c_{n,2}, a_{n,2}),\ \ldots,\ (c_{n,h}, a_{n,h}),\ \ldots,\ (c_{n,H}, a_{n,H})\}$$

wherein $S_n$ is the repetitive character space corresponding to the $n$-th current electricity consumption big data in the current electricity consumption big data set. $c_{n,1}$ is the number of repetitions of the 1st repeated character in the $n$-th current electricity consumption big data, and $a_{n,1}$ is that 1st repeated character. A repeated character may be a character that appears at least twice. $c_{n,2}$ is the number of repetitions of the 2nd repeated character in the $n$-th current electricity consumption big data, and $a_{n,2}$ is that 2nd repeated character. $c_{n,h}$ is the number of repetitions of the $h$-th repeated character in the $n$-th current electricity consumption big data, and $a_{n,h}$ is that $h$-th repeated character. $c_{n,H}$ is the number of repetitions of the $H$-th repeated character in the $n$-th current electricity consumption big data, and $a_{n,H}$ is that $H$-th repeated character. $H$ is the number of all repeated characters.
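As a minimal sketch of such a repetitive character space, the following counts the characters that appear at least twice in a piece of text. It does not reproduce the STC algorithm named above; a plain `collections.Counter` is used as a stand-in, and the function name `repeated_char_space` is hypothetical.

```python
from collections import Counter

def repeated_char_space(text):
    """Map each character that appears at least twice to its occurrence count.

    Stand-in for the extraction step: assumes the "number of repetitions"
    of a character is simply how many times it occurs in the text.
    """
    counts = Counter(text)
    return {ch: c for ch, c in counts.items() if c >= 2}

space = repeated_char_space("aabbbc")
```

Here `"c"` occurs only once, so it is filtered out of the resulting space.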
And secondly, determining a basic repeatability set corresponding to each current electricity consumption big data according to a repeatability character space corresponding to each current electricity consumption big data in the current electricity consumption big data set.
Each basic repeatability in the basic repeatability set corresponding to a given current electricity consumption big data may be the basic repeatability between that current electricity consumption big data and one of the other current electricity consumption big data in the current electricity consumption big data set.
For example, this step may include the following sub-steps:
A first substep of extracting the repeated characters shared by the repetitive character space corresponding to the current electricity consumption big data and the repetitive character space corresponding to each of the other current electricity consumption big data, generating an other repetitive character space for each such pair, and obtaining the set of other repetitive character spaces corresponding to the current electricity consumption big data.
The other current electricity consumption big data corresponding to a given current electricity consumption big data are all current electricity consumption big data in the current electricity consumption big data set except that one. Each other repetitive character space may be the repetitive character space shared between the repetitive character space of the given current electricity consumption big data and the repetitive character space of one of the other current electricity consumption big data.
For example, the STC algorithm may be applied to the two repetitive character spaces to extract the repeated characters they have in common, thereby generating the other repetitive character space.
A second substep of determining basic repeatability according to the current electricity consumption big data set and each other repetitive character space in the set of other repetitive character spaces corresponding to each current electricity consumption big data, and obtaining the basic repeatability set corresponding to each current electricity consumption big data.
For example, the formula for determining the basic repeatability may be:

$$R_{n,m} = \frac{\sum_{k=1}^{K_{n,m}} c^{n,m}_{k}\, l^{n,m}_{k}}{L_n + L_m}$$

wherein $R_{n,m}$ is the basic repeatability between the $n$-th and the $m$-th current electricity consumption big data in the current electricity consumption big data set. $L_n$ is the total length of all characters included in the $n$-th current electricity consumption big data, and $L_m$ is the total length of all characters included in the $m$-th current electricity consumption big data. $n \in [1, N]$, $m \in [1, N]$, and $n \neq m$. $N$ is the number of current electricity consumption big data in the current electricity consumption big data set. $n$ and $m$ are serial numbers of the current electricity consumption big data in the current electricity consumption big data set. $c^{n,m}_{k}$ is the number of repetitions of the $k$-th repeated character in the repetitive character space shared between the repetitive character spaces corresponding to the $n$-th and the $m$-th current electricity consumption big data. $l^{n,m}_{k}$ is the repeat length of that $k$-th repeated character. $K_{n,m}$ is the total number of repeated characters in that shared repetitive character space, and $k \in [1, K_{n,m}]$ is the serial number of a repeated character in it.
In practice, if the $n$-th current electricity consumption big data is the residential electricity consumption big data of the $n$-th area, denoted $A_n$, and the $m$-th current electricity consumption big data is the residential electricity consumption big data of the $m$-th area, denoted $A_m$, then $R_{n,m}$ characterizes the ratio of the amount of data in the repeated characters shared by $A_n$ and $A_m$ to the total amount of data in $A_n$ and $A_m$, quantified by character length. The larger $R_{n,m}$ is, the more data is repeated between $A_n$ and $A_m$, and the higher the compression ratio tends to be when $A_n$ and $A_m$ are subsequently compressed together; the converse also holds. The purpose of establishing the repetitive character space is to make repeated-character detection between each current electricity consumption big data and the $m$-th current electricity consumption big data more convenient, and to filter out the small portion of characters with low repeatability, which reduces the amount of calculation and the occupation of computing resources.
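The ratio described above can be sketched as follows, under several simplifying assumptions that are not taken from the source: the shared repeated characters are taken to be the single characters present in both repetitive character spaces, the number of repetitions of a shared character is the sum of its occurrences in both texts, and its repeat length is the length of the character itself. The function names are hypothetical.

```python
from collections import Counter

def repeated_space(text):
    """Characters that appear at least twice, with their occurrence counts."""
    return {ch: c for ch, c in Counter(text).items() if c >= 2}

def base_repeatability(a, b):
    """Length-weighted share of repeated characters common to both texts.

    Assumption: numerator = sum over shared repeated characters of
    (occurrences in a + occurrences in b) * character length;
    denominator = total length of both texts.
    """
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    sa, sb = repeated_space(a), repeated_space(b)
    shared = sa.keys() & sb.keys()
    repeated_amount = sum((sa[ch] + sb[ch]) * len(ch) for ch in shared)
    return repeated_amount / total

r = base_repeatability("aab", "aac")
```

Two texts with no shared repeated characters yield 0, and the value grows as more of the combined text consists of characters repeated in both.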
And thirdly, determining the relative repeatability corresponding to each current power consumption big data according to the basic repeatability set corresponding to each current power consumption big data.
For example, the formula for determining the relative repeatability corresponding to each current electricity consumption big data may be:

$$P_n = \frac{1}{N-1} \sum_{m=1,\ m \neq n}^{N} R_{n,m}$$

wherein $P_n$ is the relative repeatability corresponding to the $n$-th current electricity consumption big data in the current electricity consumption big data set. $N$ is the number of current electricity consumption big data in the current electricity consumption big data set. $n$ and $m$ are serial numbers of the current electricity consumption big data in the current electricity consumption big data set. $R_{n,m}$ is the basic repeatability between the $n$-th and the $m$-th current electricity consumption big data in the current electricity consumption big data set.
In practice, if the $n$-th current electricity consumption big data is the residential electricity consumption big data of the $n$-th area, denoted $A_n$, then the relative repeatability $P_n$ of the residential electricity consumption big data $A_n$ of the $n$-th area on day $t$ is the average of the basic repeatability between $A_n$ and all other $N-1$ areas. The larger the value, the more of the data in $A_n$ is repeated in the residential electricity consumption big data of the other areas, and vice versa. In this way, the relative repeatability of the electricity consumption big data among different areas is quantified from the electricity consumption big data of the different areas within the same time node. Likewise, the higher the relative repeatability among the current electricity consumption big data, the higher the compression ratio tends to be when they are compressed.
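The averaging step described above can be sketched as follows, assuming the pairwise basic repeatability values are already available as a symmetric matrix (a list of lists). The name `relative_repeatability` and the sample values are hypothetical.

```python
def relative_repeatability(base, n):
    """Average basic repeatability between item n and every other item.

    base[n][m] is assumed to hold the basic repeatability between the
    n-th and m-th current electricity consumption big data (0-based).
    """
    count = len(base)
    return sum(base[n][m] for m in range(count) if m != n) / (count - 1)

# Illustrative values only; the diagonal is never read.
base = [
    [0.0, 0.2, 0.4],
    [0.2, 0.0, 0.6],
    [0.4, 0.6, 0.0],
]
p0 = relative_repeatability(base, 0)
```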
Step S3, according to the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, performing abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain the relative abnormality degree corresponding to each current electricity consumption big data.
In some embodiments, according to the current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, abnormality analysis processing may be performed on each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data.
The current electricity consumption big data in the current electricity consumption big data set may include: a current average electricity consumption and a current unit average electric quantity sequence. The historical electricity consumption big data in the historical electricity consumption big data sequence may include: a historical unit average electric quantity sequence. The current average electricity consumption may be the average electricity consumption in the current time period. Each current unit average electric quantity in the current unit average electric quantity sequence may be the average electricity consumption in one unit time period included in the current time period. The duration of the unit time period may be 1 hour. Each historical unit average electric quantity in the historical unit average electric quantity sequence may be the average electricity consumption in one unit time period included in the time period corresponding to the historical electricity consumption big data. For example, the average electricity consumption may be the residents' average electricity consumption.
For example, the time period corresponding to the historical electricity consumption big data may be November 4, 2022, and the duration of the unit time period may be 1 hour. The historical unit average electric quantity sequence may then include: the average electricity consumption in each of the 24 hours of November 4, 2022.
As an example, this step may include the steps of:
First, determining a current first abnormality corresponding to each current electricity consumption big data according to the current average electricity consumption and the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set.
For example, this step may include the following substeps:
A first substep of determining a current electricity utilization fluctuation parameter corresponding to the current time period according to the current average electricity consumption included in each current electricity consumption big data in the current electricity consumption big data set, the average value of the current average electricity consumptions included in the current electricity consumption big data set, and the number of current electricity consumption big data in the current electricity consumption big data set.
For example, the formula for determining the current electricity utilization fluctuation parameter corresponding to the current time period may be:

$$\alpha = \exp\left(-\frac{1}{N} \sum_{n=1}^{N} \left(x_n - \bar{x}\right)^2\right)$$

wherein $\alpha$ is the current electricity utilization fluctuation parameter corresponding to the current time period. $\exp(\cdot)$ is the exponential function with the natural constant as its base. $N$ is the number of current electricity consumption big data in the current electricity consumption big data set. $n$ is the serial number of a current electricity consumption big data in the current electricity consumption big data set, $n \in [1, N]$. $x_n$ is the current average electricity consumption included in the $n$-th current electricity consumption big data in the current electricity consumption big data set within the current time period. $\bar{x}$ is the average value of the current average electricity consumptions included in the current electricity consumption big data set.
As another example, if the current time period is denoted as day $t$, the $n$-th current electricity consumption big data is the residential electricity consumption big data of the $n$-th area, and the number $N$ of current electricity consumption big data in the current electricity consumption big data set is equal to the total number of areas, then $x_n$ is the residents' average electricity consumption of the $n$-th area on day $t$, $\bar{x}$ is the residents' average electricity consumption over all $N$ areas on day $t$, and $\alpha$ is the electricity utilization fluctuation parameter of day $t$.
In practice, $\alpha$ may be the electricity utilization fluctuation parameter of day $t$. It is calculated by negating the variance of the residents' average electricity consumption over all $N$ areas on day $t$. When the fluctuation of the residents' average electricity consumption across the areas on day $t$ is larger (the variance is larger), $\alpha$ is smaller; when the fluctuation is smaller (the variance is smaller), $\alpha$ is larger. The underlying logic is as follows: when the residents' average electricity consumption of the $N$ areas differs greatly on day $t$, the first abnormality of the $n$-th area is prone to being abnormally amplified (for example, the residents' average electricity consumption differs significantly from area to area, that is, the electricity usage trends of the different areas at the same time are not quantitatively distinct), so this parameter is used to restrain the first abnormality, and vice versa.
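The negated-variance calculation described above can be sketched as follows. This sketch assumes the population variance (dividing by the number of areas) is passed through the exponential function with a negative sign; the function name `usage_fluctuation` is hypothetical.

```python
import math

def usage_fluctuation(avg_usage):
    """exp(-variance) of the per-area average electricity consumption.

    avg_usage[n] is assumed to be the current average electricity
    consumption of the n-th area in the current time period.
    """
    count = len(avg_usage)
    mean = sum(avg_usage) / count
    variance = sum((x - mean) ** 2 for x in avg_usage) / count
    return math.exp(-variance)

alpha_flat = usage_fluctuation([2.0, 2.0, 2.0])  # no spread across areas
alpha_spread = usage_fluctuation([1.0, 3.0])     # variance of 1
```

Identical averages give the maximum value of 1, and the value shrinks toward 0 as the spread across areas grows.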
A second substep of determining a current unit electric fluctuation parameter corresponding to each current unit time period included in the current time period according to each current unit average electric quantity in the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set and the number of current electricity consumption big data in the current electricity consumption big data set.
For example, the current unit electric fluctuation parameter corresponding to each current unit time period included in the current time period may be determined by a formula over the following quantities:
wherein $\beta_{m,i}$ is the current unit electric fluctuation parameter corresponding to the $i$-th current unit time period included in the current time period. $\exp(\cdot)$ is the exponential function with the natural constant as its base. $x_{m,i}$ is the $i$-th current unit average electric quantity in the current unit average electric quantity sequence included in the $m$-th current electricity consumption big data in the current electricity consumption big data set, where the $i$-th current unit average electric quantity may be the average electricity consumption in the $i$-th current unit time period. $i$ may be the serial number of a current unit time period included in the current time period, and is also the serial number of a current unit average electric quantity. $N$ is the number of current electricity consumption big data in the current electricity consumption big data set. $n$ and $m$ are serial numbers of the current electricity consumption big data in the current electricity consumption big data set. $x_{n,i}$ is the $i$-th current unit average electric quantity in the current unit average electric quantity sequence included in the $n$-th current electricity consumption big data in the current electricity consumption big data set.
As another example, if the current time period is denoted as day $t$, the $i$-th current unit time period is denoted as hour $i$, the $n$-th current electricity consumption big data is the residential electricity consumption big data of the $n$-th area, the $m$-th current electricity consumption big data is the residential electricity consumption big data of the $m$-th area, and the number $N$ of current electricity consumption big data in the current electricity consumption big data set is equal to the total number of areas, then $x_{m,i}$ is the residents' average electricity consumption of the $m$-th area in hour $i$ of day $t$, $x_{n,i}$ is the residents' average electricity consumption of the $n$-th area in hour $i$ of day $t$, and $\beta_{m,i}$ is the electricity utilization fluctuation parameter of hour $i$ of day $t$.
In practice, $\beta_{m,i}$ may be the electricity utilization fluctuation parameter of hour $i$ of day $t$, and its value differs from area to area. It is calculated by taking the difference between the residents' average electricity consumption of the area in question and the average over all areas, and then applying a negated (decaying) exponential. Its physical meaning, taking the $m$-th area as an example, is as follows. When the residents' average electricity consumption of the $m$-th area in hour $i$ deviates greatly from the overall average, using the residents' average electricity consumption of the $m$-th area in hour $i$ to measure that of the $n$-th area is likely to be inaccurate. In the usual case, if the residents' average electricity consumption of the $m$-th area is itself abnormal, it does not follow the residential electricity usage trend of all areas in hour $i$, and using it to judge whether the residents' average electricity consumption of the $n$-th area in hour $i$ is abnormal is often inaccurate. In that case $\beta_{m,i}$ is particularly small, so the corresponding term is largely ignored in the overall summation, and the abnormal data does not greatly influence the assessment of the abnormality of the residents' average electricity consumption of the $n$-th area in hour $i$.
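The per-area, per-hour weighting described above can be sketched as follows. The exact form of the original formula is not reproduced here; this sketch assumes the absolute deviation of each area's hourly average from the all-area hourly mean is passed through a negated exponential. The function name `unit_fluctuation` is hypothetical.

```python
import math

def unit_fluctuation(unit_usage):
    """Return weights[m][i] = exp(-|usage[m][i] - mean over areas of usage[.][i]|).

    unit_usage[m][i] is assumed to be the average electricity consumption
    of area m in unit time period (hour) i. The absolute deviation is an
    assumption; a squared deviation would be an equally plausible reading.
    """
    areas = len(unit_usage)
    hours = len(unit_usage[0])
    hourly_mean = [
        sum(unit_usage[n][i] for n in range(areas)) / areas
        for i in range(hours)
    ]
    return [
        [math.exp(-abs(unit_usage[m][i] - hourly_mean[i])) for i in range(hours)]
        for m in range(areas)
    ]

weights = unit_fluctuation([[1.0, 2.0], [1.0, 4.0]])
```

An area whose hourly value matches the all-area mean receives the full weight of 1, while an area far from the mean receives a weight near 0 and therefore contributes little to later sums.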
A third substep of determining the current first abnormality corresponding to each current electricity consumption big data according to the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set, the current electricity utilization fluctuation parameter, the current unit electric fluctuation parameter corresponding to each current unit time period included in the current time period, and the number of current electricity consumption big data in the current electricity consumption big data set.
For example, the current first abnormality corresponding to each current electricity consumption big data may be determined by a formula over the following quantities:
wherein $Y_n$ is the current first abnormality corresponding to the $n$-th current electricity consumption big data in the current electricity consumption big data set. $\alpha$ is the current electricity utilization fluctuation parameter corresponding to the current time period. $N$ is the number of current electricity consumption big data in the current electricity consumption big data set. $n$ and $m$ are serial numbers of the current electricity consumption big data in the current electricity consumption big data set. $\beta_{m,i}$ is the current unit electric fluctuation parameter corresponding to the $i$-th current unit time period included in the current time period. $x_{m,i}$ is the $i$-th current unit average electric quantity in the current unit average electric quantity sequence included in the $m$-th current electricity consumption big data in the current electricity consumption big data set, where the $i$-th current unit average electric quantity may be the average electricity consumption in the $i$-th current unit time period. $i$ may be the serial number of a current unit time period included in the current time period, and is also the serial number of a current unit average electric quantity. $x_{n,i}$ is the $i$-th current unit average electric quantity in the current unit average electric quantity sequence included in the $n$-th current electricity consumption big data in the current electricity consumption big data set. $I$ is the number of current unit time periods included in the current time period. If the duration of the current time period is 1 day and the duration of the current unit time period is 1 hour, then $I = 24$.
As another example, if the current time period is denoted as day $t$, the $i$-th current unit time period is denoted as hour $i$, the $n$-th current electricity consumption big data is the residential electricity consumption big data of the $n$-th area, the $m$-th current electricity consumption big data is the residential electricity consumption big data of the $m$-th area, and the number $N$ of current electricity consumption big data in the current electricity consumption big data set is equal to the total number of areas, then $x_{m,i}$ is the residents' average electricity consumption of the $m$-th area in hour $i$ of day $t$, $x_{n,i}$ is the residents' average electricity consumption of the $n$-th area in hour $i$ of day $t$, $\beta_{m,i}$ is the electricity utilization fluctuation parameter of hour $i$ of day $t$, and $\alpha$ is the electricity utilization fluctuation parameter of day $t$. $Y_n$ can quantify the first abnormality, on day $t$, of the electricity consumption of the $n$-th area relative to the electricity consumption of the remaining $N-1$ areas.
In practice, the difference between the residents' average hourly electricity consumption of each area other than the $n$-th and that of the $n$-th area is calculated under the influence of $\beta_{m,i}$, and the result is then constrained as a whole by $\alpha$ and averaged. In other words, the abnormality of the residents' average electricity consumption of the $n$-th area on day $t$ is calculated from the hourly residents' average electricity consumption of the remaining $N-1$ areas, weighted by $\beta_{m,i}$ and constrained by $\alpha$. The larger $Y_n$ is, the more the residents' average electricity consumption trend of the $n$-th area on day $t$ runs against the electricity usage trend of the remaining areas, that is, the more abnormal the electricity consumption of that area in that time period, and vice versa.
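The weighted comparison described above can be sketched as follows. The original formula is not reproduced here, so every choice in this sketch is an assumption: absolute differences are used, the daily average of each area is taken as the mean of its hourly values, the day-level constraint is a negated-variance exponential, and the result is averaged over the other areas and over the hours. The function name `first_abnormality` is hypothetical.

```python
import math

def first_abnormality(unit_usage, n):
    """Hedged sketch of a first-abnormality score for area n.

    unit_usage[m][i]: average electricity consumption of area m in hour i.
    """
    areas = len(unit_usage)
    hours = len(unit_usage[0])

    # Day-level constraint: exp(-variance) of the per-area daily averages.
    daily = [sum(row) / hours for row in unit_usage]
    mean = sum(daily) / areas
    alpha = math.exp(-sum((x - mean) ** 2 for x in daily) / areas)

    # All-area mean for each hour, used for the per-area hourly weights.
    hourly_mean = [
        sum(unit_usage[k][i] for k in range(areas)) / areas
        for i in range(hours)
    ]

    total = 0.0
    for m in range(areas):
        if m == n:
            continue  # compare area n only with the other areas
        for i in range(hours):
            beta = math.exp(-abs(unit_usage[m][i] - hourly_mean[i]))
            total += beta * abs(unit_usage[m][i] - unit_usage[n][i])
    return alpha * total / ((areas - 1) * hours)

usage = [[1.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
score_outlier = first_abnormality(usage, 2)
score_typical = first_abnormality(usage, 0)
```

In this toy input the third area consumes more than the other two in every hour, so its score is higher than that of an area that matches the majority.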
And secondly, determining a current second abnormality corresponding to each current electricity consumption big data according to a historical unit average electricity quantity sequence included by the historical electricity consumption big data in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set and a current unit average electricity quantity sequence included by each current electricity consumption big data.
For example, this step may include the following sub-steps:
the first substep is to determine the average value of the historical unit average electric quantity in the historical unit average electric quantity sequence included in the historical electricity consumption big data sequence corresponding to the current electricity consumption big data as the current total electric quantity average value corresponding to the current electricity consumption big data.
And a second substep, determining a current second abnormality corresponding to each current electricity consumption big data according to a current total electricity quantity average value corresponding to each current electricity consumption big data in the current electricity consumption big data set, a historical unit average electricity quantity sequence included in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data, a current unit average electricity quantity sequence included in each current electricity consumption big data, a current time period and a historical time period.
For example, the formula for determining the correspondence of the current second abnormality corresponding to each current electricity consumption big data may be:
wherein,is the first in the current electricity utilization big data setnAnd the current second abnormality corresponds to the current electricity utilization big data.nThe serial number of the current electricity consumption big data in the current electricity consumption big data set.Is the number of current unit periods included in the current period.iMay be a sequence number of the current unit period included in the current period. If the duration corresponding to the current time period is 1 day and the duration corresponding to the current unit time period is 1 hour, then。Is the first in the current electricity big data setnThe current power utilization big data comprises the current unit average electric quantity sequenceiThe current unit average capacity.iOr a serial number of the current unit average electric quantity.Is the first in the current power utilization big data setnAnd the current total electric quantity mean value corresponding to the current electric consumption big data.The quantity of the historical electricity consumption big data in the historical electricity consumption big data sequence. Historical electricity consumption big data in historical electricity consumption big data sequenceThe number may be equal to a ratio of a duration corresponding to the historical time period to a duration corresponding to the current time period.. 
If the duration corresponding to the current time period is 1 day and the duration corresponding to the historical time period is 7 days, then。Is the first in the current power utilization big data setnThe first electricity consumption big data sequence corresponding to the current electricity consumption big dataThe historical power utilization big data comprises the average power sequence of the historical unitsiAverage electric quantity of each historical unit.Is the first in the current electricity utilization big data setnThe first electricity consumption big data sequence corresponding to the current electricity consumption big dataThe first in the historical unit average electric quantity sequence included in the historical electric consumption big dataiAverage electric quantity of each historical unit.。
As another example, if the current time period is the first time periodtDay means, the firstiFor the current unit time periodiHour denotes thatnThe current power consumption big data isnThe electricity consumption data of residents in each area,the current power consumption is the firstThe number of the current electricity consumption big data in the current electricity consumption big data setNEqual to the total number of all regions, number twoThe big data of the historical power consumption istThe first dayBig data of electricity consumption of residents in the dayThe big data of the historical power consumption istThe first dayBig data of electricity consumption of residents in the day。Is shown asnA region ItThe day's firstiAverage electricity consumption by residents in hours.Is shown asnA region ItDay beforeThe first dayiAverage electricity usage by residents in hours.Denotes the firstnA region ItThe first dayThe first dayiAverage electricity consumption by residents in hours.Is shown asnIs a region oftBefore the dayThe third day of the dayiAverage of average electricity consumption of residents in hours. Can quantizeThe first dayElectricity consumption of individual areas and in historical dataThe day's firstA second abnormality in the amount of electricity used in the area.
In practice, the analysis of the second abnormality takes into account that the average electricity consumption of residents in each region is likely to fluctuate over time. For example, electricity use on working days is concentrated in certain hours, so the working-day trend is pronounced, whereas electricity use on weekends is scattered, so the weekend trend is less pronounced. The electricity consumption trend for a given hour over the preceding days is therefore characterized as the mean of the residents' average electricity consumption in that hour over those days, plus the mean of the differences between the residents' average electricity consumption in that hour on different days. The difference between the residents' average electricity consumption in hour i of day t and this same-hour trend then gives the degree to which the consumption in hour i of day t is an outlier, that is, its degree of abnormality. The larger the second abnormality, the more abnormal the average electricity consumption of residents in the n-th region on day t is compared with the preceding days; the smaller it is, the less abnormal.
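The same-hour trend comparison described above can be illustrated with a minimal Python sketch. The formula itself is not reproduced in this text, so several details here are assumptions rather than the patented computation: the day-to-day differences are taken between consecutive days, the deviation is an absolute difference, and the per-hour deviations are averaged into a single value. The function names (`hourly_trend`, `second_abnormality`) and the sample data are hypothetical.

```python
from statistics import mean


def hourly_trend(same_hour_history):
    """Trend for one hour: mean of the historical values for that hour
    plus the mean of the consecutive day-to-day differences (assumed)."""
    base = mean(same_hour_history)
    diffs = [later - earlier
             for earlier, later in zip(same_hour_history, same_hour_history[1:])]
    drift = mean(diffs) if diffs else 0.0
    return base + drift


def second_abnormality(current_hours, history_days):
    """current_hours: hourly averages for day t.
    history_days: one list of hourly averages per preceding day, oldest first.
    Returns the mean absolute deviation from the same-hour trend (assumed
    aggregation; the source does not state how hours are combined)."""
    deviations = []
    for i, current in enumerate(current_hours):
        same_hour = [day[i] for day in history_days]
        deviations.append(abs(current - hourly_trend(same_hour)))
    return mean(deviations)


# Hypothetical data: two hours per day, three preceding days.
history = [[1.0, 2.0], [2.0, 2.0], [3.0, 2.0]]
today = [5.0, 2.0]
score = second_abnormality(today, history)
print(score)
```

In this toy data the first hour rises by one unit per day, so its trend value is 3.0 and a reading of 5.0 deviates by 2.0, while the second hour is flat and does not deviate at all.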
Third, determine the relative abnormality degree corresponding to each current electricity consumption big data according to the current first abnormality and the current second abnormality corresponding to each current electricity consumption big data in the current electricity consumption big data set.
For example, the relative abnormality degree corresponding to each current electricity consumption big data may be determined by a formula with the following terms: the result is the relative abnormality degree corresponding to the n-th current electricity consumption big data in the current electricity consumption big data set; the first input is the current first abnormality corresponding to the n-th current electricity consumption big data; and the second input is the current second abnormality corresponding to the n-th current electricity consumption big data.
In practice, take the residential electricity consumption big data corresponding to the n-th region as an example. For the n-th region, electricity usage often differs between different times (for example, different days), but within the same time (for example, the same day) the electricity usage of the n-th region and of the remaining regions should often be similar. Therefore, the trend of the electricity consumption of the n-th region across different times, and the difference between the electricity consumption of the n-th region and that of the other regions within the same time, are analyzed separately and quantified to obtain the first abnormality and the second abnormality, and the degree of abnormality of the electricity consumption data of the n-th region is then analyzed from the quantified results. The first abnormality describes the degree to which the average electricity consumption of residents in the n-th region on day t is abnormal relative to the remaining regions. The second abnormality describes the degree to which the average electricity consumption of residents in the n-th region on day t is abnormal relative to the preceding days.
The degree of abnormality of the average electricity consumption of residents in the n-th region on day t is described by the product of the two. The larger the product, the greater the degree of abnormality of the average electricity consumption of residents in the n-th region on day t; because that average is obtained from the residential electricity consumption big data of the n-th region for day t, the greater the degree of abnormality of that big data as well. Conversely, the smaller the product, the smaller the degree of abnormality of the residential electricity consumption big data of the n-th region. The abnormality degree feature of the electricity consumption big data of each region is thus quantified from the electricity consumption trends of different regions at the same time and of the same region at different times. Moreover, the more similar the relative abnormality degrees of the current electricity consumption big data are, the more convenient it often is, after subsequent classified and partitioned storage, to call data when analyzing the abnormality of the electricity consumption big data, which saves computing resources; it is also often more convenient to store current electricity consumption big data of the same category but with different abnormality degrees at different compression levels.
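As a small illustration of combining the two abnormalities by their product, the following Python sketch scores three regions and ranks them. It assumes a plain product with no normalization or weighting, which may differ from the formula in the source since that formula is not reproduced in this text; the function name `relative_abnormality`, the region names, and all values are hypothetical.

```python
def relative_abnormality(first_abnormality, second_abnormality):
    """Assumed form: plain product of the two abnormalities."""
    return first_abnormality * second_abnormality


# Hypothetical (first abnormality, second abnormality) pairs per region.
regions = {
    "region_a": (0.5, 0.5),
    "region_b": (2.0, 1.5),
    "region_c": (1.0, 0.25),
}

scores = {name: relative_abnormality(a, b) for name, (a, b) in regions.items()}

# Regions ordered from most to least abnormal.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranked)
```

A product is large only when a region deviates both from the other regions and from its own history, which matches the reasoning given for using both abnormalities together.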
Step S4, clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set.
In some embodiments, the current electricity consumption big data in the current electricity consumption big data set may be clustered according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set, so as to obtain a current electricity consumption big data category set.
As an example, this step may comprise the steps of:
First, determine the relative repeatability corresponding to each current electricity consumption big data in the current electricity consumption big data set as the abscissa corresponding to that current electricity consumption big data.
Second, determine the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set as the ordinate corresponding to that current electricity consumption big data.
Third, combine the abscissa and the ordinate corresponding to each current electricity consumption big data in the current electricity consumption big data set into the current coordinate corresponding to that current electricity consumption big data.
Fourth, determine the Euclidean distance between every two current electricity consumption big data in the current electricity consumption big data set according to their current coordinates.
Fifth, cluster the current electricity consumption big data in the current electricity consumption big data set according to the Euclidean distances between them to obtain a current electricity consumption big data category set.
For example, a conventional distance-based clustering algorithm may be applied to the Euclidean distances between the current electricity consumption big data in the current electricity consumption big data set to obtain the current electricity consumption big data category set. Clustering on both the relative repeatability feature and the abnormality degree feature of the electricity consumption big data of different regions makes subsequent classified storage of the data convenient.
In practice, clustering on the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data groups the current electricity consumption big data with similar relative repeatability and similar relative abnormality degree into the same category, which facilitates subsequent processing.
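The coordinate construction and distance-based clustering of step S4 can be sketched as follows. The source only calls for a conventional distance clustering algorithm without naming one, so this example swaps in a simple threshold-based single-linkage grouping implemented with a union-find structure; the function name `cluster_by_distance`, the threshold value, and the sample coordinates are assumptions for illustration only.

```python
import math


def cluster_by_distance(points, threshold):
    """Group points so that any two points closer than or equal to
    `threshold` (Euclidean distance) end up in the same cluster.
    Returns a sorted list of clusters, each a list of point indices."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a in range(n):
        for b in range(a + 1, n):
            if math.dist(points[a], points[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for idx in range(n):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical coordinates: (relative repeatability, relative abnormality degree).
coordinates = [(0.10, 0.10), (0.15, 0.12), (0.90, 0.80), (0.88, 0.85)]
clusters = cluster_by_distance(coordinates, threshold=0.2)
print(clusters)
```

Any standard alternative, such as k-means or hierarchical clustering from a numerical library, could replace this grouping; the only property relied on here is that data with nearby coordinates share a category.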
Step S5, classifying and storing the current electricity consumption big data in the current electricity consumption big data category set.
In some embodiments, the current electricity consumption big data in the current electricity consumption big data category set may be classified and stored.
As an example, this step may comprise the steps of:
First, compress the current electricity consumption big data in each current electricity consumption big data category in the current electricity consumption big data category set to obtain a compressed file corresponding to that current electricity consumption big data category.
For example, the current electricity consumption big data in the current electricity consumption big data category may be compressed by using an existing compression technology, so as to obtain a compressed file corresponding to the current electricity consumption big data category.
Second, store the compressed files corresponding to the current electricity consumption big data categories in the current electricity consumption big data category set by category.
For example, the compressed file corresponding to each current electricity consumption big data category in the current electricity consumption big data category set may be stored in its own partition. Partitioning the current electricity consumption big data by the category it belongs to makes subsequent analysis of the abnormality degree of the electricity consumption big data more convenient: the required data can usually be called accurately, which reduces the amount of calculation and the computing resources occupied. Grouping similar data also provides the data repetitiveness that compression relies on, and improves the efficiency of subsequent analysis of the abnormality degree of the electricity consumption big data.
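The per-category compression and partitioned storage of step S5 might look like the following Python sketch. The source refers only to an existing compression technology, so the use of `zlib` on JSON-serialized records is an assumption, as are the function names (`store_categories`, `load_category`), the one-directory-per-category layout, and the file name `data.json.zlib`.

```python
import json
import zlib
from pathlib import Path


def store_categories(categories, root):
    """Compress each category's records into one file and store it in a
    directory (partition) named after the category.
    categories: dict mapping category name -> list of JSON-serializable records.
    Returns a dict mapping category name -> path of the compressed file."""
    root = Path(root)
    written = {}
    for name, records in categories.items():
        partition = root / name
        partition.mkdir(parents=True, exist_ok=True)
        payload = json.dumps(records, sort_keys=True).encode("utf-8")
        target = partition / "data.json.zlib"
        # Compression level 9 favors size over speed.
        target.write_bytes(zlib.compress(payload, 9))
        written[name] = target
    return written


def load_category(path):
    """Read back and decompress the records of a single category."""
    raw = zlib.decompress(Path(path).read_bytes())
    return json.loads(raw.decode("utf-8"))
```

With this layout, a later abnormality analysis can load only the partition of the category it needs, for example `load_category(paths["category_1"])`, instead of decompressing the whole data set.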
In the big data analysis processing method for digital information described above, the electricity consumption big data of different regions is clustered, compressed, and stored using relative repeatability and abnormality, which solves the technical problem of low efficiency in subsequent analysis of the abnormality degree of the electricity consumption big data and improves that efficiency. First, a current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set are acquired. In practice, acquiring the current electricity consumption big data set makes it convenient to subsequently compare the repetitiveness of the current electricity consumption big data in the set, and acquiring the corresponding historical electricity consumption big data sequence makes it convenient to judge the abnormality of each current electricity consumption big data against its history. Next, according to the current electricity consumption big data set, repeatability analysis processing is performed on each current electricity consumption big data in the set to obtain the relative repeatability corresponding to each current electricity consumption big data. In practice, performing the repeatability analysis over the whole current electricity consumption big data set improves the accuracy with which the relative repeatability corresponding to each current electricity consumption big data is determined.
In addition, the relative repeatability corresponding to the current electricity consumption big data makes that data easier to compress and can improve the efficiency of compressing it. Then, according to the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the set, abnormality analysis processing is performed on each current electricity consumption big data to obtain the relative abnormality degree corresponding to each current electricity consumption big data. In practice, jointly considering the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data improves the accuracy with which the relative abnormality degree is determined. In addition, this makes it convenient to subsequently classify big data with different abnormality degrees, and reduces the computing power wasted when the digital transformation of the smart grid performs a secondary abnormality analysis of the data. Next, the current electricity consumption big data in the current electricity consumption big data set is clustered according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data to obtain a current electricity consumption big data category set. Finally, the current electricity consumption big data in the current electricity consumption big data category set is classified and stored.
Therefore, by clustering, compressing, and storing the electricity consumption big data of different regions using relative repeatability and abnormality, the data repetitiveness required by compression is satisfied during compression and storage, big data with different abnormality degrees is classified, and the computing power wasted on secondary abnormality analysis of the data in the digital transformation of the smart grid is reduced. This solves the technical problem of low efficiency in subsequent analysis of the abnormality degree of the electricity consumption big data and improves the efficiency of that analysis.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; the modifications or substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present application, and are included in the protection scope of the present application.
Claims (5)
1. A big data analysis processing method for digital information is characterized by comprising the following steps:
acquiring a current power consumption big data set and a historical power consumption big data sequence corresponding to each current power consumption big data in the current power consumption big data set, wherein the current power consumption big data in the current power consumption big data set is power consumption big data in a current time period, the historical power consumption big data in the historical power consumption big data sequence is power consumption big data in a historical time period, the starting time of the current time period is the ending time of the historical time period, and the current power consumption big data in the current power consumption big data set comprises: the current average power consumption and the current unit average power consumption sequence, and the historical power consumption big data in the historical power consumption big data sequence comprise: average electric quantity sequence of historical units;
according to the current electricity consumption big data set, performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain the corresponding relative repeatability of each current electricity consumption big data;
according to the current electricity consumption big data set and a historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set, carrying out abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data;
clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set;
classifying, storing and processing the current electricity consumption big data in the current electricity consumption big data category set;
the performing abnormality analysis processing on each current electricity consumption big data in the current electricity consumption big data set according to the current electricity consumption big data set and the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a relative abnormality degree corresponding to each current electricity consumption big data comprises:
determining a current first abnormality corresponding to each current electricity consumption big data according to a current average electricity consumption and a current unit average electricity quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set;
determining a current second abnormality corresponding to each current electricity consumption big data according to a historical unit average electric quantity sequence included in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set and a current unit average electric quantity sequence included in each current electricity consumption big data;
determining a relative abnormality degree corresponding to each current electricity consumption big data according to a current first abnormality and a current second abnormality corresponding to each current electricity consumption big data in the current electricity consumption big data set;
the determining, according to the current average power consumption and the current unit average power consumption sequence included in each current power consumption big data in the current power consumption big data set, a current first abnormality corresponding to each current power consumption big data includes:
determining a current electricity consumption fluctuation parameter corresponding to the current time period according to the current average electricity consumption included in each current electricity consumption big data in the current electricity consumption big data set, the average value of the current average electricity consumption included in the current electricity consumption big data in the current electricity consumption big data set, and the number of current electricity consumption big data in the current electricity consumption big data set;
determining a current unit electricity consumption fluctuation parameter corresponding to each current unit time period included in the current time period according to each current unit average electric quantity in the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set and the number of current electricity consumption big data in the current electricity consumption big data set;
determining a current first abnormality corresponding to each current electricity consumption big data according to the current unit average electric quantity sequence included in each current electricity consumption big data in the current electricity consumption big data set, the current electricity consumption fluctuation parameter, the current unit electricity consumption fluctuation parameter corresponding to each current unit time period included in the current time period, and the number of current electricity consumption big data in the current electricity consumption big data set;
the determining, according to a historical unit average electric quantity sequence included in the historical electricity consumption big data sequence corresponding to each current electricity consumption big data in the current electricity consumption big data set and a current unit average electric quantity sequence included in each current electricity consumption big data, a current second abnormality corresponding to each current electricity consumption big data includes:
determining the average value of the historical unit average electric quantities in the historical unit average electric quantity sequences included in the historical electricity consumption big data in the historical electricity consumption big data sequence corresponding to the current electricity consumption big data as the current total electric quantity average value corresponding to the current electricity consumption big data;
and determining a current second abnormality corresponding to each current electricity consumption big data according to a current total electricity quantity average value corresponding to each current electricity consumption big data in the current electricity consumption big data set, a historical unit average electricity quantity sequence included in historical electricity consumption big data in a historical electricity consumption big data sequence corresponding to each current electricity consumption big data, a current unit average electricity quantity sequence included in each current electricity consumption big data, a current time period and a historical time period.
2. The big data analysis processing method for digital information according to claim 1, wherein the performing repeatability analysis processing on each current electricity consumption big data in the current electricity consumption big data set according to the current electricity consumption big data set to obtain the relative repeatability corresponding to each current electricity consumption big data comprises:
performing repetitive character extraction on each current electricity consumption big data in the current electricity consumption big data set to generate a repetitive character space corresponding to each current electricity consumption big data;
determining a basic repeatability set corresponding to each current electricity consumption big data according to the repetitive character space corresponding to each current electricity consumption big data in the current electricity consumption big data set;
and determining the relative repeatability corresponding to each current electricity consumption big data according to the basic repeatability set corresponding to each current electricity consumption big data.
3. The big data analysis processing method for digital information according to claim 2, wherein the determining a basic repeatability set corresponding to each current electricity consumption big data according to the repetitive character space corresponding to each current electricity consumption big data in the current electricity consumption big data set comprises:
performing repetitive character extraction on the repetitive character space corresponding to the current electricity consumption big data and the repetitive character space corresponding to each other current electricity consumption big data corresponding to the current electricity consumption big data, to generate other repetitive character spaces and obtain an other repetitive character space set corresponding to the current electricity consumption big data, wherein the other current electricity consumption big data corresponding to the current electricity consumption big data are the current electricity consumption big data in the current electricity consumption big data set other than that current electricity consumption big data;
and determining a basic repeatability according to the current electricity consumption big data set and each other repetitive character space in the other repetitive character space set corresponding to each current electricity consumption big data, to obtain the basic repeatability set corresponding to each current electricity consumption big data.
4. The big data analysis processing method for digital information according to claim 1, wherein the clustering the current electricity consumption big data in the current electricity consumption big data set according to the relative repeatability and the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set comprises:
determining the relative repeatability corresponding to each current electricity consumption big data in the current electricity consumption big data set as the abscissa corresponding to the current electricity consumption big data;
determining the relative abnormality degree corresponding to each current electricity consumption big data in the current electricity consumption big data set as a vertical coordinate corresponding to the current electricity consumption big data;
combining the abscissa and the ordinate corresponding to each current electricity consumption big data in the current electricity consumption big data set into the current coordinate corresponding to the current electricity consumption big data;
determining the Euclidean distance between each piece of current electricity consumption big data in the current electricity consumption big data set according to the current coordinate corresponding to each piece of current electricity consumption big data in the current electricity consumption big data set;
and clustering the current electricity consumption big data in the current electricity consumption big data set according to the Euclidean distance between the current electricity consumption big data in the current electricity consumption big data set to obtain a current electricity consumption big data category set.
5. The big data analysis processing method for digital information according to claim 1, wherein the classifying, storing and processing the current electricity consumption big data in the current electricity consumption big data category set comprises:
compressing each current electricity consumption big data in each current electricity consumption big data category in the current electricity consumption big data category set to obtain a compressed file corresponding to the current electricity consumption big data category;
and classifying and storing the compressed files corresponding to each current electricity consumption big data category in the current electricity consumption big data category set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211568255.9A CN115563193B (en) | 2022-12-08 | 2022-12-08 | Big data analysis processing method for digital information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115563193A (en) | 2023-01-03 |
CN115563193B (en) | 2023-03-10 |
Family
ID=84770203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211568255.9A Active CN115563193B (en) | 2022-12-08 | 2022-12-08 | Big data analysis processing method for digital information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563193B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN115563193A (en) | 2023-01-03 |
TWI802245B (en) | Power consumption analysis system and power consumption analysis method based on non-intrusive appliance load monitoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||