CN117370331B - Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium - Google Patents



Publication number
CN117370331B
Authority
CN
China
Prior art keywords
data
abnormal
total
water
cell
Legal status
Active
Application number
CN202311680742.9A
Other languages
Chinese (zh)
Other versions
CN117370331A (en)
Inventor
张自力
张娟
刘书明
牛豫海
田志民
梁策
陈司晗
王晓婷
吴以朋
Current Assignee
Hebei Xiong'an Ruitian Technology Co ltd
Hebei Construction & Investment Water Investment Co ltd
Tsinghua University
Original Assignee
Hebei Xiong'an Ruitian Technology Co ltd
Hebei Construction & Investment Water Investment Co ltd
Tsinghua University
Application filed by Hebei Xiong'an Ruitian Technology Co ltd, Hebei Construction & Investment Water Investment Co ltd, and Tsinghua University
Priority to CN202311680742.9A
Publication of CN117370331A
Application granted
Publication of CN117370331B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a method and a device for cleaning total water consumption data of a residential community, a terminal device, and a storage medium. The method comprises: performing cluster analysis on the community's total water consumption data using a preset first cluster radius to obtain abnormal data points in the data; detecting abnormal data segments in the total water consumption data using a preset sliding window and deleting them to obtain preliminary cleaning data; performing cluster analysis on the preliminary cleaning data using a preset second cluster radius, the first cluster radius being smaller than the second cluster radius, to obtain abnormal data points in the preliminary cleaning data; and deleting those abnormal data points to obtain the cleaned total water consumption data of the community. The method improves the accuracy of the community's total water consumption data.

Description

Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium
Technical Field
The disclosure belongs to the technical field of data processing, and in particular relates to a method and a device for cleaning total water consumption data of a community, a terminal device, and a storage medium.
Background
In a smart city, the dynamic data of user nodes mainly comprise readings from three kinds of meters: water, electricity, and gas. Because water is a daily necessity, water consumption data are particularly important in the sensing layer of a smart city.
In practice, the data uploaded by water meters suffer from interruptions and gaps, for example when a power failure interrupts data transmission, so missing water meter readings must be filled in from the existing readings. However, the existing readings themselves contain abnormal data, and filling the gaps based on abnormal data degrades the accuracy of the water meter data.
Disclosure of Invention
The purpose of the invention is to provide a method and a device for cleaning total water consumption data of a community, a terminal device, and a storage medium, so as to improve the accuracy of the community's total water consumption data.
In a first aspect of the embodiments of the present disclosure, a method for cleaning total water consumption data of a community is provided, including:
acquiring the total water consumption data of the community;
performing cluster analysis on the total water consumption data based on a preset first cluster radius to obtain abnormal data points in the total water consumption data;
detecting abnormal data segments in the total water consumption data based on a preset sliding window, and deleting the abnormal data segments from the total water consumption data to obtain preliminary cleaning data, where an abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is any window position in which the proportion of abnormal data points exceeds a set threshold;
performing cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data, the first cluster radius being smaller than the second cluster radius; and
deleting the abnormal data points from the preliminary cleaning data to obtain the cleaned total water consumption data of the community.
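The sliding-window criterion in the third step can be written compactly. The notation below (data points $x_1, \dots, x_n$ forming the data set $D$, the set $A$ of points flagged by the first cluster analysis, window length $L$, threshold $\theta$) is introduced here for illustration only and is not part of the original text:

$$W_k = \{x_k, x_{k+1}, \dots, x_{k+L-1}\}, \qquad k = 1, \dots, n - L + 1$$

$$r_k = \frac{1}{L} \sum_{i=k}^{k+L-1} \mathbf{1}[x_i \in A], \qquad W_k \text{ is abnormal} \iff r_k > \theta$$

$$D_{\text{prelim}} = D \setminus \bigcup_{k \,:\, r_k > \theta} W_k$$

With $L = 10$ and $\theta = 0.6$, the values used in the detailed description, a window containing 8 flagged points has $r_k = 0.8 > 0.6$ and is deleted as a whole.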
In a second aspect of the embodiments of the present disclosure, a device for cleaning total water consumption data of a community is provided, including:
a data acquisition unit, configured to acquire the total water consumption data of the community;
a first clustering unit, configured to perform cluster analysis on the total water consumption data based on a preset first cluster radius to obtain abnormal data points in the total water consumption data;
a data screening unit, configured to detect abnormal data segments in the total water consumption data based on a preset sliding window and delete the abnormal data segments from the total water consumption data to obtain preliminary cleaning data, where an abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is any window position in which the proportion of abnormal data points exceeds a set threshold;
a second clustering unit, configured to perform cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data, the first cluster radius being smaller than the second cluster radius; and
a data deleting unit, configured to delete the abnormal data points from the preliminary cleaning data to obtain the cleaned total water consumption data of the community.
In a third aspect of the embodiments of the present disclosure, a terminal device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the above method for cleaning total water consumption data of a community.
In a fourth aspect of the embodiments of the present disclosure, a computer storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the above method for cleaning total water consumption data of a community.
The method and device for cleaning total water consumption data of a community, the terminal device, and the storage medium have the following beneficial effects:
First, abnormal data points in the total water consumption data are obtained through cluster analysis; abnormal data segments are then detected with a sliding window and deleted from the total water consumption data to obtain preliminary cleaning data. Because new abnormal points may appear in the preliminary cleaning data after the segments are deleted, cluster analysis is performed again to obtain the abnormal data points in the preliminary cleaning data, and these are deleted to obtain the cleaned total water consumption data of the community.
Through the above process, the embodiments of the present disclosure can achieve the following effects:
1. Performing cluster analysis with the smaller first cluster radius screens out more abnormal data points, i.e., data points that lie far from the other data points.
2. When detecting abnormal data segments, whether the data in an entire sliding window form an abnormal segment is judged from the proportion of abnormal data points, so a detection error on a single data point does not compromise the cleaning result. Moreover, once abnormal data segments are identified, they are deleted as a whole, which is more efficient than deleting abnormal data points one at a time.
3. Repeating the cluster analysis with the larger second cluster radius performs a further, coarse screening of abnormal data points in the preliminary cleaning data, whereas the earlier cluster analysis amounts to a fine screening of the total water consumption data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required for the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 is a schematic flow chart of a method for cleaning total water consumption data of a community according to an embodiment of the disclosure;
Fig. 2 is a block diagram of a device for cleaning total water consumption data of a community according to an embodiment of the present disclosure;
Fig. 3 is a schematic block diagram of a terminal device according to an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings.
Referring to Fig. 1, which is a flow chart of a method for cleaning total water consumption data of a community according to an embodiment of the disclosure, the method includes:
S100: Acquire the total water consumption data of the community.
In this embodiment, the total water consumption data come from the community's master water flow meter, i.e., the total water meter of the community. The data are obtained by collecting the readings of this master flow meter over a period of time, for example one year.
S200: Perform cluster analysis on the total water consumption data based on a preset first cluster radius to obtain abnormal data points in the total water consumption data.
In this embodiment, the cluster analysis may use a common clustering method such as K-means clustering or density-based clustering (DBSCAN); this implementation specifically uses DBSCAN.
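As a minimal sketch of how DBSCAN flags abnormal points, the pure-Python function below (hypothetical name `dbscan_noise`) returns the indices DBSCAN would label as noise. It assumes the "cluster radius" plays the role of DBSCAN's neighborhood radius `eps`; the minimum neighbor count `min_samples` is not specified in the text and is an illustrative parameter.

```python
from math import dist


def dbscan_noise(points, eps, min_samples):
    """Return indices of points that DBSCAN would label as noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps`. A point is noise if it is neither a core point
    nor within `eps` of any core point (i.e. not a border point either).
    """
    n = len(points)
    # Brute-force neighborhood query: O(n^2), fine for a sketch.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]
```

For the points `(0, 0)`, `(1, 0)`, `(2, 0)`, `(3, 0)`, `(10, 10)` with `min_samples=2`, an `eps` of 1.5 flags only the isolated point `(10, 10)`, while an `eps` of 0.5 leaves every point without neighbors and flags all of them, which illustrates why a smaller cluster radius screens out more points. A library implementation such as scikit-learn's `DBSCAN`, which marks noise with the label `-1` in `labels_`, could be used instead.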
S300: Detect abnormal data segments in the total water consumption data based on a preset sliding window, and delete the abnormal data segments from the total water consumption data to obtain preliminary cleaning data. An abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is any window position in which the proportion of abnormal data points exceeds a set threshold.
In this embodiment, a sliding window is used to detect abnormal data segments in the total water consumption data. Taking a window length of 10 as an example: at initialization the window contains the 1st through 10th data points. If 8 of these 10 points are abnormal data points, the proportion of abnormal data points is 80%, which exceeds the set threshold (60% here), so the data points in this window are judged to be an abnormal data segment. The window is then moved by one position so that it contains the 2nd through 11th data points, the proportion of abnormal data points among these 10 points is recalculated, and whether the data points in the window form an abnormal data segment is judged again. This continues until all data points have been traversed.
Those skilled in the art can choose the sliding window length and the threshold flexibly according to actual needs; for example, the window length may be 10, 20, or 30, and the threshold 60%, 70%, or 80%.
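The window scan in S300 might be sketched as follows, assuming the first cluster analysis has already produced one boolean flag per data point, in time order. The function names `abnormal_segments` and `remove_segments` are hypothetical; the defaults match the example of a window length of 10 and a 60% threshold.

```python
def abnormal_segments(is_abnormal, window=10, threshold=0.6):
    """Return the set of indices covered by abnormal sliding windows.

    is_abnormal[i] is True when data point i was flagged by the first
    cluster analysis. A window is abnormal when its proportion of flagged
    points is strictly greater than `threshold`; every index inside such a
    window is marked for deletion.
    """
    n = len(is_abnormal)
    to_delete = set()
    # Slide one position at a time: windows start at 0, 1, ..., n - window.
    for start in range(n - window + 1):
        ratio = sum(is_abnormal[start:start + window]) / window
        if ratio > threshold:
            to_delete.update(range(start, start + window))
    return to_delete


def remove_segments(data, to_delete):
    """Keep only the data points whose index is not marked for deletion."""
    return [x for i, x in enumerate(data) if i not in to_delete]
```

Because every point in an abnormal window is deleted, a normal point next to a dense run of flagged points can be removed along with its window; this is inherent in deleting whole segments rather than single points.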
S400: Perform cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data, the first cluster radius being smaller than the second cluster radius.
In this embodiment, deleting the abnormal data segments may introduce new abnormal data points, so the abnormal data points in the preliminary cleaning data are detected again by cluster analysis to make the abnormal data detection more complete.
S500: Delete the abnormal data points from the preliminary cleaning data to obtain the cleaned total water consumption data of the community.
In this embodiment, the abnormal data points found by this cluster analysis are deleted from the preliminary cleaning data, which further improves the cleaning effect.
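Putting S200 to S500 together, a compact end-to-end sketch might look like the following. It assumes each data point is a 2-D tuple (for example, day and water usage), uses the same DBSCAN-noise idea for both cluster analyses, and treats `min_samples`, the window length, and the threshold as illustrative defaults rather than values fixed by the disclosure; `clean_total_water_data` is a hypothetical name.

```python
from math import dist


def dbscan_noise(points, eps, min_samples):
    """Indices DBSCAN would label as noise (not core, no core point within eps)."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps] for i in range(n)]
    core = {i for i in range(n) if len(nbrs[i]) >= min_samples}
    return {i for i in range(n) if i not in core and not any(j in core for j in nbrs[i])}


def clean_total_water_data(points, eps1, eps2, min_samples=2, window=10, threshold=0.6):
    """Sketch of S200-S500: fine clustering, window deletion, coarse clustering."""
    if not eps1 < eps2:
        raise ValueError("the first cluster radius must be smaller than the second")
    # S200: fine screening with the smaller first cluster radius.
    flagged = dbscan_noise(points, eps1, min_samples)
    # S300: delete every window whose proportion of flagged points exceeds the threshold.
    drop = set()
    for start in range(len(points) - window + 1):
        share = sum(i in flagged for i in range(start, start + window)) / window
        if share > threshold:
            drop.update(range(start, start + window))
    preliminary = [p for i, p in enumerate(points) if i not in drop]
    # S400: coarse screening of the preliminary cleaning data with the larger radius.
    noise = dbscan_noise(preliminary, eps2, min_samples)
    # S500: delete the remaining abnormal data points.
    return [p for i, p in enumerate(preliminary) if i not in noise]
```

In this sketch, points flagged in S200 that do not fall inside an abnormal window are not deleted directly; they are removed only if the second, larger-radius cluster analysis still labels them as noise.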
As can be seen from the above, in the embodiments of the present disclosure, abnormal data points in the total water consumption data of the community are first obtained through cluster analysis; abnormal data segments are then detected with a sliding window and deleted from the total water consumption data to obtain preliminary cleaning data. Because new abnormal points may appear in the preliminary cleaning data after the segments are deleted, cluster analysis is performed again to obtain the abnormal data points in the preliminary cleaning data, and these are deleted to obtain the cleaned total water consumption data of the community.
Performing cluster analysis with the smaller first cluster radius screens out more abnormal data points, i.e., data points that lie far from the other data points.
When detecting abnormal data segments, whether the data in an entire sliding window form an abnormal segment is judged from the proportion of abnormal data points, so a detection error on a single data point does not compromise the cleaning result. Moreover, once abnormal data segments are identified, they are deleted as a whole, which is more efficient than deleting abnormal data points one at a time.
Repeating the cluster analysis with the larger second cluster radius performs a further, coarse screening of abnormal data points in the preliminary cleaning data, whereas the earlier cluster analysis amounts to a fine screening of the total water consumption data.
In one embodiment of the present disclosure, performing cluster analysis on the total water consumption data based on the preset first cluster radius to obtain abnormal data points in the total water consumption data includes:
drawing a first scatter plot from the total water consumption data; and
performing cluster analysis on the first scatter plot to obtain the outliers in the first scatter plot, which are determined to be the abnormal data points in the total water consumption data.
In this embodiment, a first scatter plot of the total water consumption data is drawn and DBSCAN cluster analysis is performed on it. Neighboring data points in the first scatter plot correspond to water usage at nearby times, so a data point that lies far from its neighbors is identified as an abnormal data point.
In one embodiment of the disclosure, performing cluster analysis on the preliminary cleaning data based on the preset second cluster radius to obtain abnormal data points in the preliminary cleaning data includes:
drawing a second scatter plot from the preliminary cleaning data; and
performing cluster analysis on the second scatter plot to obtain the outliers in the second scatter plot, which are determined to be the abnormal data points in the preliminary cleaning data.
In this embodiment, the abnormal data points in the preliminary cleaning data are obtained by drawing a second scatter plot of the preliminary cleaning data and performing DBSCAN cluster analysis on it.
In one embodiment of the present disclosure, the first cluster radius is set as follows:
S1: Obtain a preset third cluster radius.
S2: Perform cluster analysis on sample data based on the third cluster radius, the sample data being randomly drawn from the total water consumption data of the community.
S3: If the number of data points contained in the cluster areas obtained by the cluster analysis is greater than or equal to a first preset value, reduce the third cluster radius and return to step S2.
S4: If the number of data points contained in the cluster areas obtained by the cluster analysis is less than the first preset value, determine the third cluster radius used in the previous execution of step S2 as the first cluster radius.
The second cluster radius is set in the same way as the first cluster radius.
In this embodiment, the first cluster radius is determined using a first preset value, specifically as follows. Set the first preset value to 20 and the initial third cluster radius to 10, and perform cluster analysis on the sample data. If, after the cluster analysis, each cluster area contains 30 data points, which is greater than or equal to the first preset value of 20, reduce the third cluster radius to 9. Perform cluster analysis again; if each cluster area now contains 25 data points, which is still greater than or equal to 20, reduce the third cluster radius to 8 and perform cluster analysis again, and so on. When the third cluster radius is 7, each cluster area contains 19 data points, which is less than the first preset value, so the third cluster radius of 8 used in the previous cluster analysis is determined as the first cluster radius.
Through this process, the first cluster radius is made as small as possible while the cluster areas still contain at least the first preset value of data points.
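The S1 to S4 loop and the worked example above can be sketched as a small search routine. Here `count_at` is a hypothetical callback standing in for "run the cluster analysis on the sample data and report the number of data points per cluster area"; how that count is aggregated across clusters is not specified in the text. The fixed step of 1 mirrors the example and is an assumption, as is the guard that stops before the radius reaches zero.

```python
from typing import Callable


def search_cluster_radius(count_at: Callable[[float], int], initial_radius: float,
                          preset: int, step: float = 1.0) -> float:
    """Shrink the radius while the cluster size stays >= preset (S2/S3) and
    return the last radius that still satisfied the preset value (S4)."""
    if count_at(initial_radius) < preset:
        # The text does not cover this case; this sketch asks for a larger start.
        raise ValueError("initial radius already yields fewer points than the preset")
    radius = initial_radius
    while True:
        smaller = radius - step
        # S4: the smaller radius fails (or cannot shrink further), keep the previous one.
        if smaller <= 0 or count_at(smaller) < preset:
            return radius
        radius = smaller  # S3: still >= preset, keep shrinking.
```

Feeding it the counts from the example (30 at radius 10, 25 at radius 9, 19 at radius 7, plus an assumed count of at least 20 at radius 8, which the text implies but does not state) returns 8. The second cluster radius would be found the same way, with the larger second preset value and the larger initial fourth cluster radius.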
Those skilled in the art can choose the specific first preset value flexibly according to actual needs, for example through repeated experiments: set the first preset value to 25 and determine the corresponding third cluster radius by the process above; then reduce the first preset value to 20 and determine the corresponding third cluster radius; then increase the first preset value to 30 and determine the corresponding third cluster radius. Compare the results of these cluster analyses and select the first preset value, together with its corresponding third cluster radius, that gives the best clustering result.
The second cluster radius is set in the same way as the first cluster radius, that is, by presetting a second preset value and an initial fourth cluster radius. The differences are that the second preset value is larger than the first preset value and the fourth cluster radius is larger than the third cluster radius.
In one embodiment of the present disclosure, each data point consists of a water usage time and the corresponding water usage data, and before the cluster analysis is performed on the first scatter plot, the method further includes:
for each data point, calculating the difference between the water usage data of that data point and the water usage data of the previous data point; and
using the difference as the coordinate value of that data point, either its ordinate or its abscissa, and updating the first scatter plot accordingly.
In this embodiment, the water usage time is measured in days, so its coordinate axis has a scale of 1. The water usage data may amount to several thousand tons, so their coordinate axis has a scale of 1000 tons, while the difference in water usage data between consecutive dates may be only a few tens of tons. As a result, the data points for consecutive dates lie very close together on the first scatter plot, which makes abnormal data points hard to detect. In this embodiment, the difference between consecutive water usage data is calculated and used as the coordinate value of each data point. Because the differences are only tens of tons, the corresponding axis scale can be set to 10, so a difference of tens of tons between adjacent data points appears as a clearly visible distance on the first scatter plot and adjacent data points no longer overlap.
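The differencing step can be sketched as below, assuming daily readings sorted by time; `to_difference_points` is a hypothetical name. The text does not say how the first data point, which has no predecessor, is handled, so this sketch simply drops it.

```python
def to_difference_points(days, usage):
    """Pair each day with the change in water usage data from the previous day."""
    if len(days) != len(usage):
        raise ValueError("days and usage must have the same length")
    # Start at index 1: the first reading has no previous value to subtract.
    return [(days[i], usage[i] - usage[i - 1]) for i in range(1, len(usage))]
```

For hypothetical readings of 5000, 5030, 5065, and 5090 tons on days 1 to 4, this yields `(2, 30)`, `(3, 35)`, and `(4, 25)`: values in the tens of tons instead of the thousands, matching the scale argument above.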
In one embodiment of the present disclosure, before the cluster analysis is performed on the first scatter plot, the method further includes:
normalizing the coordinate values of each data point to obtain normalized coordinate values; and
updating the first scatter plot according to the normalized coordinate values.
In this embodiment, normalizing the coordinate values makes the axis scale of the water usage data consistent with the axis scale of the water usage time, which further improves the legibility of the data points in the first scatter plot. The normalization is performed as follows:
for any data point $a$, a normalized value $x_a'$ is calculated by the normalization formula, where $x_a$ is the coordinate value of data point $a$, $x_a'$ is its normalized value, and $N$ is the number of data points.
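The normalization formula itself is not reproduced above, so the sketch below substitutes plain min-max scaling to [0, 1] on each axis purely as an illustration; it is not the disclosed formula, which involves the number of data points $N$. Min-max scaling does achieve the stated goal of putting the water usage axis and the time axis on a comparable scale. The function names are hypothetical.

```python
def min_max(values):
    """Rescale values linearly to [0, 1]; constant input maps to all zeros."""
    values = list(values)
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def normalize_points(points):
    """Apply min-max scaling separately to the x (time) and y (usage) coordinates."""
    if not points:
        return []
    xs, ys = zip(*points)
    return list(zip(min_max(xs), min_max(ys)))
```

Scaling each axis independently keeps the shape of the scatter plot within each axis while removing the mismatch in units between days and tons.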
In one embodiment of the present disclosure, each data point consists of a water usage time and the corresponding water usage data, and before the cluster analysis is performed on the total water consumption data based on the preset first cluster radius, the method further includes:
calculating the average of the water usage data over all data points in the total water consumption data; and
treating water usage data greater than a preset multiple of the average as abnormal water usage data, and deleting the data points containing abnormal water usage data from the total water consumption data.
In this embodiment, calculating the average water usage over all data points and deleting the data points that exceed a preset multiple of that average removes obviously abnormal data points. This pre-cleans the total water consumption data of the community and reduces the computation required by the subsequent data cleaning.
Those skilled in the art can choose the preset multiple, for example 5 or 10, flexibly according to actual needs.
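A minimal sketch of this pre-cleaning step, assuming each data point is a `(time, usage)` tuple; `prefilter_by_mean` and its default multiple of 5 (one of the example values above) are illustrative.

```python
def prefilter_by_mean(points, multiple=5.0):
    """Delete data points whose water usage exceeds `multiple` times the mean usage."""
    if not points:
        return []
    mean = sum(usage for _, usage in points) / len(points)
    # Keep points at or below the cut-off; only values strictly greater are abnormal.
    return [(t, usage) for t, usage in points if usage <= multiple * mean]
```

Because the mean is computed over all data points, including the abnormal ones, a single very large value raises its own cut-off; with only a handful of points it may escape this pre-filter and be left for the clustering stages.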
Corresponding to the method for cleaning total water consumption data of a community in the above embodiment, Fig. 2 is a block diagram of a device for cleaning total water consumption data of a community according to an embodiment of the present disclosure. For ease of illustration, only the parts relevant to the embodiments of the present disclosure are shown. Referring to Fig. 2, the device 20 for cleaning total water consumption data of a community includes a data acquisition unit 21, a first clustering unit 22, a data screening unit 23, a second clustering unit 24, and a data deleting unit 25.
The data acquisition unit 21 is configured to acquire the total water consumption data of the community.
The first clustering unit 22 is configured to perform cluster analysis on the total water consumption data based on a preset first cluster radius to obtain abnormal data points in the total water consumption data.
The data screening unit 23 is configured to detect abnormal data segments in the total water consumption data based on a preset sliding window and delete the abnormal data segments from the total water consumption data to obtain preliminary cleaning data. An abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is any window position in which the proportion of abnormal data points exceeds a set threshold.
The second clustering unit 24 is configured to perform cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data, the first cluster radius being smaller than the second cluster radius.
The data deleting unit 25 is configured to delete the abnormal data points from the preliminary cleaning data to obtain the cleaned total water consumption data of the community.
In one embodiment of the present disclosure, the first clustering unit 22 is specifically configured to:
and drawing a first scatter diagram according to the total water consumption data of the cell.
And carrying out cluster analysis on the first scatter diagram to obtain abnormal points in the first scatter diagram, wherein the abnormal points in the first scatter diagram are determined to be abnormal data points in the total water consumption data of the cell.
In one embodiment of the present disclosure, the second aggregation unit 24 is specifically configured to:
and drawing a second scatter diagram according to the preliminary cleaning data.
And carrying out cluster analysis on the second scatter diagram to obtain abnormal points in the second scatter diagram, wherein the abnormal points in the second scatter diagram are determined to be abnormal data points in the preliminary cleaning data.
In an embodiment of the present disclosure, the first clustering unit 22 is further configured to:
s1: and obtaining a preset third radius of the class.
S2: performing cluster analysis on the sample data based on the third cluster radius; the sample data is the total data of the cell water which is extracted randomly.
S3: and if the number of the data points contained in the clustering area obtained after the clustering analysis is larger than or equal to a first preset value, reducing the third cluster radius, and returning to the execution step S2.
S4: and if the number of the data points contained in the clustering area obtained after the clustering analysis is smaller than a first preset value, determining a third cluster radius used last time when the step S2 is executed as a first cluster radius.
The second cluster radius setting method is the same as the first cluster radius setting method.
In one embodiment of the present disclosure, each data point includes a one-to-one correspondence of water usage time and water usage data, and the first clustering unit 22 is further configured to:
for any one data point, a difference between the water usage data corresponding to the any one data point and the water usage data corresponding to the previous data point is calculated.
And determining the difference value as the coordinate value of any data point, updating the first scatter diagram, and determining the coordinate value as the ordinate value or the abscissa value.
In an embodiment of the present disclosure, the first clustering unit 22 is further configured to:
normalize the coordinate values of each data point to obtain normalized coordinate values.
Update the first scatter diagram with the normalized coordinate values.
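The normalization method is not specified in the source. A common choice is min-max scaling of each axis to [0, 1], which keeps a single cluster radius meaningful on both axes even when time and water usage are measured in very different units. The sketch below assumes min-max scaling; the function name `min_max_normalize` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def min_max_normalize(points: List[Point]) -> List[Point]:
    """Rescale each axis of the scatter diagram to [0, 1] (min-max scaling, an assumed choice)."""
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    def scale(v: float, lo: float, hi: float) -> float:
        # A constant axis carries no spread; map it to 0.0 instead of dividing by zero.
        return 0.0 if hi == lo else (v - lo) / (hi - lo)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    return [(scale(x, x_lo, x_hi), scale(y, y_lo, y_hi)) for x, y in points]


if __name__ == "__main__":
    print(min_max_normalize([(0, 10), (5, 20), (10, 30)]))  # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```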
In one embodiment of the present disclosure, each data point consists of a water usage time and its corresponding water usage value, and the data acquisition unit 21 is further configured to:
calculate the average water usage value over all data points in the total cell water consumption data.
Treat any water usage value greater than a preset multiple of the average as abnormal water usage data, and delete the data points containing such abnormal data from the total water consumption data.
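The mean-based pre-filter above can be sketched as follows. The preset multiple is left as a parameter because the source gives no value; the function name `drop_extreme_usage` is hypothetical. As in the description, the average is computed over all data points, including the extreme ones, before filtering.

```python
from typing import List, Tuple


def drop_extreme_usage(records: List[Tuple[float, float]], multiple: float) -> List[Tuple[float, float]]:
    """Delete (time, usage) records whose usage exceeds `multiple` times the mean usage."""
    if not records:
        return []
    # Average over all data points, extreme ones included, as the description states.
    mean = sum(v for _, v in records) / len(records)
    return [(t, v) for t, v in records if v <= multiple * mean]


if __name__ == "__main__":
    data = [(0, 1.0), (1, 1.0), (2, 1.0), (3, 17.0)]  # mean = 5.0, threshold = 15.0
    print(drop_extreme_usage(data, multiple=3))  # [(0, 1.0), (1, 1.0), (2, 1.0)]
```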
Fig. 3 is a schematic block diagram of a terminal device according to an embodiment of the present disclosure. As shown in fig. 3, the terminal 300 in this embodiment may include one or more processors 301, one or more input devices 302, one or more output devices 303, and one or more memories 304, which communicate with each other via a communication bus 305. The memory 304 is used to store a computer program comprising program instructions, and the processor 301 is configured to execute the program instructions stored in the memory 304. Specifically, the processor 301 invokes the program instructions to perform the functions of the modules/units in the foregoing apparatus embodiments, for example, the functions of modules 21 to 25 shown in fig. 2.
It should be appreciated that, in the embodiments of the present disclosure, the processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The input device 302 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include read-only memory and random access memory and provides instructions and data to the processor 301. A portion of the memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store device type information.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in the embodiments of the present disclosure may perform the implementations described in the first and second embodiments of the method for cleaning total cell water consumption data provided in the embodiments of the present disclosure, and may also perform the implementation of the terminal described in the embodiments of the present disclosure; details are not repeated here.
In another embodiment of the present disclosure, a computer storage medium is provided. The computer storage medium stores a computer program comprising program instructions which, when executed by a processor, implement all or part of the procedures in the foregoing method embodiments. These procedures may also be completed by a computer program instructing related hardware; the computer program may be stored in a computer storage medium and, when executed by the processor, implements the steps of each of the foregoing method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals or telecommunications signals.
The computer storage medium may be an internal storage unit of the terminal of any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal. Further, the computer storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer storage medium is used for storing computer programs and other programs and data required by the terminal. The computer storage medium may also be used for temporarily storing data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the terminal and the unit described above may refer to the corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces or units, and may be electrical, mechanical, or in another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the embodiments of the present disclosure.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any equivalent modification or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method for cleaning total water consumption data of a community, comprising the following steps:
acquiring total water consumption data of a community;
performing cluster analysis on the total water consumption data of the community based on a preset first cluster radius to obtain abnormal data points in the total water consumption data of the community;
detecting abnormal data segments in the total water consumption data of the community based on a preset sliding window, and deleting the abnormal data segments from the total water consumption data of the community to obtain preliminary cleaning data; wherein an abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is a sliding window, among the sliding windows at all positions, in which the proportion of abnormal data points is greater than a set threshold;
performing cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data; the first cluster radius is smaller than the second cluster radius;
and deleting the abnormal data points in the preliminary cleaning data to obtain cleaned total water consumption data of the community.
2. The method for cleaning total cell water consumption data as claimed in claim 1, wherein performing cluster analysis on the total cell water consumption data based on a preset first cluster radius to obtain abnormal data points in the total cell water consumption data comprises:
drawing a first scatter diagram according to the total water consumption data of the cell;
performing cluster analysis on the first scatter diagram to obtain abnormal points in the first scatter diagram, and determining the abnormal points in the first scatter diagram as abnormal data points in the total water consumption data of the cell.
3. The method for cleaning total cell water consumption data as claimed in claim 1, wherein performing cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data comprises:
drawing a second scatter diagram according to the preliminary cleaning data;
performing cluster analysis on the second scatter diagram to obtain abnormal points in the second scatter diagram, and determining the abnormal points in the second scatter diagram as abnormal data points in the preliminary cleaning data.
4. The method for cleaning total cell water consumption data as claimed in claim 1, wherein the first cluster radius is set by:
S1: acquiring a preset third cluster radius;
S2: performing cluster analysis on sample data based on the third cluster radius, the sample data being randomly extracted total cell water consumption data;
S3: if the number of data points contained in the cluster region obtained by the cluster analysis is greater than or equal to a first preset value, reducing the third cluster radius and returning to step S2;
S4: if the number of data points contained in the cluster region obtained by the cluster analysis is less than the first preset value, determining the third cluster radius used in the most recent execution of step S2 as the first cluster radius;
the setting method of the second cluster radius is the same as the setting method of the first cluster radius.
5. The method for cleaning total cell water consumption data according to claim 2, wherein each data point consists of a water usage time and its corresponding water usage value, and the method further comprises, before performing cluster analysis on the first scatter diagram:
for each data point, calculating the difference between the water usage value of that data point and the water usage value of the preceding data point;
determining the difference as a coordinate value of that data point and updating the first scatter diagram, wherein the coordinate value is an ordinate value or an abscissa value.
6. The method for cleaning total cell water consumption data according to claim 5, further comprising, before performing cluster analysis on the first scatter diagram:
normalizing the coordinate value of each data point to obtain a normalized coordinate value;
and updating the first scatter diagram according to the normalized coordinate values.
7. The method according to claim 1, wherein each data point consists of a water usage time and its corresponding water usage value, and the method further comprises, before performing cluster analysis on the total cell water consumption data based on a preset first cluster radius:
calculating the average water usage value of all data points in the total cell water consumption data;
taking any water usage value greater than a preset multiple of the average value as abnormal water usage data, and deleting the data points containing the abnormal water usage data from the total water consumption data.
8. An apparatus for cleaning total water consumption data of a community, comprising:
a data acquisition unit, configured to acquire total water consumption data of the community;
a first clustering unit, configured to perform cluster analysis on the total water consumption data of the community based on a preset first cluster radius to obtain abnormal data points in the total water consumption data of the community;
a data screening unit, configured to detect abnormal data segments in the total water consumption data of the community based on a preset sliding window and delete the abnormal data segments from the total water consumption data of the community to obtain preliminary cleaning data; wherein an abnormal data segment comprises all data points in an abnormal sliding window, and an abnormal sliding window is a sliding window, among the sliding windows at all positions, in which the proportion of abnormal data points is greater than a set threshold;
a second clustering unit, configured to perform cluster analysis on the preliminary cleaning data based on a preset second cluster radius to obtain abnormal data points in the preliminary cleaning data, the first cluster radius being smaller than the second cluster radius;
and a data deleting unit, configured to delete the abnormal data points in the preliminary cleaning data to obtain cleaned total water consumption data of the community.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.
10. A computer storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
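The sliding-window screening step of claim 1 can be sketched as follows, working on data point indices and the set of indices flagged abnormal by the first cluster analysis. The window length, the step of one point per slide, and the threshold are hypothetical parameters, since the source leaves them preset but unspecified; the function name `remove_abnormal_segments` is also hypothetical. A window is abnormal when the proportion of abnormal points in it is strictly greater than the threshold, and every point in any abnormal window is deleted.

```python
from typing import List, Set


def remove_abnormal_segments(n_points: int, abnormal_idx: Set[int], window: int, threshold: float) -> List[int]:
    """Return the indices kept after deleting every point inside an abnormal sliding window.

    Assumptions: the window advances one point at a time, and a series shorter
    than `window` has no complete window, so nothing is deleted from it.
    """
    drop: Set[int] = set()
    for start in range(max(n_points - window + 1, 0)):
        idx = range(start, start + window)
        # Proportion of abnormal data points within this window position.
        ratio = sum(1 for i in idx if i in abnormal_idx) / window
        if ratio > threshold:
            drop.update(idx)  # the whole window is an abnormal data segment
    return [i for i in range(n_points) if i not in drop]


if __name__ == "__main__":
    print(remove_abnormal_segments(10, {3, 4}, window=3, threshold=0.5))  # [0, 1, 6, 7, 8, 9]
```

Isolated abnormal points that never push a window over the threshold survive this step; in the claimed method they are handled afterwards by the second cluster analysis with the larger second cluster radius.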
CN202311680742.9A 2023-12-08 2023-12-08 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium Active CN117370331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311680742.9A CN117370331B (en) 2023-12-08 2023-12-08 Method and device for cleaning total water consumption data of cell, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117370331A CN117370331A (en) 2024-01-09
CN117370331B CN117370331B (en) 2024-02-20

Family

ID=89389680

Country Status (1)

Country Link
CN (1) CN117370331B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605362A (en) * 2013-09-11 2014-02-26 天津工业大学 Learning and anomaly detection method based on multi-feature motion modes of vehicle traces
CN115129503A (en) * 2022-07-12 2022-09-30 上海交通大学 Equipment fault data cleaning method and system
WO2022262869A1 (en) * 2021-06-18 2022-12-22 工业互联网创新中心(上海)有限公司 Data processing method and apparatus, network device, and storage medium
CN115878599A (en) * 2022-10-26 2023-03-31 河北雄安睿天科技有限公司 Sewage industry data cleaning method

Non-Patent Citations (1)

Title
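Cleaning method for abnormal operation data of wind turbines based on multi-stage progressive identification; Wang Yimei, Liu Hui, Song Peng; Renewable Energy Resources; full text *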

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant