CN110059236A - Data acquisition and processing method for power planning data collection using web crawler technology

Data acquisition and processing method for power planning data collection using web crawler technology

Info

Publication number
CN110059236A
CN110059236A CN201910235907.9A CN201910235907A CN110059236A CN 110059236 A CN110059236 A CN 110059236A CN 201910235907 A CN201910235907 A CN 201910235907A CN 110059236 A CN110059236 A CN 110059236A
Authority
CN
China
Prior art keywords
data
acquisition
acquired
mode
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910235907.9A
Other languages
Chinese (zh)
Other versions
CN110059236B (en)
Inventor
张国华
江明水
王毅峰
王彦铭
李小娴
郑维明
黄东明
马会军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Fujian Electric Power Co Ltd
Quanzhou Power Supply Co of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Fujian Electric Power Co Ltd
Quanzhou Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-03-27
Filing date
2019-03-27
Publication date
2019-07-26
Application filed by State Grid Corp of China SGCC, State Grid Fujian Electric Power Co Ltd, Quanzhou Power Supply Co of State Grid Fujian Electric Power Co Ltd
Priority to CN201910235907.9A
Publication of CN110059236A
Application granted
Publication of CN110059236B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for ac mains or ac distribution networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention discloses a data acquisition and processing method for power planning data collection using web crawler technology. By having a network data acquisition robot replace manual data collection, the invention improves the efficiency of data collection work for power planning. In addition, the data are screened and processed by specific algorithms, which improves data processing efficiency.

Description

Data acquisition and processing method for power planning data collection using web crawler technology
Technical field
The present invention relates to a data acquisition and processing method, and in particular to a data acquisition and processing method for power planning data collection using web crawler technology.
Background art
Before power planning, the current state of the grid must be diagnosed and analyzed on the basis of collected data. With the spread of information technology, the daily production and management of grid companies have all been computerized, but power planning has not. The reason is that the data needed for power planning are scattered across separate information systems: for example, main-network equipment ledgers are in PMS, distribution-network equipment ledgers are in the distribution-network PMS system, production data such as equipment load rates are in IDP600, and marketing data such as electricity consumption and customers are in the electricity consumption acquisition system. Interconnecting data across these systems is difficult. As a result, power planning data collection is still done by manually gathering data from each system and then processing them with office software such as Excel. The whole process is manual, lacks effective information technology support, and is inefficient.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a data acquisition and processing method for power planning data collection using web crawler technology, in which a network data acquisition robot replaces manual data collection, improving the efficiency of power planning data collection, and in which the data are screened and processed by specific algorithms, improving data processing efficiency.
A data acquisition and processing method for power planning data collection using web crawler technology comprises the following steps. (1) First, the acquisition mode is determined according to the type of data to be acquired and its source website. The acquisition modes include the following: data on grid equipment of 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as the load and load rate of grid equipment of 35 kV and above and of 10 kV lines come from IDP600 and are acquired with the BS4 library; data such as 10 kV and distribution transformer equipment data, load, and load rate come from the electricity consumption acquisition system and are acquired with the PyAmf library;
(2) After the acquisition modes are determined, a separate acquisition module is written for each acquisition mode:
① After the corresponding library is imported, the data structure and retrieval method of the website to be acquired are investigated. Once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received;
② Once the data structure and retrieval method of the website are known, the module simulates logging in to the website and, after a successful login, records the cookie information, which subsequent messages must carry;
③ The module simulates a browser to send a data acquisition request (sending the cookie information with it) and, after receiving data, parses and checks them. If the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be repeated;
④ After the expected data are received, the original cookie information is updated according to the cookie information in the received data;
⑤ The module checks whether all data have been received; if not, steps ③ and ④ are repeated until all data to be acquired have been received;
(3) The corresponding acquisition module is called for each kind of data to be acquired. After the data are collected, they are cleaned, where cleaning means converting the data into the required format. After cleaning, the data are checked for anomalies. If there are anomalies, the data are flagged, the flagged abnormal data are checked and corrected manually, and the corrected data are then stored in the database; if there are none, the cleaned data are stored in the database directly;
(4) Data processing: a linked database is created, and the data obtained from the different data sources are associated with one another;
(5) Data application: the data are used as needed for diagnostic analysis and visual presentation.
In step (2), there are two methods of investigation: the first is to inspect the web page source code and parse the messages sent and received using the debugging mode of the Google Chrome browser; the second is to parse the messages with dedicated packet-analysis software (the applicant uses Charles). Method one is the primary method, and method two is used when method one cannot parse the messages.
In summary, the present invention has the following advantages over the prior art:
The method of the present application makes full use of mature web crawler technology to acquire automatically, from each system, the various data needed for power planning, and makes full use of automation technology to process the data. Compared with the prior art, it has the following advantages:
First, improved efficiency and reduced manual effort. With the method and technology of the present application, data collection and processing can be fully automated without manual intervention. According to estimates, efficiency can be improved by more than 96%: data collection and data preparation work that originally took a team of 6 to 8 people 5 to 6 working days can be completed within 1 working day using web crawler and automation technology.
Second, routine grid diagnosis. Under the prior art, grid diagnosis is generally carried out only once a year because of its heavy workload. With the method of the present application, efficiency is greatly improved and no manual intervention is required, so grid diagnosis can be made routine and performed as needed, and new problems arising in grid development can be found in time so that appropriate measures can be taken promptly.
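The text does not say how the figure of more than 96% was calculated. One reading that is consistent with the stated numbers is a reduction in person-days, assuming (this is an assumption, not stated in the source) that the automated run costs at most one person-day. A short worked calculation under that assumption:

```python
def person_days(people: int, days: int) -> int:
    """Total effort of a team working for a number of working days."""
    return people * days


def reduction(before: float, after: float) -> float:
    """Fractional reduction in effort when going from `before` to `after`."""
    return 1 - after / before


# Manual effort stated in the text: 6 to 8 people for 5 to 6 working days.
low = person_days(6, 5)    # smallest manual effort, in person-days
high = person_days(8, 6)   # largest manual effort, in person-days

# Assumption: the automated process costs at most 1 person-day.
automated = 1

print(f"{reduction(low, automated):.1%}")   # 96.7%
print(f"{reduction(high, automated):.1%}")  # 97.9%
```

Under this reading, even the smallest manual effort (30 person-days) gives a reduction above 96%, which matches the claim.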
Brief description of the drawings
Fig. 1 is the work flow diagram of an embodiment of the present invention.
Fig. 2 is the work flow diagram of the acquisition module of the present invention.
Detailed description of the embodiments
The present invention is described in more detail below with reference to an embodiment.
Embodiment 1
First, the acquisition mode is determined according to the type of data to be acquired and its source website. The acquisition modes include the following: data on grid equipment of 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as the load and load rate of grid equipment of 35 kV and above and of 10 kV lines come from IDP600 and are acquired with the BS4 library; data such as 10 kV and distribution transformer equipment data, load, and load rate come from the electricity consumption acquisition system and are acquired with the PyAmf library.
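The choice of acquisition mode described above amounts to a lookup from data type to source system and library. A minimal sketch follows; the system and library names are those given in the text, while the dictionary keys, the `AcquisitionMode` class, and `select_mode` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionMode:
    source_system: str  # system named in the text
    library: str        # library named in the text


# Keys are hypothetical labels for the three data categories in the text.
ACQUISITION_MODES = {
    "grid_equipment_35kv_and_above": AcquisitionMode("PMS2.0", "Selenium"),
    "load_and_load_rate": AcquisitionMode("IDP600", "BS4"),
    "distribution_transformer_10kv": AcquisitionMode(
        "electricity consumption acquisition system", "PyAmf"
    ),
}


def select_mode(data_type: str) -> AcquisitionMode:
    """Return the acquisition mode for a data type, or fail loudly."""
    try:
        return ACQUISITION_MODES[data_type]
    except KeyError:
        raise ValueError(f"no acquisition mode defined for {data_type!r}") from None


mode = select_mode("load_and_load_rate")
print(mode.source_system, mode.library)  # IDP600 BS4
```

Keeping the mapping in one table means a new data source only needs a new entry plus its own acquisition module.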
After the acquisition modes are determined, a separate acquisition module is written for each acquisition mode:
① After the corresponding library is imported, the data structure and retrieval method of the website to be acquired are investigated. Once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received. There are two methods of investigation: the first is to inspect the web page source code and parse the messages sent and received using the debugging mode of the Google Chrome browser; the second is to parse the messages with dedicated packet-analysis software (the applicant uses Charles). Method one is the primary method, and method two is used when method one cannot parse the messages.
② Once the data structure and retrieval method of the website are known, the module can simulate logging in to the website and, after a successful login, record the cookie information, which subsequent messages must carry.
③ The module simulates a browser to send a data acquisition request (sending the cookie information with it) and, after receiving data, parses and checks them. If the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be repeated.
④ After the expected data are received, the original cookie information is updated according to the cookie information in the received data.
⑤ The module checks whether all data have been received; if not, steps ③ and ④ are repeated until all data to be acquired have been received.
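The login, request, check, cookie-update, and completeness steps of the acquisition module can be sketched as a loop. This is only an illustration under stated assumptions: the network layer is replaced by an injected `send` function so the sketch runs without any real system, and the request fields (`action`, `page`), the reply fields (`rows`, `total_pages`), and all function names are hypothetical. A real module would use Selenium, BS4, or PyAmf as the text describes.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# A transport takes (request, cookies) and returns (payload, new_cookies).
# payload is None when no data were received.
Response = Tuple[Optional[dict], Dict[str, str]]
Transport = Callable[[dict, Dict[str, str]], Response]


class InvestigationError(RuntimeError):
    """The reply does not match what the site investigation predicted."""


def acquire_all(send: Transport, credentials: dict, expected: Set[str]) -> List[dict]:
    # Simulated login; record the cookies for all later requests.
    _, cookies = send({"action": "login", **credentials}, {})
    if not cookies:
        raise InvestigationError("login returned no cookies")

    records: List[dict] = []
    page = 1
    while True:
        # Send the data request together with the current cookies.
        payload, new_cookies = send({"action": "query", "page": page}, cookies)
        # Check the reply against what was expected.
        if payload is None or "rows" not in payload or "total_pages" not in payload:
            raise InvestigationError(f"unexpected or missing reply for page {page}")
        for row in payload["rows"]:
            if not expected.issubset(row.keys()):
                raise InvestigationError(f"row lacks expected fields: {row}")
        records.extend(payload["rows"])
        # Update the stored cookies with those carried by the reply.
        cookies = {**cookies, **new_cookies}
        # Stop once every page has been received; otherwise request the next.
        if page >= payload["total_pages"]:
            return records
        page += 1


def fake_send(request: dict, cookies: Dict[str, str]) -> Response:
    """Stand-in for a real site: three pages, session cookie rotates per page."""
    if request["action"] == "login":
        return None, {"SESSION": "s0"}
    if "SESSION" not in cookies:
        return None, {}
    page = request["page"]
    rows = [{"line": f"L{page}", "load_rate": 0.5}]
    return {"rows": rows, "total_pages": 3}, {"SESSION": f"s{page}"}


rows = acquire_all(fake_send, {"user": "u", "password": "p"}, {"line", "load_rate"})
print(len(rows))  # 3
```

Raising `InvestigationError` mirrors the rule in the text that an unexpected or empty reply means the site investigation must be redone rather than retried blindly.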
The corresponding acquisition module is called for each kind of data to be acquired. After the data are collected, they are cleaned (cleaning means converting the data into the required format). After cleaning, the data are checked for anomalies. If there are anomalies, the data are flagged, the flagged abnormal data are checked and corrected manually, and the corrected data are then stored in the database; if there are none, the cleaned data are stored in the database directly.
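The cleaning and anomaly-flagging step can be sketched as follows. The text does not define the required format or what counts as an anomaly, so both are assumptions here: load rates are taken to arrive as percentage strings, and a value is treated as abnormal if it cannot be parsed or falls outside 0% to 200%. The field names and function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def clean(raw: dict) -> dict:
    """Convert a raw record to the required format (here: trimmed name, float rate)."""
    try:
        rate: Optional[float] = float(raw["load_rate"].strip().rstrip("%")) / 100.0
    except ValueError:
        rate = None  # unparseable value; flagged later for manual review
    return {"line": raw["line"].strip(), "load_rate": rate}


def is_abnormal(record: dict) -> bool:
    """Assumed rule: missing, negative, or implausibly high load rate."""
    rate = record["load_rate"]
    return rate is None or not (0.0 <= rate <= 2.0)


def clean_and_split(raw_records: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (records ready to store, records flagged for manual correction)."""
    to_store: List[dict] = []
    to_review: List[dict] = []
    for raw in raw_records:
        record = clean(raw)
        (to_review if is_abnormal(record) else to_store).append(record)
    return to_store, to_review


store, review = clean_and_split([
    {"line": " A1 ", "load_rate": "85.5%"},
    {"line": "A2", "load_rate": "-3%"},
    {"line": "A3", "load_rate": "n/a"},
])
print([r["line"] for r in store], [r["line"] for r in review])
```

Flagged records are kept separate rather than dropped, so that they can be corrected manually and stored afterwards, as the text requires.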
Data processing: a linked database is created, and the data obtained from the different data sources are associated with one another.
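The text does not specify the database or its schema. As a sketch only, the association of data from different sources can be shown with an in-memory SQLite database and a join on a shared line identifier; the table names, column names, and the choice of `line_id` as the linking key are all assumptions made for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_linked_db(equipment: Iterable[tuple], loads: Iterable[tuple]) -> sqlite3.Connection:
    """Load records from two sources into one database so they can be joined."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE equipment (line_id TEXT PRIMARY KEY, voltage_kv REAL, source TEXT)"
    )
    conn.execute("CREATE TABLE line_load (line_id TEXT, load_rate REAL, source TEXT)")
    conn.executemany("INSERT INTO equipment VALUES (?, ?, ?)", equipment)
    conn.executemany("INSERT INTO line_load VALUES (?, ?, ?)", loads)
    conn.commit()
    return conn


def associated_rows(conn: sqlite3.Connection) -> List[Tuple]:
    """Join equipment with load data; lines without load data keep a NULL rate."""
    return conn.execute(
        "SELECT e.line_id, e.voltage_kv, l.load_rate "
        "FROM equipment AS e LEFT JOIN line_load AS l ON l.line_id = e.line_id "
        "ORDER BY e.line_id"
    ).fetchall()


conn = build_linked_db(
    equipment=[("L1", 10.0, "PMS2.0"), ("L2", 10.0, "PMS2.0")],
    loads=[("L1", 0.62, "IDP600")],
)
print(associated_rows(conn))  # [('L1', 10.0, 0.62), ('L2', 10.0, None)]
```

A left join is used so that equipment with no matching load record still appears, which makes gaps between the source systems visible.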
Data application: the data are used as needed for diagnostic analysis, visual presentation, and so on.
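The text does not describe any particular diagnostic or chart. Purely as an illustration of the data application step, the sketch below lists lines whose load rate reaches a threshold and renders a plain-text bar chart. The 80% threshold, the function names, and the sample values are hypothetical and do not come from the source.

```python
from typing import Dict, List


def heavily_loaded(load_rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Names of lines whose load rate is at or above the (assumed) threshold."""
    return sorted(name for name, rate in load_rates.items() if rate >= threshold)


def text_bar_chart(load_rates: Dict[str, float], width: int = 20) -> List[str]:
    """One text line per grid line; bar length is proportional to load rate, capped at 100%."""
    lines = []
    for name in sorted(load_rates):
        rate = load_rates[name]
        bar = "#" * round(min(max(rate, 0.0), 1.0) * width)
        lines.append(f"{name:<6} {bar} {rate:.0%}")
    return lines


sample = {"L1": 0.5, "L2": 0.9}
print(heavily_loaded(sample))
for row in text_bar_chart(sample):
    print(row)
```

In practice the same query results could feed any charting tool; the text chart is used here only to keep the sketch free of external dependencies.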
The parts not described in this embodiment are the same as in the prior art.

Claims (2)

1. A data acquisition and processing method for power planning data collection using web crawler technology, characterized in that the steps are as follows: (1) first, the acquisition mode is determined according to the type of data to be acquired and its source website, the acquisition modes including the following: data on grid equipment of 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as the load and load rate of grid equipment of 35 kV and above and of 10 kV lines come from IDP600 and are acquired with the BS4 library; data such as 10 kV and distribution transformer equipment data, load, and load rate come from the electricity consumption acquisition system and are acquired with the PyAmf library;
(2) after the acquisition modes are determined, a separate acquisition module is written for each acquisition mode:
① after the corresponding library is imported, the data structure and retrieval method of the website to be acquired are investigated; once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received;
② once the data structure and retrieval method of the website are known, the module simulates logging in to the website and, after a successful login, records the cookie information, which subsequent messages must carry;
③ the module simulates a browser to send a data acquisition request (sending the cookie information with it) and, after receiving data, parses and checks them; if the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be repeated;
④ after the expected data are received, the original cookie information is updated according to the cookie information in the received data;
⑤ the module checks whether all data have been received; if not, steps ③ and ④ are repeated until all data to be acquired have been received;
(3) the corresponding acquisition module is called for each kind of data to be acquired; after the data are collected, they are cleaned, where cleaning means converting the data into the required format; after cleaning, the data are checked for anomalies; if there are anomalies, the data are flagged, the flagged abnormal data are checked and corrected manually, and the corrected data are then stored in the database; if there are none, the cleaned data are stored in the database directly;
(4) data processing: a linked database is created, and the data obtained from the different data sources are associated with one another;
(5) data application: the data are used as needed for diagnostic analysis and visual presentation.
2. The data acquisition and processing method for power planning data collection using web crawler technology according to claim 1, characterized in that in step (2) there are two methods of investigation: the first is to inspect the web page source code and parse the messages sent and received using the debugging mode of the Google Chrome browser; the second is to parse the messages with dedicated packet-analysis software (the applicant uses Charles); method one is the primary method, and method two is used when method one cannot parse the messages.
CN201910235907.9A 2019-03-27 2019-03-27 Data acquisition and processing method for power planning and collecting by using web crawler technology Active CN110059236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910235907.9A CN110059236B (en) 2019-03-27 2019-03-27 Data acquisition and processing method for power planning and collecting by using web crawler technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910235907.9A CN110059236B (en) 2019-03-27 2019-03-27 Data acquisition and processing method for power planning and collecting by using web crawler technology

Publications (2)

Publication Number Publication Date
CN110059236A true CN110059236A (en) 2019-07-26
CN110059236B CN110059236B (en) 2023-05-05

Family

ID=67315934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910235907.9A Active CN110059236B (en) 2019-03-27 2019-03-27 Data acquisition and processing method for power planning and collecting by using web crawler technology

Country Status (1)

Country Link
CN (1) CN110059236B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112697946A (en) * 2021-03-23 2021-04-23 广东电网有限责任公司佛山供电局 Main transformer on-line oil chromatography monitoring method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009042113A2 (en) * 2007-09-21 2009-04-02 Unlimited Cad Services, Llc Document acquisition & authentication system
US20110191664A1 (en) * 2010-02-04 2011-08-04 At&T Intellectual Property I, L.P. Systems for and methods for detecting url web tracking and consumer opt-out cookies
CN102289447A (en) * 2011-06-16 2011-12-21 北京亿赞普网络技术有限公司 Website webpage evaluation system based on communication network message
CN104539053A (en) * 2014-12-31 2015-04-22 国家电网公司 Power dispatching automation polling robot and method based on reptile technology
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data
CN105139281A (en) * 2015-08-20 2015-12-09 北京中电普华信息技术有限公司 Method and system for processing big data of electric power marketing
WO2017051420A1 (en) * 2015-09-21 2017-03-30 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Advanced computer implementation for crawling and/or detecting related electronically catalogued data using improved metadata processing
CN107679076A (en) * 2017-08-28 2018-02-09 国网上海市电力公司 A kind of acquisition analysis system of electric power data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
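GAO, KAI et al.: "Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler" *
李鑫欣 (Li Xinxin) et al.: "User information collection from the Douban Books website based on Python" *
汤艳君 (Tang Yanjun) et al.: "Research on crawler-based forensic techniques for dark web cases" *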

Also Published As

Publication number Publication date
CN110059236B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN110674189B (en) Method for monitoring secondary state and positioning fault of intelligent substation
CN108971807B (en) Intelligent management control method and management system for field welding construction process
CN101262682A (en) A configuration management for monitoring batch management of base station devices
CN102692558A (en) Monitoring and analyzing system for electricity data and realization method thereof
CN111444169A (en) Transformer substation electrical equipment state monitoring and diagnosis system and method
CN107479540A (en) Method for diagnosing faults and system
CN106789251A (en) Net silver running state monitoring system and method
CN110688389A (en) Transformer substation secondary equipment defect cloud management system
CN112348521A (en) Intelligent risk quality inspection method and system based on business audit and electronic equipment
CN107643956A (en) The method and apparatus for positioning the abnormal origin of abnormal data
CN111400505A (en) Method and system for matching fault elimination scheme of power consumption information acquisition system
CN111800299A (en) Operation maintenance system and method of edge cloud
CN105574678A (en) Employee performance assessment data automation system based on executive force indexes
CN110059236A (en) A kind of application network crawler technology carries out the data sampling and processing method of power planning receipts money
CN113778064A (en) Intelligent device remote detection and diagnosis system
CN111611665B (en) Intelligent substation design method based on three-dimensional modular design
CN111768113A (en) Public cloud-based hydraulic engineering management system and method
CN111652500A (en) Automatic duty-on auxiliary system and equipment based on scheduling and report generation method thereof
CN107979174B (en) Workflow operation method based on power grid operation management system
CN110334001A (en) A kind of method and apparatus that batch automatically generates echo test
CN205883276U (en) With on --spot failure diagnosis system of electric information collection fortune dimension
CN109592525A (en) Elevator frequency converter fault diagnosis system and method
CN211928020U (en) Electric energy acquisition data analysis early warning device
CN110298585B (en) Hierarchical automatic auditing method for monitoring information of substation equipment
CN112686583A (en) Method and system for generating automatic handling flow of civil aviation event

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant