CN110059236A - Data acquisition and processing method for power planning data collection using web crawler technology - Google Patents
Data acquisition and processing method for power planning data collection using web crawler technology
- Publication number
- CN110059236A
- Application number
- CN201910235907.9A
- Authority
- CN
- China
- Prior art keywords
- data
- acquisition
- acquired
- mode
- website
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02E—REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
- Y02E40/00—Technologies for an efficient electrical power generation, transmission or distribution
- Y02E40/70—Smart grids as climate change mitigation technology in the energy generation sector
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a data acquisition and processing method that applies web crawler technology to data collection for power planning. Replacing manual data collection with a network data acquisition robot improves the efficiency of the collection work; in addition, the data are screened and processed by specific algorithms, which improves data processing efficiency.
Description
Technical field
The present invention relates to a data acquisition and processing method, and in particular to a data acquisition and processing method that applies web crawler technology to data collection for power planning.
Background art
Before power planning, data must be collected so that the current state of the grid can be diagnosed and analyzed. With the spread of information technology, the daily production and management of grid companies have been computerized, but power planning never has been. The reason is that the data needed for power planning are scattered across separate information systems: main network equipment ledgers are in PMS, distribution network equipment ledgers are in the distribution PMS system, production data such as equipment load rates are in IDP600, and marketing data such as electricity consumption and users are in the electricity consumption acquisition system. Interconnecting the data of these systems is difficult. As a result, data collection for power planning is still done by manually gathering data from each system and then processing it with office software such as Excel. The whole process is manual, lacks effective information technology support, and is inefficient.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a data acquisition and processing method that applies web crawler technology to data collection for power planning. In this method a network data acquisition robot replaces manual data collection, which improves the efficiency of the collection work, and the data are screened and processed by specific algorithms, which improves data processing efficiency.
A data acquisition and processing method that applies web crawler technology to data collection for power planning comprises the following steps:
(1) First, determine the acquisition mode according to the type of data to be acquired and its source website. The acquisition modes include the following: equipment data for the grid at 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as 35 kV-and-above grid equipment and 10 kV line load and load rate come from IDP600 and are acquired with the BS4 library; data such as equipment data, load and load rate for distribution transformers at 10 kV and below come from the electricity consumption acquisition system and are acquired with the PyAmf library.
(2) After the acquisition modes are determined, write a separate acquisition module for each mode:
1. After importing the corresponding library, investigate the data structure and retrieval method of the website to be acquired. Once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received.
2. Once the data structure and retrieval method of the website are understood, simulate logging in to the website and, after a successful login, record the cookie information, which subsequent messages need.
3. Simulate a browser to send a data request (together with the cookie information), then parse and check the data received. If the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be redone.
4. After the expected data are received, update the stored cookie information with the cookie information in the received data.
5. Check whether all data have been received; if not, repeat steps 3 and 4 until all data to be acquired have been received.
(3) Call the corresponding acquisition module for each kind of data to be acquired. After the data are collected, clean them, where cleaning means converting the data into the required format. After cleaning, check the data for anomalies. If anomalies exist, flag the data; the flagged data are checked and corrected manually, and the corrected data are added to the database. If there are no anomalies, store the cleaned data in the database directly.
(4) Process the data: create an association database and associate the data obtained from the different data sources.
(5) Apply the data: use the data as needed for diagnostic analysis and visual presentation.
In step (2), there are two methods of investigation. The first is to inspect the page source and parse the messages sent and received using the debugging mode of the Google Chrome browser. The second is to parse the messages with dedicated packet analysis software (the author uses Charles). Method one is the primary method; method two is used when method one cannot parse the messages.
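Both Chrome's debugging tools and Charles can export captured traffic as a HAR file (a JSON format). As a minimal sketch of how such a capture could support the investigation, the function below lists each request's method and URL and whether it carried a cookie header. The function name, the sample URLs and the sample capture are hypothetical; the patent does not say that HAR exports were used.

```python
import json


def summarize_har(har_text):
    """Return (method, url, sends_cookie) for each request in a HAR export."""
    har = json.loads(har_text)
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        # Header names are case-insensitive, so compare in lower case.
        header_names = {h["name"].lower() for h in request.get("headers", [])}
        rows.append((request["method"], request["url"], "cookie" in header_names))
    return rows


# Hypothetical two-request capture; a real one would be read from an exported file.
sample = json.dumps({"log": {"entries": [
    {"request": {"method": "POST", "url": "http://example.invalid/login",
                 "headers": []}},
    {"request": {"method": "GET", "url": "http://example.invalid/data?page=1",
                 "headers": [{"name": "Cookie", "value": "SESSION=abc"}]}},
]}})

for row in summarize_har(sample):
    print(row)
```

A listing like this shows which request performs the login and which later requests depend on the cookie it sets.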
In summary, compared with the prior art, the present invention has the following advantages:
The method of this application makes full use of mature web crawler technology to acquire automatically, from each system, all the kinds of data needed for power planning, and makes full use of automation technology to process those data. Compared with the prior art it has the following advantages:
First, it improves efficiency and reduces manual effort. With the method and techniques of this application, data collection and processing can be fully automated with no manual intervention. According to estimates, efficiency can be improved by 96% or more: the data collection and preparation work that used to take a team of 6 to 8 people 5 to 6 working days can be completed within 1 working day using web crawler and automation technology.
Second, it makes power grid diagnosis routine. Under the prior art, grid diagnosis is generally carried out only once a year because of its heavy workload. With the method of this application, because efficiency is greatly improved and no manual intervention is needed, grid diagnosis can be carried out routinely as needed, so that new problems arising in grid development are found promptly and appropriate measures can be taken in time.
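The figure of 96% or more quoted for the efficiency gain is consistent with a simple person-day count. The sketch below is only a plausibility check: it assumes the automated run is counted as one person-day, which the text does not state, and the function name is hypothetical.

```python
def reduction(before_person_days, after_person_days):
    """Fractional reduction in effort."""
    return 1 - after_person_days / before_person_days


low = 6 * 5    # smallest manual effort: 6 people x 5 working days
high = 8 * 6   # largest manual effort: 8 people x 6 working days
after = 1      # assumption: the automated run counts as one person-day

print(f"{reduction(low, after):.1%} to {reduction(high, after):.1%}")
# prints: 96.7% to 97.9%
```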
Brief description of the drawings
Fig. 1 is a workflow diagram of an embodiment of the invention.
Fig. 2 is a workflow diagram of the acquisition module of the invention.
Detailed description of the embodiments
The invention is described in more detail below with reference to an embodiment.
Embodiment 1
First, the acquisition mode is determined according to the type of data to be acquired and its source website. The acquisition modes include the following: equipment data for the grid at 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as 35 kV-and-above grid equipment and 10 kV line load and load rate come from IDP600 and are acquired with the BS4 library; data such as equipment data, load and load rate for distribution transformers at 10 kV and below come from the electricity consumption acquisition system and are acquired with the PyAmf library.
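The mapping from data type to source system and library can be written as a lookup table. This is a minimal sketch: the source systems and library names are those given in the text, while the class, the function and the key names are hypothetical labels reflecting one reading of the three data categories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcquisitionMode:
    source_system: str
    library: str


# Keys are hypothetical labels; systems and libraries are those named in the text.
ACQUISITION_MODES = {
    "grid_equipment_35kv_and_above": AcquisitionMode("PMS2.0", "Selenium"),
    "line_load_and_load_rate_10kv": AcquisitionMode("IDP600", "BS4"),
    "distribution_transformer_10kv_and_below": AcquisitionMode(
        "electricity consumption acquisition system", "PyAmf"),
}


def choose_mode(data_type):
    """Look up the acquisition mode for a data type, failing loudly if unknown."""
    try:
        return ACQUISITION_MODES[data_type]
    except KeyError:
        raise ValueError(f"no acquisition mode defined for {data_type!r}") from None


print(choose_mode("line_load_and_load_rate_10kv"))
```

Failing on an unknown data type, rather than falling back to a default, keeps a new data category from being collected with the wrong module.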
After the acquisition modes are determined, a separate acquisition module is written for each mode:
1. After importing the corresponding library, investigate the data structure and retrieval method of the website to be acquired. Once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received. There are two methods of investigation. The first is to inspect the page source and parse the messages sent and received using the debugging mode of the Google Chrome browser. The second is to parse the messages with dedicated packet analysis software (the author uses Charles). Method one is the primary method; method two is used when method one cannot parse the messages.
2. Once the data structure and retrieval method of the website are understood, simulate logging in to the website and, after a successful login, record the cookie information, which subsequent messages need.
3. Simulate a browser to send a data request (together with the cookie information), then parse and check the data received. If the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be redone.
4. After the expected data are received, update the stored cookie information with the cookie information in the received data.
5. Check whether all data have been received; if not, repeat steps 3 and 4 until all data to be acquired have been received.
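The request loop of the acquisition module (send a request with the stored cookie information, check the response, refresh the cookies, repeat until everything has been received) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `fetch` is a hypothetical transport callable that in practice could wrap whichever library the acquisition mode uses, and the page-by-page structure and the `max_pages` guard are assumptions.

```python
def collect_all(fetch, login_cookies, is_expected, max_pages=1000):
    """Request pages with the current cookies until no data remain.

    fetch(page, cookies) is a hypothetical callable returning
    (records, new_cookies, has_more).
    """
    cookies = dict(login_cookies)      # cookie information recorded at login
    collected = []
    for page in range(1, max_pages + 1):
        records, new_cookies, has_more = fetch(page, cookies)
        if not records or not is_expected(records):
            # Unexpected or missing data: the site investigation must be redone.
            raise RuntimeError(f"unexpected or missing data on page {page}")
        cookies.update(new_cookies)    # refresh the stored cookie information
        collected.extend(records)
        if not has_more:               # all data received
            return collected, cookies
    raise RuntimeError("max_pages reached before all data were received")


def fake_fetch(page, cookies):
    """Stand-in for a real website: two pages of one record each."""
    data = {1: [{"id": 1}], 2: [{"id": 2}]}
    return data[page], {"SESSION": f"token-{page}"}, page < 2


records, cookies = collect_all(
    fake_fetch, {"SESSION": "token-0"},
    is_expected=lambda rs: all("id" in r for r in rs))
print(records, cookies)
```

Passing the transport in as a callable lets the same loop be exercised without network access, as the stand-in `fake_fetch` shows.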
The corresponding acquisition module is called for each kind of data to be acquired. After the data are collected, they are cleaned (cleaning means converting the data into the required format). After cleaning, the data are checked for anomalies. If anomalies exist, the data are flagged; the flagged data are checked and corrected manually, and the corrected data are added to the database. If there are no anomalies, the cleaned data are stored in the database directly.
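The cleaning and anomaly check can be sketched as below. The field names, the percent-string input format and the 0 to 150% plausibility range are all assumptions chosen for illustration; the text does not specify the record layout or the anomaly rule.

```python
def clean(raw):
    """Convert one raw record (all strings) into the required format."""
    return {
        "line": raw["line"].strip(),
        "load_rate_pct": float(raw["load_rate"].strip().rstrip("%")),
    }


def is_abnormal(record):
    # Assumed rule: a load rate outside 0-150 % is treated as suspicious.
    return not 0.0 <= record["load_rate_pct"] <= 150.0


def triage(raw_records):
    """Split cleaned records into those to store and those flagged for manual review."""
    stored, flagged = [], []
    for raw in raw_records:
        record = clean(raw)
        (flagged if is_abnormal(record) else stored).append(record)
    return stored, flagged


stored, flagged = triage([
    {"line": " Line A ", "load_rate": "63.5%"},
    {"line": "Line B", "load_rate": "-4%"},
])
print(stored)   # records that would go straight to the database
print(flagged)  # records that would be checked and corrected manually first
```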
The data are processed: an association database is created and the data obtained from the different data sources are associated.
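One way to associate data from different sources is to load each source into its own table and join on a shared key. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names, column names, sample values and the assumption that both sources share a `line_id` key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equipment (line_id TEXT PRIMARY KEY, name TEXT, rated_mva REAL);
    CREATE TABLE load_stats (line_id TEXT, max_load_mva REAL);
""")
# One hypothetical row per source system.
conn.execute("INSERT INTO equipment VALUES ('L001', 'Line A', 10.0)")
conn.execute("INSERT INTO load_stats VALUES ('L001', 6.5)")

# Associate the two sources on the shared key and derive a load rate.
rows = conn.execute("""
    SELECT e.name, s.max_load_mva / e.rated_mva AS load_rate
    FROM equipment AS e
    JOIN load_stats AS s ON s.line_id = e.line_id
""").fetchall()
print(rows)
```

In practice the hard part is usually the key itself, since separate systems may name the same equipment differently; the sketch assumes that mapping has already been resolved.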
Data application: the data are used as needed for diagnostic analysis, visual presentation and so on.
Parts not described in this embodiment are the same as in the prior art.
Claims (2)
1. A data acquisition and processing method that applies web crawler technology to data collection for power planning, characterized in that the steps are as follows:
(1) first, determine the acquisition mode according to the type of data to be acquired and its source website, the acquisition modes including the following: equipment data for the grid at 35 kV and above come from PMS2.0 and are acquired with the Selenium library; data such as 35 kV-and-above grid equipment and 10 kV line load and load rate come from IDP600 and are acquired with the BS4 library; data such as equipment data, load and load rate for distribution transformers at 10 kV and below come from the electricity consumption acquisition system and are acquired with the PyAmf library;
(2) after the acquisition modes are determined, write a separate acquisition module for each mode:
1. after importing the corresponding library, investigate the data structure and retrieval method of the website to be acquired; once these are known, the module can simulate a browser to send the corresponding messages and parse the messages received;
2. once the data structure and retrieval method of the website are understood, simulate logging in to the website and, after a successful login, record the cookie information, which subsequent messages need;
3. simulate a browser to send a data request (together with the cookie information), then parse and check the data received; if the data do not match what was expected, or no data are received, the investigation of the website's data structure and retrieval method was wrong and must be redone;
4. after the expected data are received, update the stored cookie information with the cookie information in the received data;
5. check whether all data have been received; if not, repeat steps 3 and 4 until all data to be acquired have been received;
(3) call the corresponding acquisition module for each kind of data to be acquired; after the data are collected, clean them, where cleaning means converting the data into the required format; after cleaning, check the data for anomalies; if anomalies exist, flag the data, the flagged data being checked and corrected manually and the corrected data being added to the database; if there are no anomalies, store the cleaned data in the database directly;
(4) process the data: create an association database and associate the data obtained from the different data sources;
(5) apply the data: use the data as needed for diagnostic analysis and visual presentation.
2. The data acquisition and processing method that applies web crawler technology to data collection for power planning according to claim 1, characterized in that in step (2) there are two methods of investigation: the first is to inspect the page source and parse the messages sent and received using the debugging mode of the Google Chrome browser; the second is to parse the messages with dedicated packet analysis software (the author uses Charles); method one is the primary method, and method two is used when method one cannot parse the messages.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910235907.9A CN110059236B (en) | 2019-03-27 | 2019-03-27 | Data acquisition and processing method for power planning and collecting by using web crawler technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910235907.9A CN110059236B (en) | 2019-03-27 | 2019-03-27 | Data acquisition and processing method for power planning and collecting by using web crawler technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110059236A (en) | 2019-07-26 |
CN110059236B (en) | 2023-05-05 |
Family
ID=67315934
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910235907.9A Active CN110059236B (en) | 2019-03-27 | 2019-03-27 | Data acquisition and processing method for power planning and collecting by using web crawler technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110059236B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112697946A (en) * | 2021-03-23 | 2021-04-23 | 广东电网有限责任公司佛山供电局 | Main transformer on-line oil chromatography monitoring method and device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009042113A2 (en) * | 2007-09-21 | 2009-04-02 | Unlimited Cad Services, Llc | Document acquisition & authentication system |
US20110191664A1 (en) * | 2010-02-04 | 2011-08-04 | At&T Intellectual Property I, L.P. | Systems for and methods for detecting url web tracking and consumer opt-out cookies |
CN102289447A (en) * | 2011-06-16 | 2011-12-21 | 北京亿赞普网络技术有限公司 | Website webpage evaluation system based on communication network message |
CN104539053A (en) * | 2014-12-31 | 2015-04-22 | 国家电网公司 | Power dispatching automation polling robot and method based on reptile technology |
CN104881424A (en) * | 2015-03-13 | 2015-09-02 | 国家电网公司 | Regular expression-based acquisition, storage and analysis method of power big data |
CN105139281A (en) * | 2015-08-20 | 2015-12-09 | 北京中电普华信息技术有限公司 | Method and system for processing big data of electric power marketing |
WO2017051420A1 (en) * | 2015-09-21 | 2017-03-30 | Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. | Advanced computer implementation for crawling and/or detecting related electronically catalogued data using improved metadata processing |
CN107679076A (en) * | 2017-08-28 | 2018-02-09 | 国网上海市电力公司 | A kind of acquisition analysis system of electric power data |
-
2019
- 2019-03-27 CN CN201910235907.9A patent/CN110059236B/en active Active
Non-Patent Citations (3)
Title |
---|
GAO, KAI等: "Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler" * |
李鑫欣等: "基于Python的豆瓣读书网站用户信息采集" * |
汤艳君等: "暗网案件的爬虫取证技术研究" * |
Also Published As
Publication number | Publication date |
---|---|
CN110059236B (en) | 2023-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110674189B (en) | Method for monitoring secondary state and positioning fault of intelligent substation | |
CN108971807B (en) | Intelligent management control method and management system for field welding construction process | |
CN101262682A (en) | A configuration management for monitoring batch management of base station devices | |
CN102692558A (en) | Monitoring and analyzing system for electricity data and realization method thereof | |
CN111444169A (en) | Transformer substation electrical equipment state monitoring and diagnosis system and method | |
CN107479540A (en) | Method for diagnosing faults and system | |
CN106789251A (en) | Net silver running state monitoring system and method | |
CN110688389A (en) | Transformer substation secondary equipment defect cloud management system | |
CN112348521A (en) | Intelligent risk quality inspection method and system based on business audit and electronic equipment | |
CN107643956A (en) | The method and apparatus for positioning the abnormal origin of abnormal data | |
CN111400505A (en) | Method and system for matching fault elimination scheme of power consumption information acquisition system | |
CN111800299A (en) | Operation maintenance system and method of edge cloud | |
CN105574678A (en) | Employee performance assessment data automation system based on executive force indexes | |
CN110059236A (en) | A kind of application network crawler technology carries out the data sampling and processing method of power planning receipts money | |
CN113778064A (en) | Intelligent device remote detection and diagnosis system | |
CN111611665B (en) | Intelligent substation design method based on three-dimensional modular design | |
CN111768113A (en) | Public cloud-based hydraulic engineering management system and method | |
CN111652500A (en) | Automatic duty-on auxiliary system and equipment based on scheduling and report generation method thereof | |
CN107979174B (en) | Workflow operation method based on power grid operation management system | |
CN110334001A (en) | A kind of method and apparatus that batch automatically generates echo test | |
CN205883276U (en) | With on --spot failure diagnosis system of electric information collection fortune dimension | |
CN109592525A (en) | Elevator frequency converter fault diagnosis system and method | |
CN211928020U (en) | Electric energy acquisition data analysis early warning device | |
CN110298585B (en) | Hierarchical automatic auditing method for monitoring information of substation equipment | |
CN112686583A (en) | Method and system for generating automatic handling flow of civil aviation event |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||