CN108132969A - Data quality big data governance implementation method, electronic device and storage medium - Google Patents


Info

Publication number
CN108132969A
Authority
CN
China
Prior art keywords
data
page data
preset rules
analysis
page
Legal status
Pending
Application number
CN201711252654.3A
Other languages
Chinese (zh)
Inventor
王永才
庞伟林
余永忠
陈轶斌
宋才华
林浩
范婷
徐培瑶
刘胜强
蓝源娟
Current Assignee
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Original Assignee
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date
2017-12-01
Filing date
2017-12-01
Publication date
2018-06-08
Application filed by Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority to CN201711252654.3A
Publication of CN108132969A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data governance implementation method for data quality, comprising the following steps: extracting page data according to a first preset rule and classifying the page data according to a second preset rule to form metadata for each category; performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface; and combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, which is sent to a server so that the server parses and stores it. The invention exposes data through API interfaces and can reconstruct data without depending on the original vendor, so that a data quality governance model can be implemented even when the database is closed, documentation and source code are missing, the development team is unavailable, or third-party COTS components are involved.

Description

Data quality big data governance implementation method, electronic device and storage medium
Technical field
The present invention relates to heterogeneous data processing technology, and in particular to a big data governance implementation method for data quality, an electronic device, and a storage medium.
Background
At present, in the field of data quality optimization, the mainstream industry approach is a centralized data quality management system. A traditional centralized data quality management system provides capabilities such as management of verification rule specifications, scheduling of rule execution, and unified management of data quality reports, which improves the efficiency and management level of data quality verification. Its limitation is that it relies on centralized storage in a traditional database, which easily becomes a performance bottleneck when processing massive amounts of data.
In data governance, data acquisition is an essential link. As data volumes keep growing, the challenges of data acquisition become increasingly prominent: data sources are diverse, data volumes are large and change quickly, and it is difficult to ensure the reliability and performance of acquisition, avoid duplicate data, and guarantee data quality. The mainstream acquisition method in current data quality systems is database export, and the common extraction tool is ETL (Extract-Transform-Load), which essentially works by exporting data from the database. ETL tools fall into two kinds: tools shipped by database vendors, such as Oracle Warehouse Builder and Oracle Data Integrator, and tools from third-party providers, such as Kettle; the open-source field also offers many ETL tools of varying functionality and capability. Other acquisition methods include the following. For Web applications, researchers from Stanford University and MIT proposed Webzeitgeist, which deploys a browser kernel on a proxy to render Web pages and then crawls the page data; this work was published at CHI, a top international conference. Packet capture technology intercepts, retransmits, edits, and dumps the data packets sent and received over the network; the objects it obtains are at the TCP/IP protocol layer, and the captured data is the session sequence between client and server rather than the semantics of the data objects themselves. In addition, there are conventional transmission methods such as WebService and direct database-to-database connections through middleware.
However, the existing technologies have the following defects:
Among the common technologies above, batch data import (represented by ETL) has several shortcomings. First, the database-export approach requires database access permissions, which are hard to negotiate with data owners or source-system developers; for vertically managed systems in particular, subordinate units cannot obtain database permissions at all. Second, technical staff must be very familiar with the source system's data flows, data dictionary, and similar details, which considerably lengthens the project implementation cycle. The WebService approach requires both the business system's developer and the data receiver's developer to build service interfaces before data can be exchanged, which takes a long time and is costly. Moreover, none of these common technologies supports business-level interaction and write-back, for example writing data from system A into system B, or writing data from systems A and B into system C.
Summary of the invention
To overcome the deficiencies of the prior art, a first object of the present invention is to provide a big data governance implementation method for data quality that solves the problem of data interaction between heterogeneous systems and enables business-level interaction.
A second object of the present invention is to provide an electronic device capable of achieving the first object.
A third object of the present invention is to provide a computer-readable storage medium capable of achieving the first object.
The first object of the present invention is achieved by the following technical solution:
A big data governance implementation method for data quality, comprising the following steps:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata for each category;
Code refactoring step: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
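The data transmission step can be illustrated with a minimal Python sketch that assembles the three parts into one structured file. The patent does not specify a file format, so JSON, the key names `rule_id`, `origin`, and `page_data`, and the helper name `build_data_file` are all assumptions made for illustration.

```python
import json


def build_data_file(page_data, rule_id, origin):
    """Combine captured page data, the triggering rule's identifier and the
    data-origin information into one JSON document (hypothetical format)."""
    document = {
        "rule_id": rule_id,      # identifier of the first preset rule that fired
        "origin": origin,        # data-origin info: system, page, capture time, ...
        "page_data": page_data,  # captured field name -> value pairs
    }
    # sort_keys gives a stable layout, which makes files easy to diff or dedupe
    return json.dumps(document, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    text = build_data_file(
        {"amount": "1200.00", "payee": "Zhang San"},
        "RULE-EXPENSE-SUBMIT",
        {"system": "finance", "page": "expense_claim", "captured_at": "2017-12-01T09:30:00Z"},
    )
    print(text)
```

Keeping the rule identifier and origin next to the payload lets the receiving server route and audit each file without consulting the source system.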
Further, the types of page data include interface screenshot snapshots, interface keywords, and interface data.
Further, the first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button.
Further, the types of button include an edit button, a submit button, and a delete button.
The second object of the present invention is achieved by the following technical solution:
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata for each category;
Code refactoring step: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
Further, the types of page data include interface screenshot snapshots, interface keywords, and interface data.
Further, the first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button.
The third object of the present invention is achieved by the following technical solution:
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of the above.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention exposes data through API interfaces and can reconstruct data without depending on the original vendor. It thus enables a data quality governance model under big data conditions, with real-time, accurate acquisition of Web application data, even when the database is closed, documentation and source code are missing, the development team is unavailable, or third-party COTS components are involved.
Brief description of the drawings
Fig. 1 is a flowchart of the big data governance implementation method for data quality according to the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments described below, or the technical features within them, may be combined arbitrarily to form new embodiments.
As shown in Fig. 1, the present invention provides a big data governance implementation method for data quality, comprising the following steps:
S1: extracting page data according to the first preset rule, and classifying the page data according to the second preset rule to form metadata for each category;
Before deployment, the data quality verification rules are first refined based on data quality management experience, the scope of data capture is sorted out, and suitable capture conditions, namely the first preset rule, are formulated. In the present invention, the first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button. For example, on an expense reimbursement page of a financial system, data is captured when the user is observed clicking a specific "Edit", "Submit", or "Delete" button. Data administrators then classify and organize the data obtainable from the page, form the metadata of the corresponding business system, and persist it in the database.
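The first and second preset rules described above can be sketched as two small functions. The event model (`UiEvent`), the trigger-button set, and the field-to-category mapping are hypothetical; the patent only states that capture fires on keyword input or on buttons such as edit, submit, and delete, and that captured data is classified into metadata.

```python
from dataclasses import dataclass, field

# Hypothetical trigger set taken from the buttons named in the embodiment.
TRIGGER_BUTTONS = {"edit", "submit", "delete"}


@dataclass
class UiEvent:
    kind: str                                        # "keyword" or "click"
    target: str                                      # keyword typed, or button clicked
    page_fields: dict = field(default_factory=dict)  # snapshot of page fields


def should_capture(event: UiEvent, keywords: set) -> bool:
    """First preset rule (sketch): fire on a watched keyword or trigger button."""
    if event.kind == "keyword":
        return event.target in keywords
    if event.kind == "click":
        return event.target in TRIGGER_BUTTONS
    return False


def classify(page_fields: dict, categories: dict) -> dict:
    """Second preset rule (sketch): group captured fields by metadata
    category; fields without a mapping go to 'uncategorized'."""
    metadata = {}
    for name, value in page_fields.items():
        category = categories.get(name, "uncategorized")
        metadata.setdefault(category, {})[name] = value
    return metadata


if __name__ == "__main__":
    event = UiEvent("click", "submit", {"amount": "1200.00", "payee": "Zhang San", "memo": "taxi"})
    if should_capture(event, keywords={"reimbursement"}):
        print(classify(event.page_fields, {"amount": "finance.amount", "payee": "finance.party"}))
```

Keeping the trigger check separate from classification mirrors the description: capture conditions are set before deployment, while the category mapping is maintained by data administrators.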
S2: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
The system extracts data from the presentation layer of the information system. Through machine learning on the system interface and data flows, it converts the system's data requests and displays into open, concise data service APIs. This method extracts system data efficiently and exposes it through data interfaces, and supports data acquisition and data-interface encapsulation for B/S (browser/server) application systems built on most development languages, middleware, and heterogeneous databases. Based on runtime application architecture reconstruction via reflection, the system uses program analysis techniques and user-action capture-and-replay to analyze the application's internal data-access logic and, without changing the original application's external behavior, refactors the system's GUI code to automatically generate service interfaces.
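The final stage of S2, turning observed page data into a service interface description, might look like the following sketch. It does not perform the source code, bytecode, screenshot, or TCP stream analysis the patent names; it assumes those analyses have already produced sample field values (as strings) for one page and only infers a simple type per field. The `/api/<page>` path, the `generate_interface` helper, and the output layout are hypothetical.

```python
import re


def infer_type(values):
    """Pick the narrowest type that fits every observed string value."""
    if all(re.fullmatch(r"-?\d+", v) for v in values):
        return "integer"
    if all(re.fullmatch(r"-?\d+(\.\d+)?", v) for v in values):
        return "number"
    return "string"


def generate_interface(page_name, samples):
    """Turn captured samples (a list of field -> value dicts) into a minimal
    read-only service description for one page (hypothetical format)."""
    fields = sorted({name for sample in samples for name in sample})
    properties = {}
    for name in fields:
        observed = [s[name] for s in samples if name in s]
        properties[name] = {
            "type": infer_type(observed),
            # a field absent from any sample is treated as optional
            "required": len(observed) == len(samples),
        }
    return {
        "path": f"/api/{page_name}",
        "method": "GET",
        "response": {"type": "array", "items": properties},
    }


if __name__ == "__main__":
    captured = [
        {"amount": "1200.00", "count": "3", "payee": "Zhang San"},
        {"amount": "15", "count": "1"},
    ]
    print(generate_interface("expense_claim", captured))
```

Inferring types from observations rather than from a database schema reflects the patent's premise that the source system's database and documentation may be unavailable.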
S3: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
When a user performs an operation within the data-capture scope on a designated monitored page, such as clicking the "Submit" button, the system automatically triggers the capture operation and, according to the preset rules, records and converts the designated input data on the page. Each page produces one structured data file; after the system adds the corresponding rule identifier and data-origin information to the file, the file is automatically transferred over a network protocol to the data collection server. After receiving the data file, the data collection server parses its structure and saves it to the database, where it awaits extraction and verification by the data quality management system.
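On the server side, the parse-and-save step could be sketched with Python's built-in `json` and `sqlite3` modules. The JSON layout (`rule_id`, `origin`, `page_data`), the single `captured_field` table, and the helper names are assumptions for illustration; the patent does not describe the file format, the transport protocol, or the database schema.

```python
import json
import sqlite3


def init_store(conn):
    """Create a hypothetical table holding one row per captured field."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS captured_field (
               rule_id     TEXT NOT NULL,
               system      TEXT,
               page        TEXT,
               field_name  TEXT NOT NULL,
               field_value TEXT
           )"""
    )


def ingest(conn, file_text):
    """Parse one structured data file and store its fields.
    Returns the number of rows written."""
    doc = json.loads(file_text)
    origin = doc.get("origin", {})
    rows = [
        (doc["rule_id"], origin.get("system"), origin.get("page"), name, value)
        for name, value in doc["page_data"].items()
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO captured_field VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_store(db)
    payload = {"rule_id": "R1", "origin": {"system": "finance", "page": "claim"},
               "page_data": {"amount": "1200.00", "payee": "Zhang San"}}
    print(ingest(db, json.dumps(payload)))
```

Writing each file inside one transaction means a malformed file leaves no partial rows for the downstream verification step to pick up.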
Distributed storage is used. After the data quality big data governance platform defines the metadata of each business system, relevant data verification rules are formulated according to business needs. Once real-time data from the business systems has been received on the heterogeneous data receiving platform, the data is checked and verified, issues are fed back promptly, and the results are archived for later statistical analysis and report generation. Because data governance spans multiple business systems with heavy routine workloads and data volumes in the hundreds of millions of records, storing and querying problem data in a traditional centralized data quality system would inevitably create a performance bottleneck; a solution based on the Hadoop distributed processing framework is therefore proposed. With a Hadoop cluster, defect data can be separated from Oracle and distributed across multiple servers in the cluster, effectively improving disk I/O performance and data analysis performance.
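The verification and distributed-storage ideas in this paragraph can be sketched as follows. The rules in `RULES` are hypothetical examples of business verification rules, and `partition` only mimics spreading defect records across servers with a stable hash; it is not Hadoop code and does not use any Hadoop API.

```python
import zlib
from collections import Counter

# Hypothetical verification rules: rule name -> predicate on one record.
# Records are assumed to be dicts of string field values.
RULES = {
    "amount_present": lambda r: bool(r.get("amount")),
    "amount_numeric": lambda r: r.get("amount", "").replace(".", "", 1).isdigit(),
    "payee_present": lambda r: bool(r.get("payee")),
}


def verify(records, rules=RULES):
    """Return one defect entry per (record index, failed rule)."""
    defects = []
    for idx, record in enumerate(records):
        for name, check in rules.items():
            if not check(record):
                defects.append({"record": idx, "rule": name})
    return defects


def summarize(defects):
    """Defect counts per rule, e.g. for periodic statistics and reports."""
    return Counter(d["rule"] for d in defects)


def partition(defects, num_nodes):
    """Assign each defect to a storage node by a stable CRC32 hash of its
    record id, so all defects of one record land on the same node."""
    buckets = [[] for _ in range(num_nodes)]
    for d in defects:
        key = str(d["record"]).encode()
        buckets[zlib.crc32(key) % num_nodes].append(d)
    return buckets


if __name__ == "__main__":
    batch = [{"amount": "12.50", "payee": "A"}, {"amount": "abc", "payee": ""}, {}]
    found = verify(batch)
    print(summarize(found))
    print([len(b) for b in partition(found, 3)])
```

A stable hash (CRC32 rather than Python's per-process salted `hash`) keeps the record-to-node assignment identical across runs, which matters when archived defects are queried later.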
In another aspect, the present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata for each category;
Code refactoring step: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
The types of page data include interface screenshot snapshots, interface keywords, and interface data. The first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button. The types of button include an edit button, a submit button, and a delete button.
A computer-readable storage medium is also provided, storing a computer program which, when executed by a processor, implements the big data governance implementation method for data quality.
The above embodiments are only preferred embodiments of the present invention and shall not limit its scope of protection; any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention fall within the claimed scope of the present invention.

Claims (9)

1. A big data governance implementation method for data quality, characterized by comprising the following steps:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata for each category;
Code refactoring step: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
2. The big data governance implementation method for data quality according to claim 1, characterized in that the types of page data include interface screenshot snapshots, interface keywords, and interface data.
3. The big data governance implementation method for data quality according to claim 1, characterized in that the first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button.
4. The big data governance implementation method for data quality according to claim 3, characterized in that the types of button include an edit button, a submit button, and a delete button.
5. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata for each category;
Code refactoring step: performing code refactoring on the page data based on a program analysis technique to automatically generate a service interface, the program analysis technique comprising one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data-origin information of the page data into a data file, and sending the data file to a server so that the server parses and stores the data file.
6. The electronic device according to claim 5, characterized in that the types of page data include interface screenshot snapshots, interface keywords, and interface data.
7. The electronic device according to claim 5, characterized in that the first preset rule is: extracting the page data immediately upon detecting that the user enters a keyword or triggers a button.
8. The electronic device according to claim 7, characterized in that the types of button include an edit button, a submit button, and a delete button.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
CN201711252654.3A 2017-12-01 2017-12-01 Data quality big data governance implementation method, electronic device and storage medium Pending CN108132969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711252654.3A CN108132969A (en) 2017-12-01 2017-12-01 Data quality big data governance implementation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711252654.3A CN108132969A (en) 2017-12-01 2017-12-01 Data quality big data governance implementation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN108132969A true CN108132969A (en) 2018-06-08

Family

ID=62389995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711252654.3A Pending CN108132969A (en) 2017-12-01 2017-12-01 Data quality big data governance implementation method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN108132969A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145163A * 2007-10-30 2008-03-19 金蝶软件(中国)有限公司 Method and system for obtaining data from a plurality of data pools
CN104346681A * 2013-08-08 2015-02-11 中国科学院计算机网络信息中心 Method for actively acquiring data from heterogeneous enterprise information systems
CN104882040A * 2015-05-15 2015-09-02 陈爱秋 Intelligent system applied in Chinese language teaching
CN106547749A * 2015-09-16 2017-03-29 北京国双科技有限公司 Method and apparatus for collecting webpage data


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271949A * 2018-09-28 2019-01-25 中国科学院长春光学精密机械与物理研究所 Multispectral image data extraction method, apparatus, device and readable storage medium
CN110188135A * 2019-05-30 2019-08-30 中国联合网络通信集团有限公司 File generation method and device
CN110188135B * 2019-05-30 2021-07-27 中国联合网络通信集团有限公司 File generation method and device
CN110263229A * 2019-06-27 2019-09-20 北京中油瑞飞信息技术有限责任公司 Data governance method and device based on a data lake


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20180608)