CN108132969A - Data quality big data governance implementation method, electronic device and storage medium - Google Patents
- Publication number
- CN108132969A (application CN201711252654.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- page data
- preset rules
- analysis
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/72—Code refactoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data quality big data governance implementation method, comprising the following steps: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category; performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface; and combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, which is sent to a server so that the server parses and saves the data file. The invention exposes data through API interfaces and can reconstruct data without relying on the original vendor. It can implement a data quality governance model even when the database is closed, documentation or source code is missing, the development team is no longer available, or third-party COTS components are involved.
Description
Technical field
The present invention relates to heterogeneous data processing technology, and in particular to a data quality big data governance implementation method, an electronic device, and a storage medium.
Background art
At present, the industry mainstream in data quality optimization is the centralized data quality management system. A traditional centralized data quality management system provides capabilities such as management of verification rule specifications, scheduling of rule execution, and unified management of data quality reports, improving the efficiency and management level of data quality verification. Its limitation is that it relies on centralized storage in a traditional database, which is prone to performance bottlenecks when handling massive data.
In data governance, data acquisition is an essential step. As data volumes grow, its challenges become especially prominent: data sources are diverse, data volumes are large, and data changes quickly, so ensuring the reliability and performance of acquisition, avoiding duplicate data, and ensuring data quality are all difficult. The mainstream acquisition method in current data quality systems is database export. The common data extraction tools are ETL (Extract-Transform-Load) tools, which in essence work by exporting from and importing into databases. ETL tools fall into two kinds: those supplied by database vendors, such as Oracle Warehouse Builder and Oracle Data Integrator, and those from third-party providers, such as Kettle. There are also many open-source ETL tools, which vary in function and capability.
Other acquisition methods include the following. For Web applications, scholars at Stanford University and MIT proposed Webzeitgeist, which deploys a browser engine on a proxy to render Web pages and then captures page data with a crawler; this work was published at CHI, a top international conference. Packet capture intercepts, retransmits, edits, and dumps the packets sent and received over the network; the data it obtains is at the TCP/IP protocol layer, so what is captured is the session sequence between client and server, without direct attention to the semantics of the data objects. In addition, there are conventional data transfer methods such as WebService and direct connection through an intermediate database.
However, the existing technology has the following defects:
Among the common techniques above, bulk data import (represented by ETL) has the drawback that it first requires database permissions. This is difficult to arrange with either the data owner or the vendor of the source system; for vertically managed systems in particular, subordinate units cannot obtain database permissions at all. It also requires technical staff to be very familiar with the source system's database workflow, data dictionary, and so on, which significantly affects the project implementation cycle. The WebService approach requires the vendors of both the business system and the data receiver to develop service interfaces before data can be exchanged, so the engineering takes a long time and the construction cost is high. Moreover, none of these techniques supports business-level interaction and write-back, such as writing data from system A into system B, or writing data from systems A and B into system C.
Summary of the invention
To overcome the deficiencies of the prior art, the first object of the present invention is to provide a data quality big data governance implementation method that solves the problem of data interaction between heterogeneous systems and enables business-level interaction.
The second object of the present invention is to provide an electronic device that achieves the first object of the invention.
The third object of the present invention is to provide a computer-readable storage medium that achieves the first object of the invention.
The first object of the present invention is achieved by the following technical solution:
A data quality big data governance implementation method, comprising the following steps:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category;
Code refactoring step: performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and sending the data file to a server so that the server parses and saves it.
Further, the types of page data include interface screenshot snapshots, interface keywords, and interface data.
Further, the first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button.
Further, the types of button include an edit button, a submit button, and a delete button.
The second object of the present invention is achieved by the following technical solution:
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category;
Code refactoring step: performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and sending the data file to a server so that the server parses and saves it.
Further, the types of page data include interface screenshot snapshots, interface keywords, and interface data.
Further, the first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button.
The third object of the present invention is achieved by the following technical solution:
A computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method described in any one of the above.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The invention exposes data through API interfaces and can reconstruct data without relying on the original vendor. Even when the database is closed, documentation or source code is missing, the development team is no longer available, or third-party COTS components are involved, it implements a data quality governance model in a big data setting based on real-time, accurate data acquisition from Web applications.
Brief description of the drawings
Fig. 1 is a flowchart of the data quality big data governance implementation method of the present invention.
Detailed description
The present invention is further described below with reference to the drawing and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined in any way to form new embodiments.
As shown in Fig. 1, the present invention provides a data quality big data governance implementation method, comprising the following steps:
S1: Extract page data according to a first preset rule, and classify the page data according to a second preset rule to form metadata corresponding to each category.
Before deployment, the data quality assessment rules are first refined with reference to data quality management experience, the scope of data capture is worked out, and suitable capture conditions, that is, the first preset rule, are formulated. In the present invention, the first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button. For example, on the expense reimbursement application page of a financial system, data is captured when the user is observed clicking a specific "Edit", "Submit", or "Delete" button. Data administrators can then sort the data obtainable from the page into categories, form the metadata of the corresponding business system, and persist it in the database.
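The capture condition and the classification in step S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PageEvent` type, the `TRIGGER_BUTTONS` set, the `FIELD_CATEGORIES` mapping, and the field names are all hypothetical, since the patent does not specify data structures or the form of the second preset rule.

```python
from dataclasses import dataclass

# Hypothetical: buttons whose clicks trigger capture (the text names Edit, Submit, Delete).
TRIGGER_BUTTONS = {"edit", "submit", "delete"}

@dataclass
class PageEvent:
    kind: str        # "keyword_input" or "button_click" (assumed event kinds)
    target: str      # button name or input field name
    page_data: dict  # field name -> value visible on the page

def should_capture(event: PageEvent) -> bool:
    """First preset rule: capture on keyword input or on a monitored button click."""
    if event.kind == "keyword_input":
        return True
    return event.kind == "button_click" and event.target.lower() in TRIGGER_BUTTONS

# Assumed form of the second preset rule: a fixed field-to-category mapping.
FIELD_CATEGORIES = {
    "applicant": "person",
    "department": "organization",
    "amount": "finance",
    "invoice_no": "finance",
}

def classify(page_data: dict) -> dict:
    """Group captured field names by category to form per-category metadata."""
    metadata = {}
    for name in page_data:
        category = FIELD_CATEGORIES.get(name, "uncategorized")
        metadata.setdefault(category, []).append(name)
    return metadata
```

In this sketch a click on "Submit" passes `should_capture`, while a click on an unmonitored button such as "Cancel" does not; fields missing from the mapping fall into an "uncategorized" group so that an administrator could review them later.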
S2: Perform code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis.
The system extracts data from the presentation layer of an information system. Through machine learning on the system's interfaces and data flows, it converts the system's data requests and displays into open, concise data service APIs. This approach extracts system data efficiently and exposes it externally as data interfaces, supporting data acquisition and data interface encapsulation for B/S (browser/server) architecture application systems that are heterogeneous in development language, middleware, and database. Based on reflection-based runtime application architecture reconstruction, and using program analysis and user action capture-and-replay techniques, the system analyzes the application's internal data access logic and, without changing the external behavior of the original application, refactors the system's GUI code to automatically generate service interfaces.
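The patent does not give an algorithm for deriving a service interface. As one loose illustration of the TCP stream analysis path only, the sketch below merges observed HTTP request and response samples into endpoint descriptors. It is a simplification under stated assumptions: the sample tuple format, the `infer_interface` function, and the descriptor layout are hypothetical, and no machine learning or GUI code refactoring is attempted.

```python
import json
from urllib.parse import urlsplit, parse_qsl

def infer_type(value):
    """Guess a simple type name for an observed JSON value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"

def infer_interface(samples):
    """Merge (method, url, response_body) samples into one descriptor per endpoint."""
    endpoints = {}
    for method, url, body in samples:
        parts = urlsplit(url)
        key = (method.upper(), parts.path)
        entry = endpoints.setdefault(key, {"params": set(), "fields": {}})
        # Query parameter names seen on any request to this endpoint.
        entry["params"].update(name for name, _ in parse_qsl(parts.query))
        payload = json.loads(body)
        if isinstance(payload, dict):
            for name, value in payload.items():
                entry["fields"][name] = infer_type(value)
    return {
        f"{method} {path}": {"params": sorted(e["params"]), "fields": e["fields"]}
        for (method, path), e in endpoints.items()
    }
```

Each descriptor lists the query parameters and response fields seen for one method and path, which is roughly the information a generated data service API would need to document. A real system would also have to handle request bodies, authentication, and non-JSON responses.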
S3: Combine the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and send the data file to a server so that the server parses and saves it.
When the user performs an operation within the data capture scope on a monitored page, such as clicking the "Submit" button, the system automatically triggers a data capture operation and, according to the rules set in advance, records and converts the specified input data on the page. Each page produces one structured data file. After the system adds the identifier of the corresponding rule and the data source information to the file, the file is automatically transferred over a network protocol to the data collection server. After receiving the data file, the data collection server parses its structure and saves it to the database, where it remains available for the data quality management system to extract and verify.
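A minimal sketch of step S3, assuming the structured data file is JSON and the server stores it in a relational table. The patent specifies neither the file format nor the schema, so the field names (`rule_id`, `source`, `captured_at`), the `captured` table, and the use of SQLite here are illustrative assumptions; network transfer is omitted.

```python
import json
import sqlite3
from datetime import datetime, timezone

def build_data_file(page_data, rule_id, source):
    """Client side: combine page data, rule identifier, and source info into one JSON document."""
    return json.dumps({
        "rule_id": rule_id,
        "source": source,  # assumed shape: {"system": ..., "page": ...}
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "page_data": page_data,
    }, ensure_ascii=False)

def save_data_file(conn, raw):
    """Server side: parse the structured file and store it for later verification."""
    doc = json.loads(raw)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS captured "
        "(rule_id TEXT, source_system TEXT, source_page TEXT, captured_at TEXT, payload TEXT)"
    )
    conn.execute(
        "INSERT INTO captured VALUES (?, ?, ?, ?, ?)",
        (doc["rule_id"], doc["source"]["system"], doc["source"]["page"],
         doc["captured_at"], json.dumps(doc["page_data"], ensure_ascii=False)),
    )
    conn.commit()
    return doc
```

Keeping the rule identifier and source information in dedicated columns lets a downstream data quality system select records by rule or by originating system without parsing every payload.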
Storage is managed in a distributed manner. After the data quality big data governance platform defines the metadata of each business system, data verification rules are formulated according to business requirements. Once the heterogeneous data receiving platform obtains real-time data from the business systems, the data is checked and verified, problems are fed back promptly, and the results are archived for later statistical analysis and report generation. Data governance involves multiple business systems, each with a wide range of routine operations and data volumes in the hundreds of millions of records. Storing and querying problem data with a traditional centralized data quality system would inevitably cause a performance bottleneck, so a solution based on the Hadoop distributed processing framework is proposed. With a Hadoop cluster, defect data can be separated from Oracle and stored across multiple servers in the cluster, effectively improving disk I/O performance and data analysis performance.
In another aspect, the present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category;
Code refactoring step: performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and sending the data file to a server so that the server parses and saves it.
The types of page data include interface screenshot snapshots, interface keywords, and interface data. The first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button. The types of button include an edit button, a submit button, and a delete button.
A computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the computer program implements the data quality big data governance implementation method.
The above embodiments are only preferred embodiments of the present invention and cannot be used to limit its scope of protection. Any insubstantial change or substitution made by those skilled in the art on the basis of the present invention falls within the scope claimed by the present invention.
Claims (9)
1. A data quality big data governance implementation method, characterized by comprising the following steps:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category;
Code refactoring step: performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and sending the data file to a server so that the server parses and saves it.
2. The data quality big data governance implementation method according to claim 1, characterized in that the types of page data include interface screenshot snapshots, interface keywords, and interface data.
3. The data quality big data governance implementation method according to claim 1, characterized in that the first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button.
4. The data quality big data governance implementation method according to claim 3, characterized in that the types of button include an edit button, a submit button, and a delete button.
5. An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the computer program:
Data acquisition step: extracting page data according to a first preset rule, and classifying the page data according to a second preset rule to form metadata corresponding to each category;
Code refactoring step: performing code refactoring on the page data based on program analysis techniques to automatically generate a service interface, where the program analysis techniques include one or more of source code analysis, bytecode analysis, interface screenshot snapshot analysis, and TCP stream analysis;
Data transmission step: combining the page data, the rule identifier corresponding to the first preset rule, and the data source information of the page data into a data file, and sending the data file to a server so that the server parses and saves it.
6. The electronic device according to claim 5, characterized in that the types of page data include interface screenshot snapshots, interface keywords, and interface data.
7. The electronic device according to claim 5, characterized in that the first preset rule is: page data is extracted immediately when the user is detected entering a keyword or triggering a button.
8. The electronic device according to claim 7, characterized in that the types of button include an edit button, a submit button, and a delete button.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711252654.3A CN108132969A (en) | 2017-12-01 | 2017-12-01 | Quality of data big data administers implementation method, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711252654.3A CN108132969A (en) | 2017-12-01 | 2017-12-01 | Quality of data big data administers implementation method, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108132969A true CN108132969A (en) | 2018-06-08 |
Family
ID=62389995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711252654.3A Pending CN108132969A (en) | 2017-12-01 | 2017-12-01 | Quality of data big data administers implementation method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108132969A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101145163A (en) * | 2007-10-30 | 2008-03-19 | 金蝶软件(中国)有限公司 | Method and system for obtaining data from a plurality of data pool |
CN104346681A (en) * | 2013-08-08 | 2015-02-11 | 中国科学院计算机网络信息中心 | Method for actively acquiring data from heterogeneous enterprise information systems |
CN104882040A (en) * | 2015-05-15 | 2015-09-02 | 陈爱秋 | Intelligent system applied in Chinese language teaching |
CN106547749A (en) * | 2015-09-16 | 2017-03-29 | 北京国双科技有限公司 | The method and apparatus of collecting webpage data |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271949A (en) * | 2018-09-28 | 2019-01-25 | 中国科学院长春光学精密机械与物理研究所 | Multispectral image data extraction method, device, equipment and readable storage medium storing program for executing |
CN110188135A (en) * | 2019-05-30 | 2019-08-30 | 中国联合网络通信集团有限公司 | Document generating method and equipment |
CN110188135B (en) * | 2019-05-30 | 2021-07-27 | 中国联合网络通信集团有限公司 | File generation method and equipment |
CN110263229A (en) * | 2019-06-27 | 2019-09-20 | 北京中油瑞飞信息技术有限责任公司 | A kind of data administering method and device based on data lake |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180608 |