CN103970880A - Distributed multi-point data extraction method - Google Patents


Info

Publication number
CN103970880A
Authority
CN
China
Prior art keywords
data
guid
data source
enter step
source table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410208607.9A
Other languages
Chinese (zh)
Other versions
CN103970880B (en)
Inventor
白崇明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201410208607.9A
Publication of CN103970880A
Application granted
Publication of CN103970880B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a distributed multi-point data extraction method comprising the following steps: (101) establish a data source table for the external data source DB and its field structure; (102) establish the data source table; (103) establish an internal data source table; (104) select the data fields to be imported; (105) add the data table positioning field GUID; (106) generate the internal data source table structure; (107) run the GUID alignment code generator; (108) generate the internal data source table with alignment codes; (109) establish the program positioning data table; (110) run the intelligent constraint condition generator; (111) the user enters screening conditions; (112) mark the screening conditions and colors on the cells; (113) mark table names, field names, recorded conditions, time, and client names through the GUID; (114) set the GUID conditions; (115) generate the SELECT statement; (116) obtain the target data; (117) perform cluster analysis and judgment; (118) produce the analysis report table. The user can obtain any number of the required data screening results.

Description

Distributed multi-point data extraction method
Technical field
The present invention relates to the technical field of data processing, and in particular to a distributed multi-point data extraction method.
Background art
Data analysis is usually carried out by screening data to obtain the data elements that satisfy given conditions. At present, on data platforms such as SQL, Access, and Oracle, data screening is achieved by writing program statements. The advantage is that statements, functions, and the like can be programmed to produce a wide variety of screening results. However, these platforms do not allow data screening to be performed directly through mouse clicks or keyboard commands in a graphical interface, nor do they allow screening conditions to be constructed directly, bound to data elements, and recorded. In Excel, screening conditions can be set and screening results obtained, but the user's screening conditions cannot be saved, let alone bound to cells. Among published information, no other existing application or specialized software, in China or abroad, provides the distributed multi-point data extraction technique to which the claims relate.
Summary of the invention
The object of the present invention is to solve the above problems by providing a distributed multi-point data extraction method.
To achieve the above object, the present invention provides a distributed multi-point data extraction method comprising the following steps:
Step 101: first, establish a data source table for the external data source DB and its field structure. Based on the data source table established in step 102, determine whether to proceed to step 103, establishing an internal data source table. If no internal data source table is needed, go to step 107: GUID alignment code generator. If an internal data source table is needed, go to step 104: select the data fields to be imported, and step 105: add the data table positioning field GUID, then go to step 106: generate the internal data source table structure. Then go to step 107: the GUID alignment code generator processes the generated internal data source table structure, followed by step 108: generate the internal data source table with alignment codes.
For the generated internal data source table with alignment codes, go to step 109: establish the program positioning data table. From the program positioning data table, determine whether step 110, the constraint condition intelligent generator, is required. If not, go to step 113: mark the table name, field name, recorded condition, time, and client name by GUID. If so, go to step 110: constraint condition intelligent generator, then step 111: the user enters screening conditions, and the constraint condition intelligent generator judges whether the entered screening conditions are satisfied. If they are, go to step 112: mark the screening conditions and color at the cell position; if they are not, go to step 113.
Step 113 produces the GUID condition of step 114. For the GUID condition, go to step 115: generate the SELECT statement, and then step 116: obtain the target data. For the target data obtained, go to step 117: cluster analysis and judgment. Finally, the cluster analysis and judgment yields step 118: the analysis report table.
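Steps 104 to 108 can be illustrated with a short sketch. It is only one possible reading, assuming an SQLite database and Python's `uuid` module standing in for the GUID alignment code generator; the table name `internal_source`, the column name `guid`, and the helper `build_internal_table` are hypothetical and do not come from the patent.

```python
import sqlite3
import uuid


def build_internal_table(conn, table, fields, rows):
    """Create an internal source table with a GUID positioning field.

    `fields` are the imported column names (step 104). A `guid` column is
    added (step 105) and filled with a fresh UUID per row (steps 107-108).
    Identifiers are assumed to come from trusted configuration, not user input.
    """
    columns = ", ".join(f'"{name}" TEXT' for name in fields)
    conn.execute(f'CREATE TABLE "{table}" (guid TEXT PRIMARY KEY, {columns})')
    placeholders = ", ".join("?" for _ in range(len(fields) + 1))
    for row in rows:
        conn.execute(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            (str(uuid.uuid4()), *row),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_internal_table(
    conn,
    "internal_source",
    ["customer", "amount"],
    [("Alice", "120"), ("Bob", "80"), ("Carol", "200")],
)
# Every row now carries its own alignment code.
guids = [r[0] for r in conn.execute('SELECT guid FROM "internal_source"')]
```

Using the GUID as the primary key means each row can later be addressed independently of its display position, which is what the later positioning steps appear to rely on.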
The present invention has the following beneficial effect: with the method of the invention, a user can, without writing program statements, set any number of data screening conditions in full, obtain any number of the required screening results, and record any number of screening condition combinations in the data table.
Brief description of the drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a processing flowchart of the distributed multi-point data extraction method of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, the present invention provides a distributed multi-point data extraction method comprising the following steps:
Step 101: first, establish a data source table for the external data source DB and its field structure. Based on the data source table established in step 102, determine whether to proceed to step 103, establishing an internal data source table. If no internal data source table is needed, go to step 107: GUID alignment code generator. If an internal data source table is needed, go to step 104: select the data fields to be imported, and step 105: add the data table positioning field GUID, then go to step 106: generate the internal data source table structure. Then go to step 107: the GUID alignment code generator processes the generated internal data source table structure, followed by step 108: generate the internal data source table with alignment codes.
For the generated internal data source table with alignment codes, go to step 109: establish the program positioning data table. From the program positioning data table, determine whether step 110, the constraint condition intelligent generator, is required. If not, go to step 113: mark the table name, field name, recorded condition, time, and client name by GUID. If so, go to step 110: constraint condition intelligent generator, then step 111: the user enters screening conditions, and the constraint condition intelligent generator judges whether the entered screening conditions are satisfied. If they are, go to step 112: mark the screening conditions and color at the cell position; if they are not, go to step 113.
Step 113 produces the GUID condition of step 114. For the GUID condition, go to step 115: generate the SELECT statement, and then step 116: obtain the target data. For the target data obtained, go to step 117: cluster analysis and judgment. Finally, the cluster analysis and judgment yields step 118: the analysis report table.
In a two-dimensional data table, cell position information serves as the binding point at which the data analysis screening conditions set by the user are recorded. The screening conditions set in the cells of a row are associated by mathematical logic, and a data filter statement is applied to extract the data samples that satisfy the combined condition. The sets of screening conditions formed by several cells and data rows fully associate, in statement form, the data the user needs to screen with each data cell. Because these condition sets are distributed across the data cells of the table, they form the distributed multi-point data extraction technique.
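The step from cell-bound conditions to a generated SELECT (steps 115 and 116) might look like the sketch below. It assumes SQLite and joins the conditions of one row with AND; the patent only states that the conditions are logically associated, so the connective, the operator whitelist `ALLOWED_OPS`, and the function `build_select` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical whitelist of comparison operators a user may pick in the UI.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_select(table, fields, row_conditions):
    """Combine the conditions bound to the cells of one row into a SELECT.

    `row_conditions` maps a field name to an (operator, value) pair.
    Values are passed as parameters so that user input never reaches the
    SQL text; field names and operators are checked against known sets.
    """
    clauses, params = [], []
    for field, (op, value) in row_conditions.items():
        if field not in fields or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition on {field!r}")
        clauses.append(f'"{field}" {op} ?')
        params.append(value)
    sql = f'SELECT * FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Alice", "north", 120), ("Bob", "south", 80), ("Carol", "north", 200)],
)
sql, params = build_select(
    "orders",
    {"customer", "region", "amount"},
    {"region": ("=", "north"), "amount": (">", 150)},
)
result = conn.execute(sql, params).fetchall()
```

The user only supplies field, operator, and value, which matches the stated goal of screening without writing program statements.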
The invention is illustrated as follows. For two-dimensional data, let X denote the column identifiers and Y the row identifiers:
Column set: $X = \{X_1, X_2, X_3, X_4, X_5, \ldots, X_n\}$
Row set: $Y = \{Y_1, Y_2, Y_3, Y_4, Y_5, \ldots, Y_m\}$
Complete set of cells: $XY = \bigcup_{i=1,\, j=1}^{\infty} (X_i, Y_j) = \{D_{ij}\}$
$i$ is the row number: $i \in \{1, 2, 3, 4, 5, \ldots, m\}$
$j$ is the column number: $j \in \{1, 2, 3, 4, 5, \ldots, n\}$
Column subset: $X_j = \{D_{j1}, D_{j2}, D_{j3}, D_{j4}, D_{j5}, \ldots, D_{jm}\}$, $X_j \in XY$, the complete set of column $j$
Row subset: $Y_i = \{D_{i1}, D_{i2}, D_{i3}, D_{i4}, D_{i5}, \ldots, D_{in}\}$, $Y_i \in XY$, the complete set of row $i$
Row-column subset: $D_{xy} = \{D_{ij}\}$
Data cell (element): $D_{ij}$
$D_{xj} \in X_j$: $D_{xj}$ is a subset of the column-$j$ set;
$D_{yi} \in Y_i$: $D_{yi}$ is a subset of the row-$i$ set.
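The set definitions above can be made concrete with a small sketch. It assumes cells are keyed by (row, column) pairs with 1-based indices; the names `values`, `column`, and `row` are illustrative only.

```python
# A small 2 x 3 table: rows are indexed by i, columns by j (both 1-based).
values = [
    [10, 20, 30],
    [40, 50, 60],
]
m, n = len(values), len(values[0])

# XY: the complete set of cells, keyed by (i, j).
XY = {
    (i, j): values[i - 1][j - 1]
    for i in range(1, m + 1)
    for j in range(1, n + 1)
}


def column(j):
    """X_j: all cells of column j."""
    return {key: v for key, v in XY.items() if key[1] == j}


def row(i):
    """Y_i: all cells of row i."""
    return {key: v for key, v in XY.items() if key[0] == i}
```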
First, set a condition and extract a column sample set:
At cell $D_{ij}$, set the sampling condition $P_{ij}$ and find, within the field column, the element subset $D_{xj}$ that satisfies $P_{ij}$:
$D_{xj} \in X_j$ is expressed as $D_{xj} = \{X_{ji} \mid P_{ij}\}$, where $P_{ij}$ is one condition element for sampling the column set $X_j$.
That is, $D_{xj}$ is the sample set, extracted from the column-$j$ set, that satisfies $P_{ij}$.
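As a minimal sketch of this first step, a condition can be modelled as a predicate applied to the cells of one column. The function name `extract_column_sample` and the sample values are assumptions for illustration.

```python
def extract_column_sample(column_cells, condition):
    """Return D_xj: the cells of one column that satisfy condition P_ij."""
    return {key: v for key, v in column_cells.items() if condition(v)}


# Column j = 2 of a four-row table, keyed by (i, j).
X2 = {(1, 2): 15, (2, 2): 42, (3, 2): 7, (4, 2): 99}

# P_ij: a condition set at one cell of the column, here "value greater than 10".
D_x2 = extract_column_sample(X2, lambda v: v > 10)
```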
Second, set the condition set $P_i$ for sampling within a row record:
$P_i = \{P_{xj}\}$
This denotes the combination of conditions set in the columns (X) of row $i$. These conditions form a logical set according to their logical relations and serve as the condition set for sampling from the complete set XY.
Third, the condition set $P_i$ extracts a multi-row, multi-column subset of element samples $D_{xy}$ from the complete set XY:
$D_{xy} = \{XY \mid P_i\}$
Fourth, $P_n$ is the set of the conditions $P_i$ of $n$ rows, from which multiple sample sets $D_{xy}$ are extracted. The complete set of conditions in the whole two-dimensional data is therefore defined as $P_n$: $P_n = \{P_i\}$
The extracted sample set is $D_n$: $D_n = \{D_{xy} \mid P_n\}$, $D_n \in XY$
With the method of the invention, a unique row ID is established in the two-dimensional data table and, together with the fields of the specific table, fixes the position of each data cell. An application program presents a software operation interface in which the user sets the data screening conditions and their logical relations according to actual needs. Through program design, the cell position information is bound to this combination of screening conditions, so that the user's screening and analysis conditions are effectively recorded in the corresponding table cell. Executing the screening at the corresponding cells yields several different groups of data samples.
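One way to record a condition against a cell position is a small log table keyed by row ID and field name, holding the items that step 113 lists (table name, field name, condition, time, client name). The following is a sketch under that assumption; the table `condition_log`, its columns, and both helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE condition_log (
           guid TEXT, table_name TEXT, field_name TEXT,
           cond_text TEXT, recorded_at TEXT, client_name TEXT,
           PRIMARY KEY (guid, field_name)
       )"""
)


def bind_condition(conn, guid, table_name, field_name, cond_text, client_name):
    """Record (or overwrite) the condition bound to the cell (guid, field_name)."""
    conn.execute(
        "INSERT OR REPLACE INTO condition_log VALUES (?, ?, ?, ?, ?, ?)",
        (guid, table_name, field_name, cond_text,
         datetime.now(timezone.utc).isoformat(), client_name),
    )


def conditions_for_row(conn, guid):
    """Return {field_name: cond_text} for every cell of the row `guid`."""
    cur = conn.execute(
        "SELECT field_name, cond_text FROM condition_log WHERE guid = ?", (guid,)
    )
    return dict(cur.fetchall())


bind_condition(conn, "row-001", "orders", "amount", "> 150", "client-a")
bind_condition(conn, "row-001", "orders", "region", "= 'north'", "client-a")
# Setting a new condition on the same cell replaces the earlier one.
bind_condition(conn, "row-001", "orders", "amount", "> 100", "client-a")
saved = conditions_for_row(conn, "row-001")
```

Because the conditions are stored rather than held only in the interface, they survive between sessions, which addresses the limitation the background section attributes to Excel.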
The invention applies information theory, set theory, computer technology, and the like. Summarizing the information society's need to freely record analysis conditions and results during data analysis, it proposes binding the screening conditions of multidimensional data to the corresponding data elements so as to record the screened data samples the user needs. Every data element of the multidimensional data can record different screening conditions, which gives rise to the multi-point data extraction function. For example, in a two-dimensional data table, screening conditions are recorded at the position of the data element's related information; once the user has set the conditions, the expected screened data sample can be obtained at the corresponding cell. The mathematical basis is classification by condition: a conditional set algorithm selects subsets of element samples from the complete set, and sets are established with conditions as their objects, yielding diversified subsets. From the viewpoint of information theory, information fusion is used to obtain, from the overall information and according to information needs, information that is homogeneous (roughly the same), so that any number of classification conditions can be realized and diversified clustered information samples obtained.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (1)

1. A distributed multi-point data extraction method, characterized in that it comprises the following steps:
Step 101: first, establish a data source table for the external data source DB and its field structure. Based on the data source table established in step 102, determine whether to proceed to step 103, establishing an internal data source table. If no internal data source table is needed, go to step 107: GUID alignment code generator. If an internal data source table is needed, go to step 104: select the data fields to be imported, and step 105: add the data table positioning field GUID, then go to step 106: generate the internal data source table structure. Then go to step 107: the GUID alignment code generator processes the generated internal data source table structure, followed by step 108: generate the internal data source table with alignment codes. For the generated internal data source table with alignment codes, go to step 109: establish the program positioning data table. From the program positioning data table, determine whether step 110, the constraint condition intelligent generator, is required. If not, go to step 113: mark the table name, field name, recorded condition, time, and client name by GUID. If so, go to step 110: constraint condition intelligent generator, then step 111: the user enters screening conditions, and the constraint condition intelligent generator judges whether the entered screening conditions are satisfied. If they are, go to step 112: mark the screening conditions and color at the cell position; if they are not, go to step 113. Step 113 produces the GUID condition of step 114. For the GUID condition, go to step 115: generate the SELECT statement, and then step 116: obtain the target data. For the target data obtained, go to step 117: cluster analysis and judgment. Finally, the cluster analysis and judgment yields step 118: the analysis report table.
CN201410208607.9A 2014-05-17 2014-05-17 Distributed multi-point data extraction method Active CN103970880B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410208607.9A CN103970880B (en) 2014-05-17 2014-05-17 Distributed multi-point data extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410208607.9A CN103970880B (en) 2014-05-17 2014-05-17 Distributed multi-point data extraction method

Publications (2)

Publication Number Publication Date
CN103970880A true CN103970880A (en) 2014-08-06
CN103970880B CN103970880B (en) 2018-12-18

Family

ID=51240377

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410208607.9A Active CN103970880B (en) 2014-05-17 2014-05-17 Distributed multi-point data extraction method

Country Status (1)

Country Link
CN (1) CN103970880B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100037161A1 (en) * 2008-08-11 2010-02-11 Innography, Inc. System and method of applying globally unique identifiers to relate distributed data sources
US20120209886A1 (en) * 2010-12-30 2012-08-16 Coral Networks, Inc. System and method for creating, deploying, integrating, and distributing
CN103064659A (en) * 2011-10-21 2013-04-24 镇江金软计算机科技有限责任公司 Software as a service (SAAS) model based on metadata extraction user-defined worksheet system
CN102339323A (en) * 2011-11-11 2012-02-01 江苏鸿信系统集成有限公司 Data extracting, scheduling and displaying method focused on DB2 data warehouse
CN102902750A (en) * 2012-09-20 2013-01-30 浪潮齐鲁软件产业有限公司 Universal data extraction and conversion method
CN103235807A (en) * 2013-04-19 2013-08-07 浪潮集团山东通用软件有限公司 Data extracting and processing method supporting high-concurrency large-volume data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
宋强等: "半结构化文档中非标记化表格的抽取", 《计算机工程》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909256A (en) * 2019-11-20 2020-03-24 华育昌(肇庆)智能科技研究有限公司 Artificial intelligence information filtering system for computer
CN110909256B (en) * 2019-11-20 2020-11-24 华育昌(肇庆)智能科技研究有限公司 Artificial intelligence information filtering system for computer

Also Published As

Publication number Publication date
CN103970880B (en) 2018-12-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant