CN103970880A - Distributed multi-point data extraction method - Google Patents
Distributed multi-point data extraction method
- Publication number
- CN103970880A
- Authority
- CN
- China
- Prior art keywords
- data
- guid
- data source
- enter step
- source table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a distributed multi-point data extraction method comprising the following steps: (101) an external data source DB and its field structure are provided; (102) a data source table is built for them; (103) an internal data source table is built; (104) the data fields to be imported are selected; (105) a data table positioning field GUID is added; (106) an internal data source table structure is generated; (107) a GUID alignment code generator is applied; (108) an internal data source table with alignment codes is generated; (109) a program positioning data table is built; (110) an intelligent constraint condition generator is invoked; (111) the user enters screening conditions; (112) the screening conditions and colors are marked on cells; (113) table names, field names, record conditions, time and client names are marked through the GUID; (114) GUID conditions are set; (115) a SELECT statement is generated; (116) target data are obtained; (117) cluster analysis and judgment are carried out; (118) an analysis report is produced. The user can obtain any number of required data screening results.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a distributed multi-point data extraction method.
Background art
Data analysis is usually carried out by screening data to obtain the data elements that satisfy given conditions. At present, on data platforms such as SQL, Access and Oracle, data screening is implemented by writing program statements. The advantage is that, through the platform's statements and functions, programmed statements can produce a wide variety of screening results. However, these platforms do not let the user perform data screening directly through an interface operated by mouse clicks or keyboard commands, nor do they let the user directly construct screening conditions and bind and record them with data elements. In Excel, screening conditions can be set and screening results obtained, but the user's screening conditions cannot be saved, let alone bound to cells. Nor does the distributed multi-point data extraction technique of the claims appear in other existing applications or special-purpose software, in China or abroad, or in published information.
Summary of the invention
The object of the present invention is to solve the above problems by providing a distributed multi-point data extraction method.
To achieve this object, the invention provides a distributed multi-point data extraction method comprising the following steps:
Step 101: Start from the external data source DB and its field structure.
Step 102: Build a data source table for them.
Step 103: Decide whether an internal data source table needs to be built. If not, go to step 107. If so, go to steps 104 to 106.
Step 104: Select the data fields to be imported.
Step 105: Add the data table positioning field GUID.
Step 106: Generate the internal data source table structure, then go to step 107.
Step 107: The GUID alignment code generator processes the generated internal data source table structure.
Step 108: Generate the internal data source table with alignment codes.
Step 109: For the internal data source table with alignment codes, build the program positioning data table, and decide whether to invoke the intelligent constraint condition generator of step 110. If not, go to step 113.
Step 110: Invoke the intelligent constraint condition generator.
Step 111: The user enters screening conditions, and the intelligent constraint condition generator judges whether the entered screening conditions qualify. If they do, go to step 112; if they do not, go to step 113.
Step 112: Mark the screening conditions and colors at the cell positions.
Step 113: Use the GUID to mark the table name, field name, record condition, time and client name.
Step 114: Generate the GUID condition from step 113.
Step 115: Generate the SELECT statement from the GUID condition.
Step 116: Obtain the target data.
Step 117: Perform cluster analysis and judgment on the obtained target data.
Step 118: Produce the analysis report from the cluster analysis and judgment.
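The construction of an internal data source table carrying a GUID field (steps 104 to 108) might be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: it assumes an in-memory SQLite database, hypothetical table names `orders_ext` and `orders_int`, and that an alignment code is an ordinary UUID string. The source specifies none of these.

```python
import sqlite3
import uuid


def build_internal_table(conn, source, target, fields):
    """Copy selected fields into a new table and give every row a GUID.

    Table and field names are interpolated directly, so this sketch
    assumes they come from trusted configuration, not from user input.
    """
    cols = ", ".join(fields)
    # Steps 104-106: the chosen fields plus a GUID positioning field.
    conn.execute(f"CREATE TABLE {target} (guid TEXT PRIMARY KEY, {cols})")
    rows = conn.execute(f"SELECT {cols} FROM {source}").fetchall()
    # Steps 107-108: one alignment code per row (assumption: a UUID4 string).
    placeholders = ", ".join("?" for _ in range(len(fields) + 1))
    conn.executemany(
        f"INSERT INTO {target} VALUES ({placeholders})",
        [(str(uuid.uuid4()), *row) for row in rows],
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
# Hypothetical external data source with made-up rows.
conn.execute("CREATE TABLE orders_ext (client TEXT, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO orders_ext VALUES (?, ?, ?)",
    [("Acme", 120.0, "a"), ("Bolt", 80.0, "b"), ("Acme", 45.5, "c")],
)
copied = build_internal_table(conn, "orders_ext", "orders_int", ["client", "amount"])
guids = [g for (g,) in conn.execute("SELECT guid FROM orders_int")]
```

Making the GUID the primary key is a design choice of the sketch: it lets later steps address a row by its code alone, independently of the row's position in the external source.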
The invention has the following beneficial effect: with the method of the invention, the user can, without writing program statements, set any number of data screening conditions, obtain any number of required screening results, and record any number of screening condition combinations in the data table.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a processing flow chart of the distributed multi-point data extraction method of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Referring to Fig. 1, the invention provides a distributed multi-point data extraction method comprising the following steps:
Step 101: Start from the external data source DB and its field structure.
Step 102: Build a data source table for them.
Step 103: Decide whether an internal data source table needs to be built. If not, go to step 107. If so, go to steps 104 to 106.
Step 104: Select the data fields to be imported.
Step 105: Add the data table positioning field GUID.
Step 106: Generate the internal data source table structure, then go to step 107.
Step 107: The GUID alignment code generator processes the generated internal data source table structure.
Step 108: Generate the internal data source table with alignment codes.
Step 109: For the internal data source table with alignment codes, build the program positioning data table, and decide whether to invoke the intelligent constraint condition generator of step 110. If not, go to step 113.
Step 110: Invoke the intelligent constraint condition generator.
Step 111: The user enters screening conditions, and the intelligent constraint condition generator judges whether the entered screening conditions qualify. If they do, go to step 112; if they do not, go to step 113.
Step 112: Mark the screening conditions and colors at the cell positions.
Step 113: Use the GUID to mark the table name, field name, record condition, time and client name.
Step 114: Generate the GUID condition from step 113.
Step 115: Generate the SELECT statement from the GUID condition.
Step 116: Obtain the target data.
Step 117: Perform cluster analysis and judgment on the obtained target data.
Step 118: Produce the analysis report from the cluster analysis and judgment.
In a two-dimensional data table, the cell position information serves as the binding point for recording the data analysis screening conditions set by the user. The screening conditions set in the cells of a row are associated by mathematical logic, and a data filtering statement is applied to extract the data samples that satisfy the combined condition. The sets of screening conditions formed by several cells and data rows are associated, in statement form, with each data cell for the data the user needs to screen, and are distributed across the data cells of the table, forming the distributed multi-point data extraction technique.
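Combining the conditions set in the cells of one row into a single filter statement might look like the sketch below. It is an illustration only: the `CellCondition` type, the operator whitelist, and the choice of joining conditions with AND or OR are hypothetical, because the source does not fix a statement syntax or a logic operator.

```python
import sqlite3
from dataclasses import dataclass

# Assumption: only simple comparisons are supported in this sketch.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class CellCondition:
    field: str      # column the condition is bound to
    op: str         # comparison operator
    value: object   # comparison value, passed as a bound parameter


def build_select(table, conditions, logic="AND"):
    """Return (sql, params) for the combined conditions of one row."""
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be AND or OR")
    clauses, params = [], []
    for c in conditions:
        if c.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {c.op}")
        clauses.append(f"{c.field} {c.op} ?")
        params.append(c.value)
    where = f" {logic} ".join(clauses) or "1"  # no conditions: select all rows
    return f"SELECT * FROM {table} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (client TEXT, amount REAL)")  # made-up data
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Acme", 120.0), ("Bolt", 80.0), ("Acme", 45.5)])
sql, params = build_select(
    "sales",
    [CellCondition("client", "=", "Acme"), CellCondition("amount", ">", 100)],
)
result = conn.execute(sql, params).fetchall()
```

Values travel as bound parameters rather than being pasted into the statement, which keeps user-entered conditions from altering the structure of the query.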
The invention is illustrated as follows. A two-dimensional data table is given column identifiers $X$ and row identifiers $Y$:
Column set: $X = \{X_1, X_2, X_3, X_4, X_5, \ldots, X_n\}$
Row set: $Y = \{Y_1, Y_2, Y_3, Y_4, Y_5, \ldots, Y_m\}$
The columns of the table are headed $X_1, X_2, X_3, X_4, X_5, \ldots, X_n$.
$i$ is the row number: $i \in \{1, 2, 3, 4, 5, \ldots, m\}$
$j$ is the column number: $j \in \{1, 2, 3, 4, 5, \ldots, n\}$
Column subset: $X_j = \{D_{j1}, D_{j2}, D_{j3}, D_{j4}, D_{j5}, \ldots, D_{jm}\}$, $X_j \subseteq XY$, the full set of column $j$
Row subset: $Y_i = \{D_{i1}, D_{i2}, D_{i3}, D_{i4}, D_{i5}, \ldots, D_{in}\}$, $Y_i \subseteq XY$, the full set of row $i$
Row-column subset: $D_{xy} = \{D_{ij}\}$
Data cell (element) $D$: $D_{ij}$
$D_{xj} \subseteq X_j$: $D_{xj}$ is a subset of the column-$j$ set;
$D_{yi} \subseteq Y_i$: $D_{yi}$ is a subset of the row-$i$ set.
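The definitions above can be made concrete with a small table. The sketch below is illustrative only: the function names and the sample values are hypothetical, and it adopts the convention that a cell Dij is addressed by row number i first and column number j second, both counted from 1.

```python
# A 3-row by 2-column table; the values are made up for illustration.
table = [
    [10, "a"],
    [20, "b"],
    [30, "c"],
]


def column_set(data, j):
    """Xj: all cells of column j (1-based)."""
    return [row[j - 1] for row in data]


def row_set(data, i):
    """Yi: all cells of row i (1-based)."""
    return list(data[i - 1])


def cell(data, i, j):
    """Dij: the cell in row i, column j (both 1-based)."""
    return data[i - 1][j - 1]


x1 = column_set(table, 1)   # first column
y2 = row_set(table, 2)      # second row
d31 = cell(table, 3, 1)     # row 3, column 1
```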
First, set a condition and extract a column sample set:
At cell $D_{ij}$, set the sampling condition $P_{ij}$ and, from the field column, find the subset $D_{xj}$ of elements that satisfy $P_{ij}$:
$D_{xj} \subseteq X_j$, expressed as $D_{xj} = \{X_{ji} \mid P_{ij}\}$, where $P_{ij}$ is one condition element applied to the column set $X_j$.
That is, $D_{xj}$ is the sample set extracted from the column-$j$ set that satisfies $P_{ij}$.
Second, in the row record, set the condition set $P_i$ for sample extraction:
$P_i = \{P_{xj}\}$
This denotes the combination of conditions set in the columns ($X$) of row $i$. These conditions form a logical set according to their logical relations, and serve as the condition set for extracting samples from the full set $XY$.
Third, use the condition set $P_i$ to extract from the full set $XY$ the multi-row, multi-column element sample subset $D_{xy}$:
$D_{xy} = \{XY \mid P_i\}$
Fourth, $P_n$ is the set of the conditions $P_i$ of the $n$ rows, which extracts multiple sample sets $D_{xy}$. The full set of conditions over the whole two-dimensional data is therefore $P_n$: $P_n = \{P_i\}$
The extracted sample set is $D_n$: $D_n = \{D_{xy} \mid P_n\}$, $D_n \subseteq XY$
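The four steps above can be traced with a short sketch. It rests on assumptions the source leaves open: conditions are modelled as Python predicates, the conditions of one row are combined with logical AND, and the function names and sample data are hypothetical.

```python
def extract_column(data, j, p):
    """Dxj: values of column j (1-based) that satisfy predicate p."""
    return [row[j - 1] for row in data if p(row[j - 1])]


def extract_rows(data, p_i):
    """Dxy: rows satisfying every condition in Pi.

    p_i maps a 1-based column number to a predicate; combining the
    predicates with AND is an assumption of this sketch.
    """
    return [row for row in data
            if all(p(row[j - 1]) for j, p in p_i.items())]


def extract_all(data, p_n):
    """Dn: one sample set Dxy per condition set Pi in Pn."""
    return [extract_rows(data, p_i) for p_i in p_n]


data = [            # made-up rows: (amount, client)
    [120, "Acme"],
    [80, "Bolt"],
    [45, "Acme"],
]


def big(v):
    return v > 50


def is_acme(v):
    return v == "Acme"


d_x1 = extract_column(data, 1, big)                        # first step
d_n = extract_all(data, [{1: big, 2: is_acme}, {2: is_acme}])  # steps two to four
```

Each entry of `d_n` corresponds to one recorded condition set, so two condition sets yield two independent sample sets from the same table.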
With the method of the invention, a unique row ID is established in the two-dimensional data table and, together with the fields of the particular table, locks the position information of each data cell. The application program presents a software operation interface in which the user sets data screening conditions and their logical relations according to actual needs. By program design, the data cell position information is bound to the combination of screening conditions, so that the user's screening and analysis conditions are effectively recorded in the corresponding table cells, and executing the screening at the corresponding cells yields several groups of different data samples.
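Binding screening conditions to a cell position given by a row ID and a field might be sketched as follows. This is a hypothetical illustration: the `ConditionRecord` and `ConditionStore` names, the in-memory dictionary, and the use of a UUID as the row ID are assumptions; the recorded attributes (table name, field name, condition, time, client name) follow step 113 of the method.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConditionRecord:
    table: str
    field_name: str
    condition: str      # condition text as entered by the user
    client: str
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


class ConditionStore:
    """Keeps conditions keyed by cell position: (row GUID, field name)."""

    def __init__(self):
        self._by_cell = {}

    def bind(self, row_guid, record):
        key = (row_guid, record.field_name)
        self._by_cell.setdefault(key, []).append(record)

    def conditions_at(self, row_guid, field_name):
        records = self._by_cell.get((row_guid, field_name), [])
        return [r.condition for r in records]

    def conditions_for_row(self, row_guid):
        return {f: [r.condition for r in recs]
                for (g, f), recs in self._by_cell.items() if g == row_guid}


row_guid = str(uuid.uuid4())   # assumption: the row ID is a UUID string
store = ConditionStore()
store.bind(row_guid, ConditionRecord("sales", "amount", "> 100", "client-A"))
store.bind(row_guid, ConditionRecord("sales", "client", "= 'Acme'", "client-A"))
```

Because the key is the cell position rather than the condition, any number of conditions can accumulate on one cell, and all conditions of a row can be gathered later to build a combined filter.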
The invention applies information theory, set theory and computer technology. It responds to the need, in the information society, to record analysis conditions and results freely during data analysis, and proposes binding the screening conditions of multidimensional data to the corresponding data elements so as to record the screened data samples the user needs. Every data element of the multidimensional data can record different screening conditions, which gives rise to the multi-point data extraction function. For example, in a two-dimensional data table the screening conditions are recorded at the position information associated with a data element; once the user has set the screening conditions, the expected screened data sample can be obtained at the corresponding cell. The mathematical basis is conditional set operations: based on subset conditions, sample subsets of elements are selected from the full set, and sets are built with the conditions as objects, yielding diversified subsets. From the viewpoint of information theory, information fusion techniques are used to obtain from the overall information, according to the information need, information that is homogeneous (roughly the same), and any number of classification conditions can be applied to obtain diversified clustered information samples.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (1)
1. A distributed multi-point data extraction method, characterized by comprising the following steps:
Step 101: start from the external data source DB and its field structure; step 102: build a data source table for them; step 103: decide whether an internal data source table needs to be built; if not, go to step 107 (GUID alignment code generator); if so, go to step 104: select the data fields to be imported, and step 105: add the data table positioning field GUID, then go to step 106: generate the internal data source table structure; then go to step 107: the GUID alignment code generator processes the generated internal data source table structure; then go to step 108: generate the internal data source table with alignment codes; for the internal data source table with alignment codes, go to step 109: build the program positioning data table, and decide whether to invoke step 110, the intelligent constraint condition generator; if not, go to step 113: use the GUID to mark the table name, field name, record condition, time and client name; if so, go to step 110: intelligent constraint condition generator, then to step 111: the user enters screening conditions, and the intelligent constraint condition generator judges whether the entered screening conditions qualify; if they do, go to step 112: mark the screening conditions and colors at the cell positions; if they do not, go to step 113; step 114: generate the GUID condition from step 113; step 115: generate the SELECT statement from the GUID condition; step 116: obtain the target data; step 117: perform cluster analysis and judgment on the obtained target data; and finally step 118: produce the analysis report from the cluster analysis and judgment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410208607.9A CN103970880B (en) | 2014-05-17 | 2014-05-17 | Distributed multi-point data extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410208607.9A CN103970880B (en) | 2014-05-17 | 2014-05-17 | Distributed multi-point data extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103970880A (en) | 2014-08-06 |
CN103970880B (en) | 2018-12-18 |
Family
ID=51240377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410208607.9A Active CN103970880B (en) | Distributed multi-point data extraction method | 2014-05-17 | 2014-05-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103970880B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909256A (en) * | 2019-11-20 | 2020-03-24 | 华育昌(肇庆)智能科技研究有限公司 | Artificial intelligence information filtering system for computer |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100037161A1 (en) * | 2008-08-11 | 2010-02-11 | Innography, Inc. | System and method of applying globally unique identifiers to relate distributed data sources |
CN102339323A (en) * | 2011-11-11 | 2012-02-01 | 江苏鸿信系统集成有限公司 | Data extracting, scheduling and displaying method focused on DB2 data warehouse |
US20120209886A1 (en) * | 2010-12-30 | 2012-08-16 | Coral Networks, Inc. | System and method for creating, deploying, integrating, and distributing |
CN102902750A (en) * | 2012-09-20 | 2013-01-30 | 浪潮齐鲁软件产业有限公司 | Universal data extraction and conversion method |
CN103064659A (en) * | 2011-10-21 | 2013-04-24 | 镇江金软计算机科技有限责任公司 | Software as a service (SAAS) model based on metadata extraction user-defined worksheet system |
CN103235807A (en) * | 2013-04-19 | 2013-08-07 | 浪潮集团山东通用软件有限公司 | Data extracting and processing method supporting high-concurrency large-volume data |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100037161A1 (en) * | 2008-08-11 | 2010-02-11 | Innography, Inc. | System and method of applying globally unique identifiers to relate distributed data sources |
US20120209886A1 (en) * | 2010-12-30 | 2012-08-16 | Coral Networks, Inc. | System and method for creating, deploying, integrating, and distributing |
CN103064659A (en) * | 2011-10-21 | 2013-04-24 | 镇江金软计算机科技有限责任公司 | Software as a service (SAAS) model based on metadata extraction user-defined worksheet system |
CN102339323A (en) * | 2011-11-11 | 2012-02-01 | 江苏鸿信系统集成有限公司 | Data extracting, scheduling and displaying method focused on DB2 data warehouse |
CN102902750A (en) * | 2012-09-20 | 2013-01-30 | 浪潮齐鲁软件产业有限公司 | Universal data extraction and conversion method |
CN103235807A (en) * | 2013-04-19 | 2013-08-07 | 浪潮集团山东通用软件有限公司 | Data extracting and processing method supporting high-concurrency large-volume data |
Non-Patent Citations (1)
Title |
---|
Song Qiang et al.: "Extraction of untagged tables from semi-structured documents", Computer Engineering (《计算机工程》) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110909256A (en) * | 2019-11-20 | 2020-03-24 | 华育昌(肇庆)智能科技研究有限公司 | Artificial intelligence information filtering system for computer |
CN110909256B (en) * | 2019-11-20 | 2020-11-24 | 华育昌(肇庆)智能科技研究有限公司 | Artificial intelligence information filtering system for computer |
Also Published As
Publication number | Publication date |
---|---|
CN103970880B (en) | 2018-12-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Robertson et al. | Biogeo: an R package for assessing and improving data quality of occurrence record datasets | |
Hanck | An intersection test for panel unit roots | |
CN102135938A (en) | Software product testing method and system | |
CN103473056B (en) | A kind of remote measurement configuration file automatic generation method | |
CN104820707A (en) | Automatic test paper composition method in B/S (Brower/Server) mode based on knowledge hierarchy in field of computers | |
CN110336838B (en) | Account abnormity detection method, device, terminal and storage medium | |
CN101013451A (en) | Automatic generation system for designing BOM | |
Muñoz‐Pajares | SIDIER: substitution and indel distances to infer evolutionary relationships | |
CN101976394B (en) | Data acquiring and counting system and method | |
CN104574141A (en) | Service influence degree analysis method | |
CN103077255B (en) | Identification method and system for 3D (three-dimensional) model of nuclear power station | |
New et al. | Model America–data and models of every US building | |
WO2023134134A1 (en) | Method and apparatus for generating association viewing model, and computer device and storage medium | |
Rao et al. | Modeling and simulation of net centric system of systems using systems modeling language and colored Petri‐nets: A demonstration using the global earth observation system of systems | |
Deo et al. | Nested areas of endemism analysis | |
CN104471530A (en) | Executable software specification generation | |
Sarrazin et al. | An introduction to the SAFE Matlab Toolbox with practical examples and guidelines | |
CN110109843A (en) | Automatic test cases construction method and system based on Robot Framework | |
CN103455466A (en) | Calculation method and system of calculator | |
CN103970880A (en) | Distributed multi-point data extraction method | |
CN105843605A (en) | Data mapping data and device | |
CN109064036B (en) | Ecosystem service supply and demand index change detection method facing management field | |
CN105338104A (en) | Business request responding method, analysis method and analysis system | |
CN107315721B (en) | A kind of methods of sampling and system of the resident family of community based on low diversity factor ordered series of numbers | |
Vasudavan et al. | Smart City: the state of the art, definitions, characteristics and dimensions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |