CN103970880B - Distributed Multi data pick-up method - Google Patents

Distributed Multi data pick-up method

Info

Publication number
CN103970880B
Authority
CN
China
Prior art keywords
data source
guid
data
source table
condition
Prior art date
Legal status
Active
Application number
CN201410208607.9A
Other languages
Chinese (zh)
Other versions
CN103970880A (en)
Inventor
白崇明
Current Assignee
Individual
Original Assignee
Individual
Priority date
2014-05-17
Filing date
2014-05-17
Publication date
2018-12-18
Application filed by Individual
Priority to CN201410208607.9A
Publication of CN103970880A
Application granted
Publication of CN103970880B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/22 Indexing; Data structures therefor; Storage structures
                • G06F16/2282 Tablespace storage structures; Management thereof
              • G06F16/24 Querying
                • G06F16/245 Query processing
                  • G06F16/2457 Query processing with adaptation to user needs
              • G06F16/25 Integrating or interfacing systems involving database management systems
                • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Abstract

The present invention relates to a distributed multipoint data extraction method comprising the following steps: step 101: for the external data source DB and its field structure, establish a data source table; step 102: establish the data source table; step 103: establish an internal data source table; step 104: select the data fields to be imported; step 105: add the data table location field GUID; step 106: generate the internal data source table structure; step 107: GUID location code generator; step 108: generate the internal data source table with location codes; step 109: establish the program location data table; step 110: constraint condition intelligent generator; step 111: the user enters screening conditions; step 112: mark the screening conditions and colour at the cell location; step 113: identify the table name, field name, record condition, time and customer name by GUID; step 114: GUID condition; step 115: generate the SELECT statement; step 116: obtain the target data; step 117: cluster analysis judgement; step 118: analysis report table. The method allows a user to obtain any number of required screened data results.

Description

Distributed Multi data pick-up method
Technical field
The present invention relates to the technical field of data processing, and in particular to a distributed multipoint data extraction method.
Background art
The usual approach in data analysis is to obtain the data elements that meet given conditions by screening the data. At present, data analysis and screening are implemented by writing program statements on data platforms such as SQL, Access and Oracle. The advantage of this approach is that the platforms' statement functions can be used to write statements that produce a wide variety of screening results. However, screening cannot be performed directly through interface operations such as mouse clicks or keyboard commands on these platforms, and screening conditions cannot be built directly, bound to data elements and recorded. In Excel, screening conditions can be set to obtain screening results, but the user's screening conditions cannot be saved and cannot be bound to cells. Nor does the published information on other existing Chinese and foreign applications or dedicated software disclose the distributed multipoint data extraction technique covered by the claims.
Summary of the invention
The object of the present invention is to solve the above problems by providing a distributed multipoint data extraction method.
To achieve this object, the present invention provides a distributed multipoint data extraction method comprising the following steps:
Step 101: target the external data source DB and its field structure;
Step 102: establish the data source table, then determine whether to establish an internal data source table. If yes, go to step 103: establish the internal data source table; if no, go to step 107: GUID location code generator. When an internal data source table is to be established, proceed through step 104: select the data fields to be imported, and step 105: add the internal data source table location field GUID, then step 106: generate the internal data source table structure.
Step 107: the GUID location code generator processes the generated internal data source table structure.
Step 108: generate the internal data source table with location codes.
Step 109: establish the location data table from the internal data source table with location codes, then determine whether to invoke step 110: constraint condition intelligent generator. If not, go to step 113; if so, go to step 110.
Step 110: constraint condition intelligent generator.
Step 111: the user enters screening conditions, and the constraint condition intelligent generator checks whether the entered conditions are valid. If they are, go to step 112; if not, go to step 113.
Step 112: mark the screening conditions and colour at the cell location.
Step 113: identify the table name, field name, record condition, time and customer name by GUID.
Step 114: obtain the GUID condition from step 113.
Step 115: generate a SELECT statement from the GUID condition.
Step 116: obtain the target data.
Step 117: perform a cluster analysis judgement on the obtained target data.
Step 118: obtain the analysis report table from the cluster analysis judgement.
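Steps 103 to 108 essentially copy the selected fields of an external table into an internal table that carries one GUID location code per row. The Python sketch below illustrates that reading under stated assumptions: SQLite stands in for both the external DB and the internal store, and the names `build_internal_table`, `sales` and `sales_internal` are hypothetical. None of them come from the patent.

```python
import sqlite3
import uuid


def quote_ident(name: str) -> str:
    """Quote an SQL identifier (table or column name) for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def build_internal_table(conn, source_table, fields, internal_table):
    """Hypothetical sketch of steps 103-108.

    Creates `internal_table` with a GUID location field followed by the
    selected `fields` (steps 104-106), then copies the rows of
    `source_table`, giving each row a fresh GUID (steps 107-108).
    """
    cols = ", ".join(quote_ident(f) for f in fields)
    conn.execute(
        f"CREATE TABLE {quote_ident(internal_table)} (guid TEXT PRIMARY KEY, {cols})"
    )
    rows = conn.execute(f"SELECT {cols} FROM {quote_ident(source_table)}").fetchall()
    marks = ", ".join("?" * (len(fields) + 1))  # one placeholder per column
    conn.executemany(
        f"INSERT INTO {quote_ident(internal_table)} VALUES ({marks})",
        [(str(uuid.uuid4()), *row) for row in rows],
    )
    conn.commit()


# Usage: an "external" table with three fields, two of which are imported.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", 120.0, "a"), ("south", 80.0, "b")],
)
build_internal_table(conn, "sales", ["region", "amount"], "sales_internal")
print(conn.execute("SELECT region, amount FROM sales_internal ORDER BY region").fetchall())
# [('north', 120.0), ('south', 80.0)]
```

A random UUID is used here only as one plausible kind of "location code". The patent does not say how the GUID positioning code generator derives its codes.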
Advantages of the invention: with this method, a user can freely set any number of data screening conditions without writing program statements, obtain any number of required screened data results, and record any number of combinations of screening conditions in a data table.
Description of the drawings
To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the distributed multipoint data extraction method of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
Referring to Fig. 1, the distributed multipoint data extraction method comprises steps 101 to 118 as set out in the summary of the invention above.
In a two-dimensional data table, the location of each cell serves as the binding point at which the data analysis and screening conditions set by the user are recorded. The screening conditions set in the cells of a row are joined by mathematical logic relations, and a screening statement then extracts the data samples that satisfy the combined condition. The set of screening conditions formed over several cells and data rows associates the data the user wants to screen, in the form of statements, with individual data cells, so that the conditions are distributed over the data cells of the table. This forms the distributed multipoint data extraction technique.
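Binding conditions to cell locations can be pictured as a table keyed by cell position. The Python sketch below is hypothetical: `CellCondition`, `LocationDataTable` and the row GUID `"row-1"` are illustrative names, and the patent does not specify how the location data table is stored. Its fields mirror what step 113 identifies by GUID.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CellCondition:
    """One screening condition bound to a cell location (hypothetical schema).

    Mirrors step 113: table name, field name, record condition, time and
    customer (user) name, located by the row GUID.
    """
    table: str
    field_name: str
    row_guid: str
    condition: str
    user: str
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class LocationDataTable:
    """In-memory stand-in for the location data table of step 109."""

    def __init__(self):
        # Key: the cell location (table, row GUID, field); value: its conditions.
        self._by_cell = defaultdict(list)

    def bind(self, cond: CellCondition) -> None:
        """Record a condition at its cell location (the binding point)."""
        self._by_cell[(cond.table, cond.row_guid, cond.field_name)].append(cond)

    def for_cell(self, table, row_guid, field_name):
        """Conditions recorded on a single cell."""
        return list(self._by_cell.get((table, row_guid, field_name), []))

    def for_row(self, table, row_guid):
        """All conditions recorded on one row, in insertion order."""
        return [
            c
            for (t, g, _), conds in self._by_cell.items()
            if t == table and g == row_guid
            for c in conds
        ]


store = LocationDataTable()
store.bind(CellCondition("sales_internal", "amount", "row-1", "> 100", "alice"))
store.bind(CellCondition("sales_internal", "region", "row-1", "= 'north'", "alice"))
print([c.condition for c in store.for_row("sales_internal", "row-1")])
# ['> 100', "= 'north'"]
```

Keying on (table, row GUID, field) is what lets a condition "live" in one cell. Collecting a row's conditions then yields the set that is joined into a combined condition.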
Example: a two-dimensional data set $XY$ with column identifiers $X$ and row identifiers $Y$:
Column set: $X = \{X_1, X_2, X_3, \dots, X_n\}$
Row set: $Y = \{Y_1, Y_2, Y_3, \dots, Y_m\}$
$i$ is the row index: $i \in \{1, 2, 3, \dots, m\}$
$j$ is the column index: $j \in \{1, 2, 3, \dots, n\}$
Column subset: $X_j = \{D_{1j}, D_{2j}, D_{3j}, \dots, D_{mj}\}$, the complete set of column $j$ of $XY$
Row subset: $Y_i = \{D_{i1}, D_{i2}, D_{i3}, \dots, D_{in}\}$, the complete set of row $i$ of $XY$
Row-and-column subset: $D_{xy} \subseteq XY = \{D_{ij}\}$
Data cell (element): $D_{ij}$
$D_{xj} \subseteq X_j$: $D_{xj}$ is a subset of the set of column $j$;
$D_{yi} \subseteq Y_i$: $D_{yi}$ is a subset of the set of row $i$.
First, set a condition to extract a column sample set.
A sampling condition $P_{ij}$ is set in cell $D_{ij}$, and the subset $D_{xj}$ of the elements of column $j$ that satisfy $P_{ij}$ is obtained:
$$D_{xj} = \{\, x \in X_j \mid P_{ij}(x) \,\}, \qquad D_{xj} \subseteq X_j$$
where $P_{ij}$ is the condition applied to the set of column $X_j$. That is, $D_{xj}$ is the sample set extracted from column $j$ that satisfies $P_{ij}$.
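As a concrete reading of $D_{xj}$, the Python sketch below treats the table as a list of rows and the condition $P_{ij}$ as a predicate function. This representation and the names `column` and `extract_column_sample` are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Row = Sequence[object]


def column(table: List[Row], j: int) -> List[object]:
    """X_j: every element of column j (0-based index here)."""
    return [row[j] for row in table]


def extract_column_sample(table: List[Row], j: int, p_ij: Callable[[object], bool]) -> List[object]:
    """D_xj = { x in X_j | P_ij(x) }.

    A list is returned rather than a Python set so that repeated values in
    the column are kept, as they would be in a data column.
    """
    return [x for x in column(table, j) if p_ij(x)]


table = [("north", 120), ("south", 80), ("east", 150)]
print(extract_column_sample(table, 1, lambda v: v > 100))  # [120, 150]
```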
Second, set the condition set $P_i$ for sampling in row $i$:
$$P_i = \{P_{ij}\}$$
$P_i$ is the combination of the conditions set in the columns $X_j$ of row $i$. Joined according to the chosen logical relations, these conditions form a logical condition set that is used to sample from the complete set $XY$.
Third, use the condition set $P_i$ to extract a multi-row, multi-column element sample subset $D_{xy}$ from the complete set $XY$, that is, the elements of $XY$ that satisfy $P_i$:
$$D_{xy} = \{\, XY \mid P_i \,\}$$
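The patent leaves the "mathematical logic" that joins the conditions of a row unspecified. The Python sketch below assumes the conditions $P_{ij}$ of row $i$ are joined either all by AND or all by OR, and returns the matching rows as $D_{xy}$. The name `extract_rows` and the `logic` parameter are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[object]
Predicate = Callable[[object], bool]


def extract_rows(table: List[Row], p_i: Dict[int, Predicate], logic: str = "and") -> List[Row]:
    """D_xy = { rows of XY that satisfy the condition set P_i }.

    p_i maps a column index j to the condition P_ij recorded in row i.
    `logic` chooses how the conditions are joined: "and" or "or".
    """
    if logic not in ("and", "or"):
        raise ValueError(f"unsupported logic: {logic!r}")
    combine = all if logic == "and" else any
    return [row for row in table if combine(p(row[j]) for j, p in p_i.items())]


table = [("north", 120), ("south", 80), ("north", 60)]
p_i = {0: lambda r: r == "north", 1: lambda v: v > 100}
print(extract_rows(table, p_i))        # [('north', 120)]
print(extract_rows(table, p_i, "or"))  # [('north', 120), ('north', 60)]
```

Mixed expressions such as `(A AND B) OR C` would need a small expression tree instead of a single `all`/`any`. That extension is left out to keep the sketch short.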
Fourth, let $P_n$ be the set of the condition sets $P_i$ of all condition rows. Each $P_i$ extracts its own sample set $D_{xy}$, so several groups of samples are obtained. The complete condition set of the whole two-dimensional data set is then
$$P_n = \{P_i\}$$
and the extracted sample set is
$$D_n = \{\, D_{xy} \mid P_n \,\}, \qquad D_{xy} \subseteq XY$$
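Taking $D_n$ as one sample group per condition set $P_i$, the Python sketch below applies every $P_i$ in $P_n$ to the same table. It assumes, as the text does not fix, that the conditions inside a row are joined by AND. The name `extract_all` and the dictionary layout of `p_n` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Row = Sequence[object]
Predicate = Callable[[object], bool]


def extract_all(table: List[Row], p_n: Dict[int, Dict[int, Predicate]]) -> Dict[int, List[Row]]:
    """D_n = { D_xy | P_i in P_n }: one sample group per condition row.

    p_n maps a condition-row index i to its condition set P_i, which in turn
    maps column index j to P_ij. Conditions within a row are joined by AND
    (an assumption; the patent does not fix the logic).
    """
    return {
        i: [row for row in table if all(p(row[j]) for j, p in p_i.items())]
        for i, p_i in p_n.items()
    }


table = [("north", 120), ("south", 80), ("east", 150)]
p_n = {
    1: {1: lambda v: v > 100},       # P_1: amount > 100
    2: {0: lambda r: r == "south"},  # P_2: region is "south"
}
print(extract_all(table, p_n))
# {1: [('north', 120), ('east', 150)], 2: [('south', 80)]}
```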
With the method of the invention, a unique row ID is established in the two-dimensional data table. Together with the field of a specific table, it locks the location of each data cell. An application generates a software interface on which the user sets data screening conditions and mathematical logic relations according to actual needs. The program binds the location information of each data cell to its screening conditions, so that the user's screening and analysis conditions are recorded in the corresponding table cells, and executing the screening for those cells returns several different groups of data samples.
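Step 115 turns the recorded conditions into a SELECT statement. The patent gives no details of this generator, so the Python sketch below rests on assumptions. Conditions are (field, operator, value) triples. Operators are checked against a whitelist, and values are passed as SQLite parameters rather than spliced into the SQL text. `build_select` and `ALLOWED_OPS` are hypothetical names.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Whitelisted comparison operators; anything else is rejected.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def quote_ident(name: str) -> str:
    """Quote an SQL identifier for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def build_select(
    table: str, conditions: Iterable[Tuple[str, str, object]], logic: str = "AND"
) -> Tuple[str, List[object]]:
    """Turn (field, operator, value) conditions into a parameterised SELECT."""
    if logic not in ("AND", "OR"):
        raise ValueError(f"unsupported logic: {logic!r}")
    clauses, params = [], []
    for field_name, op, value in conditions:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{quote_ident(field_name)} {op} ?")
        params.append(value)  # bound as a parameter, never concatenated
    sql = f"SELECT * FROM {quote_ident(table)}"
    if clauses:
        sql += " WHERE " + f" {logic} ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)
sql, params = build_select("sales", [("region", "=", "north"), ("amount", ">", 100)])
print(sql)  # SELECT * FROM "sales" WHERE "region" = ? AND "amount" > ?
print(conn.execute(sql, params).fetchall())  # [('north', 120.0)]
```

Because user-entered conditions end up in SQL, passing values as parameters and whitelisting operators keeps the generated statement from being altered by the condition text.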
The invention applies information theory, set theory and computer technology. It responds to the need in the information society to analyse data freely and to document analysis conditions and results. To that end, it proposes binding screening conditions on multidimensional data to the corresponding data elements, thereby recording the screened data samples the user wants. Each data element of the multidimensional data is given an environment for recording different screening conditions, which yields a multipoint data extraction function. For example, in a two-dimensional data table, screening conditions are recorded in the information locations associated with the data elements. Once the user has set the screening conditions, the expected screened data samples can be filtered out in the corresponding cells. The mathematical basis is set theory: subset conditions select subsets of elements from the complete set, and sets whose objects are conditions are built, yielding diverse subsets. From the point of view of information theory, information fusion techniques obtain homogeneous (similar) information from the overall information according to information requirements, so that any number of classification conditions can be realised and diverse clustered information samples obtained.
The foregoing describes only preferred embodiments of the invention and does not limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims (1)

1. A distributed multipoint data extraction method, characterised by comprising the following steps:
Step 101: target the external data source DB and its field structure;
Step 102: establish the data source table, then determine whether to establish an internal data source table; if yes, go to step 103: establish the internal data source table; if no, go to step 107: GUID location code generator; when an internal data source table is to be established, proceed through step 104: select the data fields to be imported, and step 105: add the internal data source table location field GUID, then step 106: generate the internal data source table structure; then, in step 107, the GUID location code generator processes the generated internal data source table structure, after which step 108 generates the internal data source table with location codes; from the internal data source table with location codes, step 109 establishes the location data table, and it is then determined whether to invoke step 110: constraint condition intelligent generator; if not, go to step 113: identify the table name, field name, record condition, time and customer name by GUID; if so, go to step 110: constraint condition intelligent generator, and then step 111: the user enters screening conditions, which the constraint condition intelligent generator checks for validity; if the conditions are valid, go to step 112: mark the screening conditions and colour at the cell location; if not, go to step 113: identify the table name, field name, record condition, time and customer name by GUID; step 113 yields the GUID condition of step 114, from which step 115 generates a SELECT statement; this leads to step 116: obtain the target data, after which step 117 performs a cluster analysis judgement on the obtained target data; finally, step 118 obtains the analysis report table from the cluster analysis judgement.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410208607.9A CN103970880B (en) 2014-05-17 2014-05-17 Distributed Multi data pick-up method


Publications (2)

Publication Number Publication Date
CN103970880A 2014-08-06
CN103970880B 2018-12-18

Family

ID=51240377


Country Status (1)

Country Link
CN (1) CN103970880B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909256B (en) * 2019-11-20 2020-11-24 华育昌(肇庆)智能科技研究有限公司 Artificial intelligence information filtering system for computer

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102339323A (en) * 2011-11-11 2012-02-01 江苏鸿信系统集成有限公司 Data extracting, scheduling and displaying method focused on DB2 data warehouse
CN102902750A (en) * 2012-09-20 2013-01-30 浪潮齐鲁软件产业有限公司 Universal data extraction and conversion method
CN103064659A (en) * 2011-10-21 2013-04-24 镇江金软计算机科技有限责任公司 Software as a service (SAAS) model based on metadata extraction user-defined worksheet system
CN103235807A (en) * 2013-04-19 2013-08-07 浪潮集团山东通用软件有限公司 Data extracting and processing method supporting high-concurrency large-volume data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9727628B2 (en) * 2008-08-11 2017-08-08 Innography, Inc. System and method of applying globally unique identifiers to relate distributed data sources
US8775476B2 (en) * 2010-12-30 2014-07-08 Skai, Inc. System and method for creating, deploying, integrating, and distributing nodes in a grid of distributed graph databases


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Extraction of non-tagged tables in semi-structured documents; Song Qiang et al.; Computer Engineering; 2005-09-30; Vol. 31, No. 18; pp. 81-83, 171 *

Also Published As

Publication number Publication date
CN103970880A (en) 2014-08-06

Similar Documents

Publication Publication Date Title
CN106599230A (en) Method and system for evaluating distributed data mining model
Taylor et al. R package wgaim: QTL analysis in bi-parental populations using linear mixed models
CN101739454B (en) Data processing system
Li et al. Assembly processes of waterbird communities across subsidence wetlands in China: A functional and phylogenetic approach
CN109101519B (en) Information acquisition system and heterogeneous information fusion system
Hankin Introducing untb, an R package for simulating ecological drift under the unified neutral theory of biodiversity
CN108491228A (en) A kind of binary vulnerability Code Clones detection method and system
CN110336838A (en) Account method for detecting abnormality, device, terminal and storage medium
CN107092932A (en) A kind of multi-tag Active Learning Method that tally set is relied on based on condition
CN108446720A (en) Abnormal deviation data examination method and system
CN110019116A (en) Data traceability method, apparatus, data processing equipment and computer storage medium
CN103970880B (en) Distributed Multi data pick-up method
Burdick et al. Table extraction and understanding for scientific and enterprise applications
Chalmandrier et al. Comparing spatial diversification and meta-population models in the Indo-Australian Archipelago
CN105843605A (en) Data mapping data and device
CN103227810B (en) A kind of methods, devices and systems identifying remote desktop semanteme in network monitoring
CN109064036B (en) Ecosystem service supply and demand index change detection method facing management field
CN112000389B (en) Configuration recommendation method, system, device and computer storage medium
CN103092617A (en) High reliability workflow development method based on backup services
CN110363198A (en) A kind of neural network weight matrix fractionation and combined method
CN115995092A (en) Drawing text information extraction method, device and equipment
Lu et al. Bi-temporal Attention Transformer for Building Change Detection and Building Damage Assessment
CN105184168B (en) The method for tracing that the association of android system source code loophole influences
CN114331226A (en) Intelligent enterprise demand diagnosis method and system and storage medium
CN114328681A (en) Data conversion method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant