CN103970880B

CN103970880B - Distributed multipoint data extraction method

Info

Publication number: CN103970880B
Application number: CN201410208607.9A
Authority: CN
Inventors: 白崇明
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-05-17
Filing date: 2014-05-17
Publication date: 2018-12-18
Anticipated expiration: 2034-05-17
Also published as: CN103970880A

Abstract

The present invention relates to a distributed multipoint data extraction method comprising the following steps: step 101: target the external data source DB and its field structure; step 102: establish the data source table; step 103: establish the internal data source table; step 104: select the data fields to import; step 105: add the data table location field GUID; step 106: generate the internal data source table structure; step 107: GUID location code generator; step 108: generate the internal data source table with location codes; step 109: establish the program location data table; step 110: constraint condition intelligent generator; step 111: the user enters screening conditions; step 112: mark the screening conditions and color at the cell location; step 113: identify the table name, field name, record condition, time, and customer name by GUID; step 114: GUID condition; step 115: generate the SELECT statement; step 116: obtain the target data; step 117: cluster analysis judgment; step 118: analysis report table. The method allows the user to obtain any required number of screened data results.

Description

Distributed multipoint data extraction method

Technical field

The present invention relates to the technical field of data processing, and in particular to a distributed multipoint data extraction method.

Background art

Data analysis usually works by screening data to obtain the data elements that satisfy a condition. At present, data screening is implemented by writing program statements on data platforms such as SQL, Access, and Oracle. The advantage is that statements, functions, and the like can be programmed to produce a wide variety of screening results. However, these platforms do not let the user perform data screening directly through a mouse- or keyboard-driven command interface, nor do they directly construct screening conditions and bind and record them with data elements. In Excel, screening conditions can be set and a screening result obtained, but the user's screening conditions cannot be saved or bound to cells. No other existing domestic or foreign application or special-purpose software, and no published information, presents the distributed multipoint data extraction technique to which the claims relate.

Summary of the invention

The object of the present invention is to solve the above problems by providing a distributed multipoint data extraction method.

To achieve the above object, the present invention provides a distributed multipoint data extraction method comprising the following steps:

Step 101: target the external data source DB and its field structure;

Step 102: establish the data source table, then determine whether an internal data source table needs to be established. If so, proceed to step 103: establish the internal data source table; if not, proceed to step 107: GUID location code generator. When an internal data source table is needed, proceed to step 104: select the data fields to import, and step 105: add the internal data source table location field GUID, then to step 106: generate the internal data source table structure. Next, in step 107, the GUID location code generator processes the generated internal data source table structure, followed by step 108: generate the internal data source table with location codes. With that table generated, proceed to step 109: establish the location data table, and determine whether step 110, the constraint condition intelligent generator, is to be invoked. If not, proceed to step 113: identify the table name, field name, record condition, time, and customer name by GUID. If so, proceed to step 110: constraint condition intelligent generator, then to step 111: the user enters screening conditions. The constraint condition intelligent generator judges whether the entered screening conditions meet the constraints: if they do, proceed to step 112: mark the screening conditions and color at the cell location; if they do not, proceed to step 113: identify the table name, field name, record condition, time, and customer name by GUID. Step 113 yields the GUID condition of step 114, which is passed to step 115: generate the SELECT statement. Then proceed to step 116: obtain the target data, and pass the obtained target data to step 117: cluster analysis judgment. Finally, the cluster analysis judgment yields step 118: analysis report table.
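The branching among the steps above can be traced with a short Python sketch. The function name `trace_steps` and its three boolean parameters are hypothetical. The sketch also assumes that step 112 continues to step 113, which the text does not state explicitly.

```python
def trace_steps(need_internal_table: bool,
                use_constraint_generator: bool,
                conditions_valid: bool) -> list[int]:
    """Return the step numbers visited for one pass through the method."""
    steps = [101, 102]
    if need_internal_table:
        # Steps 103-106 build the internal data source table structure.
        steps += [103, 104, 105, 106]
    # Steps 107-109: GUID location codes and the location data table.
    steps += [107, 108, 109]
    if use_constraint_generator:
        steps += [110, 111]
        if conditions_valid:
            # Assumption: after marking the cell, the flow goes on to step 113.
            steps.append(112)
    # Steps 113-118: GUID condition, SELECT, target data, clustering, report.
    steps += [113, 114, 115, 116, 117, 118]
    return steps


full_path = trace_steps(True, True, True)
short_path = trace_steps(False, False, False)
```

Here `full_path` visits every step from 101 to 118, while `short_path` skips steps 103 to 106 and 110 to 112.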

The invention has the following advantages: using the method of the invention, the user can set any number of data screening conditions in full without writing program statements, obtain any required number of screened data results, and record any number of screening condition combinations in the data table.

Brief description of the drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the distributed multipoint data extraction method of the present invention.

Specific embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.

Referring to Fig. 1, the present invention provides a distributed multipoint data extraction method comprising the following steps:

Step 101: target the external data source DB and its field structure;

In a two-dimensional data table, cell location information serves as the binding point for recording the data analysis and screening conditions set by the user. The screening conditions set in the cells of a row are associated through mathematical logic, and a data screening statement extracts the data samples that satisfy the combined condition. The sets of screening conditions formed by several cells and data rows associate the data the user needs to screen with each data cell in the form of statements. Because the conditions are distributed across the data cells, this forms a distributed multipoint data extraction technique.
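One way to picture conditions bound to cell locations is a mapping keyed by row identifier and column name. The following Python sketch is illustrative only: `ScreeningCondition`, `ConditionStore`, and the sample row and column names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCondition:
    """One user-set condition, e.g. amount > 100 (hypothetical structure)."""
    column: str
    operator: str
    value: object


class ConditionStore:
    """Records screening conditions against cell locations."""

    def __init__(self) -> None:
        # Key: (row identifier, column name), i.e. the cell location.
        self._by_cell: dict[tuple[str, str], ScreeningCondition] = {}

    def bind(self, row_id: str, column: str, operator: str, value: object) -> None:
        """Bind a condition to the cell at (row_id, column)."""
        self._by_cell[(row_id, column)] = ScreeningCondition(column, operator, value)

    def conditions_for_row(self, row_id: str) -> list[ScreeningCondition]:
        """Collect every condition bound to cells of one row."""
        return [cond for (rid, _), cond in self._by_cell.items() if rid == row_id]


store = ConditionStore()
store.bind("row-1", "region", "=", "north")
store.bind("row-1", "amount", ">", 100)
store.bind("row-2", "region", "=", "south")
row_1_conditions = store.conditions_for_row("row-1")
```

Each row then carries its own set of conditions, which is what allows several independent extractions to be defined in a single table.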

For example, for two-dimensional data, define the column identifier $X$ and the row identifier $Y$:

Column set: $X = \{X_1, X_2, X_3, X_4, X_5, \ldots, X_n\}$

Row set: $Y = \{Y_1, Y_2, Y_3, Y_4, Y_5, \ldots, Y_m\}$

$i$ is the row number: $i \in \{1, 2, 3, 4, 5, \ldots, m\}$

$j$ is the column number: $j \in \{1, 2, 3, 4, 5, \ldots, n\}$

Column subset: $X_j = \{D_{1j}, D_{2j}, D_{3j}, D_{4j}, D_{5j}, \ldots, D_{mj}\}$, $X_j \subseteq XY$, the full set of column $j$

Row subset: $Y_i = \{D_{i1}, D_{i2}, D_{i3}, D_{i4}, D_{i5}, \ldots, D_{in}\}$, $Y_i \subseteq XY$, the full set of row $i$

Row-column subset: $D_{xy} = \{D_{ij}\}$

Data cell (element) $D$: $D_{ij}$

$D_{xj} \subseteq X_j$: $D_{xj}$ is a subset of the column-$j$ set;

$D_{yi} \subseteq Y_i$: $D_{yi}$ is a subset of the row-$i$ set.
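The column and row sets defined above can be illustrated on a small table in Python. This is a minimal sketch: the sample values and the helper names `column_set` and `row_set` are assumptions, and indices are 0-based here whereas the text counts from 1.

```python
# A table with m = 3 rows and n = 2 columns; table[i][j] plays the role of D_ij.
table = [
    ["north", 120],
    ["south", 80],
    ["north", 45],
]


def column_set(data: list[list], j: int) -> list:
    """Return X_j: every element in column j (0-based)."""
    return [row[j] for row in data]


def row_set(data: list[list], i: int) -> list:
    """Return Y_i: every element in row i (0-based)."""
    return list(data[i])


x_1 = column_set(table, 1)  # all values of the second column
y_0 = row_set(table, 0)     # all values of the first row
```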

First, set a condition and extract the column sample set:

A sample extraction condition $P_{ij}$ is set in cell $D_{ij}$, and the subset $D_{xj}$ of elements satisfying $P_{ij}$ is sought in the field column:

$D_{xj} \subseteq X_j$ is expressed as $D_{xj} = \{X_{ji} \mid P_{ij}\}$, where $X_{ji}$ denotes an element of the column set $X_j$ and $P_{ij}$ is the condition for selecting elements from $X_j$.

That is, $D_{xj}$ is the sample set extracted from the column-$j$ set that satisfies $P_{ij}$.
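Extracting a column sample set amounts to filtering one column with a predicate. A minimal Python sketch follows, in which the function name `extract_column_samples`, the sample values, and the threshold of 100 are all hypothetical.

```python
from typing import Callable, Iterable


def extract_column_samples(column_values: Iterable[float],
                           condition: Callable[[float], bool]) -> list[float]:
    """Return the elements of one column that satisfy the cell's condition."""
    return [value for value in column_values if condition(value)]


# Column X_j holds amounts; the condition P_ij is "amount > 100".
amounts = [120, 80, 45, 200]
samples = extract_column_samples(amounts, lambda value: value > 100)
```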

Second, set the condition set $P_i$ for sample extraction in the row record:

$P_i = \{P_{xj}\}$

This expresses the combination of the conditions set in the columns ($X$) of row $i$. These conditions form a mathematical-logic set according to their associated logic and serve as the condition set for extracting samples from the full set $XY$.

Third, the condition set $P_i$ extracts a multi-row, multi-column element sample subset $D_{xy}$ from the full set $XY$:

$D_{xy} = \{XY \mid P_i\}$
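Combining the conditions of one row and applying them to the full table can be sketched in Python as follows. The sketch assumes the conditions are joined with logical AND, which is only one possible reading of the "associated logic" in the text. The `OPERATORS` table, the function names, and the sample records are hypothetical.

```python
import operator

# Supported comparison operators (an illustrative subset).
OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

records = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 45},
]


def matches(record: dict, condition_set: list[tuple]) -> bool:
    """True when the record satisfies every (column, operator, value) condition."""
    return all(OPERATORS[op](record[col], value) for col, op, value in condition_set)


def extract(data: list[dict], condition_set: list[tuple]) -> list[dict]:
    """Return the records of the full table that satisfy the condition set."""
    return [record for record in data if matches(record, condition_set)]


# P_i: the conditions set in the cells of one row, combined with AND.
p_i = [("region", "=", "north"), ("amount", ">", 100)]
d_xy = extract(records, p_i)
```

An empty condition set matches every record, because `all` of an empty sequence is true.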

Fourth, $P_n$ is the set of the conditions $P_i$ of $n$ rows, which extracts multiple sample sets $D_{xy}$. Denoting the full set of conditions in the whole two-dimensional data as $P_n$:

$P_n = \{P_i\}$

The extracted sample set is $D_n$: $D_n = \{D_{xy} \mid P_n\}$, $D_n \subseteq XY$
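When several rows each carry their own condition set, each one yields its own sample set. The Python sketch below represents each row's conditions as a named predicate. The names `large_north` and `small_any`, the function `extract_all`, and the sample records are hypothetical.

```python
from typing import Callable

records = [
    {"region": "north", "amount": 120},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 45},
]

# P_n: one predicate per condition row (each stands for one P_i).
condition_rows: dict[str, Callable[[dict], bool]] = {
    "large_north": lambda r: r["region"] == "north" and r["amount"] > 100,
    "small_any": lambda r: r["amount"] < 100,
}


def extract_all(data: list[dict],
                conditions: dict[str, Callable[[dict], bool]]) -> dict[str, list[dict]]:
    """Return one sample set per condition row, keyed by the row's name."""
    return {
        name: [record for record in data if predicate(record)]
        for name, predicate in conditions.items()
    }


d_n = extract_all(records, condition_rows)
```

The sample sets may overlap, since each condition row is evaluated independently against the full table.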

With the method of the invention, a unique row ID number is established in the two-dimensional data table and, together with the fields of the specific table, locks the location information of each data cell. The application program generates a software operation interface in which the user sets data screening conditions and their mathematical-logic relationships according to actual needs. The program binds the data cell location information to the screening conditions, so that the user's screening and analysis conditions are effectively described in the corresponding table cells, and executing the screening on those cells obtains several different groups of data samples.
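The unique row ID and the generated screening statement can be sketched with Python's standard `sqlite3` and `uuid` modules. This is only an illustration under stated assumptions: the table name `source`, its columns, the sample rows, and the helper `build_select` are hypothetical, and joining the conditions with AND is an assumption rather than something the text specifies.

```python
import sqlite3
import uuid

TABLE = "source"  # hypothetical internal data source table
ALLOWED_COLUMNS = {"region", "amount"}
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (guid TEXT PRIMARY KEY, region TEXT, amount REAL)")
for region, amount in [("north", 120.0), ("south", 80.0), ("north", 45.0)]:
    # Each row receives a GUID that serves as its unique location code.
    conn.execute(f"INSERT INTO {TABLE} VALUES (?, ?, ?)",
                 (str(uuid.uuid4()), region, amount))


def build_select(conditions: list[tuple[str, str, object]]) -> tuple[str, list]:
    """Turn (column, operator, value) conditions into a parameterized SELECT."""
    clauses, params = [], []
    for column, op, value in conditions:
        # Only whitelisted names reach the SQL text; values are bound as parameters.
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT guid, region, amount FROM {TABLE} WHERE {where}", params


sql, params = build_select([("region", "=", "north"), ("amount", ">", 100)])
result = conn.execute(sql, params).fetchall()
```

Binding values through `?` placeholders keeps user-entered screening values out of the SQL text, which matters when conditions come from a user interface.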

The invention applies information theory, set theory, and computer technology. Summarizing the information society's need to record analysis conditions and results freely during data analysis, it proposes binding the screening conditions for multidimensional data to the corresponding data elements and recording the screened data samples the user needs. Each data element in the multidimensional data is given an environment for recording different screening conditions, which produces the multipoint data extraction function. For example, in a two-dimensional data table, screening conditions are recorded at the position of information related to a data element; once the user has set the screening conditions, the expected screened data samples can be filtered out from the corresponding cell. The mathematical basis is a set algorithm driven by conditions: subset element samples are selected from the full set according to subset conditions, and a set whose objects are conditions is established, so that diversified subsets are obtained. From the viewpoint of information theory, information fusion techniques obtain homogeneous (similar) information from the overall information according to information needs, and any number of classification conditions can be realized to obtain diversified cluster information samples.

The above are only preferred embodiments of the invention and do not limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.

Claims

1. A distributed multipoint data extraction method, characterized by comprising the following steps:

Step 101: target the external data source DB and its field structure;

Step 102: establish the data source table, then determine whether an internal data source table needs to be established; if so, proceed to step 103: establish the internal data source table; if not, proceed to step 107: GUID location code generator; when an internal data source table is needed, proceed to step 104: select the data fields to import, and step 105: add the internal data source table location field GUID, then to step 106: generate the internal data source table structure; next, in step 107, the GUID location code generator processes the generated internal data source table structure, followed by step 108: generate the internal data source table with location codes; with that table generated, proceed to step 109: establish the location data table, and determine whether step 110, the constraint condition intelligent generator, is to be invoked; if not, proceed to step 113: identify the table name, field name, record condition, time, and customer name by GUID; if so, proceed to step 110: constraint condition intelligent generator, then to step 111: the user enters screening conditions; the constraint condition intelligent generator judges whether the entered screening conditions meet the constraints; if they do, proceed to step 112: mark the screening conditions and color at the cell location; if they do not, proceed to step 113: identify the table name, field name, record condition, time, and customer name by GUID; step 113 yields the GUID condition of step 114, which is passed to step 115: generate the SELECT statement; then proceed to step 116: obtain the target data, and pass the obtained target data to step 117: cluster analysis judgment; finally, the cluster analysis judgment yields step 118: analysis report table.