CN103970880A

CN103970880A - Distributed multi-point data extraction method

Info

Publication number: CN103970880A
Application number: CN201410208607.9A
Authority: CN
Inventors: 白崇明
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-05-17
Filing date: 2014-05-17
Publication date: 2014-08-06
Anticipated expiration: 2034-05-17
Also published as: CN103970880B

Abstract

The invention relates to a distributed multi-point data extraction method comprising the following steps: (101) establishing a data source table for an external data source DB and its field structure; (102) building the data source table and deciding whether an internal data source table is needed; (103) establishing an internal data source table; (104) selecting the data fields to be imported; (105) adding a data-table positioning field, GUID; (106) generating the internal data source table structure; (107) applying a GUID alignment code generator; (108) generating the internal data source table with alignment codes; (109) creating a program positioning data table; (110) applying an intelligent constraint condition generator; (111) the user entering screening conditions; (112) marking the screening conditions and a color at the cell position; (113) marking the table name, field name, record condition, time and client name by GUID; (114) setting the GUID condition; (115) generating the SELECT statement; (116) obtaining the target data; (117) performing cluster analysis and judgment; and (118) analyzing the report table. With this method, a user can obtain any number of required data screening results.

Description

Distributed multi-point data extraction method

Technical field

The present invention relates to the technical field of data processing, and specifically to a distributed multi-point data extraction method.

Background technology

Conventionally, data analysis works mainly by screening data to obtain the data elements that satisfy given conditions. At present, on data platforms such as SQL, Access and Oracle, data screening is achieved by writing program statements. The advantage is that, through statement functions and similar features, hand-written statements can produce a wide variety of screening results. However, these platforms do not allow data screening to be performed directly through a graphical interface by mouse or keyboard commands, nor can they construct screening conditions directly and bind them to data elements for recording. In Excel, screening conditions can be set to obtain screening results, but the user's screening conditions cannot be saved, let alone bound to cells. Furthermore, in the published information, no other existing domestic or foreign application or dedicated software provides the distributed multi-point data extraction technology to which the claims relate.

Summary of the invention

The object of the present invention is to solve the above problems by providing a distributed multi-point data extraction method.

To achieve the above object, the invention provides a distributed multi-point data extraction method comprising the following steps:

Step 101: Establish a data source table for the external data source DB and its field structure.
Step 102: Based on the data source table, judge whether to proceed to Step 103 (establish an internal data source table). If no internal data source table is needed, go directly to Step 107 (GUID alignment code generator).
Step 103: If an internal data source table is needed, establish it through Steps 104 to 106.
Step 104: Select the data fields to be imported.
Step 105: Add the data-table positioning field GUID.
Step 106: Generate the internal data source table structure, then proceed to Step 107.
Step 107: The GUID alignment code generator processes the generated internal data source table structure.
Step 108: Generate the internal data source table with alignment codes.
Step 109: For the internal data source table with alignment codes, create the program positioning data table and determine whether to invoke Step 110 (intelligent constraint condition generator). If not, go to Step 113.
Step 110: If so, invoke the intelligent constraint condition generator and proceed to Step 111.
Step 111: The user enters screening conditions, and the intelligent constraint condition generator judges whether they are valid. If they are, go to Step 112; if not, go to Step 113.
Step 112: Mark the screening conditions and a color at the cell position.
Step 113: Mark the table name, field name, record condition, time and client name by GUID.
Step 114: Generate the GUID condition from the result of Step 113.
Step 115: Generate a SELECT statement from the GUID condition.
Step 116: Obtain the target data.
Step 117: Perform cluster analysis and judgment on the target data obtained.
Step 118: From the cluster analysis and judgment, produce the analysis report table.
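Steps 104 to 108 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes SQLite as the database and assumes that the "GUID alignment code" simply gives every copied row a unique identifier, so that any cell can later be located by the pair (GUID, field name). The function `build_internal_table` and the `sales` table are hypothetical; the patent specifies neither the schema nor the GUID algorithm.

```python
import sqlite3
import uuid


def build_internal_table(conn, source_table, fields, internal_table):
    """Hypothetical helper: copy the selected fields (Step 104) into a new
    internal table carrying a GUID positioning field (Steps 105-108, as assumed)."""
    cols = ", ".join(fields)
    # Identifiers are interpolated directly, so they must come from a trusted list.
    conn.execute(f"CREATE TABLE {internal_table} (guid TEXT PRIMARY KEY, {cols})")
    rows = conn.execute(f"SELECT {cols} FROM {source_table}").fetchall()
    placeholders = ", ".join("?" * (len(fields) + 1))
    conn.executemany(
        f"INSERT INTO {internal_table} (guid, {cols}) VALUES ({placeholders})",
        # Assumption: one fresh GUID per row serves as that row's alignment code.
        [(str(uuid.uuid4()), *row) for row in rows],
    )


# Hypothetical external source table with three fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, note TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("north", 120.0, "a"), ("south", 80.0, "b")])
build_internal_table(conn, "sales", ["region", "amount"], "sales_internal")
print(conn.execute("SELECT COUNT(DISTINCT guid) FROM sales_internal").fetchone()[0])  # 2
```

Only the fields chosen in Step 104 are copied, and because every row has its own GUID, the internal table can be re-sorted or filtered without losing the link between a cell and any condition later bound to it.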

The present invention has the following beneficial effect: with the method of the present invention, and without writing code statements, a user can freely set any number of data screening conditions, obtain any number of required screening results, and record any number of combinations of screening conditions in the data table.

Brief description of the drawings

In order to be illustrated more clearly in the embodiment of the present invention or technical scheme of the prior art, to the accompanying drawing of required use in embodiment or description of the Prior Art be briefly described below, apparently, accompanying drawing in the following describes is only some embodiments of the present invention, for those of ordinary skill in the art, do not paying under the prerequisite of creative work, can also obtain according to these accompanying drawings other accompanying drawing.

Fig. 1 is Distributed Multi data pick-up method processing flow chart of the present invention.

Embodiment

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

Referring to Fig. 1, the invention provides a distributed multi-point data extraction method comprising Steps 101 to 118 as set out in the Summary of the invention above.

In a two-dimensional data table, the position of each cell serves as the binding point for recording the data screening conditions set by the user. The screening conditions placed in the cells of a row are associated by mathematical logic, and a data filter statement is applied to extract the data samples that meet the combined condition. The sets of screening conditions formed by several cells and data rows thus associate the data the user wants to screen with each data cell in statement form, distributed over the data cells of the table. This constitutes the distributed multi-point data extraction technology.
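The association of the conditions in one row into a single filter statement (Steps 114 and 115) might look like the sketch below. It assumes, as a simplification, that a row's conditions are joined with AND and are stored as hypothetical `(operator, value)` pairs keyed by field name; the patent fixes neither the representation nor the logic operator, and `build_select` is an illustrative name only.

```python
import sqlite3

# Assumption: only simple comparison operators are allowed in a cell condition.
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_select(table, row_conditions):
    """Hypothetical helper: combine the conditions bound to the cells of one
    row (field -> (operator, value)) into one AND-ed, parameterized SELECT."""
    clauses, params = [], []
    for field, (op, value) in row_conditions.items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 120.0), ("north", 40.0), ("south", 200.0)])
sql, params = build_select("sales", {"region": ("=", "north"), "amount": (">", 100)})
print(sql)                                   # SELECT * FROM sales WHERE region = ? AND amount > ?
print(conn.execute(sql, params).fetchall())  # [('north', 120.0)]
```

Values are passed as query parameters rather than spliced into the SQL text, which keeps user-entered screening conditions from altering the statement itself.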

The present invention uses the following notation. For two-dimensional data, let X denote the column identifiers and Y the row identifiers:

Column set: $X = \{X_1, X_2, X_3, \ldots, X_n\}$

Row set: $Y = \{Y_1, Y_2, Y_3, \ldots, Y_m\}$

$XY = \bigcup_{i=1}^{m} \bigcup_{j=1}^{n} (X_j, Y_i) = \{D_{ij}\}$

where i is the row index, $i \in \{1, 2, 3, \ldots, m\}$, and j is the column index, $j \in \{1, 2, 3, \ldots, n\}$.

Column subset: $X_j = \{D_{1j}, D_{2j}, D_{3j}, \ldots, D_{mj}\}$; $X_j \subseteq XY$ is the complete set of column j.

Row subset: $Y_i = \{D_{i1}, D_{i2}, D_{i3}, \ldots, D_{in}\}$; $Y_i \subseteq XY$ is the complete set of row i.

Row-column subset: $D_{xy} = \{D_{ij}\}$

Data cell (element): $D_{ij}$

$D_{xj} \subseteq X_j$: $D_{xj}$ is a subset of the column-j set.

$D_{yi} \subseteq Y_i$: $D_{yi}$ is a subset of the row-i set.
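The notation can be mirrored in a small data structure. In this sketch (an illustration, not part of the patent) the table XY is a dictionary keyed by the 1-based pair (i, j), so each value is the cell $D_{ij}$; the helper names `make_table`, `column` and `row` are hypothetical.

```python
def make_table(rows):
    """Hypothetical helper: return XY as a dict mapping (i, j) to the cell D_ij,
    with 1-based row index i and column index j as in the notation."""
    return {(i, j): value
            for i, row in enumerate(rows, start=1)
            for j, value in enumerate(row, start=1)}


def column(xy, j):
    """X_j = {D_1j, ..., D_mj}: every cell of column j."""
    return {(i, jj): v for (i, jj), v in xy.items() if jj == j}


def row(xy, i):
    """Y_i = {D_i1, ..., D_in}: every cell of row i."""
    return {(ii, j): v for (ii, j), v in xy.items() if ii == i}


xy = make_table([["north", 120.0], ["north", 40.0], ["south", 200.0]])  # m = 3, n = 2
print(column(xy, 2))  # {(1, 2): 120.0, (2, 2): 40.0, (3, 2): 200.0}
print(row(xy, 3))     # {(3, 1): 'south', (3, 2): 200.0}
```

Keeping the (i, j) position as the key is what lets a condition be attached to an individual cell rather than to the table as a whole.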

First, impose a condition to extract a column sample set:

At cell $D_{ij}$, set a sampling condition $P_{ij}$, and from the field column obtain the element subset $D_{xj}$ that satisfies $P_{ij}$:

$D_{xj} = \{X_j \mid P_{ij}\}, \quad D_{xj} \subseteq X_j$, where $P_{ij}$ is a condition element for obtaining a subset of the column set $X_j$.

That is, $D_{xj}$ is the sample set, extracted from column j, that meets $P_{ij}$.
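As a sketch of this first step, assuming the table is held as a list of rows and the condition $P_{ij}$ is an ordinary predicate on a single value (both assumptions for illustration; `column_sample` is a hypothetical name):

```python
from typing import Any, Callable, List


def column_sample(table: List[List[Any]], j: int,
                  p_ij: Callable[[Any], bool]) -> List[Any]:
    """Hypothetical helper: D_xj = {X_j | P_ij}, the elements of column j
    (1-based) that satisfy the condition P_ij."""
    return [row[j - 1] for row in table if p_ij(row[j - 1])]


table = [["north", 120.0], ["north", 40.0], ["south", 200.0]]
print(column_sample(table, 2, lambda v: v > 100))  # [120.0, 200.0]
```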

Second, set the condition set $P_i$ for sample extraction in row i:

$P_i = \{P_{ij}\}$

This expresses the combination of conditions set in the columns (X) of row i. These conditions are joined according to their logical relationships into a mathematical-logic set, which serves as the condition set for extracting samples from the complete set XY.
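One way to read "joined according to their logical relationships" is to fold the per-column predicates $P_{ij}$ of row i into a single row predicate with AND or OR. The sketch below assumes exactly those two operators; the patent does not enumerate the allowed logic, and `combine` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List

Predicate = Callable[[Any], bool]


def combine(conditions: Dict[int, Predicate],
            logic: str = "and") -> Callable[[List[Any]], bool]:
    """Hypothetical helper: build a row predicate for P_i from the conditions
    P_ij set in row i. `conditions` maps a 1-based column index j to P_ij."""
    def holds(row: List[Any], j: int, p: Predicate) -> bool:
        return p(row[j - 1])

    # Assumption: only AND and OR are supported as the logical relationship.
    if logic == "and":
        return lambda row: all(holds(row, j, p) for j, p in conditions.items())
    if logic == "or":
        return lambda row: any(holds(row, j, p) for j, p in conditions.items())
    raise ValueError(f"unsupported logic: {logic}")


p_and = combine({1: lambda v: v == "north", 2: lambda v: v > 100})
print(p_and(["north", 120.0]), p_and(["north", 40.0]))  # True False
```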

Third, use the condition set $P_i$ to extract a multi-row, multi-column element sample subset $D_{xy}$ from the complete set XY:

$D_{xy} = \{XY \mid P_i\}$
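The third step then filters whole rows of XY with the row predicate for $P_i$. In this sketch the optional column projection is an added assumption used to show the "multi-row, multi-column" shape of $D_{xy}$; `extract` and the sample predicate are hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence


def extract(table: List[List[Any]], p_i: Callable[[List[Any]], bool],
            columns: Optional[Sequence[int]] = None) -> List[List[Any]]:
    """Hypothetical helper: D_xy = {XY | P_i}, the rows of XY satisfying P_i,
    optionally restricted to the given 1-based columns."""
    selected = [row for row in table if p_i(row)]
    if columns is None:
        return selected
    return [[row[j - 1] for j in columns] for row in selected]


table = [["north", 120.0, "a"], ["north", 40.0, "b"], ["south", 200.0, "c"]]


def p_i(row):
    # Hypothetical AND-ed row condition: region is north and amount exceeds 100.
    return row[0] == "north" and row[1] > 100


print(extract(table, p_i))          # [['north', 120.0, 'a']]
print(extract(table, p_i, [1, 3]))  # [['north', 'a']]
```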

Fourth, let $P_n$ be the set of row conditions $P_i$ over the n condition rows; it extracts multiple groups of sample sets $D_{xy}$. We therefore take $P_n$ as the complete set of conditions over the whole two-dimensional data table: $P_n = \{P_i\}$

The extracted sample sets are $D_n = \{D_{xy} \mid P_n\}$, where each $D_{xy} \subseteq XY$.
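Applying every row condition in $P_n$ yields one sample set per condition. The sketch below keys each $D_{xy}$ by a label for its $P_i$; the labels, the dictionary representation and `extract_all` are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List

Row = List[Any]


def extract_all(table: List[Row],
                condition_rows: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    """Hypothetical helper: D_n = {D_xy | P_n}, one sample set D_xy for each
    row condition P_i in P_n."""
    return {name: [row for row in table if p_i(row)]
            for name, p_i in condition_rows.items()}


table = [["north", 120.0], ["north", 40.0], ["south", 200.0]]
p_n = {
    "P_1": lambda row: row[0] == "north",
    "P_2": lambda row: row[1] >= 100,
}
print(extract_all(table, p_n))
# {'P_1': [['north', 120.0], ['north', 40.0]], 'P_2': [['north', 120.0], ['south', 200.0]]}
```

Because the sample sets may overlap (the row `['north', 120.0]` appears in both), $D_n$ is a collection of subsets of XY rather than a partition of it.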

With the method of the present invention, a unique row ID is established in the two-dimensional data table; together with the field of a specific table, it locks the position of each data cell. An application program provides a software interface in which the user sets data screening conditions and their mathematical-logic relationships according to actual needs. Programmatically, the position of each data cell is bound to its combination of screening conditions, so that the user's screening and analysis conditions are effectively recorded in the corresponding table cells. Executing the screening at the corresponding cells yields several different groups of data samples.
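The binding of a condition to a cell located by (row ID, field) can be persisted in an ordinary table, which also matches the items Step 113 marks by GUID (table name, field name, record condition, time, client name). The schema below (`condition_log` and its columns) and the helper functions are hypothetical, and SQLite is assumed only for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def create_condition_log(conn):
    # Hypothetical schema: one row per screening condition bound to a cell.
    conn.execute(
        "CREATE TABLE condition_log ("
        " id INTEGER PRIMARY KEY,"
        " row_guid TEXT, table_name TEXT, field_name TEXT,"
        " condition_text TEXT, created_at TEXT, client_name TEXT)"
    )


def record_condition(conn, row_guid, table_name, field_name, condition_text, client_name):
    """Bind a screening condition to the cell located by (row_guid, field_name)."""
    conn.execute(
        "INSERT INTO condition_log (row_guid, table_name, field_name,"
        " condition_text, created_at, client_name) VALUES (?, ?, ?, ?, ?, ?)",
        (row_guid, table_name, field_name, condition_text,
         datetime.now(timezone.utc).isoformat(), client_name),
    )


def conditions_for_cell(conn, row_guid, field_name):
    """Return the conditions recorded for one cell, oldest first."""
    cur = conn.execute(
        "SELECT condition_text FROM condition_log"
        " WHERE row_guid = ? AND field_name = ? ORDER BY id",
        (row_guid, field_name),
    )
    return [text for (text,) in cur]


conn = sqlite3.connect(":memory:")
create_condition_log(conn)
record_condition(conn, "guid-0001", "sales_internal", "amount", "amount > 100", "client A")
record_condition(conn, "guid-0001", "sales_internal", "region", "region = 'north'", "client A")
print(conditions_for_cell(conn, "guid-0001", "amount"))  # ['amount > 100']
```

Storing conditions as rows, instead of inside the data cells themselves, lets any number of conditions accumulate per cell and be retrieved later, which is the persistence the Background section notes Excel lacks.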

The present invention applies information theory, set theory, computer technology and related disciplines. Addressing the need of the information society to freely record the conditions and results of data analysis, it proposes binding the screening conditions of multidimensional data to the corresponding data elements, so as to record the screened data samples the user needs. Every data element of the multidimensional data can then record its own screening conditions, which yields the multi-point data extraction function. For example, in a two-dimensional data table, screening conditions are recorded at the position information of the relevant data element; once the user has set them, the expected screened data sample can be obtained at each corresponding cell. Mathematically, the method rests on conditional subsets: conditional set operations select subsets of elements from the complete set, and sets whose members are conditions are established, yielding diversified subsets. From the standpoint of information theory, information fusion is used to obtain, according to information requirements, homogeneous (approximately identical) information from the overall information, so that any number of classification conditions can be applied to obtain diversified clustered information samples.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A distributed multi-point data extraction method, characterized by comprising the following steps: