CN107025288A - Distributed data mining method and system - Google Patents

Publication number
CN107025288A
Authority
CN
China
Prior art keywords
model
data
information
mining
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710241931.4A
Other languages
Chinese (zh)
Inventor
李存昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuding Credit Suisse Software Development Co Ltd
Original Assignee
Sichuan Jiuding Credit Suisse Software Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuding Credit Suisse Software Development Co Ltd
Priority to CN201710241931.4A
Publication of CN107025288A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a distributed data mining method and system. The system includes a first server and a distributed cluster group, and the distributed cluster group includes a plurality of second servers. The first server obtains the mining data information selected by a user in a browser interface. The first server performs variable definition based on the mining data information, in preparation for model computation. The first server obtains the model information selected by the user in the browser interface, and sends the mining data information and the selected model information to the distributed cluster group. The distributed cluster group receives the mining data information and the selected model information sent by the first server, and performs model computation and mining analysis according to the model selected by the user. Thus, by adopting a distributed architecture, the data processing capacity can be scaled horizontally and the definition of data models is optimized; no highly specialized client is needed, which lowers the professional requirements placed on technical staff and reduces learning costs.

Description

Distributed data mining method and system
Technical field
The present invention relates to the field of computer technology, and in particular to a distributed data mining method and system.
Background
In existing data mining technology, usually only a single server is configured to process data, so the data processing capacity is small and the load on the server is heavy. In addition, models must be defined on a highly specialized client interface and implemented in a specialized programming language. As a result, the technical staff who carry out data mining need a high level of professional skill, which increases the corresponding cost of learning the technology.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a distributed data mining method and system. By adopting a distributed architecture, it can scale the data processing capacity horizontally, optimize the definition of data models, lower the professional requirements placed on technical staff, and reduce learning costs.
A first object of the present invention is to provide a distributed data mining method applied to a distributed data mining system. The distributed data mining system includes a first server and a distributed cluster group, and the distributed cluster group includes a plurality of second servers for performing model computation and mining analysis. The method includes:
the first server obtains the mining data information selected by a user in a browser interface;
the first server performs variable definition based on the mining data information, in preparation for model computation;
the first server obtains the model information selected by the user in the browser interface, and sends the mining data information and the selected model information to the distributed cluster group;
the distributed cluster group receives the mining data information and the selected model information sent by the first server, and performs model computation and mining analysis according to the model selected by the user.
A second object of the present invention is to provide a distributed data mining system. The distributed data mining system includes a first server and a distributed cluster group, and the distributed cluster group includes a plurality of second servers for performing model computation and mining analysis, wherein:
the first server is configured to obtain the mining data information selected by a user in a browser interface;
the first server is further configured to perform variable definition based on the mining data information, in preparation for model computation;
the first server is further configured to obtain the model information selected by the user in the browser interface, and to send the mining data information and the selected model information to the distributed cluster group;
the distributed cluster group is configured to receive the mining data information and the selected model information sent by the first server, and to perform model computation and mining analysis according to the model selected by the user.
Compared with the prior art, the present invention has the following advantages:
The present invention provides a distributed data mining method and system. The distributed data mining system includes a first server and a distributed cluster group, and the distributed cluster group includes a plurality of second servers for performing model computation and mining analysis. The first server obtains the mining data information selected by a user in a browser interface. The first server performs variable definition based on the mining data information, in preparation for model computation. The first server obtains the model information selected by the user in the browser interface, and sends the mining data information and the selected model information to the distributed cluster group. The distributed cluster group receives the mining data information and the selected model information sent by the first server, and performs model computation and mining analysis according to the model selected by the user. Thus, by adopting a distributed architecture, the data processing capacity can be scaled horizontally and the definition of data models is optimized; no highly specialized client is needed, which lowers the professional requirements placed on technical staff and reduces learning costs.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and therefore should not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
Fig. 1 is a block diagram of the distributed data mining system provided by a preferred embodiment of the present invention.
Fig. 2 is a block diagram of the first server shown in Fig. 1, provided by a preferred embodiment of the present invention.
Fig. 3 is a block diagram of the second server shown in Fig. 1, provided by a preferred embodiment of the present invention.
Fig. 4 is the first flowchart of the steps of the distributed data mining method provided by a preferred embodiment of the present invention.
Fig. 5 is a flowchart of the sub-steps of step S110 shown in Fig. 4, provided by a preferred embodiment of the present invention.
Fig. 6 is a flowchart of the sub-steps of step S130 shown in Fig. 4, provided by a preferred embodiment of the present invention.
Fig. 7 is the second flowchart of the steps of the distributed data mining method provided by a preferred embodiment of the present invention.
Reference numerals: 10 - distributed data mining system; 100 - first server; 110 - first memory; 120 - first processor; 130 - first network module; 200 - distributed cluster group; 210 - second server; 212 - second memory; 214 - second processor; 216 - second network module.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings. In addition, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish one description from another and are not to be understood as indicating or implying relative importance.
The present invention provides a distributed data mining system. Referring to Fig. 1, Fig. 1 is a block diagram of the distributed data mining system 10 provided by a preferred embodiment of the present invention. The distributed data mining system 10 includes a first server 100 and a distributed cluster group 200 that are communicatively connected to each other. The distributed cluster group 200 includes a plurality of second servers 210.
In this embodiment, the first server 100 is responsible for responding to the operations the user performs in the browser interface, for executing the services related to data mining, and for managing the data mining business. The distributed cluster group 200 is dedicated to performing computation on the data. The distributed cluster group 200 implements distributed computation on the data through the plurality of second servers 210, so that the data processing capacity can be scaled.
In this embodiment, the distributed data mining system 10 adopts a B/S (Browser/Server) architecture. B/S is a network structure pattern that emerged with the rise of the Web, in which the Web browser is the main client application. This pattern unifies the client and concentrates the core of the system's functionality on the server, which simplifies the development, maintenance, and use of the system. The client only needs a browser, such as Netscape Navigator or Internet Explorer; the server hosts a database such as SQL Server, Oracle, or MySQL; and the browser exchanges data with the database through a Web server. Consequently, no highly specialized client is needed to define models, the system is easy to extend, and costs are reduced.
In this embodiment, the distributed data mining system 10 processes data using a distributed architecture: the data that requires a large amount of computation is partitioned into small blocks, the blocks are computed separately by multiple servers, and the results are then merged to produce the data conclusion. A system with a distributed architecture makes it easier to scale the data processing capacity, so that large amounts of data can be processed.
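The partition, compute, and merge flow described above can be sketched as follows. This is only a minimal illustration, assuming that each second server is simulated by a local worker thread and that the computation is a simple mean; the function names `partition`, `partial_stats`, `merge`, and `distributed_mean` are hypothetical and do not come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into at most n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are fewer items than parts
            chunks.append(data[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Work done by one simulated second server: a partial count and sum."""
    return len(chunk), sum(chunk)


def merge(partials):
    """Combine the partial results into one conclusion (here, the overall mean)."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


def distributed_mean(data, n_workers=3):
    """Partition the data, compute each block separately, then merge the results."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    return merge(partials)
```

In a real deployment the worker threads would be replaced by network calls to the second servers 210; the sketch only shows why partial results must be mergeable (a count and a sum, rather than a per-block mean).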
Referring to Fig. 2, Fig. 2 is a block diagram of the first server 100 shown in Fig. 1, provided by a preferred embodiment of the present invention. The first server 100 includes a first memory 110, a first processor 120, and a first network module 130.
The first memory 110, the first processor 120, and the first network module 130 are electrically connected to one another, directly or indirectly, to enable the transmission or exchange of data. For example, these elements can be electrically connected to one another through one or more communication buses or signal lines. The first memory 110 stores a plurality of software function modules, and the first processor 120 performs various functional applications and data processing by running the software programs and modules stored in the first memory 110.
The first memory 110 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The first memory 110 is used to store programs, and the first processor 120 executes a program after receiving an execution instruction. Further, the software programs and modules in the first memory 110 may also include an operating system, which may include various software components and/or drivers for managing system tasks (such as memory management, storage device control, and power management) and which can communicate with various hardware or software components to provide a runtime environment for other software components.
The first processor 120 may be an integrated circuit chip with signal processing capability. The first processor 120 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor.
The first network module 130 is used to establish, over a network, the communication connection between the first server 100 and the plurality of second servers 210 of the distributed cluster group 200, and to send and receive network signals and data.
It can be understood that the structure shown in Fig. 2 is only illustrative. The first server 100 may include more or fewer components than shown in Fig. 2, or have a configuration different from that shown in Fig. 2. Each component shown in Fig. 2 can be implemented in hardware, software, or a combination thereof.
Referring to Fig. 3, Fig. 3 is a block diagram of the second server 210 shown in Fig. 1, provided by a preferred embodiment of the present invention. The second server 210 includes a second memory 212, a second processor 214, and a second network module 216.
A database is stored in the second memory 212. The database is used to store the data information to be computed and the data results obtained after computation.
The hardware configuration of the second memory 212, the second processor 214, and the second network module 216 is the same as that of the first memory 110, the first processor 120, and the first network module 130 in Fig. 2, and is not described again here.
The present invention also provides a distributed data mining method. Referring to Fig. 4, Fig. 4 is the first flowchart of the steps of the distributed data mining method provided by the present invention. The distributed data mining method is applied to the distributed data mining system 10. The specific flow of the distributed data mining method is described in detail below.
Step S110: the first server 100 obtains the mining data information selected by the user in the browser interface.
Referring to Fig. 5, Fig. 5 is a flowchart of the sub-steps of step S110 shown in Fig. 4, provided by the first embodiment of the present invention. Step S110 includes sub-step S111, sub-step S112, and sub-step S113.
Sub-step S111: the first server 100 obtains the Exchange Service information and data field information, selected by the user in the browser interface, on which mining analysis is to be performed.
In this embodiment, the Exchange Service information refers to the data tables (or tables) stored in the database. The data table is a very important object in a database and is the basis of the other objects; the database is only a framework, and the data tables are its real substantive content. Taking an educational administration system as an example, its database may include multiple data tables such as a teacher table, a student table, a grade table, a course table, and a class table. These data tables are used to manage information about students, teachers, courses, and so on in the teaching process. The data field information corresponds to the columns of a data table: the data field name corresponds to the column name, and the data field type corresponds to the column type.
Sub-step S112: the first server 100 responds to the user's operation of associating the Exchange Service information and the data field information respectively, and obtains the associated Exchange Service information and data field information.
In this embodiment, responding to the user's operation of associating the Exchange Service information and the data field information means the following: the user uses a database language to associate the otherwise independent data tables in the database by establishing relationships between them, and the first server 100 responds to this association operation. The database language includes Structured Query Language (SQL), Oracle, DB2, SQL Server, Sybase, MySQL, and the like. In this embodiment, SQL is preferred.
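As a rough illustration of associating independent data tables through SQL, the sketch below joins two tables on a shared key. It assumes an in-memory SQLite database; the table names `student` and `grade`, their columns, and the function name `associate_tables` are hypothetical examples modeled on the educational administration scenario, not details given in the patent.

```python
import sqlite3


def associate_tables(conn):
    """Associate the student and grade tables through their shared student_id column."""
    sql = """
        SELECT s.name, g.course, g.score
        FROM student AS s
        JOIN grade AS g ON g.student_id = s.student_id
        ORDER BY s.student_id
    """
    return conn.execute(sql).fetchall()


# Hypothetical sample data for the two independent tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE grade (student_id INTEGER, course TEXT, score REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Li"), (2, "Wang")])
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [(1, "math", 91.0), (2, "math", 78.5)],
)
rows = associate_tables(conn)
```

Each returned row combines fields from both tables, which is the form in which associated data fields could then be offered to the user for mining.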
Sub-step S113: when the user needs to filter the data in the Exchange Service, the first server 100 responds to the user's operation of setting a filter condition on the data, so as to filter the data.
In this embodiment, when the user needs to filter the data in the Exchange Service, the first server 100 reads the WHERE condition that the user has entered in the SQL database language and uses it to filter the data that needs to be filtered.
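A filter condition of this kind can be sketched as follows, again assuming an in-memory SQLite database. The `grade` table, the `score >= ?` condition, and the function name `filter_scores` are hypothetical. The sketch passes the user-supplied threshold as a bound parameter rather than pasting text into the statement, which is a design choice of this example and is not stated in the patent.

```python
import sqlite3


def filter_scores(conn, min_score):
    """Keep only the rows of the grade table that satisfy the WHERE condition."""
    sql = "SELECT student_id, score FROM grade WHERE score >= ? ORDER BY student_id"
    return conn.execute(sql, (min_score,)).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grade (student_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO grade VALUES (?, ?)",
    [(1, 91.0), (2, 58.0), (3, 77.5)],
)
kept = filter_scores(conn, 60)
```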
Step S130: the first server 100 performs variable definition based on the mining data information, in preparation for model computation.
Referring to Fig. 6, Fig. 6 is a flowchart of the sub-steps of step S130 shown in Fig. 4, provided by the present invention. Step S130 includes sub-step S131 and sub-step S132.
Sub-step S131: the first server 100 responds to the user's operation of configuring parameters for the data fields that are to enter the model computation.
In this embodiment, the parameters configured for a data field include the data field name, data field type, variable data type, imputed value, null rate, input/output variable, and so on.
In this embodiment, the first server 100 defines field types based on the obtained data field information and assigns a default variable data type according to the data field type. The data field types include DOUBLE (double-precision floating point), STRING (character type), BIGINT (integers beyond the range of int), and the like. The variable data types include the segmented type, continuous type, discrete type, and categorized type. Discrete type: the values are a finite, enumerable set of categories or numbers. Continuous type: the values are continuous numeric variables. Segmented type: a continuous variable is discretized, for example by dividing the continuous values from 1 to 10 into three segments: [1,3), [3,7), [7,10]. Categorized type: a discrete variable is grouped into a smaller number of broad classes, for example by grouping DBA engineers and front-end engineers as IT engineers.
In this embodiment, the first server 100 can assign the corresponding variable data type according to the data field type. For example, the DOUBLE and BIGINT types correspond to the continuous type, and the STRING type corresponds to the discrete type.
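The default assignment described above amounts to a lookup from field type to variable data type. The sketch below assumes that fields are given as a plain dictionary of names to type strings; the names `DEFAULT_VARIABLE_TYPE` and `default_variable_types` are hypothetical, and only the three mappings stated in the text are included.

```python
# Default variable data type for each data field type, as stated in the text.
DEFAULT_VARIABLE_TYPE = {
    "DOUBLE": "continuous",
    "BIGINT": "continuous",
    "STRING": "discrete",
}


def default_variable_types(fields):
    """Map {field name: field type} to {field name: default variable data type}."""
    return {
        name: DEFAULT_VARIABLE_TYPE[field_type.upper()]
        for name, field_type in fields.items()
    }
```

For instance, `default_variable_types({"score": "DOUBLE", "job": "STRING"})` yields a continuous `score` and a discrete `job`, which the user could then override in the browser interface.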
In this embodiment, the first server 100 responds to the user's operation of applying discretization settings to different types of data fields, so as to classify the data fields. The discretization settings include segment settings, categorization settings, and label settings. Segment setting: for numeric fields/variables of the continuous type, the numeric field/variable is split into multiple segment intervals for analysis, and segment setting is the process of filling in each segment interval. Categorization setting: fields/variables of the discrete type can be analyzed by category, by dragging the discrete field/variable under a predefined category. Label setting: this marks the classes of the output variable, and only the output variable can be given a label setting. In one mining process only one output variable is allowed; it can be selected from the original fields/variables or be user-defined.
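The segment and categorization settings can be illustrated with the two examples used earlier in the description: the intervals [1,3), [3,7), [7,10] and the grouping of engineer roles. This is a sketch only; the function names `segment` and `categorize`, the boundary-list representation, and the choice to leave unlisted values unchanged are assumptions of this example.

```python
def segment(value, bounds):
    """Return the index of the interval that contains value.

    bounds = [1, 3, 7, 10] describes [1,3), [3,7), [7,10]: every interval is
    half-open except the last, which is closed on the right.
    """
    if value < bounds[0] or value > bounds[-1]:
        raise ValueError(f"{value} lies outside [{bounds[0]}, {bounds[-1]}]")
    for i in range(len(bounds) - 2):
        if value < bounds[i + 1]:
            return i
    return len(bounds) - 2


# Predefined categories for a discrete field (hypothetical example values).
CATEGORY_OF = {
    "DBA engineer": "IT engineer",
    "front-end engineer": "IT engineer",
}


def categorize(value):
    """Map a discrete value to its broader category; unlisted values stay as they are."""
    return CATEGORY_OF.get(value, value)
```

Here `segment(3, [1, 3, 7, 10])` falls in the second interval because the first interval excludes its right endpoint, while `segment(10, [1, 3, 7, 10])` falls in the last interval because that one is closed.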
In this embodiment, the first server 100 responds to the user's operation of selecting an imputation type. When a data field contains missing data, the first server 100 fills in the missing data with an imputed value based on the imputation type selected by the user. The imputation types include the mean, mode, maximum, minimum, and so on. For example, if the imputation type selected by the user is the mean, the first server 100 takes the mean of the values in the column of the data table where the missing data is located and uses it as the imputed value to fill in the missing data.
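A minimal sketch of this imputation follows, assuming that a column is a Python list in which missing entries are represented by `None`. The strategy names `"mean"`, `"mode"`, `"max"`, and `"min"` and the function name `impute` are hypothetical labels for the four imputation types listed above.

```python
from statistics import mean, mode

# One function per imputation type named in the text.
STRATEGIES = {"mean": mean, "mode": mode, "max": max, "min": min}


def impute(column, strategy="mean"):
    """Replace missing entries (None) with a value computed from the observed entries."""
    observed = [v for v in column if v is not None]
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in column]
```

The fill value is computed only from the entries that are present in the same column, matching the description of using the column's own values.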
Sub-step S132: the first server 100 applies a conversion definition to the data fields that are to undergo model computation, converting them into variables that can be substituted into the model for computation.
In this embodiment, because there are many data field types, the first server 100 needs to apply a conversion definition to the data fields, converting them into numeric variables that the calculation program can recognize, so that they can be substituted into the model for computation. For example, the teacher field is defined as 0 and the student field is defined as 1, and these values are then substituted into the calculation program.
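One simple way to realize such a conversion definition is to assign each distinct value an integer code. The sketch below assumes codes are handed out in order of first appearance, which reproduces the teacher = 0, student = 1 example; the patent does not specify how codes are chosen, and the names `build_encoding` and `encode` are hypothetical.

```python
def build_encoding(values):
    """Assign consecutive integer codes to distinct values, in order of first appearance."""
    codes = {}
    for value in values:
        if value not in codes:
            codes[value] = len(codes)
    return codes


def encode(values, codes):
    """Convert each value into the numeric variable defined by codes."""
    return [codes[value] for value in values]
```

Keeping the `codes` dictionary alongside the encoded data makes it possible to translate model results back into the original field values for display.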
Step S140: the first server 100 obtains the model information selected by the user in the browser interface, and sends the mining data information and the selected model information to the distributed cluster group 200.
In this embodiment, the first server 100 obtains the model information selected by the user in the browser interface. The first server 100 assigns the corresponding mining algorithm based on the model information selected by the user and displays it to the user through the browser. The first server 100 assigns default analysis parameters for each model and displays them to the user through the browser. The first server 100 responds to the user's operations of modifying the mining algorithm and the analysis parameters. The first server 100 sends the mining data information and the selected model information to the plurality of second servers 210 of the distributed cluster group 200.
In this embodiment, the models commonly used in data mining and machine learning include classification models, regression models, clustering models, prediction models, association mining models, and so on. Different models perform different tasks and correspond to different ways of processing data, and each model also has one or more corresponding mathematical algorithms.
In this embodiment, the mining data information includes the mining algorithm, the analysis parameters, and the variables that have undergone conversion definition. The analysis parameters include general parameters, such as the training ratio and the test ratio, and the special parameters specific to each model. The training ratio and the test ratio sum to 1; the training ratio is usually set to 0.7 and the test ratio to 0.3. That is, 70% of the sample data is used as training data to build the model, and 30% of the sample data is used as test data to evaluate the effect of the built model. The first server 100 can also respond to the user's operation of adjusting the training ratio and the test ratio as required, provided that the training ratio and the test ratio sum to 1.
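The training and test ratios can be applied as in the following sketch. It assumes that samples are shuffled with a fixed seed before being split, which is a choice of this example rather than something the patent specifies; the function name `split_samples` and its parameters are hypothetical.

```python
import random


def split_samples(samples, train_ratio=0.7, test_ratio=0.3, seed=0):
    """Shuffle the samples and split them into training data and test data."""
    if abs(train_ratio + test_ratio - 1.0) > 1e-9:
        raise ValueError("training ratio and test ratio must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

With the default ratios, 10 samples yield 7 training samples and 3 test samples, and every sample lands in exactly one of the two sets.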
Step S150: the distributed cluster group 200 receives the mining data information and the selected model information sent by the first server 100, and performs model computation and mining analysis according to the model selected by the user.
In this embodiment, the plurality of second servers 210 receive the mining data information and the selected model information sent by the first server 100. Based on the mining algorithm, the analysis parameters, and the converted variables included in the mining data information, the plurality of second servers 210 perform computation on the model selected by the user and carry out mining analysis on the computation results. After the computation and analysis are finished, the plurality of second servers 210 can store the resulting data results and model results in the database and send them to the first server 100, so that the data results and model results are displayed to the user through the browser.
Referring to Fig. 7, Fig. 7 is the second flowchart of the steps of the distributed data mining method provided by the present invention. The method further includes step S120 and step S160.
Step S120: the first server 100 responds to the pre-analysis operation triggered by the user's click, performs pre-analysis on the mining data information, and displays the pre-analysis results to the user through the browser, so that it can respond to the user's operation of applying discretization settings to the data fields according to the pre-analysis results.
In this embodiment, the pre-analysis results include the total row count, mean, sum, maximum, minimum, range, variance, median, null rate, and so on. The first server 100 performs pre-analysis on the mining data information to obtain the pre-analysis results, and the user can view different items of the pre-analysis results, depending on the variable data type, in order to apply discretization settings. For example, for the continuous type the user can view the total row count, mean, maximum, minimum, and similar information; for the discrete type the user can view the number of records of each field type and so on.
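The pre-analysis statistics for a continuous field can be computed as in the sketch below. It assumes a column is a Python list with `None` marking null entries, and it uses the population variance because the patent does not say which variance definition is intended; the function name `pre_analyze` and the result keys are hypothetical.

```python
from statistics import mean, median, pvariance


def pre_analyze(column):
    """Compute summary statistics for one numeric column; None marks a null entry."""
    observed = [v for v in column if v is not None]
    return {
        "rows": len(column),
        "mean": mean(observed),
        "sum": sum(observed),
        "max": max(observed),
        "min": min(observed),
        "range": max(observed) - min(observed),
        "variance": pvariance(observed),  # population variance (an assumption)
        "median": median(observed),
        "null_rate": (len(column) - len(observed)) / len(column),
    }
```

The null rate is taken over all rows, while the remaining statistics are computed only from the entries that are present.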
Step S160, the mining analysis model of 100 pairs of establishments of first server is managed.
In the present embodiment, analysis result is shown to user by first server 100 by browser, and responds user couple The subsequent operation that analysis result is carried out.The subsequent operation includes:Generate datagraphic, issue analysis result, model and algorithm Replacement, model refreshing and deletion etc..For example, user can be carried out to the figure and graphic parameter (size, color etc.) for needing to add Selection, first server 100 responds operation of the user to addition figure selecting, generates corresponding figure and is simultaneously presented to user.
In summary, the present invention provides a kind of distributed data digging method and system.The distributed data digging system System includes:First server and distributed type assemblies group, the distributed type assemblies group include multiple for carrying out model calculation and digging Dig the second server of analysis.First server obtains the mining data information that user selects in browser interface.First service Device is based on the mining data information and carries out variable-definition, to carry out model calculation.First server obtains user and browsed The model information of device interface selection, and mining data information and the model information of selection are sent to distributed type assemblies group. Distributed type assemblies group receives the mining data information of the first server transmission and the model information of selection, according to user The model of selection carries out model calculation and mining analysis.
Thus, by using a distributed architecture to scale data processing capacity horizontally and by optimizing the definition of data models, no highly specialized client is needed, the professional expertise required of technical staff is reduced, and the learning cost is lowered.
The above describes only preferred embodiments of the present invention and is not intended to limit the invention. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A distributed data mining method, characterized in that the method is applied to a distributed data mining system, the distributed data mining system comprising a first server and a distributed cluster group, the distributed cluster group comprising a plurality of second servers for performing model computation and mining analysis, the method comprising:
the first server obtaining mining data information selected by a user in a browser interface;
the first server performing variable definition based on the mining data information, so as to perform model computation;
the first server obtaining model information selected by the user in the browser interface, and sending the mining data information and the selected model information to the distributed cluster group;
the distributed cluster group receiving the mining data information and the selected model information sent by the first server, and performing model computation and mining analysis according to the model selected by the user.
2. The method according to claim 1, characterized in that the step of the first server obtaining the mining data information selected by the user in the browser interface comprises:
obtaining the exchange service information and data field information, selected by the user in the browser interface, on which mining analysis is to be performed;
responding to the user's operations of setting associations for the exchange service information and for the data field information respectively, and obtaining the exchange service information and data field information with the associations set;
when the user needs to filter the data in the exchange service, responding to the user's operation of setting filter conditions on the data, so as to filter the data.
3. The method according to claim 2, characterized in that the step of the first server performing variable definition based on the mining data information, so as to perform model computation, comprises:
responding to the user's operation of configuring parameters for the data fields that are to enter model computation;
performing conversion definition on the data fields that are to undergo model computation, converting them into variables that can be substituted into the model for computation.
4. The method according to claim 3, characterized in that the step of the first server responding to the user's operation of configuring parameters for the data fields that are to enter model computation comprises:
performing field type definition based on the obtained data field information, and assigning a default variable data type according to the data field type;
responding to the user's operation of configuring discretization for data fields of different types, so as to classify the data fields, wherein the discretization settings include: segment settings, category settings and label settings;
responding to the user's operation of selecting an interpolation type, wherein, when data is missing in a data field, the server fills in the missing data based on the interpolation type selected by the user.
5. The method according to claim 4, characterized in that the step of the first server obtaining the model information selected by the user in the browser interface, and sending the mining data information and the selected model information to the distributed cluster group, comprises:
obtaining the model information selected by the user in the browser interface;
configuring a corresponding mining algorithm based on the model information selected by the user, and displaying it to the user through the browser;
assigning default analysis parameters for each model, and displaying them to the user through the browser;
responding to the user's operation of modifying the mining algorithm and the analysis parameters;
sending the mining data information and the selected model information to the plurality of second servers of the distributed cluster group, wherein the mining data information includes: the mining algorithm, the analysis parameters, and the variables obtained through conversion definition.
6. The method according to claim 5, characterized in that the step of the distributed cluster group receiving the mining data information and the selected model information sent by the first server, and performing model computation and mining analysis according to the model selected by the user, comprises:
the plurality of second servers receiving the mining data information and the selected model information sent by the first server;
the plurality of second servers running the model selected by the user based on the mining data information, and performing mining analysis on the computation result;
the plurality of second servers storing the data results and model results produced by the computation and analysis.
7. The method according to claim 4, characterized in that, before the step of the first server performing variable definition based on the mining data information, the method further comprises:
the first server responding to a pre-analysis operation clicked by the user, pre-analyzing the mining data information, and displaying the pre-analysis result to the user through the browser, so as to respond to the user's operation of configuring discretization for the data fields according to the pre-analysis result.
8. The method according to claim 7, characterized in that the method further comprises:
the first server managing the created mining analysis model, wherein the step of the first server managing the created mining analysis model comprises:
displaying the analysis result to the user through the browser;
responding to follow-up operations performed by the user on the analysis result, the follow-up operations including: generating data charts, publishing the analysis result, resetting the model and algorithm, refreshing the model, and deleting the model.
9. A distributed data mining system, characterized in that the distributed data mining system comprises a first server and a distributed cluster group, the distributed cluster group comprising a plurality of second servers for performing model computation and mining analysis, wherein:
the first server is configured to obtain mining data information selected by a user in a browser interface;
the first server is further configured to perform variable definition based on the mining data information, so as to perform model computation;
the first server is further configured to obtain model information selected by the user in the browser interface, and to send the mining data information and the selected model information to the distributed cluster group;
the distributed cluster group is configured to receive the mining data information and the selected model information sent by the first server, and to perform model computation and mining analysis according to the model selected by the user.
10. The system according to claim 9, characterized in that:
the first server is further configured to respond to a pre-analysis operation clicked by the user, pre-analyze the mining data information, and display the pre-analysis result to the user through the browser, so as to respond to the user's operation of configuring discretization for the data fields according to the pre-analysis result;
the first server is further configured to manage the created mining analysis model.
CN201710241931.4A 2017-04-14 2017-04-14 Distributed data digging method and system Pending CN107025288A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710241931.4A CN107025288A (en) 2017-04-14 2017-04-14 Distributed data digging method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710241931.4A CN107025288A (en) 2017-04-14 2017-04-14 Distributed data digging method and system

Publications (1)

Publication Number Publication Date
CN107025288A true CN107025288A (en) 2017-08-08

Family

ID=59526837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710241931.4A Pending CN107025288A (en) 2017-04-14 2017-04-14 Distributed data digging method and system

Country Status (1)

Country Link
CN (1) CN107025288A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107943986A (en) * 2017-11-30 2018-04-20 睿视智觉(深圳)算法技术有限公司 A kind of big data analysis digging system
CN108108209A (en) * 2017-12-28 2018-06-01 中交水运规划设计院有限公司 Interworking architecture, exchange method and the storage medium of computation model
CN108171617A (en) * 2017-12-08 2018-06-15 全球能源互联网研究院有限公司 A kind of power grid big data analysis method and device
CN108256029A (en) * 2018-01-11 2018-07-06 北京神州泰岳软件股份有限公司 Statistical classification model training apparatus and training method
CN111061559A (en) * 2019-11-13 2020-04-24 成都安思科技有限公司 Distributed data mining and statistical method based on data deduplication
CN111221877A (en) * 2020-01-15 2020-06-02 成都深思科技有限公司 Multi-channel concurrent data packet mining and statistical method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1975720A (en) * 2006-12-27 2007-06-06 章毅 Data tapping system based on Wcb and control method thereof
CN101169798A (en) * 2007-12-06 2008-04-30 中国电信股份有限公司 Data excavation system and method
CN102567396A (en) * 2010-12-30 2012-07-11 中国移动通信集团公司 Method, system and device for data mining on basis of cloud computing
CN103838617A (en) * 2014-02-18 2014-06-04 河海大学 Method for constructing data mining platform in big data environment
US20150046459A1 (en) * 2010-04-15 2015-02-12 Microsoft Corporation Mining multilingual topics
CN106452899A (en) * 2016-10-27 2017-02-22 中国工商银行股份有限公司 Distributed data mining system and method
CN106484914A (en) * 2016-10-26 2017-03-08 国云科技股份有限公司 A kind of modular assembly method for quickly realizing data mining analysis
CN106528795A (en) * 2016-11-10 2017-03-22 中国农业银行股份有限公司 Data mining method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王慧斌 等: "《信息系统集成与融合技术及其应用》", 30 April 2006, 国防工业出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication Application publication date: 20170808