CN107103050A - Big data modeling platform and method - Google Patents
- Publication number: CN107103050A
- Application number: CN201710211258.XA
- Authority: CN (China)
- Legal status: Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A big data modeling platform and method. The platform includes: a data assets module for uploading data sources, which updates user data to the cloud platform through manual upload or automatic update and lets users process their own uploaded data through drag-and-drop modeling; a data cleaning module for performing ETL processing on the data sources, finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values; a data check module for inspecting data and computing basic statistics; an algorithm module that models large volumes of data using classical classification or clustering algorithms from machine learning and then uses the models for prediction; and a front-end display module for presenting processed or unprocessed data graphically. The invention covers structured data modeling, data presentation, and other functions, and supports self-service intelligent analysis with drag-and-drop data presentation and modeling.
Description
"Technical field"
The invention belongs to the technical fields of electronic information and big data, and specifically relates to a big data modeling platform and method for the collection, storage, analysis, and presentation of big data.
"Background"
With the rapid development of the Internet, a huge volume of data is produced every day. Before big data technology emerged, traditional data processing ran into many bottlenecks. First, with a traditional database, a very large data volume causes storage to reach its upper limit; the remedy is to switch to larger-capacity hard disks, but this is very costly. Second, computers cannot process large data volumes quickly, so data processing speed also becomes a bottleneck.
At present, big data technology can address many bottlenecks of traditional IT infrastructure, such as poor scalability, poor fault tolerance, low performance, and difficult installation, deployment, and maintenance. Storing data in the Hadoop HDFS distributed file system gives good scalability and high fault tolerance. Hadoop MapReduce computes over large data sets (larger than 1 TB) in parallel, which raises computation speed and performance. The Sqoop component transfers data between traditional databases and Hadoop. However, existing big data technology is not easy for non-technical personnel to use.
"Summary of the invention"
The present invention aims to provide a big data modeling platform and method covering structured data modeling, data presentation, and other functions. It supports self-service intelligent analysis and drag-and-drop data presentation and modeling, and serves as a management cockpit and ad hoc query, analysis, and decision-making platform that can give business decision makers a basis for decisions within a very short time. The object of the invention is achieved by the following technical solution:
A big data modeling platform, comprising:
a data assets module for uploading data sources, which updates user data to the cloud platform through manual upload or automatic update, and lets users process their own uploaded data through drag-and-drop modeling;
a data cleaning module for performing ETL processing on the data sources, finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values;
a data check module for inspecting data and computing basic statistics;
an algorithm module that models large volumes of data using classical classification or clustering algorithms from machine learning and then uses the models for prediction;
a front-end display module for presenting processed or unprocessed data graphically.
As a specific technical solution, the data assets module provides three ways of uploading data: local file upload, underlying data upload, and database upload. Database upload supports three databases: MySQL, Oracle, and SQL Server.
As a specific technical solution, the data cleaning module includes an SQL processing submodule, a sampling submodule, a grouped summary submodule, a data merging submodule, a duplicate deletion submodule, a data partition submodule, a sorting submodule, a data discretization submodule, a data normalization submodule, a variable filtering submodule, a transpose submodule, a field reordering submodule, a missing value processing submodule, an outlier processing submodule, a find-and-replace submodule, a variable insertion submodule, a weighting submodule, a sample balancing submodule, and a word segmentation parsing submodule. The SQL processing submodule lets users edit and execute SQL statements directly. The sampling submodule samples the data using different sampling methods. The grouped summary submodule computes the mean, count, or sum of field variables in a table and generates a correspondingly labeled variable column; the summary variable and the calculation variable are configurable. The data merging submodule combines the data of two tables by appending row records or by appending column variables; when appending row records, the column variable names should be kept consistent, otherwise new variable columns are added. The duplicate deletion submodule deletes duplicate content in the selected variables. The data partition submodule specifies the number or proportion of samples in the training partition and the test partition. The sorting submodule sorts the content of the selected variables in ascending or descending order. The data discretization submodule discretizes selected continuous variable columns into classes using equal-width or equal-frequency binning. The data normalization submodule, which supports numeric variables only, applies either 0-1 normalization, so that results fall in the interval [0,1], or Z standardization, which rescales the data to a mean of 0 and a standard deviation of 1, as in the standard normal distribution. The variable filtering submodule deletes the selected variable columns. The transpose submodule swaps all rows and columns of the data. The field reordering submodule rearranges the positions of the column variables. The missing value processing submodule deletes row records in which a selected variable is empty. The outlier processing submodule deletes outliers at a set ratio according to outlier identification rules; the rules include standard deviation and quantile, that is, data lying more than a certain multiple of the standard deviation or quantile away from the mean are identified as abnormal. The find-and-replace submodule searches the content of the selected variables according to set conditions and replaces matches with a target value. The variable insertion submodule performs arithmetic on the selected variables to generate a new variable column; in the algorithm box, the user manually enters the name of the variable column and edits the arithmetic expression. The weighting submodule weights the selected variables; the user enters the weight value in the weight factor field. The sample balancing submodule finds target data in the selected numeric variable columns according to set conditions and weights the target data by an entered weight factor. The word segmentation parsing submodule parses the text content of the selected segmentation field and generates row records from the parsed terms.
As a specific technical solution, the data check module includes a data audit submodule, a frequency analysis submodule, and a descriptive statistics submodule. The data audit submodule computes and checks sample indicators for the selected variables, including valid values, invalid values, null values, and their proportions. The frequency analysis submodule counts how often each value occurs in the selected variables. The descriptive statistics submodule computes the mean, mode, median, and total of the specified variable columns.
As a specific technical solution, the algorithm model module includes an Apriori algorithm submodule, a K-means algorithm submodule, a naive Bayes algorithm submodule, a logistic regression algorithm submodule, a ridge regression algorithm submodule, a LASSO algorithm submodule, and a linear regression algorithm submodule. The Apriori algorithm submodule counts frequencies over the combined contents of the associated fields, computes the probabilities of frequent 2-itemsets over the contents of the dimension field, and derives analysis indicators such as support. The K-means algorithm submodule divides the data of the selected fields into n clusters; the number of clusters, the number of iterations, and the random number parameter are configurable, and the algorithm runs until the data converge. The naive Bayes, logistic regression, ridge regression, LASSO, and linear regression algorithm submodules are all used to build classification models for predicting new data.
As a specific technical solution, in the front-end display module users run experiments by dragging and dropping visual operation components; data cleaning and algorithm analysis are performed according to the established business model, and the result data set is presented in multiple dimensions through the charts of the connected front-end display module.
As a specific technical solution, the multidimensional visual presentation includes: 1. presenting different data structures intuitively with different chart types; 2. presenting multidimensional data through drill-down; 3. custom presentation of special data using node-customized display forms; 4. presenting data more effectively through linked multi-chart displays.
A modeling method based on the above big data modeling platform, with the following steps:
First, the data source to be processed is uploaded to the platform using the add-data-source node in data assets, for later use;
Then, according to the needs of the business scenario, the data are cleaned using the function nodes of the data cleaning module, for example using missing value processing to delete rows in which a field is empty, and using variable filtering to retain the fields the business needs and delete the others, so that after this series of steps the data are in the desired format;
Next, if the business scenario has no need for an algorithm, the final data can be presented directly using the charts of the front-end display module; the front-end display function nodes offer many chart types, and different charts are selected as needed; if an algorithm is needed, an algorithm node must be added;
Finally, once the nodes have been selected, the data source node, data cleaning node, algorithm node (if any), and front-end display node are connected and saved; clicking Run executes the whole flow, and the final data are presented graphically.
In summary, the invention is very flexible across the whole data processing flow: users can build workflows to suit different needs. The data source upload stage offers several upload methods; the data cleaning stage offers several processing methods; the algorithm stage likewise includes several algorithms; and the data presentation stage includes several chart types. The invention provides out-of-the-box support for a library of core mainstream algorithms, making big data analysis simpler and more accessible: users with very little knowledge of statistics and data mining can easily use the platform for data mining and modeling analysis on big data.
"Brief description of the drawings"
Fig. 1 is a schematic diagram of importing data from a database into HDFS using Sqoop in the big data modeling platform provided by an embodiment of the invention.
Fig. 2 is a schematic diagram of uploading a file as a data stream in the big data modeling platform provided by an embodiment of the invention.
"Detailed description"
Embodiments of the invention are further described below with reference to the drawings:
The big data modeling platform provided by this embodiment includes data assets, data cleaning, data check, algorithm model, and front-end display modules. Each module is introduced in detail below:
The data assets module is used for uploading data sources. It updates user data to the cloud platform through manual upload or automatic update, and users can process their own uploaded data through drag-and-drop modeling.
The data cleaning module performs ETL processing on the data sources, finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values.
The data check module inspects the data and computes basic statistics.
The algorithm module models large volumes of data using classical classification or clustering algorithms from machine learning, and then uses the models for prediction.
The front-end display module presents processed or unprocessed data graphically, giving users a more intuitive view of the data.
The data assets module provides three ways of uploading data: local file upload, underlying data upload, and database upload. Database upload supports three databases: MySQL, Oracle, and SQL Server.
The data cleaning module includes modules for SQL processing, sampling, grouped summary, data merging, duplicate deletion, data partitioning, sorting, data discretization, data normalization, variable filtering, transposition, field reordering, missing value processing, find-and-replace, variable insertion, weighting, sample balancing, and word segmentation parsing. SQL processing lets users edit and execute SQL statements directly. Sampling samples the data using different sampling methods (1 in N, random %, etc.). Grouped summary computes the mean, count (summary), or sum (total) of field variables in a table and generates a correspondingly labeled variable column; the summary variable and the calculation variable are configurable. Data merging combines the data of two tables by appending row records or by appending column variables; when appending row records, the column variable names should be kept consistent, otherwise new variable columns are added. Duplicate deletion deletes duplicate content in the selected variables. Data partitioning specifies the number or proportion of samples in the training partition and the test partition. Sorting arranges the content of the selected variables in ascending or descending order. Data discretization discretizes selected continuous variable columns into classes using equal-width binning (by bin width) or equal-frequency binning (by number of bins). Data normalization applies to the selected variable columns (numeric only) either 0-1 normalization (results fall in the interval [0,1]) or Z standardization (the data are rescaled to a mean of 0 and a standard deviation of 1, as in the standard normal distribution). Variable filtering deletes the selected variable columns. Transposition swaps all rows and columns of the data. Field reordering rearranges the positions of the column variables. Missing value processing deletes row records in which a selected variable is empty. Outlier processing deletes outliers at a set ratio according to outlier identification rules; the rules include standard deviation and quantile, that is, data lying more than a certain multiple of the standard deviation or quantile away from the mean are identified as abnormal. Find-and-replace searches the content of the selected variables according to set conditions and replaces matches with a target value. Variable insertion performs arithmetic on the selected variables to generate a new variable column (alias); in the algorithm box, the user manually enters the name of the variable column and edits the arithmetic expression. Weighting weights the selected variables; the user enters the weight value in the weight factor field. Sample balancing finds target data in the selected numeric variable columns according to set conditions and weights the target data by an entered weight factor. Word segmentation parsing parses the text content of the selected segmentation field and generates row records from the parsed terms.
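The discretization and normalization operations described above can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the platform's implementation (the platform performs cleaning with Hive SQL): the function names are hypothetical, and the Z standardization here uses the population standard deviation.

```python
from statistics import mean, pstdev

def min_max(values):
    """0-1 normalization: rescale values into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Z standardization: rescale to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)  # assumption: population std
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def equal_width_bins(values, width):
    """Equal-width binning: bin index grows by one every `width` units."""
    lo = min(values)
    return [int((v - lo) // width) for v in values]

def equal_frequency_bins(values, n_bins):
    """Equal-frequency binning: each bin receives about the same number of rows."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins
```

Equal-width binning is driven by the bin width and equal-frequency binning by the number of bins, matching the two parameters named in the text.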
The data check module includes data audit, frequency distribution, and descriptive statistics modules. Data audit computes and checks sample indicators for the selected variables (valid values, invalid values, null values, and their proportions). Frequency analysis counts how often each value occurs in the selected variables. Descriptive statistics computes statistics such as the mean, mode, median, and total of the specified variable columns.
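As a rough illustration of the data audit and frequency analysis indicators, the following Python sketch counts valid, invalid, and null values and their percentages for one column. The function names, the treatment of `None` and the empty string as null, and the caller-supplied validity rule are assumptions made for this example only.

```python
from collections import Counter

def audit(values, is_valid):
    """Count valid, invalid, and null values in one column, with percentages."""
    total = len(values)
    # Assumption: None and the empty string both count as null.
    nulls = sum(1 for v in values if v is None or v == "")
    valid = sum(1 for v in values
                if v is not None and v != "" and is_valid(v))
    invalid = total - nulls - valid

    def pct(n):
        return round(100.0 * n / total, 2) if total else 0.0

    return {
        "valid": valid, "valid_pct": pct(valid),
        "null": nulls, "null_pct": pct(nulls),
        "invalid": invalid, "invalid_pct": pct(invalid),
    }

def frequency(values):
    """Frequency analysis: how many times each non-null value occurs."""
    return Counter(v for v in values if v is not None and v != "")
```

For example, `audit(column, str.isdigit)` would treat any non-empty string that is not made up of digits as an invalid value.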
The algorithm model module includes modules for the Apriori algorithm, the K-means algorithm, the naive Bayes algorithm, the logistic regression algorithm, the ridge regression algorithm, the LASSO algorithm, the linear regression algorithm, and others. The Apriori algorithm counts frequencies over the combined contents of the associated fields, computes the probabilities of frequent 2-itemsets over the contents of the dimension field, and derives analysis indicators such as support. The K-means algorithm divides the data of the selected fields into n clusters; the number of clusters, the number of iterations, and the random number parameter are configurable, and the algorithm runs until the data converge. Naive Bayes, logistic regression, and similar algorithms are all classification algorithms with similar basic ideas: they build a classification model that is then used to predict new data.
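The support indicator for 2-itemsets mentioned above can be sketched in Python as follows. This is a simplified brute-force count over all item pairs, not the full Apriori candidate generation and pruning procedure, and the function name and `min_support` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_support(transactions, min_support=0.0):
    """Support of each 2-itemset: the share of transactions containing both items."""
    n = len(transactions)
    counts = Counter()
    for items in transactions:
        # Sort the distinct items so each pair has one canonical order.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

A pair is reported as frequent when its support reaches `min_support`; the real Apriori algorithm would additionally use frequent single items to prune candidate pairs before counting.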
The front-end display module is a platform that integrates data analysis and presentation. Users run experiments by dragging and dropping visual operation components, so that engineers without a machine learning background can also get started with data mining easily. The platform performs data cleaning and algorithm analysis according to the established business model, and presents the result data set in multiple dimensions through the charts of the connected front-end display module:
Different data structures are presented intuitively with different chart types.
Multidimensional data are presented through drill-down.
Special data are presented in a customized way using node-customized display forms.
Data are presented more effectively through linked multi-chart displays.
Data charts are turned into data queries. The data items interact and link with one another, showing the trends, proportions, and relationships of the data from different angles under different dimensional indicators, which helps users identify trends and discover the knowledge and patterns behind the data. In addition to established presentation methods such as pie charts, bar charts, heat maps, and geographic information maps, data can be analyzed across a series of charts through image color, brightness, size, shape, motion trend, and other means, helping users uncover associations between data through interaction. Drill-up and multidimensional parallel analysis of data are also supported, promoting data-driven decision-making.
Visualization can give users an overall overview and then, through zooming and filtering, provide the deeper details they need. The visualization process plays a key role in helping people use big data to obtain more complete customer information. Intricate relationships are an important part of many big data scenarios, and social networks are perhaps the most notable example: understanding the big data they contain through text or tables is extremely difficult, whereas visualization can show the trends and inherent patterns of these networks much more clearly. To depict the relationships between social network users, cloud-based visualization methods are commonly used. By describing the hierarchical relationships of user nodes in the social network through a correlation model, this approach intuitively shows users' social relationships. In addition, it can use the cloud-based Hadoop software platform to parallelize the visualization process, thereby speeding up the collection of social network big data.
Big data visualization can be achieved in many ways, such as displaying data from multiple angles, focusing on dynamic changes in large volumes of data, and filtering information (including dynamic query filtering, star chart display, and tight coupling). The following visualization methods are analyzed and classified according to data type (large-volume data, changing data, and dynamic data):
Treemap: a space-filling visualization method for hierarchical data.
Circle packing: a direct alternative to the treemap. It uses circles as the primitive shape, and circles can be nested inside circles from a higher level of the hierarchy.
Sunburst: a treemap visualization converted to a polar coordinate system. The variable parameters change from width and height to radius and arc length.
Parallel coordinates: extends visual analysis to multiple data factors across different objects.
Streamgraph: a type of stacked area graph in which the data are spread around a central axis, giving a flowing, organic shape.
Circular network diagram: data objects are arranged around a circle and linked by curves according to their degree of relatedness. Line width or color saturation is usually used to measure the relatedness of data objects.
The main functional modules of the big data modeling platform are described above. In addition to these functions, the invention also discloses the overall workflow, with the following steps:
First, the data source to be processed is uploaded to the platform using the add-data-source node in data assets, for later use.
Then, according to the needs of the business scenario, the data are cleaned using the function nodes of the data cleaning module, for example using missing value processing to delete rows in which a field is empty, and using variable filtering to retain the fields the business needs and delete the others, so that after this series of steps the data are in the desired format.
Next, if the business scenario has no need for an algorithm, the final data can be presented directly using the charts of the front-end display module. The front-end display function nodes offer many chart types, and different charts are selected as needed. If an algorithm is needed here, an algorithm node must be added, such as naive Bayes or linear regression.
Finally, once the nodes have been selected, the data source node, data cleaning node, algorithm node (if any), and front-end display node must be connected and saved; clicking Run executes the whole flow, and the final data are presented graphically.
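The connected-node workflow described in these steps can be pictured with a small Python sketch in which each node is a function and the flow runs them in order. All names and the sample rows here are hypothetical; the sketch models only a linear flow with no algorithm node, and a text bar chart stands in for the front-end display node.

```python
def run_flow(source, nodes):
    """Run a linear flow: the source node produces rows, each later node transforms them."""
    data = source()
    for node in nodes:
        data = node(data)
    return data

def data_source():
    """Data source node: hypothetical uploaded rows."""
    return [
        {"city": "A", "sales": 3, "note": "x"},
        {"city": "B", "sales": None, "note": "y"},
        {"city": "C", "sales": 5, "note": "z"},
    ]

def drop_missing(field):
    """Missing value processing node: delete rows where `field` is empty."""
    return lambda rows: [r for r in rows if r.get(field) is not None]

def keep_fields(*fields):
    """Variable filtering node: retain only the fields the business needs."""
    return lambda rows: [{f: r[f] for f in fields} for r in rows]

def text_bar_chart(label, value):
    """Display node: render each row as a one-line text bar."""
    return lambda rows: [f"{r[label]} {'#' * r[value]}" for r in rows]

chart = run_flow(
    data_source,
    [drop_missing("sales"), keep_fields("city", "sales"),
     text_bar_chart("city", "sales")],
)
```

Adding an algorithm node would amount to inserting one more function into the list before the display node.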
The functions and implementation principles of each module are described in further detail below:
1. Data assets
Because the modeling platform's data sources are not uniform, different data sources need to be converted into a unified data source. Data from relational databases (such as Oracle, MySQL, and SQL Server) and from files (such as txt and csv) can be used, and new data sources can also be derived by processing existing ones. All of them are converted into unified Hive data sources, which feed the platform's workflow processing.
(1) Relational databases are handled with Sqoop. Sqoop imports a table from a database through a MapReduce job, which extracts records from the table row by row and writes them to HDFS, as shown in Fig. 1.
Before the import starts, Sqoop uses JDBC to examine the table to be imported, retrieving all of the table's columns and their SQL data types. These SQL types (VARCHAR, INTEGER, etc.) are mapped to Java data types (String, Integer, etc.), and the MapReduce application uses the corresponding Java types to hold the field values. Sqoop's code generator uses this information to create a class corresponding to the table, which holds the records extracted from the table.
(2) File data are uploaded as a data stream and processed with Hadoop: the local file is uploaded to HDFS through MapReduce and then stored in Hive under the specified table name and field names, as shown in Fig. 2.
(3) Existing data sources are handled with Hive: a Hive SELECT statement over the original data source creates a new Hive table, producing a new data source. An existing data source can also be used directly.
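Creating a new Hive table from a SELECT over an existing one corresponds to a HiveQL CREATE TABLE ... AS SELECT statement. The Python sketch below only builds such a statement as a string and does not connect to Hive; the function name and the table and column names in the usage example are hypothetical, and the `where` clause is passed through unchecked.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def _check(name):
    """Reject anything that is not a plain identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def create_table_as_select(new_table, source_table, columns, where=None):
    """Build a HiveQL CREATE TABLE ... AS SELECT statement as a string."""
    cols = ", ".join(_check(c) for c in columns)
    sql = (f"CREATE TABLE {_check(new_table)} AS "
           f"SELECT {cols} FROM {_check(source_table)}")
    if where:
        sql += f" WHERE {where}"  # caller-supplied condition, not validated
    return sql

query = create_table_as_select(
    "orders_2017", "orders", ["id", "amount"], "amount > 0"
)
```

The resulting string could then be submitted through whatever Hive client the deployment uses.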
2. Data cleaning
To meet the big data modeling platform's requirements for data consistency, a data cleaning function is provided. It mainly includes SQL processing, sampling, grouped summary, data set merging, data partitioning, sorting, data discretization, data normalization, variable filtering, transposition, field reordering, weighting, sample balancing, and so on.
(1) SQL processing creates a new data source from the original data source using a Hive SELECT statement.
(2) Sampling uses Hive to sample the original data source and produce a new data source.
(3) Grouped summary groups the data by the summary variable and computes the mean, count, or total of the calculation variable.
(4) Data set merging is divided into appending row records and appending column variables. Appending column variables requires selecting a merge variable. Hive is used to produce the merged data set.
(5) Duplicate deletion uses Hive to filter out duplicate data according to the deduplication variable.
(6) Data partitioning partitions the data according to the specified training sample.
(7) Sorting orders the data source by the processing variable.
(8) Data discretization produces discrete data from the processing variable to form the result set.
(9) Data normalization produces a data set from the processing variable using the selected normalization method.
(10) Variable filtering removes the variables marked for deletion from the result set, producing a new data source.
(11) Transposition swaps all rows and columns. After transposition, the newly generated columns are named transposition_1, transposition_2, ..., transposition_15.
(12) Field reordering rearranges the fields into the specified order.
(13) Missing value processing removes data according to the processing variable, producing a new result set.
(14) Find-and-replace checks the processing variable and replaces its value where the condition is met.
(15) Outlier processing handles the processing variable according to the exclusion mode and the identification rule.
(16) Variable insertion inserts a new variable derived from the original columns.
(17) Weighting applies a weight factor to the processing variable.
(18) Sample balancing adds a condition on the processing variable; where the condition is met, the data are transformed according to the factor.
Data cleansing is implemented mainly with Hive SQL. The undesirable data that are removed are mainly incomplete data, erroneous data, and duplicate data; processing can also be carried out on the basis of the existing data.
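Items (8) and (9) above do not state which methods are used; claim 3 names 0-1 and Z standardization, and equal-width and equal-frequency binning. The following is a minimal sketch of those four operations in plain Python, offered only as an illustration. The platform itself is described as running Hive SQL, so the function names and the in-memory lists here are assumptions and not the platform's code.

```python
from statistics import mean, pstdev


def min_max(values):
    """0-1 standardization: rescale so that results fall in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z standardization: shift to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def equal_width_bins(values, n_bins):
    """Equal-width binning: every bin spans the same value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Equal-frequency binning: every bin holds roughly the same number of rows."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


sample = [1, 2, 3, 4, 10]
print(min_max(sample))
print(equal_width_bins(sample, 3))
print(equal_frequency_bins(sample, 3))
```

With a skewed column such as the sample above, equal-width binning puts most rows in the first bin, while equal-frequency binning spreads them evenly; which is preferable depends on the business scenario.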
3. Data check
To meet the data processing requirements of the Logical perseverance big data modeling platform, a data check function is provided. It increases the flexibility of data processing, and data that do not meet the indices can be designated as invalid data. It mainly includes data audit, frequency distribution, descriptive statistics, etc.
(1) Data audit processes the data source with Hive select statements. Invalid-value detection falls into two classes: field-type detection and numeric-value detection. The indices are valid samples, valid sample %, null values, null value %, invalid values, and invalid value %. The data are grouped and summarized by the processing variable to produce the percentages and then the result set.
(2) Frequency distribution processes the data source with Hive select statements, grouping by the processing variable and obtaining the total (count).
(3) Descriptive statistics process the data source with Hive select statements. The processing variable is first processed to obtain the mode, mean, median, total, maximum, minimum, and range; the standard error of the mean can also be obtained. Using the percentile and percentile_approx functions provided by Hive, the quartiles, quintiles, etc. of the data are obtained.
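The indices listed in items (1) to (3) can be illustrated with a short plain-Python sketch. It is not the platform's Hive SQL; the function names, the treatment of empty strings as null, and the choice to report both a count and a sum for "total" are assumptions made for the example. The quantile interpolation of Python's `statistics.quantiles` may also differ from that of Hive's percentile functions.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, multimode, quantiles, stdev


def audit(column):
    """Count valid, null and invalid entries of a column expected to be numeric."""
    total = len(column)
    nulls = valid = 0
    for v in column:
        if v is None or v == "":      # assumption: empty string counts as null
            nulls += 1
            continue
        try:
            float(v)                  # numeric-value detection
            valid += 1
        except (TypeError, ValueError):
            pass                      # neither null nor numeric: invalid
    invalid = total - nulls - valid

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {
        "valid": valid, "valid_pct": pct(valid),
        "null": nulls, "null_pct": pct(nulls),
        "invalid": invalid, "invalid_pct": pct(invalid),
    }


def frequency(column):
    """Frequency distribution: group by value and count."""
    return dict(Counter(column))


def describe(values):
    """Descriptive statistics for a numeric column with at least two values."""
    n = len(values)
    return {
        "mode": multimode(values),
        "mean": mean(values),
        "median": median(values),
        "count": n,
        "sum": sum(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "std_error_of_mean": stdev(values) / sqrt(n),
        "quartiles": quantiles(values, n=4),   # 3 cut points
        "quintiles": quantiles(values, n=5),   # 4 cut points
    }


print(audit(["1", "2", "", None, "abc", "4"]))
print(frequency(["a", "b", "a"]))
print(describe([1, 2, 2, 3, 4]))
```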
4. Algorithm model
The Logical perseverance big data modeling platform is a machine learning algorithm platform based on a distributed computing engine. Users run experiments by dragging and dropping visual components, so that engineers without a machine learning background can also get started with data mining easily. The platform provides a rich set of machine learning algorithms such as Apriori, K_means, naive Bayes, logistic regression, ridge regression, LASSO, and linear regression.
(1) The algorithm model is implemented mainly with Spark. The prepared dataset and training data are submitted to the Spark cluster, which processes them efficiently and returns the result set.
(2) The algorithms are implemented in the Java language. The implemented algorithm programs are first packaged into jar files and deployed separately from the platform, which reduces the coupling between the modeling platform and the algorithms; deploying an algorithm does not affect use of the platform.
(3) Concretely, an algorithm is executed by submitting a task to the Spark cluster; the distributed computation of the cluster allows fast and effective iterative calculation.
The algorithm model module of the big data modeling platform is implemented mainly by programming against the spark mllib api. Spark's in-memory computing engine is fast, and the spark mllib library is described as containing many machine learning algorithms: Apriori, kmeans, naive Bayes, logistic regression, ridge regression, lasso, and others. These algorithms are broadly divided into two classes: classification and clustering. Among them, kmeans belongs to clustering, and the other algorithms listed above are treated as classification algorithms. In the code implementation the two classes of problem follow different logic, so the technical issues involved in the algorithm model module are explained below from these two aspects.
Clustering algorithm
Clustering (cluster analysis) is sometimes also translated as "cluster class". Its core task is to divide a group of target objects into several clusters so that the objects within each cluster are as similar as possible and the objects in different clusters are as different as possible. The so-called clustering problem is this: given an element set D in which each element has n observable attributes, use some algorithm to divide D into k subsets such that the dissimilarity between elements within each subset is as low as possible and the dissimilarity between elements of different subsets is as high as possible. Each subset is called a cluster.
Kmeans is an iterative reassignment clustering algorithm based on squared error. Its core idea is very simple:
(1) Randomly select K center points.
(2) Compute the distance from every point to these K center points, and assign each point to the cluster of its nearest center point.
(3) Recompute the centers of the K clusters using the simple arithmetic mean.
(4) Repeat steps 2 and 3 until the clusters no longer change or the maximum number of iterations is reached.
(5) Output the result.
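Steps (1) to (5) can be sketched in plain Python as follows. This is an illustrative single-machine version under stated assumptions (points as tuples of numbers, squared Euclidean distance, a fixed random seed); it is not the Spark implementation the platform calls, and the names `kmeans` and `sq_dist` are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # step 1: random initial centers
    assignment = None
    for _ in range(max_iter):                    # step 4: bounded repetition
        # Step 2: assign every point to its nearest center.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:         # step 4: clusters stopped changing
            break
        assignment = new_assignment
        # Step 3: recompute each center as the arithmetic mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                          # keep the old center if a cluster is empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment                   # step 5: output the result


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

Because the initial centers are chosen at random, different seeds can give different cluster numbering and, on less well-separated data, different final clusters.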
The quality of the Kmeans result depends on the choice of the initial cluster centers, and the algorithm easily falls into a locally optimal solution. There is no criterion to follow for choosing the value of K, the algorithm is rather sensitive to abnormal data, it can only handle data with numerical attributes, and the resulting cluster structure may be unbalanced.
The Kmeans algorithm flow and technical details are described below. First obtain the data source: add the data source in data assets and drag it onto the canvas. Then connect a data cleansing node to perform the necessary ETL processing on the data source so that it can be called by the algorithm part. After the data cleansing node runs, a copy of the data processed by this node is stored in the cluster's data warehouse Hive for the algorithm part to call.
The algorithm part is described in detail next. After the data source node and the data cleansing node are complete, they need to be connected to a kmeans algorithm node. Drag the kmeans algorithm node onto the canvas and double-click this node to open the configuration page. The configuration page contains: 1. which columns to run the kmeans algorithm on, because actual business requirements do not necessarily use all columns; 2. the number of clusters, that is, how many classes the current data source should finally be grouped into; 3. the maximum number of iterations, that is, how many times the algorithm needs to iterate; 4. the random count. After configuration, click save, then click run to start executing the program.
In the back-end code, when the program determines that nodeType (the node type) is K_Means, it enters the stepKmeans method of KmeansServiceImpl. Inside this method, the parameters set on the configuration page are obtained first, and the set methods of a KmeansInfo instance object are called on them, so that all parameters needed by the kmeans algorithm are stored in the KmeansInfo instance object. The toKmeansString method is then executed to obtain a space-separated parameter string.
Next, the formatTableData method in DataRevert is executed. The purpose of this method is feature conversion: the data source inevitably contains character strings, while the spark kmeans algorithm requires data of type double, so whether this method runs successfully is extremely important for the algorithm.
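The text does not say how formatTableData converts strings, so the following is only a hypothetical sketch of one common approach: values that parse as numbers are kept as floats, and each distinct non-numeric string in a column is replaced by an index code. The function name `to_double_columns` and the index-coding scheme are assumptions for illustration, not the platform's method.

```python
def to_double_columns(rows):
    """Convert every cell to a float; non-numeric values get a per-column index code."""
    n_cols = len(rows[0])
    encoders = [dict() for _ in range(n_cols)]   # one value-to-code map per column
    converted = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            try:
                out.append(float(value))         # already numeric or a numeric string
            except (TypeError, ValueError):
                # First unseen string in column j gets 0.0, the next 1.0, and so on.
                code = encoders[j].setdefault(value, float(len(encoders[j])))
                out.append(code)
        converted.append(out)
    return converted, encoders


table = [["1.5", "red"], ["2", "blue"], ["3", "red"]]
vectors, encoders = to_double_columns(table)
print(vectors)
print(encoders[1])
```

Index codes impose an artificial ordering and distance on categories, which affects a distance-based algorithm such as kmeans; a column that mixes numbers and strings would also mix raw values with codes, so such columns would need cleaning first.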
The algorithm jar is executed here in Spark's yarn-client submission mode. The benefit of this mode is that no script needs to be written and the jar can be run directly. KMeansInfo in Co-Insight-mllib.jar is then executed, with its parameters passed in from the web side. The main idea is to obtain the selected field data from the specified Hive table and convert the data into the required vector format. The KMeans.train api is used to train on the data and generate a KMeansModel model. This step is the most important step of the whole algorithm, because only after the model has been generated can the model's predict method be used to determine the cluster of each data item. The result is then stored in hdfs, and a Hive table is created to read the hdfs data. Finally the data are displayed: the result Hive table and the prediction data are merged for display. At this point the basic kmeans algorithm (clustering algorithm) is complete.
Classification algorithm
What is a classification algorithm? In simple terms, it assigns an object with certain characteristics to one category in a set of known categories. From a mathematical point of view, it can be defined as follows:
Given the sets C = {y1, y2, ..., yn} and I = {x1, x2, ..., xm, ...}, determine a mapping rule y = f(x) such that for any xi ∈ I there is one and only one yj ∈ C for which yj = f(xi) holds.
Here C is the set of categories, I is the set of objects to be classified, and f is the classifier. The main task of a classification algorithm is to construct the classifier f.
Constructing a classifier usually requires a set with known categories for training, and in general a trained classifier cannot reach 100% accuracy. The quality of a classifier is often related to factors such as the training data, the validation data, and the size of the training sample.
For example, when we see a stranger in daily life, the first thing we do is judge the person's sex, and this judgment is a classification process. From past experience of life, the three features of hair length, clothing, and build are usually enough to judge a person's sex. The "experience of life" here is a trained model for judging sex, and its training data are the many different people met in daily life. Suppose one day a person walks up to you with long flowing hair and close-fitting clothes but a very masculine build. You become unsure, because past experience, that is, the trained model, cannot judge this person's sex. You then learn to judge sex by the Adam's apple, and the quality of your trained model becomes higher. Undeniably, however, there will always be someone whose sex you cannot judge. A model therefore can never be 100% accurate; it can only approach 100% accuracy as the training data keep increasing.
Among the classification algorithms, the difference between algorithms lies in their underlying implementation in spark mllib. When the api is called, only the parameters of the training method differ slightly, and the rest of the program logic is basically the same. The naive Bayes algorithm is used here as the example for a detailed description.
Naive Bayes classification (Naive Bayes) can also be called the NB algorithm. Its core idea is very simple: for a given item to be predicted, compute the probability that the item belongs to each category, then choose the category with the highest probability as its predicted category. For instance, if you predict that the person in the example above is a woman with probability 40% and a man with probability 41%, you judge the person to be a man.
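The idea of scoring every category and choosing the most probable one can be sketched in plain Python as below. This is a small categorical naive Bayes with additive smoothing, written for illustration only; it is not Spark's implementation, and the function names, the tuple-based model, and the use of `alpha` as a smoothing constant are assumptions.

```python
from collections import Counter, defaultdict
from math import log


def train_nb(rows, labels, alpha=1.0):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (label, feature index) -> value counts
    values_seen = defaultdict(set)          # feature index -> distinct values
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[(label, j)][v] += 1
            values_seen[j].add(v)
    return class_counts, feature_counts, values_seen, alpha


def predict_nb(model, row):
    """Return the most probable label and the log-score of every label."""
    class_counts, feature_counts, values_seen, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = log(n / total)                               # log prior
        for j, v in enumerate(row):
            count = feature_counts[(label, j)][v]
            # Smoothed conditional probability of value v given the label.
            score += log((count + alpha) / (n + alpha * len(values_seen[j])))
        scores[label] = score
    return max(scores, key=scores.get), scores


rows = [("long", "dress"), ("long", "dress"), ("short", "suit"),
        ("short", "suit"), ("long", "suit")]
labels = ["F", "F", "M", "M", "M"]
model = train_nb(rows, labels)
print(predict_nb(model, ("long", "dress"))[0])
print(predict_nb(model, ("short", "suit"))[0])
```

Working in log space avoids numerical underflow when many features are multiplied together, and the smoothing constant keeps a feature value never seen with a label from forcing that label's probability to zero.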
The naive Bayes algorithm flow and technical details are described below. The classification flow differs from the clustering flow: this flow needs two data sources. One data source carries a label column and serves as the training data; in the other data source the label column is empty, and it serves as the prediction data. The two data sources must have the same field names and types. The dataset merging function in data cleansing is then used to merge the two data sources into one table in Hive for subsequent processing; row-record appending is normally used directly here.
ETL operations can be performed on the merged dataset to clean the data. The naive Bayes algorithm node in the algorithm model is then dragged in and connected. On the configuration page of the naive Bayes node, the user can select the label column, the columns to be used in running the algorithm, the alpha attribute, and the training data ratio. After configuration, save and run; the execution logic on the web side is basically similar to that of the kmeans algorithm.
The algorithm code is in NativeBayes in Co-Insight-mllib.jar. When the training data are chosen, the previously merged Hive table is split into the training dataset and the dataset to be predicted according to whether the label column is empty. The NaiveBayes.train method is then used to train a NaiveBayesModel model, and the model's predict method is likewise used to perform classification prediction on the prediction set. Finally the result is stored in a Hive table for the subsequent front-end display.
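The split of the merged table by whether the label column is empty can be illustrated with a few lines of plain Python. Rows are modelled as dictionaries, and the names `split_by_label` and `holdout` are hypothetical. How the platform uses the configured training data ratio is not stated in the text; the holdout split shown here is one plausible reading and is an assumption.

```python
import random


def split_by_label(rows, label_key="label"):
    """Rows with a label form the training set; rows without one are to be predicted."""
    train = [r for r in rows if r.get(label_key) not in (None, "")]
    to_predict = [r for r in rows if r.get(label_key) in (None, "")]
    return train, to_predict


def holdout(train_rows, ratio, seed=0):
    """Shuffle the labelled rows and keep `ratio` of them for fitting (assumed usage)."""
    rng = random.Random(seed)
    shuffled = train_rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]


merged = [
    {"hair": "long", "label": "F"},
    {"hair": "short", "label": "M"},
    {"hair": "long", "label": "F"},
    {"hair": "short", "label": "M"},
    {"hair": "long", "label": None},
    {"hair": "short", "label": ""},
]
train, to_predict = split_by_label(merged)
fit_part, check_part = holdout(train, ratio=0.75)
print(len(train), len(to_predict), len(fit_part), len(check_part))
```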
In summary, the main technique used by the algorithm module is calling the spark mllib api. The algorithm implementations are highly reusable, development is fast, model training is efficient, cluster resources are used well, and the general applicability of the algorithms is basically satisfied.
5. Front-end display
The Logical perseverance big data modeling platform provides rich display tools. The platform presents, promptly and in a more vivid and friendly form, the business insights hidden behind fast-changing and complex data. Whether in transportation, communications, or other fields, using interactive real-time data visualization to help business personnel discover and diagnose business problems has become an increasingly important part of big data solutions. The display tools mainly include table display, column chart, bar chart, line chart, scatter chart, bubble chart, worm hole, geographic distribution, etc.
(1) Table display shows data in tabular form.
(2) Column charts, bar charts, line charts, pie charts, area charts, scatter charts, etc. display data according to the X axis and Y axis.
(3) The ring chart uses a classification variable and a summary variable; the calculation method is count or sum.
(4) The radar chart uses a classification variable, comparison parameter 1, and comparison parameter 2; the calculation method is sum, maximum, or mean.
Front-end display is implemented mainly with the front-end jQuery technology.
The big data modeling platform of the present application lets non-technical personnel use it easily without knowing the underlying big data technology. The platform uses the Gooflow flow technology: big data processing is achieved simply by dragging and connecting nodes such as data source, data processing, algorithm, and data display. The platform mainly uses the Hive data warehouse to store data, and the data processing part is implemented directly with Hive sql statements, so it copes well and performs well even when the data volume is huge. The algorithm part of the platform is implemented with Spark MLlib. Spark's advantage is that the intermediate output of a job can be kept in memory, so HDFS no longer needs to be read and written; computation is memory-based and runs efficiently. Spark's machine learning library contains a wide variety of algorithms, and algorithms such as classification and clustering can meet users' needs. By creating and connecting data source, data processing, algorithm, and data display nodes, a one-stop data analysis flow is realized and the business requirement is fulfilled.
The above embodiments are given only for full disclosure and do not limit the present invention. Any replacement with equivalent technical features that is based on the inventive gist of the present invention and requires no creative work shall be regarded as within the scope disclosed by the present application.
Claims (8)
1. A big data modeling platform, characterized by comprising:
a data assets module, used for uploading data sources, in which user data are updated to the cloud platform by manual upload or automatic update, and users process the data they have uploaded by manual drag-and-drop modeling;
a data cleansing module, used for performing ETL processing on the data source, finding and correcting recognizable errors in data files, including checking data consistency and handling invalid values and missing values;
a data check module, used for performing detection and basic statistical work on the data;
an algorithm module, which models massive data using some classical classification or clustering algorithms in machine learning and then uses the model for prediction;
a front-end display module, used for graphically displaying data that have been processed or data that have not been processed.
2. The big data modeling platform according to claim 1, characterized in that the data assets module includes three data upload modes: local file upload, underlying data upload, and database upload, wherein database upload supports three databases: MySQL, Oracle, and SQL Server.
3. The big data modeling platform according to claim 2, characterized in that the data cleansing module includes a Sql processing submodule, a sampling submodule, a classified summary submodule, a data merging submodule, a duplicate deletion submodule, a data partition submodule, a sorting submodule, a data discretization submodule, a data standardization submodule, a variable filtering submodule, a transposition submodule, a field rearrangement submodule, a missing-value processing submodule, an outlier processing submodule, a search-and-convert submodule, a variable insertion submodule, a weighting submodule, a sample balancing submodule, and a word segmentation analysis submodule; the Sql processing submodule is used for directly editing Sql statements for execution; the sampling submodule is used for sampling the data using different sampling modes; the classified summary submodule is used for computing the content of field variables in a table by mean, count, or sum and generating the corresponding label variable column, wherein the summary variable and the calculation variable are configurable; the data merging submodule is used for appending the data of two tables by row-record appending or column-variable appending, wherein the column variable names should be kept consistent during row-record appending, otherwise new variable columns are added; the duplicate deletion submodule is used for deleting the duplicate content in the selected variable; the data partition submodule is used for specifying the quantity or proportion of sample data in the training zone and the test zone; the sorting submodule arranges the content of the selected variable in ascending or descending order; the data discretization submodule is used for discretizing and grading the selected continuous variable columns by equal-width binning or equal-frequency binning; the data standardization submodule, which supports only numeric variable columns, performs 0-1 standardization on the selected variable columns so that the results fall in the interval [0, 1], or performs Z standardization so that the data conform to the standard normal distribution with mean 0 and standard deviation 1; the variable filtering submodule is used for deleting the selected variable columns; the transposition submodule is used for transposing all rows and columns in the data, that is, row-column conversion; the field rearrangement submodule is used for rearranging the positions of the column variables in the data; the missing-value processing submodule is used for deleting the row records in which the selected variable is empty; the outlier processing submodule is used for deleting outliers at a set proportion according to an outlier recognition rule, the recognition rules including standard deviation and quantile, that is, data lying beyond a certain multiple of the standard deviation from the mean, or beyond the quantile, are identified as abnormal data; the search-and-convert submodule is used for looking up the content of the selected variable according to a set condition and replacing it with a target value; the variable insertion submodule is used for performing arithmetic on the selected variables to generate a new variable column, the name of the variable column being entered manually and the calculation formula edited in the algorithm box; the weighting submodule is used for weighting the selected variable, the weight value being entered as the weight factor; the sample balancing submodule is used for looking up target data in the selected numeric variable column according to a set condition, entering a weight factor, and weighting the target data; the word segmentation analysis submodule is used for parsing the text content of the selected segmentation field and generating row records from the resulting terms.
4. The big data modeling platform according to claim 3, characterized in that the data check module includes a data audit submodule, a frequency analysis submodule, and a descriptive statistics submodule; the data audit submodule is used for statistically analysing and detecting the sample indices of the selected variable, the indices including valid values, invalid values, null values, and their proportions; the frequency analysis submodule is used for counting the frequency with which each content value occurs in the selected variable; the descriptive statistics submodule is used for computing statistics of the specified variable column, namely the mean, mode, median, and total.
5. The big data modeling platform according to claim 4, characterized in that the algorithm model module includes an Apriori algorithm submodule, a Kmeans algorithm submodule, a naive Bayes algorithm submodule, a logistic regression algorithm submodule, a ridge regression algorithm submodule, a LASSO algorithm submodule, and a linear regression algorithm submodule; the Apriori algorithm submodule is used for counting frequencies over the combined associated field contents, performing probability calculation of frequent two-item sets on the content of the dimension field, and deriving analysis indices such as support; the Kmeans algorithm submodule is used for dividing the data of the selected fields into n clusters, with the number of clusters, the number of iterations, and the random count parameter being configurable, thereby realizing the function of data convergence; the naive Bayes algorithm submodule, the logistic regression algorithm submodule, the ridge regression algorithm submodule, the LASSO algorithm submodule, and the linear regression algorithm submodule are all used for fitting a classification algorithm model in order to predict new data.
6. The big data modeling platform according to claim 5, characterized in that, for the front-end display module, experiments are carried out by dragging and dropping visual components, data cleansing and algorithm analysis are performed according to the established business model, and the result dataset is displayed in multidimensional visual form through the charts in the connected front-end display module.
7. The big data modeling platform according to claim 6, characterized in that the multidimensional visual display includes: 1. intuitive display of different data structures using different chart types; 2. display of multidimensional data in drill-down form; 3. customized display of special data using node-customized display forms; 4. better display of the data in the form of linked display across multiple charts.
8. A modeling method based on the big data modeling platform according to claim 1, with the following steps:
first, the data source to be processed is uploaded to the platform using the add-data-source node in data assets, for subsequent use;
then, according to the needs of the business scenario, the data are cleaned using the functional nodes in the data cleansing module, for example using missing-value processing to delete data rows in which a field is empty, and using variable filtering to keep the fields the business needs and delete the other fields, so that after this series of processing the data in the desired specific format are obtained;
next, if the business scenario has no requirement for an algorithm, the graphical display of the front-end display module can be used directly for the final data display; the front-end display functional nodes offer a variety of chart types, and different charts are selected for data presentation according to the requirement; if an algorithm is needed, an algorithm node must be added;
finally, once the respective nodes have been selected, the data source node, the data cleansing node, the algorithm node (if any), and the front-end display node are connected and saved; clicking run executes the whole flow, and the final data are displayed graphically.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710211258.XA CN107103050A (en) | 2017-03-31 | 2017-03-31 | A kind of big data Modeling Platform and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710211258.XA CN107103050A (en) | 2017-03-31 | 2017-03-31 | A kind of big data Modeling Platform and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107103050A true CN107103050A (en) | 2017-08-29 |
Family
ID=59676193
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710211258.XA Pending CN107103050A (en) | 2017-03-31 | 2017-03-31 | A kind of big data Modeling Platform and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107103050A (en) |
Cited By (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107526832A (en) * | 2017-09-05 | 2017-12-29 | 江苏电力信息技术有限公司 | A kind of method for building the big data business model that technology is pulled based on the page |
CN107544450A (en) * | 2017-10-11 | 2018-01-05 | 齐鲁工业大学 | Process industry network model construction method and system based on data |
CN107609064A (en) * | 2017-08-30 | 2018-01-19 | 成都中建科联网络科技有限公司 | Rival's intelligent analysis method based on data mining |
CN107643970A (en) * | 2017-09-13 | 2018-01-30 | 曙光信息产业(北京)有限公司 | What thermal map configured shows method and shows system |
CN107679129A (en) * | 2017-09-21 | 2018-02-09 | 无线生活(杭州)信息科技有限公司 | A kind of big data processing method and processing device |
CN107844634A (en) * | 2017-09-30 | 2018-03-27 | 平安科技(深圳)有限公司 | Polynary universal model platform modeling method, electronic equipment and computer-readable recording medium |
CN107958268A (en) * | 2017-11-22 | 2018-04-24 | 用友金融信息技术股份有限公司 | The training method and device of a kind of data model |
CN108170770A (en) * | 2017-12-26 | 2018-06-15 | 山东联科云计算股份有限公司 | A kind of analyzing and training platform based on big data |
CN108229828A (en) * | 2018-01-04 | 2018-06-29 | 上海电气集团股份有限公司 | A kind of analysis system based on industrial data |
CN108334501A (en) * | 2018-03-21 | 2018-07-27 | 王欣 | Electronic document analysis system based on machine learning and method |
CN108415695A (en) * | 2018-01-25 | 2018-08-17 | 新智数字科技有限公司 | A kind of data processing method, device and equipment based on visualization component |
CN108447118A (en) * | 2018-03-20 | 2018-08-24 | 北京知道创宇信息技术有限公司 | Big data method for visualizing, device and the electronic equipment that 3D visions are presented |
CN108460087A (en) * | 2018-01-22 | 2018-08-28 | 北京邮电大学 | Heuristic high dimensional data visualization device and method |
CN108509485A (en) * | 2018-02-07 | 2018-09-07 | 深圳壹账通智能科技有限公司 | Preprocess method, device, computer equipment and the storage medium of data |
CN108595627A (en) * | 2018-04-23 | 2018-09-28 | 温州市鹿城区中津先进科技研究院 | A kind of self-service data analysis Modeling Platform |
CN108898426A (en) * | 2018-06-14 | 2018-11-27 | 上海米飞网络科技有限公司 | The visualization system and method for payment data processing classification |
CN108959480A (en) * | 2018-06-21 | 2018-12-07 | 江苏赛睿信息科技股份有限公司 | The method and device of stream data realization data visualization |
CN108981785A (en) * | 2018-06-19 | 2018-12-11 | 江苏高远智能科技有限公司 | A kind of intelligent Detection of coal breaker equipment safety |
CN109063964A (en) * | 2018-07-02 | 2018-12-21 | 浙江百先得服饰有限公司 | A kind of platform data processing system |
CN109241107A (en) * | 2018-08-03 | 2019-01-18 | 北京邮电大学 | Big data controlling device based on Hadoop |
CN109240163A (en) * | 2018-09-25 | 2019-01-18 | 南京信息工程大学 | Intelligent node and its control method for industrialization manufacture |
CN109255524A (en) * | 2018-08-16 | 2019-01-22 | 广西电网有限责任公司电力科学研究院 | A kind of measuring equipment data analyzing evaluation method and system |
CN109307811A (en) * | 2018-08-06 | 2019-02-05 | 国网浙江省电力有限公司宁波供电公司 | A kind of user's dedicated transformer electricity consumption monitoring method excavated based on big data |
CN109325541A (en) * | 2018-09-30 | 2019-02-12 | 北京字节跳动网络技术有限公司 | Method and apparatus for training pattern |
CN109376152A (en) * | 2018-09-13 | 2019-02-22 | 广州帷策智能科技有限公司 | Big data system file data preparation method and system |
CN109389143A (en) * | 2018-06-19 | 2019-02-26 | 北京九章云极科技有限公司 | A kind of Data Analysis Services system and method for automatic modeling |
CN109558398A (en) * | 2018-10-31 | 2019-04-02 | 平安医疗健康管理股份有限公司 | Data cleaning method and relevant apparatus based on big data |
CN109558395A (en) * | 2018-10-17 | 2019-04-02 | 中国光大银行股份有限公司 | Data processing system and data digging method |
WO2019062444A1 (en) * | 2017-09-26 | 2019-04-04 | 深圳市宇数科技有限公司 | Data exploring and discovering method and system, electronic device and storage medium |
CN109635026A (en) * | 2018-11-29 | 2019-04-16 | 宝晟(广州)生物信息技术有限公司 | A kind of biological sample bank data distributing nodes sharing method, system and device |
CN109636482A (en) * | 2018-12-21 | 2019-04-16 | 苏宁易购集团股份有限公司 | Data processing method and system based on similarity model |
CN109634941A (en) * | 2018-11-14 | 2019-04-16 | 金色熊猫有限公司 | Medical data processing method, device, electronic equipment and storage medium |
CN109657803A (en) * | 2018-03-23 | 2019-04-19 | 新华三大数据技术有限公司 | The building of machine learning model |
CN109783859A (en) * | 2018-12-13 | 2019-05-21 | 重庆金融资产交易所有限责任公司 | Model building method, device and computer readable storage medium |
CN109800277A (en) * | 2018-12-18 | 2019-05-24 | 合肥天源迪科信息技术有限公司 | A kind of machine learning platform and the data model optimization method based on the platform |
CN109947826A (en) * | 2019-03-29 | 2019-06-28 | 山东浪潮云信息技术有限公司 | A method of with big data technology building region portrait analysis model |
CN110007989A (en) * | 2018-12-13 | 2019-07-12 | 国网信通亿力科技有限责任公司 | Data visualization platform system |
CN110175191A (en) * | 2019-05-14 | 2019-08-27 | 复旦大学 | Data filtering rule modeling method in data analysis |
CN110188887A (en) * | 2018-09-26 | 2019-08-30 | 第四范式(北京)技术有限公司 | The data managing method and device of Machine oriented study |
CN110245875A (en) * | 2019-06-21 | 2019-09-17 | 深圳前海微众银行股份有限公司 | Risk of fraud appraisal procedure, device, equipment and storage medium |
CN110363321A (en) * | 2018-03-26 | 2019-10-22 | 吕纪竹 | A kind of method of real-time prediction big data variation tendency |
CN110362605A (en) * | 2019-06-04 | 2019-10-22 | 苏州神州数码捷通科技有限公司 | A kind of E book data verification method based on big data |
CN110378569A (en) * | 2019-06-19 | 2019-10-25 | 平安国际智慧城市科技股份有限公司 | Industrial relations chain building method, apparatus, equipment and storage medium |
WO2019204975A1 (en) * | 2018-04-24 | 2019-10-31 | 深圳职业技术学院 | Multiparty quantum summation method and system |
CN110442620A (en) * | 2019-08-05 | 2019-11-12 | 赵玉德 | A kind of big data is explored and cognitive approach, device, equipment and computer storage medium |
CN110502509A (en) * | 2019-08-27 | 2019-11-26 | 广东工业大学 | A kind of traffic big data cleaning method and relevant apparatus based on Hadoop Yu Spark frame |
CN110727670A (en) * | 2019-10-11 | 2020-01-24 | 集奥聚合(北京)人工智能科技有限公司 | Data structure prediction transfer and automatic data processing method based on flow chart |
CN110850824A (en) * | 2019-11-12 | 2020-02-28 | 北京矿冶科技集团有限公司 | Implementation method for acquiring data of distributed control system to Hadoop platform |
CN110909039A (en) * | 2019-10-25 | 2020-03-24 | 北京华如科技股份有限公司 | Big data mining tool and method based on drag type process |
CN110908573A (en) * | 2019-12-03 | 2020-03-24 | 北京明略软件系统有限公司 | Algorithm model training method, device, equipment and storage medium |
CN110928922A (en) * | 2019-11-27 | 2020-03-27 | 开普云信息科技股份有限公司 | Public policy analysis model deployment method and system based on big data mining |
CN110990384A (en) * | 2019-11-04 | 2020-04-10 | 武汉中卫慧通科技有限公司 | Big data platform BI analysis method |
CN111080170A (en) * | 2019-12-30 | 2020-04-28 | 北京云享智胜科技有限公司 | Workflow modeling method and device, electronic equipment and storage medium |
CN111125052A (en) * | 2019-10-25 | 2020-05-08 | 北京华如科技股份有限公司 | Big data intelligent modeling system and method based on dynamic metadata |
CN111177200A (en) * | 2019-12-31 | 2020-05-19 | 北京九章云极科技有限公司 | Data processing system and method |
CN111177220A (en) * | 2019-12-26 | 2020-05-19 | 中国平安财产保险股份有限公司 | Data analysis method, device and equipment based on big data and readable storage medium |
CN111222833A (en) * | 2018-11-27 | 2020-06-02 | 中云开源数据技术(上海)有限公司 | Algorithm configuration combination platform based on data lake server |
CN111367969A (en) * | 2020-03-19 | 2020-07-03 | 北京三维天地科技股份有限公司 | Data mining method and system |
CN111399838A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Data modeling method and device based on Spark SQL and materialized view |
CN111537982A (en) * | 2020-05-08 | 2020-08-14 | 东南大学 | Distortion drag array line spectrum feature enhancement method and system |
CN111538494A (en) * | 2020-07-09 | 2020-08-14 | 南京红松信息技术有限公司 | Big data automatic modeling and verification engine system and method |
CN111654853A (en) * | 2020-08-04 | 2020-09-11 | 索信达(北京)数据技术有限公司 | Data analysis method based on user information |
CN111756600A (en) * | 2020-06-24 | 2020-10-09 | 厦门长江电子科技有限公司 | Multi-communication system and method for realizing multiple switch test machines |
CN111931945A (en) * | 2020-07-31 | 2020-11-13 | 北京百度网讯科技有限公司 | Data processing method, device and equipment based on label engine and storage medium |
CN111949640A (en) * | 2020-08-04 | 2020-11-17 | 上海微亿智造科技有限公司 | Intelligent parameter adjusting method and system based on industrial big data |
CN112182333A (en) * | 2020-09-25 | 2021-01-05 | 山东亿云信息技术有限公司 | Talent space-time big data processing method and system based on random forest |
CN112214524A (en) * | 2020-08-27 | 2021-01-12 | 优学汇信息科技(广东)有限公司 | Data evaluation system and evaluation method based on deep data mining |
CN112308410A (en) * | 2020-10-30 | 2021-02-02 | 云南电网有限责任公司电力科学研究院 | Enterprise asset data management method based on asset classification |
CN112328216A (en) * | 2020-11-03 | 2021-02-05 | 成都中科大旗软件股份有限公司 | Method, system, computer device and storage medium for developing data based on canvas nodes |
CN112506930A (en) * | 2020-12-15 | 2021-03-16 | 北京三维天地科技股份有限公司 | Data insight platform based on machine learning technology |
WO2021047506A1 (en) * | 2019-09-11 | 2021-03-18 | 中兴通讯股份有限公司 | System and method for statistical analysis of data, and computer-readable storage medium |
CN112667735A (en) * | 2020-12-23 | 2021-04-16 | 武汉烽火众智数字技术有限责任公司 | Visualization model establishing and analyzing system and method based on big data |
CN112685380A (en) * | 2020-12-03 | 2021-04-20 | 成都大数据产业技术研究院有限公司 | Big data value discovery and application innovation platform system |
CN113220566A (en) * | 2021-04-26 | 2021-08-06 | 深圳市云网万店科技有限公司 | Interface performance test script generation method and device and computer equipment |
CN113468187A (en) * | 2021-09-02 | 2021-10-01 | 太平金融科技服务(上海)有限公司深圳分公司 | Multi-party data integration method and device, computer equipment and storage medium |
CN114205164A (en) * | 2021-12-16 | 2022-03-18 | 北京百度网讯科技有限公司 | Traffic classification method and device, training method and device, equipment and medium |
CN114254588A (en) * | 2021-12-16 | 2022-03-29 | 马上消费金融股份有限公司 | Data tag processing method and device |
CN115345461A (en) * | 2022-08-08 | 2022-11-15 | 航天神舟智慧系统技术有限公司 | Police service efficiency evaluation method and device based on data modeling |
CN115357657A (en) * | 2022-10-24 | 2022-11-18 | 成都数联云算科技有限公司 | Data processing method and device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070203939A1 (en) * | 2003-07-31 | 2007-08-30 | Mcardle James M | Alert Flags for Data Cleaning and Data Analysis |
CN102201037A (en) * | 2011-06-14 | 2011-09-28 | 中国农业大学 | Agricultural disaster forecast method |
CN105956015A (en) * | 2016-04-22 | 2016-09-21 | 四川中软科技有限公司 | Service platform integration method based on big data |
CN106022477A (en) * | 2016-05-18 | 2016-10-12 | 国网信通亿力科技有限责任公司 | Intelligent analysis decision system and method |
2017
- 2017-03-31 CN CN201710211258.XA patent/CN107103050A/en active Pending
Non-Patent Citations (1)
Title |
---|
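关大伟 (Guan Dawei): "数据挖掘中的数据预处理" (Data Preprocessing in Data Mining), 《中国优秀博硕士学位论文全文数据库 (硕士) 信息科技辑》 (China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series) * |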
Cited By (103)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107609064A (en) * | 2017-08-30 | 2018-01-19 | 成都中建科联网络科技有限公司 | Rival's intelligent analysis method based on data mining |
CN107526832A (en) * | 2017-09-05 | 2017-12-29 | 江苏电力信息技术有限公司 | A kind of method for building the big data business model that technology is pulled based on the page |
CN107643970A (en) * | 2017-09-13 | 2018-01-30 | 曙光信息产业(北京)有限公司 | What thermal map configured shows method and shows system |
CN107679129A (en) * | 2017-09-21 | 2018-02-09 | 无线生活(杭州)信息科技有限公司 | A kind of big data processing method and processing device |
WO2019062444A1 (en) * | 2017-09-26 | 2019-04-04 | 深圳市宇数科技有限公司 | Data exploring and discovering method and system, electronic device and storage medium |
CN107844634A (en) * | 2017-09-30 | 2018-03-27 | 平安科技(深圳)有限公司 | Polynary universal model platform modeling method, electronic equipment and computer-readable recording medium |
CN107544450A (en) * | 2017-10-11 | 2018-01-05 | 齐鲁工业大学 | Process industry network model construction method and system based on data |
CN107544450B (en) * | 2017-10-11 | 2019-06-21 | 齐鲁工业大学 | Process industry network model construction method and system based on data |
CN107958268A (en) * | 2017-11-22 | 2018-04-24 | 用友金融信息技术股份有限公司 | The training method and device of a kind of data model |
CN108170770A (en) * | 2017-12-26 | 2018-06-15 | 山东联科云计算股份有限公司 | A kind of analyzing and training platform based on big data |
CN108229828A (en) * | 2018-01-04 | 2018-06-29 | 上海电气集团股份有限公司 | A kind of analysis system based on industrial data |
CN108460087A (en) * | 2018-01-22 | 2018-08-28 | 北京邮电大学 | Heuristic high dimensional data visualization device and method |
CN108415695A (en) * | 2018-01-25 | 2018-08-17 | 新智数字科技有限公司 | A kind of data processing method, device and equipment based on visualization component |
CN108509485A (en) * | 2018-02-07 | 2018-09-07 | 深圳壹账通智能科技有限公司 | Preprocess method, device, computer equipment and the storage medium of data |
CN108447118A (en) * | 2018-03-20 | 2018-08-24 | 北京知道创宇信息技术有限公司 | Big data method for visualizing, device and the electronic equipment that 3D visions are presented |
CN108334501A (en) * | 2018-03-21 | 2018-07-27 | 王欣 | Electronic document analysis system based on machine learning and method |
CN108334501B (en) * | 2018-03-21 | 2021-07-20 | 王欣 | Electronic document analysis system and method based on machine learning |
CN109657803B (en) * | 2018-03-23 | 2020-04-03 | 新华三大数据技术有限公司 | Construction of machine learning models |
CN109657803A (en) * | 2018-03-23 | 2019-04-19 | 新华三大数据技术有限公司 | The building of machine learning model |
CN110363321A (en) * | 2018-03-26 | 2019-10-22 | 吕纪竹 | A kind of method of real-time prediction big data variation tendency |
CN110363321B (en) * | 2018-03-26 | 2024-04-19 | 吕纪竹 | Method for predicting big data change trend in real time |
CN108595627A (en) * | 2018-04-23 | 2018-09-28 | 温州市鹿城区中津先进科技研究院 | A kind of self-service data analysis Modeling Platform |
WO2019204975A1 (en) * | 2018-04-24 | 2019-10-31 | 深圳职业技术学院 | Multiparty quantum summation method and system |
CN108898426A (en) * | 2018-06-14 | 2018-11-27 | 上海米飞网络科技有限公司 | The visualization system and method for payment data processing classification |
CN113935434A (en) * | 2018-06-19 | 2022-01-14 | 北京九章云极科技有限公司 | Data analysis processing system and automatic modeling method |
CN108981785A (en) * | 2018-06-19 | 2018-12-11 | 江苏高远智能科技有限公司 | A kind of intelligent Detection of coal breaker equipment safety |
CN109389143A (en) * | 2018-06-19 | 2019-02-26 | 北京九章云极科技有限公司 | A kind of Data Analysis Services system and method for automatic modeling |
CN108959480B (en) * | 2018-06-21 | 2020-07-14 | 江苏赛睿信息科技股份有限公司 | Method and device for realizing data visualization of stream data |
CN108959480A (en) * | 2018-06-21 | 2018-12-07 | 江苏赛睿信息科技股份有限公司 | The method and device of stream data realization data visualization |
CN109063964A (en) * | 2018-07-02 | 2018-12-21 | 浙江百先得服饰有限公司 | A kind of platform data processing system |
CN109241107A (en) * | 2018-08-03 | 2019-01-18 | 北京邮电大学 | Big data controlling device based on Hadoop |
CN109307811A (en) * | 2018-08-06 | 2019-02-05 | 国网浙江省电力有限公司宁波供电公司 | A kind of user's dedicated transformer electricity consumption monitoring method excavated based on big data |
CN109255524A (en) * | 2018-08-16 | 2019-01-22 | 广西电网有限责任公司电力科学研究院 | A kind of measuring equipment data analyzing evaluation method and system |
CN109376152A (en) * | 2018-09-13 | 2019-02-22 | 广州帷策智能科技有限公司 | Big data system file data preparation method and system |
CN109240163A (en) * | 2018-09-25 | 2019-01-18 | 南京信息工程大学 | Intelligent node and its control method for industrialization manufacture |
CN109240163B (en) * | 2018-09-25 | 2024-01-02 | 南京信息工程大学 | Intelligent node for industrial manufacturing and control method thereof |
CN110188887B (en) * | 2018-09-26 | 2022-11-08 | 第四范式(北京)技术有限公司 | Data management method and device for machine learning |
CN110188887A (en) * | 2018-09-26 | 2019-08-30 | 第四范式(北京)技术有限公司 | The data managing method and device of Machine oriented study |
CN109325541A (en) * | 2018-09-30 | 2019-02-12 | 北京字节跳动网络技术有限公司 | Method and apparatus for training pattern |
CN109558395A (en) * | 2018-10-17 | 2019-04-02 | 中国光大银行股份有限公司 | Data processing system and data digging method |
CN109558398B (en) * | 2018-10-31 | 2023-09-19 | 深圳平安医疗健康科技服务有限公司 | Data cleaning method based on big data and related device |
CN109558398A (en) * | 2018-10-31 | 2019-04-02 | 平安医疗健康管理股份有限公司 | Data cleaning method and relevant apparatus based on big data |
CN109634941A (en) * | 2018-11-14 | 2019-04-16 | 金色熊猫有限公司 | Medical data processing method, device, electronic equipment and storage medium |
CN111222833A (en) * | 2018-11-27 | 2020-06-02 | 中云开源数据技术(上海)有限公司 | Algorithm configuration combination platform based on data lake server |
CN109635026A (en) * | 2018-11-29 | 2019-04-16 | 宝晟(广州)生物信息技术有限公司 | A kind of biological sample bank data distributing nodes sharing method, system and device |
CN110007989A (en) * | 2018-12-13 | 2019-07-12 | 国网信通亿力科技有限责任公司 | Data visualization platform system |
CN109783859A (en) * | 2018-12-13 | 2019-05-21 | 重庆金融资产交易所有限责任公司 | Model building method, device and computer readable storage medium |
CN109800277A (en) * | 2018-12-18 | 2019-05-24 | 合肥天源迪科信息技术有限公司 | A kind of machine learning platform and the data model optimization method based on the platform |
CN109636482A (en) * | 2018-12-21 | 2019-04-16 | 苏宁易购集团股份有限公司 | Data processing method and system based on similarity model |
CN109636482B (en) * | 2018-12-21 | 2021-07-27 | 南京星云数字技术有限公司 | Data processing method and system based on similarity model |
CN109947826A (en) * | 2019-03-29 | 2019-06-28 | 山东浪潮云信息技术有限公司 | A method of with big data technology building region portrait analysis model |
CN110175191A (en) * | 2019-05-14 | 2019-08-27 | 复旦大学 | Data filtering rule modeling method in data analysis |
CN110362605A (en) * | 2019-06-04 | 2019-10-22 | 苏州神州数码捷通科技有限公司 | A kind of E book data verification method based on big data |
CN110378569A (en) * | 2019-06-19 | 2019-10-25 | 平安国际智慧城市科技股份有限公司 | Industrial relations chain building method, apparatus, equipment and storage medium |
CN110245875A (en) * | 2019-06-21 | 2019-09-17 | 深圳前海微众银行股份有限公司 | Risk of fraud appraisal procedure, device, equipment and storage medium |
CN110442620B (en) * | 2019-08-05 | 2023-08-29 | 赵玉德 | Big data exploration and cognition method, device, equipment and computer storage medium |
CN110442620A (en) * | 2019-08-05 | 2019-11-12 | 赵玉德 | A kind of big data is explored and cognitive approach, device, equipment and computer storage medium |
CN110502509A (en) * | 2019-08-27 | 2019-11-26 | 广东工业大学 | A kind of traffic big data cleaning method and relevant apparatus based on Hadoop Yu Spark frame |
CN110502509B (en) * | 2019-08-27 | 2023-04-18 | 广东工业大学 | Traffic big data cleaning method based on Hadoop and Spark framework and related device |
WO2021047506A1 (en) * | 2019-09-11 | 2021-03-18 | 中兴通讯股份有限公司 | System and method for statistical analysis of data, and computer-readable storage medium |
CN110727670A (en) * | 2019-10-11 | 2020-01-24 | 集奥聚合(北京)人工智能科技有限公司 | Data structure prediction transfer and automatic data processing method based on flow chart |
CN110727670B (en) * | 2019-10-11 | 2022-08-09 | 北京小向创新人工智能科技有限公司 | Data structure prediction transfer and automatic data processing method based on flow chart |
CN110909039A (en) * | 2019-10-25 | 2020-03-24 | 北京华如科技股份有限公司 | Big data mining tool and method based on drag type process |
CN111125052A (en) * | 2019-10-25 | 2020-05-08 | 北京华如科技股份有限公司 | Big data intelligent modeling system and method based on dynamic metadata |
CN110990384B (en) * | 2019-11-04 | 2023-08-22 | 武汉中卫慧通科技有限公司 | Big data platform BI analysis method |
CN110990384A (en) * | 2019-11-04 | 2020-04-10 | 武汉中卫慧通科技有限公司 | Big data platform BI analysis method |
CN110850824A (en) * | 2019-11-12 | 2020-02-28 | 北京矿冶科技集团有限公司 | Implementation method for acquiring data of distributed control system to Hadoop platform |
CN110928922A (en) * | 2019-11-27 | 2020-03-27 | 开普云信息科技股份有限公司 | Public policy analysis model deployment method and system based on big data mining |
CN110928922B (en) * | 2019-11-27 | 2020-07-24 | 开普云信息科技股份有限公司 | Public policy analysis model deployment method and system based on big data mining |
CN110908573B (en) * | 2019-12-03 | 2021-07-06 | 北京明略软件系统有限公司 | Algorithm model training method, device, equipment and storage medium |
CN110908573A (en) * | 2019-12-03 | 2020-03-24 | 北京明略软件系统有限公司 | Algorithm model training method, device, equipment and storage medium |
CN111177220A (en) * | 2019-12-26 | 2020-05-19 | 中国平安财产保险股份有限公司 | Data analysis method, device and equipment based on big data and readable storage medium |
CN111080170A (en) * | 2019-12-30 | 2020-04-28 | 北京云享智胜科技有限公司 | Workflow modeling method and device, electronic equipment and storage medium |
CN111080170B (en) * | 2019-12-30 | 2023-09-05 | 北京云享智胜科技有限公司 | Workflow modeling method and device, electronic equipment and storage medium |
CN111177200A (en) * | 2019-12-31 | 2020-05-19 | 北京九章云极科技有限公司 | Data processing system and method |
CN111177200B (en) * | 2019-12-31 | 2021-05-11 | 北京九章云极科技有限公司 | Data processing system and method |
CN111367969B (en) * | 2020-03-19 | 2020-12-01 | 北京三维天地科技股份有限公司 | Data mining method and system |
CN111367969A (en) * | 2020-03-19 | 2020-07-03 | 北京三维天地科技股份有限公司 | Data mining method and system |
CN111537982A (en) * | 2020-05-08 | 2020-08-14 | 东南大学 | Distortion drag array line spectrum feature enhancement method and system |
CN111537982B (en) * | 2020-05-08 | 2022-04-12 | 东南大学 | Distortion drag array line spectrum feature enhancement method and system |
CN111399838A (en) * | 2020-06-04 | 2020-07-10 | 成都四方伟业软件股份有限公司 | Data modeling method and device based on Spark SQL and materialized view |
CN111756600A (en) * | 2020-06-24 | 2020-10-09 | 厦门长江电子科技有限公司 | Multi-communication system and method for realizing multiple switch test machines |
CN111538494A (en) * | 2020-07-09 | 2020-08-14 | 南京红松信息技术有限公司 | Big data automatic modeling and verification engine system and method |
CN111931945A (en) * | 2020-07-31 | 2020-11-13 | 北京百度网讯科技有限公司 | Data processing method, device and equipment based on label engine and storage medium |
CN111654853A (en) * | 2020-08-04 | 2020-09-11 | 索信达(北京)数据技术有限公司 | Data analysis method based on user information |
CN111654853B (en) * | 2020-08-04 | 2020-11-10 | 索信达(北京)数据技术有限公司 | Data analysis method based on user information |
CN111949640A (en) * | 2020-08-04 | 2020-11-17 | 上海微亿智造科技有限公司 | Intelligent parameter adjusting method and system based on industrial big data |
CN112214524A (en) * | 2020-08-27 | 2021-01-12 | 优学汇信息科技(广东)有限公司 | Data evaluation system and evaluation method based on deep data mining |
CN112182333A (en) * | 2020-09-25 | 2021-01-05 | 山东亿云信息技术有限公司 | Talent space-time big data processing method and system based on random forest |
CN112308410A (en) * | 2020-10-30 | 2021-02-02 | 云南电网有限责任公司电力科学研究院 | Enterprise asset data management method based on asset classification |
CN112328216A (en) * | 2020-11-03 | 2021-02-05 | 成都中科大旗软件股份有限公司 | Method, system, computer device and storage medium for developing data based on canvas nodes |
CN112685380A (en) * | 2020-12-03 | 2021-04-20 | 成都大数据产业技术研究院有限公司 | Big data value discovery and application innovation platform system |
CN112506930A (en) * | 2020-12-15 | 2021-03-16 | 北京三维天地科技股份有限公司 | Data insight platform based on machine learning technology |
CN112667735A (en) * | 2020-12-23 | 2021-04-16 | 武汉烽火众智数字技术有限责任公司 | Visualization model establishing and analyzing system and method based on big data |
CN113220566A (en) * | 2021-04-26 | 2021-08-06 | 深圳市云网万店科技有限公司 | Interface performance test script generation method and device and computer equipment |
CN113468187A (en) * | 2021-09-02 | 2021-10-01 | 太平金融科技服务(上海)有限公司深圳分公司 | Multi-party data integration method and device, computer equipment and storage medium |
CN113468187B (en) * | 2021-09-02 | 2021-11-23 | 太平金融科技服务(上海)有限公司深圳分公司 | Multi-party data integration method and device, computer equipment and storage medium |
CN114254588A (en) * | 2021-12-16 | 2022-03-29 | 马上消费金融股份有限公司 | Data tag processing method and device |
CN114254588B (en) * | 2021-12-16 | 2023-10-13 | 马上消费金融股份有限公司 | Data tag processing method and device |
CN114205164A (en) * | 2021-12-16 | 2022-03-18 | 北京百度网讯科技有限公司 | Traffic classification method and device, training method and device, equipment and medium |
CN115345461A (en) * | 2022-08-08 | 2022-11-15 | 航天神舟智慧系统技术有限公司 | Police service efficiency evaluation method and device based on data modeling |
CN115357657B (en) * | 2022-10-24 | 2023-03-24 | 成都数联云算科技有限公司 | Data processing method and device, computer equipment and storage medium |
CN115357657A (en) * | 2022-10-24 | 2022-11-18 | 成都数联云算科技有限公司 | Data processing method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107103050A (en) | | A kind of big data Modeling Platform and method |
CN107193967A (en) | | A kind of multi-source heterogeneous industry field big data handles full link solution |
US20180196868A1 (en) | | Multi-Dimensional Modeling in a Functional Information System |
Wang et al. | | Graphs in scientific visualization: A survey |
CN108701254A (en) | | System and method for the tracking of dynamic family, reconstruction and life cycle management |
CN110008259A (en) | | The method and terminal device of visualized data analysis |
Dhaenens et al. | | Metaheuristics for big data |
San Martín et al. | | Representing, querying and transforming social networks with RDF/SPARQL |
CN111444348A (en) | | Method, system and medium for constructing and applying knowledge graph architecture |
Wang | | Big Data Algebra (BDA): A Denotational Mathematical Structure for Big Data Science and Engineering |
CN112667735A (en) | | Visualization model establishing and analyzing system and method based on big data |
CN110737805A (en) | | Method and device for processing graph model data and terminal equipment |
Wang et al. | | Research on evaluation model of music education informatization system based on machine learning |
Wang et al. | | Association rules mining in parallel conditional tree based on grid computing inspired partition algorithm |
Ledesma et al. | | Educational tool for generation and analysis of multidimensional modeling on data warehouse |
Agocs et al. | | Interactive graph query language for multidimensional data in collaboration spotting visual analytics framework |
Sayed et al. | | A conceptual framework for using big data in Egyptian agriculture |
Tsitseklis et al. | | Scalable community detection for complex data graphs via hyperbolic network embedding and graph databases |
Wang | | Graph-based techniques for visual analytics of scientific data sets |
Cao | | Design and optimization of a decision support system for sports training based on data mining technology |
Palivela et al. | | Survey on mining techniques for breast cancer related data |
Feng et al. | | ASMaaS: Automatic Semantic Modeling as a Service |
Ulhaq | | Mapping System Model and Clustering of Fishery Products using K-Means Algorithm with Web GIS Approach |
Kamakshaiah et al. | | Prototype survey analysis of different information retrieval classification and grouping approaches for categorical information |
Tiwari et al. | | DBSCAN: An Assessment of Density Based Clustering and It’s Approaches |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170829 |