CN105868206A - Algorithm flow based complicated multi-variable data processing method - Google Patents

Algorithm flow based complicated multi-variable data processing method

Info

Publication number
CN105868206A
Authority
CN
China
Prior art keywords
data
algorithm
stream
data processing
algorithm stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510030208.2A
Other languages
Chinese (zh)
Inventor
曾仲大
陈爱明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Chemdatasolution Information Technology Co Ltd
Original Assignee
Dalian Chemdatasolution Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Chemdatasolution Information Technology Co Ltd
Priority to CN201510030208.2A
Publication of CN105868206A
Legal status: Pending

Abstract

The invention discloses a complex data processing method based on an algorithm flow. It is suitable for the analysis, processing, and information mining of high-dimensional, high-throughput, high-complexity data, and belongs to the field of chemometrics in analytical chemistry. The method achieves intelligent data analysis and information mining by integrating and optimizing the data processing workflow: different data processing methods (batch data loading, preprocessing, feature selection, model construction, prediction of unknown samples, and so on) are combined into an optimized flow, the method parameters are set, and the data to be analyzed (training set, calibration set, validation set, prediction set, etc.) are injected into the algorithm flow, giving fast, convenient, accurate, and intelligent analysis of big data. In particular, by changing how the algorithm flow is constructed, one can achieve one-key processing and multi-model processing of complex data, examine how the data processing methods and their parameters affect the analysis result, and examine how the same data processing method (algorithm flow) behaves on different data sets, so that individual data are matched with an optimized combination of data processing methods.

Description

A complex multivariate data processing method based on an algorithm flow
Technical field
The present invention relates to a complex multivariate data processing method based on an algorithm flow and belongs to the field of chemometrics in analytical chemistry. Specifically, for the complex multivariate data to be processed, the entire workflow, from data loading and preprocessing through key feature selection to model construction and generalization, is integrated and optimized to create an algorithm flow for processing such data, enabling fast, intelligent data processing. To analyze real, complex, high-throughput data, the user only needs to add the target data to the algorithm flow to obtain one-key processing, multi-model analysis, and other intelligent data processing and information mining.
Background
Processing complex multivariate data and extracting information from it depends heavily on mathematics, statistics, artificial intelligence, chemistry, and bioinformatics, and on the application and development of chemometric methods. Processing the "big data" of chemistry, biology, and related fields in particular requires fast, intelligent computation, accurate and reliable results, and basic algorithms suited to "three-high" data (high-dimensional, high-throughput, and high-complexity); this is the crux of data processing. The processing itself is often very complicated. Taking data from analytical instruments such as chromatographs and spectrometers as an example, it typically includes smoothing of complex signals; peak detection under strong noise and background interference; automatic deconvolution of overlapping multi-component signals; intelligent retention-time alignment across large numbers of samples containing hundreds or thousands of components; selection and optimization of key variables; databases and search strategies of various kinds; structural identification of completely unknown small molecules; visualization of large-scale data and analysis results; pattern recognition and classification; and quantitative models and model evaluation algorithms.
Mining data and obtaining analysis results quickly and accurately is what data owners and data analysts dream of. Traditional data analysis, however, is on the one hand very time-consuming and laborious: the various data processing methods have to be combined and optimized by hand, a single category of data processing already involves different specific algorithms, and even the order in which the algorithms are applied, or their parameters, can significantly affect the result. Obtaining a good result requires repeated trial and error, which wastes a great deal of time. The sequential nature of data processing in particular, in which the output of one method is the input of the next, lengthens the waiting time. On the other hand, the standard mode of data processing usually requires loading samples with known information, building a robust and reliable model from them, and then applying the model to known validation samples or unknown prediction samples. Selecting data processing algorithms and methods for real data step by step in this way makes the analysis long and complex and makes fast, intelligent analysis difficult to achieve. Taking the construction of a model y=f(X) as an example, the diversity of both f and X means that conventional approaches cannot truly meet the needs of intelligent "big data" processing and information extraction. Existing software for processing spectral, chromatographic, and mass spectrometric data, for example, is designed, organized, and implemented according to this conventional approach. This includes the current mainstream international chemometrics packages such as The Unscrambler and SIMCA, which are extremely time-consuming and complicated to operate: each data processing method has to be rerun repeatedly and the optimal combination of methods found by hand.
Complex multivariate data processing based on an algorithm flow can be applied widely to the analysis, processing, and information mining of data produced by analytical instruments (such as chromatography, mass spectrometry, and spectroscopy). It can also be used for network and Internet "big data" and for analyzing data from manufacturing and service industries such as pharmaceuticals, tobacco, brewing, agriculture, food, petrochemicals, environment, quality supervision, and biology. Its range of application is wide and its prospects are good.
Summary of the invention
The object of the invention is to provide an algorithm flow for processing complex multivariate data (hereinafter, the algorithm flow). The algorithm flow simplifies the processing of complex data: the methods needed during data processing, including data preprocessing, feature selection, and a series of modeling methods, are added to (designed into) the algorithm flow in advance and their parameters are set. One-key processing and multi-model processing then become possible for all kinds of complex data, the tedious operations otherwise needed each time a data set is processed are skipped, and data processing becomes more efficient, which saves cost and increases benefit. The core points are: 1) data processing algorithms can be added or removed at will, their parameters set, and their calling order arranged freely, creating an algorithm flow that includes one or more data processing steps; 2) when the algorithm flow is applied, it can be modified or the running order of its methods exchanged; 3) the data to be analyzed (for modeling, validation, and prediction) are "injected" into the algorithm flow, the methods run one after another in the order defined by the flow, and the intermediate result of every step and the final result are obtained; 4) by configuring the algorithm flow differently, one obtains one-key processing and multi-model processing of complex data, can study the effect of the data processing methods and their parameter settings on the analysis results, and can study the effect of the same data processing method (algorithm flow) on the results for different types of data sets, thereby achieving an optimized combination of individual data and their processing methods.
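The core points above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the class name `AlgorithmFlow`, its methods, and the toy steps `offset` and `scale` are hypothetical and do not come from the patent, which names no concrete implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

# One step of the flow: (name, function, parameters). All names are hypothetical.
Step = Tuple[str, Callable[..., Any], Dict[str, Any]]


class AlgorithmFlow:
    """Ordered list of processing steps; each step's output feeds the next one."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, name: str, func: Callable[..., Any], **params: Any) -> "AlgorithmFlow":
        # Core point 1: add a method together with its parameters.
        self.steps.append((name, func, dict(params)))
        return self

    def remove(self, name: str) -> None:
        # Core point 1: remove a method by name.
        self.steps = [step for step in self.steps if step[0] != name]

    def swap(self, i: int, j: int) -> None:
        # Core point 2: exchange the running order of two methods.
        self.steps[i], self.steps[j] = self.steps[j], self.steps[i]

    def run(self, data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
        # Core point 3: "inject" data, run the steps in order, keep every intermediate result.
        trace: List[Tuple[str, Any]] = []
        current = data
        for name, func, params in self.steps:
            current = func(current, **params)
            trace.append((name, current))
        return current, trace


def offset(values: List[float], shift: float) -> List[float]:
    return [v + shift for v in values]


def scale(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]


flow = AlgorithmFlow().add("offset", offset, shift=1.0).add("scale", scale, factor=2.0)
final, trace = flow.run([1.0, 2.0])      # offset first, then scale
flow.swap(0, 1)                          # same methods, different order
swapped_final, _ = flow.run([1.0, 2.0])  # scale first, then offset
```

Because the two runs differ only in step order, comparing `final` with `swapped_final` is a small instance of core point 4: the same data and methods can give different results under a different flow configuration.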
Compared with traditional data processing methods, the invention has clear advantages. First, the algorithm flow can integrate any data processing method. This avoids running different methods separately and repeatedly calling each method and reloading the analyzed data, as conventional approaches require, and so reduces and streamlines the operating steps. Second, the parameter settings of the data processing methods are integrated into the algorithm flow, so that by varying method parameters and comparing the model results, parameters can be optimized and methods combined optimally. In particular, the algorithm flow enables intelligent one-key data processing and multi-model analysis, which has so far been one of the greatest difficulties in processing complex multivariate data.
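The idea of optimizing a parameter by varying it and comparing results can be sketched as a simple sweep. This is a hedged illustration only: the moving-average smoother, the RMSE score against a known reference signal, and the function names are assumptions chosen for the example, not criteria stated in the patent.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def sweep_window(values: Sequence[float], reference: Sequence[float],
                 windows: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Run the same step once per parameter value and keep the best-scoring value."""
    scores = {w: rmse(moving_average(values, w), reference) for w in windows}
    best = min(scores, key=lambda w: scores[w])
    return best, scores


# Hypothetical data: a linear reference signal with alternating +0.5 / -0.5 noise.
reference = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noisy = [0.5, 0.5, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5]
best_window, scores = sweep_window(noisy, reference, windows=[1, 3])
```

In a real flow the score would more plausibly be a model evaluation index on a validation set; the sweep structure stays the same.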
For problems that traditional data processing software struggles to solve, such as "many methods, difficult optimization, massive data, complex workflows, and complicated operation", the invention provides a good solution and has good application prospects in mining information from complex, high-throughput data.
To address the tedious and repetitive analysis constantly encountered in processing complex multivariate data, the invention integrates the data processing methods: the required methods are selected freely and placed in a flexibly changeable algorithm flow. When different target data are to be processed, they only need to be "injected" into the algorithm flow. This avoids the traditional single-method analysis, in which method parameters must be set by hand and data added one method at a time, so that fast, intelligent analysis cannot follow directly from data input. By integrating the required methods in the algorithm flow, the invention makes method selection and parameter setting fast and convenient. It is especially suitable for complex data analysis with fixed methods and workflows, such as routine quality evaluation or inspection and monitoring of products and services against a given standard, and genuinely frees users from tedious, repetitive work.
Brief description of the drawings
Fig. 1: the traditional processing mode for complex multivariate data and the intelligent data processing mode based on an algorithm flow.
Fig. 2: an example of constructing an algorithm flow for complex multivariate data processing. Region 1 collects the various data processing methods; region 2 shows the methods already added to the algorithm flow, which can be modified by adding, deleting, or reordering methods; region 3 holds the parameter settings of the method currently selected in region 2; region 4 dynamically displays the state of the algorithm flow.
Fig. 3: intelligent data processing based on an algorithm flow. A target algorithm flow is selected, and the multivariate data to be analyzed, including the training set, calibration set, interference set, validation set, and prediction set, are selected and "injected" into it; the target data are then analyzed according to the structure of the algorithm flow.
Fig. 4: an actual near-infrared data example.
Detailed description of the invention
Embodiment: the analysis of near-infrared spectral data of wheat is used below to illustrate the algorithm flow for complex multivariate data processing of the invention and how it is used.
According to the structure of the algorithm flow of the invention, an algorithm flow is created by adding or removing different multivariate data processing methods in advance, setting the parameters of the methods added to the flow, and arranging the order of the algorithms at will. Fig. 1 contrasts the traditional processing mode for complex multivariate data with intelligent data processing based on an algorithm flow. In general, analyzing multivariate data involves many steps. The analysis of near-infrared data, for example, usually includes fast (batch) data loading; preprocessing operations such as smoothing and differentiation, background subtraction, and baseline correction to improve data quality; variable selection to find the characteristic variables most relevant to the target model to be built; and finally choosing a method to build, evaluate, and generalize the model, that is, using the established model to classify completely unknown samples, perform regression on them, or make decision predictions. With traditional data processing, the methods involved in each of these steps must be run one at a time and each result passed on to the next stage. Because there are many steps, and a single step may involve several specific methods and a range of parameter values, operation becomes extremely complicated. With data processing based on an algorithm flow, by contrast, the user only has to set up the algorithm flow and can then obtain the analysis results directly, without further in-depth intervention. The algorithms in the flow can also be modified at will, and once the modeling, validation, or prediction data to be analyzed are "injected" into the flow, the intermediate result of every step and the final result are obtained.
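A chain of spectral preprocessing steps of the kind named above might look like the following sketch. The specific formulas are simplifying assumptions made for illustration (baseline correction as subtraction of the minimum, differentiation as a first finite difference, followed by a standard normal variate transform); the patent does not specify how each step is computed, and the spectrum values are invented.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Tuple


def subtract_baseline(spectrum: List[float]) -> List[float]:
    """Crude baseline correction: remove a constant offset (assumed form)."""
    floor = min(spectrum)
    return [v - floor for v in spectrum]


def first_derivative(spectrum: List[float]) -> List[float]:
    """First finite difference between neighbouring points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def standard_normal_variate(spectrum: List[float]) -> List[float]:
    """Center one spectrum to zero mean and scale it to unit standard deviation."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def preprocess(spectrum: List[float]) -> Dict[str, List[float]]:
    """Run the steps in order and keep the intermediate result of each one."""
    steps: List[Tuple[str, Callable[[List[float]], List[float]]]] = [
        ("baseline", subtract_baseline),
        ("derivative", first_derivative),
        ("snv", standard_normal_variate),
    ]
    results: Dict[str, List[float]] = {}
    current = spectrum
    for name, method in steps:
        current = method(current)
        results[name] = current
    return results


# Hypothetical five-point spectrum, not taken from the wheat data in Fig. 4.
results = preprocess([1.0, 1.5, 3.0, 2.0, 1.2])
```

Keeping `results` keyed by step name mirrors the text's point that every step's intermediate result stays available alongside the final one.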
Based on the structural model of the algorithm flow, Fig. 2 shows how a typical algorithm flow is constructed in an actual data processing system. The user can call any of the data processing methods, edit the relevant parameters, and control the order in which the methods run, achieving fast, intelligent data analysis. An algorithm flow built as shown in Fig. 2 is highly transferable: it can be used to analyze and mine different data sets and, at the same time, to compare algorithm flows and optimize combinations of methods. Fig. 3 shows the data to be analyzed being added to the algorithm flow built in Fig. 2 to carry out the data processing.
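Injecting several data sets with different roles into one flow step can be sketched as below. The fit-then-apply split, the class `MeanCenter`, and the role names used as dictionary keys are assumptions for illustration; the patent lists the data set roles but not how a step treats them.

```python
from statistics import mean
from typing import Dict, List


class MeanCenter:
    """Hypothetical step: learns the mean from the training set, reuses it elsewhere."""

    def fit(self, values: List[float]) -> "MeanCenter":
        self.mean_ = mean(values)
        return self

    def apply(self, values: List[float]) -> List[float]:
        return [v - self.mean_ for v in values]


def inject(step: MeanCenter, datasets: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Fit the step on the training set only, then apply it to every injected set."""
    step.fit(datasets["training"])
    return {role: step.apply(values) for role, values in datasets.items()}


# Hypothetical values for three of the data set roles.
datasets = {
    "training": [1.0, 2.0, 3.0],
    "validation": [2.0, 4.0],
    "prediction": [5.0],
}
centered = inject(MeanCenter(), datasets)
```

Learning parameters from the training set alone and reusing them on the other sets is one way the same configured flow could be transferred to new data without being rebuilt.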
Fig. 4 shows an actual near-infrared data set for wheat. To analyze multivariate data with an algorithm flow, data processing methods are added and suitable algorithm parameters selected following the procedure of Fig. 2 and Fig. 3, and the data are then analyzed quickly. The available methods include, but are not limited to:
1) Data preprocessing: bisection interpolation, general interpolation, data transposition, noise addition, sample normalization, variable scaling, standard normal variate transformation, quantile normalization, data arithmetic, smoothing, differentiation, background subtraction, drift correction, multiplicative scatter correction, orthogonal signal correction, detrending;
2) Variable selection: unweighted method, weighted method, Fisher ratio method, stepwise regression, variable importance in projection, selectivity ratio, uninformative variable elimination, Monte Carlo uninformative variable elimination, moving window partial least squares, S-plot method, competitive adaptive reweighted sampling, random frog, margin influence analysis;
3) Exploratory analysis: principal component analysis, HCA clustering, K-means clustering;
4) Classification analysis: K-nearest neighbor analysis, PCA-MD, soft independent modeling of class analogy, partial least squares discriminant analysis, orthogonal partial least squares discriminant analysis, support vector classification;
5) Regression analysis: principal component regression, multiple linear regression, partial least squares, orthogonal partial least squares, support vector regression.
Running the algorithm flow built in this way, with the real data shown in Fig. 4 added to it, yields the intermediate results of each algorithm and the final model results, output as both tables and figures. This gives fast, convenient, one-key processing and multi-model analysis of complex multivariate data and meets the requirement of intelligent data analysis.
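Multi-model analysis with shared preprocessing can be sketched as follows. The two toy models (a mean predictor and a one-variable least-squares line), the RMSE evaluation index, the centering constant, and all data values are assumptions for illustration; they stand in for the modeling methods listed above and are not results from the wheat data set.

```python
from math import sqrt
from statistics import mean
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]
Dataset = Tuple[List[float], List[float]]  # (x values, y values)


def fit_mean(x: List[float], y: List[float]) -> Predictor:
    """Baseline model: always predicts the mean of the training responses."""
    m = mean(y)
    return lambda value: m


def fit_line(x: List[float], y: List[float]) -> Predictor:
    """One-variable least-squares regression line."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return lambda value: slope * value + intercept


def rmse(predicted: List[float], actual: List[float]) -> float:
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def run_multi_model(models: Dict[str, Callable[[List[float], List[float]], Predictor]],
                    preprocess: Callable[[float], float],
                    train: Dataset, validation: Dataset) -> Dict[str, float]:
    """Apply one shared preprocessing step, then fit and score every model."""
    x_train = [preprocess(v) for v in train[0]]
    x_val = [preprocess(v) for v in validation[0]]
    results = {}
    for name, fit in models.items():
        predict = fit(x_train, train[1])
        results[name] = rmse([predict(v) for v in x_val], validation[1])
    return results


results = run_multi_model(
    models={"mean": fit_mean, "line": fit_line},
    preprocess=lambda v: v - 2.5,  # hypothetical centering constant
    train=([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]),
    validation=([5.0, 6.0], [10.0, 12.0]),
)
```

Each entry of `results` is one model's evaluation index, so the models can be compared side by side after a single run of the shared flow.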

Claims (9)

1. A complex multivariate data processing method based on an algorithm flow, characterized by comprising the following steps:
A. According to the user's needs, multivariate data processing algorithms of different categories and purposes are freely selected in advance (added or removed), their parameters are set, and their running order is freely arranged or exchanged, creating an algorithm flow that contains each data processing step;
B. The algorithms contained in the algorithm flow can be modified (added or removed), their order adjusted, and their parameters changed;
C. The data to be analyzed are freely selected and assigned to the algorithm flow, and each algorithm is run in the order defined by the flow, yielding the intermediate result of each algorithm and the final result;
D. The data assigned to and called in the algorithm flow include a modeling data set, a calibration data set, an interference data set, a validation data set, and a prediction data set;
E. Through the choice of specific data processing algorithms and modeling methods in the algorithm flow and through different parameter settings, one-key data processing, fast and intelligent multi-model analysis, and the comparison and optimization of data processing results are achieved.
2. The algorithm flow for complex multivariate data processing according to claim 1, characterized in that it integrates a large number of data processing methods, whose main categories are: batch data loading methods, preprocessing methods, feature selection methods, exploratory analysis methods, classification and regression methods, validation and prediction of unknown data, intelligent decision-making, and so on.
3. The methods in the data processing algorithm flow according to claim 1, characterized in that the data processing methods contained in the flow can be edited flexibly: the user can freely select the required methods and add them to the flow, and can add, delete, and reorder the methods in the flow, achieving a free, optimized combination of methods.
4. The parameter settings of the algorithm flow according to claim 1, characterized in that reasonable default parameters can be set for the methods in the flow, and the parameter settings of each method have preset, scientifically based limits, preventing unreasonable data processing results caused by improper parameter settings.
5. The selection of different data processing method subclasses according to claim 2, characterized in that the data processing results obtained by applying the algorithm flow include the intermediate results of each method as charts, the final result, and method evaluation indices.
6. The method selection in the algorithm flow according to claim 3, characterized in that it is suitable for data processing software (such as chemometrics and bioinformatics software): the data processing methods of the software are integrated into the algorithm flow so that users can apply them quickly and conveniently, reducing repeated calls to, and repeated selection of, data and data processing methods.
7. The processing with the algorithm flow according to claim 4, characterized in that one-key processing of data can be achieved: the data processing methods and parameter settings are created in the algorithm flow in advance, and when real data are processed the flow only has to be applied to the target data, of whatever type, to obtain the final analysis results with one key.
8. The processing with the algorithm flow according to claim 4, characterized in that multi-model analysis of data is achieved by selecting several modeling methods in the algorithm flow; with the same preprocessing or feature selection methods selected, several models can be analyzed at the same time and the result of each model generated.
9. The parameter settings of the algorithm flow according to claim 4, characterized in that the effect on the model results of including or omitting a given algorithm, or of changing algorithm parameters, can be compared, achieving optimized combination of methods and parameter optimization in the algorithm flow.
CN201510030208.2A 2015-01-21 2015-01-21 Algorithm flow based complicated multi-variable data processing method Pending CN105868206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510030208.2A CN105868206A (en) 2015-01-21 2015-01-21 Algorithm flow based complicated multi-variable data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510030208.2A CN105868206A (en) 2015-01-21 2015-01-21 Algorithm flow based complicated multi-variable data processing method

Publications (1)

Publication Number Publication Date
CN105868206A true CN105868206A (en) 2016-08-17

Family

ID=56623081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510030208.2A Pending CN105868206A (en) 2015-01-21 2015-01-21 Algorithm flow based complicated multi-variable data processing method

Country Status (1)

Country Link
CN (1) CN105868206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222310A (en) * 2019-05-17 2019-09-10 科迈恩(北京)科技有限公司 A kind of shared AI scientific instrument Data Analysis Services system and method
CN111611236A (en) * 2020-05-28 2020-09-01 宁波和利时智能科技有限公司 Data analysis method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160817
