CN105868206A - Algorithm flow based complicated multi-variable data processing method - Google Patents
- Publication number
- CN105868206A
- Authority
- CN
- China
- Prior art keywords
- data
- algorithm
- stream
- data processing
- algorithm stream
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an algorithm-stream-based method for processing complex multivariate data. It is suited to analyzing and processing high-dimensional, high-throughput and highly complex data and to extracting and mining information from them, and belongs to the field of analytical chemometrics. The method achieves intelligent data analysis and information mining by integrating and optimizing the data processing workflow: an optimized combination of data processing methods (batch data loading, preprocessing, feature selection, model construction, prediction of unknown samples, and so on) is built into an algorithm stream, the method parameters are set, and the data to be analyzed (training set, calibration set, validation set, prediction set, etc.) are injected into the stream, giving fast, convenient, accurate and intelligent analysis of big data. In particular, by changing how the algorithm stream is constructed, the method supports one-click processing and multi-model processing of complex data, and makes it possible to assess how data processing methods and their parameters affect the analytical result and how the same data processing method (algorithm stream) performs on different data sets. This achieves a truly intelligent, optimized pairing of individual data with data processing methods.
Description
Technical field
The present invention relates to an algorithm-stream-based method for processing complex multivariate data and belongs to the field of chemometrics within analytical chemistry. Specifically, for complex multivariate data that need to be processed, the entire workflow, from data loading and data preprocessing through key feature selection to model construction and generalization, is integrated and optimized to create an algorithm stream for processing complex multivariate data, enabling fast and intelligent data processing. In actual analysis of complex high-throughput data, simply adding the target data to the algorithm stream achieves intelligent data processing and information mining such as one-click processing and multi-model analysis.
Background art
Processing complex multivariate data and extracting and mining information from them depend heavily on mathematics, statistics, artificial intelligence, chemistry and bioinformatics, and on the application and development of chemometric methods. Processing "big data" in chemistry and biology in particular requires basic algorithms that are fast and intelligent, give accurate and reliable results, and are suited to "three-high" data analysis (high dimensionality, high throughput and high complexity); this is the key to data processing. Data processing itself is very complex. Taking data from analytical instruments such as chromatographs and spectrometers as an example, it usually includes smoothing of complex signals; peak detection under strong noise and background interference; automatic deconvolution of overlapping multi-component peaks; intelligent retention-time shift alignment across large numbers of samples containing hundreds or thousands of complex components; selection and optimization of key variables; database types and search strategies; structural identification of completely unknown small molecules; visualization of large-scale data and analysis results; pattern recognition and classification; and quantitative modeling and model evaluation algorithms.
Mining data and obtaining analysis results quickly and accurately is what data owners and data analysts dream of. Traditional data analysis, however, is on the one hand very time-consuming and laborious. The various data processing methods must be combined and optimized manually, and processing even one class of data involves different specific algorithms; the order in which the algorithms are applied, or their parameters, can also significantly affect the result. Obtaining good results requires constant trial and error, which wastes a great deal of time. Data processing is also sequential: the output of one method is the input of the next, which lengthens the waiting time. On the other hand, the standard mode of data processing typically requires loading samples with known information, building a robust and reliable model from them, and then applying the model to validation samples with known values or to unknown prediction samples. Selecting data processing algorithms and methods for real data step by step makes the analysis long and complex and hard to make fast and intelligent. Taking the construction of a model y = f(X) as an example, the diversity of both f and X prevents conventional methods from truly meeting the needs of intelligent "big data" processing and information extraction. Software for processing spectral, chromatographic and mass-spectrometric data is designed, organized and implemented according to this conventional approach, including current mainstream international chemometrics software such as The Unscrambler and SIMCA. Operation is extremely time-consuming and complex: each data processing method has to be repeated over and over, and the optimal combination of methods has to be found manually.
Algorithm-stream-based processing of complex multivariate data can be widely applied to the analysis, processing, information extraction and mining of data produced by analytical instruments (such as chromatography, mass spectrometry and spectroscopy). It can also be used for network and internet "big data", for example data from pharmaceuticals, tobacco, brewing, agriculture, food, manufacturing, petrochemicals, the environment, quality supervision, biology and service industries. It therefore has a wide range of applications and good prospects.
Summary of the invention
The present invention provides a processing algorithm stream for complex multivariate data (hereinafter "algorithm stream"). The algorithm stream simplifies the processing of complex data: the methods needed in the data processing workflow, such as data preprocessing, feature selection and a range of modeling methods, only need to be added to (designed into) the algorithm stream in advance, with the corresponding optimized parameters set. One-click processing and multi-model processing of all kinds of complex data then become possible, skipping the tedious procedure otherwise needed to process each data set individually, which improves processing efficiency, saves cost and increases benefit. Its core points include:
1) adding or removing complex multivariate data processing algorithms at will, setting algorithm parameters, and freely arranging the order in which the algorithms are called, to create an algorithm stream comprising one or more data processing steps;
2) when the algorithm stream is applied, its methods can be modified or their execution order exchanged;
3) the data to be analyzed (for modeling, validation and prediction) are "injected" into the data processing algorithm stream, and the program runs each method in turn according to its position in the stream, yielding the intermediate result of every step as well as the final result;
4) through different configurations of the algorithm stream, one-click processing and multi-model processing of complex data are achieved, and it becomes possible to assess how data processing methods and their parameter settings affect the analysis result and how the same data processing method (algorithm stream) affects results on different types of data sets, so that individual data are optimally combined with their data processing methods.
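Read together, the core points describe an ordered pipeline whose steps can be added, removed, reordered and re-parameterized, and which records every intermediate result when data are injected. The following is a minimal Python sketch under that reading. The class and method names (`AlgorithmStream`, `add`, `remove`, `swap`, `set_params`, `run`) and the toy steps are hypothetical; the patent does not disclose an implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    """One entry of the stream: a named method plus its parameter settings."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)


class AlgorithmStream:
    """Hypothetical ordered pipeline: each step's output is the next step's input."""

    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, name: str, func: Callable[..., Any], **params: Any) -> "AlgorithmStream":
        self.steps.append(Step(name, func, dict(params)))
        return self  # allow chained calls

    def remove(self, name: str) -> None:
        self.steps = [s for s in self.steps if s.name != name]

    def swap(self, i: int, j: int) -> None:
        """Exchange the execution order of steps i and j."""
        self.steps[i], self.steps[j] = self.steps[j], self.steps[i]

    def set_params(self, name: str, **params: Any) -> None:
        for s in self.steps:
            if s.name == name:
                s.params.update(params)

    def run(self, data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
        """'Inject' data, run the steps in order, and keep every intermediate result."""
        intermediates: List[Tuple[str, Any]] = []
        for s in self.steps:
            data = s.func(data, **s.params)
            intermediates.append((s.name, data))
        return data, intermediates


# Usage with two toy steps (illustrative only).
def scale(x, factor=1.0):
    return [v * factor for v in x]


def offset(x, amount=0.0):
    return [v + amount for v in x]


stream = AlgorithmStream().add("scale", scale, factor=2.0).add("offset", offset, amount=1.0)
final, steps = stream.run([1.0, 2.0])   # scale then offset -> [3.0, 5.0]
stream.swap(0, 1)
swapped, _ = stream.run([1.0, 2.0])     # offset then scale -> [4.0, 6.0]
```

Because the stream object holds only methods and parameters, the same configured stream can be run again on any other data passed to `run`.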
Compared with traditional data processing methods, the advantages of the present invention are clear. First, the algorithm stream can integrate any data processing method, which is markedly better than the conventional practice of running different data processing methods in isolation and repeatedly calling the various methods and the data being analyzed; operations are reduced and streamlined. Second, method parameter settings can be integrated into the algorithm stream, so that by varying method parameters and comparing model results, parameters can be optimized and methods combined optimally. In particular, the algorithm stream enables intelligent one-click data processing and multi-model analysis, which has so far been one of the greatest difficulties in processing complex multivariate data.
Traditional data processing software struggles with problems such as "many methods, difficult optimization, massive data, complex workflows and cumbersome operation"; the invention offers a good solution to them and has good application prospects in mining information from complex high-throughput data.
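The parameter optimization described above (vary a method parameter, compare the resulting metric, keep the best setting) can be sketched as an exhaustive grid search. This is an assumed strategy for illustration; the patent does not say how candidate settings are enumerated. The moving-average smoother, the `grid_search` helper and the synthetic ramp-plus-noise signal are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def moving_average(x: Sequence[float], window: int) -> List[float]:
    """Centered moving average; returns only positions with a full window."""
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


def grid_search(evaluate: Callable[[Dict], float],
                grid: Dict[str, Sequence]) -> Tuple[Dict, List[Tuple[Dict, float]]]:
    """Try every parameter combination; a lower score is better."""
    keys = list(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((params, evaluate(params)))
    best = min(results, key=lambda r: r[1])[0]
    return best, results


# Synthetic test signal: a linear ramp plus alternating +/-1 noise.
clean = [float(i) for i in range(20)]
noisy = [c + (1.0 if i % 2 == 0 else -1.0) for i, c in enumerate(clean)]


def smoothing_error(params: Dict) -> float:
    """Mean squared error of the smoothed signal against the clean ramp."""
    w = params["window"]
    smoothed = moving_average(noisy, w)
    target = clean[w // 2: len(clean) - w // 2]
    return sum((s - t) ** 2 for s, t in zip(smoothed, target)) / len(target)


best, results = grid_search(smoothing_error, {"window": [1, 3, 5]})
# window 1 -> MSE 1.0, window 3 -> about 1/9, window 5 -> about 1/25; best is window 5
```

In a full algorithm stream the `evaluate` callback would rerun the stream with the candidate parameters and return a model metric such as a validation error.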
To address the tedious, repetitive analysis constantly encountered in complex multivariate data processing, the present invention integrates the data processing methods: the required methods are freely selected into a flexibly changeable algorithm stream, and when different target data are processed they only need to be "injected" into the algorithm stream for the complex high-throughput data to be processed. This avoids traditional single-method analysis, in which method parameters must be set manually and data added one by one, so that fast, intelligent analysis after data input cannot be achieved. By integrating the required methods into the algorithm stream, the invention makes method selection and parameter setting fast and convenient. It is especially suited to complex data analysis with fixed methods and workflows, such as routine quality evaluation or inspection and monitoring of products and services against a given standard, truly relieving people of tedious, repetitive work.
Brief description of the drawings
Fig. 1: the traditional mode of processing complex multivariate data and the intelligent, algorithm-stream-based mode of data processing.
Fig. 2: an example of the structure of an algorithm stream for processing complex multivariate data. In the figure, region 1 integrates the various types of data processing methods; region 2 shows the methods already added to the algorithm stream, which can be modified by adding, deleting, reordering, etc.; region 3 sets the parameters of the method currently selected in region 2; region 4 dynamically displays the status of the algorithm stream.
Fig. 3: intelligent data processing based on an algorithm stream. A target algorithm stream is selected, and the multivariate data to be analyzed, including training, calibration, interference, validation and prediction sets, are selected and "injected" into it; the target data are then analyzed according to the structure of the algorithm stream.
Fig. 4: an example of real near-infrared data.
Detailed description of the embodiments
Embodiment: the analysis of near-infrared spectral data of wheat is used below as an example to illustrate the complex multivariate data processing algorithm stream of the present invention and how it is used.
Following the structure of the algorithm stream of the present invention, an algorithm stream is created by adding or removing different multivariate data processing methods in advance, setting the parameters of the methods added to the stream, and arranging the algorithms in any order. Fig. 1 contrasts the traditional mode of processing complex multivariate data with intelligent, algorithm-stream-based data processing. In general, analyzing multivariate data involves many steps. Analysis of near-infrared data, for example, usually includes fast (batch) data loading; preprocessing operations such as smoothing and derivatives, background subtraction and baseline correction to improve data quality; variable selection to find characteristic variables that are highly correlated with the target of the model to be built; and finally building, evaluating and generalizing a model with an optimized method, that is, using the established model to classify, regress or make decisions on completely unknown samples. The traditional approach requires running the methods of each step one at a time, collecting the results and passing them on to the next stage. Because there are many processing steps, and even a single step may involve several specific analytical methods and a range of parameter values, operation becomes extremely complex. With algorithm-stream-based processing, by contrast, once the algorithm stream has been set up no further manual intervention is needed and the analysis results are obtained directly. At the same time, the algorithms in the data processing algorithm stream can be modified at will, and by injecting the modeling, validation or prediction data to be analyzed into the stream, the intermediate result of every step and the final results are obtained.
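The embodiment separates modeling data from validation and prediction data. One common way to realize this (an assumption here; the patent does not specify it) is for steps that learn something from the data, such as mean centering, to learn it from the modeling set only and then apply it unchanged to every other injected set. The sketch below uses a first difference as a stand-in for the derivative step; `first_derivative`, `MeanCenter` and `run_flow` are hypothetical names.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows = samples, columns = wavelengths


def first_derivative(X: Matrix) -> Matrix:
    """Stateless step: first difference along the wavelength axis."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in X]


class MeanCenter:
    """Stateful step: column means are learned from the modeling (training) set only."""

    def fit(self, X: Matrix) -> "MeanCenter":
        self.means = [sum(col) / len(X) for col in zip(*X)]
        return self

    def transform(self, X: Matrix) -> Matrix:
        return [[v - m for v, m in zip(row, self.means)] for row in X]


def run_flow(train: Matrix, others: Dict[str, Matrix]) -> Dict[str, Matrix]:
    """Hypothetical two-step flow: derivative, then mean centering fitted on `train`.

    Validation/prediction sets reuse the training means instead of their own.
    """
    centre = MeanCenter().fit(first_derivative(train))
    results = {"train": centre.transform(first_derivative(train))}
    for name, X in others.items():
        results[name] = centre.transform(first_derivative(X))
    return results


out = run_flow([[1.0, 2.0, 4.0], [3.0, 4.0, 8.0]], {"prediction": [[0.0, 1.0, 3.0]]})
# out["train"] == [[0.0, -1.0], [0.0, 1.0]]; out["prediction"] == [[0.0, -1.0]]
```

Fitting on the modeling set alone keeps information from the validation and prediction sets out of the model, which is what makes their results a fair check.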
Structural model of the algorithm stream: Fig. 2 shows how a typical algorithm stream is constructed in a real data processing system. Users can call any of the data processing methods, edit the related parameters and control the execution order of the methods, achieving fast, intelligent analysis of the data. An algorithm stream built as described for Fig. 2 is highly portable: it can be reused to analyze and mine different data, and it also allows algorithm streams to be compared and methods to be combined optimally. Fig. 3 shows the data to be analyzed being added to the algorithm stream built in Fig. 2 so that the data are processed.
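The portability described for Fig. 2 suggests storing an algorithm stream as a configuration (method names plus parameters) kept separate from any data, so that the same stream can be reapplied to other data sets. A minimal sketch, assuming a JSON format and a name-to-function registry; `REGISTRY`, `save_stream` and `apply_stream` are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

Matrix = List[List[float]]


def derivative(X: Matrix) -> Matrix:
    """First difference along each row."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in X]


def scale(X: Matrix, factor: float = 1.0) -> Matrix:
    return [[v * factor for v in row] for row in X]


# Hypothetical registry: method names stored in the saved stream -> implementations.
REGISTRY: Dict[str, Callable[..., Matrix]] = {"derivative": derivative, "scale": scale}


def save_stream(steps: List[Dict[str, Any]]) -> str:
    """Serialize only method names and parameters, never the data."""
    return json.dumps(steps)


def apply_stream(spec: str, X: Matrix) -> Matrix:
    """Rebuild the stream from its saved spec and run it on a new data set."""
    for step in json.loads(spec):
        X = REGISTRY[step["method"]](X, **step.get("params", {}))
    return X


spec = save_stream([{"method": "derivative"}, {"method": "scale", "params": {"factor": 10}}])
result_a = apply_stream(spec, [[1.0, 2.0, 4.0]])  # -> [[10.0, 20.0]]
result_b = apply_stream(spec, [[0.0, 0.0, 1.0]])  # -> [[0.0, 10.0]]
```

Keeping the specification as plain data also makes two streams easy to compare side by side.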
Fig. 4 shows real near-infrared data of wheat. To analyze these multivariate data with an algorithm stream, data processing methods are added and suitable algorithm parameters chosen following the procedure described for Figs. 2 and 3, so that the analysis is carried out quickly. The available methods include, but are not limited to:
1) data preprocessing: halving difference, general interpolation, data transposition, adding noise to data, sample standardization, variable scaling, standard normal variate (SNV), quantile normalization, data operations, smoothing, derivatives, background subtraction, drift correction, multiplicative scatter correction (MSC), orthogonal signal correction (OSC), detrending;
2) variable selection: unweighted methods, weighted methods, Fisher ratio method, stepwise regression, variable importance in projection (VIP), selectivity ratio, uninformative variable elimination (UVE), Monte Carlo uninformative variable elimination (MC-UVE), moving window partial least squares (MWPLS), S-plot method, competitive adaptive reweighted sampling (CARS), random frog, interval influence analysis;
3) exploratory analysis: principal component analysis (PCA), hierarchical cluster analysis (HCA), K-means clustering;
4) classification analysis: K-nearest neighbors (KNN), PCA-MD, soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA), orthogonal partial least squares discriminant analysis (OPLS-DA), support vector machine classification;
5) regression analysis: principal component regression (PCR), multiple linear regression (MLR), partial least squares (PLS), orthogonal partial least squares (OPLS), support vector regression (SVR).
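Two of the listed preprocessing methods, standard normal variate (SNV) and multiplicative scatter correction (MSC), are short enough to sketch directly. The functions below follow the usual textbook definitions and are not taken from the patent; using the sample standard deviation in SNV and defaulting the MSC reference to the mean spectrum of the input set are assumptions.

```python
import math
from typing import List, Optional, Sequence

Matrix = List[List[float]]  # rows = spectra, columns = wavelengths


def snv(X: Matrix) -> Matrix:
    """Standard normal variate: center and scale each spectrum (row) by its own
    mean and sample standard deviation (n - 1 denominator)."""
    out = []
    for row in X:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (n - 1))
        out.append([(v - mean) / sd for v in row])
    return out


def msc(X: Matrix, reference: Optional[Sequence[float]] = None) -> Matrix:
    """Multiplicative scatter correction: regress each spectrum on a reference
    (by default the mean spectrum), x = a + b * ref, and return (x - a) / b."""
    if reference is None:
        reference = [sum(col) / len(X) for col in zip(*X)]
    r_mean = sum(reference) / len(reference)
    r_c = [r - r_mean for r in reference]
    r_ss = sum(v * v for v in r_c)
    out = []
    for row in X:
        x_mean = sum(row) / len(row)
        b = sum(rc * (x - x_mean) for rc, x in zip(r_c, row)) / r_ss  # slope
        a = x_mean - b * r_mean                                       # offset
        out.append([(x - a) / b for x in row])
    return out
```

For example, `snv([[1.0, 2.0, 3.0]])` gives `[[-1.0, 0.0, 1.0]]`, and two spectra that are exact linear transforms of one another are both mapped onto their mean spectrum by `msc`. When MSC runs inside a stream fitted on training data, the reference would normally be taken from the training set and passed in explicitly.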
Running the algorithm stream constructed as above and adding the real data shown in Fig. 4 to it yields the intermediate results of each algorithm and the final model results, output as both tables and figures. This achieves fast, convenient, one-click processing and multi-model analysis of complex multivariate data and meets the requirements of intelligent data analysis.
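Multi-model analysis, as described here and in claim 8, can be read as computing the shared preprocessing once and then fitting and scoring several models on the same result. The sketch below is illustrative only: the three toy models (a mean baseline, 1-nearest-neighbour regression and a one-variable least-squares line) stand in for PLS, SVR and the other listed methods, and the data are synthetic. `multi_model` and the model function names are hypothetical.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]
Predictor = Callable[[List[float]], float]
Model = Callable[[Matrix, List[float]], Predictor]  # fit(X, y) -> predict(x)


def mean_model(X: Matrix, y: List[float]) -> Predictor:
    """Baseline: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def nearest_neighbour_model(X: Matrix, y: List[float]) -> Predictor:
    """1-nearest-neighbour regression with squared Euclidean distance."""
    def predict(x: List[float]) -> float:
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X]
        return y[dists.index(min(dists))]
    return predict


def univariate_ls_model(X: Matrix, y: List[float], col: int = 0) -> Predictor:
    """Least-squares line on a single (hypothetically chosen) variable."""
    xs = [row[col] for row in X]
    xbar, ybar = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - xbar) * (b - ybar) for a, b in zip(xs, y))
             / sum((a - xbar) ** 2 for a in xs))
    intercept = ybar - slope * xbar
    return lambda x: intercept + slope * x[col]


def rmse(pred: List[float], true: List[float]) -> float:
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def multi_model(preprocess: Callable[[Matrix], Matrix], models: Dict[str, Model],
                X_train: Matrix, y_train: List[float],
                X_val: Matrix, y_val: List[float]) -> Dict[str, float]:
    """Run one shared (assumed stateless) preprocessing step, then fit and score every model."""
    Xt, Xv = preprocess(X_train), preprocess(X_val)
    report = {}
    for name, build in models.items():
        predict = build(Xt, y_train)
        report[name] = rmse([predict(x) for x in Xv], y_val)
    return report


report = multi_model(
    lambda X: X,  # identity preprocessing for the toy data
    {"mean": mean_model, "1-NN": nearest_neighbour_model, "LS": univariate_ls_model},
    [[0.0], [1.0], [2.0], [3.0]], [0.0, 2.0, 4.0, 6.0],
    [[4.0]], [8.0],
)
# report == {"mean": 5.0, "1-NN": 2.0, "LS": 0.0}
```

The per-model RMSE dictionary is one simple form of the "result of each model" that claim 8 refers to.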
Claims (9)
1. A method for processing complex multivariate data based on an algorithm stream, characterized by comprising the following steps:
A. according to user needs, freely selecting (adding or removing) multivariate data processing algorithms of different categories and purposes, setting algorithm parameters, and freely arranging and swapping the execution order of the algorithms, to create an algorithm stream containing each data processing step;
B. the algorithms included in the algorithm stream can be modified (added or removed), their order can be adjusted, and their parameters can be modified;
C. freely selecting/assigning the data to be analyzed into the algorithm stream, running each algorithm in the sequence designed in the algorithm stream, and obtaining the intermediate and final results of each algorithm;
D. the data assigned to and called in the algorithm stream include a modeling data set, a calibration data set, an interference data set, a validation data set and a prediction data set;
E. through the choice of specific data processing algorithms and modeling methods in the algorithm stream and different parameter settings, achieving one-click data processing, fast and intelligent multi-model analysis, and comparison and optimization of data processing results.
2. The complex multivariate data processing algorithm stream according to claim 1, characterized by integrating a large number of data processing methods, whose main categories include: batch data loading methods, preprocessing methods, feature selection methods, exploratory analysis methods, classification and regression methods, validation and prediction of unknown data, and intelligent decision-making.
3. The methods in the data processing algorithm stream according to claim 1, characterized in that the data processing methods included in the algorithm stream can be edited flexibly: the user can freely select the required methods and add them to the algorithm stream, and can add, delete and reorder the methods in the algorithm stream, achieving free optimal combination of methods.
4. The setting of algorithm stream parameters according to claim 1, characterized in that the parameters of the methods in the algorithm stream can be set rationally and have default values, and the parameters of each method have predefined scientific limits, which prevent unreasonable data processing results caused by improper parameter settings by the user.
5. The selection of different data processing method subclasses according to claim 2, characterized in that the data processing results obtained by applying the algorithm stream include, for each method, intermediate results in chart form, final results and method evaluation metrics.
6. The method selection in the algorithm stream according to claim 3, characterized by being suitable for application in data-processing software (such as chemometrics and bioinformatics software), with the data processing methods in the software integrated into the algorithm stream, so that users can use them quickly and conveniently and need call data processing methods and select data less often.
7. The processing of the algorithm stream according to claim 4, characterized by enabling one-click processing of data: the data processing methods and parameter settings are created in the algorithm stream in advance, and when real data are processed, the algorithm stream only needs to be applied to the target data to be processed, of different kinds and with different processing goals, for the final analysis results to be obtained with one click.
8. The processing of the algorithm stream according to claim 4, characterized in that selecting several modeling methods in the algorithm stream enables multi-model analysis of the data: with the same preprocessing or feature selection methods selected, several models can be analyzed simultaneously, generating the results of each model.
9. The setting of algorithm stream parameters according to claim 4, characterized in that by comparing results with and without a given algorithm, or with changed algorithm parameters, the effect on the model results can be assessed, achieving combinatorial optimization of methods and parameter optimization within the algorithm stream.
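Claim 4 describes reasonable defaults and predefined scientific limits for each method's parameters. A minimal sketch of such a check, assuming a simple default/minimum/maximum specification per parameter; `ParamSpec`, `SPECS`, `resolve_params`, the method names and the numeric limits are hypothetical examples, not values from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ParamSpec:
    default: float
    minimum: float
    maximum: float
    integer: bool = False


# Hypothetical limits for a smoothing window and a PLS component count.
SPECS: Dict[str, Dict[str, ParamSpec]] = {
    "moving_average": {"window": ParamSpec(5, 1, 51, integer=True)},
    "pls": {"n_components": ParamSpec(3, 1, 20, integer=True)},
}


def resolve_params(method: str, user: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Fill in defaults and reject values outside the predefined range."""
    user = user or {}
    specs = SPECS[method]
    unknown = set(user) - set(specs)
    if unknown:
        raise ValueError(f"unknown parameter(s) for {method}: {sorted(unknown)}")
    resolved = {}
    for name, spec in specs.items():
        value = user.get(name, spec.default)
        if spec.integer and value != int(value):
            raise ValueError(f"{method}.{name} must be an integer")
        if not spec.minimum <= value <= spec.maximum:
            raise ValueError(f"{method}.{name}={value} outside [{spec.minimum}, {spec.maximum}]")
        resolved[name] = value
    return resolved
```

For example, `resolve_params("pls")` returns the default `{"n_components": 3}`, while `resolve_params("moving_average", {"window": 0})` raises `ValueError` before any data are processed.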
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510030208.2A CN105868206A (en) | 2015-01-21 | 2015-01-21 | Algorithm flow based complicated multi-variable data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510030208.2A CN105868206A (en) | 2015-01-21 | 2015-01-21 | Algorithm flow based complicated multi-variable data processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105868206A true CN105868206A (en) | 2016-08-17 |
Family
ID=56623081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510030208.2A Pending CN105868206A (en) | 2015-01-21 | 2015-01-21 | Algorithm flow based complicated multi-variable data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105868206A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222310A (en) * | 2019-05-17 | 2019-09-10 | 科迈恩(北京)科技有限公司 | A kind of shared AI scientific instrument Data Analysis Services system and method |
CN111611236A (en) * | 2020-05-28 | 2020-09-01 | 宁波和利时智能科技有限公司 | Data analysis method and system |
- 2015-01-21: CN201510030208.2A (CN105868206A), active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108629365B (en) | Analysis data analysis device and analysis data analysis method | |
Kose et al. | Visualizing plant metabolomic correlation networks using clique–metabolite matrices | |
Paton et al. | Iolite: Freeware for the visualisation and processing of mass spectrometric data | |
CN110674604A (en) | Transformer DGA data prediction method based on multi-dimensional time sequence frame convolution LSTM | |
US20060004528A1 (en) | Apparatus and method for extracting similar source code | |
Nobile et al. | Computational intelligence for parameter estimation of biochemical systems | |
CN103559129B (en) | Statistical regression test data generating method based on genetic algorithm | |
CN109299501B (en) | Vibration spectrum analysis model optimization method based on workflow | |
CN105868206A (en) | Algorithm flow based complicated multi-variable data processing method | |
Pittman et al. | Bayesian analysis of binary prediction tree models for retrospectively sampled outcomes | |
Sarafanov et al. | Evolutionary automated machine learning for multi-scale decomposition and forecasting of sensor time series | |
Li et al. | Improvement of NIR prediction ability by dual model optimization in fusion of NSIA and SA methods | |
Tian et al. | Multi-classification identification of PLS in rice spectra with different pre-treatments and K/S optimisation | |
CN113297185A (en) | Feature derivation method and device | |
CN107391124B (en) | Conditional slicing method based on golden section search and software execution track | |
CN110855519A (en) | Network flow prediction method | |
CN106198433A (en) | Infrared spectrum method for qualitative analysis based on LM GA algorithm | |
Williams et al. | Decision trees | |
WO2021064924A1 (en) | Waveform analysis method and waveform analysis device | |
Wang et al. | Estimation of soil organic matter by in situ Vis-NIR spectroscopy using an automatically optimized hybrid model of convolutional neural network and long short-term memory network | |
CN106644977A (en) | Spectral variable selection method based on bat algorithm | |
CN113295674A (en) | Laser-induced breakdown spectroscopy characteristic nonlinear processing method based on S transformation | |
Jeong | Weighted similarity based just-in-time model predictive control for batch trajectory tracking | |
Narayanan et al. | Consistent value creation from bioprocess data with customized algorithms: Opportunities beyond multivariate analysis | |
CN109766520A (en) | A kind of multiple linear regression analysis method and system based on big data |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160817 |